Dashboards for Nerds: DataFrame Plotting in Databricks
I don't like BI tools. I use Databricks AI/BI, and I stopped using Power BI and Qlik a long time ago. Still, I always feel like something is missing. One option is to build dashboards from charts generated with Matplotlib and pandas, but since I'm not a fan of pandas, I usually give up on that approach.
Now, finally, there is something for me: Spark native plotting. I no longer need to convert a dataframe to a pandas object myself. Under the hood it still uses pandas and Plotly, but that is hidden from me: I skip the cumbersome conversion steps and call plot directly on the dataframe.
df.plot(kind="line", x="category", y="int_val")
Before we see some nice examples, let's consider what's next. I'd like to have the option to add charts generated by this code to a Databricks AI/BI dashboard, which would make it the best dashboard for data nerds in the entire world.
Let's see how it works on this dataframe:
data = [
    (1, 120, 80, 200),
    (2, 150, 90, 220),
    (3, 170, 100, 240),
    (4, 160, 130, 250),
    (5, 180, 120, 260),
    (6, 200, 140, 280),
]
cols = ["month", "electronics", "furniture", "clothing"]
df = spark.createDataFrame(data, cols)
Line Chart: monthly trend by category
# ────────────────────────────────────────────────────────────────────────────────
# Line chart – monthly trend by category
# ────────────────────────────────────────────────────────────────────────────────
df.plot.line(
    x="month",
    y=["electronics", "furniture", "clothing"],
    title="Monthly Sales by Category"
)
Bar Chart: same data in bar form
# ────────────────────────────────────────────────────────────────────────────────
# Stacked bar chart – same data in bar form
# ────────────────────────────────────────────────────────────────────────────────
df.plot.bar(
    x="month",
    y=["electronics", "furniture", "clothing"],
    title="Monthly Sales – Stacked Bar"
)
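To read the stacked bars, it helps to know what height to expect for each month. Here is a minimal plain-Python sketch, assuming the three category segments are simply stacked on top of each other, so each bar's total height is the sum of the row. The name stacked_totals is mine, for illustration only.

```python
# Same rows as the example dataframe: (month, electronics, furniture, clothing).
data = [
    (1, 120, 80, 200),
    (2, 150, 90, 220),
    (3, 170, 100, 240),
    (4, 160, 130, 250),
    (5, 180, 120, 260),
    (6, 200, 140, 280),
]

# Assumption: the segments are stacked, so the bar height for a month
# is the sum of its three category values.
stacked_totals = {month: sum(values) for month, *values in data}

for month, total in stacked_totals.items():
    print(f"month {month}: total bar height {total}")
```

If a bar in the chart does not reach these totals, the bars are probably grouped side by side rather than stacked.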
Pie Chart
# ────────────────────────────────────────────────────────────────────────────────
# Pie Lake :-)
# ────────────────────────────────────────────────────────────────────────────────
df.plot.pie(
    x="month",
    y="electronics",
    title="Pie Chart"
)
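In this pie chart every slice is one month, sized by that month's share of total electronics sales. A quick plain-Python sanity check of the percentages you should see, as a rough sketch (the variable names are mine, and rounding to one decimal is my choice, so the labels in the chart may be formatted differently):

```python
# Month and electronics columns from the example dataframe.
months = [1, 2, 3, 4, 5, 6]
electronics = [120, 150, 170, 160, 180, 200]

total = sum(electronics)

# Each slice is the month's value divided by the total, as a percentage.
shares = {
    month: round(100 * value / total, 1)
    for month, value in zip(months, electronics)
}

for month, share in shares.items():
    print(f"month {month}: {share}%")
```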
Histogram: distribution of electronics sales
# ────────────────────────────────────────────────────────────────────────────────
# Histogram – distribution of electronics sales
# ────────────────────────────────────────────────────────────────────────────────
df.select("electronics").plot(
    kind="hist",
    bins=2,
    title="Electronics Sales Distribution"
)
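With only six rows and bins=2, the histogram is easy to verify by hand. The sketch below is plain Python and assumes equal-width bins between the minimum and maximum, with the last bin closed on the right (the NumPy-style convention). I have not checked that Spark computes its bins exactly this way, so treat it as an approximation of what the chart shows.

```python
# Electronics column from the example dataframe.
electronics = [120, 150, 170, 160, 180, 200]
bins = 2

lo, hi = min(electronics), max(electronics)
width = (hi - lo) / bins

# Bin edges: bins + 1 equally spaced points from min to max.
edges = [lo + i * width for i in range(bins + 1)]

counts = [0] * bins
for value in electronics:
    # Clamp the index so the maximum value falls into the last bin.
    index = min(int((value - lo) / width), bins - 1)
    counts[index] += 1

print(edges)
print(counts)
```

Under these assumptions the first bin covers 120 up to (but not including) 160 and the second covers 160 to 200.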
For more great charts and options, visit https://plotly.com/python/plotly-express/
Requires Databricks Runtime 17.0 or later.