Dashboards for Nerds: DataFrame Plotting in Databricks

I don't like BI tools. I stopped using Power BI and Qlik a long time ago and now rely on Databricks AI/BI, but I always feel like something is missing. One option is to build dashboards from charts generated with Matplotlib and pandas, but since I'm not a fan of pandas, I usually give up on that approach.

Now, finally, there is something for me: Spark native plotting. I no longer need to convert a DataFrame to a pandas object myself. Under the hood it still uses pandas and Plotly, but that is hidden from me, so I skip the cumbersome steps and call plot directly on a DataFrame.

df.plot(kind="line", x="category", y="int_val")

Before we see some nice examples, let's consider what's next. I'd like to have the option to add charts generated by this code to a Databricks AI/BI dashboard, which would make it the best dashboard for data nerds in the entire world.

Let's see how it works on this dataframe:

data = [
    (1, 120,  80, 200),
    (2, 150,  90, 220),
    (3, 170, 100, 240),
    (4, 160, 130, 250),
    (5, 180, 120, 260),
    (6, 200, 140, 280),
]
cols = ["month", "electronics", "furniture", "clothing"]
df = spark.createDataFrame(data, cols)

Line Chart: monthly trend by category

# ────────────────────────────────────────────────────────────────────────────────
# Line chart – monthly trend by category
# ────────────────────────────────────────────────────────────────────────────────
df.plot.line(
    x="month",
    y=["electronics", "furniture", "clothing"],
    title="Monthly Sales by Category"
)

Bar Chart: same data in bar form

# ────────────────────────────────────────────────────────────────────────────────
# Stacked bar chart – same data in bar form
# ────────────────────────────────────────────────────────────────────────────────
df.plot.bar(
    x="month",
    y=["electronics", "furniture", "clothing"],
    title="Monthly Sales – Stacked Bar"
)

Pie Chart

# ────────────────────────────────────────────────────────────────────────────────
# Pie Lake :-)
# ────────────────────────────────────────────────────────────────────────────────
df.plot.pie(
    x="month",
    y="electronics",
    title="Pie Chart"
)
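In the pie chart, x supplies the slice labels and y the slice sizes. As a quick sanity check of what the slices should look like, here is a minimal plain-Python sketch (no Spark involved) that computes each month's share of electronics sales from the same six rows. The variable names are mine, and rounding to one decimal is just for display.

```python
# Same values as the "month" and "electronics" columns of the DataFrame above.
months = [1, 2, 3, 4, 5, 6]
electronics = [120, 150, 170, 160, 180, 200]

# Each slice is the month's value divided by the column total.
total = sum(electronics)
shares = {m: round(100 * v / total, 1) for m, v in zip(months, electronics)}

print(total)   # 980
print(shares)  # {1: 12.2, 2: 15.3, 3: 17.3, 4: 16.3, 5: 18.4, 6: 20.4}
```

So month 6 should be the biggest slice at roughly a fifth of the pie, and month 1 the smallest.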

Histogram: distribution of electronics sales

# ────────────────────────────────────────────────────────────────────────────────
# Histogram – distribution of electronics sales
# ────────────────────────────────────────────────────────────────────────────────
df.select("electronics").plot(
    kind="hist",
    bins=2,
    title="Electronics Sales Distribution"
)
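With only six rows, bins=2 is easy to check by hand. The sketch below is plain Python, not Spark code. It assumes the histogram splits the range between the column's minimum and maximum into equal-width buckets, with the last bucket including the maximum. The helper name equal_width_histogram is made up for illustration.

```python
# Same values as the "electronics" column of the DataFrame above.
electronics = [120, 150, 170, 160, 180, 200]

def equal_width_histogram(values, bins):
    """Return (edges, counts) for `bins` equal-width buckets between min and max.

    Assumes max(values) > min(values); otherwise the bucket width would be zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts

edges, counts = equal_width_histogram(electronics, bins=2)
print(edges)   # [120.0, 160.0, 200.0]
print(counts)  # [2, 4]
```

Under that assumption, the chart should show two bars: two months with sales from 120 up to (but not including) 160, and four months from 160 to 200.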

For more chart types and options, see https://plotly.com/python/plotly-express/
Requires Databricks Runtime 17.0 or later.

Hubert Dudek

Databricks MVP | Advisor to Databricks Product Board and Technical advisor to SunnyData

https://www.linkedin.com/in/hubertdudek/