Hidden Magic Commands in Databricks Notebooks
If you're working with Databricks notebooks, you're probably familiar with basic magic commands like %sql, %python, and %run. But there's a whole world of lesser-known commands that can dramatically improve your productivity and debugging workflow.
Let's explore the hidden gems that can transform how you work with Databricks notebooks.
Discovering all available magic commands
Before we dive in, here's how to see every magic command at your disposal:
%lsmagic
This reveals an extensive list of commands (some inherited from the Jupyter ecosystem).
While not all of them work in Databricks (some are Jupyter-specific), the ones that do work are incredibly useful. Here’s a list of 12 of them.
1. Write and Read Directly in Your Workspace
One of my personal favorites is %%writefile which lets you create files directly from notebook cells:
%%writefile test.txt
bla bla
The file is instantly saved to your workspace.
To read it back, use the %cat command:
%cat test.txt
This simple workflow is perfect for quickly creating configuration files, sample data, or documentation without leaving your notebook.
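As a quick sketch of that idea (the file name and contents here are purely illustrative), you could jot down a small config file in one cell:
%%writefile app_config.ini
[paths]
input = /tmp/raw
output = /tmp/curated
and read it back with %cat app_config.ini in the next.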
2. Performance measurement: track execution time
Understanding how long your operations take is crucial for optimization. Databricks notebooks offer two timing commands:
a. Single execution timing:
%%time
df = spark.range(10_000_000)
df.count()
b. Multiple execution benchmarking:
%%timeit
spark.range(1_000_000).count()
The %%timeit command runs your code multiple times and provides statistical analysis of execution times.
Please remember these considerations when timing Spark operations:
Spark has warm-up costs that affect initial runs
Lazy evaluation means transformations aren't timed until an action occurs (see the sketch after this list)
Caching effects can skew results in subsequent runs
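To make the lazy-evaluation point concrete (the row count and filter are arbitrary), compare timing a cell that only defines a transformation:
%%time
filtered = spark.range(10_000_000).filter("id % 2 == 0")
with the cell that actually triggers the work:
%%time
filtered.count()
The first cell returns almost instantly; the real cost only shows up when the action runs.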
3. Capturing output: suppress and store cell results
Sometimes you need to run code without cluttering your notebook output, or you want to capture results for later processing. The %%capture magic command does both:
%%capture out
df = spark.range(1_000_000)
df.show()
The cell executes silently, and all output is stored in the out variable. You can then access it programmatically in subsequent Python cells.
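For example, in a later Python cell you can inspect what was captured; out is simply the name chosen in the cell above, and .stdout and .show() come from the CapturedIO object that IPython returns:
print(out.stdout)  # the text df.show() would have printed
out.show()         # or replay the captured output in the current cell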
4. Shell commands: direct access to the underlying system
Need to interact with the file system or run shell commands? Databricks notebooks make it easy:
%%bash
ls /Workspace/
For common bash commands, you can skip the %%bash prefix and use shortcuts. Instead of %%bash ls, simply write %ls:
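For example, the same listing as above with the shorthand:
%ls /Workspace/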
This shorthand works for most standard Linux commands and even gives you colorized output.
5. Bridging Shell and Python
Here's where things get really interesting. You can redirect shell command output directly into Python variables using %sx:
files = %sx ls /Workspace
The files variable now contains the shell output as a Python list, ready for further processing. This integration between shell and Python is invaluable for automation and data processing workflows.
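As a small illustration (the filter condition is arbitrary), you can post-process that result like any other Python list:
notebooks = [f for f in files if not f.startswith(".")]  # drop hidden entries
print(f"{len(notebooks)} visible entries")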
6. Cell output management
When working with large datasets or verbose operations, excessive cell output can freeze your browser or overwhelm your driver. Protect yourself by limiting cell output:
%set_cell_max_output_size_in_mb 1
This sets a 1MB limit on cell output, preventing accidental browser crashes from massive print statements.
7. Skip cells temporarily
Need to temporarily disable a cell without deleting it? Use %skip:
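For example (the cell body below is just a placeholder for whatever code you want to park):
%skip
expensive_df = spark.range(10_000_000_000)
expensive_df.count()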
The cell appears grayed out and is skipped during notebook execution, making it perfect for debugging or temporarily disabling code blocks.
8. Variable inspection: see what’s in your workspace
Lost track of your variables? Use %who for a simple list or %whos for detailed information:
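For example, run either of these in a cell:
%who
%whos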
The %whos command displays every variable in your namespace along with its type and value, while %who simply lists the names.
9. Command history: review your execution path
The %history command shows you all the commands you've executed in the current session:
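For example, to show recent commands with their input line numbers (-n adds the numbering, and the range is optional):
%history -n 1-20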
This is particularly useful when you need to retrace your steps or understand the sequence of operations that led to a particular result.
10. Session logging: capture everything automatically
For documentation or debugging purposes, you can enable IPython session logging with %logstart and stop it later with %logstop:
%logstart -o -t ipython_log.py
%logstate
print("this will be logged")
%logstop
11. Documentation access: quick help for any object
%pinfo comes in really handy for documentation lookups: it gives you immediate access to docstrings and method signatures, with no need to switch to external documentation.
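For example, any object or function in scope works (spark.createDataFrame is just one handy target):
%pinfo spark.createDataFrame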
12. Environment variables: view and set configuration
Lastly, we often need to check current environment variables.
a. View all environment variables:
%env
b. Set or view a specific variable:
%env MY_FLAG=1
%env MY_FLAG
Making the most of magic commands
These magic commands aren't just shortcuts; they're powerful tools that can streamline your workflow, improve efficiency, and provide deeper insights into your code execution. The key is to experiment with them in your daily work and discover which ones best fit your specific use cases.
Let us know which magic commands you are using. Are there any we didn't cover that have become essential to your workflow? The Databricks notebook environment is rich with hidden features; there's always more to discover.