17 Plotting and visualization

Data visualization helps us understand the structure and patterns in our data. In Python, most visualizations are created with Matplotlib, either directly or through higher-level tools built on top of it, such as the Pandas plotting interface and the Seaborn library. Together, these tools provide a flexible framework for both quick exploration and publication-quality graphics.

17.1 Introduction to Matplotlib

Matplotlib is the foundational library for plotting in Python. By convention, its pyplot interface is imported as plt (import matplotlib.pyplot as plt).

Let’s start with a simple example plotting two polynomial functions:
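A minimal sketch of such a plot follows. The two polynomials, f(x) = x² − 1 and g(x) = x³ − 2x, are illustrative choices made for this example. NumPy samples the x values, and plt.subplots returns a Figure and an Axes that we draw on. In a notebook the figure is displayed automatically; in a script, call plt.show() at the end.

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample 200 evenly spaced points on the interval [-2, 2]
x = np.linspace(-2, 2, 200)

# Two illustrative polynomials (hypothetical choices for this sketch)
f = x**2 - 1
g = x**3 - 2 * x

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, f, label="$f(x) = x^2 - 1$")
ax.plot(x, g, label="$g(x) = x^3 - 2x$", linestyle="--")
ax.axhline(0, color="gray", linewidth=0.8)  # reference line at y = 0
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Two polynomial functions")
ax.legend()  # only the two labeled curves appear in the legend
```

Drawing on the ax object rather than through global plt.* calls makes it easier to combine several plots in one figure later.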

17.2 Plotting with Pandas and Seaborn

While Matplotlib provides the foundation for all plots, Pandas and Seaborn offer convenient interfaces built on top of it.

17.2.1 Pandas

As we have already seen in previous chapters, Pandas provides a .plot method for Series and DataFrames to quickly generate plots directly from our data.

Key parameters of the .plot method include:

  • kind: the plot type, one of line (default), bar, barh (horizontal bar plot), hist (histogram), box, kde or density (kernel density estimate), area, pie, scatter (DataFrames only), or hexbin (DataFrames only)
  • x, y: the data to plot, given as column names; y may also be a list of column names
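To illustrate kind, x and y, here is a sketch using a small hypothetical DataFrame of monthly sales (the column names are invented for this example). When no ax argument is passed, DataFrame.plot draws into a new figure and returns the Matplotlib Axes, which can then be customized further.

```python
import pandas as pd

# Small hypothetical dataset: monthly sales of two products
df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "product_a": [10, 12, 9, 15, 18, 20],
    "product_b": [8, 11, 13, 12, 14, 17],
})

# Line plot (the default kind): one line per column listed in y
ax = df.plot(x="month", y=["product_a", "product_b"], title="Monthly sales")
ax.set_ylabel("units sold")

# Scatter plot: DataFrame-only kind that needs a single x and a single y column
ax2 = df.plot(kind="scatter", x="product_a", y="product_b")
```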

We can also call methods .plot.<kind> directly, e.g. df.plot.bar(...) instead of df.plot(kind="bar", ...) or df.plot.hist(...) instead of df.plot(kind="hist", ...).
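A quick sketch comparing the two spellings on a small hypothetical DataFrame: each pair of calls below produces the same plot, so the choice is mostly a matter of taste. One practical advantage of the accessor form is that each method has its own docstring listing the options specific to that plot kind (for example, help(pd.DataFrame.plot.hist)).

```python
import pandas as pd

# Hypothetical data: number of pets per household type
df = pd.DataFrame(
    {"cats": [3, 5, 2], "dogs": [4, 1, 6]},
    index=["apartment", "house", "farm"],
)

# Grouped vertical bars: keyword form vs. accessor form
ax1 = df.plot(kind="bar", title="kind='bar'")
ax2 = df.plot.bar(title="df.plot.bar()")

# Overlaid histograms of both columns: keyword form vs. accessor form
ax3 = df.plot(kind="hist", bins=3, alpha=0.5)
ax4 = df.plot.hist(bins=3, alpha=0.5)
```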

17.2.2 Seaborn

Seaborn provides a dedicated function for each type of plot (for example barplot, scatterplot, and boxplot), and most of them accept the following parameters:

  • data: input dataset (Series or DataFrame)
  • x, y: variables to plot
  • hue: a variable (typically categorical) whose values are shown in different colors, e.g. to distinguish groups (if supported by the plot type)
  • row and col: split the plot into subplots, one for each value of a variable (supported by figure-level functions such as relplot, catplot, displot, and lmplot)
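The sketch below shows these parameters on a small hypothetical dataset of study hours and exam scores. scatterplot is an axes-level function that draws into a given Matplotlib Axes, whereas relplot is a figure-level function: it creates its own figure, returns a FacetGrid, and accepts row and col.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical dataset: exam scores of students from two schools
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "score": [52, 58, 65, 70, 78, 48, 55, 61, 66, 72],
    "school": ["A"] * 5 + ["B"] * 5,
    "term": ["fall", "spring"] * 5,
})

# Axes-level: data, x, y and hue map DataFrame columns to plot properties
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="hours", y="score", hue="school", ax=ax)

# Figure-level: col creates one subplot per value of "term"
g = sns.relplot(data=df, x="hours", y="score", hue="school", col="term")
```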

Tip: Use Pandas .plot for quick exploratory plots and Seaborn for more detailed visualizations.

17.3 Plot catalog

Below is an overview of the most common plot types and how to create each of them with Pandas and Seaborn. The goal is not to memorize every function but to recognize the patterns in how the plotting APIs are structured.

We will use two datasets for these examples:

17.3.1 Bar plot
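A sketch of a bar plot with both libraries, using a small hypothetical table of tip amounts created inline. It highlights one key difference: Pandas draws one bar per value it is given, so repeated observations must be aggregated first (here with groupby and mean), whereas Seaborn's barplot aggregates by itself, drawing the mean as the bar height and a 95% confidence interval as an error bar by default.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical raw data: several tip amounts recorded per day
tips = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "tip": [2.0, 3.0, 2.5, 3.5, 3.0, 4.0, 5.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Pandas plots the values as given, so compute the mean per day first
mean_tips = tips.groupby("day", sort=False)["tip"].mean()
mean_tips.plot.bar(ax=ax1, title="Pandas: mean tip per day")

# Seaborn aggregates the raw rows itself (mean + confidence interval)
sns.barplot(data=tips, x="day", y="tip", ax=ax2)
ax2.set_title("Seaborn: mean tip per day")
```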

17.3.2 Count plot (Seaborn only)
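countplot is essentially a bar plot of category frequencies: it counts the rows in each category, so only x (or y) is needed. The sketch below uses a hypothetical table of animal observations, with hue splitting each bar by a second variable. In Pandas, the equivalent is to compute the counts with value_counts and plot the result with .plot.bar().

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one row per observed animal
animals = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird", "cat", "dog"],
    "sex": ["f", "m", "f", "m", "f", "m"],
})

fig, ax = plt.subplots()
# countplot counts rows per (species, sex) combination; no y variable needed
sns.countplot(data=animals, x="species", hue="sex", ax=ax)

# The same counts computed with Pandas, for comparison
counts = animals.value_counts(["species", "sex"])
```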

17.3.3 Pie plot (Pandas only)
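Pie plots are available in Pandas through .plot.pie; Seaborn has no pie function. A sketch with a hypothetical monthly budget: wedge sizes are proportional to the values, wedge labels come from the index, and autopct is passed through to Matplotlib's pie to print each wedge's percentage.

```python
import pandas as pd

# Hypothetical monthly expenses by category
budget = pd.Series(
    [450, 300, 150, 100],
    index=["rent", "food", "transport", "other"],
    name="expenses",
)

# autopct="%1.0f%%" prints each wedge's share as a whole-number percentage
ax = budget.plot.pie(autopct="%1.0f%%", title="Monthly budget")
```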

17.3.4 Line plot

17.3.5 Scatter plot
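A sketch of scatter plots using hypothetical body measurements (flipper length and mass) for two groups. In Pandas, scatter is a DataFrame-only kind that requires both x and y; in Seaborn, hue colors the points by group and adds a legend automatically.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical body measurements for two groups of birds
birds = pd.DataFrame({
    "flipper_mm": [181, 186, 195, 193, 190, 210, 215, 217, 222, 230],
    "mass_g": [3750, 3800, 3250, 3450, 3650, 4200, 4500, 4800, 5100, 5500],
    "group": ["small"] * 5 + ["large"] * 5,
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Pandas: explicit x and y column names are required for scatter
birds.plot.scatter(x="flipper_mm", y="mass_g", ax=ax1, title="Pandas")

# Seaborn: hue colors the points by group
sns.scatterplot(data=birds, x="flipper_mm", y="mass_g", hue="group", ax=ax2)
ax2.set_title("Seaborn")
```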

We can use lmplot to visualize simple linear relationships and regression fits:
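A sketch using hypothetical study-time data. lmplot is a figure-level function (it returns a FacetGrid) that draws the scatter points, fits a linear regression for each hue level, and by default shades a 95% confidence band around each fitted line. Because it is figure-level, it also accepts row and col.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: study hours vs. exam score for two classes
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    "score": [50, 55, 61, 64, 70, 76, 45, 49, 55, 58, 60, 66],
    "cls": ["A"] * 6 + ["B"] * 6,
})

# One regression line (with confidence band) per class, in a single panel
g = sns.lmplot(data=df, x="hours", y="score", hue="cls", height=4)
g.set_axis_labels("Study hours", "Exam score")
```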

17.3.6 Histogram and density estimation

17.3.7 Box and violin plots
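A sketch with simulated reaction times under three hypothetical conditions. A box plot summarizes each group by its median, quartiles, and whiskers (by default reaching the most extreme points within 1.5 × IQR of the box), while a violin plot shows a kernel density estimate of the whole distribution. Pandas draws one box per column of a wide DataFrame; Seaborn expects long format with the grouping variable on one axis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated reaction times (ms), one column per condition (wide format)
rng = np.random.default_rng(seed=1)
wide = pd.DataFrame({
    "A": rng.normal(300, 30, 50),
    "B": rng.normal(330, 40, 50),
    "C": rng.normal(280, 25, 50),
})
# Long format for Seaborn: one row per observation
long = wide.melt(var_name="condition", value_name="rt_ms")

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Pandas: one box per column
wide.plot.box(ax=axes[0], title="Pandas box plot")

# Seaborn: one box / violin per category on the x-axis
sns.boxplot(data=long, x="condition", y="rt_ms", ax=axes[1])
sns.violinplot(data=long, x="condition", y="rt_ms", ax=axes[2])
```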

17.3.8 Strip and swarm plots (Seaborn only)
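Strip and swarm plots show every individual observation instead of a summary. The sketch below, again with simulated data, compares them: stripplot adds random horizontal jitter to reduce overlap, while swarmplot arranges the points so that they do not overlap at all, which works well for small to moderate sample sizes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated reaction times (ms): 30 observations per condition, long format
rng = np.random.default_rng(seed=2)
long = pd.DataFrame({
    "condition": np.repeat(["A", "B", "C"], 30),
    "rt_ms": np.concatenate([
        rng.normal(300, 30, 30),
        rng.normal(330, 40, 30),
        rng.normal(280, 25, 30),
    ]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# stripplot: each point jittered randomly within +/- 0.2 of its category
sns.stripplot(data=long, x="condition", y="rt_ms", jitter=0.2, ax=ax1)
ax1.set_title("stripplot")

# swarmplot: points placed side by side so they do not overlap
sns.swarmplot(data=long, x="condition", y="rt_ms", ax=ax2)
ax2.set_title("swarmplot")
```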

17.3.9 Joint and pair plots (Seaborn only)

Further examples and customization options can be found in Seaborn’s example gallery.