Today I would like to share a practical, hands-on piece on visualization. It introduces Plotly, a powerful open-source Python plotting library, and shows how to draw better charts with very simple code (sometimes just one line!).
I stuck with matplotlib for so long because of the hundreds of hours I had already "sunk" into learning its complex syntax. That investment also cost me countless late nights searching StackOverflow for how to "format the date" or "add a second Y-axis".
But there is now a better option: Plotly, an easy-to-use, well-documented, and powerful open-source Python plotting library. Let's take a deep dive and see how it draws better charts with very simple code (sometimes just one line!).
All the code in this article is open source on GitHub. All the charts are interactive, so please view them in a Jupyter notebook.
(Github source code address: https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb)
(Example chart drawn by plotly. Image source: plot.ly)
Plotly overview
The plotly Python package is an open-source library built on plotly.js, which in turn is built on d3.js. What we will actually use is cufflinks, a wrapper around plotly that makes it easier to use plotly together with Pandas dataframes.
Note: Plotly itself is a visualization technology company with several products and open-source toolsets. Plotly's Python library is free to use. In offline mode you can create an unlimited number of charts. In online mode, which relies on Plotly's sharing service, only 25 charts can be generated and shared.
All visualizations in this article were made in Jupyter Notebook with the plotly + cufflinks libraries in offline mode. After installing them with pip install cufflinks plotly, you can import them in Jupyter with code like this:
Univariate Distributions: Histograms and Box Plots
Univariate charts are the standard starting point for a data analysis, and the histogram is one of the must-have charts for examining the distribution of a single variable (although it has some shortcomings).
Taking the total number of likes on each blog post as an example (the original data is on GitHub: https://github.com/WillKoehrsen/Data-Analysis/tree/master/medium), let's make a simple interactive histogram:
(df in the code is a standard Pandas dataframe object)
(interactive histogram created with plotly+cufflinks)
If you are used to matplotlib, you only need to type one more letter (change .plot to .iplot) to get a better-looking, interactive chart! You can click an element to see its details, zoom in and out, and (as we will see next) highlight and filter parts of the chart.
If you want to draw a stacked column chart, just do this:
With a little pandas processing first, we can also generate a bar chart:
As shown above, we can combine the power of plotly + cufflinks with pandas. For example, we can first reshape the data with .pivot() and then generate a bar chart from the result.
For instance, to count the number of new fans each article brought in, split by publication:
The benefit of interactive charts is that we can explore the data and break it down into subsets at will. Box plots carry a lot of information, but if you can't see the actual values, you will miss most of it!
Scatter plot
Scatter plots are at the heart of most analyses and allow us to see how a variable has changed over time, or how the relationship between two (or more) variables has changed.
Time series analysis
In the real world, a considerable part of the data has a time element. Fortunately, plotly + cufflinks comes with features to support time series visualization analysis.
Taking the articles I published on the "Towards Data Science" website as an example, let's build a dataset indexed by publication time to see how the popularity of the articles changes:
In the image above, we accomplish several things with one line of code:
- Automatically generate a nicely formatted time series x-axis
- Add a second Y-axis because the ranges of the two variables do not match
- Put the title of the article in the label displayed on hover
To display more data, we can conveniently add text annotations:
(scatterplot with text annotations)
In the code below, we color a bivariate scatterplot by a third, categorical variable:
Next, let's try something more complicated: logarithmic axes. We get them by specifying plotly's layout parameter (for the available layout options, see the official documentation at https://plot.ly/python/reference/). At the same time, we bind the point size (the size parameter) to a numerical variable, read_ratio (the read ratio): the larger the value, the larger the bubble.
If we want to get a little more sophisticated (see the GitHub source code for details), we can even cram 4 variables into a single chart! (I do not recommend doing this, though.)
As before, we can combine pandas with plotly + cufflinks to produce many useful charts:
I recommend checking the official documentation or the source code, which contain more examples and features. With just one or two lines of code, you can add text annotations, reference lines, best-fit lines, and other useful elements to your charts while keeping all the interactivity.
Advanced Drawing Features
Next, we will look at a few special charts in more detail. You may not use them very often, but done right they can be quite impressive. We're going to use plotly's figure_factory module to generate great-looking charts with just one line of code!
Scatter Plot Matrix
Scatterplot matrices (also known as SPLOMs) are a great choice if we want to explore relationships between many different variables:
Even such complex graphs are fully interactive, allowing us to explore the data in greater detail.
Correlation Heatmap
To illustrate the relationship between multiple numerical variables, we can calculate their correlation and visualize it in the form of an annotated heatmap:
Custom theme
In addition to the endless variety of charts, Cufflinks also provides many different coloring themes, so that you can easily switch between different chart styles. The following two figures are the "space" theme and the "ggplot" theme:
In addition, there are 3D charts (surfaces and bubbles):
And if you are so inclined, even a pie chart is easy to make:
Edit in Plotly Chart Studio
After you have generated these charts in Jupyter Notebook, you will notice a small link in the lower right corner of each chart that says "Export to plot.ly". Clicking this link takes you to Chart Studio (https://plot.ly/create/).
Here, you can further revise and polish your chart before the final presentation. You can add annotations, choose the color of individual elements, tidy everything up, and produce a great-looking chart. Afterwards, you can also publish it on the web and generate a link for others to view.
The following two charts were made in Chart Studio:
That was a lot to take in, and we still have not covered everything this library can do. Space is limited and there are many more charts and examples, so please work through the official plotly and cufflinks documentation.
(Plotly interactive map showing domestic wind farm data in the US. Source: plot.ly)
Finally
The worst thing about the sunk cost fallacy is that people often only realize how much time they've wasted when they give up on previous efforts.
When choosing a plotting library, the features you need most are:
- One-line charts for quickly exploring data
- Interactive elements for subsetting and investigating data
- The option to drill down into details when needed
- Easy customization before the final presentation
Right now, the best option in Python for all of the above is plotly. It lets us generate visualizations quickly, and its interactive features help us understand the information better.
I'll admit that plotting is definitely the most enjoyable part of working in data science, and plotly makes these tasks even more enjoyable.
(A graph showing the pleasure of plotting in Python over time. Source: towardsdatascience.com)
2022 is the time to upgrade your Python plotting library and make your data science and visualization work faster, more capable, and better looking!