

Today I would like to share a practical guide to data visualization. It introduces Plotly, a powerful open-source Python plotting library, and shows how to draw better charts with very little code (sometimes just one line).


The reason I stuck with matplotlib for so long is the hundreds of hours I had already "sunk" into learning its complex syntax. It also cost me countless late nights searching StackOverflow for how to "format the date" or "add a second Y-axis".


But we now have a better option: Plotly, an easy-to-use, well-documented, and powerful open-source Python plotting library. Let's take a deep dive and see how it can draw better charts with very simple code (sometimes just one line).


All the code in this article is available on GitHub. All the charts are interactive, so please view them in a Jupyter notebook.

(Github source code address: https://github.com/WillKoehrsen/Data-Analysis/blob/master/plotly/Plotly%20Whirlwind%20Introduction.ipynb)


picture

(Example chart drawn by plotly. Image source: plot.ly)


Plotly overview


The plotly Python package is an open-source library built on plotly.js, which in turn is built on d3.js. What we actually use here is cufflinks, a wrapper around plotly that makes it easier to use together with pandas dataframes.


Note: Plotly itself is a visualization technology company with several products and open-source toolsets. Plotly's Python library is free to use. In offline mode you can create an unlimited number of charts; in online mode, which relies on Plotly's sharing service, only 25 charts can be generated and shared.


All visualizations in this article were made in Jupyter Notebook using plotly + cufflinks in offline mode. After installing with pip install cufflinks plotly, you can import it in Jupyter with code like this:


picture


Univariate Distributions: Histograms and Box Plots


Univariate charts are the standard starting point for a data analysis, and the histogram is one of the essential charts for examining the distribution of a single variable (although it has some shortcomings).


Taking the total number of likes on a blog post as an example (the original data is on GitHub: https://github.com/WillKoehrsen/Data-Analysis/tree/master/medium ), let's make a simple interactive histogram:


picture

(df in the code is a standard pandas dataframe object)


picture

(interactive histogram created with plotly+cufflinks)


For readers used to matplotlib, typing just one more letter (changing .plot to .iplot) gives a better-looking, interactive chart. Clicking on elements of the chart reveals detailed values, and you can zoom in and out and (as we will see next) highlight and filter parts of the data.


If you want to draw a stacked column chart, just do this:


picture


picture


With a little pandas processing, we can also generate a bar chart:


picture


picture


As shown above, we can combine the power of plotly + cufflinks with pandas. For example, we can first reshape the data with .pivot() and then generate a box plot for each group.


For example, to count the number of new fans brought by each article in different publishing channels:


picture

picture


The benefit of interactive charts is that we can explore the data and break down sub-items for analysis at will. Box plots can provide a lot of information, but if you can't see the specific values, you're probably missing a lot of it!


Scatter plot


Scatter plots are at the heart of most analyses. They let us see how a variable changes over time, or how two (or more) variables relate to each other.


Time series analysis


In the real world, a considerable part of the data has a time element. Fortunately, plotly + cufflinks comes with features to support time series visualization analysis.


Taking the article data I published on the "Towards Data Science" website as an example, let's build a dataset indexed by publication time to see how the popularity of the article changes:


picture


picture


In the image above, we accomplish several things with one line of code:


  • Automatically generate beautiful time series x-axis

  • Add a second Y-axis because the ranges of the two variables do not match

  • Put the title of the article in the label displayed on hover


To display more data, we can conveniently add text annotations:


picture


picture

(scatterplot with text annotations)


In the code below, we color a bivariate scatterplot by the third categorical variable:


picture


picture


Next, let's try something more complicated: logarithmic axes. We do this by specifying plotly's layout parameter (for the available layout options, see the official documentation at https://plot.ly/python/reference/ ). At the same time, we bind the point size (the size parameter) to a numeric variable, read_ratio, so that the larger the value, the larger the bubble.


picture

picture


If we want to go a little further (see the GitHub source code for details), we can even cram 4 variables into a single chart! (This is not recommended, however.)


picture

As before, we can combine pandas with plotly+cufflinks to achieve many useful graphs:


picture

picture

We recommend checking the official documentation or the source notebook, both of which have more examples. With just one or two lines of code you can add text annotations, reference lines, best-fit lines and other useful elements to a chart while keeping it fully interactive.

Advanced Drawing Features


Next, we will look at a few special charts in more detail. You may not use them very often, but used well they are sure to impress. We will use plotly's figure_factory module, where even these charts take just one line of code.


Scatter Plot Matrix


Scatterplot matrices (also known as SPLOMs) are a great choice if we want to explore relationships between many different variables:


picture

picture

Even such complex graphs are fully interactive, allowing us to explore the data in greater detail.


Correlation Heatmap


To illustrate the relationship between multiple numerical variables, we can calculate their correlation and visualize it in the form of an annotated heatmap:

picture

picture

Custom Themes


In addition to the endless variety of charts, Cufflinks also provides many different coloring themes, so that you can easily switch between different chart styles. The following two figures are the "space" theme and the "ggplot" theme:


picture

picture

In addition, there are 3D diagrams (surfaces and bubbles):


picture


picture


And for those who are so inclined, making a pie chart is not difficult either:


picture


Edit in Plotly Chart Studio


After you have generated these charts in Jupyter Notebook, you will notice a small link in the lower right corner of each chart that says "Export to plot.ly". Clicking this link takes you to Chart Studio (https://plot.ly/create/).


Here, you can further revise and polish your diagram before final presentation. You can add callouts, choose the color of certain elements, keep everything organized, and produce an awesome diagram. Later, you can also publish it on the web, generating a link for others to view.


The following two charts were made in Chart Studio:


picture

picture

That was a lot to cover, but we still have not exhausted everything this library can do. For reasons of space, some of the best charts and examples are left out, so please see the official plotly and cufflinks documentation for more.


picture

(Plotly interactive map showing domestic wind farm data in the US. Source: plot.ly)



Finally


The worst thing about the sunk cost fallacy is that people often only realize how much time they've wasted when they give up on previous efforts.


When choosing a plotting library, the features you need most are:

  • One-line charts for quickly exploring data

  • Interactive elements for slicing and studying the data

  • The option to drill down into details when needed

  • Easy customization before the final presentation


Right now, the best choice for doing all of the above in Python is plotly. It lets us generate charts quickly, and its interactive features help us understand the data better.


I'll admit that plotting is definitely the most enjoyable part of working in data science, and plotly makes these tasks much more enjoyable.

picture

(A chart showing the pleasure of plotting in Python over time. Source: towardsdatascience.com)


2022 is the time to upgrade your Python plotting library and make yourself faster, stronger and more beautiful in data science and visualization!

