Today, I will rewrite pandas. I have written two articles about pandas before and feel that the writing is not good enough. Now I have a deeper understanding of pandas in combination with the work content. This article only explains the functions that I commonly use in my work. For detailed usage, please refer to the previous recommendations below.


picture

Recommended in the past


Using Pandas

Add, delete, search and modify in Pandas


Install

pip install pandas
Because pandas also depends on other packages and needs to be downloaded by itself, anyway, just pip whatever is missing.



Table of contents

1. The use of Pandas at work

2. Common functions


1. The use of Pandas at work

What I use most at work is to read non-standard data from another container and process it with python, and finally store it in Excel as a data source for other Excel calls, as shown below.

picture

At the beginning, I learned a bunch of functions in pandas, but now I basically forget them. Personally, I use pandas the most for reading and storage. As for data operations, all operations are performed in python syntax. Next, let’s introduce the commonly used functions.


2. Common functions

Create an excel file first

import pandas as pd
data = {'city': ['北京', '上海', '广州', '深圳'], '2018': [33105, 36011, 22859, 24221]}data = pd.DataFrame(data)data.to_excel('excel练习.xlsx', index=False)

2.1, read data

df = pd.read_excel('excel练习.xlsx')print(df)print(type(df))print(df.values)print(type(df.values))

operation result:

picture

Figure 2-1

The DataFrame type is not very convenient to operate in python, so use values ​​to convert to the form of a numpy array (this is very similar to a list and can be manipulated like a list).

Notice:

When reading data, you can specify the data type of a column. Recently, I encountered a problem as shown in the figure below. When I read the data and then store it in the text type, 001 becomes a number 1.

picture

Figure 2-2

To solve this problem, you only need to add the parameter converters to specify the number when reading the data. This column is a character type.
pd.read_excel(file_path, converters={'编号': str})

read data from a column

df = pd.read_excel('excel练习.xlsx')data = df['city']print(data)

operation result:

picture

Figure 2-3

2.2. Store data

data = {'city': ['北京', '上海', '广州', '深圳'],        '2018': [33105, 36011, 22859, 24221]}data = pd.DataFrame(data)data.to_excel('excel练习.xlsx', index=False)
Storing data is very simple, you only need to process the data like this, and then convert it into a DataFrame type before it can be stored.

2.3. Delete data

Delete a row containing a value, delete a specified row/column, remove duplicates, remove 0 values, remove null values

2.3.1, delete a row containing a value

data = df[df.city != '深圳']

operation result:

picture

Figure 2-4

2.3.2, delete the specified line drop()

data = df.drop([0, 1], axis=0)

delete lines 0 and 1

operation result:

picture

Figure 2-5

2.3.3, delete the specified column drop()

data = df.drop(['2018'], axis=1)

operation result:

picture

Figure 2-6

2.3.4, deduplication drop_duplicates()

data.drop_duplicates(['city'])data.drop_duplicates(['city', '2018'], keep='last')
Deduplication of the specified column (this column can contain multiple elements such as: ['city', '2018']), the first occurrence of data is retained by default, but the last occurrence of data can be retained by adding the parameter keep='last ' .

2.3.5, go to 0 value

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川', '未知', 0],         '2018': [33105, 36011, 22859, 24221, np.nan, 0, 0]}data2 = pd.DataFrame(data2)# 方法一df = data2[(data2.T != 0).any()]# 方法二df2 = data2.loc[(data2 != 0).any(1)]print(df)print('==================')print(df2)

operation result:

picture

Figure 2-7

explain:

The any() method will determine whether each row meets the conditions, as follows:
data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川', '未知', 0],         '2018': [33105, 36011, 22859, 24221, np.nan, 0, 0]}data2 = pd.DataFrame(data2)df = (data2.T != 0).any()

operation result:

picture

Figure 2-8

If the specified column has a value of 0, the entire row will be deleted, and you can refer to 2.3.1.

2.3.6, remove the null value dropna()

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川'],         '2018': [33105, 36011, 22859, 24221, np.nan]}data2 = pd.DataFrame(data2)print(data2.dropna())

operation result:

picture

Figure 2-9

Without any parameters, as long as a row contains a null value, the entire row will be deleted.
When adding an axis , you can delete rows/columns that contain null values.
data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川'],         '2018': [33105, 36011, 22859, 24221, np.nan]}data2 = pd.DataFrame(data2)print(data2.dropna(axis=1))

operation result:

picture

Figure 2-10

2.4. Additional data

This can be seen in Python | Pandas how to append to Excel , I have a detailed introduction in it.


picture

Recommended in the past


Using Pandas

Add, delete, check and modify pandas

pictureBy the way, after reading it, remember to click three times, this is really important to me.