Python to operate Excel

Today, I will rewrite pandas. I have written two articles about pandas before and feel that the writing is not good enough. Now I have a deeper understanding of pandas in combination with the work content. This article only explains the functions that I commonly use in my work. For detailed usage, please refer to the previous recommendations below.

Recommended in the past

Using Pandas

Add, delete, search and modify in Pandas

Install

pip install pandas

Because pandas also depends on other packages and needs to be downloaded by itself, anyway, just pip whatever is missing.

1. The use of Pandas at work

2. Common functions

1. The use of Pandas at work

What I use most at work is to read non-standard data from another container and process it with python, and finally store it in Excel as a data source for other Excel calls, as shown below.

At the beginning, I learned a bunch of functions in pandas, but now I basically forget them. Personally, I use pandas the most for reading and storage. As for data operations, all operations are performed in python syntax. Next, let’s introduce the commonly used functions.

2. Common functions

Create an excel file first

import pandas as pd
data = {'city': ['北京', '上海', '广州', '深圳'],        '2018': [33105, 36011, 22859, 24221]}data = pd.DataFrame(data)data.to_excel('excel练习.xlsx', index=False)

2.1, read data

df = pd.read_excel('excel练习.xlsx')print(df)print(type(df))print(df.values)print(type(df.values))

operation result:

Figure 2-1

The DataFrame type is not very convenient to operate in python, so use values to convert to the form of a numpy array (this is very similar to a list and can be manipulated like a list).

Notice:

When reading data, you can specify the data type of a column. Recently, I encountered a problem as shown in the figure below. When I read the data and then store it in the text type, 001 becomes a number 1.

Figure 2-2

To solve this problem, you only need to add the parameter converters to specify the number when reading the data. This column is a character type.

pd.read_excel(file_path, converters={'编号': str})

read data from a column

df = pd.read_excel('excel练习.xlsx')data = df['city']print(data)

operation result:

Figure 2-3

2.2. Store data

data = {'city': ['北京', '上海', '广州', '深圳'],        '2018': [33105, 36011, 22859, 24221]}data = pd.DataFrame(data)data.to_excel('excel练习.xlsx', index=False)

Storing data is very simple, you only need to process the data like this, and then convert it into a DataFrame type before it can be stored.

2.3. Delete data

Delete a row containing a value, delete a specified row/column, remove duplicates, remove 0 values, remove null values

2.3.1, delete a row containing a value

data = df[df.city != '深圳']

operation result:

Figure 2-4

2.3.2, delete the specified line drop()

data = df.drop([0, 1], axis=0)

delete lines 0 and 1

operation result:

Figure 2-5

2.3.3, delete the specified column drop()

data = df.drop(['2018'], axis=1)

operation result:

Figure 2-6

2.3.4, deduplication drop_duplicates()

data.drop_duplicates(['city'])data.drop_duplicates(['city', '2018'], keep='last')

Deduplication of the specified column (this column can contain multiple elements such as: ['city', '2018']), the first occurrence of data is retained by default, but the last occurrence of data can be retained by adding the parameter keep='last ' .

2.3.5, go to 0 value

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川', '未知', 0],         '2018': [33105, 36011, 22859, 24221, np.nan, 0, 0]}data2 = pd.DataFrame(data2)# 方法一df = data2[(data2.T != 0).any()]# 方法二df2 = data2.loc[(data2 != 0).any(1)]print(df)print('==================')print(df2)

operation result:

Figure 2-7

explain:

The any() method will determine whether each row meets the conditions, as follows:

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川', '未知', 0],         '2018': [33105, 36011, 22859, 24221, np.nan, 0, 0]}data2 = pd.DataFrame(data2)df = (data2.T != 0).any()

operation result:

Figure 2-8

If the specified column has a value of 0, the entire row will be deleted, and you can refer to 2.3.1.

2.3.6, remove the null value dropna()

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川'],         '2018': [33105, 36011, 22859, 24221, np.nan]}data2 = pd.DataFrame(data2)print(data2.dropna())

operation result:

Figure 2-9

Without any parameters, as long as a row contains a null value, the entire row will be deleted.

When adding an axis , you can delete rows/columns that contain null values.

data2 = {'city': ['北京', '上海', '广州(粤语)', '深圳', '四川'],         '2018': [33105, 36011, 22859, 24221, np.nan]}data2 = pd.DataFrame(data2)print(data2.dropna(axis=1))

operation result:

Figure 2-10

2.4. Additional data

This can be seen in Python | Pandas how to append to Excel , I have a detailed introduction in it.

Recommended in the past

Using Pandas

Add, delete, check and modify pandas

By the way, after reading it, remember to click three times, this is really important to me.

Python crawler and data analysis

Personal study notes, crawlers and derivative knowledge with crawlers as the core (data crawling, data visualization, data analysis), and now more financial data is obtained.

68 original content

Install

Table of contents

1. The use of Pandas at work

2. Common functions