pictureSource: Internet


Today I will share with you an article on using openpyxl to operate Excel.


All kinds of data need to be imported into Excel? Multiple Excels to be merged? At present, there are many libraries for processing Excel files in Python, and openpyxl is one of the better ones in terms of function and performance. Next, I will introduce various Excel operations to you.


0 1
Open Excel file



Create a new Excel file

 >>> from openpyxl import Workbook >>> wb = Workbook()

Open an existing Excel file

>>> from openpyxl import load_workbook>>> wb2 = load_workbook('test.xlsx')

When opening large files, use read-only or write-only mode as needed to reduce memory consumption.

wb = load_workbook(filename='large_file.xlsx', read_only=True)
wb = Workbook(write_only=True)

0 2
Get and create worksheets
Get the currently active sheet:
 >>> ws = wb.active

Create a new worksheet:
    >>> ws1 = wb.create_sheet("Mysheet") # insert at the end (default)    # or    >>> ws2 = wb.create_sheet("Mysheet", 0) # insert at first position    # or    >>> ws3 = wb.create_sheet("Mysheet", -1) # insert at the penultimate position

Get a worksheet using the worksheet name:

    >>> ws3 = wb["New Title"]

Get all worksheet names:

    >>> print(wb.sheetnames)    ['Sheet2', 'New Title', 'Sheet1']使用for循环遍历所有的工作表:
>>> for sheet in wb: ... print(sheet.title)
0 3
save
Save to stream to use in network:

    >>> from tempfile import NamedTemporaryFile    >>> from openpyxl import Workbook    >>> wb = Workbook()    >>> with NamedTemporaryFile() as tmp:            wb.save(tmp.name)            tmp.seek(0)            stream = tmp.read()保存到文件:
>>> wb = Workbook() >>> wb.save('balances.xlsx')保存为模板:
>>> wb = load_workbook('document.xlsx') >>> wb.template = True >>> wb.save('document_template.xltx')
0 4
Cell
The cell position is read directly as a key of the worksheet:

    >>> c = ws['A4']

Assign values ​​to cells:

    >>> ws['A4'] = 4    >>> c.value = 'hello, world'

Multiple cells
You can use slices to access ranges of cells:

    >>> cell_range = ws['A1':'C2']

Use numeric format:

    >>> # set date using a Python datetime    >>> ws['A1'] = datetime.datetime(2010, 7, 21)    >>>>>> ws['A1'].number_format    'yyyy-mm-dd h:mm:ss'

Use the formula:

    >>> # add a simple formula    >>> ws["A1"] = "=SUM(1, 1)"

When merging cells, all but the top left cell are removed from the worksheet:

    >>> ws.merge_cells('A2:D2')    >>> ws.unmerge_cells('A2:D2')    >>>>>> # or equivalently    >>> ws.merge_cells(start_row=2, start_column=1, end_row=4, end_column=4)    >>> ws.unmerge_cells(start_row=2, start_column=1, end_row=4, end_column=4)
0 5
row, column
Rows, columns, or ranges of rows and columns can be specified individually:

    >>> colC = ws['C']    >>> col_range = ws['C:D']    >>> row10 = ws[10]    >>> row_range = ws[5:10]

Rows can be iterated over using the Worksheet.iter_rows() method:

    >>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2):    ...    for cell in row:    ...        print(cell)    <Cell Sheet1.A1>    <Cell Sheet1.B1>    <Cell Sheet1.C1>    <Cell Sheet1.A2>    <Cell Sheet1.B2>    <Cell Sheet1.C2>

The same Worksheet.iter_cols() method will iterate over the columns:

    >>> for col in ws.iter_cols(min_row=1, max_col=3, max_row=2):    ...     for cell in col:    ...         print(cell)    <Cell Sheet1.A1>    <Cell Sheet1.A2>    <Cell Sheet1.B1>    <Cell Sheet1.B2>    <Cell Sheet1.C1>    <Cell Sheet1.C2>

To loop through all rows or columns of a file, you can use the Worksheet.rows property:

    >>> ws = wb.active    >>> ws['C9'] = 'hello world'    >>> tuple(ws.rows)    ((<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>),    (<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>),    (<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>),    (<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>),    (<Cell Sheet.A5>, <Cell Sheet.B5>, <Cell Sheet.C5>),    (<Cell Sheet.A6>, <Cell Sheet.B6>, <Cell Sheet.C6>),    (<Cell Sheet.A7>, <Cell Sheet.B7>, <Cell Sheet.C7>),    (<Cell Sheet.A8>, <Cell Sheet.B8>, <Cell Sheet.C8>),    (<Cell Sheet.A9>, <Cell Sheet.B9>, <Cell Sheet.C9>))

or the Worksheet.columns property:

    >>> tuple(ws.columns)    ((<Cell Sheet.A1>,    <Cell Sheet.A2>,    <Cell Sheet.A3>,    <Cell Sheet.A4>,    <Cell Sheet.A5>,    <Cell Sheet.A6>,    ...    <Cell Sheet.B7>,    <Cell Sheet.B8>,    <Cell Sheet.B9>),    (<Cell Sheet.C1>,    <Cell Sheet.C2>,    <Cell Sheet.C3>,    <Cell Sheet.C4>,    <Cell Sheet.C5>,    <Cell Sheet.C6>,    <Cell Sheet.C7>,    <Cell Sheet.C8>,    <Cell Sheet.C9>))

Use Worksheet.append() or iteratively use Worksheet.cell() to add a new row of data:

    >>> for row in range(1, 40):    ...     ws1.append(range(600))
>>> for row in range(10, 20): ... for col in range(27, 54): ... _ = ws3.cell(column=col, row=row, value="{0}".format(get_column_letter(col)))

Insertion operations are cumbersome. A row or rows can be inserted using Worksheet.insert_rows() :

     >>> from openpyxl.utils import get_column_letter     >>> ws.insert_rows(7)      >>> row7 = ws[7]     >>> for col in range(27, 54):    ...         _ = ws3.cell(column=col, row=7, value="{0}".format(get_column_letter(col)))

Worksheet.insert_cols() operates similarly. Worksheet.delete_rows() and Worksheet.delete_cols() are used to delete rows and columns in batches.

0 6
read value only
Use the Worksheet.values ​​property to iterate over all rows in a worksheet, but return only cell values:

    for row in ws.values:       for value in row:         print(value)

Worksheet.iter_rows() and Worksheet.iter_cols() can set the values_only parameter to return only cell values:

    >>> for row in ws.iter_rows(min_row=1, max_col=3, max_row=2, values_only=True):    ...   print(row)    (None, None, None)    (None, None, None)

- For cooperation, communication and reprint, please add WeChat moonhmily1 -

picture