You have a large number of encrypted Excel files that need to be imported into a database. What should you do? This post introduces a practical Python package for the job.

msoffcrypto-tool [1] (formerly ms-offcrypto-tool) is a Python tool and library for decrypting encrypted MS Office files with a password, an intermediate key, or the private key that generated the escrow key.

Install

pip install msoffcrypto-tool

Examples

As a CLI tool (with password)

$ msoffcrypto-tool encrypted.docx decrypted.docx -p Passw0rd

If the value of the -p option is omitted, the tool prompts for the password:

$ msoffcrypto-tool encrypted.docx decrypted.docx -p
Password:

Test if the file is encrypted (returns exit code 0 or 1):

$ msoffcrypto-tool document.doc --test -v

As a library, called from your own code

The library functions accept passwords as well as other key types.

  • Decrypt a document (Word or Excel):
import msoffcrypto

full_path = "encrypted.docx"  # input path (example value)
out_path = "decrypted.docx"   # output path (example value)

with open(full_path, 'rb') as office_in:
    # Open the file
    office_file = msoffcrypto.OfficeFile(office_in)
    # Supply the password
    office_file.load_key(password='mypassword')

    # Open the output
    with open(out_path, 'wb') as office_out:
        # Run decrypt. This writes to the output file.
        office_file.decrypt(office_out)
  • Decrypt an Excel file in memory and read the result with pandas:
import msoffcrypto
import io
import pandas as pd

decrypted = io.BytesIO()

with open("encrypted.xlsx", "rb") as f:
    file = msoffcrypto.OfficeFile(f)
    file.load_key(password="Passw0rd")  # Use password
    file.decrypt(decrypted)

df = pd.read_excel(decrypted)
print(df)

The code above is the core of a batch import of encrypted Excel files.
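To show how that core might be wrapped into a batch job, here is a minimal sketch. It makes several assumptions that are not in the original: the target database is SQLite, all files live in one folder and share one password, only the first sheet of each workbook is needed, and the names `decrypt_to_memory`, `import_folder`, the `imported` table, and the `source_file` column are hypothetical. The third-party imports sit inside the functions so the module can be loaded before the packages are installed.

```python
import io
import sqlite3
from contextlib import closing
from pathlib import Path


def decrypt_to_memory(path, password):
    """Decrypt one Office file and return its bytes as a rewound buffer."""
    import msoffcrypto  # third-party: pip install msoffcrypto-tool

    buffer = io.BytesIO()
    with open(path, "rb") as f:
        office_file = msoffcrypto.OfficeFile(f)
        office_file.load_key(password=password)
        office_file.decrypt(buffer)
    buffer.seek(0)  # rewind so a reader starts at the first byte
    return buffer


def import_folder(folder, password, db_path, table="imported"):
    """Append the first sheet of each .xlsx file in `folder` to `table`."""
    import pandas as pd  # third-party: pip install pandas openpyxl

    loaded = []
    with closing(sqlite3.connect(db_path)) as conn:
        for path in sorted(Path(folder).glob("*.xlsx")):
            df = pd.read_excel(decrypt_to_memory(path, password))
            df["source_file"] = path.name  # remember where each row came from
            df.to_sql(table, conn, if_exists="append", index=False)
            loaded.append(path.name)
        conn.commit()
    return loaded
```

A call such as `import_folder("reports", "Passw0rd", "reports.db")` returns the list of file names it loaded. For a database other than SQLite, the connection object passed to `to_sql` would need to change accordingly.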

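  • Advanced usage:
import binascii

# Verify the password before decrypting (default: False)
# The ECMA-376 Agile/Standard crypto system lets you check whether the supplied password is correct before actually decrypting the file.
# Currently, the verify_password option is only meaningful for ECMA-376 Agile/Standard encryption
file.load_key(password="Passw0rd", verify_password=True)

# Use a private key
file.load_key(private_key=open("priv.pem", "rb"))

# Use an intermediate key (secretKey)
file.load_key(secret_key=binascii.unhexlify("AE8C36E68B4BB9EA46E5544A5FDB6693875B2FDE1507CBC65C8BCF99E25C2562"))

# Check the HMAC of the data payload before decrypting (default: False)
# Currently, the verify_integrity option is only meaningful for ECMA-376 Agile encryption
file.decrypt(open("decrypted.docx", "wb"), verify_integrity=True)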
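One use for verify_password in a batch setting is checking a short list of known passwords before committing to a full decryption. The sketch below is only an illustration under stated assumptions: `find_password` is a hypothetical helper, it targets OOXML files (.docx, .xlsx, .pptx) where the option is meaningful, and it assumes a rejected password surfaces as an exception, whose exact class may differ between library versions.

```python
def find_password(path, candidates):
    """Return the first candidate accepted by verify_password, else None."""
    import msoffcrypto  # third-party: pip install msoffcrypto-tool

    with open(path, "rb") as f:
        office_file = msoffcrypto.OfficeFile(f)
        for candidate in candidates:
            try:
                office_file.load_key(password=candidate, verify_password=True)
            except Exception:
                # Assumption: a rejected password raises; the exception class
                # depends on the library version, so it is caught broadly here.
                continue
            return candidate
    return None
```

Because the `except` clause is broad, it also hides unrelated failures such as an unsupported file type, so a `None` result means "no candidate was accepted", not necessarily "every password was wrong".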

Supported encryption methods

MS-OFFCRYPTO Specifications

  • [x] ECMA-376 (Agile Encryption/Standard Encryption)
    • [x] MS-DOCX (OOXML) (Word 2007-2016)
    • [x] MS-XLSX (OOXML) (Excel 2007-2016)
    • [x] MS-PPTX (OOXML) (PowerPoint 2007-2016)
  • [x] Office Binaries RC4 CryptoAPI
    • [x] MS-DOC (Word 2002, 2003, 2004)
    • [x] MS-XLS (Excel 2002, 2003, 2004) (experimental)
    • [x] MS-PPT (PowerPoint 2002, 2003, 2004) (partial, experimental)
  • [x] Office binaries RC4
    • [x] MS-DOC (Word 97, 98, 2000)
    • [x] MS-XLS (Excel 97, 98, 2000) (experimental)
  • [ ] ECMA-376 (Extensible Encryption)
  • [ ] XOR Obfuscation

More robust Office document decryption code

# Decrypt one Office file and return a short status string
import struct
from pathlib import Path
import msoffcrypto

def decrypt_office_file(full_path, out_path, password):
    with open(full_path, 'rb') as office_in:
        try:
            # Load it into msoffcrypto
            office_file = msoffcrypto.OfficeFile(office_in)
            office_file.load_key(password=password)
        except OSError:
            # OSError is raised if you pass in a file that isn't an Office file
            return 'not an office file'
        except AssertionError:
            # Office 97~2004 files only:
            # AssertionError is raised on load_key if the password is wrong
            return 'wrong password'
        except Exception:
            # xls files only:
            # msoffcrypto raises a generic Exception on load_key if the file isn't encrypted
            return 'not encrypted'

        if not office_file.is_encrypted():
            # For formats other than xls, is_encrypted() tells you whether a file is encrypted
            return 'not encrypted'

        # Open your desired output as a file
        with open(out_path, 'wb') as office_out:
            try:
                # load_key only supplies the password; you need to call decrypt to actually decrypt it.
                office_file.decrypt(office_out)
            except struct.error:
                # Office 97~2003 only: these files aren't supported yet.
                # If the password is CORRECT, msoffcrypto raises a generic 'error'.
                # Assumption: the original snippet caught an undefined name `error`; struct.error is a guess.
                return 'encrypted, but decryption not supported'
            except Exception:
                # Finally, msoffcrypto raises a generic Exception on decrypt if the password is wrong
                return 'wrong password'

    # If you want to overwrite the input, save to a separate file first and then move it:
    # shutil.move(out_path, full_path)
    return 'decrypted'

full_path = Path('input_file.docx')
out_path = Path('output_file.docx')
print(decrypt_office_file(full_path, out_path, 'mypassword'))
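The closing comment in the code above says that overwriting the input means saving the decrypted copy separately and then moving it. One way this could look is sketched below. `decrypt_in_place` is a hypothetical helper, not part of msoffcrypto-tool, and it uses `os.replace` on a temporary file in the same directory rather than `shutil.move`, on the assumption that a same-directory rename is the safer swap.

```python
import os
import tempfile


def decrypt_in_place(path, password):
    """Decrypt `path` and replace it with the decrypted copy."""
    import msoffcrypto  # third-party: pip install msoffcrypto-tool

    directory = os.path.dirname(os.path.abspath(path))
    # Write next to the original so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_out, open(path, "rb") as office_in:
            office_file = msoffcrypto.OfficeFile(office_in)
            office_file.load_key(password=password)
            office_file.decrypt(tmp_out)
        # Both handles are closed here, so the decrypted copy can be swapped in.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and drop the partial output.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If anything fails before the swap, the encrypted original is still intact and the exception propagates to the caller. Note that `mkstemp` creates the file with owner-only permissions, so the replaced file may end up with different permissions than the original.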

References

[1] msoffcrypto-tool: https://github.com/nolze/msoffcrypto-tool