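0x01 Install the beautifulsoup4 library
The examples below also use requests and the lxml parser, so install them at the same time.
pip3 install beautifulsoup4 requests lxml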
0x02 Initialization
Fetch the page with requests and pass its HTML to BeautifulSoup to get a parsed document to work with.
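from bs4 import BeautifulSoup
import requests
url = "https://www.dandanzan10.top/dianying/index.html"
# Send a browser-like User-Agent header with the request
heads = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
}
r = requests.get(url, headers=heads)
html = r.text  # named html rather than str, which would shadow the built-in type
sp = BeautifulSoup(html, 'lxml')  # parse the HTML with the lxml parser
print(sp)  # print the whole parsed document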
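To try the initialization step without touching the network, a minimal sketch can parse a small inline HTML string instead. The markup in sample_html and the names soup and page_title are assumptions made for illustration; the real page will look different. The sketch also swaps in Python's built-in html.parser for lxml so that nothing beyond beautifulsoup4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; the real page markup will differ.
sample_html = (
    "<html><head><title>Movies</title></head>"
    "<body><h2>匹诺曹</h2></body></html>"
)

# html.parser ships with Python; the article itself uses 'lxml'.
soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached as attributes of the parsed document.
page_title = soup.title.string
print(page_title)
```

Once the document is parsed, any tag name can be used the same way, which is what the next section relies on.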
0x03 Get the movie name
1. Right-click the movie name on the page and select Inspect Element.
2. The name (here Pinocchio, 匹诺曹) is inside an h2 tag.
3. Code implementation:
from bs4 import BeautifulSoup
import requests
url = "https://www.dandanzan10.top/dianying/index.html"
heads = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
}
r = requests.get(url, headers=heads)
html = r.text  # named html rather than str, which would shadow the built-in type
sp = BeautifulSoup(html, 'lxml')
# sp.h2 is the first h2 tag in the document; .string is the text inside it
print(sp.h2.string)
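Attribute access such as sp.h2 only ever returns the first matching tag, and it returns None when the page has no such tag. The sketch below illustrates both cases on made-up markup; sample_html, first_title and missing are hypothetical names, and html.parser is used here instead of lxml to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two h2 tags.
sample_html = "<div><h2>匹诺曹</h2><h2>我的爸爸</h2></div>"
soup = BeautifulSoup(sample_html, "html.parser")

# soup.h2 is shorthand for soup.find("h2"): the first h2 only.
first_h2 = soup.h2
first_title = first_h2.string if first_h2 is not None else None
print(first_title)

# When no h2 exists, the attribute is None, so guard before using .string.
missing = BeautifulSoup("<p>no headings</p>", "html.parser").h2
print(missing)
```

A guard like this avoids an AttributeError if the site changes its layout and the h2 tag disappears.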
0x04 Get all movie names on this page
from bs4 import BeautifulSoup
import requests
url = "https://www.dandanzan10.top/dianying/index.html"
heads = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
}
r = requests.get(url, headers=heads)
html = r.text  # named html rather than str, which would shadow the built-in type
sp = BeautifulSoup(html, 'lxml')
# find_all returns every h2 tag; print the text of each one
for h2 in sp.find_all(name='h2'):
    print(h2.string)
Output:
匹诺曹
心弦为君而鸣
我的爸爸
犬部!
孩子不想理解
独自生活的人们
欧比旺:绝地归来
欢快的鬼魂
雷神4:爱与雷霆
致命邮件:2001 美国炭疽攻击事件
布朗克斯大战吸血鬼
嚎笑捉鬼队
旅馆闹鬼
闲山:龙的出现
非常宣言
鬼影实录:血亲
小犬与女孩
小鹿乱撞爱上你
单向逃离
防线-秘密护送
爱的透视图
坏种2
婚头转向
海豹自卫队
1. sp.find_all(name='h2') returns a list of every h2 tag on the page.
2. The for loop visits each tag in that list in turn.
3. The string attribute gives the text inside each tag.
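One caveat with the steps above: string only yields text when a tag has a single child, and it is None otherwise. The sketch below compares it with get_text() on made-up markup. The nested a and span tags are assumptions for illustration and may not match the real page, and html.parser replaces lxml so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: plain text, text wrapped in a link, and mixed content.
sample_html = (
    "<h2>匹诺曹</h2>"
    "<h2><a href='#'>我的爸爸</a></h2>"
    "<h2>犬部<span>!</span></h2>"
)
soup = BeautifulSoup(sample_html, "html.parser")

# .string is None for the third h2 because it has more than one child.
strings = [h2.string for h2 in soup.find_all("h2")]

# get_text() joins all nested text, so it works in every case here.
texts = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

print(strings)
print(texts)
```

If the loop ever prints None for some titles, switching from string to get_text() is a reasonable first thing to try.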
0x05 Disclaimer
This article is intended only for security research and learning. Anyone who uses the tool for other purposes bears all legal and joint liability; the author accepts none.