When crawling web pages, we often find that the data we want cannot be obtained simply by parsing the HTML source, because it is loaded asynchronously via AJAX or rendered by JavaScript.
Selenium is a browser automation tool, originally built for automated testing, that supports multiple browsers. In a crawler, we can use it to drive a real browser through the page, which solves the problem of JavaScript-rendered content.
1. Example of use
2. Detailed introduction
2.1 Declare the browser object
That is, tell the program which browser to use for the operation.
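As a minimal sketch, assuming Selenium 4 and that the chosen browser is installed locally, declaring the browser object might look like this. The helper name make_driver is made up for illustration; only the webdriver.Chrome, Firefox, Edge and Safari classes come from Selenium.

```python
def make_driver(browser="chrome"):
    """Return a WebDriver for the named browser (hypothetical helper)."""
    # Local import: Selenium is only needed when the function runs.
    from selenium import webdriver

    factories = {
        "chrome": webdriver.Chrome,
        "firefox": webdriver.Firefox,
        "edge": webdriver.Edge,
        "safari": webdriver.Safari,
    }
    try:
        factory = factories[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    # Calling the class starts a browser session with default options.
    return factory()
```

Remember to call driver.quit() when you are done, otherwise the browser process may keep running.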
2.2 Visit the page
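Visiting a page comes down to calling get() on the browser object. The sketch below wraps that in a small helper; the name fetch_rendered_html and the example URL in the comments are assumptions for illustration.

```python
def fetch_rendered_html(driver, url):
    """Load `url` in the browser and return the page source afterwards."""
    driver.get(url)            # navigate; returns once the page has loaded
    return driver.page_source  # source of the page as the browser currently has it


# Usage, assuming Chrome is installed:
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   try:
#       html = fetch_rendered_html(driver, "https://example.com")
#   finally:
#       driver.quit()
```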
2.3 Find elements
After the page has loaded, we usually need to perform some operations on it, such as finding the search box, entering a keyword, and pressing the Enter key.
To do that, you first need to locate the element with Selenium.
2.3.1 Single element
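There are two ways for Selenium to find elements.
The first is to call a method dedicated to one lookup strategy, such as a CSS selector or XPath. Note that these find_element_by_* methods were deprecated in Selenium 4.0 and removed in 4.3, so they only work with older versions.
The dedicated lookup methods are:
find_element_by_id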
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
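The second is to call find_element() directly, passing the lookup strategy (a By constant such as By.ID or By.CSS_SELECTOR) as the first argument and the value to match as the second. This form works in all versions, including Selenium 4.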
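Here is a short sketch of the find_element() form, locating the same element with three different strategies. The helper name locate_input and the idea that the page has an element with the given id are assumptions for illustration.

```python
def locate_input(driver, element_id):
    """Find one element by id, CSS selector and XPath (illustrative helper)."""
    # Local import: Selenium is only needed when the function runs.
    from selenium.webdriver.common.by import By

    by_id = driver.find_element(By.ID, element_id)
    by_css = driver.find_element(By.CSS_SELECTOR, f"#{element_id}")
    by_xpath = driver.find_element(By.XPATH, f'//*[@id="{element_id}"]')
    # Each call returns the first match or raises NoSuchElementException.
    return by_id, by_css, by_xpath
```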
2.3.2 Multiple elements
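Finding multiple elements works the same way as finding a single one: call find_elements() instead of find_element() (in older versions, add an s to the dedicated method name, for example find_elements_by_css_selector).
Finding multiple elements returns a list, which is empty if nothing matches.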
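For example, collecting every link on the current page could be sketched as follows. The helper name link_targets is made up for this illustration.

```python
def link_targets(driver):
    """Return the href attribute of every <a> element on the current page."""
    # Local import: Selenium is only needed when the function runs.
    from selenium.webdriver.common.by import By

    links = driver.find_elements(By.TAG_NAME, "a")  # a list, possibly empty
    return [link.get_attribute("href") for link in links]
```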
2.4 Element Interaction
Element interaction means first getting an element and then calling an interaction method on it, such as send_keys(), clear(), or click().
For example, enter text in the search box:
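A minimal sketch, assuming the search box can be found by its id; the helper name run_search and the default id "q" are hypothetical and will differ from site to site.

```python
def run_search(driver, keyword, box_id="q"):
    """Type `keyword` into the search box and press Enter."""
    # Local imports: Selenium is only needed when the function runs.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    box = driver.find_element(By.ID, box_id)
    box.clear()               # remove any text already in the box
    box.send_keys(keyword)    # type the keyword
    box.send_keys(Keys.ENTER) # submit by pressing the Enter key
    return box
```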
2.5 Interactive actions
Interactive actions are attached to an action chain and executed serially, which requires the use of ActionChains. The actions are only queued when you add them; they run in order once perform() is called.
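A drag and drop is a typical case. The sketch below assumes you have already located the two elements; the helper name drag_onto is made up for illustration.

```python
def drag_onto(driver, source, target):
    """Drag the `source` element and drop it on the `target` element."""
    # Local import: Selenium is only needed when the function runs.
    from selenium.webdriver.common.action_chains import ActionChains

    actions = ActionChains(driver)
    actions.drag_and_drop(source, target)  # queued, nothing happens yet
    actions.perform()                      # run the queued actions in order
```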
2.6 Execute JavaScript
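Some operations, such as scrolling the page down, have no dedicated method in the API. In those cases you can run JavaScript directly with execute_script().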
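As a small sketch of running JavaScript through the driver, the helper below scrolls to the bottom of the page and reads back the scroll position. The helper name scroll_to_bottom is an assumption; execute_script() is the Selenium call.

```python
def scroll_to_bottom(driver):
    """Scroll to the bottom of the page and return the vertical offset."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # A script that uses `return` hands its value back to Python.
    return driver.execute_script("return window.pageYOffset;")
```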
2.7 Get element information
Once an element has been located, you may also need to read its attributes and text.
2.7.1 Get properties
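Attributes are read with get_attribute(), while the visible text and tag name are plain properties of the element. A minimal sketch, with the helper name describe_element made up for illustration:

```python
def describe_element(element):
    """Collect a few commonly needed pieces of information about an element."""
    return {
        "class": element.get_attribute("class"),  # value of an HTML attribute
        "text": element.text,                     # visible text of the element
        "tag": element.tag_name,                  # e.g. "div" or "a"
    }
```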
2.8 Frame
Selenium only searches the frame it is currently in. While you are in the parent frame you cannot find elements inside a child frame (such as an iframe), so you need to switch to the child frame and search again. Likewise, elements of the parent frame cannot be found from inside the child frame.
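Switching into a child frame and back out again could be sketched like this. The helper name text_inside_frame and its parameters are assumptions; switch_to.frame() and switch_to.parent_frame() are the Selenium calls.

```python
def text_inside_frame(driver, frame_ref, css):
    """Return the text of the element matching `css` inside a child frame."""
    # Local import: Selenium is only needed when the function runs.
    from selenium.webdriver.common.by import By

    driver.switch_to.frame(frame_ref)  # frame id, name, index or element
    try:
        return driver.find_element(By.CSS_SELECTOR, css).text
    finally:
        driver.switch_to.parent_frame()  # always step back to the parent
```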
2.9 Waiting
Pages often load part of their content asynchronously via AJAX. Selenium returns control once the main document has loaded and does not wait for AJAX requests that finish afterwards. Therefore, you need to wait for the content you are after to appear before working with it.
2.9.1 Implicit Waits
With an implicit wait, if the WebDriver does not find the specified element right away, it keeps polling the page until the element appears or the configured time elapses. If the element is still not found after that time, a NoSuchElementException is thrown. The default wait time is 0.
An implicit wait is not tied to one element: it applies to every element lookup.
It stays in effect for the whole lifetime of the driver, so it only needs to be set once.
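Setting an implicit wait is a single call on the driver. A minimal sketch, where the helper name open_with_implicit_wait and the 10 second default are arbitrary choices for illustration:

```python
def open_with_implicit_wait(driver, url, seconds=10):
    """Set an implicit wait once, then load the page."""
    # From now on every find_element call waits up to `seconds`
    # for the element to appear before raising an exception.
    driver.implicitly_wait(seconds)
    driver.get(url)
    return driver
```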
2.9.2 Explicit wait
An explicit wait consists of a wait condition and a maximum wait time.
The condition is checked first; if it holds, the call returns immediately. Otherwise it is checked again and again until it holds or the maximum wait time has elapsed, at which point a TimeoutException is thrown.
An explicit wait targets a specific element or condition.
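The usual way to write an explicit wait combines WebDriverWait with one of the ready-made conditions in expected_conditions. The sketch below waits for an element to be present; the helper name wait_for_element and the 10 second default are assumptions for illustration.

```python
def wait_for_element(driver, css, timeout=10):
    """Wait up to `timeout` seconds for an element matching `css`."""
    # Local imports: Selenium is only needed when the function runs.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # until() returns the element as soon as the condition holds
    # and raises TimeoutException if it never does within `timeout`.
    return wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, css)))
```

Other conditions, such as element_to_be_clickable or title_contains, can be swapped in the same way.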
2.10 Browser forward/backward
back() returns to the previous page in the browser history, and forward() goes to the next page.
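A short sketch of moving through the history; the helper name visit_and_return and the two URLs you pass in are up to you.

```python
def visit_and_return(driver, first_url, second_url):
    """Visit two pages, go back, go forward, and report where we ended up."""
    driver.get(first_url)
    driver.get(second_url)
    driver.back()                     # back to the first page
    after_back = driver.current_url
    driver.forward()                  # forward to the second page again
    after_forward = driver.current_url
    return after_back, after_forward
```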
2.11 Working with Cookies
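Cookies can be read, added, and deleted through the driver. The sketch below assumes a page has already been loaded, since a cookie can only be added for the site the browser is currently on; the helper name cookie_roundtrip and the cookie values are made up for illustration.

```python
def cookie_roundtrip(driver, name="demo", value="1"):
    """Add a cookie, list the cookie names, then delete all cookies."""
    driver.add_cookie({"name": name, "value": value})
    names = [cookie["name"] for cookie in driver.get_cookies()]
    driver.delete_all_cookies()
    return names
```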
2.12 Tab Management
Tab management means working with the browser's tabs. Sometimes we need to open a new tab, switch between tabs, or close a tab, all of which can be done with Selenium.
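One possible sketch, assuming Selenium 4 (switch_to.new_window() does not exist in earlier versions): open a URL in a new tab, read its title, close the tab, and switch back. The helper name title_in_new_tab is made up for illustration.

```python
def title_in_new_tab(driver, url):
    """Open `url` in a new tab, return its title, then return to the old tab."""
    original = driver.current_window_handle  # remember where we came from
    driver.switch_to.new_window("tab")       # open a new tab and switch to it
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.close()                       # close the new tab only
        driver.switch_to.window(original)    # focus the original tab again
```

The handles of all open tabs are available as driver.window_handles, which is useful when a tab was opened by the page itself rather than by your code.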