
When crawling web pages, we often find that the data we want cannot be obtained simply by parsing the HTML source, because it is loaded asynchronously through AJAX or rendered by JavaScript.

Selenium is an automated testing tool that supports multiple browsers. In a crawler, we can use it to drive a real browser through the page, which solves the problem of JavaScript-rendered content.

1. Example of use
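
As a minimal sketch of the whole workflow, the snippet below opens a page, types a keyword into a search box, submits the form, and returns the resulting page title. The URL, the input name `q`, and the function names are illustrative assumptions rather than the original example, and `run_demo` assumes Chrome with a matching driver is installed.

```python
def search(driver, url, keyword, box_name="q"):
    """Open url, type keyword into the input named box_name, submit its form."""
    driver.get(url)
    # "name" is the same string as By.NAME in selenium.webdriver.common.by
    box = driver.find_element("name", box_name)
    box.send_keys(keyword)
    box.submit()  # submits the form that contains the input
    return driver.title


def run_demo():
    """Launch a real browser; needs Selenium and a Chrome driver."""
    from selenium import webdriver  # imported lazily: only needed for a real run

    driver = webdriver.Chrome()
    try:
        # Hypothetical URL, replace with a real search page.
        return search(driver, "https://example.com/search", "selenium")
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `run_demo()` starts the browser; `search` itself works with any driver object that is passed in.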


2. Detailed introduction

2.1 Declare the browser object

That is, tell the program which browser to use.


2.2 Visit the page
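
Declaring a browser and visiting a page could look roughly like this. The helper names are hypothetical, and each browser is assumed to have its driver available on the machine.

```python
def open_browser(name="chrome"):
    """Create a driver for the named browser (its driver binary must be installed)."""
    from selenium import webdriver  # imported lazily: only needed for a real run

    factories = {
        "chrome": webdriver.Chrome,
        "firefox": webdriver.Firefox,
        "edge": webdriver.Edge,
        "safari": webdriver.Safari,
    }
    return factories[name]()


def fetch_source(driver, url):
    """Load url in the browser and return the HTML after rendering."""
    driver.get(url)
    return driver.page_source
```

A typical call sequence would be `driver = open_browser("firefox")`, then `html = fetch_source(driver, "https://example.com")`, and finally `driver.quit()`.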


2.3 Find elements

After the page has loaded, we may need to perform some operations on it, such as finding the search box, entering a keyword, and pressing Enter.

To do that, we first need to locate the element with Selenium.

2.3.1 Single element

Selenium offers two ways to find an element.

The first is to call a method named after the lookup strategy, such as a CSS selector or an XPath expression.


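The strategy-specific lookup methods are listed below. They belong to the Selenium 3 API; Selenium 4 deprecated them and later releases removed them in favor of find_element().

find_element_by_id

find_element_by_name

find_element_by_xpath

find_element_by_link_text

find_element_by_partial_link_text

find_element_by_tag_name

find_element_by_class_name

find_element_by_css_selector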

The second is to call find_element() directly. Its first argument is the lookup strategy and its second is the value to look for.
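
A small sketch of find_element() with three different strategies follows. The element id `q` is a made-up example, and the strategy strings are the values behind the `By` constants in `selenium.webdriver.common.by`.

```python
def find_search_box(driver):
    """Locate the same (hypothetical) search box with three strategies."""
    by_id = driver.find_element("id", "q")                 # By.ID
    by_css = driver.find_element("css selector", "#q")     # By.CSS_SELECTOR
    by_xpath = driver.find_element("xpath", '//*[@id="q"]')  # By.XPATH
    return by_id, by_css, by_xpath
```

In real code, writing `By.ID` instead of the literal `"id"` is usually easier to read; both forms pass the same string to the driver.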


2.3.2 Multiple elements

Finding multiple elements works the same way as finding a single one: add an s to the method name (find_elements instead of find_element).

The result is a list of elements.
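
For instance, collecting the text of every link on a page might be sketched like this (the function name is hypothetical):

```python
def link_texts(driver):
    """Return the visible text of every <a> element on the current page."""
    # find_elements always returns a list; it is empty when nothing matches.
    links = driver.find_elements("tag name", "a")  # "tag name" is By.TAG_NAME
    return [link.text for link in links]
```

Unlike find_element(), find_elements() does not raise an exception when there is no match, so the caller only has to handle an empty list.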


2.4 Element Interaction

To interact with an element, first locate it and then call an interaction method on the returned object.

For example, you can type text into a search box, clear it, or click a button.
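
A sketch of that sequence is shown below. The id `q` and the submit-button selector are assumptions about the target page, not details from the original example.

```python
def submit_query(driver, keyword):
    """Clear the search box, type the keyword, and click the submit button."""
    box = driver.find_element("id", "q")  # hypothetical id of the search box
    box.clear()                            # remove any text already in the box
    box.send_keys(keyword)                 # type the keyword
    button = driver.find_element("css selector", "button[type='submit']")
    button.click()
```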


2.5 Interactive actions

An interactive action attaches one or more actions to an action chain, which then executes them in order. This requires ActionChains; drag and drop is a typical example.
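
A drag and drop with ActionChains might be sketched as follows. The helper names are hypothetical, and the source and target elements are assumed to have been located beforehand.

```python
def build_chain(driver):
    """Create an empty action chain bound to the driver."""
    from selenium.webdriver import ActionChains  # imported lazily: needs Selenium

    return ActionChains(driver)


def drag_onto(chain, source, target):
    """Queue a drag and drop, then run every queued action in order."""
    # Nothing is sent to the browser until perform() is called.
    chain.drag_and_drop(source, target).perform()
```

In use this would read `drag_onto(build_chain(driver), source, target)`. Other actions such as `move_to_element` or `click` can be queued on the same chain before `perform()`.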

2.6 Execute JavaScript

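Selenium can also run JavaScript directly in the page, which is useful for operations the API does not provide, such as scrolling down the page.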
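
As a hedged illustration, scrolling to the bottom of the page through execute_script() could look like this (the function name is an assumption):

```python
def scroll_to_bottom(driver):
    """Scroll the window to the bottom and return the page height in pixels."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # A script that uses `return` hands its value back to Python.
    return driver.execute_script("return document.body.scrollHeight;")
```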


2.7 Get element information

After locating an element, we often also need its attributes and text.

2.7.1 Get attributes
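
Reading attributes and text from an element that has already been located might be sketched like this. The chosen attributes are only examples.

```python
def describe(element):
    """Collect a few commonly needed pieces of information about an element."""
    return {
        "tag": element.tag_name,                  # e.g. "a" or "input"
        "text": element.text,                     # visible text content
        "class": element.get_attribute("class"),  # None if the attribute is absent
        "href": element.get_attribute("href"),
    }
```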


2.8 Frame

Elements inside a child frame cannot be found while the driver is focused on the parent frame, so you need to switch to the child frame first and then search. Likewise, elements of the parent frame cannot be found from within the child frame.
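
Switching into a frame, reading something, and switching back could be sketched as follows. The function name and its arguments are hypothetical.

```python
def text_inside_frame(driver, frame_ref, css):
    """Return the text of the element matching css inside the given frame."""
    # frame_ref may be the frame's id or name, its index, or a WebElement.
    driver.switch_to.frame(frame_ref)
    try:
        return driver.find_element("css selector", css).text
    finally:
        # Return to the parent frame so later lookups behave as expected.
        driver.switch_to.parent_frame()
```

The try/finally makes sure the driver leaves the child frame even if the element is not found.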


2.9 Waiting

A page may load part of its content asynchronously through AJAX. Selenium only waits for the main document to load and does not wait for those AJAX requests, so you need to wait until the content you need is present before continuing.

2.9.1 Implicit Waits

With an implicit wait, if the WebDriver does not find the requested element immediately, it keeps retrying until the timeout elapses. If the element is still missing after that, an element-not-found exception (NoSuchElementException) is raised. The default timeout is 0.

An implicit wait applies to every element lookup rather than to one specific element.

It stays in effect for the whole lifetime of the driver, so it only needs to be set once.
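
Setting an implicit wait takes a single call. The sketch below wraps it in a hypothetical helper, and the 10-second default is an arbitrary choice.

```python
def enable_implicit_wait(driver, seconds=10):
    """Make every later element lookup retry for up to `seconds` seconds."""
    driver.implicitly_wait(seconds)  # set once; applies until the driver quits
    return driver
```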


2.9.2 Explicit wait

An explicit wait combines a wait condition with a maximum wait time.

The condition is checked first. If it holds, the wait returns immediately; otherwise it is checked repeatedly until the maximum wait time is reached, and if it still does not hold, an exception (TimeoutException) is raised.

An explicit wait targets a specific element or condition.
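
A minimal sketch of an explicit wait with WebDriverWait and expected_conditions is shown below. The helper name is hypothetical, and presence in the DOM is just one of several available conditions.

```python
def wait_for_element(driver, locator, timeout=10):
    """Wait until the element for `locator` is present in the DOM and return it.

    `locator` is a (strategy, value) pair such as ("id", "q").
    Raises selenium.common.exceptions.TimeoutException after `timeout` seconds.
    """
    # Imported lazily: these modules are only needed when the wait runs.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    condition = EC.presence_of_element_located(locator)
    return WebDriverWait(driver, timeout).until(condition)
```

Other conditions, for example `EC.element_to_be_clickable` or `EC.title_contains`, can be passed to `until()` in the same way.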


2.10 Browser forward/backward

back() returns to the previous page in the browser history, and forward() moves to the next page.
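
The following sketch visits several pages and then steps through the history. The function name is hypothetical, and the URLs are supplied by the caller.

```python
def visit_then_navigate(driver, urls):
    """Visit each URL in order, go back one page, then forward again."""
    for url in urls:
        driver.get(url)
    driver.back()                   # now on the second-to-last page
    previous = driver.current_url
    driver.forward()                # back on the last page
    return previous, driver.current_url
```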


2.11 Working with Cookies
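
Reading, adding, and deleting cookies might be sketched like this. The helper name is hypothetical, and a page on the target domain is assumed to be loaded already, since cookies can only be added for the current domain.

```python
def reset_cookies(driver, name, value):
    """Read all cookies, add one, read it back, then delete every cookie."""
    before = driver.get_cookies()       # list of dicts, one per cookie
    driver.add_cookie({"name": name, "value": value})
    added = driver.get_cookie(name)     # dict for that cookie, or None
    driver.delete_all_cookies()
    return before, added
```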


2.12 Tab Management

Tab management refers to the browser's tabs. Sometimes we need to open a new tab or close an existing one, and Selenium can do both.
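
One way to open and close tabs is sketched below. The helper names are hypothetical, and the code assumes the newest tab is the last entry in `window_handles`, which is typical but not guaranteed.

```python
def open_in_new_tab(driver, url):
    """Open a blank tab, switch to it, load url, and return the tab's handle."""
    driver.execute_script("window.open('');")
    # Assumption: the tab that was just opened is the last handle in the list.
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(url)
    return driver.current_window_handle


def close_tab_and_return(driver, handle):
    """Close the current tab and switch back to the tab identified by handle."""
    driver.close()                   # closes only the tab that has focus
    driver.switch_to.window(handle)  # focus must be moved to a tab that still exists
```

Saving `driver.current_window_handle` before calling `open_in_new_tab` gives the handle to pass to `close_tab_and_return` later.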

