How to Get Around Anti-Crawler Measures with a Python Crawler

1. Judging by the UA (User-Agent)

The UA (User-Agent) is the identity string that a browser sends in its request headers. The anti-crawler mechanism identifies a crawler by checking whether the request headers lack a UA. This check is very basic and is usually not the only criterion. Countering it is simple: the crawler sends a UA chosen at random from a list of real browser UAs.
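
A minimal sketch of the random-UA idea, using only the standard library. The UA strings in `USER_AGENTS`, the helper name `request_with_random_ua`, and the example URL are illustrative assumptions, not values required by any particular site.

```python
import random
import urllib.request

# Hypothetical pool of browser User-Agent strings; replace with current, real ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def request_with_random_ua(url):
    """Build a Request whose User-Agent header is picked at random from the pool."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


# Nothing is sent until the request is opened, for example:
#   html = urllib.request.urlopen(request_with_random_ua("https://example.com/"), timeout=10).read()
```

Picking a new UA per request, rather than once per run, keeps a long crawl from presenting a single fixed identity.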

2. Judging by cookies

Here, cookies refer to the login credentials of a member account. The site judges whether an account is a crawler by how frequently that account fetches pages in a short time. This mechanism is also hard to counter; the crawler needs multiple accounts to crawl.
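
One way to spread requests over several accounts is to rotate their session cookies in round-robin order. The sketch below assumes the accounts have already been logged in and their cookies saved; the cookie values and the helper name `request_as_next_account` are hypothetical.

```python
import itertools
import urllib.request

# Hypothetical session cookies, one per logged-in account.
ACCOUNT_COOKIES = [
    "sessionid=account-a-token",
    "sessionid=account-b-token",
    "sessionid=account-c-token",
]

# cycle() yields the cookies in order and starts over after the last one.
_cookie_cycle = itertools.cycle(ACCOUNT_COOKIES)


def request_as_next_account(url):
    """Build a Request carrying the next account's cookie, so load is shared evenly."""
    return urllib.request.Request(url, headers={"Cookie": next(_cookie_cycle)})
```

With three accounts, each one issues only a third of the requests, which lowers the per-account frequency the site observes.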

3. Judging by access frequency

Crawlers often visit the target website many times in a short time, so the anti-crawler mechanism can judge whether a visitor is a crawler by the request frequency of a single IP. This mechanism is difficult to counter and is generally solved by changing IPs, for example through proxies.
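
Changing IP is commonly done by sending requests through a pool of proxies. The following is a sketch under assumptions: the proxy addresses in `PROXIES` are placeholders and the helper name `next_opener` is made up for illustration.

```python
import itertools
import urllib.request

# Placeholder proxy addresses; replace with proxies you are allowed to use.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_opener():
    """Return (proxy, opener); the opener routes HTTP and HTTPS through that proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)


# Usage (performs a real network request through the chosen proxy):
#   proxy, opener = next_opener()
#   html = opener.open("https://example.com/", timeout=10).read()
```

Calling `next_opener()` before each request, or each small batch of requests, makes consecutive visits appear to come from different addresses.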

4. Judging by verification code

A verification code (CAPTCHA) is a cost-effective anti-crawler measure. To counter it, a crawler usually has to call an OCR verification code recognition platform, use Tesseract OCR, or train a neural network to recognize the codes.
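
As a rough illustration of the Tesseract route, the sketch below binarizes the image and passes it to Tesseract through the third-party `pytesseract` and `Pillow` packages, which must be installed along with the Tesseract binary. The threshold of 140, the expected code length of 4, and both function names are assumptions; real verification codes often need more preprocessing than this.

```python
import re


def clean_captcha_text(raw, length=4):
    """Keep only letters and digits from OCR output; return None if the length is wrong."""
    text = re.sub(r"[^A-Za-z0-9]", "", raw)
    return text if len(text) == length else None


def recognize_captcha(path, threshold=140, length=4):
    """Read an image file, binarize it, and return the recognized code or None."""
    # Imported here so that clean_captcha_text works without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(path).convert("L")  # grayscale
    # Pixels brighter than the threshold become white, the rest black.
    image = image.point(lambda p: 255 if p > threshold else 0)
    # "--psm 7" tells Tesseract to treat the image as a single line of text.
    raw = pytesseract.image_to_string(image, config="--psm 7")
    return clean_captcha_text(raw, length)
```

Returning `None` when the cleaned text has the wrong length lets the crawler request a fresh code instead of submitting an obviously bad guess.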

5. Dynamic page loading

Dynamically loaded websites usually load their content only after the user clicks or views part of the page. A crawler that only downloads the HTML cannot interact with the page, which greatly increases the difficulty of crawling.
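
The text above does not name a countermeasure; one common option, shown here only as a sketch, is to drive a real browser so that the clicks actually happen. It assumes the third-party `selenium` package and a local Chrome installation. The function names and the CSS selectors passed in are hypothetical.

```python
def make_driver():
    """Start headless Chrome; requires the selenium package and Chrome."""
    # Imported here so html_after_click can be used with any driver-like object.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(10)  # element lookups retry for up to 10 seconds
    return driver


def html_after_click(driver, url, button_css, result_css):
    """Open url, click the element matching button_css, and return the resulting HTML."""
    driver.get(url)
    # "css selector" is the locator strategy string behind selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", button_css).click()
    # With the implicit wait set above, this lookup waits for the loaded content.
    driver.find_element("css selector", result_css)
    return driver.page_source


# Usage (launches a real browser):
#   driver = make_driver()
#   try:
#       html = html_after_click(driver, "https://example.com/", "#load-more", ".item")
#   finally:
#       driver.quit()
```

A browser-driven crawler is much slower than plain HTTP requests, so it is usually reserved for the pages that cannot be fetched any other way.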

Generally speaking, when users scrape information from a website, they are constrained by its anti-crawler measures, which hinder them from obtaining information to some extent.