How to write a Python crawler?
Scraping static web page data with Python
This is straightforward: just request the page directly by its URL. Here we take scraping content from the joke site Qiushibaike (糗事百科) as an example:
1. Suppose the content we want to capture mainly includes four fields: nickname, content, number of "funny" votes, and number of comments.
Open the page source and inspect its structure. It is simple: the contents of all four fields can be found directly in the HTML.
2. Based on this page structure, we can write the code to scrape the data. The idea is simple: first request the page by its URL, then parse the returned HTML with BeautifulSoup (by tags and attributes), as shown in the sketch below.
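Here is a minimal sketch of that approach. Since the original screenshots of the page structure are not preserved in this article, the URL and the tag/class names (`div.article`, `h2`, `div.content`, `i.number`) are assumptions for illustration; adapt them to the actual HTML you see.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical listing URL; substitute the page you actually want to scrape.
url = "https://www.qiushibaike.com/text/"
# Many sites reject the default requests User-Agent, so send a browser-like one.
headers = {"User-Agent": "Mozilla/5.0"}

resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Assumed structure: each joke sits in a <div class="article"> block.
for item in soup.find_all("div", class_="article"):
    nickname = item.find("h2").get_text(strip=True)
    content = item.find("div", class_="content").get_text(strip=True)
    # Assumed: two <i class="number"> counters, votes first, then comments.
    stats = item.find_all("i", class_="number")
    votes = stats[0].get_text(strip=True) if len(stats) > 0 else "0"
    comments = stats[1].get_text(strip=True) if len(stats) > 1 else "0"
    print(nickname, content, votes, comments)
```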
Running the program prints the crawled records, confirming the data was captured successfully.
Scraping dynamic web page data with Python
In many cases the page data is loaded dynamically, and crawling the raw HTML directly yields nothing. In that case we need to capture and analyze the network traffic to find the dynamically loaded data, which is usually a JSON file (though it may also be another format, such as XML). We then request and parse that JSON file directly to get the data we need. Here we take scraping loan listing data from Renrendai (人人贷) as an example:
1. Suppose the data we want to crawl mainly includes five fields: annual interest rate, loan title, term, amount, and progress.
2. Press F12 to open the browser developer tools and click Network -> XHR. Refresh the page with F5 and you can find the dynamically loaded JSON file.
3. Based on this analysis, we can write the code to capture the data. The basic idea is similar to the static page above: first request the JSON URL with requests, then parse the response with Python's built-in json module, as shown in the sketch below.
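Below is a minimal sketch of that idea. The JSON endpoint, query parameter, and field names are assumptions for illustration, since the actual URL found in the Network -> XHR panel is not preserved in this article; substitute the one you observe.

```python
import json
import requests

# Hypothetical endpoint discovered via the Network -> XHR panel;
# replace it with the JSON URL you actually find there.
url = "https://www.renrendai.com/loan/list/loanList"
params = {"currentPage": 1}
headers = {"User-Agent": "Mozilla/5.0"}

resp = requests.get(url, params=params, headers=headers, timeout=10)
resp.raise_for_status()
data = json.loads(resp.text)  # parse with Python's built-in json module

# Assumed JSON layout: a list of loan records under data["data"]["list"].
for loan in data.get("data", {}).get("list", []):
    print(
        loan.get("interestRate"),  # annual interest rate
        loan.get("title"),         # loan title
        loan.get("months"),        # term
        loan.get("amount"),        # amount
        loan.get("progress"),      # funding progress
    )
```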
Running the program prints the parsed records, confirming the data was obtained successfully.
At this point we have finished scraping web data with Python. Overall the process is simple: for beginners, requests and BeautifulSoup are easy to learn and master. Once you are familiar with them, you can move on to the Scrapy crawler framework, which noticeably improves development efficiency. Of course, if a page involves encryption or CAPTCHAs, you will need to study countermeasures yourself; there are related tutorials and materials online if you are interested.