How to write a Python crawler?
Scraping static web page data with Python
This is straightforward: just request the page directly by its URL. Here we take grabbing posts from Qiushibaike (a Chinese joke site) as an example:
1. Suppose the content we want to scrape looks like the following, with four fields per post: nickname, post content, number of "funny" votes, and number of comments:
Open the page source; the corresponding structure is shown below. It is very simple, and every field can be found directly in the HTML:
2. Based on the page structure above, we can write the scraping code. It is simple: first request the page by its URL, then parse the returned HTML with BeautifulSoup (matching on tags and attributes), as shown below:
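A minimal sketch of this step. The tag names and CSS classes (`article`, `content`, `stats-vote`, `stats-comments`) are assumptions standing in for the real page structure, and the HTML is inlined here so the example is self-contained; on the live site you would first fetch the page, e.g. `html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text`, and adjust the selectors to match the actual markup:

```python
from bs4 import BeautifulSoup

# Inline HTML snippet mimicking the assumed post structure; replace with
# the real page fetched via requests on the live site.
html = """
<div class="article">
  <h2>user_a</h2>
  <div class="content"><span>A cold joke.</span></div>
  <span class="stats-vote"><i>120</i> funny</span>
  <span class="stats-comments"><i>15</i> comments</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for item in soup.find_all("div", class_="article"):
    # Each field is located by its tag and class attribute, as described above.
    rows.append({
        "nickname": item.find("h2").get_text(strip=True),
        "content": item.find("div", class_="content").get_text(strip=True),
        "votes": item.find("span", class_="stats-vote").i.get_text(strip=True),
        "comments": item.find("span", class_="stats-comments").i.get_text(strip=True),
    })

print(rows)
```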
Run the program, and the data is crawled successfully.
Scraping dynamically loaded web page data with Python
In many cases a page's data is loaded dynamically, so crawling the raw HTML directly yields nothing. In that case we need to capture and analyze the network traffic to find the dynamically loaded data, which is usually a JSON file (though it may be another format, such as XML), and then request and parse that file to get the data we need. Here we take grabbing loan-listing data from Renrendai (a Chinese P2P lending site) as an example:
1. Suppose the data we want to crawl looks like the following, with five fields: annual interest rate, loan title, term, amount, and funding progress:
2. Press F12 to open the browser's developer tools, then click Network -> XHR. Refresh the page with F5, and you can find the dynamically loaded JSON file:
3. Based on the analysis above, we can write the scraping code. The basic idea is similar to the static case: first request the JSON file with requests, then parse the data with Python's built-in json module, as shown below:
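A minimal sketch of this step. The field names (`interestRate`, `title`, `months`, `amount`, `progress`) and the nesting under `data.loans` are assumptions standing in for whatever the real XHR endpoint returns; the JSON is inlined here so the example is self-contained, while on the live site you would fetch it with something like `payload = requests.get(api_url).json()` using the URL found in the developer tools:

```python
import json

# Inline sample mimicking the JSON returned by the XHR endpoint;
# field names are assumptions and should be matched to the real response.
raw = """
{"data": {"loans": [
    {"interestRate": 9.5, "title": "Working capital loan",
     "months": 12, "amount": 50000, "progress": 87.5}
]}}
"""

payload = json.loads(raw)
for loan in payload["data"]["loans"]:
    # Pull out the five fields of interest from each loan record.
    print(loan["interestRate"], loan["title"], loan["months"],
          loan["amount"], loan["progress"])
```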
Run the program, and the data is fetched successfully.
That covers scraping web data with Python. Overall the process is very simple: for beginners, requests and BeautifulSoup are easy to learn and master, and you can get started with just those two. Once you are comfortable with them, you can move on to the Scrapy crawler framework, which noticeably improves development efficiency. Of course, if a page involves encryption or CAPTCHAs, you will need to study the countermeasures yourself; there are tutorials and materials for that online as well, if you are interested.