How to capture website data with python?
Capturing static data (the data is in the source code of the web page): take an encyclopedia-style joke website as an example.
1. Suppose we want to capture the following data for each post: the user's nickname, the content, the number of "funny" votes and the number of comments.
These values all appear directly in the source code of the web page, so they can be extracted from the HTML.
2. Following the structure of the web page, the main code is very simple. It mainly uses Requests and BeautifulSoup: requests fetches the page and BeautifulSoup parses it.
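The steps above can be sketched roughly as follows. This is a minimal sketch, not the original author's code: the HTML in SAMPLE_HTML, the CSS class names (article, author, content, stats-vote, stats-comments) and the function names parse_posts and fetch_posts are all assumptions made for illustration, so the selectors must be adapted to the real page structure.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are placeholders, not the real site's.
SAMPLE_HTML = """
<div class="article">
  <div class="author"><h2>alice</h2></div>
  <div class="content"><span>First joke text.</span></div>
  <span class="stats-vote"><i class="number">120</i></span>
  <span class="stats-comments"><i class="number">8</i></span>
</div>
<div class="article">
  <div class="author"><h2>bob</h2></div>
  <div class="content"><span>Second joke text.</span></div>
  <span class="stats-vote"><i class="number">45</i></span>
  <span class="stats-comments"><i class="number">3</i></span>
</div>
"""


def parse_posts(html):
    """Extract nickname, content, vote count and comment count from each post."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for block in soup.select("div.article"):
        posts.append({
            "nickname": block.select_one("div.author h2").get_text(strip=True),
            "content": block.select_one("div.content span").get_text(strip=True),
            "votes": int(block.select_one("span.stats-vote i.number").get_text(strip=True)),
            "comments": int(block.select_one("span.stats-comments i.number").get_text(strip=True)),
        })
    return posts


def fetch_posts(url):
    """Request a page and parse it; the URL is supplied by the caller."""
    # Some sites reject requests that do not send a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return parse_posts(resp.text)


if __name__ == "__main__":
    # Parse the embedded sample so the sketch runs without network access.
    for post in parse_posts(SAMPLE_HTML):
        print(post)
```

Keeping the parsing in parse_posts, separate from the network request, makes it easy to check the selectors against a saved copy of the page before crawling the live site.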
When the program runs, it prints the crawled records, which confirms that the data has been captured successfully.
Capturing dynamic data (the data is not in the source code of the web page, but in JSON or similar files): take the Renren Loan website as an example.
1. Suppose we want to capture the loan listing data, which mainly includes five fields: annual interest rate, loan title, term, amount and progress.
If you open the page source, you will find that these data are not there. Press F12 to open the browser's developer tools and inspect the network requests, and you will find the data in a JSON response.
2. Once we have the URL of the JSON file, we can grab the corresponding data. The packages used are similar to those above; because the response is JSON, we also use the json package to parse it.
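A rough sketch of this step is shown below. The payload layout in SAMPLE_JSON, the key names (data, loans, title, interest, months, amount, progress) and the function names parse_loans and fetch_loans are assumptions for illustration only; the real keys have to be read from the JSON response seen in the developer tools.

```python
import json

import requests

# Hypothetical payload: the real response will use different keys and nesting.
SAMPLE_JSON = """
{"data": {"loans": [
  {"title": "Working capital", "interest": 10.8, "months": 24,
   "amount": 50000, "progress": 100},
  {"title": "Home renovation", "interest": 9.6, "months": 12,
   "amount": 20000, "progress": 75}
]}}
"""


def parse_loans(text):
    """Pull the five fields of interest out of a JSON document."""
    payload = json.loads(text)
    rows = []
    for item in payload["data"]["loans"]:
        rows.append({
            "annual_rate": item["interest"],
            "title": item["title"],
            "term_months": item["months"],
            "amount": item["amount"],
            "progress": item["progress"],
        })
    return rows


def fetch_loans(url):
    """Request the JSON URL found via F12 and parse it."""
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return parse_loans(resp.text)


if __name__ == "__main__":
    # Parse the embedded sample so the sketch runs without network access.
    for row in parse_loans(SAMPLE_JSON):
        print(row)
```

Since no HTML parsing is involved, BeautifulSoup is not needed here; the work is mostly a matter of locating the right keys in the decoded dictionary.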
When the program runs, it prints the loan records, which confirms that the data has been captured successfully.
So far, this article has introduced the capture of two types of data: static data and dynamic data. Neither example is difficult. Both are entry-level crawlers, and the page structures are relatively simple. The most important step is analyzing the page and working out how to extract the data. Once you are familiar with that, you can use Scrapy to grab data, which is more convenient and efficient. Of course, if the page being crawled is more complicated, for example if it involves verification codes (CAPTCHAs) or encryption, it needs careful analysis, and there are tutorials on the Internet that cover these cases.