How to use Python to crawl static websites and their internal resources?

This is straightforward with the requests + BeautifulSoup combination. I will give a brief introduction below, and interested readers can try it themselves. The example crawls data from Encyclopedia of Embarrassing Things (Qiushibaike), a static website:

1. First, install the requests module. Enter the command "pip install requests" in a cmd window.

2. Then install the bs4 module, which contains BeautifulSoup. As with requests, enter the installation command "pip install bs4" (on PyPI, bs4 is a thin wrapper that pulls in the beautifulsoup4 package).
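To confirm that both installs worked, a quick check along the following lines can help. This is only a sketch: the helper name `is_installed` is made up for illustration, and it only reports whether each module can be found on the import path.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The two modules installed in steps 1 and 2.
    for name in ("requests", "bs4"):
        if is_installed(name):
            print(f"{name}: installed")
        else:
            print(f"{name}: missing, run: pip install {name}")
```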

3. Finally, use the requests + BeautifulSoup combination to crawl Encyclopedia of Embarrassing Things: requests fetches the page, and BeautifulSoup parses it to extract the data. The main steps are as follows.
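The division of labor between the two libraries can be sketched roughly as below. The URL is a placeholder, the function names `fetch_html` and `make_soup` are hypothetical, and the browser-like User-Agent header is an assumption (some sites refuse the default client string), not something the target site is known to require.

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder; substitute the page you actually want to crawl.
START_URL = "https://example.com/"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its HTML as text."""
    import requests  # imported here so the parsing helper works without it

    # Assumption: a browser-like User-Agent, since some sites reject the default.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return response.text


def make_soup(html: str) -> BeautifulSoup:
    """Parse HTML using the parser that ships with Python."""
    return BeautifulSoup(html, "html.parser")


if __name__ == "__main__":
    soup = make_soup(fetch_html(START_URL))
    print(soup.title.get_text(strip=True) if soup.title else "(no <title>)")
```

Keeping the download and the parsing in separate functions makes it possible to try the parsing step on saved HTML without touching the network.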

Assume the data to crawl contains the following fields: user nickname, content, number of "funny" votes, and number of comments. Open the source of the corresponding web page and you can see the field information directly: each field is nested inside its own tag, so the data can be extracted by parsing those tags.

Based on the page content above, the test code is very simple: find the corresponding tag for each field and extract its text content.
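The original test code is not reproduced here, so the following is a minimal sketch of the approach rather than the author's script. The sample markup, the class names (`article`, `author`, `content`, `stats-vote`, `stats-comments`, `number`), and the helper names are all assumptions for illustration; the real selectors have to be read from the page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of posts. The class names are
# assumptions; check the real page source before reusing the selectors.
SAMPLE_HTML = """
<div class="article">
  <div class="author"><h2>Alice</h2></div>
  <div class="content"><span>First joke text.</span></div>
  <div class="stats">
    <span class="stats-vote"><i class="number">120</i> funny</span>
    <span class="stats-comments"><i class="number">8</i> comments</span>
  </div>
</div>
<div class="article">
  <div class="author"><h2>Bob</h2></div>
  <div class="content"><span>Second joke text.</span></div>
  <div class="stats">
    <span class="stats-vote"><i class="number">45</i> funny</span>
    <span class="stats-comments"><i class="number">0</i> comments</span>
  </div>
</div>
"""


def text_of(parent, selector: str) -> str:
    """Return the stripped text of the first match, or "" if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_posts(html: str) -> list:
    """Extract nickname, content, funny count and comment count per post."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for item in soup.select("div.article"):
        posts.append({
            "nickname": text_of(item, "div.author h2"),
            "content": text_of(item, "div.content span"),
            # A missing counter is treated as 0 rather than raising.
            "funny": int(text_of(item, "span.stats-vote i.number") or 0),
            "comments": int(text_of(item, "span.stats-comments i.number") or 0),
        })
    return posts


if __name__ == "__main__":
    for post in parse_posts(SAMPLE_HTML):
        print(post)
```

In a real run, the HTML string would come from the requests response text instead of `SAMPLE_HTML`.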

Running the program shows that the website data has been captured successfully.

At this point, we have finished crawling a static website with Python. Overall, the process is very simple, and it is the most basic kind of crawler. As long as you have some Python foundation and work through the example above, you can master it quickly. Of course, you can also use urllib with regular-expression matching, among other approaches. There are detailed tutorials and materials online; if you are interested, you can search for them. I hope the content shared above is helpful to you, and you are welcome to comment and leave messages with additions.
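For the urllib and regular-expression route mentioned above, a standard-library-only sketch might look like this. The pattern assumes nicknames sit inside plain `<h2>` tags, which is a guess for illustration; the names `NICKNAME_RE`, `fetch_html`, and `extract_nicknames` are hypothetical.

```python
import re
import urllib.request

# Assumed pattern: nicknames wrapped in bare <h2> tags. Adjust to the real markup.
NICKNAME_RE = re.compile(r"<h2>\s*(.*?)\s*</h2>", re.S)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib and decode it using the declared charset."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_nicknames(html: str) -> list:
    """Return the text captured inside every <h2>...</h2> pair."""
    return NICKNAME_RE.findall(html)
```

Regular expressions are brittle against markup changes, so this route tends to suit small, stable pages; for nested structures a parser such as BeautifulSoup is usually easier to maintain.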