What are the Python crawler frameworks? (Zhihu)

1. Scrapy: an application framework written to crawl websites and extract structured data. It can be used in a range of programs for data mining, information processing, or archiving historical data, and many kinds of data can be captured easily with it.
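
As a minimal sketch, a Scrapy spider looks like the following; the target site quotes.toscrape.com is the public sandbox used in Scrapy's own tutorial, and the CSS selectors are specific to that page.

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # quotes.toscrape.com is a public demo site from Scrapy's tutorial
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Yield one structured item per quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```

Saved as quotes_spider.py, it can be run with `scrapy runspider quotes_spider.py -o quotes.json`, which writes the extracted items to a JSON file.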

2. Pyspider: a powerful web crawler system implemented in Python. You can write scripts, schedule tasks, and watch crawl results in real time in a browser-based interface; the back end supports common databases for storing crawl results, and tasks can run periodically with configurable priorities.
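
A handler written in Pyspider's web IDE typically follows the pattern below, modeled on the project's standard demo script; example.com is a placeholder start URL.

```python
from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    crawl_config = {}

    @every(minutes=24 * 60)            # re-run the entry point once a day
    def on_start(self):
        self.crawl("http://example.com/", callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)     # treat fetched pages as fresh for 10 days
    def index_page(self, response):
        # Follow every absolute link found on the index page
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is stored as the crawl result
        return {"url": response.url, "title": response.doc("title").text()}
```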

3. Crawley: it crawls website content at high speed, supports relational and non-relational databases, and can export the data as JSON, XML, and other formats.

4. Portia: an open-source visual crawler tool that lets you crawl websites without any programming knowledge. Simply annotate the pages you are interested in, and Portia creates a spider to extract data from similar pages.

5. Newspaper: used to extract news articles and perform content analysis; it uses multithreading and supports more than 10 languages.
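
A sketch of the typical workflow with the newspaper library is shown below; the URL is a placeholder, and the nlp() step additionally requires NLTK corpora to be downloaded.

```python
from newspaper import Article

# Placeholder URL; substitute a real news story
url = "https://example.com/some-news-story"

article = Article(url)
article.download()   # fetch the raw HTML
article.parse()      # extract title, authors, body text, images, ...

print(article.title)
print(article.authors)
print(article.text)

article.nlp()        # keyword and summary extraction (needs NLTK data)
print(article.keywords)
print(article.summary)
```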

6. Beautiful Soup: a Python library that extracts data from HTML or XML files and, working through your favorite parser, provides idiomatic ways to navigate, search, and modify the document tree; it can save you hours or even days of work.
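
A small self-contained sketch of those three operations (navigation, search, modification) with the bs4 package and an inline HTML snippet:

```python
from bs4 import BeautifulSoup

html = """<html><body>
<h1>Demo page</h1>
<p class="intro">First paragraph.</p>
<a href="/next">Next page</a>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")     # built-in parser; lxml also works

print(soup.h1.get_text())                     # navigation: first <h1> in the tree
print(soup.find("p", class_="intro").text)    # search: by tag name and class
soup.a["href"] = "/changed"                   # modification: rewrite a link
print(soup.prettify())
```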

7. Grab: a Python framework for building web scrapers. With Grab you can build scraping tools of varying complexity, from simple five-line scripts to complex asynchronous crawlers that handle tens of thousands of pages. Grab provides an API for performing network requests and processing the received content.
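
A minimal request-and-parse sketch with Grab might look like the following, assuming the classic synchronous Grab API; example.com is a placeholder.

```python
from grab import Grab

g = Grab()
resp = g.go("https://example.com/")       # perform the network request
print(resp.code)                          # HTTP status of the response

# Process the received content with an XPath selector
print(g.doc.select("//title").text())
```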

8. Cola: a distributed crawler framework. Users only need to write a few specific functions and do not have to worry about the details of distributed execution; tasks are automatically distributed across multiple machines, and the whole process is transparent to the user.