How to get started with Python crawlers

I have been working with crawlers for a month now. From a complete Python beginner to cracking various anti-crawler mechanisms, here is the direction I took:

1. Learn the functions for fetching and parsing web pages, for example:

import urllib.request

if __name__ == '__main__':
    url = "..."
    data = urllib.request.urlopen(url).read()  # urllib.request.urlopen(URL to fetch) returns a response; read() gives its bytes
    data = data.decode('unicode_escape', 'ignore')  # decode in unicode_escape mode, ignoring undecodable characters
    print(data)
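The anti-crawler mechanisms mentioned above start mattering right away: many sites reject urllib's default User-Agent. A minimal sketch of sending a browser-like header instead; the URL here is a placeholder standing in for the real target:

import urllib.request

if __name__ == '__main__':
    url = "https://example.com"  # placeholder; substitute the page you actually want
    # Many sites block urllib's default User-Agent, so present a browser-like one
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    data = urllib.request.urlopen(request).read()
    print(data.decode('utf-8', 'ignore'))  # most pages are utf-8; unicode_escape suits escaped responses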

2. Learn regular expressions:

A regular expression filters the information you want out of the data fetched above, for example:

import re

def get_all(data):
    reg = r'(search.+)(" )(mars_sead=".+title=")(.+)(" data-id=")'
    pattern = re.compile(reg)  # compile the expression once
    alllist = re.findall(pattern, data)  # each match is a tuple of the five capture groups
    return alllist
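To see what those capture groups pull out, here is a self-contained run of get_all against a made-up line in the same shape as the markup the expression targets (the attribute values are invented, not real site data):

import re

def get_all(data):
    reg = r'(search.+)(" )(mars_sead=".+title=")(.+)(" data-id=")'
    return re.findall(re.compile(reg), data)

if __name__ == '__main__':
    sample = 'search?q=funny" mars_sead="abc" title="A funny joke" data-id="123"'
    for match in get_all(sample):
        print(match[3])  # the fourth group holds the title text: A funny joke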

3. Append the results to a list:

if __name__ == '__main__':
    info = []
    info.append(get_all(data))  # each append adds one page's whole list of matches
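Note that append nests each page's entire match list as a single element of info; if you crawl several pages and want one flat list of matches instead, extend is the alternative. A small sketch with made-up match lists:

if __name__ == '__main__':
    page1 = [('search?q=a', '" ', 'mars_sead="1" title="', 'first joke', '" data-id="')]
    page2 = [('search?q=b', '" ', 'mars_sead="2" title="', 'second joke', '" data-id="')]
    nested = []
    nested.append(page1)  # nested[0] is page1's whole match list
    nested.append(page2)
    flat = []
    flat.extend(page1)    # flat holds the match tuples themselves
    flat.extend(page2)
    print(nested[0][0][3], flat[0][3])  # same title, reached at different depths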

4. Write the list into an Excel file:

import xlsxwriter

if __name__ == '__main__':
    info = []
    info.append(get_all(data))
    workbook = xlsxwriter.Workbook('C:\\Users\\Administrator\\Desktop\\filename.xlsx')  # create an Excel file
    worksheet = workbook.add_worksheet()  # create a worksheet object
    for i in range(0, len(info)):
        worksheet.write(i, 0, str(info[i]))  # write info[i] row by row (row i, column 0)
    workbook.close()  # close the Excel file
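worksheet.write expects a single cell value, which is why the loop above stringifies each entry. Since get_all returns tuples, write_row can spread each match's groups across columns instead; a sketch with made-up rows and a hypothetical output filename:

import xlsxwriter

if __name__ == '__main__':
    rows = [
        ('search?q=a', '" ', 'mars_sead="1" title="', 'first joke', '" data-id="'),
        ('search?q=b', '" ', 'mars_sead="2" title="', 'second joke', '" data-id="'),
    ]
    workbook = xlsxwriter.Workbook('jokes.xlsx')  # hypothetical filename
    worksheet = workbook.add_worksheet()
    for i, row in enumerate(rows):
        worksheet.write_row(i, 0, row)  # one match per row, one capture group per column
    workbook.close()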

That is a simple crawler done. I have not covered advanced crawlers here: until you have tried the basics yourself, they are hard to make sense of.