Web crawler in Python - putting the scraped data in an Excel file
I lose an hour every day classifying information from websites, so I'm trying to build a Python spider that scrapes the data from a website and classifies it automatically into an Excel file.
I have built the scraping part, but I don't know how to append the data to an Excel file with the code I'm using.
Here is the code:
    import requests
    from bs4 import BeautifulSoup
    import xlsxwriter

    def spider_list(max_pages):
        page = 2
        while page < max_pages:
            url = 'http://yellow.local.ch/fr/q/morges/bar.html?page=' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, 'html.parser')
            for link in soup.find_all('a', {'class': 'details-entry-title-link'}):
                href = link.get('href')
                spider_data(href)
            page += 1

    def spider_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for items in soup.find_all('h1'):
            print("\n" + items.string)
        for num in soup.find_all('a', {'class': 'number'}):
            print(num.string)
        for mail in soup.find_all('a', {'class': 'redirect'}):
            print(mail.string)

    spider_list(3)
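Since `xlsxwriter` is already imported above, one way to get each scraped entry onto its own spreadsheet row is `worksheet.write_row`. A minimal sketch, with illustrative data standing in for the real values (in the actual spider, each inner list would be the `[title, number, mail]` fields collected by `spider_data`):

```python
import xlsxwriter

# Illustrative data only: these are not real scraped values.
entries = [
    ['Bar A', '021 000 00 00', 'mail-a@example.com'],
    ['Bar B', '021 000 00 01', 'mail-b@example.com'],
]

workbook = xlsxwriter.Workbook('mydatas.xlsx')
worksheet = workbook.add_worksheet()

# write_row writes one entry per line, fields side by side (horizontally)
for row_index, entry in enumerate(entries):
    worksheet.write_row(row_index, 0, entry)

workbook.close()
```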
Each group of information should display horizontally; here is an example:
How should I do that?

----------- EDIT -----------
Okay, I created the last part of the code, but it doesn't work. Why?
    import csv

    def spider_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        datas = []
        for items in soup.find_all('h1'):
            datas.append(items.string)
        for num in soup.find_all('a', {'class': 'number'}):
            datas.append(num.string)
        for mail in soup.find_all('a', {'class': 'redirect'}):
            datas.append(mail.string)
        csv_create(datas)

    def csv_create(data):
        myfile = open('mydatas.csv', 'wb')
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(data)
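Two things likely break this: on Python 3, `csv.writer` expects a text-mode file, so writing to a file opened with `'wb'` raises a `TypeError`; and because `csv_create` reopens the file in write mode on every call, each entry overwrites the previous one. A sketch of a fixed `csv_create`, assuming it is called once per scraped entry:

```python
import csv

def csv_create(data):
    # Append mode ('a') so each scraped entry adds a new row instead of
    # overwriting the file; newline='' is what the csv module expects
    # for text-mode files on Python 3.
    with open('mydatas.csv', 'a', newline='', encoding='utf-8') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(data)

csv_create(['title1', 'number1', 'website1'])
csv_create(['title2', 'number2', 'website2'])
```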
Excel can read .csv files. If you have lines of text like "title1, number1, website1\n", you'll get an Excel-readable file that looks like that. Either use Python's built-in csv module or build a pandas DataFrame and use to_csv (which saves you having to worry about writing commas and newline characters). Hope this helps.
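The pandas route, as a minimal sketch (the column names and values here are invented for illustration; in the spider they would come from the scraped fields):

```python
import pandas as pd

# One dict per scraped entry; keys become the CSV header.
entries = [
    {'title': 'Bar A', 'number': '021 000 00 00', 'website': 'http://example.com/a'},
    {'title': 'Bar B', 'number': '021 000 00 01', 'website': 'http://example.com/b'},
]

df = pd.DataFrame(entries)
# to_csv handles quoting, commas and newlines; Excel opens the result directly.
df.to_csv('mydatas.csv', index=False)
```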