Web crawler in Python - putting the scraped data in an Excel file
I lose an hour every day classifying information from websites, so I'm trying to build a Python spider that scrapes the data from a website and classifies it automatically into an Excel file.
I have built the scraping part, but I don't know how to append the data to an Excel file with the code I'm using.
Here is the code:
    import requests
    from bs4 import BeautifulSoup
    import xlsxwriter

    def spider_list(max_pages):
        page = 2
        while page < max_pages:
            url = 'http://yellow.local.ch/fr/q/morges/bar.html?page=' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, 'html.parser')
            for link in soup.find_all('a', {'class': 'details-entry-title-link'}):
                href = link.get('href')
                spider_data(href)
            page += 1

    def spider_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for items in soup.find_all('h1'):
            print("\n" + items.string)
        for num in soup.find_all('a', {'class': 'number'}):
            print(num.string)
        for mail in soup.find_all('a', {'class': 'redirect'}):
            print(mail.string)

    spider_list(3)
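Since `xlsxwriter` is already imported above, one way to get each scraped entry onto its own spreadsheet row is `worksheet.write_row`. A minimal sketch, with illustrative data standing in for the real values (in the actual spider, each inner list would be the `[title, number, mail]` fields collected by `spider_data`):

```python
import xlsxwriter

# Illustrative data only: these are not real scraped values.
entries = [
    ['Bar A', '021 000 00 00', 'mail-a@example.com'],
    ['Bar B', '021 000 00 01', 'mail-b@example.com'],
]

workbook = xlsxwriter.Workbook('mydatas.xlsx')
worksheet = workbook.add_worksheet()

# write_row writes one entry per line, fields side by side (horizontally)
for row_index, entry in enumerate(entries):
    worksheet.write_row(row_index, 0, entry)

workbook.close()
```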
Each group of information should display horizontally; here is an example:
How should I do that?

----------- EDIT -----------
Okay, I created the last part of the code, but it doesn't work. Why?
    import csv

    def spider_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        datas = []
        for items in soup.find_all('h1'):
            datas.append(items.string)
        for num in soup.find_all('a', {'class': 'number'}):
            datas.append(num.string)
        for mail in soup.find_all('a', {'class': 'redirect'}):
            datas.append(mail.string)
        csv_create(datas)

    def csv_create(data):
        myfile = open('mydatas.csv', 'wb')
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(data)
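Two things likely break this: on Python 3, `csv.writer` expects a text-mode file, so writing to a file opened with `'wb'` raises a `TypeError`; and because `csv_create` reopens the file in write mode on every call, each entry overwrites the previous one. A sketch of a fixed `csv_create`, assuming it is called once per scraped entry:

```python
import csv

def csv_create(data):
    # Append mode ('a') so each scraped entry adds a new row instead of
    # overwriting the file; newline='' is what the csv module expects
    # for text-mode files on Python 3.
    with open('mydatas.csv', 'a', newline='', encoding='utf-8') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(data)

csv_create(['title1', 'number1', 'website1'])
csv_create(['title2', 'number2', 'website2'])
```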
Excel can read .csv files. If you have lines of text like "title1, number1, website1\n", you'll get an Excel-readable file that looks like that. Either use Python's built-in csv module or build a pandas DataFrame and use to_csv (which saves you having to worry about writing commas and newline characters). Hope this helps.
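The pandas route, as a minimal sketch (the column names and values here are invented for illustration; in the spider they would come from the scraped fields):

```python
import pandas as pd

# One dict per scraped entry; keys become the CSV header.
entries = [
    {'title': 'Bar A', 'number': '021 000 00 00', 'website': 'http://example.com/a'},
    {'title': 'Bar B', 'number': '021 000 00 01', 'website': 'http://example.com/b'},
]

df = pd.DataFrame(entries)
# to_csv handles quoting, commas and newlines; Excel opens the result directly.
df.to_csv('mydatas.csv', index=False)
```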