node.js - Did Chrome automatically change the DOM, or is it different from what cheerio gets?
So I'm writing a web scraping application using cheerio.js. Things were going well until I noticed that in cheerio, $('tbody tr') returns nothing, while when I open the same website in Chrome, jQuery's $('tbody tr') returns the rows in the table body. In the body that cheerio gets, there is no tbody; the structure is <table><thead></thead><tr></tr><tr></tr></table>. Did Chrome make the change, or did cheerio parse the HTML response incorrectly?
The following 3 HTML code snippets are the same when rendered as HTML in a browser, yet the original code is different.

No thead and no tbody in the source code:
<table><tr><td>row1</td></tr><tr><td>row2</td></tr></table>

No tbody in the source code:
<table><thead></thead><tr><td>row1</td></tr><tr><td>row2</td></tr></table>

tbody, but no thead, in the source code:
<table><tbody><tr><td>row1</td></tr><tr><td>row2</td></tr></tbody></table>
According to w3schools.com, browsers can use the thead, tbody and tfoot elements to enable scrolling of the table body independently of the header and footer.
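Browsers can optimize, normalize or modify the DOM before using it for display, as long as the DOM still renders as intended. In particular, when a tr appears directly inside a table, the browser's HTML parser inserts an implied tbody around it, which is why jQuery in Chrome finds rows for $('tbody tr') even though the source has no tbody.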
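To make that normalization concrete, here is a minimal, simplified sketch of the one rule that matters here: a run of tr elements sitting directly inside a table gets wrapped in an implied tbody. The { tag, children } node shape and the helpers el, insertImpliedTbody and toHtml are hypothetical illustrations, and the real HTML parsing algorithm handles many more cases than this.

```javascript
// Hypothetical element model: { tag, children }; text nodes are plain strings.
// This is NOT the browser's or cheerio's real node type.
function el(tag, ...children) {
  return { tag, children };
}

// Simplified version of the implied-<tbody> rule: wrap each run of <tr>
// elements that are direct children of <table> in a synthetic <tbody>.
// <thead>, <tbody> and <tfoot> children are passed through unchanged.
function insertImpliedTbody(table) {
  const out = [];
  let run = null; // the <tbody> currently collecting consecutive <tr>s
  for (const child of table.children) {
    if (child.tag === 'tr') {
      if (run === null) {
        run = el('tbody');
        out.push(run);
      }
      run.children.push(child);
    } else {
      run = null; // any other child ends the current run of rows
      out.push(child);
    }
  }
  return { tag: table.tag, children: out };
}

// Serialize the tree back to markup so the result is easy to compare.
function toHtml(node) {
  if (typeof node === 'string') return node;
  return `<${node.tag}>${node.children.map(toHtml).join('')}</${node.tag}>`;
}

// The second snippet above: a thead but no tbody in the source.
const noTbody = el('table', el('thead'),
  el('tr', el('td', 'row1')), el('tr', el('td', 'row2')));
console.log(toHtml(insertImpliedTbody(noTbody)));
// <table><thead></thead><tbody><tr><td>row1</td></tr><tr><td>row2</td></tr></tbody></table>
```

Running the first and third snippets through the same function gives identical trees, which is the sense in which they are "the same" once the browser has normalized them.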
In your case, the cheerio parser reads the source code (the result of the node.js request) as-is and creates an in-memory DOM representation that you can traverse/modify later. jQuery, on the other hand, when run in the browser, reads the normalized and optimized DOM that the browser has already parsed and processed from the HTML.
While the two DOMs may be different, they are the same when presented to the user. It's not a bug, it's a feature.
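Practically, the robust fix is a selector that does not name tbody at all, such as $('table tr'), which matches the rows whether or not a tbody is present, in cheerio and in browser jQuery alike. Whether cheerio's own tree contains the implied tbody may also depend on the cheerio version and the parser it uses (newer releases parse with parse5, which follows the browser's HTML parsing rules), so not relying on it is the safer choice. The sketch below uses a hypothetical { tag, children } model of the raw, un-normalized source and a simplified two-part descendant matcher (select is a made-up helper, not cheerio's API) to show why 'tbody tr' finds nothing on that tree while 'table tr' finds both rows.

```javascript
// Hypothetical element model: { tag, children }; text nodes are plain strings.
function el(tag, ...children) {
  return { tag, children };
}

// Collect every descendant element of `node`, depth-first (excluding `node`).
function descendants(node) {
  const found = [];
  for (const child of node.children) {
    if (typeof child === 'string') continue; // skip text nodes
    found.push(child, ...descendants(child));
  }
  return found;
}

// Simplified descendant combinator for a selector like "outer inner":
// every `inner` element nested anywhere inside an `outer` element
// (including `root` itself as a candidate `outer`), without duplicates.
function select(root, outer, inner) {
  const matches = new Set();
  const scope = [root, ...descendants(root)];
  for (const ancestor of scope.filter((n) => n.tag === outer)) {
    for (const d of descendants(ancestor)) {
      if (d.tag === inner) matches.add(d);
    }
  }
  return [...matches];
}

// The raw markup cheerio received: no tbody around the rows.
const raw = el('table', el('thead'),
  el('tr', el('td', 'row1')), el('tr', el('td', 'row2')));
console.log(select(raw, 'tbody', 'tr').length); // 0, like $('tbody tr')
console.log(select(raw, 'table', 'tr').length); // 2, like $('table tr')
```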