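While crawling Facebook today I ran into something odd. There is a link to a site outside Facebook that the browser shows in the href of an <a> tag, but when I fetch the page with a crawler tool (one provided by Microsoft), the tool shows the following link instead: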
http://l.facebook.com/l.php?u=http%3A%2F%2Fwww.15wing.af.mil%2FUnits%2F735thAirMobilitySquadron.aspx&h=ATNg9KAgWaURccDb_FrA2uwozGwj0h3u_LIfRLjEawpTgETIW5_CIKrTaRzu5hDdvzBEIvz352BsKMeKvK9TizrS09bTfmWuPZFxTpDNTfwKELjX3hs3p4TdFWA&s=1
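This looks like a two-step redirect. How is it implemented? My guess is that it is a protection Facebook uses against crawlers, yet I was still able to scrape the link data. The complete tags look like this: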
Browser: <a href="http://www.elephantjournal.com/" target="_blank" rel="nofollow" onmouseover="LinkshimAsyncLink.swap(this, "http:\/\/www.elephantjournal.com\/");" onclick="LinkshimAsyncLink.referrer_log(this, "http:\/\/www.elephantjournal.com\/", "\/si\/ajax\/l\/render_linkshim_log\/?u=http\u00253A\u00252F\u00252Fwww.elephantjournal.com\u00252F&h=ATP5Caih-YKbb5V_iuyP2oFeV1FXrh3P3KmTSjf-b9xeGTfgtIAzUpfOZ7CfRRRYfiULH6pIVvWIt66KhCWD7rhOpVfZC-ThhOaMU7CR_AEvo7BzANvpaXhKQT3f&render_verification=0&enc&d");">www.elephantjournal.com/</a>
Crawler tool: <a href="http://l.facebook.com/l.php?u=http%3A%2F%2Fwww.15wing.af.mil%2FUnits%2F735thAirMobilitySquadron.aspx&h=ATNg9KAgWaURccDb_FrA2uwozGwj0h3u_LIfRLjEawpTgETIW5_CIKrTaRzu5hDdvzBEIvz352BsKMeKvK9TizrS09bTfmWuPZFxTpDNTfwKELjX3hs3p4TdFWA&s=1" target="_blank" rel="nofollow" onmouseover="LinkshimAsyncLink.swap(this, "http:\/\/www.elephantjournal.com\/");" onclick="LinkshimAsyncLink.referrer_log(this, "http:\/\/www.elephantjournal.com\/", "\/si\/ajax\/l\/render_linkshim_log\/?u=http\u00253A\u00252F\u00252Fwww.elephantjournal.com\u00252F&h=ATP5Caih-YKbb5V_iuyP2oFeV1FXrh3P3KmTSjf-b9xeGTfgtIAzUpfOZ7CfRRRYfiULH6pIVvWIt66KhCWD7rhOpVfZC-ThhOaMU7CR_AEvo7BzANvpaXhKQT3f&render_verification=0&enc&d");">www.elephantjournal.com/</a>
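So here are my questions. What does the onmouseover="LinkshimAsyncLink.swap(this, "http:\/\/www.elephantjournal.com\/");" event handler do? I have never come across anything like it while writing JS, and Facebook's own React framework has no such syntax either. And why is the href shown on the page different from the one I crawled?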
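For what it's worth, the original destination seems to be recoverable from the crawled href, since it appears to sit percent-encoded in the `u` query parameter of the l.php link. Below is a minimal Python sketch under that assumption. The function name `unwrap_linkshim` is my own and is not anything from Facebook, and I have not checked what the `h` and `s` parameters are for, so the code simply ignores them.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_linkshim(href: str) -> str:
    """Return the URL carried in the `u` parameter of an l.facebook.com/l.php
    link, or the href unchanged if it does not look like such a link.

    Assumption: `u` holds the percent-encoded destination, as in the
    crawled link quoted above.
    """
    parts = urlsplit(href)
    if parts.hostname != "l.facebook.com" or parts.path != "/l.php":
        return href
    # parse_qs percent-decodes the values for us.
    targets = parse_qs(parts.query).get("u")
    return targets[0] if targets else href


# The href exactly as the crawler tool reported it.
crawled = (
    "http://l.facebook.com/l.php?u=http%3A%2F%2Fwww.15wing.af.mil%2FUnits"
    "%2F735thAirMobilitySquadron.aspx&h=ATNg9KAgWaURccDb_FrA2uwozGwj0h3u_"
    "LIfRLjEawpTgETIW5_CIKrTaRzu5hDdvzBEIvz352BsKMeKvK9TizrS09bTfmWuPZFxT"
    "pDNTfwKELjX3hs3p4TdFWA&s=1"
)

if __name__ == "__main__":
    print(unwrap_linkshim(crawled))
```

A plain href such as http://www.elephantjournal.com/ passes through unchanged, so the same function can be applied to every link on the page regardless of which form the crawler received.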