1
mutoulbj 2015-06-17 16:57:55 +08:00
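As long as the data can be fetched at all, it doesn't matter that it sits inside JavaScript: just process the text yourself and extract the data from it.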
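A minimal sketch of "process the text and extract it", assuming the page's script assigns a JSON-compatible array to a variable (double-quoted strings, no trailing commas, terminated by `];`). The variable name `companies` and the sample script are hypothetical; a real page needs a pattern that fits its own JS.

```python
import json
import re


def extract_js_array(script_text, var_name):
    """Return the array assigned as `var <var_name> = [...];`, or None.

    Assumption: the array literal is valid JSON and ends with `];`.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);", re.DOTALL
    )
    match = pattern.search(script_text)
    if match is None:
        return None
    # The captured text is the array literal itself, parsed as JSON
    return json.loads(match.group(1))


# Hypothetical inline script, as it might appear inside a <script> tag
sample = ('var companies = [{"name": "ACME", "country": "CN"}, '
          '{"name": "Foo", "country": "IT"}];')
data = extract_js_array(sample, "companies")
print([c["name"] for c in data if c["country"] == "CN"])  # ['ACME']
```

The lazy `.*?` stops at the first `];`, so nested arrays still work, but a string value containing `];` would cut the match short.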
2
mhycy 2015-06-17 17:01:42 +08:00
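Analyze the JS logic. The simplest approach is to grab the data directly with a regular expression and then rebuild the index.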
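One possible reading of "grab with a regex, then rebuild the index": capture each constructor call's argument list, parse it, and build a lookup keyed by one field. The `new e (...)` call shape follows the script posted later in this thread; the sample data, the one-call-per-line assumption, and using position 3 (country code) as the key are assumptions for illustration.

```python
import ast
import re
from collections import defaultdict

# Greedy match up to the last ")" on a line; assumes one `new e (...)` per line
CALL_RE = re.compile(r"new e \((.*)\)")


def build_index(script_text, key_pos):
    """Group parsed records by the field at position `key_pos`."""
    index = defaultdict(list)
    for args in CALL_RE.findall(script_text):
        # Treat the JS argument list as a Python list literal; this only
        # works when every argument is a plain string or number literal
        record = ast.literal_eval("[" + args + "]")
        index[record[key_pos]].append(record)
    return dict(index)


# Hypothetical script text with one record per line
sample = """
var r0 = new e ('ACME Shoes', 'Via Roma 1', 'Riva', 'CN');
var r1 = new e ('Bella Pelle', 'Via Garda 2', 'Riva', 'IT');
var r2 = new e ('Dragon Footwear', 'Road 3', 'Wenzhou', 'CN');
"""

index = build_index(sample, 3)
print(sorted(index))  # ['CN', 'IT']
print([r[0] for r in index["CN"]])  # ['ACME Shoes', 'Dragon Footwear']
```

`ast.literal_eval` is used instead of `eval` so that the scraped text can never execute arbitrary code.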
3
hiboshi 2015-06-17 17:15:56 +08:00
If it's in the JS, it's even easier: just run a regex match directly against the JS file.
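A minimal sketch of matching the JS text directly, here pulling out email addresses without parsing the record structure at all. It assumes the addresses appear as plain text in the script; the pattern is deliberately simple and does not cover every RFC 5322 address. For a real file you would first download its text, e.g. with `urllib.request.urlopen(js_url).read().decode("utf-8")`, where `js_url` is hypothetical.

```python
import re

# Deliberately simple email pattern; not a full RFC 5322 validator
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_js(js_source):
    """Return distinct email addresses in the order they first appear."""
    # dict.fromkeys drops duplicates while keeping insertion order (3.7+)
    return list(dict.fromkeys(EMAIL_RE.findall(js_source)))


# Hypothetical JS source text
js = ("new e ('ACME', 'x', 'info@acme.cn'); "
      "new e ('Foo', 'y', 'sales@foo.it'); // info@acme.cn")
print(emails_in_js(js))  # ['info@acme.cn', 'sales@foo.it']
```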
4
fangjinmin 2015-06-17 17:54:38 +08:00
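import ast
import csv
import re
import urllib.request

from bs4 import BeautifulSoup

url = "http://www.exporivaschuh.it/catalogue/15ES2/search.html"
soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
# The exhibitor records are built in the page's first <script> tag
script = soup.find_all("script")[0].string

# Each record is created as `new e (...)`; grab every such call
p1 = re.compile(r"new e \(.*\)")
arr_es = p1.findall(script)

with open("companysofchina.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)  # quotes fields that contain commas
    for e in arr_es:
        # Strip the call syntax, leaving only the comma-separated arguments
        # (note: this also drops any ")" inside the values)
        e = e.replace("new e (", "").replace(")", "")
        # Parse the arguments as a list literal (safer than eval)
        arr_items = ast.literal_eval("[" + e + "]")
        if arr_items[3] == "CN":  # field 3 holds the country code
            company = arr_items[0]
            tel = arr_items[9]
            email = arr_items[10]
            writer.writerow([company, tel, email])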
5
redhatping OP @fangjinmin That's it, got it working... Thanks a lot!
6
aeshfawre 2016-07-09 06:39:47 +08:00
@redhatping Email sent.