I call json.loads(response.text) directly,
and it fails with this error:
Traceback (most recent call last):
  File "/home/shenjianlin/.local/lib/python3.4/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/shenjianlin/my_project/Espider/Espider/spiders/xxgkmiit.py", line 31, in parse
    _origin=json.loads(response.text)
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
1
Trim21 2019-01-09 12:37:40 +08:00
|
2
Sylv 2019-01-09 12:38:12 +08:00 via iPhone
Strip the surrounding jQuery111102456514014162614_1546997791362(); wrapper; only what is inside the parentheses is valid JSON.
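A minimal sketch of that stripping step, assuming the body has the shape callback(...); with nothing else around it. The helper name loads_jsonp and the sample payload are made up for illustration; they are not taken from the real response.

```python
import json
import re

# Matches "<callback name>( ... )" with an optional trailing semicolon
# and whitespace. DOTALL lets the payload span several lines.
_JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def loads_jsonp(text):
    """Parse a JSONP body; fall back to plain JSON if there is no wrapper."""
    match = _JSONP_RE.match(text)
    if match is None:
        return json.loads(text)
    return json.loads(match.group(1))


# Hypothetical body, only shaped like what the thread describes.
sample = 'jQuery111102456514014162614_1546997791362({"total": 2, "items": ["a", "b"]});\r\n'
data = loads_jsonp(sample)
print(data["total"])  # 2
```

In the spider this would be called as loads_jsonp(response.text) in place of json.loads(response.text).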
|
3
vincentxue 2019-01-09 13:05:39 +08:00 via iPhone
This means the programmers at MIIT added protection against JSON hijacking. The reasoning is explained here: http://www.10tiao.com/html/788/201811/2247489959/1.html
|
4
royzxq 2019-01-09 13:12:14 +08:00
See #1 and #2.
|
7
Ewig OP
@fan2006 Traceback (most recent call last):
  File "/home/shenjianlin/.local/lib/python3.4/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/shenjianlin/my_project/Espider/Espider/spiders/xxgkmiit.py", line 30, in parse
    _origin=json.loads(response.text.split(');\r\n')[0][1:])
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
|
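A likely reason the split(');\r\n')[0][1:] attempt still fails at char 0: [1:] drops only the first character, so the callback name and the opening parenthesis are still in front of the JSON. The sketch below reproduces this on a made-up body (the payload is an assumption; only the callback name comes from the thread) and slices between the first "(" and the last ")" instead.

```python
import json

# Hypothetical body; the real payload is not shown in the thread.
sample = 'jQuery111102456514014162614_1546997791362({"total": 2});\r\n'

# The attempt from the traceback: only the leading "j" is removed.
attempt = sample.split(');\r\n')[0][1:]
print(attempt[:12])  # Query1111024

try:
    json.loads(attempt)
except ValueError as exc:  # JSONDecodeError subclasses ValueError on 3.5+
    print(exc)  # Expecting value: line 1 column 1 (char 0)

# Slice between the first "(" and the last ")" to get the JSON text.
start = sample.index('(') + 1
end = sample.rindex(')')
data = json.loads(sample[start:end])
print(data)  # {'total': 2}
```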
10
est 2019-01-09 15:18:15 +08:00
|
11
hoythan 2019-01-09 15:19:26 +08:00
URL-decode the address and you will see a callback parameter:
http://xxgk.miit.gov.cn/gdnps/searchIndex.jsp?params=%7B%22goPage%22%3A4%2C%22orderBy%22%3A%5B%7B%22orderBy%22%3A%22publishTime%22%2C%22reverse%22%3Atrue%7D%2C%7B%22orderBy%22%3A%22orderTime%22%2C%22reverse%22%3Atrue%7D%5D%2C%22pageSize%22%3A10%2C%22queryParam%22%3A%5B%7B%7D%2C%7B%7D%2C%7B%22shortName%22%3A%22fbjg%22%2C%22value%22%3A%22%2F1%2F29%2F1146295%2F1652858%2F1652930%22%7D%5D%7D&callback=jQuery111102456514014162614_1546997791362&_=1546997791366
The parameter is callback=jQuery111102456514014162614_1546997791362. You can change this value to whatever name you need.
|
12
wd 2019-01-09 15:20:20 +08:00
If you can't even do basic string manipulation, why are you writing code at all...
|
13
hoythan 2019-01-09 15:21:10 +08:00
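The idea is to load the endpoint through a script tag to get around the cross-origin restriction. The JS that comes back is a single function call, and the function name is whatever you passed as callback.
Once the JS has loaded, it runs the callback you defined, and that is how you get the data. Read up on JSONP: https://blog.csdn.net/hansexploration/article/details/80314948
|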
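Seen from the server side, the mechanism can be sketched roughly like this. This is a toy illustration, not the MIIT site's actual code: jsonp_response is a hypothetical helper that wraps a payload in whatever callback name the client sent.

```python
import json


def jsonp_response(callback, payload):
    """Wrap a JSON payload in a call to the client-supplied function name."""
    return "{}({});".format(callback, json.dumps(payload))


# The browser would execute this text as JavaScript, calling cb with the data.
body = jsonp_response("cb", {"total": 2})
print(body)  # cb({"total": 2});
```

A Python client has no function to call, so it just removes the cb( prefix and the ); suffix and parses what is left.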
14
hoythan 2019-01-09 15:24:50 +08:00
|
15
hoythan 2019-01-09 15:25:54 +08:00
|
16
Sylv 2019-01-09 15:32:21 +08:00 via iPhone
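@hoythan I understand this is for JSONP, but the OP only needs to read this JSON in Python. The most direct way is to strip the callback() wrapper and then call json.loads. He clearly doesn't know JSONP and doesn't need to learn it for this.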
|
17
Sylv 2019-01-09 15:35:47 +08:00 via iPhone
@Ewig If you don't know how to strip the surrounding string, I suggest you stop and properly study Python and programming fundamentals first.
|
18
vincentxue 2019-01-10 10:04:35 +08:00 via iPhone
@hoythan Thanks, I learned something. I misread it.
|