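I scraped a <div> from site A with Python and put it on my own page, but the <div> turned into &lt;div, and the page shows the <div> source code instead of rendering it. Google says this is some kind of Unicode escaping. How do I get the scraped <div> to display properly on my page?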
1
ciba1990 OP Newbie asking for help... waiting online.
|
2
wkdhf233 2015-07-10 00:53:28 +08:00
I have no idea what you're talking about.
|
3
imlonghao 2015-07-10 00:55:06 +08:00 via Android
No code no bb...
|
4
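ciba1990 OP @wkdhf233 I scraped a chunk of <div> code from site A and put it on my own page. In my page's source the < and > show up as &lt; and &gt;, and the page doesn't render properly.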
|
5
Septembers 2015-07-10 00:57:12 +08:00 1
|
6
ciba1990 OP @imlonghao
<html> <head> </head> <body> <div class="searchResults" id="searchResults"> <h2>Web results</h2> <ul> <li> <h3><a href="https://www.python.org/" target="_blank">Welcome to Python.org</a></h3> <p class="url">https://www.python.org/<span class="date"> - 7 hours ago</span></p> <p>The official home of the Python Programming Language.</p> </li><li class="sameHostResult"> <h3><a href="https://www.python.org/downloads/" target="_blank">Download Python | Python.org</a></h3> <p class="url">https://www.python.org/downloads/</p> <p>... 2015-05-23 Download Release Notes <br> · Python 3.4.3 2015-02-25 Download ...</br></p> </li><li> <h3><a href="http://www.pyhton.org/" target="_blank">Wrong Page ?</a></h3> <p class="url">http://www.pyhton.org/</p> <p>If you were trying to reach Phyton website please copy and past the following <br> URL in your browser: http://www.phyton.org. YOU MAY HAVE GOTTEN HERE BY<br> ...</br></br></p> </li><li> <h3><a href="http://www.salome-platform.org/forum/forum_10/211874468" target="_blank">Creating geometry using <b>pyhton</b> code — SALOME Platform</a></h3> <p class="url">http://www.salome-platform.org/forum/forum_10/211874468</p> <p>Hello everyone!,. I'm almost new in salome; I build up a simple geometry (n <br> nodes and n-1 beams) using the salome gui. It took me a long time; then I <br> discovered ...</br></br></p> </li><li> <h3><a href="http://developers.gigya.com/display/GD/Pyhton+SDK+Change+Log" target="_blank"><b>Pyhton</b> SDK Change Log - Gigya Documentation - Developers Guide</a></h3> <p class="url">http://developers.gigya.com/display/GD/Pyhton+SDK+Change+Log</p> <p>Jun 10, 2015 <b>...</b> Version 2.17 - 26 Apr 2015. Bug fix regarding URL encoding. 
The Python SDK <br> now restores urllib handlers after completing requests to Gigya.</br></p> </li><li> <h3><a href="" target="_blank"><b>Pyhton</b> - You A Me LifeIine Full Promo Dancehall 2015 - YouTube</a></h3> <p class="url"></p> <p>Feb 16, 2015 <b>...</b> <b>Pyhton</b> - You A Me LifeIine ○Full Promo○ Dancehall 2015. IamDjChigga ... Up <br> Hot DJ Chigga <b>Pyhton</b> A Good Artists the Thing Loud...$$$$$.</br></p> </li><li class="sameHostResult"> <h3><a href="" target="_blank"><b>Pyhton</b> - Mommy Nah Worry No More Full Promo Dancehall 2015 <b>...</b></a></h3> <p class="url"></p> <p>Mar 20, 2015 <b>...</b> <b>Pyhton</b> - Mommy Nah Worry No More ○Full Promo○ Dancehall 2015. <br> IamDjChigga. SubscribeSubscribedUnsubscribe ...</br></p> </li><li> <h3><a href="https://www.thenewboston.com/forum/topic.php?id=6569" target="_blank"><b>Pyhton</b> GUI´s - thenewboston Forum</a></h3> <p class="url">https://www.thenewboston.com/forum/topic.php?id=6569</p> <p>May 2, 2015 <b>...</b> Can anyone recommend a good book( i.e. as in paper) to use as a reference <br> work with Python GUis. There are lots of excellent videos etc on ...</br></p> </li><li> <h3><a href="http://www.gamefaqs.com/psp/932978-metal-gear-solid-portable-ops/answers/189967-how-do-i-beat-pyhton" target="_blank">How do I beat <b>pyhton</b>? 
- Metal Gear Solid: Portable Ops Answers for <b>...</b></a></h3> <p class="url" title="http://www.gamefaqs.com/psp/932978-metal-gear-solid-portable-ops/answers/189967-how-do-i-beat-pyhton">http://www.gamefaqs.com/psp/932978-metal-gear-solid-portable-ops/answe...</p> <p>For Metal Gear Solid: Portable Ops on the PSP, a GameFAQs Answers question <br> titled "How do I beat <b>pyhton</b>?".</br></p> </li><li> <h3><a href="https://bugs.launchpad.net/bugs/1415067" target="_blank">Bug #1415067 “QtiPlot crashed when chossing <b>Pyhton</b> as default sc <b>...</b></a></h3> <p class="url">https://bugs.launchpad.net/bugs/1415067</p> <p>Jan 27, 2015 <b>...</b> I installed qtiplot and worked on it for a while. Changing the Default scripting <br> language to <b>Pyhton</b> in Preferences, I end with this problem.</br></p> </li> </ul> </div> </body> </html> |
7
imlonghao 2015-07-10 00:58:00 +08:00 via Android
Post your crawler code.
|
8
wkdhf233 2015-07-10 01:01:35 +08:00
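@ciba1990 If it got escaped, just replace it back; you don't even need regex.
That said, this is the first time I've seen someone scrape content together with its HTML tags. Use regex to cut out the key content and output the tags yourself, and the problem goes away.
|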
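The suggestion above (pull out only the key content, then emit your own tags) could be sketched roughly as follows. This is a minimal Python 3 illustration, not the poster's code: it swaps the regex for the standard library's `html.parser.HTMLParser`, which tends to be less brittle on real markup, and the names `LinkExtractor` and `rebuild_list` are hypothetical helpers invented for this example.

```python
from html import escape
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or valueless href falls back to an empty string.
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def rebuild_list(page_html):
    """Return a fresh <ul> built only from the extracted link data."""
    parser = LinkExtractor()
    parser.feed(page_html)
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(href, quote=True), escape(text))
        for href, text in parser.links
    ]
    return "<ul>" + "".join(items) + "</ul>"
```

Because the output markup is generated locally and the scraped values are escaped as plain text, the result can be marked as safe for the template without trusting the remote site's HTML.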
9
ciba1990 OP @wkdhf233 How do I use regex? This is my crawler code; I'm just getting started.
import urllib2
from bs4 import BeautifulSoup
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
link = soup.find_all('div')
mydiv = str(link[0])
|
10
ciba1990 OP @imlonghao
import urllib2
from bs4 import BeautifulSoup
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
link = soup.find_all('div')
mydiv = str(link[0])
|
11
imlonghao 2015-07-10 01:10:07 +08:00 via Android
import HTMLParser
html_parser = HTMLParser.HTMLParser()
s = html_parser.unescape(s)
|
12
imlonghao 2015-07-10 01:10:35 +08:00 via Android
Put mydiv in where s is.
|
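For readers on Python 3, the same unescaping step can be sketched with the standard library's `html.unescape` (available since Python 3.4; the `HTMLParser.unescape` method was removed in Python 3.9). The `escaped` string below is a made-up sample, not the OP's actual data.

```python
import html

# Hypothetical sample of markup that has been entity-escaped somewhere upstream.
escaped = "&lt;div class=&quot;searchResults&quot;&gt;Web results&lt;/div&gt;"

# Convert the character references back into literal markup.
restored = html.unescape(escaped)
print(restored)
```

This only helps if the string really is escaped before it reaches the template. If the template engine escapes it at render time, the output will look the same as before.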
13
ciba1990 OP |
14
icedx 2015-07-10 01:18:30 +08:00 via Android
The template is probably escaping it.
|
16
lcqtdwj 2015-07-10 01:26:08 +08:00 1
{% autoescape off %}
{{ keyword }}
{% endautoescape %}
Check the docs; this just turns auto-escaping off.
|
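To see why the scraped markup shows up as text, here is a toy model of what template auto-escaping does. It is only a sketch using the standard library: `render` is a hypothetical stand-in written for this example, not Django's or Jinja2's implementation, and it handles a single `{{ keyword }}` slot.

```python
from html import escape


def render(template, value, autoescape=True):
    """Fill the single {{ keyword }} slot; hypothetical stand-in for a template engine."""
    if autoescape:
        # With auto-escaping on, markup characters become entities,
        # so the browser displays the tags as text.
        value = escape(value)
    return template.replace("{{ keyword }}", value)


snippet = '<div id="searchResults">Web results</div>'
page = "<body>{{ keyword }}</body>"

escaped_page = render(page, snippet)                # tags shown as text
raw_page = render(page, snippet, autoescape=False)  # tags rendered as HTML
```

With escaping on, the browser receives `&lt;div ...&gt;` and prints the tags literally. With it off, the value is inserted verbatim, which is what the OP wants, but only content you trust should be passed through that way.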
18
sallowdish 2015-07-10 02:51:25 +08:00
If you want to display the code, put it inside <pre></pre>; if you want to display the rendered content, turn off HTML escaping.
|
19
imlonghao 2015-07-10 06:52:07 +08:00 via Android
Turn off template escaping in Django.
|
20
loading 2015-07-10 08:01:44 +08:00 via Android
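Flask escapes automatically, for security reasons.
OP, at least tell us which libraries you're using! You didn't even post the basic code. Nobody wants to steal your code; everyone is just trying to help. There is plenty of open-source crawler code out there.
|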
21
thinkmore 2015-07-10 09:52:33 +08:00
Just unescape the scraped content; it can be done on either the front end or the back end.
|