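Asking for advice: how do I crawl a certain category of information from across the whole web, including content inside WeChat Official Accounts?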

This topic was created 561 days ago; the information mentioned may have changed since then.

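How can I crawl the information I want when I don't have any specific page URLs to start from, including content from Official Accounts?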

crawling

information

data

10 replies • 2024-12-06 12:00:18 +08:00

shadowyue

Dec 6, 2024

Then what you're describing is basically a search engine for one specific kind of content.

YJi

Dec 6, 2024

My company has an API that can provide this data.

sir283

Dec 6, 2024 via Android

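1. Pay for an API.
2. Pay for your own devices, then simulate taps and capture the content into a database.
3. Reverse-engineer the client, capture its traffic, or hook it.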

tf2

Dec 6, 2024

Just pay for it.

dispuri

Dec 6, 2024

@YJi Which company is that?

YJi

Dec 6, 2024

@dispuri Do you need data?

lingxmo

Dec 6, 2024

Hook into a search engine.

EatIce

Dec 6, 2024

@YJi How do I contact you?

YJi

Dec 6, 2024

@EatIce My WeChat: WUpYXzA5Mjg= (base64-decode it)

XinPingQiHe

Dec 6, 2024

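In this situation you usually start by querying Baidu (supplemented by other similar search engines) with your keywords, then parse the search results (note that multi-page results are paged through a URL parameter).
Parse each search result and have a program fetch the corresponding page content. When necessary, analyze the outbound links on those pages to find more related data.
Once you have that base data, you can later build your own cache and crawl the corresponding sites directly, collecting more related URLs as you crawl.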
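
A rough sketch of that flow in Python, standard library only. Everything named here is an assumption for illustration: the `search.example` URL template and its `page` parameter are made up (a real search engine has its own query format and result markup), and `search_page_urls`, `extract_links`, and `crawl` are hypothetical helpers, not part of any library. It only shows the general shape: paged search URLs as seeds, fetch each result, follow outbound links, and keep pages that mention the keyword in a local cache.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import quote, urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def search_page_urls(template, keyword, pages):
    # template is hypothetical, e.g. "https://search.example/?q={query}&page={page}"
    return [template.format(query=quote(keyword), page=n)
            for n in range(1, pages + 1)]


def extract_links(base_url, html):
    parser = LinkExtractor()
    parser.feed(html)
    urls = []
    for href in parser.links:
        # Resolve relative links and drop #fragments.
        url, _ = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")):
            urls.append(url)
    return urls


def http_fetch(url, timeout=10):
    req = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed_urls, keyword, fetch=http_fetch, max_pages=50):
    """Breadth-first crawl from the seeds; returns {url: html} for pages mentioning keyword."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    cache = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        fetched += 1
        if keyword in html:
            cache[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return cache
```

Usage would be something like `crawl(search_page_urls(template, "your keyword", 3), "your keyword")`. The `fetch` parameter is swappable so the loop can be tried against canned pages first. Content that is only visible inside an app would not be reached this way; that is what the paid API and device-automation suggestions earlier in the thread are about.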