
Bots are abusing my site's search URLs to plant spam ads

  LYRYRYlo · 5 days ago via Android · 1142 views

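    The site runs WordPress and uses its built-in search, so search URLs look like example.com/?s=<search terms>.

    One day Google Search Console emailed me about a large number of duplicate URLs. It turned out that bots were mass-requesting example.com/?s=<ad text>, so Google discovered those URLs, although it did not add them to its index.

    Bing Webmaster Tools reported no problem, and Bing indexed all of those ad URLs.

    I searched Bing for the ad text and found that many WordPress sites have been hit the same way.

    My current fix is a CAPTCHA on search. Does anyone here have a better solution? Seeing those "available nationwide" spam ads come up when I search for my own site's keywords really doesn't look good...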

    7 replies    2025-04-06 13:31:08 +08:00
    opengps · #1 · 5 days ago
    You can use robots.txt to tell search engines to keep that path out of their index.
    olaloong · #2 · 5 days ago via Android · ❤️ 1
    robots.txt

    User-agent: *

    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/themes/
    Disallow: /wp-content/cache/
    Disallow: /wp-json/
    Disallow: /xmlrpc.php

    Disallow: /cgi-bin/
    Disallow: /trackback/
    Disallow: /comments/
    Disallow: /?s=
    Disallow: /author/

    Allow: /wp-content/uploads/
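
To sanity-check rules like the ones above before deploying them, here is a minimal sketch of how a crawler that follows the longest-match convention (as described in RFC 9309) might evaluate them. It is written in Python rather than as a WordPress plugin, the function names `pattern_to_regex` and `is_allowed` are made up for illustration, and real crawlers may differ in the details.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is literal, including '?'.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    The longest matching pattern wins; on a tie, Allow wins.
    """
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# A subset of the rules posted above.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/wp-content/plugins/"),
    ("Disallow", "/?s="),
    ("Allow", "/wp-content/uploads/"),
]

for path in ["/?s=spam", "/page/2/?s=spam", "/wp-content/uploads/a.png"]:
    print(path, is_allowed(RULES, path))
```

One thing this sketch suggests: `Disallow: /?s=` only matches search URLs at the site root. If the site also serves paginated search results under paths such as `/page/2/?s=...`, a wildcard rule like `Disallow: /*?s=` may be needed for crawlers that support `*`.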
    id7368 · #3 · 4 days ago via iPhone · ❤️ 2
    User-agent: *
    Allow: /wp-*/uploads/*
    Allow: /wp-*/themes/*
    Allow: /archives/user/1
    Disallow: /trackback
    Disallow: /wp-*
    Disallow: /?p=*
    Disallow: /?s=*
    Disallow: /*/attachment/*
    Disallow: /archives/user/*
    User-agent: MJ12bot
    Disallow: /
    User-agent: istellabot
    Disallow: /
    User-agent: SemrushBot
    Disallow: /
    User-agent: SemrushBot-SA
    Disallow: /
    User-agent: Dotbot
    Disallow: /
    User-agent: CriteoBot/0.1
    Disallow: /
    User-agent: ClaudeBot
    Disallow: /
    User-agent: AI2Bot
    Disallow: /
    User-agent: Ai2Bot-Dolma
    Disallow: /
    User-agent: Amazonbot
    Disallow: /
    User-agent: anthropic-ai
    Disallow: /
    User-agent: Applebot
    Disallow: /
    User-agent: Applebot-Extended
    Disallow: /
    User-agent: Bytespider
    Disallow: /
    User-agent: CCBot
    Disallow: /
    #User-agent: ChatGPT-User
    #Disallow: /
    User-agent: Claude-Web
    Disallow: /
    User-agent: ClaudeBot
    Disallow: /
    User-agent: cohere-ai
    Disallow: /
    User-agent: Diffbot
    Disallow: /
    User-agent: DuckAssistBot
    Disallow: /
    User-agent: FacebookBot
    Disallow: /
    User-agent: facebookexternalhit
    Disallow: /
    User-agent: FriendlyCrawler
    Disallow: /
    User-agent: Google-Extended
    Disallow: /
    User-agent: GoogleOther
    Disallow: /
    User-agent: GoogleOther-Image
    Disallow: /
    User-agent: GoogleOther-Video
    Disallow: /
    User-agent: GPTBot
    Disallow: /
    User-agent: iaskspider/2.0
    Disallow: /
    User-agent: ICC-Crawler
    Disallow: /
    User-agent: ImagesiftBot
    Disallow: /
    User-agent: img2dataset
    Disallow: /
    User-agent: ISSCyberRiskCrawler
    Disallow: /
    User-agent: Kangaroo Bot
    Disallow: /
    User-agent: Meta-ExternalAgent
    Disallow: /
    User-agent: Meta-ExternalFetcher
    Disallow: /
    #User-agent: OAI-SearchBot
    #Disallow: /
    User-agent: omgili
    Disallow: /
    User-agent: omgilibot
    Disallow: /
    User-agent: PerplexityBot
    Disallow: /
    User-agent: PetalBot
    Disallow: /
    User-agent: Scrapy
    Disallow: /
    User-agent: Sidetrade indexer bot
    Disallow: /
    User-agent: Timpibot
    Disallow: /
    User-agent: VelenPublicWebCrawler
    Disallow: /
    User-agent: Webzio-Extended
    Disallow: /
    User-agent: YouBot
    Disallow: /
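
A file like this relies on each crawler picking the group that names it and falling back to `User-agent: *` otherwise. The sketch below shows one plausible way that selection works, assuming case-insensitive substring matching of the token against the crawler's User-Agent string. `parse_groups` and `rules_for` are hypothetical helper names, and the sample is a shortened excerpt of the file above. Note that robots.txt is advisory, so this only models well-behaved bots.

```python
def parse_groups(text):
    """Parse robots.txt text into {lowercased user-agent token: [(directive, pattern)]}."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:
                # A User-agent line after rules starts a new group.
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((key, value))
    return groups

def rules_for(groups, user_agent):
    """Return the rules of the most specific group matching `user_agent`."""
    ua = user_agent.lower()
    matches = [token for token in groups if token != "*" and token in ua]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

SAMPLE = """\
User-agent: *
Disallow: /?s=*
User-agent: GPTBot
Disallow: /
#User-agent: ChatGPT-User
#Disallow: /
"""

groups = parse_groups(SAMPLE)
print(rules_for(groups, "Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(rules_for(groups, "Mozilla/5.0 (compatible; bingbot/2.0)"))
```

Because the `ChatGPT-User` group is commented out, a crawler with that name would fall through to the `*` group and still be kept away from `/?s=` URLs, while remaining free to fetch ordinary pages.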
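    id7368 · #4 · 4 days ago via iPhone
    If you allow user registration, be sure to block the user directory as well. These people will also stuff ads into usernames and profile bios to feed them to crawlers.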
    LYRYRYlo · #5 · OP · 4 days ago via Android
    @olaloong
    @id7368
    Thanks to both of you for the advice.
    LYRYRYlo · #6 · OP · 4 days ago via Android
    @opengps Thanks.
    ysc3839 · #7 · 3 days ago via Android · ❤️ 1
    Don't write the user's input into the search results page; show only the results.
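
The idea in the last reply is that a spam query is only worth indexing if the page reflects it back. As a rough illustration, here is a minimal Python sketch of a results renderer that never echoes the query. In WordPress this would be done in the theme's PHP search template instead; `render_search_page` and its arguments are hypothetical.

```python
from html import escape

def render_search_page(query, results):
    """Render a search results page without reflecting `query` in the HTML.

    `results` is a list of (title, url) pairs taken from the site's own
    content. `query` is accepted but deliberately never written to the page,
    so an ad smuggled in via ?s=... has nothing to show up in.
    """
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in results
    )
    if not items:
        return "<h1>Search results</h1>\n<p>No results found.</p>"
    return f"<h1>Search results</h1>\n<ul>\n{items}\n</ul>"

page = render_search_page("some ad text", [("Hello <World>", "/hello-world/")])
print(page)
```

With no query text in the title, heading, or "no results" message, every spam URL renders the same generic page, which gives a search engine nothing ad-related to index.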