{
    "version": "https://jsonfeed.org/version/1.1",
    "title": "SEO私馆",
    "home_page_url": "https://www.seosiguan.com/",
    "feed_url": "https://www.seosiguan.com/post/608.json",
    "language": "zh-Hans",
    "items": [
        {
            "id": "https://www.seosiguan.com/post/608.html",
            "url": "https://www.seosiguan.com/post/608.html",
            "title": "Tutorial: Blocking Junk Spiders from Crawling Your Website",
            "content_html": "<p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Many foreign commercial spiders bring no real value to a website, and for sites doing SEO in the Chinese market they are of even less use. Left unblocked, however, they consume a considerable amount of server performance and bandwidth, which can hurt SEO, so block them according to your own needs.</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>nginx: block scraping tools such as Scrapy</strong></span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">return 403;</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">}</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>nginx: block request methods other than GET, HEAD, and POST</strong></span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">if ($request_method !~ ^(GET|HEAD|POST)$) {</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">return 403;</span></p><p style=\"white-space: normal;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">}</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>nginx: block specific spiders</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">if ($http_user_agent ~ &quot;MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$&quot; )</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">{</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">return 444;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">}</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>IIS</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;configuration&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;system.webServer&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;rewrite&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;rules&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;rule name=&quot;Block spider&quot;&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;match url=&quot;(^robots.txt$)&quot; ignoreCase=&quot;false&quot; negate=&quot;true&quot; /&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;conditions&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;add input=&quot;{HTTP_USER_AGENT}&quot; pattern=&quot;MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$&quot;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">ignoreCase=&quot;true&quot; /&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/conditions&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;action type=&quot;AbortRequest&quot; /&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/rule&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/rules&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/rewrite&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/system.webServer&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/configuration&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>IIS6: add the rules in the ISAPI Rewrite component</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">#Block spider</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">RewriteCond %{HTTP_USER_AGENT} (MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$) [NC]</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">RewriteRule !(^/robots.txt$) - [F]</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>Apache</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;IfModule mod_rewrite.c&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">RewriteEngine On</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">#Block spider</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">RewriteCond %{HTTP_USER_AGENT} &quot;MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$&quot; [NC]</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">RewriteRule !(^robots\\.txt$) - [F]</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">&lt;/IfModule&gt;</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>Blocking via robots.txt</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Spiders that honor the robots protocol can be blocked directly in robots.txt. To block the common useless spiders, add the rules below to the robots.txt file in the site's root directory; add or remove entries as needed.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: SemrushBot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: DotBot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: MegaIndex.ru</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: MauiBot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: AhrefsBot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: MJ12bot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">User-agent: BLEXBot</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Disallow: /</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>Common malicious and junk crawlers</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Search engine crawlers bring traffic to a site, but many other crawlers do nothing except add server load and should be blocked.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>1. MJ12Bot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">MJ12Bot is the web crawler of Majestic, a well-known British SEO company. It crawls pages for use by SEO practitioners and brings no traffic to the site.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>2. AhrefsBot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">AhrefsBot is the web crawler of the well-known SEO company Ahrefs. It likewise crawls pages for SEO professionals and brings no traffic to the site.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>3. SEMrushBot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">SEMrushBot is likewise the web crawler of an SEO and marketing company.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>4. DotBot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">DotBot is the web crawler of Moz.com; the data it collects supports Moz tools and similar products.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>5. MauiBot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Unlike the other crawlers, MauiBot does not even have a website; its UA shows only an email address: “MauiBot (crawler.feedback+wc@gmail.com)”. Surprisingly, although it appears to be an individual's crawler, it honors the robots protocol, which makes it a rare well-behaved one among junk crawlers.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>6. MegaIndex.ru</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">This is the spider of a site that offers backlink lookups, so it crawls sites mainly to analyze links and is of no use to site owners. It honors the robots protocol.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\"><strong>7. BLEXBot</strong></span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">This spider belongs to webmeup and collects the links found on websites; it is of no use to us. It honors the robots protocol.</span></p><p><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Conclusion: the material above is compiled from various sources; configure the rules precisely according to your actual needs.</span></p><p><a href=\"https://www.seosiguan.com/post/607.html\" target=\"_blank\" title=\"Beware of junk spiders hurting your site's SEO indexing and ranking\" style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;; text-decoration: underline;\"><span style=\"font-family: 微软雅黑, &quot;Microsoft YaHei&quot;;\">Beware of junk spiders hurting your site's SEO indexing and ranking</span></a><br/></p>",
            "content_text": "Many foreign commercial spiders bring no real value to a website, and for sites doing SEO in the Chinese market they are of even less use. Left unblocked, however, they consume a considerable amount of server performance and bandwidth, which can hurt SEO, so block them according to your own needs.\n\nnginx: block scraping tools such as Scrapy\nif ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {\nreturn 403;\n}\n\nnginx: block request methods other than GET, HEAD, and POST\nif ($request_method !~ ^(GET|HEAD|POST)$) {\nreturn 403;\n}\n\nnginx: block specific spiders\nif ($http_user_agent ~ \"MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$\" )\n{\nreturn 444;\n}\n\nIIS\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n<system.webServer>\n<rewrite>\n<rules>\n<rule name=\"Block spider\">\n<match url=\"(^robots.txt$)\" ignoreCase=\"false\" negate=\"true\" />\n<conditions>\n<add input=\"{HTTP_USER_AGENT}\" pattern=\"MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$\"\nignoreCase=\"true\" />\n</conditions>\n<action type=\"AbortRequest\" />\n</rule>\n</rules>\n</rewrite>\n</system.webServer>\n</configuration>\n\nIIS6: add the rules in the ISAPI Rewrite component\n#Block spider\nRewriteCond %{HTTP_USER_AGENT} (MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$) [NC]\nRewriteRule !(^/robots.txt$) - [F]\n\nApache\n<IfModule mod_rewrite.c>\nRewriteEngine On\n#Block spider\nRewriteCond %{HTTP_USER_AGENT} \"MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$\" [NC]\nRewriteRule !(^robots\\.txt$) - [F]\n</IfModule>\n\nBlocking via robots.txt\nSpiders that honor the robots protocol can be blocked directly in robots.txt. To block the common useless spiders, add the rules below to the robots.txt file in the site's root directory; add or remove entries as needed.\nUser-agent: SemrushBot\nDisallow: /\nUser-agent: DotBot\nDisallow: /\nUser-agent: MegaIndex.ru\nDisallow: /\nUser-agent: MauiBot\nDisallow: /\nUser-agent: AhrefsBot\nDisallow: /\nUser-agent: MJ12bot\nDisallow: /\nUser-agent: BLEXBot\nDisallow: /\n\nCommon malicious and junk crawlers\nSearch engine crawlers bring traffic to a site, but many other crawlers do nothing except add server load and should be blocked.\n\n1. MJ12Bot\nMJ12Bot is the web crawler of Majestic, a well-known British SEO company. It crawls pages for use by SEO practitioners and brings no traffic to the site.\n\n2. AhrefsBot\nAhrefsBot is the web crawler of the well-known SEO company Ahrefs. It likewise crawls pages for SEO professionals and brings no traffic to the site.\n\n3. SEMrushBot\nSEMrushBot is likewise the web crawler of an SEO and marketing company.\n\n4. DotBot\nDotBot is the web crawler of Moz.com; the data it collects supports Moz tools and similar products.\n\n5. MauiBot\nUnlike the other crawlers, MauiBot does not even have a website; its UA shows only an email address: “MauiBot (crawler.feedback+wc@gmail.com)”. Surprisingly, although it appears to be an individual's crawler, it honors the robots protocol, which makes it a rare well-behaved one among junk crawlers.\n\n6. MegaIndex.ru\nThis is the spider of a site that offers backlink lookups, so it crawls sites mainly to analyze links and is of no use to site owners. It honors the robots protocol.\n\n7. BLEXBot\nThis spider belongs to webmeup and collects the links found on websites; it is of no use to us. It honors the robots protocol.\n\nConclusion: the material above is compiled from various sources; configure the rules precisely according to your actual needs.\n\nBeware of junk spiders hurting your site's SEO indexing and ranking",
            "date_published": "2022-03-21T18:39:08+00:00",
            "date_modified": "2022-03-22T15:32:15+00:00",
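            "summary": "Many foreign commercial spiders bring no real value to a website, and for sites doing SEO in the Chinese market they are of even less use. Left unblocked, however, they consume a considerable amount of server performance and bandwidth, which can hurt SEO, so block them according to your own needs."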
        }
    ]
}