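---
title: "How to Block Junk Spiders from Crawling Your Website"
url: "https://www.seosiguan.com/post/608.html"
id: "https://www.seosiguan.com/post/608.html"
language: "zh-Hans"
---

Many foreign commercial spiders bring little real value to a website, and for SEO work on sites aimed at the Chinese market they are of no use at all. Left unblocked, however, they consume a good deal of server capacity and bandwidth, which can in turn hurt SEO. Block them according to your own needs.

**nginx: block Scrapy and similar tools**

    if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
        return 403;
    }

**nginx: block request methods other than GET, HEAD and POST**

    if ($request_method !~ ^(GET|HEAD|POST)$) {
        return 403;
    }

**nginx: block specific spiders**

The `~` operator is case-sensitive, which is why some names appear in two spellings. The trailing `^$` matches an empty User-Agent, and `return 444` makes nginx close the connection without sending a response.

    if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$") {
        return 444;
    }

**IIS server**

**On IIS6, add the following rule to the ISAPI rewrite component**

    #Block spider
    RewriteCond %{HTTP_USER_AGENT} (MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$) [NC]
    RewriteRule !(^/robots\.txt$) - [F]

**Apache server**

    RewriteEngine On
    #Block spider
    RewriteCond %{HTTP_USER_AGENT} "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" [NC]
    RewriteRule !(^robots\.txt$) - [F]

**Blocking through robots.txt**

Spiders that honor the robots protocol can be blocked directly in robots.txt. To block the common useless spiders, add the following to the robots.txt file in the site's root directory. Add or remove entries as you see fit.

    User-agent: SemrushBot
    Disallow: /

    User-agent: DotBot
    Disallow: /

    User-agent: MegaIndex.ru
    Disallow: /

    User-agent: MauiBot
    Disallow: /

    User-agent: AhrefsBot
    Disallow: /

    User-agent: MJ12bot
    Disallow: /

    User-agent: BLEXBot
    Disallow: /

**Common malicious and junk crawlers**

Search engine crawlers bring traffic to a website. Many other crawlers do nothing for the site except add load to the server, and these should be blocked.

**1. MJ12Bot**

MJ12Bot is the web crawler of Majestic, a well-known British SEO company. The pages it crawls feed tools used by SEO practitioners, and it brings no traffic to the site.

**2. AhrefsBot**

AhrefsBot is the web crawler of Ahrefs, a well-known SEO company. It likewise crawls pages for SEO professionals and brings no traffic to the site.

**3. SEMrushBot**

SEMrushBot is also the web crawler of an SEO and marketing company.

**4. DotBot**

DotBot is the web crawler of Moz.com. It collects data to support Moz tools and similar products.

**5. MauiBot**

MauiBot differs from the other crawlers in that it does not even have a website. Its UA shows only an email address: "MauiBot (crawler.feedback+wc@gmail.com)". Surprisingly, although it looks like a personal crawler, it honors the robots protocol, which makes it a rare well-behaved one among junk crawlers.

**6. MegaIndex.ru**

This is the spider of a site that offers backlink lookups. It crawls websites mainly to analyze links, so it is of no use to the site being crawled. It honors the robots protocol.

**7. BLEXBot**

This is the spider of WebMeUp. It collects the links found on websites and is of no use to site owners. It honors the robots protocol.

Conclusion: the information above was collected and compiled from various sources. Configure your own rules carefully according to your actual needs.

[Beware of junk spiders hurting your site's SEO indexing and ranking](https://www.seosiguan.com/post/607.html)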
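
Before deploying a User-Agent rule like the ones in this article, it can help to check offline which visitors the pattern would actually reject. The following is a minimal sketch in Python, not part of the original tutorial. The names `BLOCKED_TOKENS` and `is_blocked` are hypothetical, the token list is a shortened subset of the article's list, and Python's `re` module only approximates the regex engines used by nginx and Apache.

```python
import re

# Assumption: a shortened subset of the User-Agent alternation from this article.
# Extend it to match the list you actually deploy.
BLOCKED_TOKENS = [
    "MegaIndex", "BLEXBot", "semrush", "serpstatbot", "python", "Bytespider",
    "Go-http-client", "Scrapy", "AhrefsBot", "MJ12bot", "YandexBot", "Wget", "ZmEu",
]

# "^$" matches an empty User-Agent, as in the nginx and Apache rules.
# re.IGNORECASE mirrors Apache's [NC] flag; nginx's "~" operator is case-sensitive.
BLOCKED_UA = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_TOKENS) + "|^$",
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern would reject this User-Agent string."""
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "python-requests/2.31.0",
        "",  # empty User-Agent
    ]
    for ua in samples:
        verdict = "blocked" if is_blocked(ua) else "allowed"
        print(f"{verdict}: {ua!r}")
```

Running a sample of User-Agent strings from your own access log through `is_blocked` is a quick way to spot a token that is too broad before it starts rejecting legitimate visitors.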