
Why is BingBot so stupid?

Publication: chiefofficersnet
By the Editorial Staff

One has to wonder whether being listed on Microsoft's Bing search engine is worth the trouble. The crawler is stupid, and it slows down websites while generating log entries that look suspiciously similar to those created by hackers and spambots. It's time to decide whether simply to block BingBot from all access to sites and accept that this means an absence of web presence (except that there are a couple of tricks that can preserve a web presence while keeping BingBot away from active sites).

BingBot operates from several server groups and several geographical locations. That means that its persistent visits to sites cannot be blocked in a single step. Worse, on our own sites, we have found that BingBot persists in visiting directories that we have told it, via robots.txt, not to visit.
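
For the record, the instructions BingBot ignores are nothing exotic. The directory names below are placeholders, but the syntax is the standard robots.txt form:

    User-agent: bingbot
    Disallow: /old-directory/
    Disallow: /retired-archive/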

Our logs show, daily, vast numbers of Page Not Found (404) responses and, in many cases, those visits come from BingBot.

The problem is not that BingBot visits, it's that it doesn't learn what not to visit. It persists in trying to find pages that have not existed for several years, in directories that have not existed for almost as long. Apparently nobody thought it a good idea to program BingBot with a "three strikes and you're out" rule: after a link returns a 404 response three times, delete it from BingBot's crawl schedule and register it as an address never to be visited again, even if it pops up as a link on another site.
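
Such a rule would be trivial to implement. The sketch below is ours, not Microsoft's (BingBot's internals are not public); it assumes a crawler that counts 404 responses per URL and retires a URL permanently after the third failure:

    from collections import defaultdict

    failures = defaultdict(int)  # 404 count per URL
    never_visit = set()          # URLs retired permanently

    def should_crawl(url):
        return url not in never_visit

    def record_response(url, status):
        if status == 404:
            failures[url] += 1
            if failures[url] >= 3:   # three strikes and you're out
                never_visit.add(url)
                del failures[url]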

So, having established that BingBot is a BadBot when it comes to following instructions in robots.txt, the next question is whether we should blacklist BingBot altogether. It's very tempting, but Microsoft makes it hard: "we do not publish a list of IP addresses or ranges from which we crawl the Internet."

Beware: Don’t Use Hardcoded IP Addresses or Address Ranges

So, by using the reverse/forward DNS lookup method you can easily verify that an IP address is coming from Bing. It is important to note that like other search engines, Bing does not publish a list of IP addresses or ranges from which we crawl the Internet. The reason is simple: the IP addresses or ranges we use can change any time, so responding to requests differently based on a hardcoded list is not a recommended approach and may cause problems down the line.
- https://blogs.bing.com/webmast...
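
The lookup itself is simple enough. A minimal sketch in Python, relying on the documented fact that genuine Bing crawlers reverse-resolve to hostnames under search.msn.com:

    import socket

    def is_bingbot(ip):
        """Reverse lookup the IP, check the hostname, then confirm with a forward lookup."""
        try:
            host = socket.gethostbyaddr(ip)[0]        # reverse: IP -> hostname
        except socket.herror:
            return False
        if not host.endswith(".search.msn.com"):      # genuine Bing crawler hosts
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]  # forward: hostname -> IPs
        except socket.gaierror:
            return False
        return ip in forward_ips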

That warning is a line which, to us, looks increasingly red and increasingly attractive to cross.

Our logs reveal that a tiny proportion of our traffic comes from Bing. Commercially, then, it is not a difficult decision to lock it out before it even starts to look around our sites and, because our logs give us new IP addresses daily, we can choose to ban specific IP addresses or even whole ranges. That is slightly time-consuming, but it is highly effective and technically very simple to do.
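
On an Apache server, for example, the ban amounts to a few lines of configuration. The ranges below are illustrative only; the point is to substitute whichever addresses your own logs surface each day:

    # .htaccess (Apache 2.4) -- block specific crawler ranges
    <RequireAll>
        Require all granted
        Require not ip 157.55.39.0/24
        Require not ip 207.46.13.0/24
    </RequireAll>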

If we do, in fact, ban Bing from our sites, it does not mean that we disappear from Bing: a small, independent page, with a handful of simple, linked HTML pages describing our services and linking to the full site, will remain accessible to Bing, but if it tries to follow those links it will be prevented from completing the process. We can also use the "nofollow" request, but that is not guaranteed to work.
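
The gateway needs nothing more sophisticated than plain HTML. A minimal sketch (the filenames, wording and URL are our own placeholders):

    <html>
      <head><title>Our Services</title></head>
      <body>
        <h1>Our Services</h1>
        <p>A short description of what we do.</p>
        <!-- rel="nofollow" asks crawlers not to follow; it is advisory only -->
        <a href="https://www.example.com/" rel="nofollow">Visit the full site</a>
      </body>
    </html>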

The result is that we maintain a presence of sorts, we reduce the ridiculous load Bing places on our servers and, equally importantly, we no longer have to plough through vast amounts of log data to find the entries that actually matter.

We could say that it's a no brainer - but the no brain is Bing and its flawed design.
