Baiduspider is the generic name for Baidu's official web crawler. It covers three types of crawlers: a desktop crawler that simulates a user on a desktop, a mobile crawler that simulates a user on a mobile device, and a mini-program crawler that simulates a user within the Baidu mobile app.
Your website will probably be crawled by both Baiduspider Desktop and Baiduspider Mobile, and your mini-program will be crawled by Baiduspider Mini-Program. You can identify the subtype of Baiduspider by looking at the user agent string in the request. However, all crawler types share the same product token (user agent token) in robots.txt. That means any directive set for Baiduspider takes effect for all types of Baiduspider, so you cannot selectively target either Baiduspider Mobile or Baiduspider Desktop using robots.txt.
What are the Baiduspider User-Agents?
Baiduspider uses different user agents for its three crawler types, as below:
Mobile:
1. Mozilla/5.0(Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko)Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)
2. Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;+http://www.baidu.com/search/spider.html)
Desktop:
1. Mozilla/5.0(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
2. Mozilla/5.0(compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
Mini-program:
Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;Smartapp; +http://www.baidu.com/search/spider.html)
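As a rough illustration of telling the subtypes apart in server logs, here is a minimal Python sketch. The function name classify_baiduspider and its matching rules (the Smartapp token for the mini-program crawler, Android/iPhone/Mobile for the mobile crawler) are assumptions inferred from the sample strings above, not an official Baidu scheme. A user agent match alone is also not proof that the request is genuine.

```python
import re
from typing import Optional

# Matches both "Baiduspider/2.0" and "Baiduspider-render/2.0".
_BAIDU_TOKEN = re.compile(r"Baiduspider(-render)?/", re.IGNORECASE)


def classify_baiduspider(user_agent: str) -> Optional[str]:
    """Return 'mini-program', 'mobile', 'desktop', or None if not Baiduspider.

    Hypothetical helper: the rules are guesses based on the sample
    user agent strings, so adjust them to what your own logs show.
    """
    if not _BAIDU_TOKEN.search(user_agent):
        return None
    # Assumption: only the mini-program crawler carries the Smartapp token.
    if "smartapp" in user_agent.lower():
        return "mini-program"
    # Assumption: the mobile crawler mimics an Android or iPhone browser.
    if re.search(r"Android|iPhone|Mobile", user_agent):
        return "mobile"
    return "desktop"
```

The Smartapp check comes first because the mini-program user agent also contains iPhone and would otherwise be labeled as mobile.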
How to Verify Baiduspider?
The user agent of an HTTP request is easy to fake, so a matching user agent string alone does not prove that a request comes from Baiduspider. You can verify it in just two steps:
Step 1 - Run a reverse DNS lookup
Run a reverse DNS lookup on the accessing IP address in your server logs, using the host command. Verify that the domain name is either *.baidu.com or *.baidu.jp.
The detailed verification methods on Linux, Windows, and macOS are as follows:
On Linux, you can use the host command to run a reverse DNS lookup to determine whether the request is from a real Baiduspider.
On Windows, you can use the nslookup command to check the IP address.
On macOS, you can use the dig command to perform the DNS lookup.
Step 2 - Then run a forward DNS lookup
Run a forward DNS lookup on the domain name retrieved in Step 1, again using the host command. Verify that the returned address is the same as the original accessing IP address from your logs.
Below is an example:
> host 123.206.198.68
68.198.206.123.in-addr.arpa domain name pointer baiduspider-123-206-198-68.crawl.baidu.com.
> host baiduspider-123-206-198-68.crawl.baidu.com
baiduspider-123-206-198-68.crawl.baidu.com has address 123.206.198.68
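If you want to automate the two steps instead of running them by hand, the sketch below shows one possible way to do it in Python with the standard socket module. The function names (reverse_lookup, forward_lookup, is_real_baiduspider) are hypothetical, and the sketch assumes IPv4 addresses only, since socket.gethostbyname_ex does not resolve IPv6.

```python
import socket
from typing import Callable, List

# Domains named in Step 1. The leading dot matters: it stops a host
# such as "crawler.evilbaidu.com" from passing the suffix check.
ALLOWED_SUFFIXES = (".baidu.com", ".baidu.jp")


def reverse_lookup(ip: str) -> str:
    """Step 1: reverse (PTR) lookup, similar to `host <ip>`."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Step 2: forward (A record) lookup, similar to `host <hostname>`."""
    return socket.gethostbyname_ex(hostname)[2]


def is_real_baiduspider(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Return True only if both lookups agree and the host is a Baidu domain."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(ALLOWED_SUFFIXES):
            return False
        # The forward lookup must lead back to the IP seen in the logs.
        return ip in forward(hostname)
    except OSError:
        # No PTR record, unknown host, or resolver failure: not verified.
        return False
```

The two resolver parameters exist only so the check can be exercised without network access. In normal use you would call is_real_baiduspider with just the IP address from your logs, and probably cache the result, because DNS lookups on every request are slow.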
How to Block Baiduspider?
To block Baiduspider from crawling your website, you can simply add the code below in your robots.txt file:
User-Agent: Baiduspider
Disallow: /
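To sanity-check such a rule before deploying it, you could parse it locally. The sketch below uses Python's standard urllib.robotparser. The is_allowed helper and the example.com URL are placeholders, and Python's parser only approximates how Baidu itself interprets robots.txt, so treat the result as a hint, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent: str, url: str = "https://example.com/page.html") -> bool:
    """Hypothetical helper: may `agent` fetch `url` under ROBOTS_TXT?"""
    return parser.can_fetch(agent, url)


# The single Baiduspider group applies to every Baiduspider variant,
# while crawlers with a different token are unaffected.
for agent in ("Baiduspider", "Baiduspider-render", "Googlebot"):
    print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Python's parser treats Baiduspider-render as covered by the Baiduspider group, which is in line with all Baiduspider types sharing one product token.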