Google & WordPress Looking Into Robots.txt Handling
One of the takeaways from the Google Webmaster Conference was that if Googlebot tries to access your robots.txt file and the file exists but is unreachable, then Google won't crawl your site. Google said that about 26% of the time Googlebot cannot reach a robots.txt file. WordPress may make changes in order to reduce this error rate.
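In other words, the crawl decision hinges on the HTTP status of the robots.txt fetch: a 2xx means obey the rules, a 4xx is treated as if no robots.txt exists, and a 5xx or network failure means the site isn't crawled at all. Here is a minimal Python sketch of that decision logic (an illustration of the documented behavior, not Google's actual code; the function name and return values are mine):

```python
import urllib.request
from urllib.error import HTTPError, URLError

def robots_crawl_decision(robots_url: str) -> str:
    """Sketch of a Googlebot-style crawl decision based on the
    robots.txt fetch result. Illustrative only, not Google's code."""
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as resp:
            rules = resp.read().decode("utf-8", errors="replace")
        # 2xx: robots.txt fetched fine -- parse it and obey its rules.
        return "crawl, respecting these rules:\n" + rules
    except HTTPError as e:
        if e.code >= 500:
            # 5xx: the file exists but is unreachable -- Google treats
            # this as a temporary full disallow and won't crawl the site.
            return "do not crawl: robots.txt unreachable (5xx)"
        # 4xx (e.g. 404): treated as "no robots.txt" -- crawl everything.
        return "crawl everything: no robots.txt found"
    except URLError as e:
        # DNS or connection failure: same outcome as a 5xx.
        return f"do not crawl: robots.txt unreachable ({e.reason})"
```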
Here is one of many tweets about this:
Seriously? One out of four times googlebot cannot reach a site’s robots.txt? then they won’t crawl the entire site!! #gwcps pic.twitter.com/wC49yC40zI
— Raffaele Asquer (@raffasquer) November 4, 2019
Now, with WordPress, Joost de Valk from Yoast asked, “for sites you can’t reach the robots.txt for, is a subset of those WordPress sites? A larger subset than you’d normally expect maybe?” He added that he is “trying to figure out if we should be safer in how WordPress generates robots.txt files.”
Gary Illyes from Google said he believes WordPress is generally fine on this issue, but he will analyze it further to see if WordPress can make some small changes here.
WP is usually fine i think as it doesn’t control network afaik, and someone must’ve misconfigured something real bad if the robotstxt comes back with 5xx. That said, I’ll run an analysis and then i can say for sure
— Gary “鯨理/경리” Illyes (@methode) November 6, 2019
Got it. I’ll look
— Gary “鯨理/경리” Illyes (@methode) November 6, 2019
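Gary’s point about 5xx responses suggests a quick sanity check: fetch your own robots.txt and look at the status code Googlebot would get back. A minimal Python sketch, assuming your site is at the placeholder example.com:

```python
import urllib.request
from urllib.error import HTTPError, URLError

# Placeholder domain -- replace with your own site.
url = "https://example.com/robots.txt"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(f"{url} -> HTTP {resp.status}")  # 200 means Googlebot can read it
except HTTPError as e:
    # A 5xx here is the "misconfigured something real bad" case Gary
    # describes: Google would treat the whole site as uncrawlable.
    print(f"{url} -> HTTP {e.code}")
except URLError as e:
    print(f"{url} -> unreachable ({e.reason})")
```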
I love this dialogue between Google and Yoast (which is closely tied to WordPress).
Forum discussion at Twitter.
Update: I upset Gary again; for the record, the new piece of information here was the percentage of robots.txt files Google cannot reach, not the crawling behavior itself.
Yeah I’ve known for a long time. The stats were interesting though, never heard the numbers were that high.
— Joost de Valk (@jdevalk) November 6, 2019