Hello! Today we at GrowEasy are going to discuss how to block Googlebot from crawling certain parts of a page and how to prevent Googlebot from accessing a site at all.
Block Googlebot from certain sections of a webpage
Some say it's impossible to stop Googlebot from crawling specific sections of a webpage, such as the "also bought" areas of product pages.
The short version is that you can't block crawling a specific section of an HTML page.
Below, we'll suggest two potential strategies for dealing with the problem, emphasizing that neither is a perfect solution.
The first option is to use the HTML data-nosnippet attribute to prevent text from appearing in the search result snippet.
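As an illustration, here is roughly what that looks like on a product page. The "Customers also bought" wording and list items are placeholders; the key part is the data-nosnippet attribute, which tells Google not to use the enclosed text in search snippets (it does not stop crawling or indexing of that text):

```html
<p>This product description may appear in search snippets.</p>

<!-- Text inside an element with data-nosnippet is excluded
     from Google's search snippets, but is still crawled. -->
<div data-nosnippet>
  <h2>Customers also bought</h2>
  <ul>
    <li>Related product A</li>
    <li>Related product B</li>
  </ul>
</div>
```

Note that data-nosnippet is only honored on span, div, and section elements.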
Alternatively, you can load the section via an iframe or JavaScript whose source is blocked by robots.txt, although we caution that this is also not a good idea.
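For completeness, a minimal sketch of the iframe variant follows. The /blocked/ path and file name are hypothetical placeholders; the idea is that the parent page stays crawlable while the iframe's source URL is disallowed:

```html
<!-- Parent page: the "also bought" section lives in a separate document -->
<iframe src="/blocked/also-bought.html" title="Related products"></iframe>
```

```
# robots.txt on the same host
User-agent: Googlebot
Disallow: /blocked/
```

Again, this approach is discouraged, for the reasons described next.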
Using an iframe or JavaScript file that is blocked by robots.txt can cause crawling and indexing issues that are difficult to diagnose and resolve.
If the content in question is reused across multiple pages, this is not a problem that needs to be fixed.
You don't need to block Googlebot from seeing this kind of duplication.
Block Googlebot from accessing a website
In response to a question about preventing Googlebot from accessing any part of a website, we can offer an easy-to-implement solution.
The simplest way is robots.txt: if you add a Disallow: / rule under the Googlebot user agent, Googlebot will leave your site alone for as long as you keep that rule in place.
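In full, the robots.txt rule described above looks like this:

```
User-agent: Googlebot
Disallow: /
```

Placed at the root of your site (e.g. https://example.com/robots.txt), this blocks Googlebot from crawling every URL on the host while leaving other crawlers unaffected.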
For those looking for a more reliable solution, we suggest another method:
If you want to block even network access, you'll need to create firewall rules that load Googlebot's published IP ranges into a deny rule.
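A minimal sketch of such a deny rule with iptables follows. The range shown is only an example: Googlebot's IP ranges change over time, so you should always load the current list from Google's published JSON file rather than hard-coding values:

```
# Illustrative only -- 66.249.64.0/19 is one example range.
# Fetch Google's current googlebot.json list and generate one
# rule per range instead of hard-coding addresses.
iptables -A INPUT -s 66.249.64.0/19 -j DROP
```

In practice you would script this to refresh the ranges periodically, since stale rules can either let Googlebot through or block unrelated traffic.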
See Google's official documentation for a list of Googlebot IP addresses.
In summary
While you can't prevent Googlebot from crawling specific sections of an HTML page, methods such as the data-nosnippet attribute give you some control over what appears in search snippets.
If you want to block Googlebot from your site completely, a simple disallow rule in your robots.txt file will do the trick. For a harder guarantee, more drastic measures such as firewall rules against Googlebot's IP ranges are also possible.