Stated Copyrights on WordPress
That means they should have provided an AI blocking system for their writers. They didn’t, so I will try the approach described below.
If your site has already been crawled (that is, it has already been visited by ChatGPT), then it’s likely already included in multiple datasets.
Nevertheless, by blocking Common Crawl it’s possible to keep your website content out of new datasets built from newer Common Crawl crawls.
This is what I meant at the very beginning of the article when I wrote that the process is “neither straightforward nor guaranteed to work.”
The CCBot User-Agent string is:
Add the following to your robots.txt file to block the Common Crawl bot:
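As a rough check, the sketch below uses Python's standard urllib.robotparser module to confirm that a robots.txt rule actually blocks the CCBot token. The rule shown is the common "disallow everything for CCBot" form, and example.com and the is_blocked helper are placeholders introduced here for illustration, not part of the original instructions.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block the CCBot token from the whole site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    page = "https://example.com/any-page/"
    print("CCBot blocked:", is_blocked(ROBOTS_TXT, "CCBot", page))
    print("Googlebot blocked:", is_blocked(ROBOTS_TXT, "Googlebot", page))
```

Keep in mind that robots.txt is only a request: this check shows what a compliant crawler would do, not what every bot will do.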
An additional way to confirm that a CCBot user agent is legitimate is to check that it crawls from Amazon AWS IP addresses.
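One way to run that check is to compare the visitor's IP address from your server logs against the address ranges AWS publishes as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below is a minimal illustration in Python: the function names are hypothetical, and the sample data uses reserved documentation ranges as stand-ins, not real AWS prefixes. In practice you would load the downloaded JSON file instead.

```python
import ipaddress


def extract_prefixes(ip_ranges: dict) -> list[str]:
    """Collect IPv4 and IPv6 CIDR strings from AWS-style ip-ranges data."""
    v4 = [entry["ip_prefix"] for entry in ip_ranges.get("prefixes", [])]
    v6 = [entry["ipv6_prefix"] for entry in ip_ranges.get("ipv6_prefixes", [])]
    return v4 + v6


def ip_in_prefixes(ip: str, prefixes: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR prefixes."""
    address = ipaddress.ip_address(ip)
    # Membership is simply False when the IP versions differ.
    return any(address in ipaddress.ip_network(prefix) for prefix in prefixes)


# Placeholder data shaped like the AWS file. These are documentation
# ranges (RFC 5737 / RFC 3849), NOT real AWS prefixes.
SAMPLE_RANGES = {
    "prefixes": [{"ip_prefix": "203.0.113.0/24"}],
    "ipv6_prefixes": [{"ipv6_prefix": "2001:db8::/32"}],
}

if __name__ == "__main__":
    prefixes = extract_prefixes(SAMPLE_RANGES)
    for visitor_ip in ("203.0.113.25", "198.51.100.7"):
        print(visitor_ip, "in listed ranges:", ip_in_prefixes(visitor_ip, prefixes))
```

A match only shows that the request came from an AWS address, which many other services also use, so treat it as supporting evidence alongside the user-agent string and not as proof on its own.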