Is your Content Audit not running properly? In this article, you'll find troubleshooting tips that may help you resolve your issues.
Content Audit setup issues
You might be facing one of the following problems during the configuration of the Content Audit:
- "we couldn’t audit your domain. No sitemap files can be found at the specified URLs.”
- "your sitemap.xml file is invalid.”
- or a similar note
Follow these troubleshooting steps to fix the most likely problems you could run into during campaign setup:
By default, the Content Audit tries to find your sitemap at any of these eight locations:
- https://www.domain/sitemap_index.xml
- http://www.domain/sitemap_index.xml
- http://domain/sitemap_index.xml
- https://domain/sitemap_index.xml
- https://www.domain/sitemap.xml
- http://www.domain/sitemap.xml
- http://domain/sitemap.xml
- https://domain/sitemap.xml
If the sitemap can't be found automatically, you can use the "Add sitemap link" button to add the sitemap URL manually.
You may also have a sitemap without being aware of it; we recommend checking with your web designer or SEO specialist.
We also take your robots.txt file into account. Depending on its directives, this file can either help an audit start or prevent our bot from reaching your website.
A robots.txt file gives instructions to bots about how to crawl (or not crawl) the pages of a website. To check the robots.txt file of a website, enter the root domain of your site followed by /robots.txt. For example, the robots.txt file of example.com is found at http://www.example.com/robots.txt.
You can inspect your robots.txt file to see if there are any Disallow rules that would prevent crawlers like ours from accessing your website, such as the one shown below.
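For example, a robots.txt rule like the following (shown here purely for illustration) blocks all crawlers, including ours, from the entire site:

User-agent: *
Disallow: /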
To allow the SemrushBot (SemrushBot-CT; https://www.semrush.com/bot/) to crawl your site, add the following to your robots.txt file:
User-agent: SemrushBot-CT
Disallow:
(leave the value after "Disallow:" empty)
To help our bot to find the sitemap automatically, you can add the following line anywhere in your robots.txt file to specify the path to your sitemap:
Sitemap: http://domain/sitemap_location.xml
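Putting the two pieces together, a minimal robots.txt that lets SemrushBot-CT crawl the whole site and points it to the sitemap could look like this (the sitemap URL is a placeholder for your own):

User-agent: SemrushBot-CT
Disallow:

Sitemap: https://www.example.com/sitemap.xml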
If you see the following code on the main page of a website, it tells us that we're not allowed to index it or follow its links, and our access is blocked.
<meta name="robots" content="noindex, nofollow" >
Additionally, any page whose robots meta tag contains at least one of "noindex", "nofollow", or "none" will lead to a crawling error.
To allow our bot to crawl such a page, remove the “noindex” tag from your page’s code.
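If you want the page crawled and indexed, you can either delete the robots meta tag entirely or replace it with an explicitly permissive one, for example:

<meta name="robots" content="index, follow">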
Another reason the audit won't start may be that our bot is blocked on the server side. To whitelist the bot, contact your webmaster or hosting provider and ask them to whitelist SemrushBot-CT.
The bot's IP addresses are:
- 85.208.98.50
- 18.197.42.174
- 35.177.199.105
- 13.48.30.170
The bot connects over the standard HTTP port 80.
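If your webmaster manages the server configuration directly, whitelisting typically means allowing these IP addresses in the web server or firewall rules. As a rough sketch only (assuming an nginx server and that any existing deny rules come after these lines), it could look like this:

# Allow the SemrushBot-CT crawler IP addresses (place above any deny rules)
allow 85.208.98.50;
allow 18.197.42.174;
allow 35.177.199.105;
allow 13.48.30.170;

The exact location and syntax depend on your server and hosting setup, so when in doubt, leave this to your hosting provider.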
If you use any plugins (WordPress, for example) or CDNs (content delivery networks) to manage your site, you will have to whitelist the bot's IP addresses there as well.
For whitelisting on WordPress, contact WordPress support.
Common CDNs that block our crawler include:
- Cloudflare: refer to Cloudflare's documentation on whitelisting bots.
- Imperva: refer to Imperva's documentation on whitelisting bots (add Semrush as a "Good bot").
- ModSecurity: refer to the ModSecurity documentation on whitelisting bots.
- Sucuri: refer to Sucuri's documentation on whitelisting bots.
In short, make sure the sitemap file can be reached by our bot, i.e., that its requests are not blocked by user agent or by IP.
Please note: if you are on shared hosting, your hosting provider may not allow you to whitelist bots or edit the robots.txt file.
For the audit to start, your sitemap also needs to meet a few requirements (a minimal example follows this list):
- The sitemap should be correctly formatted in accordance with the sitemap protocol.
- The sitemap should contain only URLs of the domain you would like to analyze.
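For reference, a minimal sitemap that follows the sitemap protocol could look like this (example.com stands in for your own domain):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/blog/first-post/</loc>
  </url>
</urlset>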
There is a technical limit of 20,000 pages analyzed per audit and 100 embedded sitemaps in a sitemap index.
If your sitemap index consists of other sitemaps that, in turn, also link to further sitemaps rather than listing URLs, we will not be able to proceed with the audit (a supported structure is sketched below).
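A supported sitemap index points directly to sitemaps that list page URLs; the child sitemap names below are hypothetical:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>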
Subdomains of a domain are not included in the audit, so if you need to audit a subdomain, you will have to set up a separate project for it.
I don't have a sitemap file yet. What should I do?
If your sitemap is still in progress or inaccessible, you can submit a list of URLs for analysis instead. The file for upload should be a .txt, .xml, or .csv no larger than 10 MB.
Make sure that the URLs in the file match the project domain and that the file contains nothing besides that list of URLs (see the example below).
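For instance, a plain .txt upload for a project on example.com (a hypothetical domain) could simply list one URL per line:

https://www.example.com/
https://www.example.com/blog/
https://www.example.com/blog/first-post/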
Content Analyzer and Google Analytics integration issues
While connecting a Google Analytics property to your Content Audit campaign, you may get an error message.
Applications use access tokens to make API requests on behalf of a user. It could be that your access token has expired, and our tool cannot access your account data. This can happen if, for example, your Google account password has changed or something went wrong during the connection setup. To resolve this issue, please try revoking access and re-connecting your accounts.
Another reason for this warning is that the view you've selected does not return any data for the URLs specified in your audit. Semrush pulls Google Analytics data from the Landing Pages report under the Site Content tab. To check this report, navigate to Behaviour → Site Content → Landing Pages.
If the pages in this report do not match the scope of your audit (or if there are no URLs in the report at all), you will see the warning message and no data will be pulled. To fix the issue, make sure you choose the property that contains the audited pages and that the URLs in the GA report are formatted correctly: they should contain only the path, without any domain in front of it (as shown below).
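For example (using a hypothetical domain), landing pages in the report should be listed as paths only:

/blog/first-post/ (correct: path only)
https://www.example.com/blog/first-post/ (incorrect: includes the protocol and domain)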
Please note that currently you can connect Google Analytics 4 (GA4) only to the SEO Dashboard in Semrush. The rest of the tools that support GA integration can be paired only with the Universal Analytics property.
Additional Troubleshooting Tips
By default, the subfolders to pull URLs from are picked up from your sitemap. To add more pages or other parts of the domain to the Content Analyzer, you can:
- Restart the campaign and select the corresponding subfolder;
- Upload a file to include all the necessary URLs (up to 20k);
- If the total number of pages you wish to analyze is over 20k, create an additional project to cover the extra pages.
Contact Semrush Support
If you are still having issues running your Content Audit, contact our Support team; we are happy to help!