Troubleshooting Content Analyzer

Is your Content Audit not running properly? In this article, you'll find troubleshooting tips that may help you resolve your issues.

Content Audit setup issues

You might be facing one of the following problems during the configuration of the Content Audit:

  • "we couldn’t audit your domain. No sitemap files can be found at the specified URLs.”  
  • "your sitemap.xml file is invalid.”
  • or a similar note

Follow these troubleshooting steps to fix the most likely problems you could run into during campaign setup:

By default, the Content Audit tries to find your sitemap on any of these eight destinations:

  • https://www.domain/sitemap_index.xml
  • http://www.domain/sitemap_index.xml
  • http://domain/sitemap_index.xml
  • https://domain/sitemap_index.xml
  • https://www.domain/sitemap.xml
  • http://www.domain/sitemap.xml
  • http://domain/sitemap.xml
  • https://domain/sitemap.xml

If we can’t find the sitemap automatically, you can use the “Add sitemap link” button to add the sitemap URL yourself:

Troubleshooting Content Analyzer image 1

You may have a sitemap without being aware of it; we recommend checking with your web designer or SEO specialist.

We also take your robots.txt file into account. This file can either help an audit start or prevent a bot from reaching your website.

A robots.txt file gives instructions to bots about how to crawl (or not crawl) the pages of a website. To check the robots.txt file of a website, enter the root domain of your site followed by /robots.txt. For example, the robots.txt file on example.com is found at http://www.example.com/robots.txt.

You can inspect your robots.txt to see if there are any Disallow rules that would prevent crawlers like ours from accessing your website.
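
For example, a robots.txt containing the following two lines blocks all compliant crawlers, including ours, from the entire site:

User-agent: *
Disallow: /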

To allow the SemrushBot (SemrushBot-CT; https://www.semrush.com/bot/) to crawl your site, add the following to your robots.txt file:

User-agent: SemrushBot-CT
Disallow:

(leave the value after “Disallow:” empty; an empty Disallow rule means nothing is disallowed for that bot)

To help our bot find the sitemap automatically, you can add the following line anywhere in your robots.txt file to specify the path to your sitemap:

Sitemap: http://domain/sitemap_location.xml

If you see the following tag on the main page of a website, it tells us that we’re not allowed to index it or follow its links, and our access is blocked:

<meta name="robots" content="noindex, nofollow">

Additionally, any page whose robots meta tag contains “noindex”, “nofollow”, or “none” will lead to a crawling error.

To allow our bot to crawl such a page, remove the “noindex” tag from your page’s code.
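
For instance, a hypothetical before/after (the exact tag on your page may differ):

<!-- Blocks our bot: -->
<meta name="robots" content="noindex, nofollow">

<!-- Allows indexing and following (you can also remove the tag entirely, since index and follow are the defaults): -->
<meta name="robots" content="index, follow">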

Another reason the audit won’t start may be that our bot is blocked. To whitelist the bot, contact your webmaster or hosting provider and ask them to whitelist SemrushBot-CT.

The bot's IP addresses are: 

  • 85.208.98.50
  • 18.197.42.174
  • 35.177.199.105
  • 13.48.30.170

The bot connects over the standard HTTP port 80.
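
If you administer your own Linux server, a firewall whitelist might look like the sketch below (a hypothetical iptables example; most users should instead ask their hosting provider to whitelist the bot):

# Insert ACCEPT rules at the top of the INPUT chain so they take
# precedence over any later blanket DROP/REJECT rules (port 80 = HTTP).
iptables -I INPUT -p tcp --dport 80 -s 85.208.98.50 -j ACCEPT
iptables -I INPUT -p tcp --dport 80 -s 18.197.42.174 -j ACCEPT
iptables -I INPUT -p tcp --dport 80 -s 35.177.199.105 -j ACCEPT
iptables -I INPUT -p tcp --dport 80 -s 13.48.30.170 -j ACCEPT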

If you use any plugins (WordPress, for example) or CDNs (content delivery networks) to manage your site, you will have to whitelist the bot’s IP addresses within those as well.

For whitelisting on WordPress, contact WordPress support.

Common CDNs that block our crawler include the following; each has its own documentation on whitelisting bots:

  • Cloudflare
  • Imperva (add Semrush as a “Good bot”)
  • ModSecurity
  • Sucuri

In short, make sure that the sitemap file is accessible to our bot, i.e. that our requests are not blocked by user agent or by IP.
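
A quick way to test the user-agent part yourself is to request the sitemap with the bot’s user agent. Below is a minimal Python sketch (the sitemap URL is a placeholder; note that an IP-based block will not show up when testing from your own machine):

import urllib.error
import urllib.request

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder: replace with your sitemap URL
USER_AGENT = "SemrushBot-CT"  # the user agent our bot identifies itself with

request = urllib.request.Request(SITEMAP_URL, headers={"User-Agent": USER_AGENT})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        print("Status:", response.status)  # 200 means the sitemap is reachable
except urllib.error.HTTPError as e:
    print("Request rejected:", e.code)  # 403 often indicates a user-agent block
except urllib.error.URLError as e:
    print("Connection failed:", e.reason)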

Please note: if you have shared hosting, your hosting provider may not allow you to whitelist bots or edit the robots.txt file.

Also check the sitemap file itself:

  • The sitemap should be correctly formatted in accordance with the sitemap protocol (see the minimal example below).
  • The sitemap should contain only URLs of the domain you would like to analyze.
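
A minimal sitemap that follows the protocol looks like this (the URL is a placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/post-one/</loc>
  </url>
</urlset>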

There is a technical limitation of no more than 20k pages analyzed per audit and no more than 100 child sitemaps in a sitemap index.

If your sitemap index points to other sitemap indexes, which in turn link to further sitemaps instead of listing URLs, we will not be able to proceed with the audit.
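
For reference, a valid sitemap index links to child sitemaps that list page URLs directly (placeholder file names shown); each child must be a <urlset> of pages, not another <sitemapindex>:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>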

We don’t show the subdomains of a domain, so if you need to audit a subdomain, you will have to set up a separate project for it.

I don't have a sitemap file yet. What should I do?

If the sitemap is still in progress or inaccessible, you can submit a list of URLs for analysis instead. The file for upload should be a .txt, .xml, or .csv less than 10 MB in size:

Troubleshooting Content Analyzer image 2

Make sure that the URLs in the file belong to the project domain and that the file contains nothing besides that list of URLs.
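
For example, a valid .txt upload for a project on example.com would contain nothing but full page URLs, one per line (placeholder URLs shown):

https://www.example.com/blog/post-one/
https://www.example.com/blog/post-two/
https://www.example.com/news/announcement/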

Content Analyzer and Google Analytics integration issues

While integrating a Google Analytics property with your Content Audit campaign, you may get the following error message:

Troubleshooting Content Analyzer image 3

Applications use access tokens to make API requests on behalf of a user. Your access token may have expired, leaving our tool unable to access your account data. This can happen if, for example, your Google account password has changed or something went wrong during the connection setup. To resolve this issue, revoke access and re-connect your accounts.

Another reason for this warning is that the view you’ve selected does not return any data for the URLs specified in your audit. Semrush pulls Google Analytics data from the Landing Pages report under the Site Content tab. To check this report, navigate to Behavior → Site Content → Landing Pages:

Troubleshooting Content Analyzer image 4

If the pages in this report do not match the scope of your audit (or if there are no URLs in the report), you will see the warning message and no data will be pulled. To fix the issue, make sure to choose the property that contains the audited pages and that the URLs in the GA report are formatted correctly (they shouldn’t include the domain before the path).
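
For example (hypothetical paths):

/blog/my-article/                  correct: path only
www.example.com/blog/my-article/   incorrect: includes the domain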

Please note that currently you can connect Google Analytics 4 (GA4) only to the SEO Dashboard in Semrush. The rest of the tools that support GA integration can be paired only with the Universal Analytics property.

Additional Troubleshooting Tips

Subfolders to pull the URLs from are picked up from the sitemaps by default. To add more pages or parts of the domain to Content Analyzer, you can:

  • Restart the campaign and select the corresponding subfolder;
  • Upload a file to include all the necessary URLs (up to 20k);
  • If the total number of pages you wish to analyze is over 20k, create an additional project to cover the extra pages.

You can update the metrics and results of your audit by clicking the refresh / last update button. “Content update on” refers to the publication date of a page, while “last update” refers to the date the metrics were last refreshed.

A homepage cannot be monitored in Post Tracking. This is a global limitation; however, the tool was intended for monitoring particular articles and posts, so we hope it will not cause you any inconvenience.

If the content on your pages is rendered via JavaScript rather than raw HTML, we won’t be able to analyze it, because our bot cannot parse JavaScript content.

Contact Semrush Support

If you are still having issues running your Content Audit, contact our Support team; we are happy to help!
