How to do a Technical SEO Audit

Technical SEO is the foundation your website is built on, ensuring that search engines can find, crawl, and understand your content. Use the technical SEO checklist below to audit essential areas including crawlability, indexing, page speed, and site security.

Don’t forget that a comprehensive technical SEO audit begins with understanding the site’s context. Tailor the scope of your audit to the website in question. For example, an hreflang audit should be included as part of a technical audit for an international site with different language versions, but not for a single-language website.

Also, note that not all of your findings will be equally problematic. A high number of 404s after intentionally removing outdated content is to be expected, but an unexplained increase in 404s should be investigated further.

XML Sitemap

An XML sitemap is a file that helps search engines understand your website’s structure and crawl it. It lists the pages on your site and can include each page’s priority, when it was last modified, and how frequently it is updated. Pages are often grouped by type, such as topic, post, or product.
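To make the fields described above concrete, here is a minimal sketch that reads them out of a small sitemap. The XML sample, its URLs and dates, and the `parseSitemap` and `readTag` helpers are all made up for illustration; a production audit would use a proper XML parser rather than regular expressions.

```javascript
// Hypothetical sitemap contents, using the standard sitemap tags.
const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2024-04-18</lastmod>
  </url>
</urlset>`;

// Return the text inside <tag>...</tag>, or null if the tag is absent.
function readTag(block, tag) {
  const m = block.match(new RegExp('<' + tag + '>([^<]*)</' + tag + '>'));
  return m ? m[1].trim() : null;
}

// Split the sitemap into <url> entries and read the fields of each one.
function parseSitemap(xml) {
  const blocks = xml.match(/<url>[\s\S]*?<\/url>/g) || [];
  return blocks.map((block) => ({
    loc: readTag(block, 'loc'),
    lastmod: readTag(block, 'lastmod'),
    changefreq: readTag(block, 'changefreq'),
    priority: readTag(block, 'priority')
  }));
}

const entries = parseSitemap(sitemapXml);
console.log(entries);
```

Only `loc` is required for each entry. Google's documentation states that it ignores `priority` and `changefreq`, so an accurate `lastmod` is the optional field most worth maintaining.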

You’ll probably check the sitemap at the beginning of a technical audit. To find a site’s sitemap, append /sitemap.xml to the domain, for example, https://datachai.com/sitemap.xml. If the site has multiple sitemaps, try /sitemap_index.xml. These paths are conventions rather than requirements, and the sitemap location is often also declared in robots.txt.

Register your sitemap with Google Search Console, which includes several tools for checking technical SEO metrics such as indexing status and page speed. The Sitemaps report shows whether each submitted sitemap could be read and how many URLs were discovered in it, which helps you keep the sitemap in step with the URLs added to the site and work toward a 1:1 ratio.

If your website follows technical SEO best practices and has a strong internal linking structure that allows search engines to discover all important pages, an XML sitemap may not be strictly necessary. However, maintaining a sitemap is still considered best practice, particularly for large websites.

A well-maintained sitemap should include all indexable URLs that return a 200 OK status code and accurately reflect the site’s structure. Ideally, there should be a 1:1 relationship between the canonical URLs you want search engines to index and the URLs listed in the sitemap. URLs returning 4xx or 5xx errors, unnecessary parameterized URLs, redirected pages, and other non-indexable content should be excluded. Orphaned pages should also be reviewed and either integrated into the site’s internal linking structure or removed from the sitemap if they are not intended to be indexed.
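The inclusion rules above can be sketched as a small filter. This is an illustration only: the `pages` array, its field names (`url`, `status`, `canonical`, `noindex`), and both functions are hypothetical, standing in for whatever crawl export you have.

```javascript
// Hypothetical crawl export: one record per URL found on the site.
const pages = [
  { url: 'https://example.com/', status: 200, canonical: 'https://example.com/', noindex: false },
  { url: 'https://example.com/old', status: 301, canonical: null, noindex: false },
  { url: 'https://example.com/shoes?sort=price', status: 200, canonical: 'https://example.com/shoes', noindex: false },
  { url: 'https://example.com/shoes', status: 200, canonical: 'https://example.com/shoes', noindex: false },
  { url: 'https://example.com/thanks', status: 200, canonical: 'https://example.com/thanks', noindex: true },
  { url: 'https://example.com/gone', status: 404, canonical: null, noindex: false }
];

// Keep only URLs that return 200, are self-canonical, and are indexable.
function selectSitemapUrls(records) {
  return records
    .filter((p) => p.status === 200 && p.canonical === p.url && !p.noindex)
    .map((p) => p.url);
}

// Compare the URLs that should be listed with the URLs that are listed.
function diffSitemap(expected, listed) {
  const listedSet = new Set(listed);
  const expectedSet = new Set(expected);
  return {
    missing: expected.filter((u) => !listedSet.has(u)),
    unexpected: listed.filter((u) => !expectedSet.has(u))
  };
}

const expectedUrls = selectSitemapUrls(pages);
const sitemapReport = diffSitemap(expectedUrls, ['https://example.com/', 'https://example.com/old']);
console.log(sitemapReport);
```

An empty `missing` list and an empty `unexpected` list correspond to the 1:1 relationship described above.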

Server Response Code and Redirects

Bulk check server response codes with this Google Apps Script, which can be called as custom functions from Google Sheets:

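// Get the HTTP status code for a URL without following redirects,
// so that 3xx responses are reported instead of their final destination.
function getStatusCode(url) {
  // muteHttpExceptions returns 4xx/5xx responses instead of throwing;
  // followRedirects: false reports redirect codes as-is.
  var options = {
    muteHttpExceptions: true,
    followRedirects: false
  };
  try {
    return UrlFetchApp.fetch(url, options).getResponseCode().toString();
  } catch (error) {
    // Reached for DNS failures, invalid URLs, timeouts, and similar errors.
    var match = error.toString().match(/ returned code (\d\d\d)\./);
    return match ? match[1] : 'Error: ' + error.message;
  }
}

// Fetch a page and return the first capture group of a regex match.
// Useful when a sheet runs into the limits of IMPORTXML.
function importRegex(url, regexInput) {
  var output = '';
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  var html = response.getContentText();
  if (html.length && regexInput.length) {
    var match = html.match(new RegExp(regexInput, 'i'));
    // Guard against a missing match, which would otherwise throw a TypeError.
    if (match && match[1]) {
      output = match[1];
    }
  }
  Utilities.sleep(1000); // Pause between calls to stay within rate limits.
  return unescapeHTML(output);
}

// Minimal helper (an assumption, not a built-in Apps Script function):
// decodes only the most common HTML entities.
function unescapeHTML(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}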

Then, use RegEx redirects if you need to bulk redirect multiple source URLs to the same destination, or an .htaccess file for smaller-scale redirects on Apache servers. If your site runs on WordPress, consider using a dedicated redirect management plugin or your SEO plugin’s redirect feature. Finally, if you are permanently removing a page, serve a 410 status code to indicate that it is gone; this can help Google remove it from the index more quickly. Also remove internal links to the removed page so that users and crawlers do not keep hitting a dead URL.
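To make the idea of a RegEx redirect concrete, the sketch below maps many source paths to one destination with a single pattern and marks a removed section as 410. The rule list, the paths, and the `resolveRedirect` function are invented for illustration; real rules live in your server or plugin configuration, whose syntax will differ.

```javascript
// Hypothetical rules, checked in order. 301 = permanent redirect, 410 = gone.
const redirectRules = [
  { pattern: /^\/blog\/2019\/.*$/, status: 301, target: '/blog/archive/' },
  { pattern: /^\/products\/discontinued-[a-z0-9-]+\/?$/, status: 410, target: null }
];

// Return the response the first matching rule would produce, or null if none match.
function resolveRedirect(path) {
  for (const rule of redirectRules) {
    if (rule.pattern.test(path)) {
      return { status: rule.status, location: rule.target };
    }
  }
  return null;
}

console.log(resolveRedirect('/blog/2019/seo-tips'));
console.log(resolveRedirect('/products/discontinued-widget-2/'));
console.log(resolveRedirect('/about'));
```

Testing patterns against a list of real URLs before deploying them helps catch rules that are too broad and would redirect pages you meant to keep.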

Canonicals

Even if you don’t have multiple parameter-based URLs for each page, different versions of your pages (https, http, www, .html, etc.) can quickly add up. That’s where the rel=canonical tag comes in, allowing you to manage duplicate content by specifying the canonical, or preferred, version of your page. This tells Google which URL to treat as the original and to consolidate ranking signals there, so your page won’t be disadvantaged by duplicates.

If you are using a website builder like Wix or Squarespace, the platform might automatically add canonical tags with the clean URL.
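As a rough illustration of how URL variants collapse into one canonical URL, the sketch below normalizes the protocol, host, tracking parameters, and a trailing index.html, then builds the matching tag. The preferred host `www.example.com`, the list of tracking parameters, and both functions are assumptions chosen for the example; your site's preferred version may differ.

```javascript
// Assumed preferences for this example only.
const PREFERRED_HOST = 'www.example.com';
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign'];

// Normalize a URL variant to the preferred canonical form.
function toCanonical(rawUrl) {
  const url = new URL(rawUrl);
  url.protocol = 'https:';
  url.hostname = PREFERRED_HOST;
  url.hash = '';
  TRACKING_PARAMS.forEach((name) => url.searchParams.delete(name));
  url.pathname = url.pathname.replace(/\/index\.html$/, '/');
  return url.toString();
}

// Build the tag that belongs in the <head> of every variant of the page.
function canonicalTag(rawUrl) {
  return '<link rel="canonical" href="' + toCanonical(rawUrl) + '">';
}

console.log(canonicalTag('http://example.com/shoes/index.html?utm_source=newsletter'));
```

Every variant of a page should output the same tag, including the canonical URL itself, which points to itself.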

robots.txt

The robots.txt file, which implements the robots exclusion protocol (also called the robots exclusion standard), is a text file that tells search engine crawlers which pages they may and may not crawl. You can see the robots.txt file for any website by adding /robots.txt to the end of the domain, for example, https://www.datachai.com/robots.txt.

Search engines check the robots.txt file before crawling a site. Pages disallowed in robots.txt will not be crawled, but they can still be indexed if other pages link to them. To fully prevent a page from appearing in search results, use a noindex meta tag or HTTP header instead of relying solely on robots.txt.
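The crawl check that happens before a page is fetched can be sketched as follows. The parser is deliberately simplified (prefix matching only, no wildcards, and only the `User-agent: *` group) and is not a substitute for a real robots.txt library; the sample file and the function names are hypothetical.

```javascript
// Sample robots.txt contents (hypothetical).
const robotsTxt = [
  'User-agent: *',
  'Disallow: /admin/',
  'Disallow: /cart',
  'Allow: /admin/help',
  '',
  'Sitemap: https://example.com/sitemap.xml'
].join('\n');

// Collect Allow/Disallow rules from the "User-agent: *" group only.
function parseRules(text) {
  const rules = [];
  let inWildcardGroup = false;
  for (const line of text.split('\n')) {
    const clean = line.split('#')[0].trim(); // drop comments
    const idx = clean.indexOf(':');
    if (idx === -1) continue;
    const field = clean.slice(0, idx).trim().toLowerCase();
    const value = clean.slice(idx + 1).trim();
    if (field === 'user-agent') {
      inWildcardGroup = value === '*';
    } else if (inWildcardGroup && (field === 'allow' || field === 'disallow') && value) {
      rules.push({ allow: field === 'allow', prefix: value });
    }
  }
  return rules;
}

// The longest matching prefix wins; with no match, crawling is allowed.
function isCrawlAllowed(path, rules) {
  let best = null;
  for (const rule of rules) {
    if (path.startsWith(rule.prefix) && (!best || rule.prefix.length > best.prefix.length)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

const robotsRules = parseRules(robotsTxt);
console.log(isCrawlAllowed('/admin/settings', robotsRules));
console.log(isCrawlAllowed('/admin/help/faq', robotsRules));
```

Note that a `false` result only means the page will not be fetched. To keep a page out of the index, it has to remain crawlable and carry `<meta name="robots" content="noindex">` or an `X-Robots-Tag: noindex` response header, because a crawler that is blocked by robots.txt never sees the noindex directive.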

Crawl Budget

Crawl budget refers to the amount of crawling resources Google allocates to a website within a given timeframe. For most small and medium-sized websites, crawl budget is not a concern. However, on very large websites, inefficient crawling can delay the discovery and recrawling of important pages.

Performing regular log file analysis can provide insights about how Googlebot (and other web crawlers and users) are crawling your website, giving you the necessary information to optimize the crawl budget.

JavaScript SEO

JavaScript is great for creating animations, interactive forms, and content elements that respond to user actions, making websites more dynamic, engaging and user-friendly. However, websites that have been developed with JavaScript frameworks such as React, Angular, or Vue.js face unique SEO challenges. These days, almost all websites use JavaScript in some form, making JavaScript SEO an essential component of technical SEO.

What is JavaScript SEO?

JavaScript SEO focuses on ensuring that websites built with JavaScript are easily crawled, understood, and indexed by search engines. JavaScript itself does not inherently hurt SEO. In fact, it’s often used to make websites more user-friendly which is a good thing for SEO. The problem arises with client-side rendering, where browsers use JavaScript to dynamically load content, enabling rich user-interactivity but potentially slowing down initial load times and negatively affecting SEO.

How to Implement SEO-Friendly JavaScript

In 2018, Google announced dynamic rendering, a technique where you switch between client-side rendered content and pre-rendered content for certain user agents, allowing you to deliver the full client-side rendered experience to users while getting as much content as possible to crawlers like Googlebot.

However, Google has since updated its documentation to clarify that dynamic rendering is a workaround and not a long-term solution for problems with JavaScript-generated content. Google recommends using server-side rendering, static rendering, or hydration instead.

Core Web Vitals

Core Web Vitals are a set of metrics developed by Google to measure key aspects of real-world user experience on a webpage. They focus on loading performance, interactivity, and visual stability.

Core Web Vitals are part of Google’s Page Experience signals and can help identify opportunities to improve usability and site performance. While page experience is considered in Google’s ranking systems, content relevance and quality remain much stronger ranking factors.

There are 3 core web vitals:

  • Largest Contentful Paint (LCP)
  • Interaction to Next Paint (INP)
  • Cumulative Layout Shift (CLS)

The goal of Core Web Vitals is to provide standardized metrics that help site owners improve the experience users have when interacting with a website.

Largest Contentful Paint (LCP)

Largest Contentful Paint (LCP) measures the time it takes for the largest visible content element within the viewport to render. This is typically a large image, video poster image, or block of text and serves as an indicator of perceived page load speed.

Lower (faster) values are better. An LCP of 2.5 seconds or less is considered good, and an LCP above 4 seconds is considered poor; values in between need improvement. Note that if a larger text block or image element renders while the page is still loading, that newer element is used to measure LCP.

LCP is one of the more difficult core web vitals to troubleshoot because there are many factors that could cause slow load speed. Common causes of poor LCP include slow server response times, render-blocking CSS or JavaScript, unoptimized images, and excessive client-side rendering.

Interaction to Next Paint (INP)

Interaction to Next Paint (INP) measures a page’s overall responsiveness to user interactions. It evaluates how quickly a page responds to actions such as clicks, taps, and keyboard inputs throughout a user’s visit.

Unlike the retired First Input Delay (FID) metric, which measured only the first interaction, INP considers interactions across the entire page lifecycle and provides a more complete picture of responsiveness. An INP of 200 ms or less is considered good, while an INP above 500 ms is considered poor.

High INP scores are often caused by excessive JavaScript execution, long-running tasks on the main thread, large third-party scripts, and complex client-side rendering.

Cumulative Layout Shift (CLS)

Cumulative Layout Shift (CLS) measures the visual stability of a webpage by tracking unexpected layout shifts that occur while the page is loading or being used.

Unexpected shifts can occur when images, advertisements, embeds, or dynamically injected content cause existing page elements to move.

A CLS score of 0.1 or less is considered good, and a score above 0.25 is considered poor. Common causes of poor CLS include images or videos without specified dimensions, dynamically inserted content, web fonts that cause reflow, and advertisements that resize after loading.

How to Measure Core Web Vitals

Core web vitals are incorporated into many Google tools that you probably already use, such as Search Console, Lighthouse, and PageSpeed Insights. In addition, a Chrome extension called Web Vitals is now available to measure the core web vitals in real time.

Because user experiences vary based on device capabilities, network conditions, and geographic location, performance metrics may differ significantly between users.

Google evaluates Core Web Vitals using the 75th percentile of real user experiences collected through the Chrome User Experience Report (CrUX). This and other concepts are discussed in an episode of Search Off the Record.
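To show why the 75th percentile matters, the sketch below computes it for a set of samples and rates the result against the thresholds given in the metric sections above. The sample values and function names are hypothetical, the nearest-rank percentile method is an assumption (CrUX may aggregate differently), and "needs improvement" is Google's label for the band between good and poor.

```javascript
// Thresholds from the metric sections: good at or below, poor above.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 }, // milliseconds
  INP: { good: 200, poor: 500 },   // milliseconds
  CLS: { good: 0.1, poor: 0.25 }   // unitless score
};

// Return the p-th percentile (0-100) using the nearest-rank method.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Rate a metric value as good, needs improvement, or poor.
function rateMetric(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error('Unknown metric: ' + name);
  if (value <= t.good) return 'good';
  if (value > t.poor) return 'poor';
  return 'needs improvement';
}

// Hypothetical LCP samples in milliseconds from eight page views.
const lcpSamples = [1200, 1800, 2100, 2300, 2600, 3100, 3900, 5200];
const lcpP75 = percentile(lcpSamples, 75);
console.log(lcpP75, rateMetric('LCP', lcpP75));
```

In this sample, half of the page views load in under 2.5 seconds, yet the 75th percentile is 3100 ms, so the page would not be rated good. Slow experiences for a minority of users can therefore decide the assessment.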

On the topic of page speed, it’s also best practice to adopt a fast DNS provider, minimize HTTP requests by reducing CSS, scripts, and plugins, and reduce page weight by optimizing images and cleaning up code, especially code needed to render the first view.

Core Web Vitals are not the only user experience metrics to focus on. Other web vitals, such as Total Blocking Time (TBT), First Contentful Paint (FCP), Speed Index (SI), and Time to Interactive (TTI), are non-core web vitals. Google revises the web vitals as its understanding of user experience improves, as it did when INP replaced FID.

Google primarily uses mobile-first indexing, meaning the mobile version of a page is generally used for indexing and ranking purposes. As a result, mobile usability and performance remain important considerations during a technical SEO audit. Responsive design, fast loading times, stable layouts, and accessible navigation all contribute to a better user experience across devices. AMP (Accelerated Mobile Pages) is no longer required for Top Stories eligibility and is generally unnecessary for most websites. Focus on overall performance and user experience instead.

Website Security

Securing your website is first and foremost about protecting sensitive data and preventing cyberattacks, but did you know that it’s also an important factor in SEO strategy? Search engines like Google prioritize user experience, and site security is one of the key elements of a positive user experience.

SSL/TLS and HTTPS

SSL/TLS is the encryption technology behind HTTPS, which creates a secure connection between a web server and a browser. It’s easy to identify a secure website because the URL begins with https:// rather than http://.

In 2014, Google announced its “https everywhere” initiative and confirmed that HTTPS would be used as a ranking signal. While HTTPS is considered a relatively lightweight ranking factor, it has become a standard expectation for both users and search engines.

Google Chrome now displays warnings when users visit non-secure websites. These days, most website builders such as Wix include HTTPS by default. If not, you should install and maintain a valid SSL/TLS certificate on your website.

In November 2025, the Google security team announced that Chrome will make HTTPS the default by October 2026, meaning that Chrome will ask for the user’s permission before loading a non-secure site.

Web Application Firewall (WAF)

A Web Application Firewall (WAF) is a security solution that acts as a barrier between your website and malicious traffic. WAFs work by analyzing web traffic, identifying potential threats, and blocking potentially harmful requests that could exploit vulnerabilities in your website’s code or server configuration, before they reach the web server. Most modern WAF solutions like Cloudflare are available as cloud-based services, and can be easily integrated with your website. There are several ways that WAFs can enhance your website security, from an SEO perspective.

Protect Against Cyber Attacks and Threats

WAFs are designed to protect your website from a variety of cyber threats such as SQL injection, cross-site scripting (XSS), distributed denial-of-service (DDoS) attacks, and other types of Layer 7 attacks. Websites that are regularly attacked are at risk of slow response times and downtime, which can hurt SEO because search engines and users cannot reliably access the content. By filtering out malicious requests, a WAF helps keep your website fast and available, which supports both user experience and search visibility.

Although web crawlers from search engines are essential for indexing, not all bots are friendly. Malicious bots can scrape your content, flood your site with fake traffic, or try to access sensitive data. This wastes server resources and, more importantly, could lead to security vulnerabilities. With a WAF, you can set up specific rules to block or challenge suspicious bots, ensuring that only legitimate users and crawlers can access your site, thus protecting both your website’s integrity and SEO rankings.

Enhance Page Speed and Mobile Optimization

A lesser-known benefit of using a WAF is that it can also boost site performance. Some WAFs include features like rate limiting, bot filtering, and traffic caching, all of which can reduce the load on your server and speed up your website. Since page speed is a ranking factor for Google, having a WAF that optimizes traffic and blocks unnecessary requests can lead to better performance and improved user experience.

Trust Signals

If Google detects malware or other threats on your website, it may flag the site as unsafe, leading to a drop in clicks and search visibility. WAFs can help prevent this by detecting and blocking malicious traffic before it reaches your site.

Data protection regulations such as GDPR require websites that handle personal data to maintain an appropriate level of security. By using a WAF, you demonstrate your commitment to maintaining a safe online environment for your users. Websites with strong security measures are more likely to be trusted by users, which can lead to higher engagement rates, longer sessions, and ultimately better SEO performance.

Schema

Schema, also called structured data markup, helps search engines understand your content and makes pages eligible for rich snippets in search results. This allows details like star ratings, product prices, or event dates to be displayed directly in the SERP. Schema by itself is not a direct ranking factor, but it is recommended by Google and can indirectly help by improving click-through rates and page views.

An FAQ schema example is shown below:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What kinds of companies have you worked with in the past?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "I have worked with companies across B2B technology, marketing and advertising, healthcare, lifestyle, and other industries, including both international and local businesses."
    }
  },{
    "@type": "Question",
    "name": "What size websites have you worked on for SEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "I’ve worked on global multilingual websites with over 3 million monthly visitors, as well as smaller businesses and startups with a few hundred monthly visitors."
    }
  }]
}

Add Schema via WordPress Plugin

If your website is hosted on WordPress, use the Schema plugin to add structured markup to your pages. This plugin uses JSON-LD, which is recommended by Google and also supported by Bing.

Add Schema Manually

If your site isn’t built on WordPress or you prefer not to rely on a plugin, you can add schema manually with a few more steps. Schema is usually added to the page header, although it’s possible to add it to the body or footer as well. Some recent WordPress themes include specific text blocks for adding schema to the body.

While schema can be deployed through Google Tag Manager, implementing structured data directly in the page source is generally preferred because it is easier to maintain, validate, and troubleshoot. Structured data deployed through Google Tag Manager is injected by JavaScript from within a container, so Google has to render the page before it can read the markup, which can make processing slower and less reliable.

To add schema manually, first, use a tool like MERKLE to generate the baseline markup. Although this is already fine to use as is, you can also paste the baseline markup in a text editor and continue to edit and customize the structured data for each page.

If you are using Sublime Text, go to View > Syntax > JavaScript > JSON to set your syntax appropriately.

Finally, insert additional properties that were not available on MERKLE as needed.
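If you maintain markup for many pages, generating it from data can be less error-prone than editing JSON by hand. The sketch below is one possible approach, not the output of any particular generator: the `buildFaqSchema` and `toScriptTag` functions and the sample question are hypothetical.

```javascript
// Build an FAQPage object from question/answer pairs.
function buildFaqSchema(faqs) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map((faq) => ({
      '@type': 'Question',
      name: faq.question,
      acceptedAnswer: { '@type': 'Answer', text: faq.answer }
    }))
  };
}

// Wrap the JSON-LD in the script tag that goes in the page source.
// "<" is written as \u003c so that HTML in an answer cannot close the script tag early.
function toScriptTag(schema) {
  const json = JSON.stringify(schema, null, 2).replace(/</g, '\\u003c');
  return '<script type="application/ld+json">\n' + json + '\n</script>';
}

// Placeholder content for illustration.
const faqSchema = buildFaqSchema([
  {
    question: 'Do you offer technical SEO audits?',
    answer: "Yes. See <a href='https://example.com/audit'>the audit page</a>."
  }
]);
const faqScriptTag = toScriptTag(faqSchema);
console.log(faqScriptTag);
```

The resulting string can be pasted into the page header and then checked with a validation tool before publishing.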

Add HTML Strings to Schema

Basic HTML strings can be added to schema, for example, if you’d like to include a bulleted list or hyperlink. Because JSON string values are delimited by double quotes, any double quotes inside the HTML must either be escaped with a backslash (\") or replaced with single quotes.
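Both ways of handling quotes can be seen in a short sketch. The answer text and URL are placeholders; `JSON.stringify` is the standard JavaScript function and takes care of the escaping for you.

```javascript
// An answer containing HTML with double-quoted attributes (placeholder content).
const answerHtml = '<p>Read the <a href="https://example.com/guide">full guide</a>.</p>';

// Option 1: let JSON.stringify escape the double quotes with backslashes.
const escapedAnswer = JSON.stringify({ '@type': 'Answer', text: answerHtml });

// Option 2: swap double quotes for single quotes before pasting the HTML by hand.
const singleQuotedAnswer = answerHtml.replace(/"/g, "'");

console.log(escapedAnswer);
console.log(singleQuotedAnswer);
```

Either result is valid inside a JSON string; unescaped double quotes would end the string early and make the whole block invalid.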

Google Search displays the following HTML tags; all other tags are ignored:
<h1> through <h6>, <br>, <ol>, <ul>, <li>, <a>, <p>, <div>, <b>, <strong>, <i>, and <em>.

Validate Schema

Use Google’s Rich Results Test to check that your schema markup is being read properly, and the Schema Markup Validator (the successor to Google’s Structured Data Testing Tool) to see all of the structured data on the page. The final step is to request indexing for the page where you added markup, via Google Search Console. Within a few days, you should see your markup under Enhancements in the sidebar.

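Note that in June 2021, Google limited FAQ rich results to a maximum of 2 per snippet, so your snippet real estate may be a bit smaller. If you have 3 or more FAQs marked up, Google will show the 2 that are most relevant to the search query. Since August 2023, Google has also shown FAQ rich results mainly for well-known, authoritative government and health websites, so other sites should not expect the markup to produce a rich result.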

Log File Analysis

The log file is your website’s record of every request made to your server. Each entry includes important information such as the URL of the requested page, the HTTP status code, the timestamp, the user agent making the request, the request method (GET/POST), the client IP address, and the referrer.

Log file analysis provides insights into how Googlebot (and other web crawlers and users) are crawling your website. The log file analysis will help you answer important technical questions such as:

  • How frequently is Googlebot crawling your site?
  • How is the crawl budget being allocated?
  • How often are new and updated pages being crawled?

You can identify where the crawl budget is being used inefficiently, such as unnecessarily crawling static or irrelevant pages, and make improvements accordingly.
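The questions above can be approached with a short script. The sketch below assumes the server writes the widely used "combined" log format; the sample lines, IP addresses, and function names are invented for illustration, and a real log may use a different layout.

```javascript
// Combined log format: ip - - [time] "METHOD path protocol" status bytes "referrer" "user agent"
const LOG_PATTERN = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"$/;

// Parse one log line into its fields, or return null if it does not match.
function parseLine(line) {
  const m = LOG_PATTERN.exec(line.trim());
  if (!m) return null;
  return {
    ip: m[1], time: m[2], method: m[3], path: m[4],
    status: Number(m[5]), referrer: m[6], userAgent: m[7]
  };
}

// Count requests per path whose user agent claims to be Googlebot.
function countGooglebotHits(lines) {
  const hits = {};
  for (const line of lines) {
    const entry = parseLine(line);
    if (!entry || !/Googlebot/i.test(entry.userAgent)) continue;
    hits[entry.path] = (hits[entry.path] || 0) + 1;
  }
  return hits;
}

// Hypothetical sample entries.
const GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const sampleLog = [
  '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5123 "-" "' + GOOGLEBOT_UA + '"',
  '192.0.2.10 - - [10/Oct/2024:14:02:11 +0000] "GET /blog/ HTTP/1.1" 200 5123 "-" "' + GOOGLEBOT_UA + '"',
  '192.0.2.10 - - [10/Oct/2024:14:05:40 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "' + GOOGLEBOT_UA + '"',
  '203.0.113.7 - - [10/Oct/2024:14:06:02 +0000] "GET /blog/ HTTP/1.1" 200 5123 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
];

const googlebotHits = countGooglebotHits(sampleLog);
console.log(googlebotHits);
```

Keep in mind that the user agent string can be spoofed, so counts based on it alone may include fake Googlebot traffic. Google documents a reverse DNS lookup as the way to verify that a request really came from Googlebot.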

Obtain the Log File

The log file is stored on the web server and can be accessed through your hosting environment. It is commonly found in the following locations, although the exact path depends on the operating system and server configuration:

  • Apache: /var/log/apache2/access.log (Debian/Ubuntu) or /var/log/httpd/access_log (RHEL/CentOS)
  • Nginx: logs/access.log
  • IIS: %SystemDrive%\inetpub\logs\LogFiles

Tools and Software

Convert your .log file to a .csv and analyze it in Microsoft Excel or Google Sheets, or use a dedicated log file analyzer such as the ones from Semrush or Screaming Frog (Log File Analyser). The best log file analyzer will depend on your website and what tools you might already be using for technical SEO.

Limitations

Performing regular log file analysis can be extremely useful for technical SEO, but there are some limitations. Page access and actions that occur via cache memory, proxy servers, and AJAX will not be reflected in the log file. If multiple users access your website with the same IP address, it will be counted as only one user. On the other hand, if one user uses dynamic IP assignment, the log file will show multiple accesses, overestimating the traffic count.

Posted by Rei Wakayama