Getting Started

When starting a content inventory and audit, many people look for examples (templates) on which to base their own work. This isn’t a new process, after all: many others have done inventories and audits before and have shared their methodologies and templates.

Before inexpensive (or free) crawling tools that automate the inventory process became widely available, creating a comprehensive inventory could be an intensive, tedious process. You had to build a template by hand and then consolidate into it the information you gathered manually or pieced together from various tools and reports.

Now that you can begin your content inventory and audit using an automated tool, you already have a number of the standard data points that go into an inventory and you don’t need to go out and find someone else’s spreadsheet template. This standard set of information is typically exportable as a spreadsheet. While there are a few scenarios where you may only need the numbers—scoping a project or a content migration—if you are using the inventory sheet as the foundation of an audit, you can use that sheet and then supplement it with the additional site- or project-specific information you wish to track.

If you are inclined to go old-school and manually generate your inventory, either because you don’t have access to a crawler, because you can generate a file list from your content management system, or because your site is small enough that it isn’t an enormous effort (and you want to get very hands-on with your site’s content and structure), you will still start your project by gathering those basic data points. The difference is that you’ll need to gather them yourself and copy the relevant information into your sheet.

I’m referring to using a spreadsheet for this task; there may well be other tools you choose to use, but I have found that a spreadsheet works fine for organizing, structuring, and filtering your inventory.

Inventory Data

The basic data in an inventory includes:

  • URL

  • File type (for example, HTML, PDF, DOC, or JPG)

  • File size (not all tools include this and it isn’t strictly necessary unless you’re also evaluating site performance)

  • Date (this may or may not be helpful, depending on whether there is an accurate date in the header of your HTML page, which is where a crawler will look)

  • Meta title

  • Meta description

  • H1 tag text (some crawlers will also include H2s)

  • Word count

  • Count of images on each page

  • Count of all videos and audio files on each page

  • Count of all documents on each page

  • Count of links into and out of each page

Note that although an inventory is thought of as a purely quantitative exercise, by reviewing this basic information about each page and looking for patterns, you will already be able to do some of the analysis that’s typically part of an audit.
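If you are building the sheet yourself, it can help to think of each row as a record with one field per data point. The following is a minimal sketch in Python, not the export format of any particular crawler: the class name InventoryRow, the field names, and the example URL are my own assumptions, chosen to mirror the list above.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class InventoryRow:
    """One page or file in the inventory; field names are illustrative."""
    url: str
    file_type: str = ""
    file_size_bytes: int = 0
    date: str = ""
    meta_title: str = ""
    meta_description: str = ""
    h1: str = ""
    word_count: int = 0
    image_count: int = 0
    media_count: int = 0      # videos and audio files
    document_count: int = 0
    links_in: int = 0
    links_out: int = 0


def to_csv(rows):
    """Serialize inventory rows to CSV text, one column per data point."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical sample row, for illustration only.
sample = [InventoryRow(url="https://example.com/about", file_type="html", word_count=420)]
csv_text = to_csv(sample)
```

The resulting text can be saved to a .csv file and opened in any spreadsheet application, where you can add your own columns alongside the standard ones.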

Analyzing Your Inventory

Some of the information you can glean from your inventory data includes:

URL

Looking at the URL structure allows you to evaluate several things:

  • Length and clarity: For both human readability and search engine optimization (SEO), shorter URLs are better. Very long URLs may not be rendered by some browsers and they certainly won’t be memorable to a person who may later want to type one in directly. It’s also best practice to use hyphens (rather than underscores or blank spaces) between words in URLs. A quick look at the URL list will help you identify whether your URLs follow this practice.

  • Parameters: URLs that are composed of session IDs or other parameters give the user no information about the content likely to exist at that location. Multiple parameters may also affect whether search engines like Google crawl a page, so identifying and addressing poorly constructed URLs is not only a favor to your human users but also an opportunity to improve your site’s ranking.

  • Navigational structure: It is common to use a content inventory as the basis of a hierarchical site map. If the URLs represent a logical directory structure, you have a great start at creating that map.
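If your URL list is long, a short script can do the first pass over these checks for you. This is a rough sketch using Python's standard urllib.parse module. The 100-character threshold is an assumption of mine (no specific limit is given here), and the function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_URL_LENGTH = 100  # assumed threshold; adjust to your own guideline


def review_url(url):
    """Return a list of short notes describing possible problems with a URL."""
    parts = urlsplit(url)
    notes = []
    if len(url) > MAX_URL_LENGTH:
        notes.append("long")
    # Hyphens are preferred over underscores or (encoded) blank spaces.
    if "_" in parts.path or " " in parts.path or "%20" in parts.path:
        notes.append("underscores or spaces")
    # More than one query parameter is worth a closer look.
    if len(parse_qsl(parts.query, keep_blank_values=True)) > 1:
        notes.append("multiple parameters")
    return notes


def path_depth(url):
    """Count the path segments, a rough proxy for depth in the site hierarchy."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])
```

For example, review_url("https://example.com/about_us?sid=1&ref=2") returns both the "underscores or spaces" and "multiple parameters" notes, and sorting your rows by path_depth gives you a quick view of how the directory structure might translate into a site map.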

Type

The type, or format, of the content—for example, HTML, video, image—is another basic piece of information to identify the overall structure and content mix of your site. Does your site include a large number of PDFs? You may want to flag those for review and/or incorporation into the site in a more usable (and indexable!) way. Are there videos? Another content type to review for relevance and currency.
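To see the content mix at a glance, you can tally the rows by type. The sketch below assumes each row is a dictionary with a "file_type" key, which is my own naming; your crawler's export may label the column differently.

```python
from collections import Counter


def content_mix(rows):
    """Map each file type to (count, percentage of all rows), most common first."""
    # Normalize "HTML", "html", and ".html" to the same label.
    counts = Counter(row["file_type"].lower().lstrip(".") for row in rows)
    total = sum(counts.values())
    return {
        file_type: (count, round(100 * count / total, 1))
        for file_type, count in counts.most_common()
    }
```

A large share of PDFs in the result is your cue to flag those files for review.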

File size

This data may interest your web management team, who care about the size of pages and their effect on load time and performance.

Metadata: Title, Keywords, Description

Although meta keywords are no longer used for search ranking, the title and description are still very important. The title appears in the browser as well as in search results, so it’s important that it be unique and descriptive (get those keywords in there!) without being too long. Best practice is 70 or fewer characters.

The description also appears in search engine results, so you will want to review it to see how well it actually represents the content on the page and is engaging or informative enough to entice readers to click through to the page.
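The mechanical parts of this review (length, uniqueness, and missing values) are easy to automate, leaving you to judge quality by reading. Here is a minimal sketch, assuming each row is a dictionary with "url", "meta_title", and "meta_description" keys. Those key names and the function name are hypothetical.

```python
from collections import Counter

MAX_TITLE_LENGTH = 70  # the best-practice figure mentioned above


def review_metadata(rows):
    """Return {url: [notes]} for rows whose title or description needs attention."""
    # Count titles case-insensitively so "Home" and "home" count as duplicates.
    title_counts = Counter(row["meta_title"].strip().lower() for row in rows)
    findings = {}
    for row in rows:
        title = row["meta_title"].strip()
        notes = []
        if not title:
            notes.append("missing title")
        else:
            if len(title) > MAX_TITLE_LENGTH:
                notes.append("title too long")
            if title_counts[title.lower()] > 1:
                notes.append("duplicate title")
        if not row["meta_description"].strip():
            notes.append("missing description")
        if notes:
            findings[row["url"]] = notes
    return findings
```

Whether a description actually represents the page and entices a click is still a judgment call that a script can't make for you.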

Links In/Out

Knowing the links into and out of each page tells you, for example, all the places where you might need to update a link if a page moves or is deleted. You may also find important pages that have very few links in and that you would want to be cross-linked more broadly.
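If your tool can export the individual links rather than just the counts, you can build a simple lookup of which pages link to which. The sketch below assumes a list of (source, target) pairs. That input format and the function names are assumptions of mine, not a standard export.

```python
from collections import defaultdict


def build_inbound_index(link_pairs):
    """Map each target page to the set of pages that link to it."""
    inbound = defaultdict(set)
    for source, target in link_pairs:
        inbound[target].add(source)
    return inbound


def weakly_linked(pages, inbound, max_links_in=1):
    """Return pages that have max_links_in or fewer distinct pages linking to them."""
    return sorted(page for page in pages if len(inbound.get(page, ())) <= max_links_in)
```

Looking up a page in the index gives you the list of places to update if that page moves or is deleted. The weakly linked list is only a starting point, since you still need to decide which of those pages are important enough to deserve more cross-links.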

Images and documents

Looking at the count of images gives you the chance to do a little number-crunching: you can calculate the ratio of files to pages, or sort to identify pages that have no images at all. Knowing where documents are linked is particularly helpful if you are planning a migration and need to maintain those links.
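The number-crunching here is simple enough to do with spreadsheet formulas, but for completeness, here is the same calculation as a short Python sketch. It assumes each row is a dictionary with "url" and an integer "image_count", both of which are my own column names.

```python
def image_summary(rows):
    """Return (average images per page, list of URLs with no images)."""
    total_images = sum(row["image_count"] for row in rows)
    pages_without_images = [row["url"] for row in rows if row["image_count"] == 0]
    # Guard against an empty inventory to avoid dividing by zero.
    ratio = total_images / len(rows) if rows else 0.0
    return ratio, pages_without_images
```

If you read the rows from a CSV export, remember that the counts arrive as text and need to be converted to integers first.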

Word count

Looking at word count can allow you to quickly spot pages that are either so short that they may not contain enough useful information to warrant being a page or pages that are longer than readability guidelines would suggest. Pages of a similar type with very different word counts may point you to an issue with consistency in terms of the content depth of your pages.
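One way to spot inconsistent depth is to compare each page's word count with the median for pages of the same type. The following sketch assumes you have already added a "content_type" column of your own to each row. The cutoffs of half and double the median are arbitrary assumptions, not readability guidelines.

```python
from collections import defaultdict
from statistics import median


def word_count_outliers(rows, low=0.5, high=2.0):
    """Flag pages far shorter or longer than the median for their content type."""
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["content_type"]].append(row)

    flagged = []
    for content_type, group in by_type.items():
        typical = median(row["word_count"] for row in group)
        for row in group:
            # Skip groups whose median is zero; the ratio test is meaningless there.
            if typical and not (low * typical <= row["word_count"] <= high * typical):
                flagged.append((row["url"], content_type, row["word_count"], typical))
    return flagged
```

Each flagged entry shows the page, its type, its word count, and the median it was compared against, so you can decide whether the difference is a real consistency problem.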

Custom data

In addition to those standard data points, you will probably want to add in your own columns for information specific to your context. These might include your analytics data, content types (for example, article or help topic), content owners, buyer’s journey steps… whatever is relevant to your project. For auditing, you’ll want a column for notes.

Note that if you are creating custom columns in advance of your audit, and especially if you will be sharing the results with others or working collaboratively on the audit, it’s best to create controlled lists of terms for the columns. This helps maintain consistency and enables better analysis once you’ve done your audit.
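Most spreadsheet applications let you enforce a controlled list with a drop-down, but if several people have already filled in a sheet, a quick check can find the values that drifted. This is a minimal sketch. The column names and the terms in CONTROLLED_TERMS are hypothetical examples, so substitute the lists you agree on with your team.

```python
# Hypothetical controlled lists; replace with your own agreed terms.
CONTROLLED_TERMS = {
    "content_type": {"article", "help topic", "landing page"},
    "journey_step": {"awareness", "consideration", "decision"},
}


def find_uncontrolled_values(rows, controlled_terms=CONTROLLED_TERMS):
    """Return (spreadsheet row number, column, value) for values not in the lists."""
    problems = []
    # Start at 2 because row 1 of the spreadsheet holds the column headers.
    for row_number, row in enumerate(rows, start=2):
        for column, allowed in controlled_terms.items():
            value = row.get(column, "").strip().lower()
            # Blank cells are allowed; only non-empty, unknown values are reported.
            if value and value not in allowed:
                problems.append((row_number, column, row[column]))
    return problems
```

The comparison ignores case and surrounding spaces, so "Article" passes while "blog" is reported along with the row where it appears.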

Now that you have your template set up, you’re ready to start auditing.