Why?
Oversimplified: the shallower, simpler and cleaner a URL is, the better.
Every extra https://example.com/directory/and?url_parameter adds another level of depth that can potentially waste a search engine spider's time on your site.
Finding out how deep URLs go is as easy as counting directories and parameters by counting /s and =s.
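For example, take the made-up URL https://example.com/academics/degree/page.aspx?id=1: it contains five /s (two of which belong to https://) and one =, so ignoring the two protocol slashes it sits 4 levels deep.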
Here's how you do that automatically with Excel.
Get A List Of URLs
Get your list of URLs by crawling the site, scraping Google, getting a sitemap: anything goes.
Most of the time, Excel can import just about anything and display it in a nicely structured way.
Like last time, I'm using the sitemap of Allied American University, only I've added a few fake URL query parameters.
Insert Directory Depth Formula
With your URLs in column A, here's the formula I insert to have Excel count directory depth:
=SUM(LEN(A2)-LEN(SUBSTITUTE(A2,"/","")))/LEN("/")+SUM(LEN(A2)-LEN(SUBSTITUTE(A2,"=","")))/LEN("=")-2
My data always has headers, so the first URL is on row 2.
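If you want to see what that one-liner is doing, you could split it into two helper columns instead (just a sketch, still assuming the URL sits in A2): the first counts the slashes and subtracts the two that belong to the protocol, the second counts the equals signs of any query parameters.
=LEN(A2)-LEN(SUBSTITUTE(A2,"/",""))-2
=LEN(A2)-LEN(SUBSTITUTE(A2,"=",""))
Adding those two helper columns together gives the same depth number as the combined formula.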
To repeat the formula for the other URLs, figure out how many rows of URLs there are. An easy way in Excel to jump to the last filled row in a column is to put your cursor in the first cell and press CTRL + SHIFT + down arrow.
Fill in B2:B107 (in my case; you might have more or fewer URLs). This of course presumes your URL depth formula is in column B.
From the Home tab, click Fill and choose Fill Down.
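To make the selection concrete, assuming the last URL really does sit on row 107: typing B2:B107 into the Name Box to the left of the formula bar and pressing Enter selects the whole range in one go, and CTRL + D works as a keyboard shortcut for Fill Down.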
Done
We can now see that /default.aspx is one step away from the domain, or that /academics/degree/concentration/default.aspx is 4 levels deep.
Just as easily, we can see that /about-aau/contact-us.aspx?para1=hop&para2=dop is, due to its URL parameters, also 4 deep.
From here on, we could do things like:
- sort the URLs by depth
- highlight all URLs that are at or beyond an arbitrary depth (see the example below)
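As a quick sketch of that last idea: assuming the depth numbers are in column B and 4 is the cut-off you care about, a conditional formatting rule on the URL column with a formula like
=$B2>=4
would flag every URL that sits 4 or more levels deep. The column letter and the cut-off are just assumptions here, so adjust them to wherever your own data lives.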