Ever wondered if a crucial piece of text or code is present site-wide? Maybe some analytics, tracking, or tag manager code?
Or how about when you need to find old email addresses, specific spelling errors or similar? This is where site-wide custom text search can help. With it you can find answers to questions like "which pages on my site are missing Google Analytics", "how to find old Google Analytics code", or "is Google Tag Manager placed in the right location on all pages".
A1 Website Analyzer
One crawler tool that allows for custom search is our secret super tool A1 Website Analyzer. It can search in the full code of a page using regular expressions. Don't know regular expressions? No worries; if your needs are simple, chances are you can simply write the text you are searching for or use one of the presets. But if you have complex needs, like finding variations of code blocks, regular expressions can be your savior.
Learning the basics of regular expressions is one of the most valuable things you can do as a web developer, or even just as a geek user. Besides helping you find the things you need and run advanced search-and-replace operations, regular expressions are also supported by functions in many code libraries.
If you already know regex or don't care, you can skip right to the search tutorial itself.
Regex
When using regular expressions, it is important to understand that certain characters have special meanings:
- ".+" will match any character one to infinite times.
- ".*" will match any character zero to infinite times.
- ".*?" will match any character until the next part of the regular expression code can match something.
- "\s*" will match any whitespace character zero to infinite times.
- "\s+" will match any whitespace character one to infinite times.
- "\s" will match one whitespace character one time.
- "[0-9a-zA-Z]" will match an English lowercase/uppercase letter or digit one time.
- "[^<]*" will match any character except "<" zero to infinite times.
- "(center|centre)" will match "center" or "centre".
- "(center|centre)?" works like the above, but the group is optional: matching continues with the next part of the regular expression even if neither word is found.
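To see a few of these building blocks in action, here is a minimal sketch using Python's built-in "re" module, with whitespace written as "\s". The sample HTML string is made up for illustration, and regex flavors differ slightly between tools, so treat this as a way to experiment rather than as a description of how A1 Website Analyzer evaluates patterns.

```python
import re

# Made-up sample markup, used only for illustration.
html = '<p class="x">colour   centre</p>'

# ".*" is greedy: it runs to the LAST ">" in the string.
greedy = re.search(r"<.*>", html).group()

# ".*?" is lazy: it stops as soon as the next part (">") can match.
lazy = re.search(r"<.*?>", html).group()

# "\s+" matches a run of one or more whitespace characters.
spaces = re.search(r"colour(\s+)centre", html).group(1)

# "[^<]*" matches everything up to, but not including, the next "<".
text = re.search(r">([^<]*)<", html).group(1)

# "(center|centre)" matches either spelling.
spelling = re.search(r"(center|centre)", html).group(1)

print(greedy)        # the whole sample string
print(lazy)          # <p class="x">
print(repr(spaces))  # '   '
print(text)          # colour   centre
print(spelling)      # centre
```

The difference between the greedy and lazy results is the one most likely to surprise you when searching HTML, which is why "[^<]*" or ".*?" is usually the safer choice inside tags.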
Say we want to look for occurrences of the following text strings:
- search engine peoples
- Search Engine Peoples
- Search Engine Professionals
This regex can find any and all of these:
(S|s)+earch (E|e)ngine (P|p)(rofessionals|eoples)
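If you want to verify a pattern like this before running a full crawl, one option is to test it against your sample strings in a few lines of Python. This is only a sketch: the last sample string is a hypothetical near-miss added to show a non-match.

```python
import re

pattern = re.compile(r"(S|s)+earch (E|e)ngine (P|p)(rofessionals|eoples)")

samples = [
    "search engine peoples",
    "Search Engine Peoples",
    "Search Engine Professionals",
    "search engine people",  # hypothetical near-miss, not from the list above
]

# fullmatch() requires the whole string to match the pattern.
results = {s: bool(pattern.fullmatch(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The first three strings match and the near-miss does not, because the final group requires either "rofessionals" or "eoples" in full.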
For more information on regular expressions, try these resources:
- Regular-Expressions.info
- The 30 Minute Regex Tutorial
- RegexBuddy. One of my top software tool recommendations. The program helps you create regular expressions, test them, and export them to be used in any kind of code.
Code Search Tutorial
In this demonstration, we'll configure A1 Website Analyzer to search for two types of Google Analytics code throughout all pages it crawls.
We first select the presets "ga_old" and "ga_new".
When you select them in the presets popup, they are automatically added to the dropdown list.
After we run the scan and inspect the results, we make sure to enable the column that shows custom search results.
This column will contain the results. Examples of how to read them:
- Old and new analytics code found in the page: ga_old=1;ga_new=1
- Old analytics code found once in the page: ga_old=1
- Old analytics code found twice in the page: ga_old=2
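If you copy these result values into your own scripts or spreadsheets, they are easy to break apart. The sketch below assumes the format is exactly what the examples show, that is, "name=count" pairs separated by semicolons. The function name parse_search_results is hypothetical and not part of A1 Website Analyzer.

```python
def parse_search_results(cell: str) -> dict:
    """Turn a value like 'ga_old=1;ga_new=1' into {'ga_old': 1, 'ga_new': 1}.

    Assumes semicolon-separated name=count pairs, as in the examples above.
    """
    counts = {}
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from an empty cell
        name, _, count = part.partition("=")
        counts[name] = int(count)
    return counts


print(parse_search_results("ga_old=1;ga_new=1"))  # {'ga_old': 1, 'ga_new': 1}
print(parse_search_results("ga_old=2"))           # {'ga_old': 2}
```

A page with an empty result comes back as an empty dictionary, which makes it simple to list pages where a search name such as "ga_new" is missing.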
Taking It Further
Now is the time to insert your own regular expression search strings. Remember that from the presets you can see the format in A1 Website Analyzer is:
"name=expression"
This is because, besides the regular expression itself, A1 Website Analyzer also needs a "name" it can use when showing the site search results.
When you have written your new regular expression, e.g.
SEPMISSPELL=(S|s)+earch (E|e)ngine (P|p)(rofessionals|eoples)
you can add it using the [+] button:
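To make the "name=expression" format concrete, here is a rough Python sketch that splits such a string and counts matches in a page. It only imitates the idea; it is not how A1 Website Analyzer is implemented, and the function name count_matches and the sample page text are made up for illustration.

```python
import re


def count_matches(search: str, page_source: str):
    """Split 'name=expression' and count how often the expression matches."""
    # Split at the FIRST "=" only, since the expression itself may contain "=".
    name, _, expression = search.partition("=")
    count = sum(1 for _ in re.finditer(expression, page_source))
    return name, count


search = "SEPMISSPELL=(S|s)+earch (E|e)ngine (P|p)(rofessionals|eoples)"
page = "<p>Search Engine Peoples and search engine professionals</p>"

name, count = count_matches(search, page)
print(f"{name}={count}")  # SEPMISSPELL=2
```

Splitting at the first "=" matters for searches that contain attribute syntax such as rel="nofollow", where later "=" characters belong to the expression.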
Example Searches
Some useful examples of searches you can add with the [+] button:
Google Tag Manager Code
To check whether Google Tag Manager is used in a page:
gt=<iframe src="//www.googletagmanager.com/
Nofollow Present In Code
To check whether "nofollow" is used in any page links:
anf=<a [^>]*?rel="?nofollow"?
(Note: A1 Website Analyzer already has functionality to show links found on a page - this includes information such as "nofollow")
Frame Tag Used In Code
To check whether "frame" or "iframe" tags are used in a page:
fra=<(iframe|frame)(\s|>)
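Before adding searches like these to a crawl, you can try them against a small piece of HTML. The sketch below runs the three example searches (with whitespace written as "\s" in the frame pattern) over a made-up page and prints a summary in a similar "name=count" style. The sample markup, including the GTM-XXXX placeholder ID, is an assumption for illustration only.

```python
import re

searches = [
    r'gt=<iframe src="//www.googletagmanager.com/',
    r'anf=<a [^>]*?rel="?nofollow"?',
    r'fra=<(iframe|frame)(\s|>)',
]

# Made-up sample page; GTM-XXXX is a placeholder container ID.
page = (
    '<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-XXXX">'
    '</iframe></noscript>'
    '<a href="/a" rel="nofollow">A</a>'
    '<a href="/b">B</a>'
    '<iframe src="/widget"></iframe>'
)

counts = {}
for search in searches:
    # Split at the first "=": name on the left, regular expression on the right.
    name, _, expression = search.partition("=")
    counts[name] = sum(1 for _ in re.finditer(expression, page))

# Only list searches that matched at least once.
summary = ";".join(f"{name}={count}" for name, count in counts.items() if count)
print(summary)  # gt=1;anf=1;fra=2
```

Note that the frame search counts the Google Tag Manager iframe as well, so a page with only the tag manager snippet would still report one frame.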
Having learned the above, you are now ready to crawl websites and run site-wide custom searches for just about anything!
* Includes images from CyberHades, Pleuntje