1. Introducing Regular Expressions
To get a head start on introducing regular expressions, I'll start with an example. It's one that you've experienced hundreds of times. When you enter customer data online, many web forms ask you for an email address. To avoid an incorrectly typed address, an immediate validation makes sense. One way would be to split the string into parts (before and after the @ character) and analyze the positions of the dots and the number of characters after the last dot (that's the top-level domain).
After a few ifs and loops, you're almost done. Or you simply use a regular expression:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+\.([a-zA-Z]{2,3})$
Did you get that? If this is your first time with regular expressions, it is probably hard to read.
Regular expressions are a form of pattern recognition in regular text (read: string). Regular expressions compare the pattern with the text. The whole expression, encapsulated as an object in a script or programming language, would return either true or false. The result tells the caller whether the comparison was successful or not. Hence, the expression can be better understood if you see it in the context of an actual language. Because this book is dedicated to JavaScript, the usage in that language would look like this:
    var email = "joerg@krause.net";

    console.log(check(email));

    // Returns true if the text looks like an email address, false otherwise.
    function check(email) {
      // match() returns an array on success and null on failure.
      if (email.match(/^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+\.([a-zA-Z]{2,3})$/)) {
        return true;
      } else {
        return false;
      }
    }
Here the expression is written as a literal, with slashes as the boundary (/expression/), and the comparison is made by the match function that each string object provides. Be aware that the slashes must not be put in quotes. It's a literal that creates an object.
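As a small variation on the listing above, the following sketch uses the test function of the regular expression object instead of match. It returns true or false directly, so the if/else is not needed. The names emailPattern and isEmail are chosen here for illustration only.

```javascript
// The slashes create a RegExp object; no quotes are involved.
const emailPattern = /^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+\.([a-zA-Z]{2,3})$/;

// test() answers with a plain boolean.
function isEmail(text) {
  return emailPattern.test(text);
}

console.log(emailPattern instanceof RegExp); // true
console.log(isEmail("joerg@krause.net"));    // true
console.log(isEmail("joerg(at)krause.net")); // false

// match() returns an array (or null when nothing matches), which is
// why the original listing turns the result into true or false.
const result = "joerg@krause.net".match(emailPattern);
console.log(result[0]); // joerg@krause.net (the whole match)
console.log(result[2]); // net (the content of the second group)
```

Which of the two functions to use is mostly a matter of taste in this case. If only a yes/no answer is needed, test keeps the code shorter.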
A Test Environment
For the first steps, it is helpful to use a JavaScript test console that is available online. I recommend using
Figure 1-1. The example in Repl.it
REPL
The term REPL is an abbreviation for Read-Eval-Print-Loop. It's a method of working interactively with a script language. Read more about it on
Copy or Scaffold?
I'm going to show many useful expressions in Chapter . This first chapter, however, shows both trivial and non-trivial expressions that are ready to use. Because the expressions are sometimes tricky and hard to read, you can download all examples.
Website
Visit this book's
If you want to become a professional JavaScript developer, regular expressions are a part of your toolset. You should try to understand the expressions completely and start creating your own.
How Does It Work?
You might be curious to know how the preceding expression works.
If you start analyzing such expressions, you'd best start with the extraction of special characters. In this particular expression, these include the following: ^, $, +, *, ?, [], (). All other characters do not possess a special meaning here. Regular characters are a minority. Usually, such patterns use placeholders and descriptive characters more than actual letters in a word.
Here is an overview of the special characters:
^ lets the recognition start at the beginning. If you write ^x, the expression will match the letter x only if it appears as the very first character.
$ lets you define where the pattern ends. The text must end at this position.
* is a placeholder that means the preceding element may appear zero or more times.
+ is a placeholder that means the preceding element must appear one or more times.
? is a placeholder that means the preceding element may appear zero times or once.
[a-z] defines one character out of a group of letters or digits. You can use uppercase letters, lowercase letters, or digits by simply placing them in the brackets, or you can define them as a range, as shown in the example.
() groups characters or strings of characters. You can apply the set operators *, +, and ? to such a group, too.
{} is a repetition marker that defines how often the element before the braces may be repeated. The range is defined by numbers; if the start and the end are given separately, the numbers are separated by a comma ({3,7}).
\ (the backslash) masks metacharacters and special characters so that they no longer possess a special meaning.
. represents exactly one arbitrary character. If you actually need a dot, just write \. instead.
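The special characters from the overview can be tried out one by one. The following sketch pairs a tiny pattern with a sample text and the result I would expect. The patterns and texts are made up for illustration and are not part of the email expression.

```javascript
// Each entry: [pattern, sample text, expected result of pattern.test(text)]
const samples = [
  [/^x/, "xyz", true],          // ^ : x is the very first character
  [/^x/, "yxz", false],         // ^ : x appears, but not at the start
  [/z$/, "xyz", true],          // $ : the text ends with z
  [/^ab*$/, "a", true],         // * : zero b's are fine
  [/^ab+$/, "a", false],        // + : at least one b is required
  [/^ab?$/, "abb", false],      // ? : at most one b is allowed
  [/^[a-z]$/, "q", true],       // [] : one character out of the range
  [/^[a-z]$/, "Q", false],      // [] : uppercase is not in a-z
  [/^(ab)+$/, "ababab", true],  // () : the whole group is repeated
  [/^a{3,7}$/, "aa", false],    // {} : fewer than three a's
  [/^a{3,7}$/, "aaaa", true],   // {} : between three and seven a's
  [/^a.c$/, "abc", true],       // .  : any single character
  [/^a\.c$/, "abc", false],     // \. : only a real dot matches
  [/^a\.c$/, "a.c", true],
];

for (const [pattern, text, expected] of samples) {
  const actual = pattern.test(text);
  console.log(String(pattern), JSON.stringify(text), actual, actual === expected ? "as expected" : "unexpected");
}
```

Changing a sample text and rerunning the loop in a test console is a quick way to get a feeling for each character.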
Now you can split the expression quite easily. The @ character represents itself, so the first step splits the expression there:
    ^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*
    @
    [a-zA-Z0-9-]+\.([a-zA-Z]{2,3})$
The part before the @ character must have at least one character. This is forced by the first character definition [_a-zA-Z0-9-], which lists all acceptable characters, together with the + sign. Then, the expression might be followed by an actual dot \., which must itself be followed by one or more characters. The whole group (a dot plus more characters) is optional and can be repeated endlessly (*). The second part is similar. The set definition is missing the underscore character, which may not appear in regular domain names. The dot is not optional (no set operator) and the remaining part can be two or three characters long (that's ignoring the new domain names with four or more characters, but I think you get the idea).
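To see the three parts at work, the following sketch keeps them as separate objects and glues them together again. This is only one possible way to do it; the names localPart, domainPart, and emailPattern as well as the sample addresses (other than the first one) are assumptions for illustration.

```javascript
// The two parts around the @ character, without the ^ and $ anchors.
const localPart = /[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*/;
const domainPart = /[a-zA-Z0-9-]+\.([a-zA-Z]{2,3})/;

// Reassemble: start anchor, local part, @, domain part, end anchor.
// The source property holds the pattern text between the slashes.
const emailPattern = new RegExp("^" + localPart.source + "@" + domainPart.source + "$");

const addresses = [
  "joerg@krause.net",         // plain address
  "joerg.krause@krause.net",  // the optional dot group is used once
  "joerg..krause@krause.net", // a dot must be followed by a character
  "joerg@kra_use.net",        // no underscore allowed after the @
  "joerg@krause.info",        // four-letter top-level domain
];

for (const address of addresses) {
  console.log(address, emailPattern.test(address));
}
```

The first two addresses pass, and the remaining three are rejected. The last one shows the two-or-three-letter limit mentioned above.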
Resolving Expressions
The expression might work here, but it's still far from being perfect.
There are a few cases where the expression may reject acceptable email addresses and also accept irregular ones. Such an expression is still hard to explain. Because you'll be challenged with more complex examples soon, I'll use another layout here:
    ^                // Start at the beginning
    [_a-zA-Z0-9-]    // Define a character set
    +                // One or more times
    (                // Group 1
    \.               // A "real" dot
    [_a-zA-Z0-9-]    // Another character set
    +                // One or more times
    )                // End group 1
    *                // Group optional or multiple times
    @                // The @ character
    [a-zA-Z0-9-]     // Character definition
    +                // One or multiple times
    \.               // A "real" dot
    (                // Group 2
    [a-zA-Z]         // A character definition
    {2,3}            // Two or three characters
    )                // End group 2
    $                // End of text must come now
That's a lot easier, isn't it? Unfortunately, you can't write it that way in JavaScript. I'm going to use this form only to break down tricky expressions. If you struggle with an expression in this book, just try to resolve the pattern by putting each character on one line and writing down what it means.
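If you want to keep such notes next to the code anyway, one workaround is to assemble the pattern from small strings, each with its own comment. This is just a sketch of that idea, not a feature of the language; the names fragments and emailPattern are made up, and String.raw is used so that the backslash in \. survives inside a string.

```javascript
// Each fragment carries the comment from the commented layout.
const fragments = [
  "^",              // Start at the beginning
  "[_a-zA-Z0-9-]",  // Character set
  "+",              // One or more times
  "(",              // Group 1
  String.raw`\.`,   // A "real" dot
  "[_a-zA-Z0-9-]",  // Another character set
  "+",              // One or more times
  ")",              // End group 1
  "*",              // Group optional or multiple times
  "@",              // The @ character
  "[a-zA-Z0-9-]",   // Character set
  "+",              // One or more times
  String.raw`\.`,   // A "real" dot
  "(",              // Group 2
  "[a-zA-Z]",       // Character set
  "{2,3}",          // Two or three characters
  ")",              // End group 2
  "$",              // End of text must come now
];

// The RegExp constructor accepts the joined pattern as a string.
const emailPattern = new RegExp(fragments.join(""));

console.log(emailPattern.test("joerg@krause.net"));      // true
console.log(emailPattern.test("joerg+news@krause.net")); // false: + is not in the set
console.log(emailPattern.test("joerg@mail.krause.net")); // false: only one dot is allowed after the @
console.log(emailPattern.test("-@-.aa"));                // true: accepted, although hardly a real address
```

The last three calls illustrate the weak spots mentioned earlier: two addresses that look acceptable are rejected, and an irregular one passes.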