Regular expressions in JavaScript. RegExp object

The RegExp class in JavaScript represents a regular expression: an object that describes a character pattern. RegExp objects are typically created using the special literal syntax shown below, but they can also be created using the RegExp() constructor.

Syntax

// using the special literal syntax
var regex = /pattern/flags;
// using the constructor
var regex = new RegExp("pattern", "flags");
var regex = new RegExp(/pattern/, "flags");

Parameter values:

Regular Expression Flags

Flag Description
g Finds all matches rather than stopping after the first match (global match flag).
i Enables case-insensitive matching (ignore case flag).
m Matches across multiple lines: the start and end anchors (^ and $) match at the beginning and end of each line (lines are delimited by \n or \r), not only at the beginning and end of the entire string (multiline flag).
u Interprets the pattern as a sequence of Unicode code points (unicode flag).
y Matches only at the index given by the lastIndex property of the regular expression, without attempting a match at any earlier or later index (sticky flag).
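
As a rough illustration of the flags in the table, the sketch below applies each one to a small sample string. The variable names and the sample text are made up for this example and are not part of the original article.

```javascript
// Hypothetical sample text used only for this illustration.
const flagSample = "Cat\ncat\nCAT";

// g + i: collect every match, ignoring case.
const allCats = flagSample.match(/cat/gi);

// m: ^ and $ also match at the start and end of each line.
const wholeLines = flagSample.match(/^cat$/gm);

// u: the pattern works on code points, so "." consumes a whole astral character.
const dotWithU = /^.$/u.test("\u{1F600}");
const dotWithoutU = /^.$/.test("\u{1F600}");

// y: the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const stickyHit = sticky.test(flagSample);  // "cat" starts at index 4
sticky.lastIndex = 5;
const stickyMiss = sticky.test(flagSample); // nothing starts at index 5
```

Note that a sticky or global expression keeps state in lastIndex between calls, which is why the sketch sets it explicitly before each test.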

Character sets

Metacharacters

Symbol Description
. Matches any single character except a line terminator (\n, \r, \u2028 or \u2029).
\d Matches a digit character in the basic Latin alphabet. Equivalent to the character set [0-9].
\D Matches any character that is not a digit in the basic Latin alphabet. Equivalent to the character set [^0-9].
\s Matches a single whitespace character: space, tab, form feed, line feed, and other Unicode whitespace characters. Equivalent to the character set [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
\S Matches a single character that is not whitespace. Equivalent to the character set [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
[\b] Matches the backspace character (U+0008).
\0 Matches the NUL character (U+0000).
\n Matches the newline character.
\f Matches the form feed character.
\r Matches the carriage return character.
\t Matches the horizontal tab character.
\v Matches the vertical tab character.
\w Matches any alphanumeric character in the basic Latin alphabet, including the underscore. Equivalent to the character set [A-Za-z0-9_].
\W Matches any character that is not a word character in the basic Latin alphabet. Equivalent to the character set [^A-Za-z0-9_].
\cX Matches a control character, where X is a letter from A to Z. For example, /\cM/ matches Ctrl-M.
\xhh Matches the character with the hexadecimal code hh (two hexadecimal digits).
\uhhhh Matches the UTF-16 code unit with the value hhhh (four hexadecimal digits).
\u{hhhh} or \u{hhhhh} Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits). Only available when the u flag is set.
\ Indicates that the next character is special and should not be interpreted literally. Before a character that is normally special, indicates that the character is not special and should be interpreted literally.
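
A short sketch of several of these escapes in use may help. The sample strings and variable names below are illustrative assumptions, not taken from the original text.

```javascript
// \d: digits
const digits = "Room 42, floor 7".match(/\d/g);

// \w: word characters (letters, digits, underscore)
const words = "snake_case and kebab-case".match(/\w+/g);

// \s: collapse any run of whitespace into one space
const collapsed = "a \t\n b".replace(/\s+/g, " ");

// \cM is Ctrl-M, which is the carriage return character
const ctrlM = /\cM/.test("\r");

// \xhh, \uhhhh and (with the u flag) \u{hhhhh}
const hexA = /\x41/.test("A");
const unitA = /\u0041/.test("A");
const codePoint = /\u{1F600}/u.test("\u{1F600}");
```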

Restrictions

Quantifiers

Symbol Description
n* Matches any string containing zero or more occurrences of n.
n+ Matches any string containing at least one occurrence of n.
n? Matches any string containing zero or one occurrence of the preceding element n.
n{x} Matches any string containing exactly x consecutive occurrences of n. x must be a positive integer.
n{x,} Matches any string containing at least x occurrences of the preceding element n. x must be a positive integer.
n{x,y} Matches any string containing at least x but no more than y occurrences of the preceding element n. x and y must be positive integers.
n*? n+? n?? n{x}? n{x,}? n{x,y}? Match like the quantifiers *, +, ? and {...}, but with the smallest possible match. Quantifiers are greedy by default; a ? after the quantifier selects "non-greedy" mode, in which the element is repeated as few times as possible.
x(?=y) Matches x only if x is followed by y.
x(?!y) Matches x only if x is not followed by y.
x|y Matches either of the specified alternatives.
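
The counted quantifiers are easiest to compare side by side. The following is a minimal sketch; the input "aaaa" and the variable names are chosen only for illustration.

```javascript
const run = "aaaa";

const exactlyTwo = /a{2}/.exec(run)[0];       // exactly two
const atLeastTwo = /a{2,}/.exec(run)[0];      // greedy: takes all four
const twoToThree = /a{2,3}/.exec(run)[0];     // greedy: takes three
const twoToThreeLazy = /a{2,3}?/.exec(run)[0]; // non-greedy: stops at two

// ? makes the preceding element optional
const optionalU = /colou?r/.test("color") && /colou?r/.test("colour");
```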

Grouping and backreferences

Symbol Description
(x) Matches x and remembers the match ("capturing parentheses"). The matched substring can be retrieved from elements [1], ..., [n] of the resulting array, or from the properties $1, ..., $9 of the predefined RegExp object.
(?:x) Matches x but does not remember the match ("non-capturing parentheses"). The matched substring cannot be retrieved from elements [1], ..., [n] of the resulting array, or from the properties $1, ..., $9 of the predefined RegExp object.
\n A backreference to the last substring that matched the nth parenthesized group in the regular expression (groups are numbered by their opening parentheses, from left to right). n must be a positive integer.

This article has covered the basics of using regular expressions in JavaScript.

Introduction

What is a regular expression?

A JS regex is a sequence of characters that forms a search rule. This rule can then be used to search text as well as to replace it. In practice, a regular expression can consist of a single character, but more complex search patterns are more common.

In JavaScript, regular expressions are also objects. They are patterns used to match sequences of characters in strings. They are used with the exec() and test() methods of the RegExp object, as well as with the match(), replace(), search(), and split() methods of the String object.

Example

var pattern = /example/i;

/example/i is a regular expression. example is the pattern (which will be used in the search). i is a modifier that makes the search case-insensitive.

Preparing a regular expression

JS regular expressions consist of a pattern and modifiers. The syntax looks like this:

/pattern/modifiers;

The pattern defines the search rule. It consists of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/.

Template table

Modifiers change how the search is performed, for example by making it case-insensitive or global.

Modifier table

We are now ready to apply JS regular expressions. There are two main ways to do this: using the methods of a regular expression object, or using the methods of a string that accept a regular expression.

Using a regex object

Create a regular expression object

This object describes a character pattern and is used for pattern matching. There are two ways to construct a regular expression object.

Method 1: using a regular expression literal, which consists of a pattern enclosed in slashes, for example:

var reg = /ab+c/;

Regular expression literals are compiled when the script is parsed. If the regular expression stays constant, use a literal to improve performance.

Method 2: by calling the constructor function of the RegExp object, for example:

var reg = new RegExp("ab+c");

Using the constructor compiles the regular expression at runtime. Use this method if the regular expression will change or if you do not know the pattern in advance, for example when the pattern comes from a search term entered by a user.
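
When a pattern is built at runtime from user input, characters such as + or . in that input would normally be read as regex syntax. The sketch below shows one common way to handle this; escapeForRegExp is a hypothetical helper written for this example, not a built-in function, and the search term is an assumed value.

```javascript
// Hypothetical helper: put a backslash before every character that is
// special in a pattern, so that the text is matched literally.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userTerm = "1+1"; // assume this came from a search box
const termPattern = new RegExp(escapeForRegExp(userTerm), "g");
const hits = "1+1=2, and 11 is not 1+1".match(termPattern);

// Without escaping, "+" acts as a quantifier and "11" matches as well.
const unescapedMatches11 = new RegExp(userTerm).test("11");
const escapedMatches11 = new RegExp(escapeForRegExp(userTerm)).test("11");
```

The literal form cannot express this case, because the pattern text is not known until the script runs.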

Regular Expression Object Methods

Let's take a look at a few common methods of the regular expression object:

  • compile() (deprecated since JavaScript 1.5) - compiles a regular expression;
  • exec() - performs matching on a string and returns the first match;
  • test() - performs matching on a string and returns true or false;
  • toString() - returns the string value of the regular expression.

Examples

Using test()

The test() method is a method of the RegExp object. It searches a string for the pattern and returns true or false depending on the result. The following example shows how to search a string for the character "e":

var patt = /e/;
patt.test("The world's best things are free!");

Since the string contains "e", the result of this code is true.

You don't need to store the regular expression in a variable at all. The same query can be made in one line:

/e/.test("The world's best things are free!");

Using exec()

The exec() method searches a string for the specified pattern and returns the found text. If no match is found, the result is null.

Let's see the method in action, using the same character "e":

/e/.exec("The world's best things are free!");

Since the string contains "e", the result of this code is an array whose first element is "e".
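
To make the return value of exec() more concrete, here is a small sketch that inspects it. The variable names are illustrative; the sentence is the one used in the example.

```javascript
const sentence = "The world's best things are free!";

const found = /e/.exec(sentence);
const matchedText = found[0];    // the text that matched
const matchedAt = found.index;   // position of the match in the string
const searchedIn = found.input;  // the string that was searched

// When nothing matches, exec() returns null rather than an empty array.
const missing = /z/.exec(sentence);
```

Because the failure value is null, code that reads found[0] should check the result first.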

Applying a regular expression to a string

In JavaScript, these expressions can also be used with two methods of the String object: search() and replace(). They are used to search and replace text.

  • The search() method - uses an expression to find a match and returns the position of the match;
  • The replace() method - returns a modified string in which the pattern has been replaced.

Examples

Using a JS regular expression to perform a case-insensitive search for the phrase "w3schools" in a string:

var str = "Visit W3Schools";
var n = str.search(/w3schools/i);

The result in n is 6.

The search() method also accepts a string as an argument. The string argument is converted to a regular expression:

Using a string to find the phrase "W3Schools" in a string.
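
A minimal sketch of the string form of search() might look like this. The variable names are assumptions made for the example.

```javascript
const visit = "Visit W3Schools";

// The string is converted to a regular expression with no flags.
const byString = visit.search("W3Schools");

// There is no i flag, so the search is case-sensitive.
const byLowercase = visit.search("w3schools");

// Because of the conversion, "." is a pattern that matches any character.
const dotPosition = "a.b".search(".");
```

The last line shows why a string containing special characters may not behave like a plain substring search.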

Regular Expressions

A regular expression is an object that describes a character pattern. The RegExp class in JavaScript represents regular expressions, and both String and RegExp objects define methods that use regular expressions to perform pattern matching and search-and-replace operations on text. The regular expression grammar in JavaScript is a fairly complete subset of the regular expression syntax used in Perl 5, so if you are experienced with Perl, you can easily describe patterns in JavaScript programs.

Perl regular expression features that are not supported in ECMAScript 5 include the s (single-line mode) and x (extended syntax) flags; the escape sequences \a, \e, \l, \u, \L, \U, \E, \Q, \A, \Z, \z, and \G; and other extended constructs starting with (?.

Defining regular expressions

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using a special literal syntax. Just as string literals are written as characters within quotation marks, regular expression literals are written as characters within a pair of slashes (/). Thus, JavaScript code can contain lines like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the variable pattern. This RegExp object matches any string that ends with the character "s". The same regular expression can be defined using the RegExp() constructor:

var pattern = new RegExp("s$");

A regular expression pattern consists of a sequence of characters. Most characters, including all alphanumeric characters, literally describe the characters that must be present. That is, the regular expression /java/ matches all strings containing the substring "java".

Other characters in regular expressions do not match themselves literally but have special meanings. For example, the regular expression /s$/ contains two characters. The first, s, matches itself literally. The second, $, is a special metacharacter that matches the end of a string. So this regular expression matches any string that ends with s.

The following sections describe the various characters and metacharacters used in regular expressions in JavaScript.

Literal characters

As noted earlier, all alphabetic characters and digits in regular expressions match themselves. The regular expression syntax in JavaScript also supports certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, \n matches a line feed character. These symbols are listed in the table below:

Some punctuation marks have special meanings in regular expressions:

^ $ . * + ? = ! : | \ / ( ) [ ] { } -

The meaning of these symbols is explained in the following sections. Some of them have a special meaning only in certain regular expression contexts and are treated literally in others. As a rule, however, to include any of these characters literally in a regular expression, you must precede it with a backslash. Other characters, such as quotation marks and @, have no special meaning and simply match themselves in regular expressions.

If you can't remember exactly which characters must be preceded by a \, you can safely put a backslash in front of any punctuation character. Keep in mind, however, that many letters and digits have a special meaning when preceded by a backslash, so letters and digits that you want to match literally should not be preceded by a \. To include the backslash character itself in a regular expression, you need to put another backslash in front of it. For example, the following regular expression matches any string containing a backslash: /\\/.

Character classes

Individual literal characters can be combined into character classes by enclosing them in square brackets. A character class matches any one character that it contains. Thus, the regular expression /[abc]/ matches any one of the characters a, b, or c.

Negated character classes can also be defined; they match any character other than those listed in the brackets. A negated character class is specified by placing the ^ character first after the left bracket. The regular expression /[^abc]/ matches any character other than a, b, or c. Character classes can use a hyphen to indicate a range of characters. All lowercase characters of the Latin alphabet are matched by the expression /[a-z]/, and any letter or digit from the Latin character set can be matched by the expression /[a-zA-Z0-9]/.

Certain character classes are used especially frequently, so the regular expression syntax in JavaScript includes special characters and escape sequences to denote them. For example, \s matches the space character, the tab character, and any other Unicode whitespace character, and \S matches any character that is not Unicode whitespace.

The table below lists these special characters and the syntax of the character classes. (Note that some of the character class escape sequences match only ASCII characters and have not been extended to work with Unicode characters. You can explicitly define your own classes of Unicode characters; for example, the expression /[\u0400-\u04FF]/ matches any Cyrillic character.)

JavaScript regex character classes
Symbol Meaning
[...] Any one of the characters between the brackets
[^...] Any one character not between the brackets
. Any character other than a newline or another Unicode line terminator
\w Any ASCII word character. Equivalent to [a-zA-Z0-9_]
\W Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_]
\s Any Unicode whitespace character
\S Any character that is not Unicode whitespace. Note that \w and \S are not the same thing
\d Any ASCII digit. Equivalent to [0-9]
\D Any character other than an ASCII digit. Equivalent to [^0-9]
[\b] A literal backspace character

Note that the escape sequences for the special character classes can themselves be used inside square brackets. \s matches any whitespace character and \d matches any digit, so /[\s\d]/ matches any one whitespace character or digit.

Repetition

With the regular expression syntax covered so far, we can describe a two-digit number as /\d\d/ or a four-digit number as /\d\d\d\d/, but we cannot, for example, describe a number with any number of digits, or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times a given element of the regular expression may be repeated.

Repetition symbols always follow the pattern to which they apply. Some types of repetition are used quite often, and there are special symbols to indicate these cases. For example, + matches one or more instances of the previous pattern. The following table provides a summary of the repetition syntax:

The following lines show some examples:

var pattern = /\d{2,4}/; // Matches a number of two to four digits
pattern = /\w{3}\d?/;    // Matches exactly three word characters and one optional digit
pattern = /\s+java\s+/;  // Matches the word "java" with one or more whitespace characters before and after it
pattern = /[^(]*/;       // Matches zero or more characters other than the open parenthesis

Be careful when using the repetition characters * and ?. They can match zero occurrences of the pattern that precedes them, and therefore they can match nothing at all. For example, the regular expression /a*/ matches the string "bbbb" because the string contains zero occurrences of the character a.

The repetition characters listed in the table match as many times as possible while still allowing the subsequent parts of the regular expression to match. We say that this repetition is "greedy". Repetition can also be performed in a non-greedy manner: simply follow the repetition character (or characters) with a question mark: ??, +?, *?, or even {1,5}?.

For example, the regular expression /a+/ matches one or more occurrences of the letter a. Applied to the string "aaa", it matches all three letters. The expression /a+?/, on the other hand, matches one or more occurrences of the letter a but selects as few characters as possible. Applied to the same string, this pattern matches only the first letter a.

Non-greedy repetition does not always give the expected result. Consider the pattern /a+b/, which matches one or more a characters followed by a b character. When applied to the string "aaab", it matches the entire string.

Now let's check the non-greedy version, /a+?b/. One might think that it should match the character b preceded by only one character a. Applied to the same string "aaab", it would then be expected to match the single a and the final b. In reality, however, the whole string matches this pattern, just as with the greedy version. The reason is that regular expression matching is performed by finding the first position in the string at which a match is possible. Since a match is possible starting from the first character of the string, shorter matches starting at later characters are never even considered.
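
The behaviour described for "aaab" can be checked directly. This is a small sketch with made-up variable names.

```javascript
// Greedy and non-greedy versions both match the whole string,
// because the match has to start at the first possible position.
const greedyAB = /a+b/.exec("aaab")[0];
const lazyAB = /a+?b/.exec("aaab")[0];
const lazyStart = /a+?b/.exec("aaab").index;

// Without the trailing b, the non-greedy version stops after one a.
const lazyAlone = /a+?/.exec("aaab")[0];
```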

Alternatives, grouping and links

The regular expression grammar includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The pipe symbol | separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef", and the pattern /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that alternatives are processed from left to right until a match is found. If the left alternative matches, the right one is ignored, even if it would produce a "better" match. Therefore, when the pattern /a|ab/ is applied to the string "ab", it matches only the first character.
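
A quick sketch of this left-to-right behaviour, using illustrative variable names:

```javascript
// The left alternative wins as soon as it matches.
const shortFirst = /a|ab/.exec("ab")[0];

// Listing the longer alternative first changes the result.
const longFirst = /ab|a/.exec("ab")[0];
```

When one alternative is a prefix of another, putting the longer one first is a common way to get the longer match.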

Parentheses have several purposes in regular expressions. One of them is to group separate elements into a single subexpression so that the elements are treated as a unit by the special characters |, *, +, ?, and so on. For example, /java(script)?/ matches the word "java" followed by the optional word "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of either of the strings "ab" or "cd".

Another use of parentheses in regular expressions is to define subpatterns within a pattern. When a regular expression match is found in the target string, you can extract the portion of the target string that matches any specific parenthesized subpattern.

Suppose you want to find one or more lowercase letters followed by one or more digits. You can use the pattern /[a-z]+\d+/ for this. But suppose you only care about the digits at the end of each match. If you put that part of the pattern in parentheses (/[a-z]+(\d+)/), you can extract the digits from any matches you find. How this is done is described below.

A related use of parenthesized subexpressions is to allow you to refer back to a subexpression later in the same regular expression. This is done by following a \ character with one or more digits. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first subexpression and \3 refers to the third. Note that subexpressions can be nested within one another, so it is the position of the left parenthesis that is counted. In the following regular expression, for example, the nested subexpression ([Ss]cript) is referred to as \2:

/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/

A reference to a previous subexpression does not refer to the pattern of that subexpression but to the text that matched the pattern. References can therefore be used to enforce a constraint that separate portions of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotes. However, it does not require the opening and closing quotes to match (that is, both single or both double):

/['"][^'"]*['"]/

To require the quotation marks to match, use a reference:

/(['"])[^'"]*\1/

Here, \1 matches whatever the first subexpression matched. In this example, the reference enforces the constraint that the closing quotation mark is the same as the opening quotation mark. This regular expression does not allow single quotes inside double quotes, or vice versa.
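
The difference a backreference makes can be seen by running two quote-matching patterns against the same input. The sketch below uses hypothetical variable names and test strings.

```javascript
// Accepts any opening and any closing quotation mark.
const anyQuotes = /['"][^'"]*['"]/;

// \1 requires the closing mark to repeat what group 1 matched.
const sameQuotes = /(['"])[^'"]*\1/;

const mixed = "\"mixed'";
const looseAcceptsMixed = anyQuotes.test(mixed);
const strictAcceptsMixed = sameQuotes.test(mixed);

const quoted = sameQuotes.exec("say \"hello\" now");
const quotedText = quoted[0];  // the whole quoted section
const quoteMark = quoted[1];   // the mark captured by group 1
```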

It is also possible to group elements in a regular expression without creating a numbered reference to those elements. Instead of simply grouping the elements within ( and ), begin the group with (?: and end it with ). Consider, for example, the following pattern:

/([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/

Here the subexpression (?:[Ss]cript) is used only for grouping, so that the repetition character ? can be applied to the group. These modified parentheses do not produce a reference, so in this regular expression \2 refers to the text matched by (fun\w*).

The following table lists the alternatives, grouping, and reference operators in regular expressions:

JavaScript regex alternation, grouping, and reference characters
Symbol Meaning
| Alternative. Matches either the subexpression on the left or the subexpression on the right.
(...) Grouping. Groups elements into a single unit that can be used with the characters *, +, ?, | and so on. Also remembers the characters that match this group for use in later references.
(?:...) Grouping only. Groups elements into a single unit but does not remember the characters that match this group.
\number Matches the same characters that were matched when group number number was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?: are not numbered.

Specifying the match position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the positions between characters rather than the characters themselves. For example, \b matches a word boundary: the boundary between a \w character (an ASCII word character) and a \W character (a non-word character), or the boundary between an ASCII word character and the beginning or end of the string.

Elements such as \b do not specify any characters that must be present in the matched string, but they do specify legal positions at which a match can occur. These elements are sometimes called regular expression anchors because they anchor the pattern to a specific position in the string. The most commonly used anchors are ^ and $, which anchor the pattern to the beginning and end of the string, respectively.

For example, the word "JavaScript" on a line by itself can be matched with the regular expression /^JavaScript$/. To find the separate word "Java" (and not a prefix, as in the word "JavaScript"), you could try the pattern /\sJava\s/, which requires a whitespace character before and after the word.

But this solution has two problems. First, it finds the word "Java" only when it is surrounded by whitespace on both sides, so it cannot find it at the beginning or end of the string. Second, when this pattern does match, the string it returns has leading and trailing whitespace, which is not quite what we want. So instead of the pattern \s, which matches whitespace, we use the pattern (or anchor) \b, which matches word boundaries. The resulting expression is /\bJava\b/.

The anchor \B matches a position that is not a word boundary. Thus, the pattern /\B[Ss]cript/ matches the words "JavaScript" and "postscript" but does not match the words "script" or "Scripting".
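
The word-boundary anchors can be tried out on the words mentioned above. The following is only a sketch; the variable names are invented for the example.

```javascript
// \b finds "Java" as a separate word, even at the start of the string.
const wordAtStart = /\bJava\b/.test("Java in a Nutshell");
const insideWord = /\bJava\b/.test("JavaScript");

// The whitespace version misses the word when no whitespace precedes it.
const needsSpaces = /\sJava\s/.test("Java in a Nutshell");

// \B requires that the match does not start at a word boundary.
const nonBoundary = ["JavaScript", "postscript", "script", "Scripting"]
  .filter(function (word) { return /\B[Ss]cript/.test(word); });
```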

Arbitrary regular expressions can also be used as anchor conditions. Placing an expression between the characters (?= and ) turns it into a lookahead assertion: the following characters must match the specified pattern, but they are not included in the match.

For example, to match the name of a common programming language, but only if it is followed by a colon, you can use the expression /[Jj]ava([Ss]cript)?(?=\:)/. This pattern matches the word "JavaScript" in the string "JavaScript: The Definitive Guide", but it does not match the word "Java" in the string "Java in a Nutshell" because it is not followed by a colon.

If you begin the condition with (?! instead, it becomes a negative lookahead assertion, which requires that the following characters do not match the specified pattern. For example, the pattern /Java(?!Script)([A-Z]\w*)/ matches the substring "Java" followed by a capital letter and any number of ASCII word characters, provided that "Java" is not followed by the substring "Script". It matches the string "JavaBeans" but not the string "Javanese", and it matches the string "JavaScrip" but not the strings "JavaScript" or "JavaScripter".
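
Both kinds of lookahead can be exercised with the example strings from the text. In this sketch the patterns are written out in full with their character classes, and the variable names are assumptions.

```javascript
// Positive lookahead: the colon must follow but is not part of the match.
const beforeColon = /[Jj]ava([Ss]cript)?(?=:)/;
const colonMatch = beforeColon.exec("JavaScript: The Definitive Guide")[0];
const noColon = beforeColon.test("Java in a Nutshell");

// Negative lookahead: "Java" must not be followed by "Script".
const notScript = /Java(?!Script)([A-Z]\w*)/;
const accepted = ["JavaBeans", "Javanese", "JavaScrip", "JavaScript", "JavaScripter"]
  .filter(function (word) { return notScript.test(word); });
```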

The table below lists the regular expression anchor characters:

Regular Expression Anchor Characters
Symbol Meaning
^ Matches the beginning of the string or, in multiline mode, the beginning of a line.
$ Matches the end of the string or, in multiline mode, the end of a line.
\b Matches a word boundary, that is, the position between a \w character and a \W character, or between a \w character and the beginning or end of the string. (Note, however, that [\b] matches the backspace character.)
\B Matches a position that is not a word boundary.
(?=p) A positive lookahead assertion. Requires the following characters to match the pattern p, but does not include those characters in the match.
(?!p) A negative lookahead assertion. Requires the following characters not to match the pattern p.

Flags

There is one final element of the regular expression grammar. Regular expression flags specify high-level pattern-matching rules. Unlike the rest of the regular expression grammar, flags are specified not between the slash characters but after the second slash. This section describes three of the flags supported in JavaScript.

The i flag specifies that pattern matching should be case-insensitive, and the g flag specifies that the search should be global, that is, all matches within the string should be found. The m flag performs pattern matching in multiline mode. If the string being searched contains line feed characters, then in this mode the anchors ^ and $ match the beginning and end of each line in addition to the beginning and end of the entire string. For example, /java$/im matches both "java" and "Java\nis fun".

These flags can be used in any combination. For example, to find the first occurrence of the word "java" (or "Java", "JAVA", and so on) without regard to case, you can use the case-insensitive regular expression /\bjava\b/i. And to find all occurrences of this word in a string, you can add the g flag: /\bjava\b/gi.

String class methods for pattern matching

Up to this point, we have discussed the grammar used to create regular expressions, but we have not looked at how those regular expressions can actually be used in JavaScript. In this section, we discuss the methods of the String object that use regular expressions for pattern matching and for search-and-replace operations. We then continue the discussion of pattern matching by looking at the RegExp object and its methods and properties.

Strings support four methods that use regular expressions. The simplest of these is search(). It takes a regular expression as an argument and returns either the position of the first character of the matching substring or -1 if there is no match. For example, the following call returns 4:

var result = "JavaScript".search(/script/i); // 4

If the argument to the search() method is not a regular expression, it is first converted to one by passing it to the RegExp constructor. The search() method does not support global searches and ignores the g flag in its argument.

The replace() method performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second. The method searches the string on which it is called for matches of the specified pattern.

If the regular expression has the g flag, the replace() method replaces all matches it finds with the replacement string. Otherwise, it replaces only the first match it finds. If the first argument to replace() is a string rather than a regular expression, the method searches for that string literally, rather than converting it to a regular expression with the RegExp() constructor as the search() method does.
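
The three cases described here (a string argument, a regular expression without g, and a regular expression with g) can be compared on one input. The sample string and names below are illustrative only.

```javascript
const dotted = "a.b.c";

// String argument: searched for literally, first occurrence only.
const literalFirst = dotted.replace(".", "-");

// Regular expression without g: first match only.
const regexFirst = dotted.replace(/\./, "-");

// Regular expression with g: every match.
const regexAll = dotted.replace(/\./g, "-");

// An unescaped dot in a pattern matches every character.
const everyChar = dotted.replace(/./g, "-");
```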

As an example, we can use the replace() method to capitalize the word "JavaScript" consistently throughout a string of text:

// Regardless of the case of the characters, replace the word with the correctly capitalized form
var result = "javascript".replace(/JavaScript/ig, "JavaScript");

The replace() method is more powerful than this example suggests. Recall that parenthesized subexpressions inside a regular expression are numbered from left to right, and that the regular expression remembers the text that each subexpression matches. If the replacement string contains a $ followed by a digit, the replace() method replaces those two characters with the text that matched the specified subexpression. This is a very useful feature. We can use it, for example, to replace the straight quotes in a string with typographic quotes:

// A quotation is a quotation mark, followed by any number of non-quote characters
// (which we remember), followed by another quotation mark
var quote = /"([^"]*)"/g;
// Replace the straight quotes with typographic ones, leaving the content
// of the quotation stored in $1 unchanged
var text = '"JavaScript" is an interpreted programming language.';
var result = text.replace(quote, "“$1”"); // “JavaScript” is an interpreted programming language.

An important point to note is that the second argument to replace () can be a function that dynamically computes the replacement string.
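
As a sketch of the function form, the replacement function receives the matched text followed by the text of each parenthesized group, and its return value is used as the replacement. The strings and names below are invented for the example.

```javascript
// Double every number found in the string.
const prices = "apple 3, pear 12";
const doubledPrices = prices.replace(/\d+/g, function (match) {
  return String(Number(match) * 2);
});

// Captured groups arrive as extra arguments after the full match.
const swappedName = "Doe, John".replace(/(\w+), (\w+)/, function (whole, last, first) {
  return first + " " + last;
});
```

This form is useful when the replacement cannot be written as a fixed string with $n placeholders.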

Method match () is the most common of the regular expression methods of the String class. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp () constructor) and returns an array containing the search results. If the g flag is set in the regular expression, the method returns an array of all matches in the string. For example:

// will return ["1", "2", "3"] var result = "1 plus 2 equals 3" .match (/ \ d + / g);

If the regular expression does not contain the g flag, the match () method does not perform a global search; it just looks for the first match. However, match () returns an array even when the method does not perform a global search. In this case, the first element of the array is the substring found, and all the remaining elements are subexpressions of the regular expression. Therefore, if match () returns an array arr, then arr will contain the entire string found, arr will contain the substring that matches the first subexpression, and so on. Paralleling the replace () method, we can say that arr [n] is filled with the contents of $ n.

For example, take a look at the following code, which parses a URL:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit our site http://www..php";
var result = text.match(url);
if (result != null) {
    var fullurl = result[0];  // Contains the full URL
    var protocol = result[1]; // Contains "http"
    var host = result[2];     // Contains the host name
}

Note that for a regular expression without the g flag, match() returns the same value as the regular expression's exec() method: the returned array has index and input properties, as described in the discussion of exec() below.
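
As a minimal sketch of that equivalence, the following compares a non-global match() call with exec(). The sample string and pattern are made up for illustration and do not come from the text above.

```javascript
// Hypothetical sample data, chosen only for illustration
var text = "Price: 42 USD";
var pattern = /(\d+) (\w+)/; // no g flag

var viaMatch = text.match(pattern);
var viaExec = pattern.exec(text);

console.log(viaMatch[0]);    // whole match: "42 USD"
console.log(viaMatch[1]);    // first subexpression: "42"
console.log(viaMatch[2]);    // second subexpression: "USD"
console.log(viaMatch.index); // position where the match starts: 7
console.log(viaMatch.input === text);          // true
console.log(viaExec.index === viaMatch.index); // true
```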

The last of the String methods that use regular expressions is split(). This method splits the string on which it is called into an array of substrings, using its argument as the delimiter. For example:

"123,456,789".split(","); // Returns ["123", "456", "789"]

The split() method can also take a regular expression as its argument, which makes it more powerful. For example, you can specify a delimiter that allows an arbitrary amount of whitespace on both sides:

"1, 2, 3, 4, 5".split(/\s*,\s*/); // Returns ["1", "2", "3", "4", "5"]

RegExp object

As mentioned, regular expressions are represented as RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and several properties.

The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument is a string containing the body of the regular expression, i.e. the text that would appear between the slash characters in a regular expression literal. Note that both string literals and regular expressions use the \ character for escape sequences, so when you pass a regular expression to the RegExp() constructor as a string literal, you must replace each \ with a \\ pair.

The second argument to RegExp() is optional. If specified, it defines the flags of the regular expression. It must be one of the characters g, i, or m (or u or y where the engine supports them), or a combination of these characters. For example:

// Finds all 5-digit numbers in a string. Note the use of \\
var zipcode = new RegExp("\\d{5}", "g");

The RegExp() constructor is useful when the regular expression is generated dynamically and therefore cannot be written with the regular expression literal syntax. For example, to search for a string entered by the user, you create the regular expression at runtime using RegExp().
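
A minimal sketch of the user-input case follows. The helper name escapeRegExp and the sample input are assumptions made for illustration; the helper escapes characters that have a special meaning in a pattern so that the input is searched for literally.

```javascript
// escapeRegExp is a hypothetical helper: it prefixes every regex
// metacharacter with a backslash so the text is matched literally
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var userInput = "1+1=2"; // hypothetical text typed by a user
var pattern = new RegExp(escapeRegExp(userInput), "g");

var source = pattern.source;                      // 1\+1=2
var found = "1+1=2 and 1+1=2".match(pattern);     // two matches
var unescaped = new RegExp(userInput).test("1+1=2"); // false: + is read as a quantifier

console.log(source, found.length, unescaped);
```

Without the escaping step, the + in the input would be treated as the repetition operator, so the pattern would no longer match the text the user typed.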

RegExp Properties

Each RegExp object has five properties. The source property is a read-only string containing the text of the regular expression. The global property is a read-only boolean value that indicates whether the regular expression has the g flag. The ignoreCase property is a read-only boolean value that indicates whether the regular expression has the i flag. The multiline property is a read-only boolean value that indicates whether the regular expression has the m flag. The last property, lastIndex, is a read/write integer. For patterns with the g flag, this property holds the position in the string at which the next search should start. As described below, it is used by the exec() and test() methods.
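
The five properties can be inspected directly. The following is a small sketch with an arbitrary pattern chosen for illustration:

```javascript
var re = /ab+c/gi; // arbitrary sample pattern with the g and i flags

console.log(re.source);     // "ab+c"
console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false
console.log(re.lastIndex);  // 0 before any search

re.exec("xxABBC");           // matches "ABBC" at positions 2..5
var afterExec = re.lastIndex; // 6: the position right after the match
console.log(afterExec);
```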

RegExp methods

RegExp objects define two methods that perform pattern matching; they behave similarly to the String methods described above. The main pattern-matching method of the RegExp class is exec(). It is similar to the match() method of the String class mentioned earlier, except that it is a RegExp method that takes a string argument rather than a String method that takes a RegExp argument.

The exec() method executes a regular expression on the specified string, i.e. it searches the string for a match. If no match is found, the method returns null. If a match is found, it returns an array just like the one returned by match() for a search without the g flag. Element 0 of the array contains the string that matched the regular expression, and the subsequent elements contain the substrings that matched the parenthesized subexpressions. In addition, the index property contains the position of the character at which the match begins, and the input property refers to the string that was searched.

Unlike match(), exec() returns an array whose structure does not depend on the presence of the g flag in the regular expression. Recall that match() returns an array of all matches when given a global regular expression. In contrast, exec() always returns a single match but provides complete information about it. When exec() is called on a regular expression that has the g flag, it sets the lastIndex property of the regular expression object to the position of the character immediately following the matched substring.

When exec() is called a second time on the same regular expression, it begins its search at the position given by the lastIndex property. If exec() does not find a match, it resets lastIndex to 0. (You can also reset lastIndex to zero yourself at any time, and you should do so whenever you abandon a search before finding the last match in one string and begin searching another string with the same RegExp object.) This special behavior allows exec() to be called repeatedly to iterate over all the matches in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is funnier than Java!";
var result;
while ((result = pattern.exec(text)) != null) {
    console.log("Found '" + result[0] + "'" +
                " at position " + result.index +
                "; next search will start at " + pattern.lastIndex);
}

The other RegExp method is test(), which is much simpler than exec(). It takes a string and returns true if the string contains a match for the regular expression:

var pattern = /java/i;
pattern.test("JavaScript"); // Returns true

Calling test() is equivalent to calling exec() and returning true if exec() returns a non-null value. For this reason, test() behaves the same way as exec() when called on a global regular expression: it begins searching the string at the position given by the lastIndex property and, if it finds a match, sets lastIndex to the position of the character immediately following the match. Therefore, test() can also be used to loop through a string, just like exec().
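
A short sketch of this lastIndex behavior, using made-up sample strings. It shows both the loop described above and the surprise that repeated test() calls on the same global regular expression can alternate between true and false:

```javascript
var re = /a/g;

var first = re.test("a");       // true: match found, lastIndex becomes 1
var afterFirst = re.lastIndex;  // 1
var second = re.test("a");      // false: search starts at 1, lastIndex resets to 0
var third = re.test("a");       // true again

// Using test() as a loop condition to count matches
var counter = /o/g;
var count = 0;
while (counter.test("foo boo")) {
    count++;
}

console.log(first, afterFirst, second, third, count);
```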

new RegExp(pattern[, flags])

If the regular expression is known in advance, the literal syntax (/test/i) is preferred.

If the regular expression is not known in advance, it is preferable to build it from a character string using the new RegExp constructor.

But note: since the backslash \ serves as the escape character, in the string passed to new RegExp it has to be written twice: \\

Flags

i - ignore case when matching

g - global match: finds all instances of the pattern, as opposed to a local match (the default), which stops at the first instance

Operators

What | How | Description | Usage
i | flag | makes the regular expression case-insensitive | /testik/i
g | flag | global search | /testik/g
m | flag | allows matching against multi-line text, such as text retrieved from a textarea |
[] | character class operator | matches a set of characters | [a-z] - any character in the range from a to z
^ | caret operator | "except" (negation inside a character class) | [^a-z] - any character EXCEPT those in the range from a to z
- | hyphen operator | indicates a range of values, inclusive | [a-z] - any character in the range from a to z
\ | escape operator | escapes the character that follows | \\
^ | start-of-match operator | the match must occur at the beginning | /^testik/g
$ | end-of-match operator | the match must occur at the end | /testik$/g
? | ? operator | makes the character optional | /t?est/g
+ | + operator | the character must be present one or more times | /t+est/g
* | * operator | the character may be present one or more times, or not at all | /t*est/g
{} | {} operator | sets a fixed number of repetitions of the character | /t{4}est/g
{,} | {,} operator | sets the number of repetitions of the character within given limits | /t{4,9}est/g
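
To make the repetition operators ?, +, *, {n} and {n,m} concrete, here is a small sketch that runs several patterns against the same list of sample words. The helper name matching and the sample words are assumptions for illustration; the patterns are anchored with ^ and $ so that the whole word has to match.

```javascript
var samples = ["est", "test", "ttest", "ttttest"];

// Hypothetical helper: returns the samples matched by the given pattern
function matching(re) {
    return samples.filter(function (s) { return re.test(s); });
}

var optional = matching(/^t?est$/);     // ["est", "test"]
var oneOrMore = matching(/^t+est$/);    // ["test", "ttest", "ttttest"]
var zeroOrMore = matching(/^t*est$/);   // all four samples
var exactlyFour = matching(/^t{4}est$/);   // ["ttttest"]
var oneToTwo = matching(/^t{1,2}est$/);    // ["test", "ttest"]

console.log(optional, oneOrMore, zeroOrMore, exactlyFour, oneToTwo);
```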

Predefined character classes

Predefined class | Matches
\t | horizontal tab
\n | line feed
. | any character other than a line feed
\d | any decimal digit, which is the same as [0-9]
\D | any character other than a decimal digit, which is the same as [^0-9]
\w | any word character (digits, letters, and the underscore), which is the same as [A-Za-z0-9_]
\W | any character other than digits, letters, and the underscore, which is the same as [^A-Za-z0-9_]
\s | any whitespace character
\S | any character other than whitespace
\b | word boundary
\B | NOT a word boundary, i.e. a position inside a word
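
A few of these predefined classes in action. This is only a sketch; the sample strings are invented for illustration.

```javascript
var digits = "Room 101_b".match(/\d+/)[0];        // "101"
var wordChars = "a-b_c!".replace(/\W/g, "");      // "ab_c": the underscore counts as a word character
var parts = "one two\tthree".split(/\s/);         // ["one", "two", "three"]
var wholeWord = /\bcat\b/.test("a cat sat");      // true: "cat" stands alone as a word
var insideWord = /\bcat\b/.test("concatenate");   // false: "cat" is surrounded by letters
var notBoundary = /\Bcat\B/.test("concatenate");  // true: both ends are inside a word

console.log(digits, wordChars, parts, wholeWord, insideWord, notBoundary);
```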

Grouping ()

If you want to apply an operator such as + to a group of terms, you can use parentheses (), for example /(abcd)+/.

Capturing

The part of the regular expression enclosed in parentheses () is called a capture.

Consider the following example:

/^([abc])k\1/

\1 is not just any character from a, b, c.
\1 is whatever character the first capture matched. That is, the character matched by \1 is unknown until the regular expression is evaluated.
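
A minimal sketch of this behavior, assuming the pattern is /^([abc])k\1/ (a capture of one of a, b, c, followed by k, followed by the backreference \1), as the description above suggests:

```javascript
var re = /^([abc])k\1/;

var same1 = re.test("aka"); // true: \1 must repeat the captured "a"
var same2 = re.test("bkb"); // true: \1 must repeat the captured "b"
var mixed = re.test("akb"); // false: "b" is in [abc], but it is not what was captured
var other = re.test("dkd"); // false: "d" is not in [abc] at all

console.log(same1, same2, mixed, other);
```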

Non-capturing groups

Parentheses () are used in two cases: for grouping and for capturing. But there are situations where we need () only for grouping and no capture is required. In addition, by removing unnecessary captures we reduce the work of the regular expression engine.

To prevent capturing, put ?: immediately after the opening parenthesis:

var str = "<div>Hello world!</div>";
var found = str.match(/<(?:\/?)(?:\w+)(?:[^>]*?)>/i);
console.log("found without captures:", found); // ["<div>"]

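To see the difference side by side, the following sketch matches the same string with a capturing and a non-capturing first group. The date string is an invented sample.

```javascript
var date = "2015-11-01"; // hypothetical sample string

// Both groups capture: the result holds the match plus two captures
var capturing = date.match(/(\d{4})-(\d{2})/);
// The first group only groups: the result holds the match plus one capture
var nonCapturing = date.match(/(?:\d{4})-(\d{2})/);

console.log(capturing.length, capturing[1], capturing[2]); // 3 "2015" "11"
console.log(nonCapturing.length, nonCapturing[1]);         // 2 "11"
```
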
Test function

regexp.test(str)

The test function checks whether the regular expression matches the string (str). It returns either true or false.

Usage example:

function codeF(str) {
    return /^\d{5}-\d{2}/.test(str);
}
console.log(codeF("12345-12ss")); // true
console.log(codeF("1245-12ss"));  // false

Match function

str.match(regexp)

The match function returns an array of values, or null if no match is found. Note: if the regular expression has no g flag (for a global search), match returns the first match in the string and, as the example shows, the resulting array also contains the CAPTURES (the parts of the match that correspond to the parenthesized parts of the regular expression).

var str = "For information, see: Chapter 3.4.5.1";
var re = /chapter (\d+(\.\d)*)/i; // with captures (no global flag)
var found = str.match(re);
console.log(found); // ["Chapter 3.4.5.1", "3.4.5.1", ".1"]

If you call match() with a global regular expression (with the g flag), an array is also returned, but it contains ALL the matches. In that case no captures are returned.

var str = "For information see: Chapter 3.4.5.1, Chapter 7.5";
var re = /chapter (\d+(\.\d)*)/ig; // no captures, global search
var found = str.match(re);
console.log(found); // ["Chapter 3.4.5.1", "Chapter 7.5"]

Exec function

regexp.exec(str)

The exec function checks whether the regular expression matches the string (str). It returns an array of results (with captures) or null. If the g flag is set, each subsequent call to exec (for example, in a while loop) moves on to the next match, because exec automatically updates lastIndex, the index at which the previous search ended.

var html = '<div class="test"><b>BAM!</b> <em>BUM!</em></div>';
var reg = /<(\/?)(\w+)([^>]*?)>/g;
var match;
while ((match = reg.exec(html)) !== null) {
    console.log(match);
}
/*
['<div class="test">', "", "div", ' class="test"']
["<b>", "", "b", ""]
["</b>", "/", "b", ""]
["<em>", "", "em", ""]
["</em>", "/", "em", ""]
["</div>", "/", "div", ""]
*/

Without the global flag, the match and exec methods work identically. That is, they return an array with the first match and its captures.

// match
var html = '<div class="test"><b>BAM!</b> <em>BUM!</em></div>';
var reg = /<(\/?)(\w+)([^>]*?)>/; // no global flag
console.log(html.match(reg)); // ['<div class="test">', "", "div", ' class="test"']

// exec
var html = '<div class="test"><b>BAM!</b> <em>BUM!</em></div>';
var reg = /<(\/?)(\w+)([^>]*?)>/; // no global flag
console.log(reg.exec(html)); // ['<div class="test">', "", "div", ' class="test"']

Replace function

str.replace(regexp, newSubStr | function)
  • regexp - the regular expression;
  • newSubStr - the string that replaces each match found in the text;
  • function - called for each match found, with a variable-length parameter list (recall that a global search finds all instances of the pattern in the string).

The return value of this function is used as the replacement.

Function parameters:

  • 1 - The complete matched substring.
  • 2 - The values of the parenthesized groups (captures), one parameter per group.
  • 3 - The index (position) of the match in the original string.
  • 4 - The original string.

The method does not change the string it is called on; it returns a new string with the matches replaced. To perform a global search and replace, use a regexp with the g flag.

"GHGHGHGTTTT".replace(/[A-Z]/g, "K"); // "KKKKKKKKKKK"

function upLetter(allStr, letter) {
    return letter.toUpperCase();
}
var res = "border-top-width".replace(/-(\w)/g, upLetter);
console.log(res); // borderTopWidth
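
The replacement function also receives the position of each match and the original string. The following sketch uses all of the parameters listed above; the function name describe and the sample string are hypothetical.

```javascript
// Hypothetical replacement function: match is the full match, p1 and p2
// are the two captures, offset is the match position, original is the
// string being searched
function describe(match, p1, p2, offset, original) {
    return "[" + p1 + "|" + p2 + "@" + offset + "]";
}

var input = "x=1, y=22";
var out = input.replace(/(\w)=(\d+)/g, describe);

console.log(out);   // "[x|1@0], [y|22@5]"
console.log(input); // "x=1, y=22" (the original string is unchanged)
```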

Last updated: 1.11.2015

Regular expressions represent a pattern that is used to search or modify a string. To work with regular expressions, JavaScript defines the RegExp object.

There are two ways to define a regular expression:

var myExp = /hello/;
var myExp = new RegExp("hello");

The regular expression used here is pretty simple: it consists of a single word "hello". In the first case, the expression is placed between two forward slashes, and in the second case, the RegExp constructor is used, in which the expression is passed as a string.

RegExp methods

To determine whether a regular expression matches a string, the RegExp object defines the test() method. This method returns true if the string matches the regular expression, and false if it does not.

var initialText = "hello world!";
var exp = /hello/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // true

initialText = "beautiful weather";
result = exp.test(initialText);
document.write(result); // false - there is no "hello" in the initialText string

The exec method works similarly: it also checks whether the string matches the regular expression, but it returns the part of the string that matches the expression. If there is no match, null is returned.

var initialText = "hello world!";
var exp = /hello/;
var result = exp.exec(initialText);
document.write(result + "<br/>"); // hello

initialText = "beautiful weather";
result = exp.exec(initialText);
document.write(result); // null

Character groups

A regular expression does not have to consist only of ordinary characters; it can also include special regular expression syntax elements. One of these elements is a group of characters enclosed in square brackets. For example:

var initialText = "defensibility";
var exp = /[abc]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // true

initialText = "town";
result = exp.test(initialText);
document.write(result); // false

The expression [abc] indicates that the string must contain at least one of the three letters a, b, or c.

If we need to check whether a string contains alphabetic characters from a certain range, we can specify the whole range at once:

var initialText = "defenses";
var exp = /[a-z]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // true

initialText = "30789";
result = exp.test(initialText);
document.write(result); // false

In this case, the string must contain at least one character from the range a-z.

If, on the contrary, we want to match a character that is not in a given set, we put the ^ sign inside the square brackets before the list of characters:

var initialText = "defenses";
var exp = /[^a-z]/;
var result = exp.test(initialText);
document.write(result + "<br/>"); // false

initialText = "3di0789";
exp = /[^0-9]/;
result = exp.test(initialText);
document.write(result); // true

In the first case, the expression requires at least one character outside the range a-z. Since the string "defenses" consists only of characters from this range, the test() method returns false, that is, the regular expression does not match the string.

In the second case ("3di0789"), the expression requires at least one character that is not a digit. Since the string also contains letters, it matches the regular expression, so the test method returns true.

If necessary, we can combine expressions:

var initialText = "dome";
var exp = /[dt]o[nm]/;
var result = exp.test(initialText);
document.write(result); // true

The expression [dt]o[nm] matches strings that contain one of the substrings "dom", "tom", "don", or "ton".

Expression properties

    The global property finds all substrings that match the regular expression. By default, the regular expression selects only the first matching substring, even though the string may contain many substrings that match. To find them all, this property is set with the g character in the expression.

    The ignoreCase property finds substrings that match the regular expression regardless of the case of the characters in the string. It is set with the i character in the expression.

    The multiline property finds substrings that match the regular expression in multi-line text. It is set with the m character in the expression.

For example:

var initialText = "hello World";
var exp = /world/;
var result = exp.test(initialText); // false

There is no match between the string and the expression, since "World" differs from "world" in case. In this case, you need to change the regular expression by adding the ignoreCase property to it:

var exp = /world/i;

We can also use several properties at once.
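
As a sketch of combining properties, the following applies g, i, and m together to an invented multi-line sample string:

```javascript
var text = "World\nworld\nWORLD"; // hypothetical three-line sample

// g + i: every occurrence, regardless of case
var allMatches = text.match(/world/gi);

// g + i + m: ^ and $ match at the start and end of each line
var perLine = text.match(/^world$/gim);

// Without m, ^ and $ only match at the start and end of the whole string
var withoutM = text.match(/^world$/gi);

console.log(allMatches); // ["World", "world", "WORLD"]
console.log(perLine);    // ["World", "world", "WORLD"]
console.log(withoutM);   // null
```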