Meta Characters
Meta characters are a subset of the ASCII character set that carry a special meaning when building a regular expression, e.g. +, $, ^. These characters instruct the regex engine to perform specific operations. If we want the regex engine to treat them as normal characters instead of meta characters, we need to escape them with a backslash '\'. For example, the regex "firstname\.lastname" instructs the engine to ignore the special meaning of '.' and to treat it as a literal '.' character.
The following list gives an overview of the most commonly used meta characters.
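The effect of escaping can be sketched as follows. This is only an illustration: it assumes Python's re module (the article does not name a language), and the two sample strings are hypothetical.

```python
import re

# Hypothetical sample inputs, not taken from the article.
samples = ["firstname.lastname", "firstnameXlastname"]

escaped = re.compile(r"firstname\.lastname")   # '.' treated as a literal dot
unescaped = re.compile(r"firstname.lastname")  # '.' treated as a meta character

escaped_hits = [s for s in samples if escaped.fullmatch(s)]
unescaped_hits = [s for s in samples if unescaped.fullmatch(s)]

print(escaped_hits)    # ['firstname.lastname']
print(unescaped_hits)  # ['firstname.lastname', 'firstnameXlastname']
```

The unescaped pattern accepts both strings because '.' stands for any single character, while the escaped pattern accepts only the one with a real dot.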
. Dot - any character in a line
In a normal (wildcard) search we use ? to specify a single wildcard character and * to specify any sequence of characters, e.g. to search for a file we use IAdb*.dll. In regex, however, * is used for repetition, and ‘.’ takes the role of the single-character wildcard. Also note the phrase "in a line" above: by default ‘.’ stops at a new line. This behavior can be altered with the SingleLine mode setting, which tells the regex engine to let ‘.’ match a newline (\n) as well. MultiLine mode, despite its name, does not affect ‘.’; it only changes where ^ and $ match. e.g.:
Search String:
using System;
using System.IO;
using System.Text;
RegEx: “System.*”
Explanation: With Single Line mode off (the default), this matches every reference to System and its descendants up to the end of the line.
Matches in non-Single Line mode:
a)System;
b)System.IO;
c)System.Text;
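Matches in Single Line mode (one match that runs across the line breaks to the end of the text, so the "using" keywords in between are included):
a)System;\nusing System.IO;\nusing System.Text;
Note that the semicolon is also matched on each line.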
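The same search can be tried out with a short sketch. It assumes Python's re module, where the re.DOTALL flag plays the role of Single Line mode; the search string is the one shown above.

```python
import re

text = "using System;\nusing System.IO;\nusing System.Text;"

# Default: '.' stops at a newline, so each line yields its own match.
per_line = re.findall(r"System.*", text)

# re.DOTALL: '.' also matches '\n', so one match runs to the end of the text.
whole = re.findall(r"System.*", text, flags=re.DOTALL)

print(per_line)  # ['System;', 'System.IO;', 'System.Text;']
print(whole)     # ['System;\nusing System.IO;\nusing System.Text;']
```

With the flag set, everything after the first "System" is swallowed by a single match, line breaks included.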
\ - back slash
As already mentioned, a backslash instructs the regex engine to treat the meta character that follows it as a normal character. When it is followed by a number, like \1 or \2, it specifies a back reference number. Back references will be covered separately.
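As a small preview of the numbered form, here is a sketch that assumes Python's re module and a hypothetical sample sentence. \1 refers back to whatever the first pair of round brackets matched.

```python
import re

# (\w+) captures a word; \1 requires the same text to appear again.
doubled = re.compile(r"\b(\w+) \1\b")

m = doubled.search("this is is a test")
print(m.group(0))  # is is
print(m.group(1))  # is
```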
[ ] - opening and closing square bracket
The set of characters allowed at a single position is specified within these brackets. Examples are mentioned below.
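A minimal sketch of a bracket expression, assuming Python's re module and a hypothetical sample text:

```python
import re

# Hypothetical sample text.
text = "gray grey grUy groy"

# [ae] matches exactly one character that is either 'a' or 'e'.
either = re.findall(r"gr[ae]y", text)

# A range inside the brackets: any single lowercase letter.
lowercase = re.findall(r"gr[a-z]y", text)

print(either)     # ['gray', 'grey']
print(lowercase)  # ['gray', 'grey', 'groy']
```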
( ) - round brackets
These are used to hold sub-expressions or to capture text for back references. Back references will be covered later. Sub-expressions are similar to sub-expressions in a programming language.
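How round brackets change what an iterator applies to can be sketched like this (Python's re module is assumed; the inputs are hypothetical):

```python
import re

# Without brackets, '+' applies only to the preceding character 'b'.
plain_repeats_b = re.fullmatch(r"ab+", "abbb") is not None
plain_on_ababab = re.fullmatch(r"ab+", "ababab") is not None

# With brackets, '+' applies to the whole sub-expression 'ab'.
grouped_on_ababab = re.fullmatch(r"(ab)+", "ababab") is not None

print(plain_repeats_b)    # True
print(plain_on_ababab)    # False
print(grouped_on_ababab)  # True
```

This mirrors how parentheses override operator precedence in an arithmetic expression.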
{ } - curly brackets
These are used with iterators. We have seen this in Part 2 for a length of five digits. The format is {x,y}, where x is mandatory and can take values from 0 up to the value of y. y is optional and, when given, has to be an integer.
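The brace forms can be sketched as below, assuming Python's re module. The forms {x}, {x,y} and {x,} are the usual regex notation, and the test inputs are hypothetical.

```python
import re

five_digits = re.compile(r"\d{5}")    # exactly five digits
two_to_four = re.compile(r"\d{2,4}")  # at least two, at most four digits
two_or_more = re.compile(r"\d{2,}")   # upper bound omitted: two or more digits

results = [
    five_digits.fullmatch("12345") is not None,    # True
    five_digits.fullmatch("1234") is not None,     # False
    two_to_four.fullmatch("123") is not None,      # True
    two_to_four.fullmatch("12345") is not None,    # False
    two_or_more.fullmatch("1234567") is not None,  # True
]
print(results)  # [True, False, True, False, True]
```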
* Iterator to repeat zero or more times. (0 or more times)
? Iterator to repeat zero times or once. (0 or 1 times)
+ Iterator to repeat at least once. (1 or more times)
\w alphanumeric character including underscore
\W non-alphanumeric
\d numeric character
\D non-numeric
\s any white space character, including space, tab and newline
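The iterators and shorthand classes listed above can be tried out with a short sketch. Python's re module is assumed and the sample strings are hypothetical.

```python
import re

# Iterators applied to the character 'b'.
star = re.findall(r"ab*", "a ab abbb")      # zero or more 'b'
question = re.findall(r"ab?", "a ab abbb")  # zero or one 'b'
plus = re.findall(r"ab+", "a ab abbb")      # one or more 'b'

print(star)      # ['a', 'ab', 'abbb']
print(question)  # ['a', 'ab', 'ab']
print(plus)      # ['ab', 'abbb']

# Shorthand classes.
sample = "id_42 = 7\t(ok)"
words = re.findall(r"\w+", sample)   # runs of letters, digits and underscore
digits = re.findall(r"\d+", sample)  # runs of digits
spaces = re.findall(r"\s", sample)   # single white space characters

print(words)   # ['id_42', '7', 'ok']
print(digits)  # ['42', '7']
print(spaces)  # [' ', ' ', '\t']
```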