Protocol for the established SSL connection.
SSL_SESSION_ID
SSL session ID for the established SSL connection.
Match type
Select Regex Match to require a match, or Regex No Match to negate the match.
Match criteria
A regular expression to match data in the selected client certificate field. Example: if SSL_CLIENT_S_DN is selected, OU=management matches certificates where the client certificate has Organisational Unit = management.
1.7. Regular expressions
Web Security Manager has full support for standard PCRE (Perl Compatible Regular Expressions). Below is a brief regular expression "survival guide". For a more thorough explanation of the subject, see the links and books recommended at the end of this section.
1.7.1. What are regular expressions
A regular expression is a formula for matching strings that follow some pattern.
Regular expressions are made up of normal characters and special characters. Normal characters include upper and lower case letters and digits. The characters with special meanings are described in detail below.
In the simplest case, a regular expression looks like a standard text string. For example, the regular expression "john" contains no special characters. It matches "john" and "john doe", but not "John", because matching is case-sensitive.
In an input validation context we always want the expression to match the whole string. The expression above would then be written as ^john$, where the characters ^ and $ mean start of string and end of string. Now the expression matches only "john", not "john doe". To match "john doe" as well as "john smith" etc., we use a few more special characters. In its simplest form, "john lastname" could be expressed as "^john.*$", meaning: a string starting with the characters "john" followed by zero or more (the "*") occurrences of any character (the "."). For those familiar with the simple wildcard character "*" in DOS and Unix shells, among others, ".*" is equivalent to "*", that is: anything. Specifying anything is not very useful in an input validation context. With regular expressions, much more fine-grained input validation masks can be defined using the rich set of metacharacters, character classes, repetition quantifiers, etc.
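The effect of anchoring can be tried out with Python's `re` module, which uses the same syntax as PCRE for these simple constructs. This is only an illustration: Python's `re` is not PCRE, and this is not how Web Security Manager evaluates expressions. `re.search` is used because, like an unanchored expression, it accepts a match anywhere in the string.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text (no implicit anchoring)."""
    return re.search(pattern, text) is not None


PATTERNS = ["john", "^john$", "^john.*$"]
SAMPLES = ["john", "john doe", "John"]

# results[pattern][sample] -> bool
results = {p: {s: matches(p, s) for s in SAMPLES} for p in PATTERNS}

if __name__ == "__main__":
    for p in PATTERNS:
        print(f"{p:10}", results[p])
```

"john" accepts "john doe" because nothing forces it to cover the whole string; "^john$" does not; "^john.*$" accepts both, and none of them accepts "John".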
1.7.2. Metacharacters
Metacharacter  Meaning
^              Beginning of string (implied in Web Security Manager)
$              End of string (implied in Web Security Manager)
.              Any character except newline
*              Match 0 or more times
+              Match 1 or more times
?              Match 0 or 1 times; after another quantifier, selects the shortest match (e.g. *?)
|              Alternative (like logical OR)
()             Grouping
[]             Set of characters (a list of characters)
{}             Repetition modifier
\              Escape (quote) the next character, or introduce a special notation
Table 5.1. Metacharacters in regular expressions
To match a metacharacter literally, precede it with \ (e.g. \. matches only the full stop character ".").
Note
In Web Security Manager, all regular expressions are forced to match the entire string (URL path or parameter value) by automatically prefixing each expression with "^" and suffixing it with "$".
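The two rules above (escaping and implied anchoring) can be sketched with Python's `re.fullmatch`, which only succeeds when the whole value matches. `wsm_full_match` and `naive_anchor` are hypothetical helpers written for this illustration, not part of Web Security Manager. The manual does not say exactly how the anchors are added internally, so the sketch also shows why alternatives are best grouped, as in the example /(login|logout) later in this section.

```python
import re


def wsm_full_match(pattern: str, value: str) -> bool:
    """Hypothetical helper approximating the implied ^...$ anchoring:
    re.fullmatch succeeds only if the entire value matches."""
    return re.fullmatch(pattern, value) is not None


def naive_anchor(pattern: str, value: str) -> bool:
    """Hypothetical helper that anchors by plain string concatenation,
    without grouping the pattern first."""
    return re.search("^" + pattern + "$", value) is not None


# An unescaped "." matches any character, so "/indexXhtml" is accepted.
unescaped = wsm_full_match("/index.html", "/indexXhtml")      # True
# Escaping the dot restricts it to a literal full stop.
escaped = wsm_full_match(r"/index\.html", "/indexXhtml")      # False

# Concatenation gives "^/login|/logout$": ^ binds only to the first
# alternative, so "/login-anything" slips through.
loose = naive_anchor("/login|/logout", "/login-anything")     # True
strict = wsm_full_match("/(login|logout)", "/login-anything") # False

print(unescaped, escaped, loose, strict)
```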
1.7.3. Repetition
Expression     Matches
a*             Zero or more a's
a+             One or more a's
a?             Zero or one a (i.e., optional a)
a{m}           Exactly m a's
a{m,}          At least m a's
a{m,n}         At least m but at most n a's
repetition?    Same as repetition, but the shortest match is taken
Table 5.2. Repetition in regular expressions
Read "a's" as "occurrences of strings, each of which matches the pattern a". Read repetition as any of the repetition expressions listed above it.
Shortest match means that the shortest string matching the pattern is taken. The default is "greedy matching", which finds the longest match.
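The difference between greedy and shortest matching is easiest to see on text with several possible end points. The sketch below uses Python's `re`, which follows the same greedy/lazy rules as PCRE for these quantifiers; the sample string is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group()
# Shortest match: .+? stops at the first ">" that completes a match.
lazy = re.search(r"<.+?>", text).group()

# Counted repetition a{2,3}, tested as a whole-string match.
counted = [re.fullmatch(r"a{2,3}", s) is not None for s in ("a", "aa", "aaa", "aaaa")]

print(greedy)   # <b>bold</b> and <i>italic</i>
print(lazy)     # <b>
print(counted)  # [False, True, True, False]
```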
1.7.4. Special notations with \
Notation  Matches
\t        Tab
\n        Newline
\r        Return (CR)
\xhh      Character with hexadecimal code hh
\b        "Word" boundary (zero-width assertion)
\B        Not a "word" boundary
\w        Any single international character classified as a "word" character (alphanumeric or _). Examples: A, z, 1, 9, Æ, â
\W        Any non-"word" character
\s        Any whitespace character (e.g. space, tab, newline)
\S        Any non-whitespace character
\d        Any digit character, equivalent to [0-9]
\D        Any non-digit character
\pN       Any Unicode character classified as numeric
Table 5.3. Notations with \ in Web Security Manager regular expressions
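Several of these notations can be tried with Python's `re`. Two differences from the table are worth knowing before relying on the sketch: Python's built-in `re` has no `\pN` (the third-party `regex` package supports `\p{N}`), and for str patterns its `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set. The sample strings are made up for illustration.

```python
import re

# \w is Unicode-aware for str patterns in Python 3, so it accepts letters
# such as Æ, like the "international" \w described above.
word_ok = re.fullmatch(r"\w+", "Æble_9") is not None   # True
word_dash = re.fullmatch(r"\w+", "a-b") is not None    # False: "-" is not a word character

# \s matches space, tab and newline.
spaces = re.findall(r"\s", "a b\tc\nd")                # [' ', '\t', '\n']

# \b is a zero-width word boundary: find "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']

# \xhh gives a character by hexadecimal code: \x41 is "A".
hex_ok = re.fullmatch(r"\x41+", "AAA") is not None     # True

print(word_ok, word_dash, spaces, whole_words, hex_ok)
```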
1.7.5. Character sets [...]
A character set is denoted by [...]. Inside a character set (a "character class") different rules apply: instead of the normal rules given above, the following apply.
Notation        Matches
[characters]    Any of the characters in the list (c, h, a, r, a, c, t, e, r, s)
[x-y]           Any of the characters from x to y (inclusive) in ASCII code order
[\-]            The hyphen character -
[\n]            The newline; other single-character notations with \ apply normally, too
[^something]    Negation: any character except those that [something] denotes; that is, immediately after the leading [ the circumflex ^ means "not", applied to all of the rest
Table 5.4. Character sets in regular expressions
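The character set rules can be checked with a few whole-string matches. The sketch uses Python's `re.fullmatch` to stand in for the implied anchoring; `full` and the sample values are hypothetical and only illustrate the notations in Table 5.4.

```python
import re


def full(pattern: str, value: str) -> bool:
    """Whole-string match, standing in for the implied ^...$ anchoring."""
    return re.fullmatch(pattern, value) is not None


cases = [
    (r"[a-z]+", "hello"),       # every character is in the range a-z
    (r"[a-z]+", "Hello"),       # "H" is outside a-z
    (r"[\w\-]+", "my-page_2"),  # "\-" is a literal hyphen inside the set
    (r"[^<>]+", "a and b"),     # no "<" or ">" present
    (r"[^<>]+", "a < b"),       # "<" is excluded by the negation
    (r"[.*+?]+", ".*+?"),       # . * + ? are literal inside a set
]
results = [full(p, v) for p, v in cases]
print(results)  # [True, False, True, True, False, True]
```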
1.7.6. Lookaround
The lookaround construct allows for the creation of regular expressions that match something only when it is followed/preceded, or not followed/preceded, by something else. Note that lookaround is a zero-width assertion: it tests whether something else matches at that position, but it does not consume it. That is why it is called an assertion. Lookaround makes it possible to write expressions that would otherwise be impossible or too complex.
In an input validation context, lookahead could be used, for example, to specify an expression that allows the angle brackets < and > but only when they are not closed (that is, when they do not form a pair such as <...>).
Expression                Meaning
a(?!expression)           Negative lookahead. Matches "a" when not followed by expression, where expression is any regular expression.
a(?=expression)           Positive lookahead. Matches "a" when followed by expression.
(?<!fixed-expression)a    Negative lookbehind. Matches "a" when not preceded by fixed-expression, where fixed-expression is any regular expression specifying a fixed number of characters. That is, "aaa" will work but a+ will not.
(?<=fixed-expression)a    Positive lookbehind. Matches "a" when preceded by fixed-expression.
Table 5.5. Lookaround in regular expressions
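The four lookaround forms, and the fixed-length restriction on lookbehind, behave the same way in Python's `re` as described in Table 5.5. The sketch below shows each one; `NO_CLOSED_PAIR` is a hypothetical rule written for this illustration of the angle-bracket case mentioned above, not a rule shipped with Web Security Manager.

```python
import re

text = "quit qatar"
# Positive lookahead: "q" only when the next character is "u".
# The "u" is tested but not consumed.
ahead = [m.start() for m in re.finditer(r"q(?=u)", text)]      # [0]
# Negative lookahead: "q" when the next character is not "u".
not_ahead = [m.start() for m in re.finditer(r"q(?!u)", text)]  # [5]

prices = "cost $42, tax 7"
# Positive lookbehind with a fixed length (one character): digits after "$".
after_dollar = re.findall(r"(?<=\$)\d+", prices)               # ['42']
# Negative lookbehind: runs of digits not preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)         # ['7']

# Lookbehind must have a fixed length, so a+ is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# Hypothetical validation rule: allow "<" and ">" on their own, but
# reject any closed pair such as "<script>".
NO_CLOSED_PAIR = r"(?!.*<[^<>]*>)[\w <>.,=]+"
open_ok = re.fullmatch(NO_CLOSED_PAIR, "a < b") is not None       # True
closed_ok = re.fullmatch(NO_CLOSED_PAIR, "<script>") is not None  # False

print(ahead, not_ahead, after_dollar, not_after_dollar,
      variable_lookbehind_ok, open_ok, closed_ok)
```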
1.7.7. Examples
1.7.7.1. Global URL regular expressions
The URL regular expressions filter matches URLs without parameters, globally for the proxy. If a request matches any of the defined regular expressions, Web Security Manager marks it as valid and forwards it to the back-end server.
Expression                           Matches
(/[\w\-]+)+\.html                    URL with the extension html containing any international word characters, digits, _ and - (\w matches upper and lower case alphanumeric characters plus _).
/abc(?:/[\w\-]+)*\.html              Same URL starting with /abc, including the URL /abc.html.
(/[\w\-]+)+\.html?                   Same URL matching the extensions html and htm.
(/[\w\-]+)+\.(html|pdf)              Same URL matching the extensions html and pdf.
(/[abcdefgh]+)+\.html                URL with the extension html containing only the lower case letters abcdefgh.
/index\.html                         Exact match of /index.html.
(/[\w\-]+)+/?                        "Natural" URL containing international alphanumeric characters, digits, _ and -, with an optional trailing slash.
/sw[0-9]{0,12}\.asp                  URL with the extension asp starting with /sw followed by 0-12 digits.
/(login|logout)                      Only the URLs /login or /logout.
(/[\w\-]+)+\.(htm|html|shtml|pdf)    URL containing international word characters, digits, _ and -, with one of the extensions htm, html, shtml or pdf.
Table 5.6. Examples of global URL regular expressions
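To see which of the Table 5.6 expressions would accept a given path, they can be run as whole-string matches. The sketch below assumes, as the text describes, that a request is valid if any expression matches; the rule names in `URL_RULES` and the `allowed` function are hypothetical labels for this illustration, and Python's `re.fullmatch` stands in for the product's own matching.

```python
import re

# A subset of the expressions from Table 5.6, under made-up names.
URL_RULES = {
    "html": r"(/[\w\-]+)+\.html",
    "under_abc": r"/abc(?:/[\w\-]+)*\.html",
    "html_or_htm": r"(/[\w\-]+)+\.html?",
    "sw_asp": r"/sw[0-9]{0,12}\.asp",
    "login_logout": r"/(login|logout)",
}


def allowed(url: str) -> list:
    """Return the names of the rules whose expression matches the whole path."""
    return [name for name, rx in URL_RULES.items() if re.fullmatch(rx, url)]


if __name__ == "__main__":
    for url in ["/docs/intro.html", "/abc.html", "/abc/x.htm",
                "/sw2024.asp", "/logout", "/../etc/passwd"]:
        print(f"{url:18} {allowed(url)}")
```

A path such as /../etc/passwd matches none of the rules, because "." is not in [\w\-] and the path has no permitted extension, so it would not be marked as valid.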
1.7.7.2. Validating input parameters
Regular expression    Matches
^[\w \.@()\-]+$       International alphanumeric characters, underscore, space, dot, @, parentheses and dash.
^[0-9a-z. ]+$         Digits, ASCII characters a-z, dot and space.
^[0-9]+$              Only digits. [0-9] can also be expressed as \d.
^[\d]{1,5}$           One to five digits.
^[a-z]+$              Only lower case ASCII characters from a-z.
^[a-z]{0,32}$         Only lower case ASCII characters from a-z, with the total length limited to a maximum of 32 characters.
Table 5.7. Examples of regular expressions for input validation
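The input validation expressions can be exercised against sample values. In this sketch the rule names are hypothetical, and `re.fullmatch` is used rather than `re.search`: in Python, as in PCRE by default, `$` also matches just before a single trailing newline, so a searched `^[0-9]+$` would accept "123\n".

```python
import re

# Expressions from Table 5.7 under made-up rule names.
INPUT_RULES = {
    "contact": r"^[\w \.@()\-]+$",
    "digits": r"^[0-9]+$",
    "short_number": r"^[\d]{1,5}$",
    "lower_max32": r"^[a-z]{0,32}$",
}


def valid(rule: str, value: str) -> bool:
    """True if the whole value matches the named rule."""
    return re.fullmatch(INPUT_RULES[rule], value) is not None


if __name__ == "__main__":
    print(valid("contact", "Jane Doe (jane@example.com)"))  # accepted
    print(valid("contact", "<script>"))                     # rejected
    print(valid("lower_max32", ""))                         # {0,32} allows empty
    print(valid("digits", "123\n"))                         # trailing newline rejected
```

Note that ^[a-z]{0,32}$ accepts the empty string; use {1,32} if a value is required.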
1.7.7.3. Global parameters
When specifying global parameters, both the name and the value are defined using regular expressions.
Name: usepf
Value: true
Matches: The specific parameter usepf with the static value true.

Name: parm\d{3}
Value: [a-zA-Z\d]{3,32}
Matches: All parameters named parm followed by three digits, with a value consisting of any combination of letters a-z (upper and lower case) or digits, with a minimum length of 3 and a maximum length of 32 characters.

Name: \w{1,25}
Value: [\w\s_,/:()@$*\.\-]*
Matches: Any parameter with a name consisting of 1 to 25 international word characters and a value containing zero or more "friendly characters".

Table 5.8. Examples of global parameters regular expressions
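Because a global parameter is described by a pair of expressions, a check has to match the name and the value separately. The sketch below assumes that a parameter is accepted when any one rule matches both its whole name and its whole value; the manual does not spell out how several global parameter rules combine, and `parameter_allowed` is a hypothetical helper.

```python
import re

# (name expression, value expression) pairs from Table 5.8.
GLOBAL_PARAMETERS = [
    (r"usepf", r"true"),
    (r"parm\d{3}", r"[a-zA-Z\d]{3,32}"),
    (r"\w{1,25}", r"[\w\s_,/:()@$*\.\-]*"),
]


def parameter_allowed(name: str, value: str) -> bool:
    """Assumed semantics: accept if some rule matches the whole name and the whole value."""
    return any(
        re.fullmatch(name_rx, name) is not None
        and re.fullmatch(value_rx, value) is not None
        for name_rx, value_rx in GLOBAL_PARAMETERS
    )


if __name__ == "__main__":
    print(parameter_allowed("parm123", "abc123"))   # second rule
    print(parameter_allowed("user-id", "x"))        # "-" is not allowed in any name
    print(parameter_allowed("usepf", "false"))      # accepted by the broad third rule
```

Under this assumption the broad third rule also accepts usepf=false, even though the first rule pins usepf to true. Broad rules can therefore weaken narrow ones.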
1.7.7.4. Predefined standard classes in Web Security Manager
The following classes are predefined in Web Security Manager. The classes are presented in the order in which the Automatic Policy Generator evaluates them when automatically mapping classes to input parameters.

Class: empty
Description: No values allowed.
Regular expression: (none)

Class: num
Description: Digits; a maximum of 32 digits.
Regular expression: \d{1,32}

Class: payment_card
Description: Payment card numbers; allows spaces and hyphens between number groups.
Regular expression: (?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?

Class: ms_ident
Description: Microsoft identifier with optional preceding and trailing curly brackets.
Regular expression: {?[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}?

Class: alphanum
Description: International alphanumeric characters. No spaces. Maximum 256 characters.
Regular expression: \w{1,256}

Class: text
Description: Simple international text. Consecutive "/" or "." not allowed (negative lookahead).
Regular expression: (?!.*(\.\.|//).*)[\w\x20+.,\-:@=/]+

Description: Simple international URL match, with parameters. Consecutive "/" or "." not allowed (negative lookahead).
Regular expression: (?:https?://)?(?!.*(\.\.|//).*)[\w\x20@,.(){}/\-=?&]+

Class: standard
Description: Text input, international; several special characters allowed, including newline.
Regular expression: [\w\s@,.(){}/\-=?&_:]+

Class: printable
Description: One or more printable characters. Defined by negating a character class containing non-printable characters (tab, newline and carriage return are allowed).
Regular expression: [^\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+

Class: anything
Description: Anything but newline.
Regular expression: .+

Class: Anything_multiline
Description: Anything including newline.
Regular expression: (?:.|\n)*

Table 5.9. Predefined standard classes in Web Security Manager
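Since the classes are evaluated in order, a value can fit several classes but be mapped to the first one. The sketch below imitates that with a subset of Table 5.9: `classify` is a hypothetical function that returns the first class whose expression matches the whole value. It is not the Automatic Policy Generator's actual logic, which is not described beyond the evaluation order. The curly brackets in ms_ident are escaped here to make their literal meaning explicit, and the empty class is left out.

```python
import re
from typing import Optional

# A subset of Table 5.9, kept in the documented evaluation order.
CLASSES = {
    "num": r"\d{1,32}",
    "payment_card": r"(?:\d{4}[\-\x20]?){2}\d{4,5}[\-\x20]?(?:\d{2,4})?",
    "ms_ident": r"\{?[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}"
                r"-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}\}?",
    "alphanum": r"\w{1,256}",
    "text": r"(?!.*(\.\.|//).*)[\w\x20+.,\-:@=/]+",
}


def classify(value: str) -> Optional[str]:
    """Return the first class, in table order, whose expression matches the whole value."""
    for name, rx in CLASSES.items():
        if re.fullmatch(rx, value):
            return name
    return None


if __name__ == "__main__":
    for v in ["4111 1111 1111 1111", "4111111111111111",
              "{3F2504E0-4F89-11D3-9A0C-0305E82C3301}", "hello world", "a..b"]:
        print(f"{v!r:44} {classify(v)}")
```

An unbroken 16-digit card number is mapped to num, because num is evaluated before payment_card.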
1.7.8. Further reading
A number of web sites and books describe regular expressions in more detail.
1.7.8.1. Web sites
Wikipedia
A general description
http://en.wikipedia.org/wiki/Regular_expression
The 30 Minute Regex Tutorial