Regular Expression Matching



Constructing a regular expression to match particular strings can be a long and painful task. The regex tool aims to make it easier to develop a correct regular expression by providing a quick way to test a pattern against an arbitrary string. The tool uses POSIX regular expressions, not Perl regular expressions. Details on using regex are found below.

The source for the regex tool is contained in regex.c; instructions for building the tool can be found in the Installation section below.

The most current version of regex can be found at:

  http://www.castaglia.org/proftpd/contrib/regex.c

Author

Please contact TJ Saunders <tj at castaglia.org> with any questions, concerns, or suggestions regarding this tool.


Installation

To compile regex, simply do:
  gcc -o regex regex.c


Usage

An introduction/tutorial for writing POSIX regular expressions can be found here:
  http://www.castaglia.org/proftpd/doc/contrib/regexp.html
The regex tool can be used while reading the introduction to try out the regular expression features it describes.

Some of the information displayed may not be helpful to you; the regex tool was written to aid in the development of the mod_rewrite module for proftpd, so it delves deeper into the specifics of regular expression matching than the average system administrator may need.

Running regex without any parameters, or with the wrong number of parameters, reminds you that it needs parameters to do its job:

  # ./regex
  regex: missing required parameters:  

  # ./regex foo
  regex: missing required parameters:

A simple test:
  # ./regex foo bar
  regex: compiled pattern 'bar' (1 group)
  regex: pattern 'bar' did NOT match string 'foo'

Be aware, however, that your shell may not handle some characters as you would assume:
  # ./regex $foo bar
  regex: missing required parameters:

In this case, the shell assumed that $foo was a variable, and it tried to replace $foo with its value (apparently an empty string). Similarly:
  # ./regex $foo$ bar
  regex: compiled pattern 'bar' (1 expression)
  regex: pattern 'bar' did NOT match string '$'

When this happens, you may have to surround your string and/or pattern with quotes to prevent your shell from interpreting these special characters:
  # ./regex '$foo$' bar
  regex: compiled pattern 'bar' (1 expression)
  regex: pattern 'bar' did NOT match string '$foo$'

  # ./regex '$foo$' '$bar$'
  regex: compiled pattern '$bar$' (1 expression)
  regex: pattern '$bar$' did NOT match string '$foo$'

The "compiled pattern" line reports the pattern that is being compiled as the regular expression; this is how you can tell whether the shell is transforming your pattern into something different before passing it to the regex tool.
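
The shell's handling of these arguments can also be inspected without the regex tool. The following is a minimal sketch in plain POSIX sh; show_args is a hypothetical helper written for this illustration (it is not part of regex.c), and foo is deliberately set to an empty string to mirror the case described above.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints how many arguments it
# received, then each argument in brackets so empty values are visible.
show_args() {
  echo "argc=$#"
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

foo=''                 # assumption: $foo is empty, as in the examples above

show_args $foo bar     # unquoted: $foo expands to nothing, one argument left
show_args $foo$ bar    # unquoted: only the trailing '$' survives
show_args '$foo$' bar  # single quotes: the string is passed through verbatim
```

The first call receives a single argument, which is why the regex tool complains about missing parameters in that case; the quoted call receives both arguments exactly as typed.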

In many of these examples, you see "(1 expression)" or "(2 expressions)" displayed; what are these?

  # ./regex 39065 '[0123456789]+(\.|,)[0-9]+$'
  regex: compiled pattern '[0123456789]+(\.|,)[0-9]+$' (2 expressions)
  regex: pattern '[0123456789]+(\.|,)[0-9]+$' did NOT match string '39065'

The count reflects the pattern as a whole plus its parenthesized subexpressions; here the group (\.|,), which holds an alternation (i.e. the | vertical bar character), raises the count to 2.
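
To see what the parenthesized part of that pattern captures on its own, the sketch below uses sed rather than the regex tool. It assumes a sed that accepts -E for POSIX extended regular expressions (GNU and BSD sed both do); show_separator is a hypothetical helper, and whether the tool's expression count corresponds exactly to such groups is an assumption, not something verified against regex.c here.

```shell
#!/bin/sh
# show_separator is a hypothetical helper: if the line matches the pattern
# from the example above, the match is replaced by the text captured by the
# single parenthesized group (\.|,); otherwise the line is printed unchanged.
show_separator() {
  printf '%s\n' "$1" | sed -E 's/[0123456789]+(\.|,)[0-9]+$/separator=\1/'
}

show_separator '39065.12'   # group 1 captures the dot
show_separator '39065,12'   # group 1 captures the comma
show_separator '39065'      # no separator, so no match: printed unchanged
```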

Sometimes the regex tool will display a "substring" line, like so:

  # ./regex 39065 '[0-9]+$'
  regex: compiled pattern '[0-9]+$' (1 expression)
  regex: substring[0]: '39065' [0-5]
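  regex: pattern '[0-9]+$' matched string '39065'

This shows that the first substring (numbered from zero) is 39065, and that it starts at offset 0 and ends at offset 5. The end offset is the position just past the last matched character, so the five-character match occupies offsets 0 through 4.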
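
The same offsets can be reproduced with awk, whose match() function also uses extended regular expressions. This is only a rough cross-check, not how the regex tool works internally: match_offsets is a hypothetical helper, and it converts awk's one-based RSTART and RLENGTH into a zero-based start and an end offset that points just past the match.

```shell
#!/bin/sh
# match_offsets is a hypothetical helper: it prints the matched substring
# followed by [start-end], zero-based, where end is one past the last
# matched character.
# Caveat: awk -v interprets backslash escapes, so patterns that contain
# backslashes would need extra care; the patterns used here have none.
match_offsets() {
  awk -v s="$1" -v re="$2" 'BEGIN {
    if (match(s, re))
      printf "%s [%d-%d]\n", substr(s, RSTART, RLENGTH), RSTART - 1, RSTART - 1 + RLENGTH
    else
      print "no match"
  }'
}

match_offsets 39065 '[0-9]+$'             # prints: 39065 [0-5]
match_offsets //incoming/39065 '[0-9]+$'  # prints: 39065 [11-16]
```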

The following examples come from a case where a user needed a particular proftpd Filter expression for some filesystem paths; the strength of regex lies in the ability to test out such filters without running proftpd itself:

  # ./regex //incoming/39065 '[0123456789]+(\.|,)[0-9]+$'
  regex: compiled pattern '[0123456789]+(\.|,)[0-9]+$' (2 expressions)
  regex: pattern '[0123456789]+(\.|,)[0-9]+$' did NOT match string '//incoming/39065'

  # ./regex //incoming/39065 '^[0-9]+$'
  regex: compiled pattern '^[0-9]+$' (1 expression)
  regex: pattern '^[0-9]+$' did NOT match string '//incoming/39065'

  # ./regex //incoming/39065 '[0-9]+$'
  regex: compiled pattern '[0-9]+$' (1 expression)
  regex: substring[0]: '39065' [11-16]
  regex: pattern '[0-9]+$' matched string '//incoming/39065'
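
When the regex tool is not at hand, grep -E offers a rough cross-check for the same three patterns, since it also uses POSIX extended regular expressions. The check function below is a hypothetical helper that only mirrors the tool's argument order (string first, pattern second); it reports whether a match exists and nothing more.

```shell
#!/bin/sh
# check is a hypothetical helper: string first, pattern second, like the
# regex tool, but the matching is done by grep -E.
check() {
  if printf '%s\n' "$1" | grep -Eq -e "$2"; then
    echo "matched"
  else
    echo "did NOT match"
  fi
}

check //incoming/39065 '[0123456789]+(\.|,)[0-9]+$'   # no separator present
check //incoming/39065 '^[0-9]+$'                     # '^' rejects the path prefix
check //incoming/39065 '[0-9]+$'                      # trailing digits match
```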



Author: tj
Last Updated: 2005/10/08 21:33:13


© Copyright 2005 TJ Saunders
All Rights Reserved