
PACES Blog

Stata Regular Expressions

Why the need for a regular expression package?

Since 2005, people have had questions about regular expressions in Stata, and Stata's own FAQ on the topic shows how long they have been a source of confusion. With the recent addition of Unicode support, Stata's regular expressions got a bit of a facelift: several new functions whose names begin with the letter 'u' (ustrregexm(), ustrregexrf(), ustrregexra(), and ustrregexs()) were added. This leaves two challenges. The first is that the ASCII-based and Unicode-based regular expression functions have different APIs. The second is the lack of support for POSIX features such as character-class metacharacters and bounded quantifiers, which specify how many times a match must repeat (e.g., {2,3} means the preceding element must occur at least twice but no more than three times). With jregex, this is about to change.
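To make the quantifier and character-class terminology concrete, here is a small sketch in Java. It assumes that the Pattern and Matcher APIs this post mentions are Java's java.util.regex classes; the class QuantifierDemo and its helpers are illustrative only and are not part of jregex. Note that Java spells the POSIX classes as \p{Upper}, \p{Digit}, and so on, rather than the bracket form [[:upper:]].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first substring matching the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // {2,3}: the preceding token must occur at least twice and at most three times.
        String twoOrThreeDigits = "\\d{2,3}";
        for (String s : new String[] {"7", "42", "401", "4019"}) {
            System.out.println(s + " -> " + fullMatch(twoOrThreeDigits, s));
        }
        // prints one line each: 7 -> false, 42 -> true, 401 -> true, 4019 -> false

        // \p{Upper} and \p{Digit} are Java's spellings of the POSIX classes
        // [:upper:] and [:digit:]; by default they cover US-ASCII only.
        String stateAndZip = "\\p{Upper}{2} \\p{Digit}{5}";
        System.out.println(firstMatch(stateAndZip, "83 Power Road, Pawtucket, RI 02860"));
        // prints: RI 02860
    }
}
```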

Regular Expression Functionality in the Package

For now, the package includes only a replacement function that uses regular expressions, but it will eventually provide a single API to multiple regular expression functions through subcommands (e.g., jregex replace ...). The program also exposes every compilation flag listed in the Pattern API, as well as several of the options available in the Matcher API. If there is functionality you would find helpful, feel free to let me know and I will try to address it sooner.
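As a rough illustration of what Pattern compilation flags and a regex-based replacement look like at the Java level, here is a minimal sketch. It assumes the Pattern and Matcher APIs referred to above are java.util.regex; the helper replace and its flags and all parameters are hypothetical and do not reflect jregex's actual option names or syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceSketch {
    // Hypothetical helper: compile the regex with a bit mask of Pattern flags,
    // then replace either every match (all == true) or only the first one.
    static String replace(String input, String regex, String replacement,
                          int flags, boolean all) {
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(input);
        return all ? matcher.replaceAll(replacement) : matcher.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String text = "Stata stata STATA";
        int ignoreCase = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

        // No flags: matching is case-sensitive, so only "stata" is replaced.
        System.out.println(replace(text, "stata", "X", 0, true));          // Stata X STATA

        // CASE_INSENSITIVE | UNICODE_CASE: every spelling matches.
        System.out.println(replace(text, "stata", "X", ignoreCase, true)); // X X X

        // Same flags, but only the first match is replaced.
        System.out.println(replace(text, "stata", "X", ignoreCase, false)); // X stata STATA

        // MULTILINE: ^ matches at the start of every line, not just the input.
        System.out.println(replace("a\nb", "^", "> ", Pattern.MULTILINE, true));
        // prints "> a" and "> b" on separate lines
    }
}
```

One design note: in Java, Matcher.replaceAll() and replaceFirst() reset the matcher before running, so a region set with Matcher.region() is discarded by those calls. Matcher options of that kind matter mainly for find()-style searching.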

Contact

83 Power Road, Pawtucket, RI 02860
12313 33rd Ave. NE #202, Seattle, WA 98125
P: 401.499.9719
F: 206.906.9493
E: Info@paces-consulting.org