33.3. POSIX Regular Expressions

List of Examples

33.1. REGEXP:MATCH
33.2. REGEXP:REGEXP-QUOTE
33.3. Count unix shell users

The REGEXP module implements POSIX regular expression matching by calling the standard C system facilities. The syntax of these regular expressions is described in many places, such as your local <regex.h> manual and the Emacs info pages.

This module is present in the base linking set by default.

When this module is present, *FEATURES* contains the symbol :REGEXP.

Regular Expression API

(REGEXP:MATCH pattern string &KEY (:START 0) :END :EXTENDED :IGNORE-CASE :NEWLINE :NOSUB :NOTBOL :NOTEOL)

This macro returns as first value a REGEXP:MATCH structure containing the indices of the start and end of the first match for the regular expression pattern in string; or no values if there is no match. Additionally, a REGEXP:MATCH structure is returned for every matched "\(...\)" group in pattern, in the order that the open parentheses appear in pattern. If start is non-NIL, the search starts at that index in string. If end is non-NIL, only (SUBSEQ string start end) is considered.

Example 33.1. REGEXP:MATCH

(REGEXP:MATCH "quick" "The quick brown fox jumped quickly.")
⇒ #S(REGEXP:MATCH :START 4 :END 9)
(REGEXP:MATCH "quick" "The quick brown fox jumped quickly." :start 8)
⇒ #S(REGEXP:MATCH :START 27 :END 32)
(REGEXP:MATCH "quick" "The quick brown fox jumped quickly." :start 8 :end 30)
⇒ NIL
(REGEXP:MATCH "\\([a-z]*\\)[0-9]*\\(bar\\)" "foo12bar")
⇒ #S(REGEXP:MATCH :START 0 :END 8) ;
⇒ #S(REGEXP:MATCH :START 0 :END 3) ;
⇒ #S(REGEXP:MATCH :START 5 :END 8)


(REGEXP:MATCH-START match)
(REGEXP:MATCH-END match)
Return the start and end of the match; SETF-able.
(REGEXP:MATCH-STRING string match)
Extracts the substring of string corresponding to the given pair of start and end indices of match. The result is shared with string. If you want a fresh STRING, use COPY-SEQ or COERCE to SIMPLE-STRING.
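As a minimal sketch of how these accessors combine with the multiple values returned by REGEXP:MATCH, the following assumes the REGEXP module is present; the function name match-summary and its variable names are hypothetical and used only for illustration.

```lisp
;; Hypothetical helper: summarize the match of a fixed basic pattern in TEXT.
(defun match-summary (text)
  ;; REGEXP:MATCH returns the whole match first, then one value per group.
  (multiple-value-bind (whole letters suffix)
      (regexp:match "\\([a-z]*\\)[0-9]*\\(bar\\)" text)
    (when whole
      (list (regexp:match-start whole)          ; start index of the whole match
            (regexp:match-end whole)            ; end index of the whole match
            ;; This substring shares storage with TEXT.
            (regexp:match-string text letters)
            ;; COPY-SEQ yields a fresh string instead.
            (copy-seq (regexp:match-string text suffix))))))

(match-summary "foo12bar")
;; ⇒ (0 8 "foo" "bar")
```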
(REGEXP:REGEXP-QUOTE string &OPTIONAL extended)

This function returns a regular expression STRING that matches exactly string and nothing else. This allows you to request an exact string match when calling a function that wants a regular expression.

Example 33.2. REGEXP:REGEXP-QUOTE

(regexp-quote "^The cat$")
⇒ "\\^The cat\\$"


One use of REGEXP:REGEXP-QUOTE is to combine an exact string match with context described as a regular expression. When extended is non-NIL, also quote #\+ and #\?.
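A small sketch of that use, assuming basic (non-extended) syntax: the literal part is quoted and then concatenated with a regular expression describing what must follow it. The helper name find-literal-before-digits is hypothetical.

```lisp
;; Hypothetical helper: find NEEDLE literally in HAYSTACK, but only where it
;; is immediately followed by one or more digits.
(defun find-literal-before-digits (needle haystack)
  (regexp:match (concatenate 'string
                             (regexp:regexp-quote needle) ; exact text
                             "[0-9][0-9]*")               ; regexp context
                haystack))

;; Unquoted, the "." in "v1." would match any character, so "v123" would
;; qualify. After quoting, only the literal "v1." followed by digits does.
(find-literal-before-digits "v1." "v123 v1.7")
;; ⇒ #S(REGEXP:MATCH :START 5 :END 9)
```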

(REGEXP:REGEXP-COMPILE string &KEY :EXTENDED :IGNORE-CASE :NEWLINE :NOSUB)
Compile the regular expression string into an object suitable for REGEXP:REGEXP-EXEC.
(REGEXP:REGEXP-EXEC pattern string &KEY :RETURN-TYPE :BOOLEAN (:START 0) :END :NOTBOL :NOTEOL)

Execute the pattern, which must be a compiled regular expression returned by REGEXP:REGEXP-COMPILE, against the appropriate portion of the string.

Returns REGEXP:MATCH structures as multiple values (one for each subexpression which successfully matched and one for the whole pattern), unless :BOOLEAN is non-NIL, in which case T is returned as an indicator of success and nothing is allocated.

If :RETURN-TYPE is LIST (or VECTOR), the REGEXP:MATCH structures are returned as a LIST (or a VECTOR) instead. If there are more than MULTIPLE-VALUES-LIMIT REGEXP:MATCH structures to return, a LIST is returned instead of multiple values.
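The following sketch compiles a pattern once and runs it against several strings, which is the usual reason to prefer REGEXP:REGEXP-COMPILE and REGEXP:REGEXP-EXEC over REGEXP:MATCH. The helper parse-settings and the "key=number" input format are assumptions made for illustration, and the code assumes that a failed match with :RETURN-TYPE LIST yields NIL.

```lisp
;; Hypothetical helper: parse "key=number" strings with one compiled pattern.
(defun parse-settings (lines)
  ;; :EXTENDED T selects extended syntax, so groups are written ( ... ).
  (let ((rx (regexp:regexp-compile "^([a-z]+)=([0-9]+)$" :extended t)))
    (loop :for line :in lines
          ;; On success MATCHES is (whole group-1 group-2).
          :for matches = (regexp:regexp-exec rx line :return-type 'list)
          :when matches
            :collect (cons (regexp:match-string line (second matches))
                           (parse-integer
                            (regexp:match-string line (third matches)))))))

(parse-settings '("width=80" "# comment" "height=24"))
;; ⇒ (("width" . 80) ("height" . 24))
```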

(REGEXP:REGEXP-SPLIT pattern string &KEY (:START 0) :END :EXTENDED :IGNORE-CASE :NEWLINE :NOSUB :NOTBOL :NOTEOL)
Return a list of substrings of string (all sharing the structure with string) separated by pattern (a regular expression STRING or a return value of REGEXP:REGEXP-COMPILE).
(REGEXP:WITH-LOOP-SPLIT (variable stream pattern &KEY (:START 0) :END :EXTENDED :IGNORE-CASE :NEWLINE :NOSUB :NOTBOL :NOTEOL) &BODY body)
Read lines from stream, split them with REGEXP:REGEXP-SPLIT on pattern, and bind the resulting list to variable.
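A brief sketch of both forms, using a string stream in place of a file so that it is self-contained. The helper collect-field is hypothetical; it copies each field with COPY-SEQ because the substrings returned by REGEXP:REGEXP-SPLIT share structure with the line they came from.

```lisp
;; Split one string on a literal colon.
(regexp:regexp-split ":" "root:x:0")
;; ⇒ ("root" "x" "0")

;; Hypothetical helper: collect field number INDEX from every line of TEXT.
(defun collect-field (text index)
  (with-input-from-string (in text)
    (let ((result '()))
      ;; FIELDS is bound to the list of substrings of each line in turn.
      (regexp:with-loop-split (fields in ":")
        (push (copy-seq (nth index fields)) result))
      (nreverse result))))

(collect-field (format nil "root:x:0~%daemon:x:1") 0)
;; ⇒ ("root" "daemon")
```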
:EXTENDED :IGNORE-CASE :NEWLINE :NOSUB
These options control compilation of a pattern. See <regex.h> for their meaning.
:NOTBOL :NOTEOL
These options control execution of a pattern. See <regex.h> for their meaning.
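As a hedged illustration of these options, the calls below assume the usual POSIX meanings of REG_ICASE, REG_EXTENDED and REG_NOTBOL; the expected results follow from those meanings.

```lisp
;; :IGNORE-CASE makes the comparison case-insensitive.
(regexp:match "QUICK" "The quick brown fox" :ignore-case t)
;; ⇒ #S(REGEXP:MATCH :START 4 :END 9)

;; :EXTENDED enables extended syntax: unescaped ( ) group and | alternates.
;; The second value describes the group "quick".
(regexp:match "(quick|slow) brown" "The quick brown fox" :extended t)
;; ⇒ #S(REGEXP:MATCH :START 4 :END 15) ;
;; ⇒ #S(REGEXP:MATCH :START 4 :END 9)

;; :NOTBOL says the start of the string is not the beginning of a line,
;; so the ^ anchor cannot match there and the search fails.
(regexp:match "^The" "The quick brown fox" :notbol t)
```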
REGEXP:REGEXP-MATCHER
A valid value for CUSTOM:*APROPOS-MATCHER*. This will work only when your LOCALE is CHARSET:UTF-8 because CLISP uses CHARSET:UTF-8 internally and POSIX constrains <regex.h> to use the current LOCALE.

Example 33.3. Count unix shell users

The following code computes the number of people who use a particular shell:

#!/usr/local/bin/clisp -C
(DEFPACKAGE "REGEXP-TEST" (:use "LISP" "REGEXP"))
(IN-PACKAGE "REGEXP-TEST")
(let ((h (make-hash-table :test #'equal :size 10)) (n 0))
  (with-open-file (f "/etc/passwd")
    (with-loop-split (s f ":")
      (incf (gethash (seventh s) h 0))))
  (with-hash-table-iterator (i h)
    (loop (multiple-value-bind (r k v) (i)
            (unless r (return))
            (format t "[~d] ~s~30t== ~5:d~%" (incf n) k v)))))

For comparison, almost the same thing (except for the nice output formatting) can be done with the following Perl:

#!/usr/local/bin/perl -w

use diagnostics;
use strict;

my $IN = $ARGV[0];
open(INF,"< $IN") || die "$0: cannot read file [$IN]: $!\n";
my %hash;
while (<INF>) {
  chomp;
  my @all = split($ARGV[1]);
  my $shell = ($#all >= 6 ? $all[6] : "");
  if ($hash{$shell}) { $hash{$shell} ++; }
  else { $hash{$shell} = 1; }
}
my $ii = 0;
for my $kk (keys(%hash)) {
  print "[",++$ii,"] \"",$kk,"\"  --  ",$hash{$kk},"\n";
}
close(INF);


These notes document CLISP version 2.49. Last modified: 2010-07-07