NAME

rdb_expr - a detailed description of rdb expressions


DESCRIPTION

Expressions are formed from column names and operators. Any characters that are special to the UNIX shell must be quoted. These characters include most of the really useful ones for expressions: (, ), *, ;. The characters can either by placed inside of quotes, or may ``escaped'' with a backslash (\) character. All of the rdb commands that use expressions can also read them from a file, where the characters don't have to be quoted. An added benefit in putting expressions in a file is that you can put comments in there. Anything after a # character is ignored.

Comparison operators may be of the form: gt, ge, lt, le, eq, ne, mat, nmat. For example,

         NAME eq Hobbs

The first six are the usual operators, e.g. name eq Hobbs or COUNT gt 100. The last two stand for 'match' and 'non-match' and are used for pattern matching. They are exactally the same as using the Perl operators =~ or !~ respectively, except that pattern matching can be specified easier in expressions, as in:

  NAME  mat   /[Hh]obbs/        <<< First letter either case
  NAME  mat   /hobbs/i          <<< any combination of case
  NAME  nmat  /[aeiou]/i        <<< names without vowels

A warning message is produced on STDERR if either of mat or nmat is used with a numeric type column, but the execution continues. It does not check the =~ or !~ forms.

Logical constructors or and and may be used. Special values of null to indicate an empty data value or any to indicate any value are available.

Sometimes the expression parser doesn't recognize a value appropriately and casts it to the wrong type. This usually happens while trying to use a complicated Perl expression. There are three operators which will attempt to cast the following constant expression as the requested type. It will not cast a string as a number, though.

        _fX     Leave it verbatim
        _fN     It's a number
        _fS     Make it a string

You can use this feature to do some neat things:

        label_text = _fX '("Label0", "Label1", "Label2")' \[ label \]

Certain functions are recognized as well: abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, subst, lc, uc.

subst is special; it is the perl s string substitution operator, and requires the use of the =~ assignment operator. For a self referential substition, use the following form:

        zap =~ subst /foo/bar/

To assign a substition of another column, use:

        \( zap = zig \) =~ subst /foo/bar/

all other. (Yes, it is clumsy).

Assignment expressions have the form

        NAME <assignment op> <expression>

Valid assignment operators are =, =~, ++, --, +=, -=, /=, *=, etc. Multiple expressions are separated by a semicolon.

Selection expressions have the form

        <expression>

and return true or false.

Since column names and reserved words are parsed by the program, do not put the entire expression in a single pair of quotes as that will prevent the parsing. Also note that column names and reserved words need to be surrounded by blank spaces if they are not individually quoted. For example either form below is fine:

        NAME    eq   "L Brown"
        "NAME"  "eq"  "L Brown"

but do not use this form:

        "NAME  eq  L Brown"

Note that in some rare cases there could be a column name that is identical to a data value specified in an expression using another column name that might cause a problem (this actually happened). For example if two column names are 'N' and 'T', and column 'N' has a data value of 'T', to select all rows where column 'N' is equal to 'T' the normal command would be:

   table  N  eq  T

Unfortunately the 'T' in the expression gets translated to 'column name T', not used as 'data value T'. That is, the expression askes for all rows where the data value of column N equals the data value of column T, legal, but not what was wanted. There is a simple workaround however. The 'T' in the expression can be escaped with a backslash to prevent the translation to a column name, as in the revised command:

  table  N  eq  '\T'

Thus either meaning can be specified, as desired.


Decision Blocks

Expressions may also contain if - elsif else clauses as well. This is most useful when expressions are in files. For example:

        if ( type eq A ) { 
          name = NEW ;
          amt = count * 2 ; 
          type = 'AAA' ;
        } elsif ( type eq B ) {
          name = OLD ;
          amt = count + 1000 ;
          type = 'ZZZ' ;
        } else {
          name = FOO ;
          amt = count + 2000 ;
          type = 'YYY' ;
        }

Note that the statements inside the decision blocks must be enclosed in curly braces. You must put spaces between all of the tokens, as on the command line, but you don't have to quote characters dangerous to the shell. Currently, all quotes around strings are treated as part of the string (yes, that's a ``feature''), and there's no way of using strings with whitespace in them.


Temporary variables

Because the rdb routines are based on Perl, it is possible to sneak some Perl-isms into expressions that allow you to use Perl variables as temporary variables. If a token begins with the character $, the rest of the token is presumed to be a valid Perl variable name, and is passed untouched to Perl. For example, lets say there are numerical columns c, b, and g in the rdb table foo.rdb. One could compute

  c = ( b * c ) ** 2 * ( b * g )

with the following command.

  compute -x \$a = b \* c \; \$b = b \* g \; c = \$a \* \$a \* \$b

The -x option to compute will make it print out the Perl script that will be used to generate the output. Note that rdb columns and Perl variables have different name spaces, so you can mix them. The Perl routine in this example is:

  $. = 0;
  while(<STDIN>)
  {
    chop ;
    @F = split( /\t/, $_, 3 );
    $a = $F[1] * $F[0] ;
    $b = $F[1] * $F[2] ;
    $F[0] = $a * $a * $b ;
    print join( "\t", @F ), "\n" ;
  }

Note that the rdb columns show up as elements of the array $F, while the rdb variables in the compute expression show up literally.

There are two caveats. One shouldn't use any form of $F[n]. If one actually wishes to precede a string with the character $ (for instance if you wish to check for the string $ave $ave $ave!), simply precede the first $ with a backslash character \, and it will be accepted as a string, rather than as a Perl variable name.

        String eq '\$ave $ave $ave!'


Predefined Variables

Some ``special'' variables are defined and may be used in expressions. These are:

        _NR     - the record number (starts at 1)