NAME

rdbflip - writes multiple input columns to a single output column, appending input columns one after another.


SYNOPSIS

rdbflip uses a getopt-style parameter interface:

    rdbflip --option=value


OPTIONS

input string

Input file name. If equal to 'stdin', data is read from STDIN stream. Note that STDIN cannot be used when the disk flag is set. Defaults to 'stdin'.

output string

Output file name. If equal to 'stdout', data is written to STDOUT stream. Defaults to 'stdout'.

list string

A comma-separated list of column names from the input file. This option may be specified multiple times on the command line; each occurrence defines one new output column built from the listed input columns. The output column may be named by prefixing the list of input column names with the output column name followed by a colon ':'. Without such a prefix, the column name defaults to COLXXX, where XXX is the column index.

break|nobreak

Boolean flag. If set to break, break columns are generated. Default is nobreak.

breaklist string

A comma-separated list of break column values. This option has no effect unless break is specified. It may be specified multiple times on the command line; each breaklist is associated with an input list. If there are more breaklists than lists, the extra breaklists are ignored. If there are fewer breaklists than lists, the input column names from list are used as the break values for the remaining break columns. The break column may be named by prefixing the breaklist with the break column name followed by a colon ':'. Without such a prefix, the column name defaults to BREAKXXX, where XXX is the column index of the associated list.

regex|noregex

Boolean flag. If set to regex, the column lists are taken to be Perl regular expressions rather than comma-separated lists. Matching columns keep the order they had in the input file. Default is noregex.

disk|nodisk

Boolean flag. If set to disk, the operations are done by rereading the input file. If set to nodisk, the input file is read once and the data are stored in memory. Note that with disk, input must come from a file and not STDIN. Default is nodisk.

help

Print this help information.

version

Print version information.


DESCRIPTION

rdbflip writes multiple input columns to a single output column, appending input columns one after another.

The user specifies comma-separated lists of input column names via the list parameter. Each occurrence of the list parameter names the input columns that will be appended into one new output column. The number of rows in the output table equals the number of rows in the input table times the number of columns in the longest list.

Take the RDB table given below:

prompt-1: cat input.rdb

 col0   col1    col2    col3    col4
 N      N       N       N       N
 0.0    1.0     2.0     3.0     4.0
 0.1    1.1     2.1     3.1     4.1
 0.2    1.2     2.2     3.2     4.2
 0.3    1.3     2.3     3.3     4.3
 0.4    1.4     2.4     3.4     4.4

We want the values from col1 and col2 to appear in the same output column, while keeping each value paired with the col0 value from its original row. The command below shows how rdbflip can do that.

prompt-2: rdbflip --input=input.rdb --list=col0 --list=col1,col2

 COL0    COL1
 N       N
 0.0     1.0
 0.1     1.1
 0.2     1.2
 0.3     1.3
 0.4     1.4
 0.0     2.0
 0.1     2.1
 0.2     2.2
 0.3     2.3
 0.4     2.4

If you want a break column indicating where the data in each row originated, try the same command and add the break option.

prompt-2: rdbflip --input=input.rdb --list=col0 --list=col1,col2 --break

 COL0    COL1    BREAK0  BREAK1
 N       N       S       S
 0.0     1.0     col0    col1
 0.1     1.1     col0    col1
 0.2     1.2     col0    col1
 0.3     1.3     col0    col1
 0.4     1.4     col0    col1
 0.0     2.0     col0    col2
 0.1     2.1     col0    col2
 0.2     2.2     col0    col2
 0.3     2.3     col0    col2
 0.4     2.4     col0    col2

From the examples above, it is evident that rdbflip repeats column values when one output column list has fewer members than another. The length of the output table is always equal to the length of the input table times the length of the longest input column list. Thus, if the first input list has two elements and the second input list has three elements, the output table contains two columns, each three times the length of the input table. The example below illustrates how rdbflip handles this case.

prompt-3: rdbflip --input=input.rdb --list=col1,col2 --list=col2,col3,col4 --break

 COL0    COL1    BREAK0  BREAK1
 N       N       S       S
 1.0     2.0     col1    col2
 1.1     2.1     col1    col2
 1.2     2.2     col1    col2
 1.3     2.3     col1    col2
 1.4     2.4     col1    col2
 2.0     3.0     col2    col3
 2.1     3.1     col2    col3
 2.2     3.2     col2    col3
 2.3     3.3     col2    col3
 2.4     3.4     col2    col3
 1.0     4.0     col1    col4
 1.1     4.1     col1    col4
 1.2     4.2     col1    col4
 1.3     4.3     col1    col4
 1.4     4.4     col1    col4
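
The behaviour shown in the examples above can be sketched in a few lines of Python. This is only an illustration of the output layout, not rdbflip's actual implementation: the function name flip, the dict-of-lists table representation, and the wrap-around rule for shorter lists (inferred from the last example, where col1 reappears alongside col4) are all assumptions.

```python
def flip(table, lists, with_break=False):
    """Illustrative sketch of the documented output layout (hypothetical helper).

    table: dict mapping input column name -> list of values (equal lengths).
    lists: one list of input column names per output column.
    """
    n_rows = len(next(iter(table.values())))
    longest = max(len(cols) for cols in lists)

    # Default names COL0, COL1, ... and BREAK0, BREAK1, ... as in the examples.
    out = {f"COL{i}": [] for i in range(len(lists))}
    if with_break:
        out.update({f"BREAK{i}": [] for i in range(len(lists))})

    # One block of n_rows output rows per member of the longest list.
    for block in range(longest):
        for i, cols in enumerate(lists):
            # Assumption: shorter lists wrap around to their first member.
            src = cols[block % len(cols)]
            out[f"COL{i}"].extend(table[src])
            if with_break:
                out[f"BREAK{i}"].extend([src] * n_rows)
    return out


# The input table from prompt-1, with values kept as strings.
table = {f"col{c}": [f"{c}.{r}" for r in range(5)] for c in range(5)}
result = flip(table, [["col1", "col2"], ["col2", "col3", "col4"]], with_break=True)
```

Here result holds four columns of 15 rows each, three times the five input rows, which matches the row count stated above.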

As noted in OPTIONS, the list parameter can optionally specify the name of the output column or leave rdbflip to generate a default column name.

list=col1,col2 results in the output column containing col1 and col2 getting a default name, COLXXX.

list=foo:col1,col2 results in the output column named foo containing col1 and col2.
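
The naming rule for a list value can be sketched as follows. This is a minimal illustration rather than rdbflip's own parsing code: the function name parse_list_spec is hypothetical, the zero-based index is inferred from the COL0 and COL1 headers in the examples above, and regex mode is not handled.

```python
def parse_list_spec(spec, index, prefix="COL"):
    """Split 'name:col1,col2' into (output_name, [input columns]).

    Hypothetical helper; index is assumed to be zero-based.
    """
    name, sep, cols = spec.partition(":")
    if not sep:
        # No colon: the whole spec is the column list, so use the default name.
        name, cols = f"{prefix}{index}", spec
    return name, cols.split(",")


print(parse_list_spec("col1,col2", 0))      # ('COL0', ['col1', 'col2'])
print(parse_list_spec("foo:col1,col2", 1))  # ('foo', ['col1', 'col2'])
```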

The breaklist parameter works in a similar manner.
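
The way breaklists are matched to lists, as described under OPTIONS, can be sketched like this. The function name resolve_breaklists is hypothetical, and pairing breaklists with lists in command-line order is an assumption; the sketch also does not cover a breaklist whose length differs from that of its list, which this page does not specify.

```python
def resolve_breaklists(lists, breaklists):
    """Return one list of break values per input list (hypothetical helper).

    Extra breaklists are ignored; missing ones fall back to the input
    column names of the corresponding list.
    """
    resolved = []
    for i, cols in enumerate(lists):
        if i < len(breaklists):
            resolved.append(list(breaklists[i]))
        else:
            resolved.append(list(cols))
    return resolved


lists = [["col0"], ["col1", "col2"]]
print(resolve_breaklists(lists, [["time"]]))
# [['time'], ['col1', 'col2']]
```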


BUGS

The regex flag currently applies regular expression matching to all of the column lists. It is not possible to enable regex for a single column list only.
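
For reference, the column selection that regex mode performs, as described under OPTIONS, can be approximated in Python. This sketch uses Python's re module, whose syntax differs from Perl's in some details, and assumes an unanchored match; the function name select_columns is hypothetical.

```python
import re


def select_columns(pattern, input_columns):
    """Return the input columns matching pattern, in input-file order."""
    rx = re.compile(pattern)
    # Iterating over the input columns keeps their original order,
    # regardless of how the pattern is written.
    return [c for c in input_columns if rx.search(c)]


columns = ["col0", "col1", "col2", "col3", "col4"]
print(select_columns(r"col[31]", columns))  # ['col1', 'col3']
```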


AUTHOR

Michael Tibbetts ( mtibbetts@cfa.harvard.edu )


VERSION

$Revision: 1.8 $ $Date: 2000/06/23 14:22:17 $