rdbflip - writes multiple input columns to a single output column, appending input columns one after another.
rdbflip uses a getopt-style parameter interface:
rdbflip --option=value
Input file name. If equal to 'stdin', data is read from STDIN stream. Note that STDIN cannot be used when the disk flag is set. Defaults to 'stdin'.
Output file name. If equal to 'stdout', data is written to STDOUT stream. Defaults to 'stdout'.
A comma-separated list of column names from the input file. This option may be specified multiple times on the command line. Each time list is specified, the given input columns are written to a new output column. The name of the output column may be specified by prefacing the list of input column names with the output column name followed by a colon ':'. If the list is not prefaced by an output column name followed by a colon ':', the column name defaults to COLXXX where XXX is the column index.
Boolean flag. If set to break, break columns are generated. Default is nobreak.
A comma-separated list of break column values. This option has no effect if break is not specified. This option may be specified multiple times on the command line. Each breaklist on the command line is associated with an input list. If there are more breaklists than lists, the extra breaklists are ignored. If there are fewer breaklists than lists, the input column names from list are used as the break values for the extra break columns. The name of the break column may be specified by prefacing the breaklist with the break column name followed by a colon ':'. If the list is not prefaced by a break column name followed by a colon ':', the column name defaults to BREAKXXX where XXX is the column index of the associated list.
Boolean flag. If set to regex, the column lists are taken to be Perl regular expressions and not comma-separated lists. Columns maintain the order they had in the input file. Default is noregex.
Boolean flag. If set to disk, the operations are done by rereading the input file. If false, the input file is read once and the data are stored in memory. Note that if true, input must come from a file and not STDIN. Default is nodisk.
Print this help information.
Print out version information.
rdbflip writes multiple input columns to a single output column, appending input columns one after another.
The user specifies comma-separated lists of input column names via the list parameter. Each invocation of the list parameter names the input columns that will appear in a new output column. The number of rows in the output table equals the number of rows in the input table times the length of the longest list of input columns.
Take the RDB table given below:
prompt-1: cat input.rdb
col0    col1    col2    col3    col4
N       N       N       N       N
0.0     1.0     2.0     3.0     4.0
0.1     1.1     2.1     3.1     4.1
0.2     1.2     2.2     3.2     4.2
0.3     1.3     2.3     3.3     4.3
0.4     1.4     2.4     3.4     4.4
We want to have the values from col1 and col2 be in the same output column. However, we want to maintain the relationship between values in col0 and col1 and values in col0 and col2. The command below shows how rdbflip can do that.
prompt-2: rdbflip --input=input.rdb --list=col0 --list=col1,col2
COL0    COL1
N       N
0.0     1.0
0.1     1.1
0.2     1.2
0.3     1.3
0.4     1.4
0.0     2.0
0.1     2.1
0.2     2.2
0.3     2.3
0.4     2.4
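The appending behaviour shown above can be sketched in a few lines of Python. This is only an illustration of the semantics, not rdbflip's actual implementation; the function name flip and the dict-of-lists table representation are assumptions made for the example. A list with fewer members than the longest list is assumed to wrap around to its first column, which is what makes col0 repeat in the output above.

```python
def flip(table, lists):
    """Append input columns one after another into output columns.

    table: dict mapping input column name -> list of values (equal lengths).
    lists: one list of input column names per output column.
    Returns one list of values per output column.
    """
    longest = max(len(names) for names in lists)
    output = []
    for names in lists:
        column = []
        for block in range(longest):
            # Assumption: a shorter list wraps around, so its columns repeat.
            source = names[block % len(names)]
            column.extend(table[source])
        output.append(column)
    return output


table = {
    "col0": [0.0, 0.1, 0.2, 0.3, 0.4],
    "col1": [1.0, 1.1, 1.2, 1.3, 1.4],
    "col2": [2.0, 2.1, 2.2, 2.3, 2.4],
}
out0, out1 = flip(table, [["col0"], ["col1", "col2"]])
for a, b in zip(out0, out1):
    print(a, b)
```

Each printed pair corresponds to one data row of the output table above.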
If you want a break column indicating where the data in each row originated, try the same command and add the break option.
prompt-2: rdbflip --input=input.rdb --list=col0 --list=col1,col2 --break
COL0    COL1    BREAK0  BREAK1
N       N       S       S
0.0     1.0     col0    col1
0.1     1.1     col0    col1
0.2     1.2     col0    col1
0.3     1.3     col0    col1
0.4     1.4     col0    col1
0.0     2.0     col0    col2
0.1     2.1     col0    col2
0.2     2.2     col0    col2
0.3     2.3     col0    col2
0.4     2.4     col0    col2
From the examples above, it is evident that rdbflip will repeat column values when one output column list has fewer members than another. The length of the output table is always equal to the length of the input table times the length of the longest input column list. Thus if the first input list has two elements and the second input list has three elements, the output table contains two columns that are three times the length of the input table. The example below illustrates how rdbflip handles this case.
prompt-3: rdbflip --input=input.rdb --list=col1,col2 --list=col2,col3,col4 --break
COL0    COL1    BREAK0  BREAK1
N       N       S       S
1.0     2.0     col1    col2
1.1     2.1     col1    col2
1.2     2.2     col1    col2
1.3     2.3     col1    col2
1.4     2.4     col1    col2
2.0     3.0     col2    col3
2.1     3.1     col2    col3
2.2     3.2     col2    col3
2.3     3.3     col2    col3
2.4     3.4     col2    col3
1.0     4.0     col1    col4
1.1     4.1     col1    col4
1.2     4.2     col1    col4
1.3     4.3     col1    col4
1.4     4.4     col1    col4
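The row count and the default break values in this example can be checked with a short Python sketch. It is a hedged illustration only: the function name break_columns is hypothetical, no breaklist is assumed to be given (so the input column names serve as break values), and the wrap-around of the shorter list is inferred from the output above rather than from the rdbflip source.

```python
def break_columns(lists, n_rows):
    """Build the default break columns for the given column lists.

    lists: one list of input column names per output column.
    n_rows: number of data rows in the input table.
    """
    longest = max(len(names) for names in lists)
    columns = []
    for names in lists:
        column = []
        for block in range(longest):
            # Assumption: a shorter list wraps around to its first name.
            column.extend([names[block % len(names)]] * n_rows)
        columns.append(column)
    return columns


breaks = break_columns([["col1", "col2"], ["col2", "col3", "col4"]], n_rows=5)
print(len(breaks[0]))   # 5 input rows times 3 names in the longest list
print(breaks[0][::5])   # first break value of each block of 5 rows
print(breaks[1][::5])
```

With five input rows and a longest list of three names, each output column holds fifteen values, matching the table above.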
As noted in the SYNOPSIS, the list parameter can optionally specify the name of the output column or leave rdbflip to generate a default column name.
list=col1,col2 results in the output column containing col1 and col2 getting a default name, COLXXX.
list=foo:col1,col2 results in the output column named foo containing col1 and col2.
The breaklist parameter works in a similar manner.
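As a rough illustration of the naming rule for list and breaklist values, the Python sketch below splits an optional name prefix from the comma-separated members. The function name parse_spec and its arguments are hypothetical, and the handling of edge cases (for example a colon inside a column name or a regular expression) is an assumption; rdbflip itself may behave differently.

```python
def parse_spec(spec, index, default_prefix="COL"):
    """Split 'name:a,b,c' into (name, [a, b, c]).

    If no 'name:' prefix is present, the name defaults to the prefix
    followed by the column index, e.g. COL1 or BREAK1.
    """
    name, colon, members = spec.partition(":")
    if colon:
        return name, members.split(",")
    return f"{default_prefix}{index}", spec.split(",")


print(parse_spec("col1,col2", 1))
print(parse_spec("foo:col1,col2", 1))
print(parse_spec("col1,col2", 1, default_prefix="BREAK"))
```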
The -regex flag presently sets regular expression matching for all of the column lists. It is not possible to set -regex for a single column only.
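When regex matching is on, the selected columns keep their input-file order regardless of how the expression is written. The Python sketch below illustrates that ordering rule only. It uses Python's re module rather than Perl's regular expression engine, and it assumes an unanchored match against each column name; whether rdbflip anchors its patterns is not stated here. The function name select_columns is hypothetical.

```python
import re


def select_columns(pattern, input_columns):
    """Return the input columns whose names match pattern, in input order."""
    rx = re.compile(pattern)
    # Iterating over the input columns (not over the pattern) is what
    # preserves the order the columns had in the input file.
    return [name for name in input_columns if rx.search(name)]


columns = ["col0", "col1", "col2", "col3", "col4"]
print(select_columns("col[12]", columns))
print(select_columns("col2|col1", columns))
```

Both calls select col1 followed by col2, because the order comes from the input file and not from the expression.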
Michael Tibbetts ( mtibbetts@cfa.harvard.edu )
$Revision: 1.8 $ $Date: 2000/06/23 14:22:17 $