RAMADDA SeeSV

Input | Filter | Slice and Dice | Change | Values | Dates | Numeric | Geospatial | Misc | Output
The RAMADDA SeeSV package provides 308 commands for manipulating CSV and other types of files.

Introduction

SeeSV can be used both interactively from within RAMADDA and via the command line. Help with the interactive version can be viewed here. In-depth interactive examples are available here.

For brevity, in the below documentation we're assuming that you are using an alias:

alias seesv="sh /path/to/seesv/seesv.sh"  
The examples here show command line use. If running interactively through the RAMADDA web interface the commands take the same form, except that the seesv command call and the input file specification are omitted. So, for example, if you are calling the tool from the command line as:
seesv -columns column1,column2 -p file.csv > output.csv
The corresponding commands interactively would be:
-columns column1,column2

Install

  • Download the seesv.zip file from the RAMADDA download site
  • Unzip the file
  • Consult the README
  • SeeSV runs using Java 1.8 or higher. Set the JAVA environment variable to point to the java runtime.
  • Usage:
    seesv <any number of commands> -p <input file> > output file

Basic Use

The general use of the SeeSV tools is as follows:
seesv <a number of processing commands> <some output command> file.csv > output.csv
#or
seesv <a number of processing commands> <some output command> -o output.csv   file.csv 

#To show help:
seesv -help

#To show help on a topic:
seesv -help:some_topic

#With no processing commands and just printing the header:
seesv -printheader file.csv

#Or a shortcut
seesv -ph file.csv

#With no processing commands and csv output:
seesv -p file.csv > output.csv

#With no processing commands and a record oriented  output:
seesv -record file.csv > output.csv

#With the -columns command printing csv output
seesv -columns column1,column2 -p file.csv > output.csv

Column Identifiers

Note: many of these commands use one or more column names from the source file. These column names are converted to a canonical identifier format by lower-casing, changing spaces to "_", and removing non-alphanumeric characters. So for example if your source file is:
column 1, Column 2, Some "funny" column
 ...
Then the column ids are:
column_1,column_2,some_funny_column
You can always use the -ids command to list the ids
seesv -ids -printheader test.csv

Some Examples

####  selecting columns
#Subset test.csv and only include column1 and column2
seesv -columns column1,column2 -p test.csv > newtest.csv

#You can also use numeric indices - zero based
seesv -columns 0,1 -p test.csv > newtest.csv

#Specify a range of columns
#either using column names:
seesv -columns column1-columnN -p test.csv > newtest.csv
#or indices:
seesv -columns 1-5,6,7,10-15 -p test.csv > newtest.csv

#or mix and match numeric and names
seesv -columns 0-1,columnA,columnB,columnF-columnZ -p test.csv > newtest.csv

#Specify a step:
#seesv -columns start-end:step

#e.g. below gives 0,3,6,9,12,15
seesv -columns 0-15:3

#include all columns except column1
seesv -notcolumns column1 -p test.csv > newtest.csv

#subset columns and then change the values 
seesv -columns column1,column2 -change column1,column2 "pattern" "new value" -p test.csv > newtest.csv

#Only include column1 and then only pass through unique values matching exactly
seesv -columns column1 -unique 0 exact -p test.csv > newtest.csv

#Only pass through unique values in the first column and then sort the output
seesv -columns column1 -unique 0 -sort column1 -p test.csv > newtest.csv


print version
-commands <file>
file of commands
  • file: The file of commands. Any # of lines

Input
Specify the input. Default is assumed to be a CSV but can support HTML, JSON, XML, Shapefile, etc.
The default input type is CSV. To print out a CSV file just do:
seesv -p test.csv
The SeeSV package can also handle zipped or gzipped input files:
seesv -p test.csv.zip
seesv -p test.csv.gz
Likewise, if you have an XLSX file the package will extract the first sheet in the input file:
seesv -p test.xls
The SeeSV package does have some support for handling multiple input files. It is assumed that each file has the same set of columns. Note: not all commands work with multiple input files.
seesv -p test1.csv test2.csv
Other file types (see below for more details) are handled as:
#Read each line as a single column
seesv -lines -p test.txt

#Process xml with the given path
seesv -xml "path.to.rows" -p test.xml

#Process JSON
seesv -json "path.to.array" "object_paths" -p test.json
Commands:
-delimiter <delimiter>
Specify the input delimiter
  • delimiter: Use 'space' for space, 'tab' for tab,'?' to guess between tab and space
-inputcomment <comment>
Input comment
  • comment:
Use tabs. A shortcut for -delimiter tab
-widths <widths>
Columns are fixed widths
  • widths: w1,w2,...,wN
For example, with the below data that has Xs to represent spaces
nameXXXvalue1XXXXXvalue2XXXXX   
foo    1          2
bar    3          4
You would set the -widths to be:
seesv -widths 7,11,11

Don't treat quotes as special characters
Normally, when parsing a CSV file, a quote is assumed to be a special character enclosing commas or new lines. If the quotes in your CSV file aren't special then use this command.

Input is one text line per row, i.e., no new lines in a data row. Setting this can improve performance on large files.
-start <start pattern>
Start at pattern in source file
  • start pattern:
-stop <stop pattern>
End at pattern in source file
  • stop pattern:
Input has a leading byte order mark (BOM) that should be stripped out
-encoding <encoding>
Specify the file encoding
  • encoding: File Encoding see https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html values:UTF-8,UTF-16,UTF-16BE,UTF-16LE,UTF-32,UTF-32BE,UTF-32LE,CESU-8,IBM00858,IBM437,IBM775,IBM850,IBM852,IBM855,IBM857,IBM862,IBM866,ISO-8859-1,ISO-8859-13,ISO-8859-15,ISO-8859-2,ISO-8859-4,ISO-8859-5,ISO-8859-7,ISO-8859-9,KOI8-R,KOI8-U,US-ASCII,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1257,x-IBM737,x-IBM874,x-UTF-16LE-BOM,x-UTF-32BE-BOM,x-UTF-32LE-BOM
-header <header>
Raw header
  • header: Column names
Use this if your file does not have a header.

-multifiles <template>
Treat input files separately
  • template: File template - ${file_shortname} ${file_name} ${count}
-json <arrayPath> <objectPaths>
Parse the input as json
  • arrayPath: Path to the array e.g., obj1.arr[2].obj2
  • objectPaths: One or more paths to the objects e.g. geometry,features
The -json command takes as first argument a path to the array of objects that constitute the data. For example, if you had JSON like:
{
    "p1": {
	"p2": {
	    "values": [
		1,2,3,4
	    ]
	}
    }
}
The command would be:
seesv -json p1.p2.values ""
Since the values array just holds numeric values there is no column name. The data would be:
1
2
3
4

If you had a path that had an array component that you wanted to access, for example, the second element of the array, e.g.:
{
    "p1": [
	{"foo":"bar"},
	{
	    "p2": {
		"values": [
		    "1","2","3","4"
		]
	    }
	}
    ]
}
The command would be: seesv -json "p1[1\].p2.values" "" Here is an example where the value array holds objects.
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "name":"jim",
		    "value":5
		},
		{
		    "name":"jill",
		    "value":10
		}		
	    ]
	}
    }
}
The command is:
seesv -json "p1.p2.values" ""
With the result using the object attribute names as column names:
name,value
jim,5
jill,10

If the object within the array is a nested object and you only want to access an inner object, e.g.:
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jim",
			"value":5
		    }
		},
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jill",
			"value":10
		    }
		}
	    ]
	}
    }
}
You would use the second argument to specify an inner path to the object of interest:
seesv -json "p1.p2.values" "goodvalues"
If the object within the array is a nested object and you want to access multiple inner objects, e.g.:
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jim",
			"value":5
		    },
		    "moregoodvalues": {
			"age":74,
		    }    
		},
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jill",
			"value":10
		    },
		    "moregoodvalues": {
			"age":80,
		    }
		}
	    ]
	}
    }
}
Then the second argument can be a comma separated list of path specifications:
seesv  -json "p1.p2.values" "goodvalues,moregoodvalues"
Resulting in:
age,name,value
74,jim,5
80,jill,10

-jsonjoin <arrayPaths> <keys> <pattern> <replace> <missing>
Join different arrays in the input JSON
  • arrayPaths: comma separated list of the array paths
  • keys: Comma separated list of keys to match on
  • pattern: Optional pattern to replace the key value with
  • replace: Pattern replace
  • missing: Missing value
-jsonvalue <columns> <arrayPath>
Extract a value from a JSON column
  • columns: Column names
  • arrayPath: Path to the array e.g., obj1.arr[2].obj2
-geojson <includePolygon>
Parse the input as geojson
  • includePolygon: Include polygon values:true,false
Read input from a PDF file.
The RAMADDA_TABULA environment variable needs to be set to the tabula.sh file found in this release
export RAMADDA_TABULA=/path/to/tabula.sh

-xml <path>
Parse the input as xml
  • path: Path to the elements
-shapefile <props>
Parse the input shapefile
  • props: "addPoints true addShapes false"
The -shapefile command can take either a .shp shapefile or a .zip shapefile. If it is a .zip shapefile that contains a .dbf file then the values within the .dbf file are extracted and added as output. The argument is a set of name/value pairs. For now these can contain addPoints and addShapes values, e.g., the below will add a point which is the calculated centroid of the shape:
seesv -shapefile "addPoints true" -p test.zip
With addShapes=true the polygon will be extracted and added as a column value:
seesv -shapefile "addShapes true" -p test.zip

-lines
Parse the input as text lines. Treat each line as one column
For each line in the input file this creates a row that contains one column which holds the line.

-htmltable <skip> <pattern> <properties>
Parse tables in the input html file
  • skip: Number of tables to skip
  • pattern: Pattern to skip to
  • properties: Other name value args -
    • numTables N:Number of tables to process. Default is 1
    • removeEntity true:remove HTML entities
    • removePattern pattern
    • extractUrls true
    • columnN.extractUrls true: N=column number
    • stripTags false: strip any HTML tags. Default =true
    • columnN.stripTags false: N=column number. Set stripTags for the column

If "pattern" is specified then skip to the first line that matches the pattern. Then skip over the "skip" number of tables. Then process each table up to the numTables (default=1).

The properties are defined as: seesv -htmltable 0 "" "numTable 3 removePattern {some pattern} stripTags true"

-htmlpattern <columns> <startPattern> <endPattern> <pattern>
Parse the input html file
  • columns: Column names
  • startPattern:
  • endPattern:
  • pattern: Row pattern. Use (...) to match columns
-harvest <pattern>
Harvest links in web page. This results in a 2 column dataset with fields: label,url
  • pattern: regexp to match
-text <comma separated header> <chunk pattern> <token pattern>
Extract rows from the text
  • comma separated header:
  • chunk pattern:
  • token pattern:
If you have an unstructured text file the -text command can extract a set of rows and columns from it. The idea is you specify one regular expression for finding a chunk of text in the file that holds the row values and another regular expression to extract the column values from the chunk.

Assume you have a file test.txt

  Hello there how are you
Person:  name: Jim age: 42
and some more text  
Person:  name: Joe age: 53
Person:  name: Sally age: 93
and more
We are going to extract the names and ages with:
seesv -text "name,age" "(?m)(?s).*?Person:(.*?$)" "name:(.*?)age: *(\d+)"
The (?m)(?s) are directives to match multiple lines. The (.*?$) in the first pattern is the group that is matched. The second pattern, name:(.*?)age: *(\d+), has 2 groups, one for the name and one for the age.

-extractpattern <comma separated header> <token pattern>
Extract rows from the text
  • comma separated header:
  • token pattern:
If you have a row oriented but unstructured text file the -extractpattern command can be used to extract the columns from it. Specify a comma separated header and a regular expression with groups that define the columns.

Assume you have a file test.txt

Person:  name: Joe age: 53
Person:  name: Sally age: 93
Extract the data with:
seesv -extractpattern "name,age" "name:(.*)age:(.*)" -p test.txt

-tokenize <header> <pattern>
Tokenize the input from the pattern
  • header: header1,header2...
  • pattern:
-sql <db> <table> <columns> <where> <properties>
Read data from the given database
  • db: The database id (defined in the environment) values:property:seesv_dbs
  • table: Comma separate list of tables to select from
  • columns: Comma separated list of columns to select
  • where: column1:expr:value;column2:expr:value;... e.g.: name:like:joe;age:>:60 Where expr is: =|<|>|<>|like|notlike
  • properties: name space value properties. e.g., join col1,col2
The -sql command uses a number of RAMADDA properties or environment variables that define the databases that can be accessed. The command specifies a database as:
seesv -sql <db> ...
e.g.:
seesv -sql test ...
Where for the given db there should be 2 environment variables:
export seesv_db_<db>_url=JDBC URL
export seesv_db_<db>_tables=List of tables that can be accessed
So for the db "test" in a Derby database:
export seesv_db_test_url=jdbc:derby:/Users/home/derby/db1;create=true;
export seesv_db_test_tables=test_table1,test_table2,test_table3

-synthetic <header> <values> <number_rows>
Generate an empty file with the given number of rows
  • header: comma separated header
  • values: comma separated values
  • number_rows: Number of rows
For example using this:
seesv -synthetic "name,value" "example name,example value" 5
Gives:
name,value
example name,example value
example name,example value
example name,example value
example name,example value
example name,example value

Note: you don't have to specify all of the example values. For example, leaving the values empty:
seesv -synthetic "name,value" "" 5
Gives:
name,value
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>

-prune <bytes>
Prune out the first N bytes
  • bytes: Number of leading bytes to remove
Strip off the RAMADDA point header
Make the header proper capitalization
Clean up the header names
Use canonical names
This command converts the column names to a standardized format, changing the names to lower case and converting punctuation, etc., to "_".

-sheet <sheet>
Set XLS sheet #
  • sheet: Sheet number
-cat <*.csv>
Concat the columns in one or more csv files
  • *.csv:
For example, with the following 2 files:
test1.csv:
name1,value1
foo,1
bar,2

test2.csv:
name2,value2,value3
xxx,1,3
yyy,2,7

Calling:
seesv -cat test1.csv test2.csv
Would result in an output file:
name1,value1,name2,value2,value3
foo,1,xxx,1,3
bar,2,yyy,2,7

-append <skip> <files>
Append the files, skipping the given rows in the latter files
  • skip: Number of rows to skip
  • files:
    • .csv
This concatenates the input files, passing through the first file whole but skipping over the given number of lines in the subsequent files. So with files:
test1.csv:
name,value
foo,bar

test2.csv:
name,value
x,y

test3.csv:
name,value
l,m

Calling:
seesv -append 1 test1.csv test2.csv test3.csv
Results in:
name,value
foo,bar
x,y
l,m

-chop <numlines> <file>
Write out the last N lines, including the header
  • numlines: Number of lines to leave
  • file:
    • .csv
-filenamepattern <pattern> <columnnames>
Extract strings from the file name and add them as new columns
  • pattern: Pattern to match
  • columnnames: Comma separated list of column names


Filter
Commands:
-skiplines <lines>
Skip number of raw lines.
  • lines: How many raw lines to skip
-maxrows <rows>
Set max rows to process
  • rows: Number of rows
-match <columns> <pattern>
Pass through rows that the columns each match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: regexp, or prefix with includes:s1,s2 to do substring matching
-notmatch <columns> <pattern>
Pass through rows that don't match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: regexp, or prefix with includes:s1,s2 to do substring matching
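For example, a sketch assuming a hypothetical test.csv with a name column, to pass through only the rows whose name starts with "jo", and then the rows that don't:
seesv -match name "^jo.*" -p test.csv > matched.csv
seesv -notmatch name "^jo.*" -p test.csv > notmatched.csv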
-if
Next N args specify a filter command followed by any change commands followed by an -endif.
For example, the below command would change the value of column2 for all rows that match the -pattern column1 string command. seesv -if -pattern column1 string -change column2 .* XXX -endif

-rawlines <lines>
  • lines: Number of raw lines to pass through and print out unprocessed
-inputnotcontains <filters>
Filter out input lines that contain any of the strings
  • filters: Comma separated list of strings to filter on
These patterns are applied to the raw lines that are read in before they are tokenized into columns.

-min <min # columns>
Only pass through lines that have at least this number of columns. Specify blank to use the number of columns in the header
  • min # columns:
-max <max # columns>
Only pass through lines that have no more than this number of columns. Specify blank to use the number of columns in the header
  • max # columns:
-numcolumns <number>
Remove or add values so each row has the number of columns
  • number: use -1 to use the # of columns in the header
-has <columns>
Only pass through anything if the data has the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-ifnumcolumns <operator> <number>
Only pass through rows with number of columns passing the operator
  • operator: <,<=,>,>=,=,!=
  • number: Number of columns
-fuzzypattern <threshold> <columns> <pattern>
Pass through rows that the columns each fuzzily match the pattern
  • threshold: Score threshold 0-100. Default:85. Higher number better match
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern:
-lengthgreater <columns> <length>
Pass through rows that the length of the columns is greater than
  • columns: Column indices. Can include ranges, e.g. 0-5
  • length:
-same <column1> <column2>
Pass through where the 2 columns have the same value
  • column1:
  • column2:
-notsame <column1> <column2>
Pass through where the 2 columns don't have the same value
  • column1:
  • column2:
-unique <columns> <mode>
Pass through unique values
  • columns: Column indices. Can include ranges, e.g. 0-5
  • mode: What type of matching is done - exact (exact match) or clean (lower case and remove whitespace) or fuzzy:threshold (do fuzzy matching with threshold from 1: no similarity to 100: exact match. use fuzzy:? to print out values) values:exact,clean,fuzzy:threshold
The mode fuzzy:threshold uses the Fuzzywuzzy package, based on the FuzzyWuzzy Python algorithm, which uses the Levenshtein distance to calculate similarity between strings. You can view the calculated threshold values with:
seesv -unique column fuzzy:?
This will print out the threshold values, e.g.:
value:joey smith
        95 joe smith
value:sam jones
        44 joe smith
        42 joey smith
value:sam jones
        44 joe smith
        42 joey smith
value:sue washington
        43 joe smith
        42 joey smith
        35 sam jones
value:sue washington
        43 joe smith
        42 joey smith
        35 sam jones

-dups <columns>
Pass through duplicate values
  • columns: Column indices. Can include ranges, e.g. 0-5
-sample <probability>
Pass through rows based on probability
  • probability: 0-1 probability of passing through a row
For example, if you only want 5% of the data sampled do:
seesv -sample 0.05 ...

-minvalue <key column> <value column>
Pass through the row that has the min value in the group of columns defined by the key column
  • key column:
  • value column:
-maxvalue <key column> <value column>
Pass through the row that has the max value in the group of columns defined by the key column
  • key column:
  • value column:
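For example, a sketch assuming a hypothetical sales.csv with region and sales columns, to keep the row with the highest sales in each region:
seesv -maxvalue region sales -p sales.csv > maxsales.csv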
-eq <column> <value>
Pass through rows that the column value equals the given value
  • column:
  • value:
-ne <column> <value>
Pass through rows that the column value does not equal the given value
  • column:
  • value:
-gt <column> <value>
Pass through rows that the column value is greater than the given value
  • column:
  • value:
-ge <column> <value>
Pass through rows that the column value is greater than or equals the given value
  • column:
  • value:
-lt <column> <value>
Pass through rows that the column value is less than the given value
  • column:
  • value:
-le <column> <value>
Pass through rows that the column value is less than or equals the given value
  • column:
  • value:
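For example, a sketch assuming a hypothetical people.csv with an age column, to pass through only the rows where age is greater than 60:
seesv -gt age 60 -p people.csv > over60.csv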
-between <column> <min value> <max value>
Extract rows that are within the range
  • column:
  • min value:
  • max value:
-notbetween <column> <min value> <max value>
Extract rows that are not within the range
  • column:
  • min value:
  • max value:
-betweenstring <column> <start string> <end string>
Extract rows that are between the given strings
  • column:
  • start string:
  • end string:
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -betweenstring name jane john
Will give:
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3

-notbetweenstring <column> <start string> <end string>
Extract rows that are not between the given strings
  • column:
  • start string:
  • end string:
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -notbetweenstring name jane john
Will give:
name,value1,value2
bill,doe,4
bill,doe,4

-groupfilter <column> <value_column> <operator> <value>
One row in each group has to match
  • column: key column
  • value_column: Value column
  • operator: values:=,!=,~,<,<=,>,>=
  • value:
-before <column> <date>
Pass through rows whose date is before the given date
  • column:
  • date:
-after <column> <date>
Pass through rows whose date is after the given date
  • column:
  • date:
-countvalue <column> <count>
No more than count unique values
  • column:
  • count:
-decimate <rows> <skip>
Only include every skip'th row
  • rows: # of start rows to include
  • skip: skip factor
-ifin <column> <file> <column2>
Pass through rows whose column value is in the given file
  • column: Column in the file
  • file: The file
  • column2: Column in main file
The -ifin command takes a file of values and will only pass through the row if the value in the row matches at least one of the values in the file. For example, say you have a file precincts.csv:
precinct
p1
p2
p3

And a file voters.csv:
name,voter precinct
jim,p1
joe,p1
sally,p7
...

If you only want voters whose precinct is in precincts.csv you would do:
seesv -ifin precinct precincts.csv voter_precinct -p voters.csv
The -ifnotin command does the opposite, only passing through rows that are not in the file, e.g.:
seesv -ifnotin precinct precincts.csv voter_precinct -p voters.csv

-ifnotin <column> <file> <column2>
Pass through rows whose column value is not in the given file
  • column: Column in the file
  • file: The file
  • column2: Column in main file
-ifmatchesfile <pattern> <column> <file> <column2>
Pass through rows whose column value begins with a value in the given file
  • pattern: Pattern template, e.g. ^${value}
  • column: Column in the file
  • file: The file
  • column2: Column in main file
The -ifmatchesfile command takes a pattern template and a file of strings. For each line in the file it creates a regular expression pattern using the template by substituting the line into the pattern with the macro ${value}.

For example, say you have a file of addresses short.csv:

address,city,state
123 oak,boulder,co
345 main,denver,co
...

And your source data has a set of full addresses full.csv:
full address,city,state
123 oak unit #5
789 pine apt 3
345 main
...

And you only want to pass through rows in full.csv whose address begins with some address in the short.csv file. To do this call:
seesv -ifmatchesfile "^${value}.*" address short.csv full_address -p full.csv
What this does is construct regexp patterns as:
^123 oak.*
^345 main.*
...
And then only passes through the rows in full.csv that match one of these patterns.

The command -ifnotmatchesfile does the opposite. For a row to pass through none of the patterns can match.

-ifnotmatchesfile <pattern> <file> <column2>
Pass through rows whose column value does not begin with a value in the given file
  • pattern: Pattern template, e.g. ^${value}
  • file: The file
  • column2: Column in main file
-skippattern <pattern>
Skip any line that matches the pattern
  • pattern:
-skiprows <rows>
Skip number of processed rows.
  • rows: How many rows to skip
-ensurenumeric <columns>
Throw error if non-numeric
  • columns: Column indices. Can include ranges, e.g. 0-5


Slice and Dice
Add/remove columns, rows, restructure, etc
Commands:
-columns <columns>
Only include the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-notcolumns <columns>
Don't include given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-firstcolumns <columns>
Move columns to beginning
  • columns: Column indices. Can include ranges, e.g. 0-5
-lastcolumns <columns>
Move columns to end
  • columns: Column indices. Can include ranges, e.g. 0-5
-columnsbefore <column> <columns>
Move columns before the given column
  • column: Column to move before
  • columns: Columns to move
-columnsafter <column> <columns>
Move columns after given column
  • column: Column to move after
  • columns: Columns to move
-delete <columns>
Remove the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-cut <rows>
Drop rows
  • rows: One or more rows. -1 to the end. e.g., 0-3,5,10,-1
-include <rows>
Only include specified rows
  • rows: one or more rows, -1 to the end
-rows_first <columns> <pattern>
Move rows to the top that match the pattern
  • columns: columns to match on
  • pattern: Pattern
-rows_last <columns> <pattern>
Move rows to the end of list that match the pattern
  • columns: columns to match on
  • pattern: Pattern
-copy <column> <name>
Copy column
  • column:
  • name:
-add <names> <values>
Add new columns
  • names: Comma separated list of new column names
  • values: Comma separated list of new values
-insert <column> <name> <values>
Insert new column values
  • column: Column to insert before
  • name: Name of new column
  • values: Value to insert. Use ${row} to add the row index
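For example, a sketch assuming a hypothetical test.csv, to add a constant source column and insert a row index column before column1:
seesv -add source "my source" -insert column1 row_index '${row}' -p test.csv > newtest.csv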
-concat <columns> <delimiter> <name>
Create a new column from the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • name: Name of new column
-concatrows <num_rows>
Concatenate multiple rows into a single row
  • num_rows: Number of rows
For example, if you have this data:
name,value
joe,5
jill,2
jack,1
jim,7

Running:
seesv -concatrows 2
Would give:
name,value,name,value
joe,5,jill,2
jack,1,jim,7

-combine <columns> <delimiter> <column name>
Combine columns with the delimiter, deleting the source columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • column name: New column name
-combineinplace <columns> <delimiter> <column name>
Combine columns with the delimiter
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • column name: New column name
-merge <columns> <name> <operator>
Apply operators to columns
  • columns: Columns to merge
  • name: New column(s) name
  • operator: Operator values:average,min,max,count
-split <column> <delimiter> <names>
Split the column
  • column:
  • delimiter: What to split on
  • names: Comma separated new column names
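For example, a sketch assuming a hypothetical test.csv with a full_name column holding values like "Jane Doe", to split it on the space:
seesv -split full_name " " "first_name,last_name" -p test.csv > newtest.csv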
-splat <keycol> <column> <delimiter> <name>
Create a new column from the values in the given column
  • keycol: Key column
  • column:
  • delimiter:
  • name: new column name
-roll <columns>
Roll columns down into rows
  • columns: Column indices. Can include ranges, e.g. 0-5
-shift <rows> <column> <count>
Shift columns over by count for given rows
  • rows: Rows to apply to
  • column: Column to start at
  • count:
-slice <columns> <dest> <fill>
Slide columns down and over to append new rows to the bottom
  • columns: Columns to move
  • dest: Dest column to move to
  • fill: Comma separated list of values to fill out the new row
-addcell <row> <column> <value>
Add a new cell at row/column
  • row:
  • column:
  • value:
-deletecell <row> <column>
Delete cell at row/column
  • row:
  • column:
-clone <count>
Clone each row N times
  • count: Number of clones
-appendrows <skip> <count> <delimiter>
Merge the leading rows into the header row
  • skip: How many rows to skip
  • count: How many rows to merge
  • delimiter: Delimiter used when merging the values
For example, with the below data:
category,category,category
fruit,vegetable,meat
1,2,3
4,5,6

Running:
seesv -appendrows 0 1 " - "
Would give:
category - fruit,category - vegetable,category - meat
1,2,3
4,5,6

-mergerows <rows> <delimiter> <close>
Merge rows
  • rows: 2 or more rows
  • delimiter:
  • close:
-rowop <keys> <values> <operator>
Apply an operator to columns and merge rows
  • keys: Key columns
  • values: Value columns
  • operator: Operator values:average,min,max,count
Rotate the data
Reverse the order of the rows except the header
-makefields <column> <value columns> <unique column> <other columns>
Make new columns from data values
  • column: Column to get the new column header from
  • value columns: Columns to get values from
  • unique column: The unique value, e.g. date
  • other columns: Other columns to include
-melt <columns> <header label> <value label>
Use values in header to make new row
  • columns: Column indices. Can include ranges, e.g. 0-5
  • header label:
  • value label:
-explode <column>
Make separate files based on value of column
  • column:
-join <key columns> <value_columns> <file> <source_columns> <default_value>
Join the 2 files together
  • key columns: key columns the file to join with
  • value_columns: value columns
  • file: File to join with
  • source_columns: source key columns
  • default_value: default value - can be a comma separated list of defaults
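For example, a sketch assuming a hypothetical lookup file states.csv with id and state_name columns and a source file data.csv with a state_id column, to add state_name to each row, defaulting to "unknown" when there is no match:
seesv -join id state_name states.csv state_id unknown -p data.csv > joined.csv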
-fuzzyjoin <threshold> <key columns> <value_columns> <file> <source_columns> <default_value>
Join the 2 files together using fuzzy matching logic
  • threshold: Score threshold 0-100. Default:85. Higher number better match
  • key columns: Numeric column numbers of the file to join with
  • value_columns: numeric columns of the values to join
  • file: File to join with
  • source_columns: source key columns
  • default_value: default value
-cross <file>
Make a cross product of 2 data files
  • file: File to cross with
With the following data:
letters.csv:
letter
a
b
c
d


numbers.csv:
number
1
2
3

The command:
seesv -cross numbers.csv -p letters.csv
Would give:
letter,number
a,1
a,2
a,3
b,1
b,2
b,3
c,1
c,2
c,3
d,1
d,2
d,3

-normal <columns>
Normalize the strings
  • columns: Columns
-countunique <columns>
Count number of unique values
  • columns: Column indices. Can include ranges, e.g. 0-5
-dissect <column> <pattern>
Make fields based on patterns
  • column:
  • pattern: e.g., "(field1:.*) (field2:.*) ..."
-keyvalue <column>
Make fields from key/value pairs, e.g. name1=value1 name2=value2 ...
  • column:
-firstchars <column> <name> <number>
Extract first N characters and create new column
  • column:
  • name: New column name
  • number: Number of characters
-lastchars <column> <name> <number>
Extract last N characters and create new column
  • column:
  • name: New column name
  • number: Number of characters
-between_indices <column> <name> <start> <end>
Extract characters between the 2 indices
  • column:
  • name: New column name
  • start: Start index
  • end: End index
-fromheading <columns> <names> <pattern>
Extract column values from headings
  • columns: Columns of headings
  • names: Comma separated list of new column names
  • pattern: Regexp to apply to header with () defining column values


Change
Commands:
-change <columns> <pattern> <substitution string>
Change columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • substitution string: use $1, $2, etc for pattern (...) matches
For example, with the below data:
label
xxxsome valueyyy

You can strip out the xxx...yyy and add a label: prefix with:
seesv -change label "xxx(.*)yyy" "label:$1"
Which would give:
label
label:some value

You can also specify a file that holds a number of patterns:
seesv -change columns file:file.txt ""
The file is of the form:
pattern1::replace_string1
pattern2::replace_string2  
e.g.:
(?i).*foo.*::foobar
.*value.*::new value

-changerow <rows> <columns> <pattern> <substitution string>
Change the values in the row/cols
  • rows: Row indices. Can include ranges, e.g. 0-5
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • substitution string: use $1, $2, etc for pattern (...) matches
-replace <columns> <substitution string>
Replace
  • columns: Column indices. Can include ranges, e.g. 0-5
  • substitution string: use $1, $2, etc for pattern (...) matches
    use {value} for value
-set <columns> <rows> <value>
Write the value into the cells
  • columns: Column indices. Can include ranges, e.g. 0-5
  • rows: Row indices. Can include ranges, e.g. 0-5
  • value:
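For example, a sketch assuming a hypothetical test.csv with a status column, to blank out that column in the first 5 data rows:
seesv -set status 0-4 "" -p test.csv > newtest.csv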
-cleanwhitespace <columns>
Clean whitespace
  • columns: Column indices. Can include ranges, e.g. 0-5
-macro <pattern> <template> <column label>
Look for the pattern in the header and apply the template to make a new column, template: '{1} {2} ...', use 'none' for column name for no header
  • pattern: Regexp pattern. Help
  • template:
  • column label:
-setcol <column> <pattern> <write column> <value>
Write the value into the write col for rows that match the pattern
  • column: match col #
  • pattern: Regexp pattern. Help
  • write column:
  • value:
-copyif <columns> <pattern> <column1> <column2>
Copy column1 to column2 if all of the columns match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • column1:
  • column2:
-copycolumns <columns1> <columns2>
Copy columns 1 to columns 2
  • columns1:
  • columns2:
-filldown <columns>
Fill down with last non-null value
  • columns: Column indices. Can include ranges, e.g. 0-5
-fillacross <columns> <rows>
Fill across with last non-null value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • rows: One or more rows. -1 to the end. e.g., 0-3,5,10,-1
-unfill <columns>
Set following cells to blank if the same as previous cell
  • columns: Column indices. Can include ranges, e.g. 0-5
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -unfill name,value1
Will give:
name,value1,value2
jane,foo,1
<blank>,bar,2
john,doe,3
bill,<blank>,4
<blank>,<blank>,4

-priorprefix <column> <pattern> <delimiter>
Append prefix from the previous element to rows that match pattern
  • column:
  • pattern: Regexp pattern. Help
  • delimiter:
-case <column> <type>
Change case of column - type:lower,upper,proper,capitalize
  • column:
  • type: values:lower,upper,proper,capitalize
-toid <columns>
Convert the column(s) into IDs (lowercase, no space, a-z0-9_)
  • columns: Column indices. Can include ranges, e.g. 0-5
-padleft <columns> <character> <length>
Pad left with given character
  • columns: Column indices. Can include ranges, e.g. 0-5
  • character: Character to pad with
  • length: Length
-padright <columns> <character> <length>
Pad right with given character
  • columns: Column indices. Can include ranges, e.g. 0-5
  • character: Character to pad with
  • length: Length
-trim <columns>
Trim leading and trailing white space
  • columns: Column indices. Can include ranges, e.g. 0-5
-trimquotes <columns>
Trim leading and trailing quotes
  • columns: Column indices. Can include ranges, e.g. 0-5
-width <columns> <size>
Limit the string size of the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • size:
-prepend <text>
Add the text to the beginning of the file. use _nl_ to insert newlines
  • text:
-pad <count> <pad string>
Add or remove columns to achieve the count
  • count:
  • pad string:
-prefix <column> <prefix>
Add prefix to column
  • column:
  • prefix: String to use
-suffix <column> <suffix>
Add suffix to column
  • column:
  • suffix:
-subst <column_name> <template>
Create a new column with the template
  • column_name: New Column Name
  • template: Template - use ${column_name} ...
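For example, a sketch assuming a hypothetical test.csv with city and state columns, to create a label column combining them:
seesv -subst label '${city}, ${state}' -p test.csv > newtest.csv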
-ascii <columns> <substitution string>
Convert non ascii characters
  • columns: Column indices. Can include ranges, e.g. 0-5
  • substitution string: use $1, $2, etc for pattern (...) matches
-cleanphone <columns>
Clean the phone number
  • columns: Column indices. Can include ranges, e.g. 0-5
-formatphone <columns>
Format the phone number
  • columns: Column indices. Can include ranges, e.g. 0-5
-ismobile <columns>
Add a true/false if the string is a mobile phone
  • columns: Column indices. Can include ranges, e.g. 0-5
The -ismobile command uses one of two external APIs - https://numverify.com/ or https://numlookupapi.com. After making an account with either of these services set the corresponding environment variables:
export NUMLOOKUPAPI_API_KEY=
or
export NUMVERIFY_API_KEY=
This command is only available when running from the command line or, when running within RAMADDA, to logged-in users. We do this because the SeeSV service within RAMADDA can be run by any user.

A cache file, ismobile.txt, of past phone numbers will be created in the local working directory so this command can be run multiple times without resorting to the API calls for past phone numbers.

-sms <column> <campaign> <message>
Send a text message - only for command line
  • column: Phone number
  • campaign: Campaign
  • message: Message template
The -sms command uses https://www.twilio.com/ to send text messages. This command is only available when running from the command line or, when running within RAMADDA, to logged-in users. We do this because the SeeSV service within RAMADDA can be run by any user.

To enable this command create an account at Twilio and access your API keys. Add the following environment variables:

export TWILIO_ACCOUNT_SID=
export TWILIO_AUTH_TOKEN=
export TWILIO_PHONE=
This command takes a column ID which holds the phone number, a campaign identifier, and a message template. The campaign identifier is used to track the list of phone numbers that a message has already been sent to. For example, if you first run:
seesv -sms phone some_campaign "Hi there"
The SeeSV package will write out to a file <campaign>.sent.txt the list of phone numbers that the message was sent to. If the command is interrupted and then rerun with the same campaign ID, numbers that have already been sent the message will not be sent it again.

-js <javascript>
Define Javascript (e.g., functions) to use later in the -func call
  • javascript:
-func <names> <javascript>
Apply the javascript function. Use _colname or _col#
  • names: New column names
  • javascript: javascript expression
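For example, a sketch assuming a hypothetical test.csv with units and price columns, to compute a new total column with a Javascript expression:
seesv -func total "_units * _price" -p test.csv > newtest.csv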
-endswith <column> <string>
Ensure that each column ends with the string
  • column:
  • string:
-truncate <columns> <max length> <suffix>
  • columns: Column indices. Can include ranges, e.g. 0-5
  • max length:
  • suffix:
-extract <column> <pattern> <replace with> <name>
Extract text from column and make a new column
  • column:
  • pattern: Regexp pattern. Help
  • replace with: use 'none' for no replacement
  • name: new column name
-urlarg <column> <argname>
Extract URL argument and make a new column
  • column:
  • argname: URL arg name
-extracthtml <column> <names> <pattern>
Extract text from HTML
  • column: URL Column
  • names: Comma separated list of new column names
  • pattern: Regexp pattern. Help
-htmlinfo <column>
Extract icon and description from input URL
  • column: URL Column
-checkmissing <column> <replace_with>
Check for missing URL
  • column: URL Column
  • replace_with: Replace with
-xmlencode <columns>
Encode the value for XML
  • columns: Column indices. Can include ranges, e.g. 0-5
-urlencode <columns>
URL encode the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-urldecode <columns>
URL decode the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-map <column> <new column name> <value>
Change values in column to new values
  • column:
  • new column name:
  • value: old_value new_value old_value new_value
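For example, a sketch assuming a hypothetical test.csv with a status column holding the codes 0 or 1, to map those codes to labels in a new status_label column:
seesv -map status status_label "0 inactive 1 active" -p test.csv > newtest.csv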
-format <columns> <format>
Apply decimal format to the columns (see https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html)
  • columns: Column indices. Can include ranges, e.g. 0-5
  • format: Decimal format e.g. '##0.00'
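For example, a sketch assuming a hypothetical test.csv with a price column, to format the values with two decimal places:
seesv -format price "##0.00" -p test.csv > newtest.csv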
-denormalize <file> <from id idx> <from value idx> <to idx> <new col name> <mode replace add>
Read the id,value from file and substitute the value in the dest file col idx
  • file: From csv file
  • from id idx:
  • from value idx:
  • to idx:
  • new col name:
  • mode replace add:
-break <label1> <label2> <columns>
Break apart column values and make new rows
  • label1:
  • label2:
  • columns: Column indices. Can include ranges, e.g. 0-5
-parseemail <columns>
Parse out name and email
  • columns: Column indices. Can include ranges, e.g. 0-5
-makeids
Turn the header row into IDs (lowercase, no space, a-z0-9_)
For example, if you had the below data:
Some header label with some punctuation-+/,Some other header label
...

Running:
seesv -makeids
Gives:
some_header_label_with_some_punctuation,some_other_header_label
...

-faker <what> <columns>
Fake up data. See the docs at https://ramadda.org/repository/userguide/seesv.html#-faker
  • what: firstname|lastname|fullname|etc
  • columns: Columns to change. If none given then add the fake value
The -faker command uses the Java Faker package and takes any of the below commands:
firstname
fullname
lastname
name
namewithmiddle
prefix
suffix
title
username
address
city
country
state
stateabbr
streetname
timezone
zipcode
latitude
longitude
countrycode
boolean
asin
ean13
ean8
gtin13
gtin8
imei
isbn10
isbn13
isbngroup
isbngs1
isbnregistrant
color
department
material
price
productname
promotioncode
demonym
educationalattainment
maritalstatus
race
sex
bic
creditcard
iban
ssn
digit
digits:number_of_digits
numberbetween:first:last
randomdigit
randomdigitnotzero
randomDouble:maxNumberOfDecimals:min:max
randomnumber
cellphone
phonenumber
diseasename
hospitalname
medicinename
symptoms

-edit <column>
Hand edit a column (command line only). ESC-stop, BLANK-skip
  • column: key column
The edit command is used from the command line and will prompt the user for a new column value. If the value has already been seen then the prior new value is used. Enter <blank> to skip the given row. Enter ESCAPE to stop prompting for new values.



Values
-md <columns> <type>
Make a message digest of the column values
  • columns: Column indices. Can include ranges, e.g. 0-5
  • type: values:MD5,SHA-1,SHA-256,SHA-512,SHA3-256,SHA3-512
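For example, a sketch assuming a hypothetical test.csv with an email column, to make an MD5 digest of each value:
seesv -md email MD5 -p test.csv > newtest.csv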
-tob64 <columns>
Base 64 Encode
  • columns: Column indices. Can include ranges, e.g. 0-5
-fromb64 <columns>
Base 64 Decode
  • columns: Column indices. Can include ranges, e.g. 0-5
-rot13 <columns>
Rot 13
  • columns: Column indices. Can include ranges, e.g. 0-5
-encrypt <columns> <password>
Encrypt using AES with SHA-256 key
  • columns: Column indices. Can include ranges, e.g. 0-5
  • password:
-decrypt <columns> <password>
Decrypt using AES with SHA-256 key
  • columns: Column indices. Can include ranges, e.g. 0-5
  • password:
Add a UUID field
Add 1,2,3... as column
Add 'A','B', ... as column
-soundex <columns>
Generate a soundex code
  • columns: Column indices. Can include ranges, e.g. 0-5
-wikidesc <columns> <suffix>
Add a description from Wikipedia
  • columns: Search string columns
  • suffix: Text to add after
This uses the API at https://en.wikipedia.org/w/api.php to get the description of a wikipedia page that matches the string provided by the given columns.

-image <columns> <suffix>
Do a Bing image Search for an image
  • columns: Column indices. Can include ranges, e.g. 0-5
  • suffix: Text to add after
-embed <url column>
Download the URL and embed the image contents
  • url column:
-fetch <name> <ignore_errors> <url>
Fetch the URL and embed the contents
  • name: Name of new column
  • ignore_errors: Ignore Errors e.g., true or false
  • url: URL template, e.g., https://foo.com/${column_name}
-imagefill <querycolumn> <suffix> <imagecolumn>
Search for an image with the query column text if the given image column is blank. Add the given suffix to the search.
  • querycolumn:
  • suffix:
  • imagecolumn:
-download <column> <suffix>
Download the URL
  • column: Column that holds the URL
  • suffix: File suffix
-gender <columns>
Figure out the gender of the name in the column
  • columns: Column indices. Can include ranges, e.g. 0-5


Dates
-indateformats <format> <timezone>
Specify one or more date formats for parsing
  • format: e.g. yyyy-MM-dd HH:mm:ss. Use semi-colon separated formats for multiples. Help
  • timezone:
-outdateformat <format> <timezone>
Specify date format for formatting
  • format: e.g. yyyy-MM-dd HH:mm:ss. Help
  • timezone:
-convertdate <columns>
Convert date
  • columns: Columns to convert
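For example, a sketch assuming a hypothetical test.csv with a date column holding values like 03/25/2021, to convert them to yyyy-MM-dd form:
seesv -indateformats "MM/dd/yyyy" "" -outdateformat "yyyy-MM-dd" "" -convertdate date -p test.csv > newtest.csv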
-adddate <date_column> <value> <value_type>
Add date
  • date_column: Date Column
  • value: Value Column
  • value_type: Value type - millisecond,second,minute,hour,hour_of_day,week,month,year
-cleardate <date_column> <component>
Clear date components
  • date_column: Date Column
  • component: Date component values:millisecond,second,minute,hour_of_day,day_of_month,month
-extractdate <date column> <what>
Extract date
  • date column:
  • what: What to extract, e.g., year, month, day_of_week, etc values:era,year,month,day_of_month,day_of_week,week_of_month, day_of_week_in_month,am_pm,hour,hour_of_day, minute,second,millisecond,days_in_year, hours_in_year, minutes_in_year,seconds_in_year
-formatdate <columns>
Format date
  • columns: Column indices. Can include ranges, e.g. 0-5
-formatdateoffset <column> <what>
Format the date offset, e.g. the hours in year
  • column:
  • what: What type of offset, e.g., year, month, day_of_week, etc values:days_in_year, hours_in_year, minutes_in_year,seconds_in_year
-elapsed <column>
Calculate elapsed time (ms) between rows
  • column:
-msto <column> <to>
Convert milliseconds to
  • column:
  • to: seconds|hours|days|weeks|months|years
-latest <columns> <column> <format>
Pass through rows whose date is the latest in the group of rows defined by the key column
  • columns: Key columns
  • column: Date column
  • format: Date Format, e.g. yyyy-MM-dd
-datediff <column1> <column2> <unit>
Calculate elapsed time between columns column1-column2
  • column1: Column 1
  • column2: Column 2
  • unit: Unit-milliseconds,seconds,minutes,hours,days values:milliseconds,seconds,minutes,hours,days
-datecompare <column1> <column2> <operator>
add a true/false column comparing the date values
  • column1:
  • column2:
  • operator: <,<=,=,!=,>=,> values:<,<=,=,!=,>=,>


Numeric
Commands:
-scale <columns> <delta1> <scale> <delta2>
Set value={value+delta1}*scale+delta2
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delta1:
  • scale:
  • delta2:
The -scale command applies offsets and a scale factor:
seesv -scale <column id> <offset1> <scale> <offset2>
Where the result value, v2, is:
  v2 = (value + offset1) * scale + offset2
You can specify random values for any of the values with:
#Generate a random number between 0-1
random:

#Generate a random number between 10 and 11
random:10

#Generate a random number between 10 and 30
random:10:30

For example, if you had a latitude value that you wanted to obfuscate by adding some random value between 1 and 2 you would do:
  -scale latitude random:1:2 1 0

-makenumber <columns>
Try to parse as number
  • columns: Column indices. Can include ranges, e.g. 0-5
-generate <label> <start> <step>
Add row values
  • label:
  • start:
  • step:
-decimals <columns> <num_decimals>
Round decimals
  • columns: Column indices. Can include ranges, e.g. 0-5
  • num_decimals: how many decimals to round to
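For example, a sketch assuming a hypothetical test.csv with a temperature column, to round the values to 2 decimals:
seesv -decimals temperature 2 -p test.csv > newtest.csv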
-fuzz <columns> <num_places> <num_random_digits>
Fuzz the number. If num_places is less than zero then that is the # of decimals; else it is the number of lower digits to fuzz out
  • columns: Column indices. Can include ranges, e.g. 0-5
  • num_places: how many places to round to. use <=0 for decimals
  • num_random_digits: how many random digits
The -fuzz command works to anonymize numeric values.

The num_random_digits is used to generate a number with those random digits.

If the num_places argument is <= zero then that is the number of decimal places the number is rounded to. If there are fewer than num_places decimals then those decimals are converted to "0". The randomized digits are then appended to the number. For example, with R = random digit:
seesv -fuzz <columns> -2 3
Gives:

12345.6789 -> 12345.67RR
145.678 -> 145.67RR
9.0 -> 9.RR

If num_places > 0 then the number is converted to an integer and the num_places least significant digits are converted to 0. If the number of digits is < num_places then the number is converted to 0. The randomized digits are then added to the number. For example, with R = random digit:
seesv -fuzz <columns> 2 3
Gives:

12345.678 -> 123RR
145.678 -> 1RR
1.678 -> RR

-ceil <columns> <value>
Set the max value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • value: Value
-floor <columns> <value>
Set the min value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • value: Value
-delta <key columns> <columns>
Add column that is the delta from the previous step
  • key columns:
  • columns: Column indices. Can include ranges, e.g. 0-5
-runningsum <columns>
Make a running sum of the column values
  • columns: Column indices. Can include ranges, e.g. 0-5
-trendcounter <column> <name>
Make counter field that is incremented every time the value column decreases
  • column: The value column
  • name: Name of counter column
-operator <columns> <new col name> <operator>
Apply the operator to the given columns and create new one
  • columns: Columns
  • new col name:
  • operator: Operator:+,-,*,/,%,average values:+,-,*,/,%,average
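For example, a sketch assuming a hypothetical test.csv with price and tax columns, to add them together into a new total column:
seesv -operator price,tax total + -p test.csv > newtest.csv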
-compare <column1> <column2> <operator>
Add a true/false column comparing the values
  • column1:
  • column2:
  • operator: <,<=,=,!=,>=,> values:<,<=,=,!=,>=,>
-round <columns>
Round the values
  • columns: Column indices. Can include ranges, e.g. 0-5
-abs <columns>
Make absolute values
  • columns: Column indices. Can include ranges, e.g. 0-5
-clip <columns> <min> <max>
Clip the number to within the range
  • columns: Column indices. Can include ranges, e.g. 0-5
  • min:
  • max:
-rand <column name> <minrange> <maxrange>
make random value
  • column name:
  • minrange: Minimum range (e.g. 0)
  • maxrange: Maximum range (e.g. 1)
-even <columns>
Add true if the column starts with an even number
  • columns: Column indices. Can include ranges, e.g. 0-5
-sum <key columns> <value columns> <carry over columns>
Sum values keying on key column value. If no value columns specified then do a count
  • key columns:
  • value columns:
  • carry over columns:
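For example, a sketch assuming a hypothetical sales.csv with region and units columns, to sum the units for each region:
seesv -sum region units "" -p sales.csv > sums.csv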
-pivot <key columns> <column columns> <value column> <operator>
Make a pivot table
  • key columns: Columns to key on
  • column columns: The columns the values of which are used to make the new columns in the result
  • value column: The value column
  • operator: The operator to apply - count,sum,average,min,max
For example, with the input file:
Region,Date,Units,Sales
West,2016,1,11.00
South,2016,8,96.00
West,2016,2,26.00
North,2016,7,84.00
North,2016,8,104.00
South,2016,2,22.00

Running:
seesv -pivot region date sales count -p input.csv
Gives:
Region,2016,2017,2018,2019
West,15.0,51.0,49.0,42.0
South,25.0,23.0,31.0,14.0
North,15.0,27.0,56.0,39.0
East,10.0,24.0,18.0,13.0

You can also specify multiple operators:
seesv -pivot region date sales count,min,max -p input.csv
Gives:
Region,2016 - count,2016 - min,2016 - max,2017 - count,2017 - min,2017 - max,2018 - count,2018 - min,2018 - max,2019 - count,2019 - min,2019 - max
West,15.0,11.0,96.0,51.0,11.0,104.0,49.0,11.0,112.0,42.0,13.0,120.0
South,25.0,12.0,96.0,23.0,11.0,104.0,31.0,11.0,120.0,14.0,12.0,112.0
North,15.0,12.0,104.0,27.0,12.0,96.0,56.0,11.0,120.0,39.0,13.0,120.0
East,10.0,24.0,96.0,24.0,11.0,104.0,18.0,11.0,112.0,13.0,12.0,105.0

-summary <key columns> <value columns> <carry over columns> <ops>
count/sum/average/min/max values keying on key column value. If no value columns specified then do a count
  • key columns: Columns to key on
  • value columns: Columns to apply operators on
  • carry over columns: Extra columns to include
  • ops: any of count,sum,average,min,max
For example, with the below data, if you wanted to summarize the total units by region:
Region,Color,Units,Sales
West,Red,1,11.00
South,Blue,8,96.00
West,Green,2,26.00
North,Blue,7,84.00
North,Green,8,104.00
South,Red,2,22.00
East,Blue,5,60.00
West,Green,2,26.00

You would do:
seesv -summary region units "" sum -p summary.csv
Which gives:
Region,Units sum
West,683.0
South,399.0
North,672.0
East,294.0

If you wanted to get the sum and average of the sales:
seesv -summary region sales "" sum,avg -p summary.csv
You'd get:
Region,Sales sum,Sales avg
West,8873.0,56.51592356687898
South,5044.0,54.236559139784944
North,8709.0,63.56934306569343
East,3730.0,57.38461538461539

The carry over column just gets the first value for the key column. So, if you wanted to include the color column do:
seesv -summary region sales color sum,avg -p summary.csv
Which gives:
Region,Sales sum,Sales avg,Color
West,8873.0,56.51592356687898,Red
South,5044.0,54.236559139784944,Blue
North,8709.0,63.56934306569343,Blue
East,3730.0,57.38461538461539,Blue

-histogram <column> <bins> <value columns> <ops>
Make a histogram with the given column and bins
  • column: The column
  • bins: Comma separated set of bin values
  • value columns: Extra columns to sum up
  • ops: ops to apply to extra columns - any of count,sum,average,min,max
The bins value is a comma separated list of the numeric ranges that define how to bin the data. e.g., for a set of age ranges one can specify:
bins="18,30,40,50,60,70"
This results in bins of:
0-18
18-30
30-40
40-50
etc.
You can also specify a range with an optional step value. For example this:
bins="18-30" 
Will give you the bins:
18-19
19-20
20-21
...
29-30
30-infinity
Or you can specify a step:
bins="18-30:2" 
Will give you the bins:
18-20
20-22
22-24
...

-percent <columns>
Add columns together. Replace with their percentage
  • columns: Columns to add
-increase <columns> <how far back>
Calculate percent increase
  • columns: Column indices. Can include ranges, e.g. 0-5
  • how far back:
-diff <columns> <how far back (default 1)>
Difference from previous value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • how far back (default 1):
-average <columns> <period> <label>
Calculate a moving average
  • columns: Columns
  • period:
  • label:
-ranges <columns> <name> <start> <size>
Create a new column with the (string) ranges where the value falls in
  • columns: Column indices. Can include ranges, e.g. 0-5
  • name: New column name
  • start: Numeric start of range
  • size: Numeric size of range
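For example, a sketch assuming a hypothetical people.csv with an age column, to label each row with the 10-year range its age falls in:
seesv -ranges age age_range 0 10 -p people.csv > newtest.csv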
-bytes <unit> <columns>
Convert suffixed values (e.g., 2 MB) into the number
  • unit: values:binary,metric
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_and <name> <columns>
And values
  • name: New column name
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_or <name> <columns>
Or values
  • name: New column name
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_not <name> <column>
Not value
  • name: New column name
  • column:
-check <columns> <what>
Check that the values are numbers
  • columns: Column indices. Can include ranges, e.g. 0-5
  • what: How strict values:strict,ramadda


Geospatial
-geocode <columns> <prefix> <suffix>
Geocode using given columns
  • columns: Address columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
You can use Google or geocod.io for geocoding by setting the following environment variables. Otherwise this defaults to using the US Census geocoding API.

To use Google geocoding set the environment variable:

GOOGLE_API_KEY=...
To use geocod.io set the environment variable:
GEOCIDEIO_API_KEY=...
For here.com set:
HERE_API_KEY=
To geocode, e.g., with the following file:
city,state
Boulder,CO
Laramie,WY
Buffalo,NY

Run: seesv -geocode city,state "" "" ... This concatenates the city and state fields, e.g., "Boulder CO", and calls the API to geolocate.

To use internal location tables set the prefix argument to one of: seesv -geocode columns "city:" "" ...
seesv -geocode columns "county:" "" ...
seesv -geocode columns "state:" "" ...
seesv -geocode columns "zip:" "" ...
seesv -geocode columns "country:" "" ...
The suffix gets appended to the query string. For example, if you had a file with just cities in a state, e.g.:

city
Boulder
Denver
Grand Junction
Pueblo

You can geocode with: seesv -geocode city "" "CO"

-geocodeifneeded <columns> <prefix> <suffix> <latitude> <longitude>
Geocode if needed
  • columns: Address columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
  • latitude: latitude column
  • longitude: longitude column
-geocodeaddressdb <columns> <prefix> <suffix>
Geocode for import into RAMADDA's DB. The lat/lon is one semi-colon delimited column
  • columns: columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
-geocodejoin <column> <csv file> <key idx> <lat idx> <lon idx>
Geocode with file
  • column: key column
  • csv file: File to get lat/lon from
  • key idx:
  • lat idx:
  • lon idx:
This assumes you have a separate file that contains a key value and latitude/longitude and the source file has a key value. For example, you might have a list of precincts, e.g.:
precinct,latitude,longitude
4171107648,40.197055,-105.100351
2171207308,39.996986,-105.084938
2173307410,40.039912,-105.081393

The source file is a list of voters, e.g.:
name,precinct
John Doe,4171107648
Jane Doe,4171107648
Jim Smith,2171207308

To add the lat/lon: seesv -geocodejoin precinct precincts.csv precinct latitude longitude

-bounds <north> <west> <south> <east>
Geocode within bounds
  • north:
  • west:
  • south:
  • east:
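For example, a sketch assuming test.csv has latitude/longitude columns and using a box roughly covering Colorado; presumably only rows inside the bounds pass through:
seesv -bounds 41 -109 37 -102 -p test.csv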
-decodelatlon <columns>
Decode latlon
  • columns: Lat or Lon column
This converts a column that is of the form degrees minutes seconds into its decimal degrees equivalent. For example, with this data:
latitude,longitude
40° 1' 12 N,107° 40' 10 W
40:1:12,107:40:10
40:1:12 N,107:40:10 W
40:1,107:40

Running the commands: seesv -decodelatlon latitude -decodelatlon longitude gives:
latitude,longitude
40.0,-107.0
40.019999999999996,107.66944444444445
40.019999999999996,-107.66944444444445
40.016666666666666,107.66666666666667

If you wanted to have less precision then call: seesv -decodelatlon latitude -decodelatlon longitude -decimals latitude,longitude 5 -p test.csv which gives:
latitude,longitude
40.0,-107.0
40.02,107.66944
40.02,-107.66944
40.01667,107.66667

-getaddress <latitude> <longitude>
Get address from lat/lon
  • latitude: latitude column
  • longitude: longitude column
This function takes a latitude and longitude column and does an address lookup using either the geocod.io or here.com APIs. See above for the API keys that need to be set.

-statename <state_column>
Add state name from state ID
  • state_column: State ID column
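For example, a sketch assuming test.csv has a state column holding IDs like "CO"; the full name, e.g., "Colorado", would be added:
seesv -statename state -p test.csv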
-geoname <lookup> <fields> <latitude> <longitude>
Look up location name
  • lookup: ('counties' or 'states' or 'countries' or 'timezones')
  • fields: fields in shapefile
  • latitude: Latitude column
  • longitude: Longitude column
This uses one of the shapefiles provided by RAMADDA and finds the name of the feature that contains the given lat/lon. For example, if you had the following data file:
name,latitude,longitude
City 1,40.197055,-105.100351
City 2,39.996986,-105.084938
City 3,40.039912,-105.081393

You can add the country and timezone with: seesv -geoname countries latitude longitude -geoname timezones latitude longitude

-geocontains <lookup> <name> <latitude> <longitude>
Check for containment
  • lookup: ('counties' or 'states' or 'countries' or 'timezones')
  • name: new column name
  • latitude: Latitude column
  • longitude: Longitude column
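For example, a hypothetical call (in_state is an assumed name for the new column, which presumably records the containment result):
seesv -geocontains states in_state latitude longitude -p test.csv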
-elevation <latitude> <longitude>
Look up elevation (using 1/3 arc-second DEM)
  • latitude: Latitude column
  • longitude: Longitude column
This uses the web service at https://nationalmap.gov/epqs/pqs.php to look up the elevation for the given lat/lon. For example, if you had the following data file:
name,latitude,longitude
City 1,40.197055,-105.100351
City 2,39.996986,-105.084938
City 3,40.039912,-105.081393

You can add the elevation with: seesv -elevation latitude longitude

-mercator <columns>
Convert x/y to lon/lat
  • columns: x and y columns
-region <columns>
Add the state's region
  • columns: Columns with state name or abbrev.
This does a lookup from an internal table that maps state name (or abbreviation) to region:
Alaska=West
AK=West
Alabama=South
AL=South
Arkansas=South
...
So with the following data:
name
Wyoming
California
Colorado
The region is added with: seesv -region name

-population <columns> <prefix> <suffix>
Add in population from address
  • columns: Column indices. Can include ranges, e.g. 0-5
  • prefix: e.g., state: or county: or city:
  • suffix:
-neighborhood <latitude> <longitude> <default>
Look up neighborhood for a given location
  • latitude: Latitude column
  • longitude: Longitude column
  • default: Default value
This command uses the API from https://www.precisely.com/ to look up the neighborhood name for a given lat/lon. To use this, get an API key from Precisely and set the environment variable:
PRECISELY_API_KEY=
Call: seesv -neighborhood latitude_column longitude_column



Misc
-apply <columns> <commands>
Apply the commands to each of the columns
  • columns: Columns to expand with
  • commands: Commands. Use the macro ${column}. End with -endapply
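For example, a sketch that expands a command once per column, assuming the commands between -apply and -endapply are run with ${column} replaced by each column id in turn (the -change invocation here is hypothetical):
seesv -apply column1,column2 -change "${column}" "N/A" "" -endapply -p test.csv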
-sortby <columns> <direction> <how>
Sort the rows
  • columns: Column to sort on
  • direction: Direction - up or down values:up,down
  • how: How to sort - string, length, date, extract (number) values:string,number,length,date,extract
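For example, a sketch that sorts the rows numerically by a sales column, descending:
seesv -sortby sales down number -p test.csv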
Show count
-alias <name> <alias>
Set a field alias
  • name: Name
  • alias: Alias
The alias command can be used to make your commands easier to deal with if you have long field names. For example, say you have data like:
some_really_long_field_name
1
2

You can use the alias command to provide a short name, e.g.: seesv -alias some_really_long_field_name name -columns name ...

-value <name> <value>
Define a macro value for later use
  • name: name
  • value: Value
-filepattern <name> <pattern>
Extract a macro value from a filename
  • name: Macro name
  • pattern: Regexp pattern
-changeline <from> <to>
Change the line
  • from: From pattern
  • to: To string
-changeraw <from> <to>
Change input text
  • from: From pattern
  • to: To string
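For example, a hypothetical cleanup that rewrites the raw input text before it is parsed:
seesv -changeraw "NULL" "" -p test.csv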
-crop <columns> <patterns>
Crop last part of string after any of the patterns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • patterns: Comma separated list of patterns
Be strict on columns. Any rows that are not the size of the other rows are dropped
Be strict on columns. Any rows that are not the size of the other rows are shown
-prop <property> <value>
Set a property
  • property: values:position
  • value: start, end, etc
Go easy on missing columns
Verify that all of the rows have the same # of columns
-ext <program_id>
Execute the external program
  • program_id: matches with seesv.ext.<program_id>=/path
The -ext command allows one to integrate external programs into the processing flow. seesv -ext <id> <arguments> The id is used to look up an environment variable (or RAMADDA runtime property) of the form:
seesv_ext_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. The executable is called with the given arguments. Each row that is processed is converted into a CSV string which is passed to the external command on its stdin. A CSV text string is then read back from the external command and passed on in the processing flow. If you want to skip a line then return the string "_null_".
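For example, a minimal sketch, assuming a hypothetical filter script; the trailing "-" ends the arguments passed to the executable:
export seesv_ext_myfilter=/path/to/myfilter.sh
seesv -ext myfilter arg1 arg2 - -p test.csv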

-exec <program_id>
Execute the external program for every line
  • program_id: matches with seesv.ext.<program_id>=/path
The -exec command allows one to integrate external programs into the processing flow. seesv -exec <id> <arguments> The id is used to look up an environment variable (or RAMADDA runtime property) of the form:
seesv_exec_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. For every row the executable is called with the given arguments. Each of the arguments can contain any number of macros of the form:
seesv  -exec <id> ${url}  ${name}
The macros get replaced with the corresponding values in the given row.

So, for example, if you have a list of URLs that you want to fetch using cURL, set the environment variable:

export seesv_exec_curl=/usr/bin/curl    
Then assuming you have the file:
url
url1.html
url2.html
Call the command
seesv  -exec curl ${url}   - -p
This produces a new file:
url,result
url1.html,contents of url1
url2.html,contents of url2



Output
-p
Print text output
-printdelim <delimiter>
Print with delimited output
  • delimiter: Delimiter - ,|^ etc. Use "tab" for tab
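For example, a sketch that writes tab-delimited output:
seesv -printdelim tab test.csv > output.tsv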
-o <file>
Write to the given file (command line only)
  • file: The file
-comment <comment>
Add a comment to the output
  • comment: The comment
-outputprefix <text>
Specify text to add to the beginning of the file
  • text: The text. Use '_nl_' to add a new line. Use '_bom_' to write out the byte order mark.
-highlight <columns> <color>
Highlight the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • color: Color values:red,green,yellow,blue,purple,cyan
-background <color>
Set the background color of the columns
  • color: Color values:red,green,yellow,blue,purple,cyan
-printheader
Print header
Print the file raw
Print HTML table and stats
-cols <width>
Set the width of the columns for output. Use with -p
  • width: Column width

Print summary stats
-record
Print records
Generate script
-toxml <outer tag> <inner tag>
Generate XML
  • outer tag:
  • inner tag:
-tojson <key index>
Generate JSON
  • key index: If defined use this as a map
-togeojson <latitude> <longitude> <columns>
Generate GeoJSON
  • latitude: latitude column
  • longitude: longitude column
  • columns: property columns - use * for all
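For example, a sketch assuming latitude/longitude columns and keeping all columns as properties:
seesv -togeojson latitude longitude "*" test.csv > test.geojson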
Generate DB publish urls
-todb <db id> <table> <columns> <properties>
Write to Database
  • db id:
  • table: table name
  • columns: database columns
  • properties: name value properties
-template <prefix> <template> <row_delimiter> <suffix>
Apply the template to make the output
  • prefix:
  • template: Use ${column_name} or indices: ${0},${1}, etc for values
  • row_delimiter: Output between rows
  • suffix:
For example, with this data:
name,value
joe,5
jill,2
bill,7

Calling: seesv -template "PREFIX\n" "name:\${name} value:\${value}" "\nROW DELIMITER\n" "\nSUFFIX\n" test.csv results in:
PREFIX
name:joe value:5
ROW DELIMITER
name:jill value:2
ROW DELIMITER
name:bill value:7
SUFFIX
Note: the row delimiter is only printed between rows.

If you wanted to generate custom XML, for example, call: seesv -template "<mytags>\n" "<tag name=\"\${name}\" value=\"\${value}\"></tag>" "\n" "\n</mytags>\n" test.csv which results in:

<mytags>
<tag name="joe" value="5"></tag>
<tag name="jill" value="2"></tag>
<tag name="bill" value="7"></tag>
</mytags>

-subd <columns> <ranges> <output_template>
Subdivide into different files
  • columns: columns to subdivide on
  • ranges: Comma separated ranges min1;max1;step1,min2;max2;step2
  • output_template: Output template - use ${ikey} or ${vkey}, e.g., grid${ikey}.csv
The -subd command takes the input and, based on the values of the specified columns, subdivides the input rows into one or more output files.

The format is seesv -subd col1,col2,...,colN "min1;max1;number1,min2;max2;number2,...,minN;maxN;numberN" "output_file_macro${ikey}.csv"

Each min;max;number triple gives the min/max range for that column and the number of steps the range is divided into.

The output file is a template to use to create the destination output file. You can use "${ikey}" for the indices or ${vkey} for the values.

For example, you can use this to do geographic tiling of a set of input data that contains latitude and longitude. If your input.csv file has latitude and longitude columns the below command: seesv -subd latitude,longitude "-90;90;180,-180;180;360" "grid\${ikey}.csv" input.csv says to subdivide the latitude values between -90 and 90 with 180 steps - i.e. 1 degree spacing. The longitude is divided up between -180 and 180 with 360 steps - again, 1 degree spacing. The index of a latitude value is the index between -90 and 90. Likewise the index of the longitude value is between -180 and 180. The ${ikey} for the filename is latitudeindex_longitudeindex. So the above command produces:

grid116_102.csv
grid116_103.csv
grid119_84.csv
grid119_85.csv
grid119_86.csv
...
The grid${ikey}.csv is the output file name macro where ${ikey} is made up of the indices. Using grid${vkey}.csv would give the following file names based on the values of the ranges:
grid30_31_-92_-91.csv
grid39_40_-76_-75.csv
grid38_39_-76_-75.csv
...

-maptiles <columns> <degrees> <output_template>
Tile the data on lat/lon
  • columns: lat/lon columns to subdivide on
  • degrees: Degrees per tile. Defaults to 1
  • output_template: Output template - use ${ikey} or ${vkey}, e.g., tile${vkey}.csv. Defaults to tile${vkey}.csv
Just a shortcut for the -subd command
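For example, a sketch that tiles on 1-degree cells (latitude/longitude columns in input.csv are assumed):
seesv -maptiles latitude,longitude 1 "tile\${vkey}.csv" input.csv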

-chunk <output_template> <number>
Make a number of output files with a max number of rows
  • output_template: Output template - use ${number}, e.g., output${number}.csv. Defaults to output${number}.csv
  • number: Number of rows in each file
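For example, a sketch that splits test.csv into files of at most 1000 rows each, named via the ${number} macro:
seesv -chunk "output\${number}.csv" 1000 test.csv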
-addheader <properties>
Add the RAMADDA point properties
  • properties: name1 value1 ... nameN valueN
    Set default: default.type double
The -addheader command adds RAMADDA's point data header to the output file. The header specifies the canonical field ids and a set of properties for each field and is of the form:
#fields=field1[prop1=value1 prop2=value2 ...],field2[],...,fieldN[propn=valuen ...]
v1,v2,v3
...
The -addheader command is run as below, passing in a set of name/value arguments. If a value has spaces then group the text with {...} brackets. seesv -addheader "arg1 value1 arg2 {value 2} arg3 value3" The args are typically of the form canonical_field_name.some_property, where the canonical field name is the column header cleaned up - lowercased, spaces converted to "_", and non-alphanumeric characters stripped.

For example, say you have a file like this:

some date,year,some field
2018-09-01,2023,10
2019-10-01,2024,20
2020-09-01,2024,30

Running this command: seesv -addheader "some_date.type date date.format yyyy-MM-dd year.type date year.format yyyy some_field.id temperature some_field.label {The Temperature} some_field.type celsius" -p testdate.csv will result in the output:
#fields=some_date[label="Some Date"  type="date" format="yyyy-MM-dd" ] ,year[label="Year"  type="date" format="yyyy" ] ,temperature[label="Some Field"  type="integer" chartable="true" ] 
2018-09-01,2023,10
2019-10-01,2024,20
This specifies:
  • The type of the "some date" field is set to date, with the format given as a Java date format (yyyy-MM-dd). Other types include:
    • "enumeration", e.g. field.type enumeration
    • "integer", e.g. field.type integer
    • "double", e.g. field.type double
  • The year field is also a date with format yyyy
  • The canonical id of "some field" field is changed to "temperature"
  • The label of "some field" is changed to "The Temperature"
  • The unit of "some field" is set to be celsius. This can be used when displaying the data.
Other field properties include:
  • group, e.g. field.group {some group} - this is used to group a set of fields and is primarily used for display, e.g., providing a header in a table
  • default.format, e.g. default.format {yyyy-MM-dd} - this is to set the default date format that is applied to all fields
  • {some pattern}.{some property} - instead of the field name as the prefix you can specify a regular expression pattern. For example if you had data like:
      value1,value2,value3
      10,20,30
    
    You can specify the type as "value.*.type integer".

    Adding a grouping to the pattern can be used to pull out text from the field name and use it in the value. For example, below would set the label: seesv -addheader "(value.*).*.label {Some label \$1}"

-db <properties>
Generate the RAMADDA db xml from the header. See Help
  • properties: Name value pairs: table.id <new id> table.name <new name> table.cansearch false table.canlist false table.icon <icon>, e.g., /db/database.png <column>.id <new id for column> <column>.label <new label> <column>.type <string|enumeration|double|int|date|latlon> <column>.format <yyyy MM dd HH mm ss format for dates> <column>.canlist false <column>.cansearch false db.install <true|false> install the new db table db.droptable <true|false> careful! this deletes any prior created dbs db.yesreallydroptable true - this double checks
The -db command generates a Data Tables specification, a ...db.xml file that acts as a RAMADDA plugin. See https://ramadda.org/repository/db/db.html for more information.

The -db command takes the form:

-db "name value pairs"
e.g.:
-db "
table.id <new id> table.name <new name>
table.cansearch false table.canlist false
table.icon <icon>, e.g., /db/database.png
<column>.id <new id for column>
<column>.label <new label>
<column>.type <string|enumeration|double|int|date>
<column>.format <yyyy MM dd HH mm ss format for dates>
<column>.canlist false <column>.cansearch false
install <true|false> install the new db table
nukedb <true|false> careful! this deletes any prior created dbs
yesreallynukethewholedb true - this double checks
"
In the interactive Convert Data SeeSV form if you press "Table" then the schema will be shown. Clicking on the column links brings up a popup menu that sets the id, label, type, etc for the column.

If you have the install flag defined as true then when you hit "Process" the db.xml file will be generated and the plugin will be loaded into RAMADDA. You can then go and add a new entry of the database type from the "Pick a type..." menu.

If you have the nukedb and yesreallynukethewholedb flags set to true then when you press Process the database table will first be dropped and then recreated. This capability is there so that you can iteratively change the types of the columns. Note: any database entries and data that have been loaded will be removed.

-dbprops <id pattern> <suffix pattern>
Print to stdout props for db generation
  • id pattern:
  • suffix pattern:
-write <file name template> <contents template>
Write the contents of a row to a named file
  • file name template:
  • contents template:
Print the fields
-run <Name of process directory>
  • Name of process directory:
-progress <rows>
Show progress
  • rows: How often to print
-debugrows <rows>
Debug # rows
  • rows: # of rows
Generate the RAMADDA point properties
Generate the CSV file commands
-typexml <type_id> <type_label> <database_columns>
Generate the RAMADDA type xml
  • type_id: Type ID, prefix with type_, no spaces, lowercase, e.g. type_point_mypointdata
  • type_label: Type Label - human readable label
  • database_columns: List of database columns, one per line
    e.g.: id,label,type,prop1,value1,prop2,value2
    Types can be:
    string,int,double,enumeration,enumerationplus
    list,latlon,latlonbox,url,date,datetime
    If enumeration then define values with escaped commas - \,:
    fruit,Fruit,enumeration,values,banana\,apple\,orange

The -typexml command generates an entry type definition file. The Type ID should have a type_ prefix, contain no spaces, and be lowercase. The database columns are given one per line, where each line is:
id,label,type,prop1,value1,prop2,value2,...
e.g.:

#A String value
some_field,Some Field,string

#An integer value
some_number,Some Integer Column,int,default,30,showinhtml,false

#A double value
some_double,Some Double Number,double,cansearch,false

#An enumeration. Use \, to escape the commas
fruit,Fruit,enumeration,values,banana\,apple\,orange

#An enumeration that new values can also be added to
fruit_plus,Fruit Plus,enumerationplus,values,banana\,apple\,orange

#Use id:label in the values
sport,Sport,enumeration,values,baseball:Baseball\,football:Football\,hockey:Hockey
The id will be the column identifier in the database, so it should be lowercase. Press Outputs - Print text output to generate the mytypes.xml file. Download it and copy it to your RAMADDA Home/plugins directory and restart your RAMADDA. The types.xml has a default wiki text that you can change.