RAMADDA SeeSV

Input | Filter | Slice and Dice | Change | Values | Dates | Numeric | Geospatial | Misc | Output
The RAMADDA SeeSV package provides 308 commands for manipulating CSV and other types of files.

Introduction

SeeSV can be used both interactively from within RAMADDA and via the command line. Help with the interactive version can be viewed here. In-depth interactive examples are available here.

For brevity, in the below documentation we're assuming that you are using an alias:

alias seesv="sh /path/to/seesv/seesv.sh"  
The examples here show command line use. If running interactively through the RAMADDA web interface the commands take the same form, except that the seesv command call and the input file specification are omitted. So, for example, if you are calling the tool from the command line as:
seesv -columns column1,column2 -p file.csv > output.csv
The corresponding commands interactively would be:
-columns column1,column2

Install

  • Download the seesv.zip file from the RAMADDA download site
  • Unzip the file
  • Consult the README
  • SeeSV runs using Java 1.8 or higher. Set the JAVA environment variable to point to the java runtime.
  • Usage:
    seesv <any number of commands> -p <input file> > output file

Basic Use

The general use of the SeeSV tools is as follows:
seesv <a number of processing commands> <some output command> file.csv > output.csv
#or
seesv <a number of processing commands> <some output command> -o output.csv   file.csv 

#To show help:
seesv -help

#To show help on a topic:
seesv -help:some_topic

#With no processing commands and just printing the header:
seesv -printheader file.csv

#Or a shortcut
seesv -ph file.csv

#With no processing commands and csv output:
seesv -p file.csv > output.csv

#With no processing commands and a record oriented  output:
seesv -record file.csv > output.csv

#With the -columns command printing csv output
seesv -columns column1,column2 -p file.csv > output.csv

Column Identifiers

Note: many of these commands use one or more column names from the source file. These column names are converted to a canonical identifier format by lower-casing, changing spaces to "_", and removing non-alphanumeric characters. So for example if your source file is:
column 1, Column 2, Some "funny" column
 ...
Then the column ids are:
column_1,column_2,some_funny_column
You can always use the -ids command to list the ids
seesv -ids -printheader test.csv

Some Examples

####  selecting columns
#Subset test.csv and only include column1 and column2
seesv -columns column1,column2 -p test.csv > newtest.csv

#You can also use numeric indices - zero based
seesv -columns 0,1 -p test.csv > newtest.csv

#Specify a range of columns
#either using column names:
seesv -columns column1-columnN -p test.csv > newtest.csv
#or indices:
seesv -columns 1-5,6,7,10-15 -p test.csv > newtest.csv

#or mix and match numeric and names
seesv -columns 0-1,columnA,columnB,columnF-columnZ -p test.csv > newtest.csv

#Specify a step:
#seesv -columns start-end:step

#e.g. below gives 0,3,6,9,12,15
seesv -columns 0-15:3

#include all columns except column1
seesv -notcolumns column1 -p test.csv > newtest.csv

#subset columns and then change the values 
seesv -columns column1,column2 -change column1,column2 "pattern" "new value" -p test.csv > newtest.csv

#Only include column1 and then only pass through unique values matching exactly
seesv -columns column1 -unique 0 exact -p test.csv > newtest.csv

#Only pass through unique values in the first column and then sort the output
seesv -columns column1 -unique 0 -sort column1 -p test.csv > newtest.csv


print version
-commands <file>
file of commands
  • file: The file of commands. Any # of lines

Input
Specify the input. Default is assumed to be a CSV but can support HTML, JSON, XML, Shapefile, etc.
The default input type is CSV. To print out a CSV file just do:
seesv -p test.csv
The SeeSV package can also handle zipped or gzipped input files:
seesv -p test.csv.zip
seesv -p test.csv.gz
Likewise, if you have an XLSX file the package will extract the first sheet in the input file:
seesv -p test.xls
The SeeSV package does have some support for handling multiple input files. It is assumed that each file has the same set of columns. Note: not all commands work with multiple input files.
seesv -p test1.csv test2.csv
Other file types (see below for more details) are handled as:
#Read each line as a single column
seesv -lines -p test.txt

#Process xml with the given path
seesv -xml "path.to.rows" -p test.xml

#Process JSON
seesv -json "path.to.array" "object_paths" -p test.json
Commands:
-delimiter <delimiter>
Specify the input delimiter
  • delimiter: Use 'space' for space, 'tab' for tab,'?' to guess between tab and space
-inputcomment <comment>
Input comment
  • comment:
Use tabs. A shortcut for -delimiter tab
-widths <widths>
Columns are fixed widths
  • widths: w1,w2,...,wN
For example, with the below data that has Xs to represent spaces
nameXXXvalue1XXXXXvalue2XXXXX   
foo    1          2
bar    3          4
You would set the -widths to be:
seesv -widths 7,11,11

Don't treat quotes as special characters
Normally, when parsing a CSV file, a quote is assumed to be a special character enclosing commas or new lines. If the quotes in your CSV file aren't special then use this command.

Input is one text line per row, i.e., no new lines in a data row. Setting this can improve performance on large files.
-start <start pattern>
Start at pattern in source file
  • start pattern:
-stop <stop pattern>
End at pattern in source file
  • stop pattern:
Input has a leading byte order mark (BOM) that should be stripped out
-encoding <encoding>
Specify the file encoding
  • encoding: File Encoding see https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html values:UTF-8,UTF-16,UTF-16BE,UTF-16LE,UTF-32,UTF-32BE,UTF-32LE,CESU-8,IBM00858,IBM437,IBM775,IBM850,IBM852,IBM855,IBM857,IBM862,IBM866,ISO-8859-1,ISO-8859-13,ISO-8859-15,ISO-8859-2,ISO-8859-4,ISO-8859-5,ISO-8859-7,ISO-8859-9,KOI8-R,KOI8-U,US-ASCII,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1257,x-IBM737,x-IBM874,x-UTF-16LE-BOM,x-UTF-32BE-BOM,x-UTF-32LE-BOM
-header <header>
Raw header
  • header: Column names
Use this if your file does not have a header.

-multifiles <template>
Treat input files separately
  • template: File template - ${file_shortname} ${file_name} ${count}
-json <arrayPath> <objectPaths>
Parse the input as json
  • arrayPath: Path to the array e.g., obj1.arr[2].obj2
  • objectPaths: One or more paths to the objects e.g. geometry,features
The -json command takes as first argument a path to the array of objects that constitute the data. For example, if you had JSON like:
{
    "p1": {
	"p2": {
	    "values": [
		1,2,3,4
	    ]
	}
    }
}
The command would be:
seesv -json p1.p2.values ""
Since the values array just holds numeric values there is no column name. The data would be:
1
2
3
4

If you had a path that had an array component that you wanted to access, for example, the second element of the array, e.g.:
{
    "p1": [
	{"foo":"bar"},
	{
	    "p2": {
		"values": [
		    "1","2","3","4"
		]
	    }
	}
    ]
}
The command would be: seesv -json "p1[1\].p2.values" "" Here is an example where the value array holds objects.
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "name":"jim",
		    "value":5
		},
		{
		    "name":"jill",
		    "value":10
		}		
	    ]
	}
    }
}
The command is:
seesv -json "p1.p2.values" ""
With the result using the object attribute names as column names:
name,value
jim,5
jill,10

If the object within the array is a nested object and you only want to access an inner object, e.g.:
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jim",
			"value":5
		    }
		},
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jill",
			"value":10
		    }
		}
	    ]
	}
    }
}
You would use the second argument to specify an inner path to the object of interest:
seesv -json "p1.p2.values" "goodvalues"
If the object within the array is a nested object and you want to access multiple inner objects, e.g.:
{
    "p1": {
	"p2": {
	    "values": [
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jim",
			"value":5
		    },
		    "moregoodvalues": {
			"age":74,
		    }    
		},
		{
		    "foo":"bar",
		    "goodvalues": {
			"name":"jill",
			"value":10
		    },
		    "moregoodvalues": {
			"age":80,
		    }
		}
	    ]
	}
    }
}
Then the second argument can be a comma separated list of path specifications:
seesv  -json "p1.p2.values" "goodvalues,moregoodvalues"
Resulting in:
age,name,value
74,jim,5
80,jill,10

-jsonjoin <arrayPaths> <keys> <pattern> <replace> <missing>
Join different arrays in the input JSON
  • arrayPaths: comma separated list of the array paths
  • keys: Comma separated list of keys to match on
  • pattern: Optional pattern to replace the key value with
  • replace: Pattern replace
  • missing: Missing value
-jsonvalue <columns> <arrayPath>
Extract a value from a JSON column
  • columns: Column names
  • arrayPath: Path to the array e.g., obj1.arr[2].obj2
-geojson <includePolygon>
Parse the input as geojson
  • includePolygon: Include polygon values:true,false
Read input from a PDF file.
The RAMADDA_TABULA environment variable needs to be set to the tabula.sh file found in this release
export RAMADDA_TABULA=/path/to/tabula.sh

-xml <path>
Parse the input as xml
  • path: Path to the elements
-shapefile <props>
Parse the input shapefile
  • props: "addPoints true addShapes false"
The -shapefile command can take either a .shp shapefile or a .zip shapefile. If it is a .zip shapefile that contains a .dbf file then the values within the .dbf file are extracted and added as output. The argument is a set of name/value pairs. For now these can contain addPoints and addShapes values, e.g., the below will add a point which is the calculated centroid of the shape:
seesv -shapefile "addPoints true" -p test.zip
With addShapes=true the polygon will be extracted and added as a column value:
seesv -shapefile "addShapes true" -p test.zip

-lines
Parse the input as text lines. Treat each line as one column
For each line in the input file this creates a row that contains one column which holds the line.

-htmltable <skip> <pattern> <properties>
Parse tables in the input html file
  • skip: Number of tables to skip
  • pattern: Pattern to skip to
  • properties: Other name value args -
    • numTables N:Number of tables to process. Default is 1
    • removeEntity true:remove HTML entities
    • removePattern pattern
    • extractUrls true
    • columnN.extractUrls true: N=column number
    • stripTags false: strip any HTML tags. Default =true
    • columnN.stripTags false: N=column number. Set stripTags for the column

If "pattern" is specified then skip to the first line that matches the pattern. Then skip over the "skip" number of tables. Then process each table up to the numTables (default=1).

The properties are defined as: seesv -htmltable 0 "" "numTable 3 removePattern {some pattern} stripTags true"

-htmlpattern <columns> <startPattern> <endPattern> <pattern>
Parse the input html file
  • columns: Column names
  • startPattern:
  • endPattern:
  • pattern: Row pattern. Use (...) to match columns
-harvest <pattern>
Harvest links in web page. This results in a 2 column dataset with fields: label,url
  • pattern: regexp to match
-text <comma separated header> <chunk pattern> <token pattern>
Extract rows from the text
  • comma separated header:
  • chunk pattern:
  • token pattern:
If you have an unstructured text file the -text command can extract a set of rows and columns from it. The idea is you specify one regular expression for finding a chunk of text in the file that holds the row values and another regular expression to extract the column values from the chunk.

Assume you have a file test.txt

  Hello there how are you
Person:  name: Jim age: 42
and some more text  
Person:  name: Joe age: 53
Person:  name: Sally age: 93
and more
We are going to extract the names and ages with:
seesv -text "name,age" "(?m)(?s).*?Person:(.*?$)" "name:(.*?)age: *(\d+)"
The (?m)(?s) are directives to match multiple lines. The (.*?$) in the first pattern is the group that is matched. The second pattern, name:(.*?)age: *(\d+), has 2 groups, one for the name and one for the age.

-extractpattern <comma separated header> <token pattern>
Extract rows from the text
  • comma separated header:
  • token pattern:
If you have a row oriented but unstructured text file the -extractpattern command can be used to extract the columns from it. Specify a comma separated header and a regular expression with groups that define the columns.

Assume you have a file test.txt

Person:  name: Joe age: 53
Person:  name: Sally age: 93
Extract the data with:
seesv -extractpattern "name,age" "name:(.*)age:(.*)" -p test.txt

-tokenize <header> <pattern>
Tokenize the input from the pattern
  • header: header1,header2...
  • pattern:
-sql <db> <table> <columns> <where> <properties>
Read data from the given database
  • db: The database id (defined in the environment) values:property:seesv_dbs
  • table: Comma separate list of tables to select from
  • columns: Comma separated list of columns to select
  • where: column1:expr:value;column2:expr:value;... e.g.: name:like:joe;age:>:60 Where expr is: =|<|>|<>|like|notlike
  • properties: name space value properties. e.g., join col1,col2
The -sql command uses a number of RAMADDA properties or environment variables that define the databases that can be accessed. The command specifies a database as:
seesv -sql <db> ...
e.g.:
seesv -sql test ...
Where for the given db there should be 2 environment variables:
export seesv_db_<db>_url=JDBC URL
export seesv_db_<db>_tables=List of tables that can be accessed
So for the db "test" in a Derby database:
export seesv_db_test_url=jdbc:derby:/Users/home/derby/db1;create=true;
export seesv_db_test_tables=test_table1,test_table2,test_table3

-synthetic <header> <values> <number_rows>
Generate an empty file with the given number of rows
  • header: comma separated header
  • values: comma separated values
  • number_rows: Number of rows
For example using this:
seesv -synthetic "name,value" "example name,example value" 5
Gives:
name,value
example name,example value
example name,example value
example name,example value
example name,example value
example name,example value

Note: you don't have to specify all of the example values. For example, leaving the values empty:
seesv -synthetic "name,value" "" 5
Gives:
name,value
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>
<blank>,<blank>

-prune <bytes>
Prune out the first N bytes
  • bytes: Number of leading bytes to remove
Strip off the RAMADDA point header
Make the header proper capitalization
Clean up the header names
Use canonical names
This command converts the column names to a standardized format, changing the names to lower case and converting punctuation, etc., to "_".

-sheet <sheet>
Set XLS sheet #
  • sheet: Sheet number
-cat <*.csv>
Concat the columns in one or more csv files
  • *.csv:
For example, with the following 2 files:
test1.csv:
name1,value1
foo,1
bar,2

test2.csv:
name2,value2,value3
xxx,1,3
yyy,2,7

Calling:
seesv -cat test1.csv test2.csv
Would result in an output file:
name1,value1,name2,value2,value3
foo,1,xxx,1,3
bar,2,yyy,2,7

-append <skip> <files>
Append the files, skipping the given rows in the latter files
  • skip: Number of rows to skip
  • files:
    • .csv
This concatenates the input files, passing through the first file whole but skipping over the given number of lines in the subsequent files. So with files:
test1.csv:
name,value
foo,bar

test2.csv:
name,value
x,y

test3.csv:
name,value
l,m

Calling:
seesv -append 1 test1.csv test2.csv test3.csv
Results in:
name,value
foo,bar
x,y
l,m

-chop <numlines> <file>
Write out the last N lines, including the header
  • numlines: Number of lines to leave
  • file:
    • .csv
-filenamepattern <pattern> <columnnames>
Extract strings from the file name and add them as new columns
  • pattern: Pattern to match
  • columnnames: Comma separated list of column names


Filter
Commands:
-skiplines <lines>
Skip number of raw lines.
  • lines: How many raw lines to skip
-maxrows <rows>
Set max rows to process
  • rows: Number of rows
-match <columns> <pattern>
Pass through rows that the columns each match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: regexp, or prefix with includes:s1,s2 to do substring matching
-notmatch <columns> <pattern>
Pass through rows that don't match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: regexp, or prefix with includes:s1,s2 to do substring matching
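For example, a sketch assuming a hypothetical test.csv with a name column, to pass through only the rows whose name starts with "jo", and then the rows that don't:
seesv -match name "^jo.*" -p test.csv > matched.csv
seesv -notmatch name "^jo.*" -p test.csv > notmatched.csv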
-if
Next N args specify a filter command followed by any change commands followed by an -endif.
For example, the below command would change the value of column2 for all rows that match the -pattern column1 string command. seesv -if -pattern column1 string -change column2 .* XXX -endif

-rawlines <lines>
  • lines: Number of raw lines to pass through and print out unprocessed
-inputnotcontains <filters>
Filter out input lines that contain any of the strings
  • filters: Comma separated list of strings to filter on
These patterns are applied to the raw lines that are read in before they are tokenized into columns.

-min <min # columns>
Only pass through lines that have at least this number of columns. Specify blank to use the number of columns in the header
  • min # columns:
-max <max # columns>
Only pass through lines that have no more than this number of columns. Specify blank to use the number of columns in the header
  • max # columns:
-numcolumns <number>
Remove or add values so each row has the number of columns
  • number: use -1 to use the # of columns in the header
-has <columns>
Only pass through anything if the data has the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-ifnumcolumns <operator> <number>
Only pass through rows with number of columns passing the operator
  • operator: <,<=,>,>=,=,!=
  • number: Number of columns
-fuzzypattern <threshold> <columns> <pattern>
Pass through rows that the columns each fuzzily match the pattern
  • threshold: Score threshold 0-100. Default:85. Higher number better match
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern:
-lengthgreater <columns> <length>
Pass through rows that the length of the columns is greater than
  • columns: Column indices. Can include ranges, e.g. 0-5
  • length:
-same <column1> <column2>
Pass through where the 2 columns have the same value
  • column1:
  • column2:
-notsame <column1> <column2>
Pass through where the 2 columns don't have the same value
  • column1:
  • column2:
-unique <columns> <mode>
Pass through unique values
  • columns: Column indices. Can include ranges, e.g. 0-5
  • mode: What type of matching is done - exact (exact match) or clean (lower case and remove whitespace) or fuzzy:threshold (do fuzzy matching with threshold from 1: no similarity to 100: exact match. use fuzzy:? to print out values) values:exact,clean,fuzzy:threshold
The mode fuzzy:threshold uses the Fuzzywuzzy package, based on the FuzzyWuzzy Python algorithm, which uses the Levenshtein distance to calculate similarity between strings. You can view the calculated threshold values with:
seesv -unique column fuzzy:?
This will print out the threshold values, e.g.:
value:joey smith
        95 joe smith
value:sam jones
        44 joe smith
        42 joey smith
value:sam jones
        44 joe smith
        42 joey smith
value:sue washington
        43 joe smith
        42 joey smith
        35 sam jones
value:sue washington
        43 joe smith
        42 joey smith
        35 sam jones

-dups <columns>
Pass through duplicate values
  • columns: Column indices. Can include ranges, e.g. 0-5
-sample <probability>
Pass through rows based on probability
  • probability: 0-1 probability of passing through a row
For example, if you only want 5% of the data sampled do:
seesv -sample 0.05 ...

-minvalue <key column> <value column>
Pass through the row that has the min value in the group of columns defined by the key column
  • key column:
  • value column:
-maxvalue <key column> <value column>
Pass through the row that has the max value in the group of columns defined by the key column
  • key column:
  • value column:
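For example, a sketch assuming a hypothetical sales.csv with region and sales columns, to keep the row with the highest sales in each region:
seesv -maxvalue region sales -p sales.csv > maxsales.csv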
-eq <column> <value>
Pass through rows that the column value equals the given value
  • column:
  • value:
-ne <column> <value>
Pass through rows that the column value does not equal the given value
  • column:
  • value:
-gt <column> <value>
Pass through rows that the column value is greater than the given value
  • column:
  • value:
-ge <column> <value>
Pass through rows that the column value is greater than or equals the given value
  • column:
  • value:
-lt <column> <value>
Pass through rows that the column value is less than the given value
  • column:
  • value:
-le <column> <value>
Pass through rows that the column value is less than or equals the given value
  • column:
  • value:
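For example, a sketch assuming a hypothetical people.csv with an age column, to pass through only the rows where age is greater than 60:
seesv -gt age 60 -p people.csv > over60.csv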
-between <column> <min value> <max value>
Extract rows that are within the range
  • column:
  • min value:
  • max value:
-notbetween <column> <min value> <max value>
Extract rows that are not within the range
  • column:
  • min value:
  • max value:
-betweenstring <column> <start string> <end string>
Extract rows that are between the given strings
  • column:
  • start string:
  • end string:
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -betweenstring name jane john
Will give:
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3

-notbetweenstring <column> <start string> <end string>
Extract rows that are not between the given strings
  • column:
  • start string:
  • end string:
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -notbetweenstring name jane john
Will give:
name,value1,value2
bill,doe,4
bill,doe,4

-groupfilter <column> <value_column> <operator> <value>
One row in each group has to match
  • column: key column
  • value_column: Value column
  • operator: values:=,!=,~,<,<=,>,>=
  • value:
-before <column> <date>
Pass through rows whose date is before the given date
  • column:
  • date:
-after <column> <date>
Pass through rows whose date is after the given date
  • column:
  • date:
-countvalue <column> <count>
No more than count unique values
  • column:
  • count:
-decimate <rows> <skip>
Only include every skip'th row
  • rows: # of start rows to include
  • skip: skip factor
-ifin <column> <file> <column2>
Pass through rows whose column value is in the given file
  • column: Column in the file
  • file: The file
  • column2: Column in main file
The -ifin command takes a file of values and will only pass through the row if the value in the row matches at least one of the values in the file. For example, say you have a file precincts.csv:
precinct
p1
p2
p3

And a file voters.csv:
name,voter precinct
jim,p1
joe,p1
sally,p7
...

If you only want voters whose precinct is in precincts.csv you would do:
seesv -ifin precinct precincts.csv voter_precinct -p voters.csv
The -ifnotin command does the opposite, only passing through rows that are not in the file, e.g.:
seesv -ifnotin precinct precincts.csv voter_precinct -p voters.csv

-ifnotin <column> <file> <column2>
Pass through rows whose column value is not in the given file
  • column: Column in the file
  • file: The file
  • column2: Column in main file
-ifmatchesfile <pattern> <column> <file> <column2>
Pass through rows whose column value begins with a value in the given file
  • pattern: Pattern template, e.g. ^${value}
  • column: Column in the file
  • file: The file
  • column2: Column in main file
The -ifmatchesfile command takes a pattern template and a file of strings. For each line in the file it creates a regular expression pattern using the template by substituting the line into the pattern with the macro ${value}.

For example, say you have a file of addresses short.csv:

address,city,state
123 oak,boulder,co
345 main,denver,co
...

And your source data has a set of full addresses full.csv:
full address,city,state
123 oak unit #5
789 pine apt 3
345 main
...

And you only want to pass through rows in full.csv whose address begins with some address in the short.csv file. To do this call:
seesv -ifmatchesfile "^${value}.*" address short.csv full_address -p full.csv
What this does is construct regexp patterns as:
^123 oak.*
^345 main.*
...
And then only passes through the rows in full.csv that match one of these patterns.

The command -ifnotmatchesfile does the opposite. For a row to pass through none of the patterns can match.

-ifnotmatchesfile <pattern> <file> <column2>
Pass through rows whose column value does not begin with a value in the given file
  • pattern: Pattern template, e.g. ^${value}
  • file: The file
  • column2: Column in main file
-skippattern <pattern>
Skip any line that matches the pattern
  • pattern:
-skiprows <rows>
Skip number of processed rows.
  • rows: How many rows to skip
-ensurenumeric <columns>
Throw error if non-numeric
  • columns: Column indices. Can include ranges, e.g. 0-5


Slice and Dice
Add/remove columns, rows, restructure, etc
Commands:
-columns <columns>
Only include the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-notcolumns <columns>
Don't include given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-firstcolumns <columns>
Move columns to beginning
  • columns: Column indices. Can include ranges, e.g. 0-5
-lastcolumns <columns>
Move columns to end
  • columns: Column indices. Can include ranges, e.g. 0-5
-columnsbefore <column> <columns>
Move columns before the given column
  • column: Column to move before
  • columns: Columns to move
-columnsafter <column> <columns>
Move columns after given column
  • column: Column to move after
  • columns: Columns to move
-delete <columns>
Remove the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-cut <rows>
Drop rows
  • rows: One or more rows. -1 to the end. e.g., 0-3,5,10,-1
-include <rows>
Only include specified rows
  • rows: one or more rows, -1 to the end
-rows_first <columns> <pattern>
Move rows to the top that match the pattern
  • columns: columns to match on
  • pattern: Pattern
-rows_last <columns> <pattern>
Move rows to the end of list that match the pattern
  • columns: columns to match on
  • pattern: Pattern
-copy <column> <name>
Copy column
  • column:
  • name:
-add <names> <values>
Add new columns
  • names: Comma separated list of new column names
  • values: Comma separated list of new values
-insert <column> <name> <values>
Insert new column values
  • column: Column to insert before
  • name: Name of new column
  • values: Value to insert. Use ${row} to add the row index
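For example, a sketch assuming a hypothetical test.csv, to add a constant source column and insert a row index column before column1:
seesv -add source "my source" -insert column1 row_index '${row}' -p test.csv > newtest.csv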
-concat <columns> <delimiter> <name>
Create a new column from the given columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • name: Name of new column
-concatrows <num_rows>
Concatenate multiple rows into a single row
  • num_rows: Number of rows
For example, if you have this data:
name,value
joe,5
jill,2
jack,1
jim,7

Running:
seesv -concatrows 2
Would give:
name,value,name,value
joe,5,jill,2
jack,1,jim,7

-combine <columns> <delimiter> <column name>
Combine columns with the delimiter, deleting the source columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • column name: New column name
-combineinplace <columns> <delimiter> <column name>
Combine columns with the delimiter
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delimiter:
  • column name: New column name
-merge <columns> <name> <operator>
Apply operators to columns
  • columns: Columns to merge
  • name: New column(s) name
  • operator: Operator values:average,min,max,count
-split <column> <delimiter> <names>
Split the column
  • column:
  • delimiter: What to split on
  • names: Comma separated new column names
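For example, a sketch assuming a hypothetical test.csv with a full_name column holding values like "Jane Doe", to split it on the space:
seesv -split full_name " " "first_name,last_name" -p test.csv > newtest.csv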
-splat <keycol> <column> <delimiter> <name>
Create a new column from the values in the given column
  • keycol: Key column
  • column:
  • delimiter:
  • name: new column name
-roll <columns>
Roll columns down into rows
  • columns: Column indices. Can include ranges, e.g. 0-5
-shift <rows> <column> <count>
Shift columns over by count for given rows
  • rows: Rows to apply to
  • column: Column to start at
  • count:
-slice <columns> <dest> <fill>
Slide columns down and over to append new rows to the bottom
  • columns: Columns to move
  • dest: Dest column to move to
  • fill: Comma separated list of values to fill out the new row
-addcell <row> <column> <value>
Add a new cell at row/column
  • row:
  • column:
  • value:
-deletecell <row> <column>
Delete cell at row/column
  • row:
  • column:
-clone <count>
Clone each row N times
  • count: Number of clones
-appendrows <skip> <count> <delimiter>
Merge the leading rows into the header row
  • skip: How many rows to skip
  • count: How many rows to merge
  • delimiter: Delimiter used when merging the values
For example, with the below data:
category,category,category
fruit,vegetable,meat
1,2,3
4,5,6

Running:
seesv -appendrows 0 1 " - "
Would give:
category - fruit,category - vegetable,category - meat
1,2,3
4,5,6

-mergerows <rows> <delimiter> <close>
Merge rows
  • rows: 2 or more rows
  • delimiter:
  • close:
-rowop <keys> <values> <operator>
Apply an operator to columns and merge rows
  • keys: Key columns
  • values: Value columns
  • operator: Operator values:average,min,max,count
Rotate the data
Reverse the order of the rows except the header
-makefields <column> <value columns> <unique column> <other columns>
Make new columns from data values
  • column: Column to get the new column header from
  • value columns: Columns to get values from
  • unique column: The unique value, e.g. date
  • other columns: Other columns to include
-melt <columns> <header label> <value label>
Use values in header to make new row
  • columns: Column indices. Can include ranges, e.g. 0-5
  • header label:
  • value label:
-explode <column>
Make separate files based on value of column
  • column:
-join <key columns> <value_columns> <file> <source_columns> <default_value>
Join the 2 files together
  • key columns: key columns the file to join with
  • value_columns: value columns
  • file: File to join with
  • source_columns: source key columns
  • default_value: default value - can be a comma separated list of defaults
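For example, a sketch assuming a hypothetical lookup file states.csv with id and state_name columns and a source file data.csv with a state_id column, to add state_name to each row, defaulting to "unknown" when there is no match:
seesv -join id state_name states.csv state_id unknown -p data.csv > joined.csv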
-fuzzyjoin <threshold> <key columns> <value_columns> <file> <source_columns> <default_value>
Join the 2 files together using fuzzy matching logic
  • threshold: Score threshold 0-100. Default:85. Higher number better match
  • key columns: Numeric column numbers of the file to join with
  • value_columns: numeric columns of the values to join
  • file: File to join with
  • source_columns: source key columns
  • default_value: default value
-cross <file>
Make a cross product of 2 data files
  • file: File to cross with
With the following data:
letters.csv:
letter
a
b
c
d


numbers.csv:
number
1
2
3

The command:
seesv -cross numbers.csv -p letters.csv
Would give:
letter,number
a,1
a,2
a,3
b,1
b,2
b,3
c,1
c,2
c,3
d,1
d,2
d,3

-normal <columns>
Normalize the strings
  • columns: Columns
-countunique <columns>
Count number of unique values
  • columns: Column indices. Can include ranges, e.g. 0-5
-dissect <column> <pattern>
Make fields based on patterns
  • column:
  • pattern: e.g., "(field1:.*) (field2:.*) ..."
-keyvalue <column>
Make fields from key/value pairs, e.g. name1=value1 name2=value2 ...
  • column:
-firstchars <column> <name> <number>
Extract first N characters and create new column
  • column:
  • name: New column name
  • number: Number of characters
-lastchars <column> <name> <number>
Extract last N characters and create new column
  • column:
  • name: New column name
  • number: Number of characters
-between_indices <column> <name> <start> <end>
Extract characters between the 2 indices
  • column:
  • name: New column name
  • start: Start index
  • end: End index
-fromheading <columns> <names> <pattern>
Extract column values from headings
  • columns: Columns of headings
  • names: Comma separated list of new column names
  • pattern: Regexp to apply to header with () defining column values


Change
Commands:
-change <columns> <pattern> <substitution string>
Change columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • substitution string: use $1, $2, etc for pattern (...) matches
For example, with the below data:
label
xxxsome valueyyy

You can strip out the xxx...yyy and add a label: prefix with:
seesv -change label "xxx(.*)yyy" "label:$1"
Which would give:
label
label:some value

You can also specify a file that holds a number of patterns:
seesv -change columns file:file.txt ""
The file is of the form:
pattern1::replace_string1
pattern2::replace_string2  
e.g.:
(?i).*foo.*::foobar
.*value.*::new value

-changerow <rows> <columns> <pattern> <substitution string>
Change the values in the row/cols
  • rows: Row indices. Can include ranges, e.g. 0-5
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • substitution string: use $1, $2, etc for pattern (...) matches
-replace <columns> <substitution string>
Replace
  • columns: Column indices. Can include ranges, e.g. 0-5
  • substitution string: use $1, $2, etc for pattern (...) matches
    use {value} for value
-set <columns> <rows> <value>
Write the value into the cells
  • columns: Column indices. Can include ranges, e.g. 0-5
  • rows: Row indices. Can include ranges, e.g. 0-5
  • value:
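For example, a sketch assuming a hypothetical test.csv with a status column, to blank out that column in the first 5 data rows:
seesv -set status 0-4 "" -p test.csv > newtest.csv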
-cleanwhitespace <columns>
Clean whitespace
  • columns: Column indices. Can include ranges, e.g. 0-5
-macro <pattern> <template> <column label>
Look for the pattern in the header and apply the template to make a new column, template: '{1} {2} ...', use 'none' for column name for no header
  • pattern: Regexp pattern. Help
  • template:
  • column label:
-setcol <column> <pattern> <write column> <value>
Write the value into the write col for rows that match the pattern
  • column: match col #
  • pattern: Regexp pattern. Help
  • write column:
  • value:
-copyif <columns> <pattern> <column1> <column2>
Copy column1 to column2 if all of the columns match the pattern
  • columns: Column indices. Can include ranges, e.g. 0-5
  • pattern: Regexp pattern. Help
  • column1:
  • column2:
-copycolumns <columns1> <columns2>
Copy columns 1 to columns 2
  • columns1:
  • columns2:
-filldown <columns>
Fill down with last non-null value
  • columns: Column indices. Can include ranges, e.g. 0-5
-fillacross <columns> <rows>
Fill across with last non-null value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • rows: One or more rows. -1 to the end. e.g., 0-3,5,10,-1
-unfill <columns>
Set following cells to blank if the same as previous cell
  • columns: Column indices. Can include ranges, e.g. 0-5
For example, given the following data
name,value1,value2
jane,foo,1
jane,bar,2
john,doe,3
bill,doe,4
bill,doe,4

Calling:
seesv -unfill name,value1
Will give:
name,value1,value2
jane,foo,1
<blank>,bar,2
john,doe,3
bill,<blank>,4
<blank>,<blank>,4

-priorprefix <column> <pattern> <delimiter>
Append prefix from the previous element to rows that match pattern
  • column:
  • pattern: Regexp pattern. Help
  • delimiter:
-case <column> <type>
Change case of column - type:lower,upper,proper,capitalize
  • column:
  • type: values:lower,upper,proper,capitalize
-toid <columns>
Convert the column(s) into IDs (lowercase, no space, a-z0-9_)
  • columns: Column indices. Can include ranges, e.g. 0-5
-padleft <columns> <character> <length>
Pad left with given character
  • columns: Column indices. Can include ranges, e.g. 0-5
  • character: Character to pad with
  • length: Length
-padright <columns> <character> <length>
Pad right with given character
  • columns: Column indices. Can include ranges, e.g. 0-5
  • character: Character to pad with
  • length: Length
-trim <columns>
Trim leading and trailing white space
  • columns: Column indices. Can include ranges, e.g. 0-5
-trimquotes <columns>
Trim leading and trailing quotes
  • columns: Column indices. Can include ranges, e.g. 0-5
-width <columns> <size>
Limit the string size of the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • size:
-prepend <text>
Add the text to the beginning of the file. use _nl_ to insert newlines
  • text:
-pad <count> <pad string>
Add or remove columns to achieve the count
  • count:
  • pad string:
-prefix <column> <prefix>
Add prefix to column
  • column:
  • prefix: String to use
-suffix <column> <suffix>
Add suffix to column
  • column:
  • suffix:
-subst <column_name> <template>
Create a new column with the template
  • column_name: New Column Name
  • template: Template - use ${column_name} ...
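For example, a sketch assuming a hypothetical test.csv with city and state columns, to create a label column combining them:
seesv -subst label '${city}, ${state}' -p test.csv > newtest.csv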
-ascii <columns> <substitution string>
Convert non ascii characters
  • columns: Column indices. Can include ranges, e.g. 0-5
  • substitution string: use $1, $2, etc for pattern (...) matches
-cleanphone <columns>
Clean the phone number
  • columns: Column indices. Can include ranges, e.g. 0-5
-formatphone <columns>
Format the phone number
  • columns: Column indices. Can include ranges, e.g. 0-5
-ismobile <columns>
Add a true/false if the string is a mobile phone
  • columns: Column indices. Can include ranges, e.g. 0-5
The -ismobile command uses one of two external APIs - https://numverify.com/ or https://numlookupapi.com. After making an account with either of these services set the corresponding environment variables:
export NUMLOOKUPAPI_API_KEY=
or
export NUMVERIFY_API_KEY=
This command is only available when running from the command line or, when running within RAMADDA, to logged-in users. We do this because the SeeSV service within RAMADDA can be run by any user.

A cache file, ismobile.txt, of past phone numbers will be created in the local working directory so this command can be run multiple times without resorting to the API calls for past phone numbers.

-sms <column> <campaign> <message>
Send a text message - only for command line
  • column: Phone number
  • campaign: Campaign
  • message: Message template
The -sms command uses https://www.twilio.com/ to send text messages. This command is only available when running from the command line or, when running within RAMADDA, to logged-in users. We do this because the SeeSV service within RAMADDA can be run by any user.

To enable this command create an account at Twilio and access your API keys. Add the following environment variables:

export TWILIO_ACCOUNT_SID=
export TWILIO_AUTH_TOKEN=
export TWILIO_PHONE=
This command takes a column ID which holds the phone number, a campaign identifier, and a message template. The campaign identifier is used to track the list of phone numbers that a message has already been sent to. For example, if you first run:
seesv -sms phone some_campaign "Hi there"
The SeeSV package will write out to a file <campaign>.sent.txt the list of phone numbers that the message was sent to. If the command is interrupted and then rerun with the same campaign ID, numbers that have already been sent the message will not be sent it again.

-js <javascript>
Define Javascript (e.g., functions) to use later in the -func call
  • javascript:
-func <names> <javascript>
Apply the javascript function. Use _colname or _col#
  • names: New column names
  • javascript: javascript expression
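For example, a sketch assuming a hypothetical test.csv with units and price columns, to compute a new total column with a Javascript expression:
seesv -func total "_units * _price" -p test.csv > newtest.csv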
-endswith <column> <string>
Ensure that each column ends with the string
  • column:
  • string:
-truncate <columns> <max length> <suffix>
  • columns: Column indices. Can include ranges, e.g. 0-5
  • max length:
  • suffix:
-extract <column> <pattern> <replace with> <name>
Extract text from column and make a new column
  • column:
  • pattern: Regexp pattern. Help
  • replace with: use 'none' for no replacement
  • name: new column name
-urlarg <column> <argname>
Extract URL argument and make a new column
  • column:
  • argname: URL arg name
-extracthtml <column> <names> <pattern>
Extract text from HTML
  • column: URL Column
  • names: Comma separated list of new column names
  • pattern: Regexp pattern. Help
-htmlinfo <column>
Extract icon and description from input URL
  • column: URL Column
-checkmissing <column> <replace_with>
Check for missing URL
  • column: URL Column
  • replace_with: Replace with
-xmlencode <columns>
Encode the value for XML
  • columns: Column indices. Can include ranges, e.g. 0-5
-urlencode <columns>
URL encode the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-urldecode <columns>
URL decode the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
-map <column> <new column name> <value>
Change values in column to new values
  • column:
  • new column name:
  • value: old_value new_value old_value new_value
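For example, a sketch assuming a hypothetical test.csv with a status column holding the codes 0 or 1, to map those codes to labels in a new status_label column:
seesv -map status status_label "0 inactive 1 active" -p test.csv > newtest.csv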
-format <columns> <format>
Apply decimal format to the columns (see https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html)
  • columns: Column indices. Can include ranges, e.g. 0-5
  • format: Decimal format e.g. '##0.00'
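For example, a sketch assuming a hypothetical test.csv with a price column, to format the values with two decimal places:
seesv -format price "##0.00" -p test.csv > newtest.csv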
-denormalize <file> <from id idx> <from value idx> <to idx> <new col name> <mode replace add>
Read the id,value from file and substitute the value in the dest file col idx
  • file: From csv file
  • from id idx:
  • from value idx:
  • to idx:
  • new col name:
  • mode replace add:
-break <label1> <label2> <columns>
Break apart column values and make new rows
  • label1:
  • label2:
  • columns: Column indices. Can include ranges, e.g. 0-5
-parseemail <columns>
Parse out name and email
  • columns: Column indices. Can include ranges, e.g. 0-5
-makeids
Turn the header row into IDs (lowercase, no space, a-z0-9_)
For example, if you had the below data:
Some header label with some punctuation-+/,Some other header label
...

Running:
seesv -makeids
Gives:
some_header_label_with_some_punctuation,some_other_header_label
...

-faker <what> <columns>
Fake up data. See the docs at https://ramadda.org/repository/userguide/seesv.html#-faker
  • what: firstname|lastname|fullname|etc
  • columns: Columns to change. If none given then add the fake value
The -faker command uses the Java Faker package and takes any of the below commands:
firstname
fullname
lastname
name
namewithmiddle
prefix
suffix
title
username
address
city
country
state
stateabbr
streetname
timezone
zipcode
latitude
longitude
countrycode
boolean
asin
ean13
ean8
gtin13
gtin8
imei
isbn10
isbn13
isbngroup
isbngs1
isbnregistrant
color
department
material
price
productname
promotioncode
demonym
educationalattainment
maritalstatus
race
sex
bic
creditcard
iban
ssn
digit
digits:number_of_digits
numberbetween:first:last
randomdigit
randomdigitnotzero
randomDouble:maxNumberOfDecimals:min:max
randomnumber
cellphone
phonenumber
diseasename
hospitalname
medicinename
symptoms

-edit <column>
Hand edit a column (command line only). ESC-stop, BLANK-skip
  • column: key column
The edit command is used from the command line and will prompt the user for a new column value. If the value has already been seen then the prior new value is used. Enter <blank> to skip the given row. Enter ESCAPE to stop prompting for new values.



Values
-md <columns> <type>
Make a message digest of the column values
  • columns: Column indices. Can include ranges, e.g. 0-5
  • type: values:MD5,SHA-1,SHA-256,SHA-512,SHA3-256,SHA3-512
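For example, a sketch assuming a hypothetical test.csv with an email column, to make an MD5 digest of each value:
seesv -md email MD5 -p test.csv > newtest.csv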
-tob64 <columns>
Base 64 Encode
  • columns: Column indices. Can include ranges, e.g. 0-5
-fromb64 <columns>
Base 64 Decode
  • columns: Column indices. Can include ranges, e.g. 0-5
-rot13 <columns>
Rot 13
  • columns: Column indices. Can include ranges, e.g. 0-5
-encrypt <columns> <password>
Encrypt using AES with SHA-256 key
  • columns: Column indices. Can include ranges, e.g. 0-5
  • password:
-decrypt <columns> <password>
Decrypt using AES with SHA-256 key
  • columns: Column indices. Can include ranges, e.g. 0-5
  • password:
Add a UUID field
Add 1,2,3... as column
Add 'A','B', ... as column
-soundex <columns>
Generate a soundex code
  • columns: Column indices. Can include ranges, e.g. 0-5
-wikidesc <columns> <suffix>
Add a description from Wikipedia
  • columns: Search string columns
  • suffix: Text to add after
This uses the API at https://en.wikipedia.org/w/api.php to get the description of a wikipedia page that matches the string provided by the given columns.

-image <columns> <suffix>
Do a Bing image Search for an image
  • columns: Column indices. Can include ranges, e.g. 0-5
  • suffix: Text to add after
-embed <url column>
Download the URL and embed the image contents
  • url column:
-fetch <name> <ignore_errors> <url>
Fetch the URL and embed the contents
  • name: Name of new column
  • ignore_errors: Ignore Errors e.g., true or false
  • url: URL template, e.g., https://foo.com/${column_name}
-imagefill <querycolumn> <suffix> <imagecolumn>
Search for an image with the query column text if the given image column is blank. Add the given suffix to the search.
  • querycolumn:
  • suffix:
  • imagecolumn:
-download <column> <suffix>
Download the URL
  • column: Column that holds the URL
  • suffix: File suffix
-gender <columns>
Figure out the gender of the name in the column
  • columns: Column indices. Can include ranges, e.g. 0-5


Dates
-indateformats <format> <timezone>
Specify one or more date formats for parsing
  • format: e.g. yyyy-MM-dd HH:mm:ss. Use semi-colon separated formats for multiples. Help
  • timezone:
-outdateformat <format> <timezone>
Specify date format for formatting
  • format: e.g. yyyy-MM-dd HH:mm:ss. Help
  • timezone:
-convertdate <columns>
Convert date
  • columns: Columns to convert
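For example, a sketch assuming a hypothetical test.csv with a date column holding values like 03/25/2021, to convert them to yyyy-MM-dd form:
seesv -indateformats "MM/dd/yyyy" "" -outdateformat "yyyy-MM-dd" "" -convertdate date -p test.csv > newtest.csv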
-adddate <date_column> <value> <value_type>
Add date
  • date_column: Date Column
  • value: Value Column
  • value_type: Value type - millisecond,second,minute,hour,hour_of_day,week,month,year
-cleardate <date_column> <component>
Clear date components
  • date_column: Date Column
  • component: Date component values:millisecond,second,minute,hour_of_day,day_of_month,month
-extractdate <date column> <what>
Extract date
  • date column:
  • what: What to extract, e.g., year, month, day_of_week, etc values:era,year,month,day_of_month,day_of_week,week_of_month, day_of_week_in_month,am_pm,hour,hour_of_day, minute,second,millisecond,days_in_year, hours_in_year, minutes_in_year,seconds_in_year
-formatdate <columns>
Format date
  • columns: Column indices. Can include ranges, e.g. 0-5
-formatdateoffset <column> <what>
Format the date offset, e.g. the hours in year
  • column:
  • what: What type of offset, e.g., year, month, day_of_week, etc values:days_in_year, hours_in_year, minutes_in_year,seconds_in_year
-elapsed <column>
Calculate elapsed time (ms) between rows
  • column:
-msto <column> <to>
Convert milliseconds to
  • column:
  • to: seconds|hours|days|weeks|months|years
-latest <columns> <column> <format>
Pass through rows whose date is the latest in the group of rows defined by the key column
  • columns: Key columns
  • column: Date column
  • format: Date Format, e.g. yyyy-MM-dd
-datediff <column1> <column2> <unit>
Calculate elapsed time between columns column1-column2
  • column1: Column 1
  • column2: Column 2
  • unit: Unit-milliseconds,seconds,minutes,hours,days values:milliseconds,seconds,minutes,hours,days
-datecompare <column1> <column2> <operator>
add a true/false column comparing the date values
  • column1:
  • column2:
  • operator: <,<=,=,!=,>=,> values:<,<=,=,!=,>=,>


Numeric
Commands:
-scale <columns> <delta1> <scale> <delta2>
Set value={value+delta1}*scale+delta2
  • columns: Column indices. Can include ranges, e.g. 0-5
  • delta1:
  • scale:
  • delta2:
The -scale command applies offsets and a scale factor:
seesv -scale <column id> <offset1> <scale> <offset2>
Where the result value, v2, is:
  v2 = (value + offset1) * scale + offset2
You can specify random values for any of the values with:
#Generate a random number between 0-1
random:

#Generate a random number between 10 and 11
random:10

#Generate a random number between 10 and 30
random:10:30

For example, if you had a latitude value that you wanted to obfuscate by adding some random value between 1 and 2 you would do:
  -scale latitude random:1:2 1 0

-makenumber <columns>
Try to parse as number
  • columns: Column indices. Can include ranges, e.g. 0-5
-generate <label> <start> <step>
Add row values
  • label:
  • start:
  • step:
-decimals <columns> <num_decimals>
Round decimals
  • columns: Column indices. Can include ranges, e.g. 0-5
  • num_decimals: how many decimals to round to
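For example, a sketch assuming a hypothetical test.csv with a temperature column, to round the values to 2 decimals:
seesv -decimals temperature 2 -p test.csv > newtest.csv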
-fuzz <columns> <num_places> <num_random_digits>
Fuzz the number. If num_places is less than zero then that is the # of decimals; else it is the number of lower digits to fuzz out
  • columns: Column indices. Can include ranges, e.g. 0-5
  • num_places: how many places to round to. use <=0 for decimals
  • num_random_digits: how many random digits
The -fuzz command works to anonymize numeric values.

The num_random_digits is used to generate a number with those random digits.

If the num_places argument is <= zero then that is the number of decimal places the number is rounded to. If there are fewer than num_places decimals then those decimals are converted to "0". The randomized digits are then appended to the number. For example, with R = random digit:
seesv -fuzz <columns> -2 3
Gives:

12345.6789 -> 12345.67RR
145.678 -> 145.67RR
9.0 -> 9.RR

If num_places > 0 then the number is converted to an integer and the num_places least significant digits are converted to 0. If the number of digits is < num_places then the number is converted to 0. The randomized digits are then added to the number. For example, with R = random digit:
seesv -fuzz <columns> 2 3
Gives:

12345.678 -> 123RR
145.678 -> 1RR
1.678 -> RR

-ceil <columns> <value>
Set the max value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • value: Value
-floor <columns> <value>
Set the min value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • value: Value
-delta <key columns> <columns>
Add column that is the delta from the previous step
  • key columns:
  • columns: Column indices. Can include ranges, e.g. 0-5
-runningsum <columns>
Make a running sum of the column values
  • columns: Column indices. Can include ranges, e.g. 0-5
-trendcounter <column> <name>
Make counter field that is incremented every time the value column decreases
  • column: The value column
  • name: Name of counter column
-operator <columns> <new col name> <operator>
Apply the operator to the given columns and create new one
  • columns: Columns
  • new col name:
  • operator: Operator:+,-,*,/,%,average values:+,-,*,/,%,average
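For example, a sketch assuming a hypothetical test.csv with price and tax columns, to add them together into a new total column:
seesv -operator price,tax total + -p test.csv > newtest.csv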
-compare <column1> <column2> <operator>
Add a true/false column comparing the values
  • column1:
  • column2:
  • operator: <,<=,=,!=,>=,> values:<,<=,=,!=,>=,>
-round <columns>
Round the values
  • columns: Column indices. Can include ranges, e.g. 0-5
-abs <columns>
Make absolute values
  • columns: Column indices. Can include ranges, e.g. 0-5
-clip <columns> <min> <max>
Clip the number to within the range
  • columns: Column indices. Can include ranges, e.g. 0-5
  • min:
  • max:
-rand <column name> <minrange> <maxrange>
make random value
  • column name:
  • minrange: Minimum range (e.g. 0)
  • maxrange: Maximum range (e.g. 1)
-even <columns>
Add true if the column starts with an even number
  • columns: Column indices. Can include ranges, e.g. 0-5
-sum <key columns> <value columns> <carry over columns>
Sum values keying on key column value. If no value columns specified then do a count
  • key columns:
  • value columns:
  • carry over columns:
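For example, a sketch assuming a hypothetical sales.csv with region and units columns, to sum the units for each region:
seesv -sum region units "" -p sales.csv > sums.csv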
-pivot <key columns> <column columns> <value column> <operator>
Make a pivot table
  • key columns: Columns to key on
  • column columns: The columns the values of which are used to make the new columns in the result
  • value column: The value column
  • operator: The operator to apply - count,sum,average,min,max
For example, with the input file:
Region,Date,Units,Sales
West,2016,1,11.00
South,2016,8,96.00
West,2016,2,26.00
North,2016,7,84.00
North,2016,8,104.00
South,2016,2,22.00

Running:
seesv -pivot region date sales count -p input.csv
Gives:
Region,2016,2017,2018,2019
West,15.0,51.0,49.0,42.0
South,25.0,23.0,31.0,14.0
North,15.0,27.0,56.0,39.0
East,10.0,24.0,18.0,13.0

You can also specify multiple operators:
seesv -pivot region date sales count,min,max -p input.csv
Gives:
Region,2016 - count,2016 - min,2016 - max,2017 - count,2017 - min,2017 - max,2018 - count,2018 - min,2018 - max,2019 - count,2019 - min,2019 - max
West,15.0,11.0,96.0,51.0,11.0,104.0,49.0,11.0,112.0,42.0,13.0,120.0
South,25.0,12.0,96.0,23.0,11.0,104.0,31.0,11.0,120.0,14.0,12.0,112.0
North,15.0,12.0,104.0,27.0,12.0,96.0,56.0,11.0,120.0,39.0,13.0,120.0
East,10.0,24.0,96.0,24.0,11.0,104.0,18.0,11.0,112.0,13.0,12.0,105.0

-summary <key columns> <value columns> <carry over columns> <ops>
count/sum/average/min/max values keying on key column value. If no value columns specified then do a count
  • key columns: Columns to key on
  • value columns: Columns to apply operators on
  • carry over columns: Extra columns to include
  • ops: any of count,sum,average,min,max
For example, with the below data, if you wanted to summarize the total units by region:
Region,Color,Units,Sales
West,Red,1,11.00
South,Blue,8,96.00
West,Green,2,26.00
North,Blue,7,84.00
North,Green,8,104.00
South,Red,2,22.00
East,Blue,5,60.00
West,Green,2,26.00

You would do:
seesv -summary region units "" sum -p summary.csv
Which gives:
Region,Units sum
West,683.0
South,399.0
North,672.0
East,294.0

If you wanted to get the sum and average of the sales:
seesv -summary region sales "" sum,avg -p summary.csv
You'd get:
Region,Sales sum,Sales avg
West,8873.0,56.51592356687898
South,5044.0,54.236559139784944
North,8709.0,63.56934306569343
East,3730.0,57.38461538461539

The carry over column just gets the first value for the key column. So, if you wanted to include the color column do:
seesv -summary region sales color sum,avg -p summary.csv
Which gives:
Region,Sales sum,Sales avg,Color
West,8873.0,56.51592356687898,Red
South,5044.0,54.236559139784944,Blue
North,8709.0,63.56934306569343,Blue
East,3730.0,57.38461538461539,Blue

-histogram <column> <bins> <value columns> <ops>
Make a histogram with the given column and bins
  • column: The column
  • bins: Comma separated set of bin values
  • value columns: Extra columns to sum up
  • ops: ops to apply to extra columns - any of count,sum,average,min,max
The bins value is a comma separated list of the numeric ranges that define how to bin the data. e.g., for a set of age ranges one can specify:
bins="18,30,40,50,60,70"
This results in bins of:
0-18
18-30
30-40
40-50
etc.
You can also specify a range with an optional step value. For example this:
bins="18-30" 
Will give you the bins:
18-19
19-20
20-21
...
29-30
30-infinity
Or you can specify a step:
bins="18-30:2" 
Will give you the bins:
18-20
20-22
22-24
...

-percent <columns>
Add columns together. Replace with their percentage
  • columns: Columns to add
-increase <columns> <how far back>
Calculate percent increase
  • columns: Column indices. Can include ranges, e.g. 0-5
  • how far back:
-diff <columns> <how far back (default 1)>
Difference from previous value
  • columns: Column indices. Can include ranges, e.g. 0-5
  • how far back (default 1):
-average <columns> <period> <label>
Calculate a moving average
  • columns: Columns
  • period:
  • label:
-ranges <columns> <name> <start> <size>
Create a new column with the (string) ranges where the value falls in
  • columns: Column indices. Can include ranges, e.g. 0-5
  • name: New column name
  • start: Numeric start of range
  • size: Numeric size of range
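For example, a sketch assuming a hypothetical people.csv with an age column, to label each row with the 10-year range its age falls in:
seesv -ranges age age_range 0 10 -p people.csv > newtest.csv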
-bytes <unit> <columns>
Convert suffixed values (e.g., 2 MB) into the number
  • unit: values:binary,metric
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_and <name> <columns>
And values
  • name: New column name
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_or <name> <columns>
Or values
  • name: New column name
  • columns: Column indices. Can include ranges, e.g. 0-5
-column_not <name> <column>
Not value
  • name: New column name
  • column:
-check <columns> <what>
Check that the values are numbers
  • columns: Column indices. Can include ranges, e.g. 0-5
  • what: How strict values:strict,ramadda


Geospatial
-geocode <columns> <prefix> <suffix>
Geocode using given columns
  • columns: Address columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
You can use Google or geocod.io for geocoding by setting the following environment variables. Otherwise this defaults to using the US Census geocoding API.

To use Google geocoding set the environment variable:

GOOGLE_API_KEY=...
To use geocod.io set the environment variable:
GEOCIDEIO_API_KEY=...
For here.com set:
HERE_API_KEY=
To geocode, e.g., with the following file:
city,state
Boulder,CO
Laramie,WY
Buffalo,NY

Run: seesv -geocode city,state "" "" ... This concatenates the city and state fields, e.g., "Boulder CO", and calls the API to geolocate.

To use internal location tables set the prefix argument to one of: seesv -geocode columns "city:" "" ...
seesv -geocode columns "county:" "" ...
seesv -geocode columns "state:" "" ...
seesv -geocode columns "zip:" "" ...
seesv -geocode columns "country:" "" ...
The suffix gets appended to the query string. For example, if you had a file with just cities in a state, e.g.:

city
Boulder
Denver
Grand Junction
Pueblo

You can geocode with: seesv -geocode city "" "CO"

-geocodeifneeded <columns> <prefix> <suffix> <latitude> <longitude>
Geocode if needed
  • columns: Address columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
  • latitude: latitude column
  • longitude: longitude column
-geocodeaddressdb <columns> <prefix> <suffix>
Geocode for import into RAMADDA's DB. The lat/lon is one semi-colon delimited column
  • columns: columns
  • prefix: optional prefix e.g., state: or county: or country:
  • suffix:
-geocodejoin <column> <csv file> <key idx> <lat idx> <lon idx>
Geocode with file
  • column: key column
  • csv file: File to get lat/lon from
  • key idx:
  • lat idx:
  • lon idx:
This assumes you have a separate file that contains a key value and latitude/longitude and the source file has a key value. For example, you might have a list of precincts, e.g.:
precinct,latitude,longitude
4171107648,40.197055,-105.100351
2171207308,39.996986,-105.084938
2173307410,40.039912,-105.081393

The source file is a list of voters, e.g.:
name,precinct
John Doe,4171107648
Jane Doe,4171107648
Jim Smith,2171207308

To add the lat/lon: seesv -geocodejoin precinct precincts.csv precinct latitude longitude

-bounds <north> <west> <south> <east>
Geocode within bounds
  • north:
  • west:
  • south:
  • east:
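For example, a sketch assuming test.csv has latitude/longitude columns and using a box roughly covering Colorado; presumably only rows inside the bounds pass through:
seesv -bounds 41 -109 37 -102 -p test.csv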
-decodelatlon <columns>
Decode latlon
  • columns: Lat or Lon column
This converts a column that is of the form degrees minutes seconds into its decimal degrees equivalent. For example, with this data:
latitude,longitude
40° 1' 12 N,107° 40' 10 W
40:1:12,107:40:10
40:1:12 N,107:40:10 W
40:1,107:40

Running the commands: seesv -decodelatlon latitude -decodelatlon longitude gives:
latitude,longitude
40.0,-107.0
40.019999999999996,107.66944444444445
40.019999999999996,-107.66944444444445
40.016666666666666,107.66666666666667

If you wanted to have less precision then call: seesv -decodelatlon latitude -decodelatlon longitude -decimals latitude,longitude 5 -p test.csv which gives:
latitude,longitude
40.0,-107.0
40.02,107.66944
40.02,-107.66944
40.01667,107.66667

-getaddress <latitude> <longitude>
Get address from lat/lon
  • latitude: latitude column
  • longitude: longitude column
This function takes a latitude and longitude column and does an address lookup using either the geocod.io or here.com APIs. See above for the API keys that need to be set.

-statename <state_column>
Add state name from state ID
  • state_column: State ID column
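For example, a sketch assuming test.csv has a state column holding IDs like "CO"; the full name, e.g., "Colorado", would be added:
seesv -statename state -p test.csv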
-geoname <lookup> <fields> <latitude> <longitude>
Look up location name
  • lookup: ('counties' or 'states' or 'countries' or 'timezones')
  • fields: fields in shapefile
  • latitude: Latitude column
  • longitude: Longitude column
This uses one of the shapefiles provided by RAMADDA and finds the name of the feature that contains the given lat/lon. For example, if you had the following data file:
name,latitude,longitude
City 1,40.197055,-105.100351
City 2,39.996986,-105.084938
City 3,40.039912,-105.081393

You can add the country and timezone with: seesv -geoname countries latitude longitude -geoname timezones latitude longitude

-geocontains <lookup> <name> <latitude> <longitude>
Check for containment
  • lookup: ('counties' or 'states' or 'countries' or 'timezones')
  • name: new column name
  • latitude: Latitude column
  • longitude: Longitude column
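For example, a hypothetical call (in_state is an assumed name for the new column, which presumably records the containment result):
seesv -geocontains states in_state latitude longitude -p test.csv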
-elevation <latitude> <longitude>
Look up elevation (using 1/3 arc-second DEM)
  • latitude: Latitude column
  • longitude: Longitude column
This uses the web service at https://nationalmap.gov/epqs/pqs.php to look up the elevation for the given lat/lon. For example, if you had the following data file:
name,latitude,longitude
City 1,40.197055,-105.100351
City 2,39.996986,-105.084938
City 3,40.039912,-105.081393

You can add the elevation with: seesv -elevation latitude longitude

-mercator <columns>
Convert x/y to lon/lat
  • columns: x and y columns
-region <columns>
Add the state's region
  • columns: Columns with state name or abbrev.
This does a lookup from an internal table that maps state name (or abbreviation) to region:
Alaska=West
AK=West
Alabama=South
AL=South
Arkansas=South
...
So with the following data:
name
Wyoming
California
Colorado
The region is added with: seesv -region name

-population <columns> <prefix> <suffix>
Add in population from address
  • columns: Column indices. Can include ranges, e.g. 0-5
  • prefix: e.g., state: or county: or city:
  • suffix:
-neighborhood <latitude> <longitude> <default>
Look up neighborhood for a given location
  • latitude: Latitude column
  • longitude: Longitude column
  • default: Default value
This command uses the API from https://www.precisely.com/ to look up the neighborhood name for a given lat/lon. To use this, get an API key from Precisely and set the environment variable:
PRECISELY_API_KEY=
Call: seesv -neighborhood latitude_column longitude_column



Misc
-apply <columns> <commands>
Apply the commands to each of the columns
  • columns: Columns to expand with
  • commands: Commands. Use the macro ${column}. End with -endapply
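For example, a sketch that expands a command once per column, assuming the commands between -apply and -endapply are run with ${column} replaced by each column id in turn (the -change invocation here is hypothetical):
seesv -apply column1,column2 -change "${column}" "N/A" "" -endapply -p test.csv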
-sortby <columns> <direction> <how>
Sort the rows
  • columns: Column to sort on
  • direction: Direction - up or down values:up,down
  • how: How to sort - string, length, date, extract (number) values:string,number,length,date,extract
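For example, a sketch that sorts the rows numerically by a sales column, descending:
seesv -sortby sales down number -p test.csv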
Show count
-alias <name> <alias>
Set a field alias
  • name: Name
  • alias: Alias
The alias command can be used to make your commands easier to deal with if you have long field names. For example, say you have data like:
some_really_long_field_name
1
2

You can use the alias command to provide a short name, e.g.: seesv -alias some_really_long_field_name name -columns name ...

-value <name> <value>
Define a macro value for later use
  • name: name
  • value: Value
-filepattern <name> <pattern>
Extract a macro value from a filename
  • name: Macro name
  • pattern: Regexp pattern
-changeline <from> <to>
Change the line
  • from: From pattern
  • to: To string
-changeraw <from> <to>
Change input text
  • from: From pattern
  • to: To string
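For example, a hypothetical cleanup that rewrites the raw input text before it is parsed:
seesv -changeraw "NULL" "" -p test.csv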
-crop <columns> <patterns>
Crop last part of string after any of the patterns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • patterns: Comma separated list of patterns
Be strict on columns. Any rows that are not the size of the other rows are dropped
Be strict on columns. Any rows that are not the size of the other rows are shown
-prop <property> <value>
Set a property
  • property: values:position
  • value: start, end, etc
Go easy on missing columns
Verify that all of the rows have the same # of columns
-ext <program_id>
Execute the external program
  • program_id: matches with seesv.ext.<program_id>=/path
The -ext command allows one to integrate external programs into the processing flow. seesv -ext <id> <arguments> The id is used to look up an environment variable (or RAMADDA runtime property) of the form:
seesv_ext_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. The executable is called with the given arguments. Each row that is processed is converted into a CSV string which is passed to the external command on its stdin. A CSV text string is then read back from the external command and passed on in the processing flow. If you want to skip a line then return the string "_null_".
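For example, a minimal sketch, assuming a hypothetical filter script; the trailing "-" ends the arguments passed to the executable:
export seesv_ext_myfilter=/path/to/myfilter.sh
seesv -ext myfilter arg1 arg2 - -p test.csv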

-exec <program_id>
Execute the external program for every line
  • program_id: matches with seesv.ext.<program_id>=/path
The -exec command allows one to integrate external programs into the processing flow. seesv -exec <id> <arguments> The id is used to look up an environment variable (or RAMADDA runtime property) of the form:
seesv_exec_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. For every row the executable is called with the given arguments. Each of the arguments can contain any number of macros of the form:
seesv  -exec <id> ${url}  ${name}
The macros get replaced with the corresponding values in the given row.

So, for example, if you have a list of URLs that you want to fetch using cURL, set the environment variable:

export seesv_exec_curl=/usr/bin/curl    
Then assuming you have the file:
url
url1.html
url2.html
Call the command
seesv  -exec curl ${url}   - -p
This produces a new file:
url,result
url1.html,contents of url1
url2.html,contents of url2



Output
-p
Print text output
-printdelim <delimiter>
Print with delimited output
  • delimiter: Delimiter - ,|^ etc. Use "tab" for tab
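For example, a sketch that writes tab-delimited output:
seesv -printdelim tab test.csv > output.tsv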
-o <file>
Write to the given file (command line only)
  • file: The file
-comment <comment>
Add a comment to the output
  • comment: The comment
-outputprefix <text>
Specify text to add to the beginning of the file
  • text: The text. Use '_nl_' to add a new line. Use '_bom_' to write out the byte order mark.
-highlight <columns> <color>
Highlight the columns
  • columns: Column indices. Can include ranges, e.g. 0-5
  • color: Color values:red,green,yellow,blue,purple,cyan
-background <color>
Set the background color of the columns
  • color: Color values:red,green,yellow,blue,purple,cyan
-printheader
Print header
Print the file raw
Print HTML table and stats
-cols <width>
Set the width of the columns for output. Use with -p
  • width: Column width

Print summary stats
-record
Print records
Generate script
-toxml <outer tag> <inner tag>
Generate XML
  • outer tag:
  • inner tag:
-tojson <key index>
Generate JSON
  • key index: If defined use this as a map
-togeojson <latitude> <longitude> <columns>
Generate GeoJSON
  • latitude: latitude column
  • longitude: longitude column
  • columns: property columns - use * for all
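For example, a sketch assuming latitude/longitude columns and keeping all columns as properties:
seesv -togeojson latitude longitude "*" test.csv > test.geojson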
Generate DB publish urls
-todb <db id> <table> <columns> <properties>
Write to Database
  • db id:
  • table: table name
  • columns: database columns
  • properties: name value properties
-template <prefix> <template> <row_delimiter> <suffix>
Apply the template to make the output
  • prefix:
  • template: Use ${column_name} or indices: ${0},${1}, etc for values
  • row_delimiter: Output between rows
  • suffix:
For example, with this data:
name,value
joe,5
jill,2
bill,7

Calling: seesv -template "PREFIX\n" "name:\${name} value:\${value}" "\nROW DELIMITER\n" "\nSUFFIX\n" test.csv results in:
PREFIX
name:joe value:5
ROW DELIMITER
name:jill value:2
ROW DELIMITER
name:bill value:7
SUFFIX
Note: the row delimiter is only printed between rows.

If you wanted to generate custom XML, for example, call: seesv -template "<mytags>\n" "<tag name=\"\${name}\" value=\"\${value}\"></tag>" "\n" "\n</mytags>\n" test.csv which results in:

<mytags>
<tag name="joe" value="5"></tag>
<tag name="jill" value="2"></tag>
<tag name="bill" value="7"></tag>
</mytags>

-subd <columns> <ranges> <output_template>
Subdivide into different files
  • columns: columns to subdivide on
  • ranges: Comma separated ranges min1;max1;step1,min2;max2;step2
  • output_template: Output template - use ${ikey} or ${vkey}, e.g., grid${ikey}.csv
The -subd command takes the input and, based on the values of the specified columns, subdivides the input rows into one or more output files.

The format is seesv -subd col1,col2,...,colN "min1;max1;number1,min2;max2;number2,...,minN;maxN;numberN" "output_file_macro${ikey}.csv"

Each min;max;number triple gives the min/max range for that column and the number of steps the range is divided into.

The output file is a template to use to create the destination output file. You can use "${ikey}" for the indices or ${vkey} for the values.

For example, you can use this to do geographic tiling of a set of input data that contains latitude and longitude. If your input.csv file has latitude and longitude columns the below command: seesv -subd latitude,longitude "-90;90;180,-180;180;360" "grid\${ikey}.csv" input.csv says to subdivide the latitude values between -90 and 90 with 180 steps - i.e. 1 degree spacing. The longitude is divided up between -180 and 180 with 360 steps - again, 1 degree spacing. The index of a latitude value is the index between -90 and 90. Likewise the index of the longitude value is between -180 and 180. The ${ikey} for the filename is latitudeindex_longitudeindex. So the above command produces:

grid116_102.csv
grid116_103.csv
grid119_84.csv
grid119_85.csv
grid119_86.csv
...
The grid${ikey}.csv is the output file name macro where ${ikey} is made up of the indices. Using grid${vkey}.csv would give the following file names based on the values of the ranges:
grid30_31_-92_-91.csv
grid39_40_-76_-75.csv
grid38_39_-76_-75.csv
...

-maptiles <columns> <degrees> <output_template>
Tile the data on lat/lon
  • columns: lat/lon columns to subdivide on
  • degrees: Degrees per tile. Defaults to 1
  • output_template: Output template - use ${ikey} or ${vkey}, e.g., tile${vkey}.csv. Defaults to tile${vkey}.csv
Just a shortcut for the -subd command
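For example, a sketch that tiles on 1-degree cells (latitude/longitude columns in input.csv are assumed):
seesv -maptiles latitude,longitude 1 "tile\${vkey}.csv" input.csv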

-chunk <output_template> <number>
Make a number of output files with a max number of rows
  • output_template: Output template - use ${number}, e.g., output${number}.csv. Defaults to output${number}.csv
  • number: Number of rows in each file
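For example, a sketch that splits test.csv into files of at most 1000 rows each, named via the ${number} macro:
seesv -chunk "output\${number}.csv" 1000 test.csv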
-addheader <properties>
Add the RAMADDA point properties
  • properties: name1 value1 ... nameN valueN
    Set default: default.type double
The -addheader command adds RAMADDA's point data header to the output file. The header specifies the canonical field ids and a set of properties for each field and is of the form:
#fields=field1[prop1=value1 prop2=value2 ...],field2[],...,fieldN[propn=valuen ...]
v1,v2,v3
...
The -addheader command is run as below, passing in a set of name/value arguments. If a value has spaces then group the text with {...} brackets. seesv -addheader "arg1 value1 arg2 {value 2} arg3 value3" The args are typically of the form canonical_field_name.some_property, where the canonical field name is the column header cleaned up - lowercased, spaces converted to "_", and non-alphanumeric characters stripped.

For example, say you have a file like this:

some date,year,some field
2018-09-01,2023,10
2019-10-01,2024,20
2020-09-01,2024,30

Running this command: seesv -addheader "some_date.type date date.format yyyy-MM-dd year.type date year.format yyyy some_field.id temperature some_field.label {The Temperature} some_field.type celsius" -p testdate.csv will result in the output:
#fields=some_date[label="Some Date"  type="date" format="yyyy-MM-dd" ] ,year[label="Year"  type="date" format="yyyy" ] ,temperature[label="Some Field"  type="integer" chartable="true" ] 
2018-09-01,2023,10
2019-10-01,2024,20
This specifies:
  • The type of the "some date" field is set to date, with the format given as a Java date format (yyyy-MM-dd). Other types include:
    • "enumeration", e.g. field.type enumeration
    • "integer", e.g. field.type integer
    • "double", e.g. field.type double
  • The year field is also a date with format yyyy
  • The canonical id of "some field" field is changed to "temperature"
  • The label of "some field" is changed to "The Temperature"
  • The unit of "some field" is set to be celsius. This can be used when displaying the data.
Other field properties include:
  • group, e.g. field.group {some group} - this is used to group a set of fields and is primarily used for display, e.g., providing a header in a table
  • default.format, e.g. default.format {yyyy-MM-dd} - this is to set the default date format that is applied to all fields
  • {some pattern}.{some property} - instead of the field name as the prefix you can specify a regular expression pattern. For example if you had data like:
      value1,value2,value3
      10,20,30
    
    You can specify the type as "value.*.type integer".

    Adding a grouping to the pattern can be used to pull out text from the field name and use it in the value. For example, below would set the label: seesv -addheader "(value.*).*.label {Some label \$1}"

-db <properties>
Generate the RAMADDA db xml from the header. See Help
  • properties: Name value pairs: table.id <new id> table.name <new name> table.cansearch false table.canlist false table.icon <icon>, e.g., /db/database.png <column>.id <new id for column> <column>.label <new label> <column>.type <string|enumeration|double|int|date|latlon> <column>.format <yyyy MM dd HH mm ss format for dates> <column>.canlist false <column>.cansearch false db.install <true|false> install the new db table db.droptable <true|false> careful! this deletes any prior created dbs db.yesreallydroptable true - this double checks
The -db command generates a Data Tables specification, a ...db.xml file that acts as a RAMADDA plugin. See https://ramadda.org/repository/db/db.html for more information.

The -db command takes the form:

-db "name value pairs"
e.g.:
-db "
table.id <new id> table.name <new name>
table.cansearch false table.canlist false
table.icon <icon>, e.g., /db/database.png
<column>.id <new id for column>
<column>.label <new label>
<column>.type <string|enumeration|double|int|date>
<column>.format <yyyy MM dd HH mm ss format for dates>
<column>.canlist false <column>.cansearch false
install <true|false> install the new db table
nukedb <true|false> careful! this deletes any prior created dbs
yesreallynukethewholedb true - this double checks
"
In the interactive Convert Data SeeSV form if you press "Table" then the schema will be shown. Clicking on the column links brings up a popup menu that sets the id, label, type, etc for the column.

If you have the install flag defined as true then when you hit "Process" the db.xml file will be generated and the plugin will be loaded into RAMADDA. You can then go and add a new entry of the database type from the "Pick a type..." menu.

If you have the nukedb and yesreallynukethewholedb flags set to true then when you press Process the database table will first be dropped and then recreated. This capability is there so that you can iteratively change the types of the columns. Note: any database entries and data that have been loaded will be removed.

-dbprops <id pattern> <suffix pattern>
Print to stdout props for db generation
  • id pattern:
  • suffix pattern:
-write <file name template> <contents template>
Write the contents of a row to a named file
  • file name template:
  • contents template:
Print the fields
-run <Name of process directory>
  • Name of process directory:
-progress <rows>
Show progress
  • rows: How often to print
-debugrows <rows>
Debug # rows
  • rows: # of rows
Generate the RAMADDA point properties
Generate the CSV file commands
-typexml <type_id> <type_label> <database_columns>
Generate the RAMADDA type xml
  • type_id: Type ID, prefix with type_, no spaces, lowercase, e.g. type_point_mypointdata
  • type_label: Type Label - human readable label
  • database_columns: List of database columns, one per line
    e.g.: id,label,type,prop1,value1,prop2,value2
    Types can be:
    string,int,double,enumeration,enumerationplus
    list,latlon,latlonbox,url,date,datetime
    If enumeration then define values with escaped commas - \,:
    fruit,Fruit,enumeration,values,banana\,apple\,orange

The -typexml command generates an entry type definition file. The Type ID should have a type_ prefix, contain no spaces, and be lowercase. The database columns are given one per line, where each line is:
id,label,type,prop1,value1,prop2,value2,...
e.g.:

#A String value
some_field,Some Field,string

#An integer value
some_number,Some Integer Column,int,default,30,showinhtml,false

#A double value
some_double,Some Double Number,double,cansearch,false

#An enumeration. Use \, to escape the commas
fruit,Fruit,enumeration,values,banana\,apple\,orange

#An enumeration that new values can also be added to
fruit_plus,Fruit Plus,enumerationplus,values,banana\,apple\,orange

#Use id:label in the values
sport,Sport,enumeration,values,baseball:Baseball\,football:Football\,hockey:Hockey
The id will be the column identifier in the database, so it should be lowercase. Press Outputs - Print text output to generate the mytypes.xml file. Download it and copy it to your RAMADDA Home/plugins directory and restart your RAMADDA. The types.xml has a default wiki text that you can change.