AWS S3 Integration

7.7.0	S3 Bucket Entry
7.7.1	More Configuration
7.7.2	S3 Harvesting
7.7.3	Specifying Entry Types
7.7.4	S3 Tools

7.7.0 S3 Bucket Entry

An S3 Bucket entry type is provided. This allows you to have a "synthetic" set of entries in RAMADDA that represent the S3 bucket hierarchy. These entries do not exist in the RAMADDA database but can be viewed and accessed through the web interface.

To create a new S3 entry go to the File->Pick a Type menu and select "AWS S3". In the create form entry the S3 URL, a name and an optional description:

Image 1: Creating a New S3 Entry

After adding the entry you can edit it to configure how the S3 buckets are displayed.

Image 2: S3 Entry Configuration

Enable caching: Turn this off if you are working on the settings as the entries are normally cached and any changes you make might not take effect. Note: turn this back on once you are done.
Exclude patterns: Regular expression patterns, one per line, that are used to exclude certain S3 objects.
Max per folder: Max number of folders or files listed per folder
Percentage filter: Used to sub-sample the number of folders/files shown per folder. Value is a probablility that a folder/file will be included with range from 0-1
Max file size: Max size of a file to include
Convert dates: If yes then RAMADDA will see if the name of a folder and file matches certain basic patterns for dates (e.g. yyyy for year, yyyy/mm for year then month, etc)
Date patterns- a ";" seperated 3-tuple, one per line. Each 3-tuple has a:
```
pattern;template;date format
```
and is used to match against a folder or file name. The pattern needs to contain "(...)" groups. If it matches then the matching groups are extracted and the template (the 2nd part of the 3-tuple) is used to create a date string. This date string is then parsed by the date format.

7.7.1 More Configuration

You can also specify the naming, latitude/longitude and description templates using a "Convert File" property. For example, the NEXRAD on AWS S3 collection (s3://noaa-nexrad-level2/) has the following hierarchy. There are year/month/day folders.

First, we'll edit the entry and turn on the "Convert dates" property. This results in folders having the dates and names as follows:

Next, we'll go to the Edit menu and select Add Convert file. This allows us to upload a file that can be used for entry name aliases, geospatial locations or display templates. The file can be a .csv, .properties or .txt file. If it is a .csv file then the first column is used to match with the S3 folder names. If it is a .properties then it is of the format:

key1=value1  
key2=value2
...

If it is a .txt file then the first line is the key that is matched against the S3 folder names and the rest of the file is a wiki display template:

key_or_pattern
wiki text
...

For the NEXRAD example we will upload a nexrad.csv file. This has the following structure:

id,fullname,lat,lon
KAPD,KAPD - Fairbanks/Pedro Dome,65.035,-147.50167
KAEC,KAEC - Fairbanks/Nome,64.51139,-165.295
KACG,KACG - Juneau/Sitka,56.85278,-135.52917
KAIH,KAIH - Anchorage/Middleton Island,59.46139,-146.30306
...

The first column is the station ID and this is used to match up (either exact or pattern) with the S3 folders. Upload the nexrad.csv file and select Alias and Exact:

Image 5: Adding a Convert File

Now, the station folders have the full name. Next, upload the same file and select "Location". This adds the spatial metadata to the station folders.

Note: we have combined the aliases with the locations. The aliases file could have been a properties file: nexrad_aliases.properties file.

Next, we want to specify the wiki template to be used for the folder that contains the stations, i.e., the year/month/day folder. Here we will upload this nexrad_templates.txt file. It has the format:

^\d\d/\d\d$
+section title="NEXRAD Stations - {{name entry=parent}} {{name}} {{name entry=grandparent}}"
+inset left=20px right=20px
{{map listentries=true hideIfNoLocations=true}}
...

The first line is a pattern that matches NN/NN, i.e., the month/day folder names. The remainder of the file is the wiki text to be used for the folders that match the pattern.

Upload this file, select "Template" and for match type select "Pattern". Now, for the day folder we should have:

And finally, we want to show the files as "Level 3 Radar" files, a file format that RAMADDA provides. To do this go to the Add Properties menu and under Miscellaneous select "Entry Type Patterns". This property allows you to specify an entry type (e.g., Level 2 Radar File) and one or more patterns to match. For the NEXRAD data collection we know that all of the files are of this type so we specify as a pattern "file:"

Now, with that the files should be shown as this entry type:

7.7.2 S3 Harvesting

The RAMADDA file harvester can also harvest a S3 bucket store. Instead of a file path just enter the S3 URL, e.g.:

Image 9: Harvesting S3 Buckets

If you click on the "Move file to storage" then RAMADDA will copy the file from S3 over into its entry storage area. All of the other harvester settings work like a regular file harvester except that the metadata harvesting will only work if the file has also been copied over.

7.7.3 Specifying Entry Types

By default when an entry is created representing an S3 file RAMADDA will try to figure out its entry type based on the file name or it will use the basic S3 Bucket entry type. For example, if you had a number of .zip shapefiles RAMADDA would default to the Zip file entry type. You can override this by adding one or more Entry Type Pattern properties to the S3 Root entry. Go to the Add Properties menu of the root entry and select Entry Type Pattern. Select the appropriate entry type (e.g., Shapefile) and specify one or more patterns to match with. Note: the patterns are regular expression not glob style patterns.

Image 10: Specifying Entry Type Patterns

7.7.4 S3 Tools

As a convenience RAMADDA provides a S3 command line utility in the SeeSV package.

Download and unzip the seesv.zip and run the s3.sh to get a help listing:

sh s3.sh -help
Usage:
S3File 
	<-key KEY_SPEC (Key spec is either a accesskey:secretkey or an env variable set to accesskey:secretkey)>
	<-download  download the files>  
	<-nomakedirs don't make a tree when downloading files> 
	<-overwrite overwrite the files when downloading> 
	<-sizelimit size mb (don't download files larger than limit (mb)> 
	<-percent 0-1  (for buckets with many (>100) siblings apply this as percent probablity that the bucket will be downloaded)> 
	<-recursive  recurse down the tree when listing>
	<-search search_term>
	<-self print out the details about the bucket> ... one or more buckets

Another utility is provided in the RAMADDA repository runtime at the /aws/s3/list entry point. This provides an interface to the above tool to do a recursive listing of a bucket store. Check it out on https://ramadda.org/repository/aws/s3/list. Try this out with an example:

s3://first-street-climate-risk-statistics-for-noncommercial-use/