Data Cleansing

 

Data Schemes

For the GTN-P Database to read and process your data, the data sets must first be cleansed, that is, incomplete or incorrectly formatted data must be detected and then removed or corrected. A cleansed data scheme follows these format rules:

  • The character set should be UTF-8 (UCS Transformation Format - 8 bit)
  • Length measurements, such as depths and heights, should be given in meters (m)
  • Coordinates should be given in decimal degrees with at least four decimals (which keeps the positional offset within a few meters). The reference system should be WGS 84 (EPSG:4326).
  • Positive values (+) should be used to indicate points below the ground
  • Negative values (-) should be used to indicate points above the ground
  • Use a dot (.) for decimals
  • Use a comma (,) to separate columns
  • Null values should be indicated by "-999".
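
Some of the rules above can be checked automatically before a file is submitted. The following Python sketch is only an illustration and is not part of the GTN-P Database: the function names and the column names "latitude" and "longitude" are assumptions chosen for this example. It checks that the file content is valid UTF-8 and that each coordinate is either the null value or a dot-decimal number with at least four decimals.

```python
import csv
import io
import re

NULL_VALUE = "-999"  # null marker required by the format rules

# Optional sign, digits, a dot, then at least four decimals.
COORD_RE = re.compile(r"[+-]?\d+\.\d{4,}")


def is_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_valid_coordinate(text):
    """Accept the null marker or a dot-decimal value with >= 4 decimals."""
    return text == NULL_VALUE or COORD_RE.fullmatch(text) is not None


def find_problems(csv_text, coord_cols=("latitude", "longitude")):
    """Return (line number, column, value) for each badly formatted coordinate.

    The column names are hypothetical; adapt them to the template in use.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))  # comma is the default delimiter
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for col in coord_cols:
            value = (row.get(col) or "").strip()
            if not is_valid_coordinate(value):
                problems.append((line_no, col, value))
    return problems


SAMPLE = (
    "latitude,longitude,depth_m\n"
    "64.1354,-21.8954,0.5\n"
    "64.13,-21.8954,1.0\n"
    "-999,-999,-999\n"
)

if __name__ == "__main__":
    for line_no, col, value in find_problems(SAMPLE):
        print(f"line {line_no}: {col}={value!r} needs a dot and at least four decimals")
```

The sketch does not check units or the sign convention for depth, because those cannot be inferred from the numbers alone.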

 

Templates

Here we provide templates for data cleansing:

Ground Temperature data template

Active Layer Grid data template

Most of the data collections currently in the GTN-P Database have been cleansed by Arctic Portal. However, because of the large number of permafrost data collections still to be uploaded, we will not be able to cleanse all of them. We therefore strongly recommend that you cleanse all data files with the templates before submitting them.

 

Data Upload Levels 

A further idea is to offer two separate entry points for data collections, so that the user can choose to upload either already cleansed data or uncleansed data, which an algorithm would then convert into cleansed data. However, such an algorithm has not been implemented yet.
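
To illustrate what one step of such a conversion might look like, here is a minimal Python sketch. It is not the planned GTN-P algorithm, which does not exist yet. It assumes, purely for the example, that the uncleansed file uses semicolons as column separators and decimal commas, and that empty cells stand for missing values; the function name is hypothetical.

```python
import csv
import io
import re

NULL_VALUE = "-999"  # null marker required by the format rules

# A number written with a decimal comma, e.g. "-1,25".
DECIMAL_COMMA_RE = re.compile(r"[+-]?\d+,\d+")


def normalise(raw_text, delimiter=";"):
    """Rewrite delimiter-separated text with commas, dot decimals and -999 nulls."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for row in reader:
        cleaned = []
        for cell in row:
            cell = cell.strip()
            if cell == "":
                cell = NULL_VALUE  # assumption: an empty cell means "no value"
            elif DECIMAL_COMMA_RE.fullmatch(cell):
                cell = cell.replace(",", ".")  # decimal comma becomes a dot
            cleaned.append(cell)
        writer.writerow(cleaned)
    return out.getvalue()


if __name__ == "__main__":
    print(normalise("depth;temp\n0,5;-1,25\n1,0;\n"))
```

A real conversion would also have to deal with units, the sign convention for depth and the coordinate reference system, which this sketch leaves untouched.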

Strategy and Implementation Plan