Validator

A CLI validator for pool objects, and even for entire draft, snapshot, and published pools, is provided at /pools/data-pool-tools/bin/data-pool-tools-validate. It is used like

/pools/data-pool-tools/bin/data-pool-tools-validate [OPTIONS] PATH

where PATH is the part of your draft pool you want to validate, or the path to the draft pool's directory if you want to validate the whole pool. The validator autodetects what is being validated based on PATH. Use the -h or --help options to see all options. The most useful ones are -o OUTPUT, which writes the validation results to a file rather than stdout, and -f FORMAT, which changes the output format. The output formats are human-color for human-readable output with color (default on the CLI), human for human-readable output without color (default for any output that isn’t a JSON file or the CLI), json for JSON (default for any file ending in .json), and auto for choosing based on the output (default).
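The way auto picks a format can be sketched as a small function. This is only an illustration of the rules stated above, not the validator's actual code: the names choose_format and stdout_is_tty are hypothetical, and it assumes that "the CLI" means stdout attached to an interactive terminal.

```python
def choose_format(requested="auto", output_path=None, stdout_is_tty=True):
    """Resolve the output format following the rules described in the text.

    Hypothetical helper: the real validator may decide differently.
    """
    if requested != "auto":
        # An explicit -f FORMAT always wins.
        return requested
    if output_path is None:
        # No -o OUTPUT: results go to stdout.
        # Assumption: "the CLI" means an interactive terminal.
        return "human-color" if stdout_is_tty else "human"
    if output_path.endswith(".json"):
        # Any file ending in .json defaults to JSON.
        return "json"
    # Any other output file defaults to plain human-readable text.
    return "human"
```

Under these assumptions, -o result.json would yield JSON, -o result.txt would yield plain human-readable text, and no -o on a terminal would yield colored output.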

An example with a valid pool made from the template and one data file is

[gzadmfnord@glogin8 ~]$ cp -r /pools/data-pool-template mypool
[gzadmfnord@glogin8 ~]$ echo "this is data" > mypool/content/data/data.txt
[gzadmfnord@glogin8 ~]$ /pools/data-pool-tools/bin/data-pool-tools-validate mypool
Validated "draft" at mypool

Valid: valid

Public: yes

Flagged:
  Green: 6
    Citation: 1
      * CITATION.bib file is OK.
    Draft: 1
      * Top-level of pool is OK.
    Metadata: 1
      * METADATA.json file is OK.
    Public: 1
      * public file is OK.
    Readme: 1
      * README.md file is OK.
    Content: 1
      * content directory is OK.

Info:
  Metadata:
    title: 'TITLE'
  Readme:
    title: 'TITLE'
  Content:
    number_directories: 0
    number_files: 1
    number_symlinks: 0
    size_inodes: 1
    size_inodes_human: '1'
    size_space: 13
    size_space_human: '13 B'
    files_by_extension:
      .txt: 1 files, 13 B

The validator’s human readable output shows

  1. What is being validated
  2. Whether it is valid or not (valid, possibly invalid, probably invalid, or invalid)
  3. What has been flagged
  4. Additional information

The validator flags various things which are organized by the kind of flag followed by the kind of pool object. The different flags and their meanings are

Flag       Meaning
forbidden  Critical problem with the pool object.
red        Potentially serious problem with the pool object. Will require discussion if submitted.
yellow     Potential problem with the pool object. May require discussion if submitted.
green      The pool object is OK.
awesome    Something good above and beyond that should be kept for sure.

Fix anything flagged as forbidden. The yellow and red flags denote things which might be wrong but might not be, so check them. Pools with yellow and red flags may be fine, but the flagged items will have to be discussed after submission. For example, a very large pool that is past the red threshold in space will lead to a discussion about whether the data is suitably compressed, among other space-saving strategies.
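For scripted checks, the per-flag counts can be pulled out of the human-readable report. The following is a minimal sketch that assumes the layout seen in the example outputs on this page (a "Flagged:" line followed by flag names indented by two spaces); flag_counts and needs_attention are hypothetical helper names, and the sample text is abbreviated from the forbidden-file example. The json format is probably the more robust choice for scripts, but its schema is not shown here.

```python
import re

def flag_counts(report):
    """Return {flag: count} from the 'Flagged:' section of a human-readable report."""
    counts = {}
    in_flagged = False
    for raw in report.splitlines():
        line = raw.rstrip()
        if line == "Flagged:":
            in_flagged = True
            continue
        if in_flagged:
            if line and not line.startswith(" "):
                break  # reached the next top-level section, e.g. "Info:"
            # Flag lines are indented by exactly two spaces: "  Green: 5"
            match = re.fullmatch(r"  (\w+): (\d+)", line)
            if match:
                counts[match.group(1).lower()] = int(match.group(2))
    return counts

def needs_attention(counts):
    """List the flags present that must be fixed or may need discussion."""
    return [f for f in ("forbidden", "red", "yellow") if counts.get(f, 0) > 0]

# Abbreviated sample in the layout of the examples on this page.
sample = """Validated "draft" at mypool2

Valid: invalid

Flagged:
  Forbidden: 1
    Draft: 1
      * Top-level directory contains a forbidden file: data.txt
  Green: 5
    Citation: 1
      * CITATION.bib file is OK.

Info:
  Metadata:
    title: 'TITLE'
"""

counts = flag_counts(sample)
attention = needs_attention(counts)
```

Here counts holds one entry for forbidden and one for green, and attention lists only forbidden, which is the flag that has to be fixed before submission.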

Here is an example of a pool that has a forbidden file and what the validator returns

[gzadmfnord@glogin8 ~]$ cp -r /pools/data-pool-template mypool2
[gzadmfnord@glogin8 ~]$ echo "my data" > mypool2/data.txt
[gzadmfnord@glogin8 ~]$ /pools/data-pool-tools/bin/data-pool-tools-validate mypool2
Validated "draft" at mypool2

Valid: invalid

Public: yes

Flagged:
  Forbidden: 1
    Draft: 1
      * Top-level directory contains a forbidden file: data.txt
  Green: 5
    Citation: 1
      * CITATION.bib file is OK.
    Metadata: 1
      * METADATA.json file is OK.
    Public: 1
      * public file is OK.
    Readme: 1
      * README.md file is OK.
    Content: 1
      * content directory is OK.

Info:
  Metadata:
    title: 'TITLE'
  Readme:
    title: 'TITLE'
  Content:
    number_directories: 0
    number_files: 0
    number_symlinks: 0
    size_inodes: 0
    size_inodes_human: '0'
    size_space: 0
    size_space_human: '0 B'
    files_by_extension:

The information section at the end shows useful information gathered at each step of the validation. It is particularly useful for checking that the pool titles in the METADATA.json and README.md files match, and for reviewing the size of the pool's content. Information on the pool size is given both in aggregate (size_inodes and size_space) and by file extension, which shows how many files of each kind are in the pool and how much space they take up. Be on the lookout for large numbers of files (inodes) or a large fraction of the pool being taken up by uncompressed files (e.g. .tar files rather than compressed .tar.zst files).
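A similar by-extension summary can be approximated before running the validator, for instance to spot uncompressed archives early. This is a rough sketch, not the validator's own logic: summarize_by_extension is a hypothetical helper, it counts apparent file sizes in bytes, skips symlinks, and uses only the last suffix (so data.tar.zst is counted under .zst), any of which may differ from what the validator reports.

```python
import os
import tempfile
from collections import defaultdict

def summarize_by_extension(root):
    """Return {extension: (number_of_files, total_bytes)} for regular files under root."""
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are not counted as files here
            # Only the last suffix is used; files without one go under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            stats[ext][0] += 1
            stats[ext][1] += os.path.getsize(path)
    return {ext: (count, size) for ext, (count, size) in stats.items()}

# Small demonstration mirroring the first example: one 13-byte text file.
with tempfile.TemporaryDirectory() as content_dir:
    with open(os.path.join(content_dir, "data.txt"), "wb") as handle:
        handle.write(b"this is data\n")
    summary = summarize_by_extension(content_dir)
```

In the demonstration, summary contains a single .txt entry with one file of 13 bytes, matching the files_by_extension line in the first example.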