Validator
A CLI validator for pool objects, and even for entire draft, snapshot, and published pools, is provided at /pools/data-pool-tools/bin/data-pool-tools-validate.
It is used like
/pools/data-pool-tools/bin/data-pool-tools-validate [OPTIONS] PATH
where PATH is the part of your draft pool you want to validate, or the whole draft pool if you give the path to its directory.
The validator autodetects what is being validated based on the specific PATH.
Use the -h or --help option to see all options.
The most useful ones are -o OUTPUT, which writes the validation results to a file rather than stdout, and -f FORMAT, which changes the output format.
The output formats are
- human-color: human readable with color (default on the CLI)
- human: human readable (default for any output that isn’t a JSON file or the CLI)
- json: JSON (default for any file ending in .json)
- auto: chosen based on the output (default)
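As a sketch of how the two options combine, the wrapper below writes a JSON report to a file. The function name validate_to_json and the VALIDATOR variable are illustrative and not part of the tools; only the validator path and the -f and -o options come from the description above.

```shell
# Path to the validator; can be overridden, e.g. for testing.
VALIDATOR="${VALIDATOR:-/pools/data-pool-tools/bin/data-pool-tools-validate}"

# Hypothetical helper (not part of data-pool-tools):
# validate the pool at $1 and write a JSON report to the file $2.
validate_to_json() {
    "$VALIDATOR" -f json -o "$2" "$1"
}

# Usage: validate_to_json mypool mypool-report.json
```

Since the output file ends in .json, the auto format should already select JSON; passing -f json only makes the choice explicit.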
An example with a valid pool made from the template and one data file is
[gzadmfnord@glogin8 ~]$ cp -r /pools/data-pool-template mypool
[gzadmfnord@glogin8 ~]$ echo "this is data" > mypool/content/data/data.txt
[gzadmfnord@glogin8 ~]$ /pools/data-pool-tools/bin/data-pool-tools-validate mypool
Validated "draft" at mypool
Valid: valid
Public: yes
Flagged:
    Green: 6
        Citation: 1
            * CITATION.bib file is OK.
        Draft: 1
            * Top-level of pool is OK.
        Metadata: 1
            * METADATA.json file is OK.
        Public: 1
            * public file is OK.
        Readme: 1
            * README.md file is OK.
        Content: 1
            * content directory is OK.
Info:
    Metadata:
        title: 'TITLE'
    Readme:
        title: 'TITLE'
    Content:
        number_directories: 0
        number_files: 1
        number_symlinks: 0
        size_inodes: 1
        size_inodes_human: '1'
        size_space: 13
        size_space_human: '13 B'
        files_by_extension:
            .txt: 1 files, 13 B
The validator’s human readable output shows
- What is being validated
- Whether it is valid or not (valid, possibly invalid, probably invalid, or invalid)
- What has been flagged
- Additional information
The validator flags various things which are organized by the kind of flag followed by the kind of pool object. The different flags and their meanings are
Flag | Meaning |
---|---|
forbidden | Critical problem with the pool object. |
red | Potentially serious problem with the pool object. Will require discussion if submitted. |
yellow | Potential problem with the pool object. May require discussion if submitted. |
green | The pool object is OK. |
awesome | Something good above and beyond that should be kept for sure. |
Fix anything flagged as forbidden. The yellow and red flags are meant to denote things which might be wrong but might not be, so check them. Pools with yellow and red flags may be fine, but the flagged items will have to be discussed after submission. For example, a very large pool that is past the red threshold in space will lead to a discussion about whether the data is suitably compressed among other space saving strategies.
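One way to check the overall verdict before submitting is to pull the "Valid:" line out of the human readable report. The helper below is a minimal sketch: pool_verdict is a hypothetical name, and it assumes the report contains a line of the form "Valid: ..." as in the example output in this section.

```shell
# Hypothetical helper (not part of data-pool-tools):
# read a human-format report on stdin and print the text after "Valid:".
# Leading whitespace is tolerated in case the line is indented.
pool_verdict() {
    sed -n 's/^[[:space:]]*Valid:[[:space:]]*//p' | head -n 1
}

# Usage (plain "human" format avoids color codes in the piped output):
#   /pools/data-pool-tools/bin/data-pool-tools-validate -f human mypool | pool_verdict
```

Anything other than "valid" in the result is a signal to read the Flagged section of the full report.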
Here is an example of a pool that has a forbidden file, together with what the validator returns:
[gzadmfnord@glogin8 ~]$ cp -r /pools/data-pool-template mypool2
[gzadmfnord@glogin8 ~]$ echo "my data" > mypool2/data.txt
[gzadmfnord@glogin8 ~]$ /pools/data-pool-tools/bin/data-pool-tools-validate mypool2
Validated "draft" at mypool2
Valid: invalid
Public: yes
Flagged:
    Forbidden: 1
        Draft: 1
            * Top-level directory contains a forbidden file: data.txt
    Green: 5
        Citation: 1
            * CITATION.bib file is OK.
        Metadata: 1
            * METADATA.json file is OK.
        Public: 1
            * public file is OK.
        Readme: 1
            * README.md file is OK.
        Content: 1
            * content directory is OK.
Info:
    Metadata:
        title: 'TITLE'
    Readme:
        title: 'TITLE'
    Content:
        number_directories: 0
        number_files: 0
        number_symlinks: 0
        size_inodes: 0
        size_inodes_human: '0'
        size_space: 0
        size_space_human: '0 B'
        files_by_extension:
The information section at the end shows useful information gathered at each step of the validation.
It is particularly useful to check that the pool titles from the METADATA.json and README.md files match, and to review the information on the size of the pool's content.
Information on the pool size is given both in aggregate (size_inodes and size_space) and by file extension.
This is useful for seeing how many files of each kind are in the pool and how big they are.
Be on the lookout for large numbers of files (inodes) or a large fraction of the pool being taken up by uncompressed files (e.g. .tar files rather than compressed .tar.zst files).
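The same two things can be checked by hand with standard tools before running the validator. The helpers below are a sketch using only find; the function names are hypothetical, and the counts they give are not guaranteed to match how the validator computes size_inodes.

```shell
# Hypothetical helper: list plain .tar files under directory $1.
# Names ending in .tar.zst do not match the '*.tar' pattern.
list_uncompressed_tars() {
    find "$1" -type f -name '*.tar' | sort
}

# Hypothetical helper: count the regular files under directory $1.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage:
#   list_uncompressed_tars mypool/content
#   count_files mypool/content
```

Any archives listed this way are candidates for compression, for example with zstd, which writes a .tar.zst file next to the original.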