Command-line interface#

metakb#

metakb [OPTIONS] COMMAND [ARGS]...

Manage MetaKB data.

To reset the graph, prepare normalizers if unavailable, invalidate cached data, then load MetaKB data:

$ metakb clear-db
$ metakb check-normalizers || metakb update-normalizers
$ metakb update --refresh_source_caches
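
The sequence above can be wrapped in a small shell function so that a failed step stops the run. This is a minimal sketch: `reset_metakb` is a hypothetical helper name, not part of the metakb CLI, and it assumes each subcommand exits non-zero on failure.

```shell
# Sketch: reset and reload the graph, stopping at the first failing step.
# reset_metakb is a hypothetical helper, not a metakb command.
reset_metakb() {
    metakb clear-db || return 1
    # Reload normalizer data only if the health check fails.
    if ! metakb check-normalizers; then
        metakb update-normalizers || return 1
    fi
    metakb update --refresh_source_caches || return 1
}

# Usage: reset_metakb && echo "graph reloaded"
```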

Other commands are available for more granular control over the update process.

Options

--version#

Show the version and exit.

check-normalizers#

metakb check-normalizers [OPTIONS] [[gene|disease|therapy]]...

Perform basic checks on DB health and table population for normalizers. Exits with status code 1 if, for at least one of the concept normalizer services, the DB schema is uninitialized or critical tables appear empty.

$ metakb check-normalizers
$ echo $?
1  # indicates failure

To select specific normalizer services, provide one or more arguments:

$ metakb check-normalizers therapy disease

Specific failures and descriptions are logged at level ERROR.
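
Because the exit status signals failure, each service can be checked on its own to find out which one needs attention. The following is a sketch under that assumption; `unhealthy_normalizers` is a hypothetical helper name, and any non-zero status is treated as a failed check.

```shell
# Sketch: print the name of every normalizer service that fails the check.
# unhealthy_normalizers is a hypothetical helper, not a metakb command.
unhealthy_normalizers() {
    for name in gene disease therapy; do
        # Any non-zero exit status is treated as a failed check.
        if ! metakb check-normalizers "$name" >/dev/null 2>&1; then
            echo "$name"
        fi
    done
}

# Usage: unhealthy_normalizers
```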

Options

-n, --normalizer_db_url <normalizer_db_url>#

URL endpoint of normalizer database. If not given, the individual normalizers will revert to their own defaults.

Arguments

[[gene|disease|therapy]]...#

Optional argument(s)

clear-db#

metakb clear-db [OPTIONS]

Clear graph DB.

$ metakb clear-db

Note that the Neo4j database URL, username, and password can be set either by a CLI option or by the environment variable METAKB_DB_URL. For example:

$ metakb clear-db --db_url=bolt://username:password@localhost:7687
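
The environment-variable form can be sketched as a small wrapper that sets METAKB_DB_URL for a single invocation. `with_db_url` is a hypothetical helper name, and the connection string in the usage line is the same placeholder as in the example above.

```shell
# Sketch: run one metakb subcommand with METAKB_DB_URL set for that call.
# with_db_url is a hypothetical helper, not a metakb command.
with_db_url() {
    url=$1
    shift
    METAKB_DB_URL="$url" metakb "$@"
}

# Usage: with_db_url bolt://username:password@localhost:7687 clear-db
```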

Options

-u, --db_url <db_url>#

Connection string for the application Neo4j database.

harvest#

metakb harvest [OPTIONS] [[civic|moa|cbioportal|fda_poda]]...

Perform harvest.

If one or more SOURCEs are provided, harvest only those sources:

$ metakb harvest civic

Otherwise, harvest all known sources.

Options

-r, --refresh_source_caches#

If set, update source caches (e.g. CIViCPy) before regenerating data. Note that this will take several minutes. If not set, use the local cache.

Arguments

[[civic|moa|cbioportal|fda_poda]]...#

Optional argument(s)

load-cdm#

metakb load-cdm [OPTIONS] [CDM_FILE]...

Load one or more CDM_FILEs into Neo4j graph.

If no arguments are provided, load the latest available file from the default transformed data location for each MetaKB source:

$ metakb load-cdm

Pass path to file(s) to load from a custom location:

$ metakb load-cdm path/to/file1.json path/to/file2.json
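
When the files live in one directory, the argument list can be built with a glob. This sketch assumes every `.json` file in that directory is a CDM file; `load_cdm_dir` is a hypothetical helper name. It refuses to run when nothing matches, since calling load-cdm without arguments would load from the default location instead.

```shell
# Sketch: load every .json file found in a directory.
# load_cdm_dir is a hypothetical helper, not a metakb command.
load_cdm_dir() {
    dir=$1
    # Replace the function's positional parameters with the matching files.
    set -- "$dir"/*.json
    # An unmatched glob stays literal, so check that the first entry exists.
    [ -e "$1" ] || { echo "no JSON files in $dir" >&2; return 1; }
    metakb load-cdm "$@"
}

# Usage: load_cdm_dir path/to/cdm_files
```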

Use the --from_s3 option to fetch snapshot files from the MetaKB S3 bucket instead:

$ metakb load-cdm --from_s3

Note that the Neo4j database URL, username, and password can be set either by a CLI option or by the environment variable METAKB_DB_URL. For example:

$ metakb load-cdm --db_url=bolt://username:password@localhost:7687

Options

-u, --db_url <db_url>#

Connection string for the application Neo4j database.

-s, --from_s3#

Retrieves most recent data snapshot from the VICC S3 bucket and loads it. Mutually exclusive with target file arguments.

Arguments

[CDM_FILE]...#

Optional argument(s)

transform#

metakb transform [OPTIONS] [[civic|moa|cbioportal|fda_poda]]...

Transform MetaKB SOURCE(s).

If the names of one or more SOURCEs are provided, transform only those sources:

$ metakb transform civic

Otherwise, transform all available sources.

Options

-n, --normalizer_db_url <normalizer_db_url>#

URL endpoint of normalizer database. If not given, the individual normalizers will revert to their own defaults.

Arguments

[[civic|moa|cbioportal|fda_poda]]...#

Optional argument(s)

transform-file#

metakb transform-file [OPTIONS] HARVEST_FILE {civic|moa|cbioportal|fda_poda}

Transform an individual harvested data file. Source name must be specified as well.

$ metakb transform-file path/to/file.json civic
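
Since the command takes one file at a time, several harvest files from the same source can be handled with a loop. A minimal sketch, where `transform_files` is a hypothetical helper name and a non-zero exit status is assumed to mean the transform failed:

```shell
# Sketch: transform each given harvest file for a single source.
# transform_files is a hypothetical helper, not a metakb command.
transform_files() {
    source_name=$1
    shift
    for harvest_file in "$@"; do
        # Stop at the first file that fails to transform.
        metakb transform-file "$harvest_file" "$source_name" || return 1
    done
}

# Usage: transform_files civic path/to/file1.json path/to/file2.json
```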

Options

-n, --normalizer_db_url <normalizer_db_url>#

URL endpoint of normalizer database. If not given, the individual normalizers will revert to their own defaults.

Arguments

HARVEST_FILE#

Required argument

SOURCE_NAME#

Required argument

update#

metakb update [OPTIONS] [[civic|moa|cbioportal|fda_poda]]...

Execute data harvest and transformation from resources and upload to graph datastore.

To harvest and transform source data into fresh CDM files, and then load them to the graph:

$ metakb update

Note that the Neo4j database URL, username, and password can be set either by a CLI option or by the environment variable METAKB_DB_URL. For example:

$ metakb update --db_url=bolt://username:password@localhost:7687

Provide one or more SOURCE arguments to limit data harvest and transformation to just those source(s):

$ metakb update moa
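
For more control, the harvest, transform, and load stages can be run one at a time with the commands documented on this page. The sketch below is an approximation of what update does, not its actual implementation, and `staged_update` is a hypothetical helper name.

```shell
# Sketch: run the harvest, transform, and load stages separately.
# staged_update is a hypothetical helper, not a metakb command.
staged_update() {
    # "$@" is an optional list of source names, e.g. civic moa.
    metakb harvest "$@" || return 1
    metakb transform "$@" || return 1
    # With no file arguments, load-cdm loads the latest transformed data
    # for every source, not only the sources named above.
    metakb load-cdm || return 1
}

# Usage: staged_update moa
```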

Options

-u, --db_url <db_url>#

Connection string for the application Neo4j database.

-n, --normalizer_db_url <normalizer_db_url>#

URL endpoint of normalizer database. If not given, the individual normalizers will revert to their own defaults.

-r, --refresh_source_caches#

If set, update source caches (e.g. CIViCPy) before regenerating data. Note that this will take several minutes. If not set, use the local cache.

Arguments

[[civic|moa|cbioportal|fda_poda]]...#

Optional argument(s)

update-normalizers#

metakb update-normalizers [OPTIONS] [[gene|disease|therapy]]...

Reload gene, disease, and therapy normalizer data.

Forces deletion of each normalizer's existing data before fetching and loading new data. If errors are encountered, attempts to complete updates of the other normalizers before exiting.

Providing no arguments will attempt to update all three:

$ metakb update-normalizers

Providing individual normalizer names as arguments will update only those normalizers:

$ metakb update-normalizers disease therapy
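
An update can be followed by a health check on the same normalizers to confirm the reload succeeded. A minimal sketch, where `refresh_normalizers` is a hypothetical helper name. It relies only on the documented exit status of check-normalizers, because the exit status of update-normalizers is not described here.

```shell
# Sketch: reload normalizer data, then verify it with check-normalizers.
# refresh_normalizers is a hypothetical helper, not a metakb command.
refresh_normalizers() {
    # "$@" is an optional list of normalizer names (gene, disease, therapy).
    metakb update-normalizers "$@"
    # The function's exit status is that of the health check.
    metakb check-normalizers "$@"
}

# Usage: refresh_normalizers disease therapy
```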

Options

-n, --normalizer_db_url <normalizer_db_url>#

URL endpoint of normalizer database. If not given, the individual normalizers will revert to their own defaults.

Arguments

[[gene|disease|therapy]]...#

Optional argument(s)