Installation

Prerequisites

Installing the service

Install the most recent prerelease version of the MetaKB from PyPI:

python3 -m pip install --pre metakb

Or, install from the latest available commit via the GitHub repo:

git clone https://github.com/cancervariants/metakb
cd metakb
python3 -m virtualenv venv
source venv/bin/activate
pip install -e .

Note

Stable (1.x) releases can be acquired from PyPI:

python3 -m pip install metakb
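
To confirm which release pip actually installed (prerelease or stable), the Python standard library can report the version of an installed distribution. The following is a minimal sketch, assuming the distribution name is `metakb` as in the commands above; the helper name `installed_version` is illustrative and not part of MetaKB.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "metakb") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"metakb {found}")
    else:
        print("metakb is not installed in this environment")
```

Run it with the same interpreter (and virtual environment, if any) that was used for the install.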

Setting up dependencies

MetaKB’s data loading and searching functions employ a variety of upstream services and data providers:

SeqRepo

MetaKB requires access to SeqRepo data for reloading the Gene Normalizer and for normalizing variation queries. In general, we recommend the following for local setup:

pip install seqrepo
export SEQREPO_VERSION=2024-12-20  # or newer if available -- check `seqrepo list-remote-instances`
sudo mkdir /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i $SEQREPO_VERSION

If you encounter a permission error similar to the one below:

PermissionError: [Errno 13] Permission denied: '/usr/local/share/seqrepo/2024-12-20._fkuefgd' -> '/usr/local/share/seqrepo/2024-12-20'

Try moving data manually with sudo:

sudo mv /usr/local/share/seqrepo/$SEQREPO_VERSION.* /usr/local/share/seqrepo/$SEQREPO_VERSION

See the mirroring documentation in the SeqRepo GitHub repository for instructions and additional troubleshooting.
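
After a pull, it can help to check whether the SeqRepo directory holds a finished instance or only a leftover temporary directory like the one in the error above. The following is a rough sketch using only the standard library. It assumes the default root `/usr/local/share/seqrepo` used above, and that unfinished pulls leave directories whose names contain `._` (as in `2024-12-20._fkuefgd`). The helper `list_seqrepo_instances` is hypothetical and not part of SeqRepo.

```python
from pathlib import Path


def list_seqrepo_instances(root="/usr/local/share/seqrepo"):
    """Split subdirectories of the SeqRepo root into (complete, partial) name lists."""
    root_path = Path(root)
    complete, partial = [], []
    if not root_path.is_dir():
        # Nothing has been pulled yet, or the root lives elsewhere.
        return complete, partial
    for entry in sorted(root_path.iterdir()):
        if not entry.is_dir():
            continue
        # Assumption: temporary pull directories look like "<version>._<suffix>".
        if "._" in entry.name:
            partial.append(entry.name)
        else:
            complete.append(entry.name)
    return complete, partial


if __name__ == "__main__":
    done, leftover = list_seqrepo_instances()
    print("complete instances:", done)
    print("temporary directories:", leftover)
```

A temporary directory with no matching complete instance suggests that the final rename step failed.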

Universal Transcript Archive (UTA)

MetaKB requires an available instance of the Universal Transcript Archive (UTA) database, accessed through the Cool-Seq-Tool library, to normalize variation queries. Complete installation instructions (via Docker or a local server) are available at the UTA GitHub repository. For local usage, we recommend the following:

createuser -U postgres uta_admin
createuser -U postgres anonymous
createdb -U postgres -O uta_admin uta

export UTA_VERSION=uta_20210129b.pgd.gz  # most recent as of 2024/06/20
curl -O https://dl.biocommons.org/uta/$UTA_VERSION
gzip -cdq ${UTA_VERSION} | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5432

By default, MetaKB expects to connect to the UTA database via a PostgreSQL connection served locally on port 5432, under the PostgreSQL username uta_admin and the schema uta_20210129b. Use the environment variable UTA_DB_URL to specify an alternate libpq-compliant URI.
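
To see which connection settings a given UTA_DB_URL value implies, the URI can be split into its parts with the standard library. This is a sketch under assumptions: the URI layout `postgresql://<user>@<host>:<port>/<database>/<schema>`, with the schema as the last path segment, is assumed here and should be checked against the Cool-Seq-Tool documentation. The default value below only mirrors the settings described above (without a password), and `describe_uta_url` is an illustrative helper, not a MetaKB function.

```python
import os
from urllib.parse import urlsplit

# Assumption: reflects the defaults described above; no password is included.
DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5432/uta/uta_20210129b"


def describe_uta_url(url=None):
    """Break a UTA connection URI into user, host, port, database, and schema."""
    url = url or os.environ.get("UTA_DB_URL", DEFAULT_UTA_DB_URL)
    parts = urlsplit(url)
    # Assumption: the path is "/<database>/<schema>".
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": segments[0] if segments else None,
        "schema": segments[1] if len(segments) > 1 else None,
    }


if __name__ == "__main__":
    for key, value in describe_uta_url().items():
        print(f"{key}: {value}")
```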

Gene, Disease, and Therapy Normalizers

The MetaKB uses the Gene, Disease, and Therapy normalizer services to resolve biomedical concept referents during data loading and searching.

To set up databases from scratch, first set up an instance of DynamoDB (e.g. locally).

Next, two environment variables are required for data access. First, Thera-Py requires a UMLS license to access RxNorm data. Register for a UMLS license, then acquire your API key from the UTS ‘My Profile’ page after signing in, and set it under the key UMLS_API_KEY:

export UMLS_API_KEY=12345-6789-abcdefg-hijklmnop

Thera-Py also requires a Harvard Dataverse API key to access HemOnc.org data. Create a user account on the website, follow these instructions to generate an API token, and set it under the key HARVARD_DATAVERSE_API_KEY:

export HARVARD_DATAVERSE_API_KEY=12345-6789-abcdefgh-hijklmnop
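
A quick check that both keys are visible to the current process can save a failed load later. The sketch below only tests that the two variables named above are set and non-empty; it does not validate the keys against either service. The helper `missing_keys` is illustrative and not part of Thera-Py or MetaKB.

```python
import os

REQUIRED_KEYS = ("UMLS_API_KEY", "HARVARD_DATAVERSE_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```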

Finally, disease term data from the Online Mendelian Inheritance in Man (OMIM) resource must be acquired manually and placed in the Disease Normalizer data folder (located by default at ~/.local/share/wags-tails/omim). Acquire the OMIM file mimTitles.txt and rename it in the pattern omim_YYYYMMDD.tsv corresponding to the file’s versioning.
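
The rename-and-place step for the OMIM file can be scripted. The following is a minimal sketch, assuming the default data folder given above. The version date is passed in explicitly because how to read it from the file is not specified here, and `install_omim_file` is a hypothetical helper rather than part of the Disease Normalizer.

```python
import shutil
from datetime import date
from pathlib import Path


def install_omim_file(source, version, data_dir=None):
    """Copy mimTitles.txt into the data folder as omim_YYYYMMDD.tsv and return the new path."""
    if data_dir is None:
        # Default location named in the text above.
        data_dir = Path.home() / ".local" / "share" / "wags-tails" / "omim"
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / f"omim_{version:%Y%m%d}.tsv"
    # Copy rather than move, so the downloaded original is left untouched.
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # Example values only: adjust the source path and date to match your download.
    source_file = Path("mimTitles.txt")
    if source_file.exists():
        print(install_omim_file(source_file, date.today()))
    else:
        print("mimTitles.txt not found in the current directory")
```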

Once these prerequisites are fulfilled, the normalizers can be loaded from scratch in succession with a CLI command:

metakb update-normalizers

See the CLI reference for more information about commands for accessing and managing normalizer data.

Note

See the documentation for each normalizer (Therapy, Gene, Disease) for additional setup options and more detailed troubleshooting.

Neo4j

For local use, we recommend Neo4j Desktop. First, follow the desktop setup instructions to download, install, and open Neo4j Desktop for the first time.

Once you have opened Neo4j Desktop, use the New button in the upper-left region of the window to create a new project. Within that project, click the Add button in the upper-right region of the window and select Local DBMS. The name of the DBMS doesn’t matter, but the password will be used later to connect the database to MetaKB. Select version 5.14.0 (other versions have not been tested). Click Create. Then, click the row within the project screen corresponding to your newly created DBMS, and click the green Start button to start the database service.

See the database configuration entry for instructions on configuring a connection to a Neo4j instance.
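
Before configuring the connection, it can be useful to confirm that the started DBMS is accepting connections. The sketch below is a plain TCP probe, not a Neo4j login. It assumes the database listens on Neo4j's default Bolt port, 7687, on localhost; Neo4j Desktop may use a different port, so check the DBMS details if the probe fails. The function `port_is_open` is illustrative.

```python
import socket


def port_is_open(host="localhost", port=7687, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:7687")
    else:
        print("No listener on localhost:7687; is the DBMS started?")
```

The same probe can be pointed at port 5432 to check that the local PostgreSQL server for UTA is reachable.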

Loading Data

Once all dependencies are available, use the update console command to transform and load all MetaKB source data:

metakb update

See the CLI reference for more information about the update command.