metakb.repository.neo4j_repository#

Neo4j implementation of the repository abstraction.

exception metakb.repository.neo4j_repository.Neo4jCredentialsError[source]#

Raised for invalid or unparseable Neo4j credentials.

class metakb.repository.neo4j_repository.Neo4jRepository(session)[source]#

Neo4j implementation of a repository abstraction.

__init__(session)[source]#

Initialize repository instance

Parameters:

session (Session) – Neo4j driver session

get_all_assertion_ids()[source]#

Return all assertion IDs

Return type:

list[str]

get_gene(gene_id)[source]#

Attempt to retrieve a gene given exact ID match

Parameters:

gene_id (str) – exact gene_id as stored in DB (e.g. “metakb.gene:hgnc_6407”)

Return type:

Optional[MappableConcept]

Returns:

gene if available, with child gene objects in extensions

get_statement(statement_id)[source]#

Retrieve a statement

Parameters:

statement_id (str) – ID of the statement minted by the source

Return type:

Optional[Statement]

Returns:

complete statement if available

get_stats()[source]#

Fetch counts for entities

Return type:

RepositoryStats

Returns:

structured stats data class

initialize()[source]#

Set up DB schema

Return type:

None

load_assertion(assertion)[source]#

Add a complete assertion object to the DB, or update it if it already exists

Parameters:

assertion (Statement) – metakb assertion

Return type:

None

search_statements(variation_ids=None, gene_ids=None, therapy_ids=None, disease_ids=None, statement_ids=None, start=0, limit=None)[source]#

Perform entity-based search over all statements.

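Within each provided list of entity IDs, a statement matches if it involves any one of the listed IDs. When several lists are provided, a statement must match every one of them.

For example, given the lists [DrugA, DrugB] and [GeneA, GeneB], return all statements that involve at least one of the two given drugs AND at least one of the two given genes.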

Probable future changes:

  • Combo-therapy specific search

  • Specific logic for searching diseases/conditionsets

  • Search on source values rather than normalized values

  • Searching non-allele catvars (e.g. feature context catvars)

Parameters:
  • variation_ids (Optional[list[str]]) – list of normalized variation IDs

  • gene_ids (Optional[list[str]]) – list of normalized gene IDs

  • therapy_ids (Optional[list[str]]) – list of normalized therapy IDs

  • disease_ids (Optional[list[str]]) – list of normalized disease IDs

  • statement_ids (Optional[list[str]]) – list of source statement IDs

  • start (int) – pagination start point

  • limit (Optional[int]) – page size

Return type:

list[Statement]

Returns:

list of statements matching provided criteria
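
The matching rule described above (any ID within a list, all provided lists together) can be illustrated with a small in-memory sketch. This is not the Neo4j query the repository runs: `ToyStatement` and `toy_search` are hypothetical names invented for illustration. It also assumes that a filter left as `None` imposes no constraint, that an empty list matches nothing, and that `start` is a zero-based offset into the result list; none of these details are confirmed by the documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatement:
    """Hypothetical stand-in; NOT the real metakb Statement model."""

    id: str
    variation_ids: set = field(default_factory=set)
    gene_ids: set = field(default_factory=set)
    therapy_ids: set = field(default_factory=set)
    disease_ids: set = field(default_factory=set)


def toy_search(statements, variation_ids=None, gene_ids=None, therapy_ids=None,
               disease_ids=None, statement_ids=None, start=0, limit=None):
    """Illustrate OR-within-list, AND-across-lists matching (assumed semantics)."""
    # Pair each requested ID list with a getter for the statement's own IDs.
    filters = [
        (variation_ids, lambda s: s.variation_ids),
        (gene_ids, lambda s: s.gene_ids),
        (therapy_ids, lambda s: s.therapy_ids),
        (disease_ids, lambda s: s.disease_ids),
        (statement_ids, lambda s: {s.id}),
    ]

    def matches(stmt):
        for wanted, getter in filters:
            if wanted is None:
                continue  # assumption: an unset filter imposes no constraint
            if not set(wanted) & getter(stmt):
                return False  # no overlap with this list, so the AND fails
        return True

    hits = [s for s in statements if matches(s)]
    # Assumption: start is a zero-based offset and limit is the page size.
    end = None if limit is None else start + limit
    return hits[start:end]


example_statements = [
    ToyStatement("s1", gene_ids={"GeneA"}, therapy_ids={"DrugA"}),
    ToyStatement("s2", gene_ids={"GeneB"}, therapy_ids={"DrugC"}),
    ToyStatement("s3", gene_ids={"GeneB"}, therapy_ids={"DrugB"}),
    ToyStatement("s4", gene_ids={"GeneC"}, therapy_ids={"DrugA"}),
]
example_hits = toy_search(
    example_statements,
    therapy_ids=["DrugA", "DrugB"],
    gene_ids=["GeneA", "GeneB"],
)
example_hit_ids = [s.id for s in example_hits]
```

In this toy data set, `s2` is excluded because its therapy is not in the drug list, and `s4` is excluded because its gene is not in the gene list.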

teardown_db()[source]#

Reset repository storage.

Delete all nodes/edges and constraints.

Return type:

None

metakb.repository.neo4j_repository.get_driver(url=None)[source]#

Get a Neo4j DB connection using a resolved URL.

The connection URL is resolved in the following order:

  • If a connection string is provided via the url argument, use it

  • Otherwise, fall back on METAKB_DB_URL environment variable

This function intentionally avoids any direct dependency on AWS services. In deployed environments, METAKB_DB_URL is expected to be provided by infrastructure (e.g. CloudFormation) based on stored secrets.

Parameters:

url (Optional[str]) – connection string for Neo4j DB. Formatted as bolt://<username>:<password>@<hostname>:<port>

Return type:

Driver

Returns:

Neo4j driver instance

Raises:

Neo4jCredentialsError – If no valid connection URL can be resolved
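
The resolution order and URL format described above can be sketched using only the standard library. This is an illustrative approximation, not the actual body of `get_driver`: the names `resolve_db_url`, `split_credentials`, and `CredentialsError` are hypothetical, and treating an empty string the same as a missing value is an assumption.

```python
import os
from urllib.parse import urlsplit

ENV_VAR = "METAKB_DB_URL"


class CredentialsError(Exception):
    """Hypothetical stand-in for Neo4jCredentialsError."""


def resolve_db_url(url=None, environ=None):
    """Prefer an explicit URL, else fall back on the environment variable."""
    env = os.environ if environ is None else environ
    # Assumption: an empty string counts as "not provided".
    resolved = url or env.get(ENV_VAR)
    if not resolved:
        raise CredentialsError(f"No URL given and {ENV_VAR} is not set")
    return resolved


def split_credentials(url):
    """Split bolt://<username>:<password>@<hostname>:<port> into its parts.

    Returns a credential-free URI and a (username, password) tuple.
    """
    parts = urlsplit(url)
    if not (parts.scheme and parts.hostname and parts.username and parts.password):
        raise CredentialsError("Connection URL is missing a required component")
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return f"{parts.scheme}://{host}", (parts.username, parts.password)


# Explicit argument wins over the environment value.
picked = resolve_db_url(
    "bolt://neo4j:secret@localhost:7687",
    environ={ENV_VAR: "bolt://other:pw@elsewhere:7687"},
)
uri, auth = split_credentials(picked)
```

Passing the environment mapping as a parameter keeps the sketch easy to exercise without mutating `os.environ`; the real function may read the environment directly.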