Matching

class enzymm.jess_run.LogisticRegressionModel(coef: List[float], intercept: float, threshold: float)

Class for storing a Logistic Regression Model for predicting whether a match is correct.

f(Xn) = 1/(1+e^-(beta0 + beta1*x1 + beta2*x2 + …))

Trained on data for matches of a particular size at a particular pairwise distance.

coef

list of floats beta coefficients

Type:: List[float]

intercept

float beta0 intercept

Type:: float

threshold

float optimal threshold for this model

Type:: float

__call__(rmsd: float, orientation: float) → bool

Make a prediction with the Logistic Regression Model based on RMSD and residue orientation.

rmsd `float`: RMSD value of the match

orientation `float`: residue orientation of the match

Returns:: Whether the match is predicted as correct
Return type:: bool

__init__(coef: List[float], intercept: float, threshold: float) → None

property logit_threshold: float: Precomputed logit value of the threshold
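The relationship between the logistic function, the threshold and the precomputed logit can be illustrated with a small stand-alone sketch. It is not the enzymm implementation: the class name `LogisticSketch`, the feature order (RMSD first, orientation second), the `>=` decision rule and the coefficient values are all assumptions made for illustration.

```python
import math
from typing import List


class LogisticSketch:
    """Hypothetical stand-in for illustration only; not the enzymm class."""

    def __init__(self, coef: List[float], intercept: float, threshold: float) -> None:
        self.coef = coef
        self.intercept = intercept
        self.threshold = threshold

    @property
    def logit_threshold(self) -> float:
        # logit(p) = ln(p / (1 - p)); requires 0 < threshold < 1
        return math.log(self.threshold / (1.0 - self.threshold))

    def linear_predictor(self, rmsd: float, orientation: float) -> float:
        # beta0 + beta1*x1 + beta2*x2 (assumed order: rmsd, orientation)
        return self.intercept + self.coef[0] * rmsd + self.coef[1] * orientation

    def probability(self, rmsd: float, orientation: float) -> float:
        # Logistic function applied to the linear predictor
        return 1.0 / (1.0 + math.exp(-self.linear_predictor(rmsd, orientation)))

    def __call__(self, rmsd: float, orientation: float) -> bool:
        # Because the logistic function is monotonic, comparing the linear
        # predictor with logit(threshold) gives the same decision as comparing
        # the probability with the threshold, without evaluating exp().
        return self.linear_predictor(rmsd, orientation) >= self.logit_threshold


# Made-up coefficients, chosen only to make the example easy to follow.
model = LogisticSketch(coef=[-2.0, -1.0], intercept=3.0, threshold=0.5)
print(model(0.5, 0.5), model(2.0, 1.0))
```

Precomputing the logit once per model is presumably why the threshold is exposed as a property: each prediction then needs only a weighted sum and one comparison.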

class enzymm.jess_run.Match(hit: Hit, pairwise_distance: float, index: int = 0, complete: bool = False)

Class for storing annotated PyJess hits.

This class is a wrapper around pyjess.Hit and the original template object that was used for the query.

hit

Hit instance

Type:: pyjess._jess.Hit

pairwise_distance

float Pairwise distance at which this match was found

Type:: float

complete

bool If the query matched all other templates within the same cluster. Default False

Type:: bool

index

int internal index of this match. Default 0

Type:: int

Note

To get the matched atoms, iterate over Match.hit.atoms(transform: bool). To get the template atoms, iterate over Match.hit.template. To get the template residues, iterate over Match.hit.template.residues.

__init__(hit: Hit, pairwise_distance: float, index: int = 0, complete: bool = False) → None

property atom_triplets: List[Tuple[Atom, Atom, Atom]]

List of Atom triplets belonging to the same matched query residue.

Type:: list

dump(file: TextIO, header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full')

Dump the information associated with a Match to a ‘.tsv’ like line.

Parameters:

file – file-like object to write to
header – bool If a header line should be written too
kind – choose the results style: “full”, “simple” or “residue”

Note

Coordinate information is not written.

dump2pdb(file: TextIO, transform: bool = False)

Dump the 3D coordinates of the Match to a ‘.pdb’ file.

Parameters:

file – file-like object to write to
transform – bool If the matched atoms should be written to the template reference frame.

Note

By default, atoms are written in the coordinate reference frame of the query.

dump_query(file: TextIO, transform: bool = False)

Dump the 3D coordinates of the hit.molecule to a ‘.pdb’ file.

Parameters:

file – file-like object to write to
transform – bool If the atoms should be written to the template reference frame.

Note

By default, atoms are written in the coordinate reference frame of the query.

dump_template(file: TextIO, transform: bool = False)

Dump the template coordinates of the hit to a ‘.pdb’ file.

Parameters:

file – file-like object to write to
transform – bool If the atoms should be written to the query reference frame.

Note

By default, template atoms are written in the template reference frame.

dumps(header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full') → str

Dump Match to a .tsv like string. Calls BaseTable.dumps()

Parameters:

header – if a header line should be dumped to the string too.
kind – choose to return a “full”, “simple” or “residue” results line

get_identifying_attributes() → Tuple[int, int, int]: Returns a tuple of three ints (M-CSA id, cluster id, template dimension).

property match_vector_list: List[Vec3]

List of orientation vectors for each matched residue in the query

Type:: list of Vec3

property matched_residues: List[Tuple[str, str, str]]

List with information on all matched query residues. Each element is a tuple (residue_name, chain_id, residue_number).

Type:: list

property multimeric: bool

If the matched atoms in the query stem from multiple protein chains

Type:: bool
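A minimal sketch of how a multimeric check could be derived from the (residue_name, chain_id, residue_number) tuples described for matched_residues. The helper name `spans_multiple_chains` and the residue values are hypothetical; the property itself may be computed differently.

```python
from typing import List, Tuple


def spans_multiple_chains(matched_residues: List[Tuple[str, str, str]]) -> bool:
    """True if the residues carry more than one distinct chain id."""
    chain_ids = {chain_id for _name, chain_id, _number in matched_residues}
    return len(chain_ids) > 1


# Made-up residues: two from chain A, one from chain B.
residues = [("HIS", "A", "57"), ("ASP", "A", "102"), ("SER", "B", "195")]
print(spans_multiple_chains(residues))
```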

property orientation: float

The arithmetic mean, in radians, of the per-residue orientation angles for matched pairs of template and query residues

Type:: float
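As a sketch of what a mean orientation angle could look like numerically, the following pairs each template vector with the corresponding match vector, takes the angle between them and averages. It assumes both vector lists are already expressed in the same reference frame and uses plain tuples instead of Vec3; the function names are hypothetical and the library's own calculation may differ.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def angle_between(u: Vector, v: Vector) -> float:
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_orientation(
    template_vectors: Sequence[Vector], match_vectors: Sequence[Vector]
) -> float:
    """Arithmetic mean of the pairwise angles, in radians."""
    if len(template_vectors) != len(match_vectors) or not template_vectors:
        raise ValueError("need equally long, non-empty vector lists")
    angles = [angle_between(t, m) for t, m in zip(template_vectors, match_vectors)]
    return sum(angles) / len(angles)


# One perfectly aligned pair (0 rad) and one perpendicular pair (pi/2 rad).
template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
matched = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(mean_orientation(template, matched))
```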

property predicted_correct: bool | None

If the match is predicted as correct based on the ensemble model

Note

Returns None if no prediction could be made

Type:: bool | None

property preserved_resid_order: bool

If the residues in the template and in the matched query structure have the same relative order.

Note

This is a good filtering parameter but excludes hits on examples of convergent evolution or circular permutations

Note

Will always return False if either template or query is multimeric

Type:: bool
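One way to picture the relative-order check is to compare the sort order of the residue numbers on both sides. The sketch below assumes a single chain and that the i-th template residue is matched to the i-th query residue; the function names and residue numbers are hypothetical, and this is not necessarily how the property is implemented.

```python
from typing import List, Sequence


def rank_order(numbers: Sequence[int]) -> List[int]:
    """Indices that would sort the sequence (stable for ties)."""
    return sorted(range(len(numbers)), key=lambda i: numbers[i])


def same_relative_order(
    template_resnums: Sequence[int], query_resnums: Sequence[int]
) -> bool:
    """True if paired residues appear in the same sequence order on both sides."""
    if len(template_resnums) != len(query_resnums):
        raise ValueError("both sides must list the same number of residues")
    return rank_order(template_resnums) == rank_order(query_resnums)


# Same order along the chain: preserved.
print(same_relative_order([10, 50, 120], [33, 80, 200]))
# A circular permutation moves the last residue to the front: not preserved.
print(same_relative_order([10, 50, 120], [150, 190, 20]))
```

The second call shows why this filter rejects circular permutations: the spatial arrangement may be identical while the sequence order is rotated.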

property query_atom_count: int

The number of atoms in the query molecule

Type:: int

property query_residue_count: int

The number of residues in the query molecule

Type:: int

property template_vector_list: List[Vec3]

List of orientation vectors for each residue in the template

Type:: list of Vec3

class enzymm.jess_run.Matcher(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)

Class from which a query Molecule is matched to a list of Template.

__init__(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)

Initialize a Matcher instance

Parameters:

templates – list of Template to match
jess_params – dict Dictionary of PyJess parameters to apply. Will supersede defaults.
warn – bool If warnings about issues during matching should be printed. Default False
verbose – bool If progress statements on matching should be printed. Default False
skip_smaller_hits – bool Continue searching the query against smaller templates, after a match against any larger one was found. Default False
match_small_templates – bool If matches for Templates with fewer than 3 side-chain residues should be reported. Default False
cpus – int The number of cpus for multithreading. Default 2. If 0, use all. If <0 leave this number of threads free.
filter_matches – bool If matches should be filtered by whether they are predicted to be correct. Default True
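The rules for the cpus parameter (0 uses all, a negative value leaves that many threads free) can be sketched as follows. The helper `resolve_cpus` is hypothetical and not part of the module; keeping at least one worker and passing positive values through unchanged are assumptions.

```python
import os
from typing import Optional


def resolve_cpus(cpus: int, available: Optional[int] = None) -> int:
    """Translate the cpus argument into a concrete worker count."""
    total = available if available is not None else (os.cpu_count() or 1)
    if cpus == 0:
        # 0 means: use every available cpu.
        return total
    if cpus < 0:
        # Leave |cpus| threads free, but always keep at least one worker.
        return max(1, total + cpus)
    # Positive values are taken as requested (assumption: no capping).
    return cpus


# On a hypothetical 8-cpu machine, leaving 2 threads free yields 6 workers.
print(resolve_cpus(-2, available=8))
```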

Note

Default jess parameters depend on the size of the template:

_DEFAULT_JESS_PARAMS = {
{"rmsd": 2, "distance": 0.9, "max_dynamic_distance": 0.9},
{"rmsd": 2, "distance": 1.7, "max_dynamic_distance": 1.7},
{"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
{"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
{"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
{"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
}

run(molecules: List[Molecule]) → Dict[Molecule, List[Match]]

Run the matcher against a list of query Molecule to search.

Parameters:

molecules – list of Molecule to search

Returns:

Dictionary with query molecules as keys and lists of all found matches as values.

Return type:

dict of Molecule –> list of Match
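The dictionary returned by run() can be flattened into a single table using the documented Match.dump(file, header, kind) method. The helper below is a hypothetical sketch that writes the header only for the first match; it relies on duck typing, so any object with a compatible dump method works.

```python
from typing import Any, Dict, List, TextIO


def write_results(
    results: Dict[Any, List[Any]], file: TextIO, kind: str = "full"
) -> int:
    """Write every match as one line; return the number of matches written."""
    written = 0
    for matches in results.values():
        for match in matches:
            # Request the header line only together with the first match.
            match.dump(file, header=(written == 0), kind=kind)
            written += 1
    return written
```

A typical call would pass an open text file, for example `write_results(matcher.run(molecules), handle)`, where `matcher`, `molecules` and `handle` are placeholders for objects created elsewhere.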

run_single(molecule: Molecule) → List[Match]

Run the matcher against a single query Molecule.

Argument:: molecule: Molecule to search
Returns:: list of Match found for the query Molecule

verbose_print(*args): Make a print statement only in verbose mode

class enzymm.jess_run.ModelEnsemble(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float])

Ensemble of Models which each produce a binary prediction. The ensemble takes a majority vote.

ensemble

Dict[int, Dict[float, List[Callable[..., bool]]]] Dictionary keyed by template_effective_size; each value is a dictionary keyed by pairwise_distance that holds a list of callable models.

Type:: Dict[int, Dict[float, List[Callable[[…], bool]]]]

min_true_template_size

int Minimum effective size of a template to be considered always correct.

Type:: int

minimum_effective_size

int Smallest template_effective_size for which there are models. Smaller templates will be treated as if they had 3 residues.

Type:: int

Note

The ensemble dictionary should cover at least 3 and 4 residue matches at pairwise distances in the “usual” range, about 0.7 to 2.0 Å. The call method will raise an error otherwise.

__call__(*, template_effective_size: int, pairwise_distance: float, model_kwargs: Dict[str, float]) → bool | None

Make an ensemble prediction at a given template_effective_size and pairwise_distance

Parameters:

template_effective_size – int Number of side chain residues in the template
pairwise_distance – float Pairwise distance of the match
model_kwargs – Dict[str, float] Named floats passed as parameters to the individual models

Returns:

Whether the match is predicted correct or false by the ensemble model

Return type:

bool | None
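The majority vote can be sketched as a plain function over the nested ensemble dictionary. This is an illustration under stated assumptions, not the enzymm implementation: the function name `majority_vote` is hypothetical, pairwise_distance is looked up by exact key, a missing entry yields None, and a tie counts as not correct.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[..., bool]


def majority_vote(
    ensemble: Dict[int, Dict[float, List[Model]]],
    min_true_template_size: int,
    minimum_effective_size: int,
    *,
    template_effective_size: int,
    pairwise_distance: float,
    model_kwargs: Dict[str, float],
) -> Optional[bool]:
    # Templates at or above this size are considered always correct.
    if template_effective_size >= min_true_template_size:
        return True
    # Templates below the smallest modelled size fall back to that size.
    size = max(template_effective_size, minimum_effective_size)
    models = ensemble.get(size, {}).get(pairwise_distance, [])
    if not models:
        return None  # assumption: no models means no prediction
    votes = sum(1 for model in models if model(**model_kwargs))
    # Assumption: a strict majority is required; ties count as not correct.
    return votes * 2 > len(models)


# Three made-up models for 3-residue templates at a pairwise distance of 1.0.
toy_ensemble = {
    3: {
        1.0: [
            lambda rmsd, orientation: rmsd < 1.0,
            lambda rmsd, orientation: orientation < 0.5,
            lambda rmsd, orientation: rmsd + orientation < 1.2,
        ]
    }
}
print(
    majority_vote(
        toy_ensemble, 6, 3,
        template_effective_size=3,
        pairwise_distance=1.0,
        model_kwargs={"rmsd": 0.5, "orientation": 0.3},
    )
)
```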

__init__(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float]) → None

classmethod from_json(json_file: TextIO[str], model_cls: Callable) → ModelEnsemble: Build an ensemble model directly from an open JSON file

number_of_models(*, template_effective_size: int, pairwise_distance: float) → int

Number of models for a given template_effective_size and pairwise_distance

Parameters:

template_effective_size – int Number of side chain residues in the template
pairwise_distance – float Pairwise distance of the match

Returns:

Number of models

Return type:

int