Matching

class enzymm.jess_run.LogisticRegressionModel(coef: List[float], intercept: float, threshold: float)

Class for storing a Logistic Regression Model for predicting if a match is correct.

f(Xn) = 1/(1+e^-(beta0 + beta1*x1 + beta2*x2 + …))

Trained on data for matches with a particular size at a particular pairwise distance

coef

The beta coefficients (beta1, beta2, …), as a list of floats

Type:

List[float]

intercept

The beta0 intercept

Type:

float

threshold

The optimal threshold for this model

Type:

float

__call__(rmsd: float, orientation: float) bool

Make a prediction with the Logistic Regression Model based on RMSD and residue orientation.

Parameters:
  • rmsd – float. RMSD value of the match

  • orientation – float. Residue orientation of the match

Returns:

If the match is predicted as correct or not

Return type:

bool
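
As a minimal sketch of the decision rule described above (not the enzymm implementation), the following standalone function evaluates the logistic function for two features and compares the result with a threshold. The feature order (RMSD first, orientation second), the `>=` comparison, the function name and the numeric values are all assumptions made for illustration.

```python
import math
from typing import List


def predict_correct(coef: List[float], intercept: float, threshold: float,
                    rmsd: float, orientation: float) -> bool:
    """Evaluate 1 / (1 + e^-(beta0 + beta1*x1 + beta2*x2)) and threshold it."""
    # Assumed feature order: x1 = rmsd, x2 = orientation.
    z = intercept + coef[0] * rmsd + coef[1] * orientation
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold


# Hypothetical coefficients, chosen only to make the example concrete.
low_rmsd_call = predict_correct([-2.0, -1.5], 3.0, 0.5, rmsd=0.5, orientation=0.3)
high_rmsd_call = predict_correct([-2.0, -1.5], 3.0, 0.5, rmsd=2.0, orientation=1.0)
```

With negative coefficients such as these, a larger RMSD or orientation angle lowers the predicted probability, so the second call is less likely to pass the threshold than the first.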

__init__(coef: List[float], intercept: float, threshold: float) None
property logit_threshold: float

Precomputed logit value of the threshold
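
Precomputing the logit is useful because the logistic function is strictly increasing: comparing the probability with the threshold gives the same answer as comparing the linear term with logit(threshold) = ln(threshold / (1 - threshold)). The sketch below checks this equivalence numerically. The function names are hypothetical, and the property's actual implementation may differ.

```python
import math


def logit(p: float) -> float:
    """Inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """The logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


threshold = 0.7
logit_threshold = logit(threshold)  # computed once, reused for every prediction

for z in (-2.0, 0.0, 0.5, 1.0, 3.0):
    # Both comparisons agree; the second one needs no exp() call per match.
    assert (sigmoid(z) >= threshold) == (z >= logit_threshold)
```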

class enzymm.jess_run.Match(hit: Hit, pairwise_distance: float, index: int = 0, complete: bool = False)

Class for storing annotated PyJess hits.

This class is a wrapper around pyjess.Hit and the original template object that was used for the query.

hit

Hit instance

Type:

pyjess._jess.Hit

pairwise_distance

Pairwise distance at which this match was found

Type:

float

complete

Whether the query matched all other templates within the same cluster. Default False

Type:

bool

index

Internal index of this match. Default 0

Type:

int

Note

To get the matched atoms, iterate over Match.hit.atoms(transform: bool).

To get the template atoms, iterate over Match.hit.template.

To get the template residues, iterate over Match.hit.template.residues.
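
The access pattern in this note can be sketched with a small helper. It relies only on the attributes named in the note (`hit.atoms(...)`, `hit.template`, `hit.template.residues`); the helper name `summarize_match` is hypothetical, and the sketch assumes that each of the three objects can be iterated and that `transform` is accepted as a keyword argument.

```python
from typing import Any, Dict


def summarize_match(match: Any, transform: bool = False) -> Dict[str, int]:
    """Count the atoms and residues reachable from a Match-like object."""
    hit = match.hit
    return {
        # Matched query atoms; `transform` is forwarded as described in the note.
        "matched_atoms": sum(1 for _ in hit.atoms(transform=transform)),
        # Atoms of the template that produced the hit.
        "template_atoms": sum(1 for _ in hit.template),
        # Residues of that template.
        "template_residues": sum(1 for _ in hit.template.residues),
    }
```

Because the helper is duck-typed, it works with any object exposing these attributes, which also makes it easy to exercise with stand-in objects in tests.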

__init__(hit: Hit, pairwise_distance: float, index: int = 0, complete: bool = False) None
property atom_triplets: List[Tuple[Atom, Atom, Atom]]

List of Atom triplets belonging to the same matched query residue.

Type:

list

dump(file: TextIO, header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full')

Dump the information associated with a Match to a ‘.tsv’ like line.

Parameters:
  • file – file-like object to write to

  • header – bool. If a header line should be written too

  • kind – choose a “full” or a “simple” results style

Note

Coordinate information is not written.
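
A common use of `header` is to write the header line once and then append one line per match. The helper below sketches that pattern against the `dump` signature shown above. `write_matches` is a hypothetical name, and the objects passed in only need a `dump(file, header=..., kind=...)` method.

```python
import io
from typing import Any, Iterable, TextIO


def write_matches(matches: Iterable[Any], file: TextIO, kind: str = "full") -> int:
    """Write every match to `file`, emitting the header only before the first one."""
    count = 0
    for count, match in enumerate(matches, start=1):
        match.dump(file, header=(count == 1), kind=kind)
    return count  # number of matches written


# Any writable text file works; an in-memory buffer is used here.
buffer = io.StringIO()
written = write_matches([], buffer)  # nothing to write, so the buffer stays empty
```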

dump2pdb(file: TextIO, transform: bool = False)

Dump the 3D coordinates of the Match to a ‘.pdb’ file.

Parameters:
  • file – file-like object to write to

  • transform – bool. If the matched atoms should be written to the template reference frame.

Note

By default, atoms are written in the coordinate reference frame of the query.

dump_query(file: TextIO, transform: bool = False)

Dump the 3D coordinates of the hit.molecule to a ‘.pdb’ file.

Parameters:
  • file – file-like object to write to

  • transform – bool. If the atoms should be written to the template reference frame.

Note

By default, atoms are written in the coordinate reference frame of the query.

dump_template(file: TextIO, transform: bool = False)

Dump the template coordinates of the hit to a ‘.pdb’ file.

Parameters:
  • file – file-like object to write to

  • transform – bool. If the atoms should be written to the query reference frame.

Note

By default, template atoms are written in the template reference frame.

dumps(header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full') str

Dump Match to a .tsv like string. Calls BaseTable.dumps()

Parameters:
  • header – if a header line should be dumped to the string too.

  • kind – choose to return a “full” or a “simple” results line

get_identifying_attributes() Tuple[int, int, int]

Tuple of (int, int, int): M-CSA id, cluster id and template dimension.

property match_vector_list: List[Vec3]

List of orientation vectors for each matched residue in the query

Type:

list of Vec3

property matched_residues: List[Tuple[str, str, str]]

List with information on all matched query residues. Each element is a tuple (residue_name, chain_id, residue_number)

Type:

list

property multimeric: bool

If the matched atoms in the query stem from multiple protein chains

Type:

bool
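
One way to picture this property is in terms of the `matched_residues` tuples: the match is multimeric when those tuples cover more than one chain id. The function below is a sketch under that assumption. Its name is hypothetical, and the library may derive the value differently (for example, directly from the matched atoms).

```python
from typing import List, Tuple


def spans_multiple_chains(matched_residues: List[Tuple[str, str, str]]) -> bool:
    """True if the (residue_name, chain_id, residue_number) tuples cover >1 chain."""
    chain_ids = {chain_id for _name, chain_id, _number in matched_residues}
    return len(chain_ids) > 1
```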

property orientation: float

The arithmetic mean of per-residue orientation angles for matched pairs of template and query residues in radians

Type:

float
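
The averaging described for this property can be sketched as follows. Plain 3-tuples stand in for Vec3, and the per-residue angle is assumed to be the arccosine of the normalized dot product between the paired template and query vectors. Whether the library applies a transformation before comparing the vectors is not stated here, so the sketch takes the vectors as given.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def angle_between(u: Vector, v: Vector) -> float:
    """Angle between two 3D vectors in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_orientation(template_vectors: Sequence[Vector],
                     match_vectors: Sequence[Vector]) -> float:
    """Arithmetic mean of the per-residue angles between paired vectors."""
    angles = [angle_between(t, m) for t, m in zip(template_vectors, match_vectors)]
    return sum(angles) / len(angles)
```

For instance, one perfectly aligned pair (angle 0) and one perpendicular pair (angle pi/2) average to pi/4 radians.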

property predicted_correct: bool | None

If the match is predicted as correct based on the ensemble model

Note

Returns None if no prediction could be made

Type:

bool | None

property preserved_resid_order: bool

If the residues in the template and in the matched query structure have the same relative order.

Note

This is a good filtering parameter but excludes hits on examples of convergent evolution or circular permutations

Note

Will always return False if either template or query is multimeric

Type:

bool
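
A simple way to test for preserved relative order is to compare the rank order of the paired residue numbers. The sketch below does this for a single chain with distinct residue numbers; the function names are hypothetical, and the multimeric case (which the property reports as False) is not handled.

```python
from typing import List, Sequence


def _ranks(numbers: Sequence[int]) -> List[int]:
    """Position each element would take if the sequence were sorted."""
    order = sorted(range(len(numbers)), key=lambda i: numbers[i])
    ranks = [0] * len(numbers)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def same_relative_order(template_numbers: Sequence[int],
                        query_numbers: Sequence[int]) -> bool:
    """Compare the rank order of paired template and query residue numbers."""
    return _ranks(template_numbers) == _ranks(query_numbers)
```

Under this check, template residues 57, 102, 195 paired with query residues 40, 88, 160 keep their order, while a pairing with 88, 40, 160 does not, as could happen after a circular permutation.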

property query_atom_count: int

The number of atoms in the query molecule

Type:

int

property query_residue_count: int

The number of residues in the query molecule

Type:

int

property template_vector_list: List[Vec3]

List of orientation vectors for each residue in the template

Type:

list of Vec3

class enzymm.jess_run.Matcher(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)

Class from which a query Molecule is matched to a list of Template.

__init__(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)

Initialize a Matcher instance

Parameters:
  • templates – list of Template to match

  • jess_params – dict. Dictionary of PyJess parameters to apply. Will supersede defaults.

  • warn – bool. If warnings about issues during matching should be printed. Default False

  • verbose – bool. If progress statements on matching should be printed. Default False

  • skip_smaller_hits – bool. Continue searching the query against smaller templates after a match against any larger one was found. Default False

  • match_small_templates – bool. If matches for Templates with fewer than 3 side-chain residues should be reported. Default False

  • cpus – int. The number of CPUs for multithreading. If 0, use all. If <0, leave this number of threads free. The signature above sets the default to 2.

  • filter_matches – bool. If matches should be filtered by whether they are predicted to be correct. Default True
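
The `cpus` convention (0 means all, a negative value leaves that many threads free) can be sketched as below. This is an illustration rather than the library's code: the function name is hypothetical, `os.cpu_count()` is assumed as the source of the available count, and clamping the result to at least one thread is an added assumption.

```python
import os


def resolve_thread_count(cpus: int) -> int:
    """Translate the `cpus` argument into a concrete number of threads."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if cpus == 0:
        return available
    if cpus < 0:
        # Leave abs(cpus) threads free, but never drop below one thread.
        return max(1, available + cpus)
    return cpus
```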

Note

Default jess parameters depend on the size of the template:

_DEFAULT_JESS_PARAMS = {
    3: {"rmsd": 2, "distance": 0.9, "max_dynamic_distance": 0.9},
    4: {"rmsd": 2, "distance": 1.7, "max_dynamic_distance": 1.7},
    5: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    6: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    7: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    8: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
}
run(molecules: List[Molecule]) Dict[Molecule, List[Match]]

Run the matcher against a list of query Molecule to search.

Parameters:

molecules – list of Molecule to search

Returns:

Dictionary of query molecules as keys and all found matches as values.

Return type:

dict of Molecule –> list of Match

run_single(molecule: Molecule) List[Match]

Run the matcher against a single query Molecule.

Parameters:

molecule – Molecule to search

Returns:

list of Match found for the query Molecule

verbose_print(*args)

Make a print statement only in verbose mode

class enzymm.jess_run.ModelEnsemble(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float])

Ensemble of Models which each produce a binary prediction. The ensemble takes a majority vote.

ensemble

Dictionary keyed by template_effective_size whose values are dictionaries keyed by pairwise_distance, each holding a list of callable models.

Type:

Dict[int, Dict[float, List[Callable[[…], bool]]]]

min_true_template_size

Minimum effective size of a template to be considered always correct.

Type:

int

minimum_effective_size

Smallest template_effective_size for which there are models. Smaller templates will be treated as if they had 3 residues.

Type:

int

Note

The ensemble dictionary should cover at least 3 and 4 residue matches at pairwise distances in the “usual” range, about 0.7 to 2.0 Å. The call method will raise an error otherwise.

__call__(*, template_effective_size: int, pairwise_distance: float, model_kwargs: Dict[str, float]) bool | None

Make an ensemble prediction at a given template_effective_size and pairwise_distance

Parameters:
  • template_effective_size – int. Number of side chain residues in the template

  • pairwise_distance – float. Pairwise distance of the match

  • model_kwargs – Dict[str, float]. Named floats passed as parameters to the individual models

Returns:

Whether the match is predicted correct or false by the ensemble model

Return type:

bool | None
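
The majority vote over the nested `ensemble` dictionary can be sketched as follows. This is a simplified illustration, not the class's code: it returns None when no models are registered for the requested key (the class itself is documented to raise for insufficient coverage), it ignores `min_true_template_size` and `minimum_effective_size`, it looks up the pairwise distance by exact float key, and it assumes a strict majority is needed for a positive call.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[..., bool]


def majority_vote(ensemble: Dict[int, Dict[float, List[Model]]],
                  template_effective_size: int,
                  pairwise_distance: float,
                  model_kwargs: Dict[str, float]) -> Optional[bool]:
    """Return the majority prediction, or None when no model is available."""
    models = ensemble.get(template_effective_size, {}).get(pairwise_distance, [])
    if not models:
        return None
    # Every model receives the same named parameters, e.g. rmsd and orientation.
    votes = sum(1 for model in models if model(**model_kwargs))
    # Assumed tie-break: a strict majority is required for a positive call.
    return votes * 2 > len(models)
```

Any callable that accepts the named parameters and returns a bool can serve as a model here, which mirrors the `Callable[..., bool]` type in the signature.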

__init__(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float]) None
classmethod from_json(json_file: TextIO[str], model_cls: Callable) ModelEnsemble

Build an ensemble model directly from an open JSON file

number_of_models(*, template_effective_size: int, pairwise_distance: float) int

Number of models for a given template_effective_size and pairwise_distance

Parameters:
  • template_effective_size – int. Number of side chain residues in the template

  • pairwise_distance – float. Pairwise distance of the match

Returns:

Number of models

Return type:

int