Matching
- class enzymm.jess_run.LogisticRegressionModel(coef: List[float], intercept: float, threshold: float)
Class for storing a Logistic Regression Model for predicting whether a match is correct.
$f(\mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots)}}$
Each model is trained on data for matches of a particular size at a particular pairwise distance.
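To make the formula concrete, here is a minimal sketch of how a logistic regression classifier with coef, intercept and threshold turns a feature vector into a yes/no prediction. The class name SketchLogisticModel and its probability / __call__ methods are hypothetical, not the LogisticRegressionModel API, and the rule "probability >= threshold means correct" is an assumption.

```python
import math
from typing import List, Sequence


class SketchLogisticModel:
    """Hypothetical stand-in for a logistic regression match classifier.

    Assumption: a match is predicted correct when f(x) >= threshold.
    """

    def __init__(self, coef: List[float], intercept: float, threshold: float):
        self.coef = coef            # beta1, beta2, ...
        self.intercept = intercept  # beta0
        self.threshold = threshold

    def probability(self, features: Sequence[float]) -> float:
        if len(features) != len(self.coef):
            raise ValueError("expected one feature per coefficient")
        # Linear predictor: beta0 + beta1*x1 + beta2*x2 + ...
        z = self.intercept + sum(b * x for b, x in zip(self.coef, features))
        # Numerically stable logistic function (avoids overflow in exp)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def __call__(self, features: Sequence[float]) -> bool:
        return self.probability(features) >= self.threshold
```

The threshold need not be 0.5; raising it trades recall for precision on the "correct" class.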
- class enzymm.jess_run.Match(hit: Hit, pairwise_distance: float, index: int = 0, complete: bool = False)
Class for storing annotated PyJess hits.
This class is a wrapper around pyjess.Hit and the original template object that was used for the query.
- complete
Whether the query matched all other templates within the same cluster. Default False
- Type:
bool
Note
To get the matched atoms, iterate over Match.hit.atoms(transform: bool). To get the template atoms, iterate over Match.hit.template. To get the template residues, iterate over Match.hit.template.residues.
- property atom_triplets: List[Tuple[Atom, Atom, Atom]]
List of Atom triplets belonging to the same matched query residue.
- Type:
List[Tuple[Atom, Atom, Atom]]
- dump(file: TextIO, header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full')
Dump the information associated with a Match to a ‘.tsv’-like line.
- Parameters:
file – file-like object to write to
header – bool: whether a header line should also be written
kind – choose a “full”, “simple” or “residue” results style
Note
Coordinate information is not written.
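Because dump writes to any file-like object, many matches can share one TSV file. A minimal sketch, assuming dump(header=True) emits the header line before the data line; write_matches_tsv is a hypothetical helper, not part of enzymm:

```python
from typing import Any, Iterable, TextIO


def write_matches_tsv(matches: Iterable[Any], fh: TextIO, kind: str = "full") -> int:
    """Write one TSV line per match, with a single header line at the top.

    Returns the number of matches written.
    """
    count = 0
    for i, match in enumerate(matches):
        # Only the first call asks for the header line
        match.dump(fh, header=(i == 0), kind=kind)
        count += 1
    return count
```

Usage would look like `with open("matches.tsv", "w") as fh: write_matches_tsv(matches, fh)`.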
- dump2pdb(file: TextIO, transform: bool = False)
Dump the 3D coordinates of the Match to a ‘.pdb’ file.
- Parameters:
file – file-like object to write to
transform – bool: whether the matched atoms should be written in the template reference frame.
Note
By default, atoms are written in the coordinate reference frame of the query.
- dump_query(file: TextIO, transform: bool = False)
Dump the 3D coordinates of the hit.molecule to a ‘.pdb’ file.
- Parameters:
file – file-like object to write to
transform – bool: whether the atoms should be written in the template reference frame.
Note
By default, atoms are written in the coordinate reference frame of the query.
- dump_template(file: TextIO, transform: bool = False)
Dump the template coordinates of the hit to a ‘.pdb’ file.
- Parameters:
file – file-like object to write to
transform – bool: whether the atoms should be written in the query reference frame.
Note
By default, template atoms are written in the template reference frame.
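Following the notes on dump_query and dump_template, calling dump_query(transform=True) and dump_template() with its default should put both structures in the template frame, so the two files overlay in a molecular viewer. A minimal sketch; export_superposition is a hypothetical helper:

```python
from typing import Any


def export_superposition(match: Any, query_path: str, template_path: str) -> None:
    """Write query and template coordinates to two PDB files in one shared frame."""
    with open(query_path, "w") as fh:
        # Query atoms transformed into the template reference frame
        match.dump_query(fh, transform=True)
    with open(template_path, "w") as fh:
        # Template atoms left in their own (template) reference frame
        match.dump_template(fh, transform=False)
```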
- dumps(header: bool = False, kind: Literal['simple', 'full', 'residue'] = 'full') → str
Dump a Match to a ‘.tsv’-like string. Calls BaseTable.dumps().
- Parameters:
header – bool: whether a header line should also be dumped to the string.
kind – choose to return a “full”, “simple” or “residue” results line.
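The string returned by dumps(header=True) can be turned back into named columns with the standard csv module. A minimal sketch, assuming the output is tab-separated with the header on the first line; parse_match_tsv is a hypothetical helper and no particular column names are assumed:

```python
import csv
import io
from typing import Dict, List


def parse_match_tsv(text: str) -> List[Dict[str, str]]:
    """Parse a header + data TSV string into one dict per data row."""
    # QUOTE_NONE treats quote characters literally, as plain TSV expects
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```

For example, `parse_match_tsv(match.dumps(header=True))` would yield a one-element list keyed by whatever columns the chosen kind produces.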
- get_identifying_attributes() → Tuple[int, int, int]
tuple of (int, int, int): the M-CSA id, cluster id and template dimension.
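Since this tuple identifies the template behind a match, it works as a dictionary key, for instance to count matches per template. A minimal sketch; group_by_template is a hypothetical helper:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_by_template(matches: Iterable[Any]) -> Dict[Tuple[int, int, int], List[Any]]:
    """Group matches under their (M-CSA id, cluster id, template dimension) key."""
    groups: Dict[Tuple[int, int, int], List[Any]] = defaultdict(list)
    for match in matches:
        groups[match.get_identifying_attributes()].append(match)
    return dict(groups)
```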
- property match_vector_list: List[Vec3]
List of orientation vectors for each matched residue in the query.
- Type:
list of Vec3
- property matched_residues: List[Tuple[str, str, str]]
List with information on all matched query residues. Elements are tuple(residue_name, chain_id, residue_number).
- Type:
List[Tuple[str, str, str]]
- property multimeric: bool
Whether the matched atoms in the query stem from multiple protein chains.
- Type:
bool
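The tuples in matched_residues can be formatted as readable labels, and their chain ids give a quick sense of what multimeric means. A minimal sketch; residue_labels and spans_multiple_chains are hypothetical helpers, and the chain-count check only illustrates the idea (the real property inspects the matched atoms and may differ):

```python
from typing import List, Tuple

Residue = Tuple[str, str, str]  # (residue_name, chain_id, residue_number)


def residue_labels(matched_residues: List[Residue]) -> List[str]:
    """Format matched residues as 'NAME CHAIN NUMBER' strings."""
    return [f"{name} {chain} {number}" for name, chain, number in matched_residues]


def spans_multiple_chains(matched_residues: List[Residue]) -> bool:
    """True if the matched residues come from more than one chain."""
    return len({chain for _, chain, _ in matched_residues}) > 1
```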
- property orientation: float
The arithmetic mean, in radians, of the per-residue orientation angles for matched pairs of template and query residues.
- Type:
float
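The mean-angle idea can be sketched as the angle between each paired template and query vector, averaged over all pairs. This assumes plain 3-tuples in place of Vec3 and the standard dot-product angle; how enzymm derives the per-residue vectors is not shown here, and angle_between / mean_orientation are hypothetical names:

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def angle_between(u: Vec, v: Vec) -> float:
    """Angle between two 3D vectors in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_orientation(template_vecs: Sequence[Vec], query_vecs: Sequence[Vec]) -> float:
    """Arithmetic mean of the per-pair angles, in radians."""
    if not template_vecs or len(template_vecs) != len(query_vecs):
        raise ValueError("need a non-empty, equal number of template and query vectors")
    angles = [angle_between(t, q) for t, q in zip(template_vecs, query_vecs)]
    return sum(angles) / len(angles)
```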
- property predicted_correct: bool | None
Whether the match is predicted to be correct by the ensemble model.
Note
Returns None if no prediction could be made.
- Type:
bool | None
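Because predicted_correct is tri-state, a filter has to decide what to do with None. A minimal sketch; keep_predicted_correct and its keep_unknown flag are hypothetical, and how Matcher's own filter_matches treats None is not stated here:

```python
from typing import Any, Iterable, List


def keep_predicted_correct(matches: Iterable[Any], keep_unknown: bool = False) -> List[Any]:
    """Keep matches predicted correct; matches without a prediction are optional."""
    kept = []
    for match in matches:
        prediction = match.predicted_correct
        # None means no prediction could be made
        keep = keep_unknown if prediction is None else bool(prediction)
        if keep:
            kept.append(match)
    return kept
```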
- property preserved_resid_order: bool
If the residues in the template and in the matched query structure have the same relative order.
Note
This is a good filtering parameter, but it excludes hits on examples of convergent evolution or circular permutations.
Note
Will always return False if either template or query is multimeric.
- Type:
bool
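The order check can be illustrated by sorting the paired residues by residue number in each structure and comparing the resulting orderings. A minimal sketch, assuming integer residue numbers and that template_numbers[i] and query_numbers[i] belong to the same matched pair; same_relative_order is a hypothetical helper, not the property's implementation:

```python
from typing import Sequence


def same_relative_order(template_numbers: Sequence[int], query_numbers: Sequence[int],
                        multimeric: bool = False) -> bool:
    """Check whether paired residues appear in the same relative order."""
    # Mirrors the documented rule for multimeric template or query
    if multimeric:
        return False
    if len(template_numbers) != len(query_numbers):
        raise ValueError("residue lists must be paired")
    # Indices sorted by residue number; equal orderings mean the same relative order
    t_order = sorted(range(len(template_numbers)), key=lambda i: template_numbers[i])
    q_order = sorted(range(len(query_numbers)), key=lambda i: query_numbers[i])
    return t_order == q_order
```

A circular permutation, where a residue near the query C-terminus pairs with one near the template N-terminus, fails this check even if the geometry matches well.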
- class enzymm.jess_run.Matcher(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)
Class with which a query Molecule is matched against a list of Template objects.
- __init__(templates: Sequence[Template], jess_params: Dict[int, Dict[str, float]] | None = None, warn: bool = False, verbose: bool = False, skip_smaller_hits: bool = False, match_small_templates: bool = False, cpus: int = 2, filter_matches: bool = True, console: Console | None = None)
Initialize a Matcher instance.
- Parameters:
templates – list of Template to match
jess_params – dict: dictionary of PyJess parameters to apply. These supersede the defaults.
warn – bool: whether warnings about issues during matching should be printed. Default False
verbose – bool: whether progress statements on matching should be printed. Default False
skip_smaller_hits – bool: continue searching the query against smaller templates after a match against any larger one was found. Default False
match_small_templates – bool: whether matches for Templates with fewer than 3 side-chain residues should be reported. Default False
cpus – int: the number of CPUs for multithreading. If 0, use all; if < 0, leave this number of threads free. Default 2
filter_matches – bool: whether matches should be filtered by their predicted correctness. Default True
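The cpus convention above (0 means all, negative means leave that many free) can be expressed as a small resolver. A minimal sketch; resolve_cpus is hypothetical, and clamping the result to at least one thread is an assumption, not documented behaviour:

```python
import os
from typing import Optional


def resolve_cpus(cpus: int, available: Optional[int] = None) -> int:
    """Turn the cpus convention into a concrete thread count."""
    if available is None:
        available = os.cpu_count() or 1
    if cpus == 0:
        n = available            # use all CPUs
    elif cpus < 0:
        n = available + cpus     # leave abs(cpus) threads free
    else:
        n = cpus                 # explicit thread count
    return max(1, n)             # assumption: never fewer than one thread
```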
Note
Default jess parameters depend on the size of the template:
_DEFAULT_JESS_PARAMS = {
    3: {"rmsd": 2, "distance": 0.9, "max_dynamic_distance": 0.9},
    4: {"rmsd": 2, "distance": 1.7, "max_dynamic_distance": 1.7},
    5: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    6: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    7: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
    8: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0},
}
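One way to combine these per-size defaults with a user-supplied jess_params dictionary is a key-by-key merge. This is a sketch under an assumption: the documentation only says user values supersede the defaults, not whether a per-size dictionary replaces or is merged into the default one. DEFAULT_JESS_PARAMS below copies the documented values, and resolve_jess_params is hypothetical:

```python
from typing import Dict, Optional

# Copy of the documented per-template-size defaults
DEFAULT_JESS_PARAMS: Dict[int, Dict[str, float]] = {
    3: {"rmsd": 2, "distance": 0.9, "max_dynamic_distance": 0.9},
    4: {"rmsd": 2, "distance": 1.7, "max_dynamic_distance": 1.7},
    **{n: {"rmsd": 2, "distance": 2.0, "max_dynamic_distance": 2.0} for n in range(5, 9)},
}


def resolve_jess_params(
    template_size: int,
    overrides: Optional[Dict[int, Dict[str, float]]] = None,
) -> Dict[str, float]:
    """Parameters for one template size, with user overrides applied key by key."""
    # Copy so the defaults are never mutated
    params = dict(DEFAULT_JESS_PARAMS.get(template_size, {}))
    if overrides and template_size in overrides:
        params.update(overrides[template_size])
    if not params:
        raise KeyError(f"no Jess parameters for template size {template_size}")
    return params
```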
- verbose_print(*args)
Print only in verbose mode.
- class enzymm.jess_run.ModelEnsemble(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float])
Ensemble of models that each produce a binary prediction. The ensemble takes a majority vote.
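A majority vote over binary models can be sketched as counting True votes and requiring more than half. Passing features as keyword arguments mirrors the model_kwargs: Dict[str, float] argument of __call__, but majority_vote is hypothetical and treating a tie as False is an assumption:

```python
from typing import Callable, Sequence


def majority_vote(models: Sequence[Callable[..., bool]], **features: float) -> bool:
    """True if strictly more than half of the models predict True (ties count as False)."""
    if not models:
        raise ValueError("need at least one model")
    votes = sum(1 for model in models if model(**features))
    return votes * 2 > len(models)
```

An odd number of models per cell avoids ties entirely.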
- ensemble
Dictionary mapping each template_effective_size to a dictionary mapping each pairwise_distance to a list of callable models.
- Type:
Dict[int, Dict[float, List[Callable[..., bool]]]]
- min_true_template_size
Minimum effective size of a template for its matches to be considered always correct.
- Type:
int
- minimum_effective_size
Smallest template_effective_size for which there are models. Smaller templates will be treated as if they had 3 residues.
- Type:
int
Note
The ensemble dictionary should cover at least 3- and 4-residue matches at pairwise distances in the “usual” range of about 0.7 to 2.0 Å. The call method will raise an error otherwise.
- __call__(*, template_effective_size: int, pairwise_distance: float, model_kwargs: Dict[str, float]) → bool | None
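Make an ensemble prediction at a given template_effective_size and pairwise_distance.
- Parameters:
template_effective_size – int
pairwise_distance – float
model_kwargs – Dict[str, float]
- Returns:
Whether the match is predicted correct or false by the ensemble model
- Return type:
bool | None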
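The documented attributes suggest a lookup flow: always-correct templates short-circuit, templates below minimum_effective_size fall back to the 3-residue models, and the models for one cell vote. The sketch below follows that flow, but ensemble_predict is hypothetical, and two details are assumptions: picking the models trained at the nearest pairwise distance, and returning None when no models exist for a size (the real __call__ may raise instead, per the note above).

```python
from typing import Callable, Dict, List, Optional

Models = List[Callable[..., bool]]


def ensemble_predict(
    ensemble: Dict[int, Dict[float, Models]],
    template_effective_size: int,
    pairwise_distance: float,
    model_kwargs: Dict[str, float],
    min_true_template_size: int,
    minimum_effective_size: int,
) -> Optional[bool]:
    # Large templates are considered always correct
    if template_effective_size >= min_true_template_size:
        return True
    # Templates below the smallest modelled size are treated as 3-residue templates
    size = 3 if template_effective_size < minimum_effective_size else template_effective_size
    by_distance = ensemble.get(size)
    if not by_distance:
        return None  # assumption: no models for this size means no prediction
    # Assumption: use the models trained at the closest pairwise distance
    nearest = min(by_distance, key=lambda d: abs(d - pairwise_distance))
    models = by_distance[nearest]
    if not models:
        return None
    # Majority vote; a tie counts as False
    votes = sum(1 for model in models if model(**model_kwargs))
    return votes * 2 > len(models)
```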
- __init__(ensemble: Dict[int, Dict[float, List[Callable[[...], bool]]]], min_true_template_size: int, minimum_effective_size: int, pairwise_distances: List[float]) → None
- classmethod from_json(json_file: TextIO[str], model_cls: Callable) → ModelEnsemble
Build an ensemble model directly from an open JSON file.