`biomolecule`

The biomolecule object used in PDB2PQR and associated methods.

Todo

This module should be broken into separate files.

Authors: Todd Dolinsky, Yong Huang

class pdb2pqr.biomolecule.Biomolecule(pdblist, definition)[source]

Biomolecule class.

This class represents the parsed PDB and provides a hierarchy of information: each Biomolecule object contains a list of Chain objects as provided in the PDB file. Each Chain contains its associated list of Residue objects, and each Residue contains a list of Atom objects, completing the hierarchy.
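The nesting described above can be illustrated with a minimal sketch. The `Atom`, `Residue`, and `Chain` classes below are hypothetical stand-ins written for this example, not the PDB2PQR classes, and `iter_atoms` is an assumed helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:  # hypothetical stand-in, not pdb2pqr's Atom
    name: str


@dataclass
class Residue:  # hypothetical stand-in
    name: str
    atoms: list = field(default_factory=list)


@dataclass
class Chain:  # hypothetical stand-in
    chain_id: str
    residues: list = field(default_factory=list)


def iter_atoms(chains):
    """Walk chain -> residue -> atom and yield one tuple per atom."""
    for chain in chains:
        for residue in chain.residues:
            for atom in residue.atoms:
                yield chain.chain_id, residue.name, atom.name


chains = [
    Chain("A", [Residue("GLY", [Atom("N"), Atom("CA")]),
                Residue("ALA", [Atom("N")])]),
    Chain("B", [Residue("SER", [Atom("OG")])]),
]
flat = list(iter_atoms(chains))
```

Flattening the hierarchy this way is presumably similar in spirit to what the `atoms` property returns, although the real property returns Atom objects rather than tuples.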

__init__(pdblist, definition)[source]

Initialize using a parsed PDB file.

Parameters:

pdblist (list) – list of record objects parsed from the lines of a PDB file
definition (Definition) – topology definition object

add_hydrogens(hlist=None)[source]

Add the hydrogens to the biomolecule.

This requires either the rebuild_tetrahedral function for tetrahedral geometries or the standard quatfit methods. These methods use three nearby bonds to rebuild the atom; the closer the bonds, the more accurate the results. For that reason, the peptide bonds are used when available.

apply_force_field(forcefield_)[source]

Apply the forcefield to the atoms within the biomolecule.

Parameters: forcefield_ (Forcefield) – forcefield object
Returns: (list of atoms that were found in the forcefield, list of atoms that were not found in the forcefield)
Return type: (list, list)
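The shape of the returned pair can be sketched as a simple partition. This is an illustration under assumptions, not the PDB2PQR implementation: `partition_by_forcefield` is a hypothetical helper, atoms are represented as `(residue name, atom name)` tuples, and the parameter values are made-up numbers.

```python
def partition_by_forcefield(atom_keys, parameters):
    """Split atoms into (found, missing) by lookup in a parameter table."""
    found, missing = [], []
    for key in atom_keys:
        if key in parameters:
            found.append(key)
        else:
            missing.append(key)
    return found, missing


# Hypothetical table: (residue name, atom name) -> (charge, radius).
# The numbers are placeholders, not real force field parameters.
parameters = {("ALA", "CA"): (0.1, 1.9), ("ALA", "N"): (-0.4, 1.8)}
atom_keys = [("ALA", "N"), ("ALA", "CA"), ("LIG", "C1")]

hits, misses = partition_by_forcefield(atom_keys, parameters)
```

Here the ligand atom ends up in `misses` because the table has no entry for it, which mirrors the second element of the documented return value.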

apply_name_scheme(forcefield_)[source]

Apply the naming scheme of the given forcefield.

Parameters: forcefield_ (Forcefield) – forcefield object

apply_patch(patchname: str, residue: Residue)[source]

Apply a patch to the given residue.

This is one of the key functions in PDB2PQR. A similar function appears in definitions; that version is needed for residue-level substitutions so that certain protonation states (e.g., CYM, HSE) are detectable on input.

This version looks up the particular patch name in the patch_map stored in the biomolecule, and then applies the various commands to the reference and actual residue structures.

See the inline comments for a more detailed explanation.

Parameters:

patchname (str) – the name of the patch
residue (Residue) – the residue to apply the patch to

apply_pka_values(force_field, ph, pkadic)[source]

Apply calculated pKa values to assign titration states.

Parameters:

force_field (str) – force field name (determines naming of residues)
ph (float) – pH value
pkadic (dict) – dictionary of pKa values for residues
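The general idea behind assigning titration states from pKa values is that a site is predominantly protonated when the pH is below its pKa. The sketch below shows only that comparison. The function name, the key format of the dictionary, and the pKa numbers are assumptions made for illustration; the actual method also has to map each outcome to a force-field-specific residue name.

```python
def titration_states(pkas, ph):
    """Label each site as protonated when pH < pKa, else deprotonated."""
    return {
        site: ("protonated" if ph < pka else "deprotonated")
        for site, pka in pkas.items()
    }


# Hypothetical keys and illustrative pKa values.
pkas = {"ASP 12 A": 3.9, "HIS 40 A": 6.5, "LYS 77 A": 10.4}
states = titration_states(pkas, 7.0)
```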

assign_termini(chain, *, neutraln=False, neutralc=False)[source]

Assign the termini for the given chain.

Assignment made by looking at the start and end residues.

Parameters:

chain (Chain) – chain of biomolecule
neutraln (bool) – indicate whether to neutralize N-terminus
neutralc (bool) – indicate whether to neutralize C-terminus

property atoms

Return all Atom objects in list format.

Returns: all atom objects
Return type: [Atom]

calculate_dihedral_angles()[source]: Calculate dihedral angles for every residue in the biomolecule.

property charge

Get the total charge on the biomolecule.

Todo

Since the misslist is used to identify incorrect charge assignments, this routine does not list the 3' and 5' termini of nucleic acid chains as having non-integer charge even though they are (correctly) non-integer.

Returns: (list of residues with non-integer charges, the total charge on the biomolecule)
Return type: (list, float)
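A minimal sketch of how such a pair could be computed, assuming per-residue charges are already known. `summarize_charge`, the residue labels, the tolerance, and the charge values are all hypothetical, and the special handling of nucleic acid termini mentioned above is not modeled.

```python
def summarize_charge(residue_charges, tol=0.001):
    """Return (labels of residues with non-integer charge, total charge)."""
    misslist = []
    total = 0.0
    for label, charge in residue_charges:
        total += charge
        # A charge counts as non-integer if it is further than tol
        # from the nearest whole number.
        if abs(charge - round(charge)) > tol:
            misslist.append(label)
    return misslist, total


# Illustrative input only.
residue_charges = [("ASP 3", -1.0), ("LYS 7", 1.0), ("LIG 90", 0.25)]
misslist, total = summarize_charge(residue_charges)
```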

create_html_typemap(definition, outfilename)[source]

Create an HTML typemap file at the desired location.

If a type cannot be found for an atom a blank is listed.

Parameters:

definition (Definition) – the definition object
outfilename (str) – the name of the file to write

create_residue(residue, resname)[source]

Create a residue object.

If the resname is a known residue type, try to make that specific object, otherwise just make a standard residue object.

Parameters:

residue (list) – a list of atoms
resname (str) – the name of the residue

Returns:

the residue object

Return type:

Residue

hold_residues(hlist)[source]

Set fixed state of specified residues.

Parameters: hlist ([(str, str, str)]) – list of (res_seq, chainid, ins_code) tuples specifying the residues whose fixed state should be altered.
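The expected shape of `hlist` is easiest to see with a small example. The sketch below uses plain dictionaries as hypothetical stand-ins for residues and a hypothetical `hold` function; returning the unmatched entries is a choice made for this illustration, not documented behavior.

```python
def hold(residues, hlist):
    """Mark residues listed in hlist as fixed; return entries that matched nothing."""
    unmatched = set(hlist)
    for res in residues:
        key = (res["res_seq"], res["chain_id"], res["ins_code"])
        if key in unmatched or key in hlist:
            res["fixed"] = True
            unmatched.discard(key)
    return sorted(unmatched)


residues = [
    {"res_seq": "12", "chain_id": "A", "ins_code": "", "fixed": False},
    {"res_seq": "13", "chain_id": "A", "ins_code": "", "fixed": False},
]
# Each entry is (res_seq, chainid, ins_code), all strings.
hlist = [("12", "A", ""), ("99", "B", "")]
leftover = hold(residues, hlist)
```

Note that every element of a tuple is a string, including the residue sequence number, and that an empty string stands for a missing insertion code in this sketch.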

property num_bio_atoms

Return the number of ATOM (not HETATM) records in the biomolecule.

Returns: number of ATOM records
Return type: int

property num_heavy

Return number of biomolecular heavy atoms in structure.

Todo

Figure out if this is redundant with Biomolecule.num_bio_atoms.

Note

Includes hydrogens (but those are stripped off eventually)

Returns: number of heavy atoms
Return type: int

property num_missing_heavy

Return number of missing biomolecular heavy atoms in structure.

Returns: number of missing heavy atoms in structure
Return type: int

property patch_map

Return definition patch maps.

Returns: definition patch maps
Return type: list

property reference_map

Return definition reference map.

Returns: definition reference map
Return type: dict

remove_hydrogens()[source]: Remove hydrogens from the biomolecule.

repair_heavy()[source]

Repair all heavy atoms.

Unfortunately, the first time we get to an atom we might not be able to rebuild it, because it might depend on other atoms that have to be rebuilt first (think side chains). For this reason, a 'seenmap' is used to keep track of which atoms have already been seen and how many attempts have been made to rebuild each of them.

Raises: ValueError – missing atoms prevent reconstruction
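The retry idea can be sketched as a queue of missing atoms with a per-atom attempt counter. This is a simplified illustration, not the PDB2PQR code: `rebuild_order`, the dependency dictionary, and the retry limit are assumptions, and no coordinates are actually computed.

```python
from collections import deque


def rebuild_order(missing, known):
    """Return an order in which missing atoms can be rebuilt.

    missing maps each missing atom name to the set of atom names it needs;
    known is the set of atom names already present.
    """
    placed = set(known)
    queue = deque(missing)
    seen = {}  # atom name -> number of failed attempts so far
    order = []
    while queue:
        atom = queue.popleft()
        if missing[atom] <= placed:
            # All prerequisites exist, so this atom can be rebuilt now.
            placed.add(atom)
            order.append(atom)
            continue
        seen[atom] = seen.get(atom, 0) + 1
        if seen[atom] > len(missing):
            raise ValueError(f"too few atoms present to rebuild {atom}")
        queue.append(atom)  # try again after the others
    return order


# CG needs CB, which itself is missing and needs CA.
order = rebuild_order({"CG": {"CB"}, "CB": {"CA"}}, {"N", "CA", "C"})
```

In this sketch an atom whose prerequisites can never be satisfied is retried until its counter exceeds the number of missing atoms, at which point a `ValueError` is raised, matching the documented failure mode.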

reserialize()[source]: Generate new serial numbers for atoms in the biomolecule.

set_donors_acceptors()[source]: Set the donors and acceptors within the biomolecule.

set_hip()[source]: Set all HIS states to HIP.

set_reference_distance()[source]

Set the distance to the CA atom in the residue.

This is necessary for determining which atoms are allowed to move during rotations. Uses the shortest_path() algorithm found in utilities.

Raises: ValueError – if shortest path cannot be found (e.g., if the atoms are not connected)
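As an illustration of the idea, the sketch below counts bonds along the shortest path from an atom to CA using a breadth-first search. Breadth-first search is swapped in here for clarity and is not necessarily how `shortest_path()` in utilities is implemented; `bond_distance` and the bond table are hypothetical.

```python
from collections import deque


def bond_distance(bonds, start, goal="CA"):
    """Return the number of bonds on the shortest path from start to goal."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for neighbor in bonds.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    raise ValueError(f"no bonded path from {start} to {goal}")


# Hypothetical bond table for a single residue; "XX" is unbonded.
bonds = {
    "N": ["CA"], "CA": ["N", "C", "CB"], "C": ["CA", "O"], "O": ["C"],
    "CB": ["CA", "CG"], "CG": ["CB", "CD"], "CD": ["CG"], "XX": [],
}
cd_distance = bond_distance(bonds, "CD")
```

An atom with no bonded route to CA, such as `"XX"` above, triggers the `ValueError` branch.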

set_states()[source]

Set the state of each residue.

This is the last step before assigning the forcefield, but is necessary so as to distinguish between various protonation states.

See the aa module for residue-specific functions.

set_termini(*, neutraln=False, neutralc=False)[source]

Set the termini for a protein.

First set all known termini by looking at the ends of the chain. Then examine each residue, looking for internal chain breaks.

Todo

This function needs to be cleaned and simplified.

Parameters:

neutraln (bool) – indicate whether N-terminus is neutral
neutralc (bool) – indicate whether C-terminus is neutral
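One common way to look for internal chain breaks is to check whether the C atom of one residue is close enough to the N atom of the next to form a peptide bond. The sketch below shows that check under stated assumptions: the 1.7 Å cutoff, the `find_breaks` name, and the dictionary layout are chosen for illustration and may differ from what the method actually uses.

```python
import math

PEPTIDE_DIST = 1.7  # assumed cutoff in angstroms, for illustration only


def find_breaks(residues, cutoff=PEPTIDE_DIST):
    """Return indices i where residue i and residue i+1 are not peptide-bonded."""
    breaks = []
    for i, (prev, nxt) in enumerate(zip(residues, residues[1:])):
        if math.dist(prev["C"], nxt["N"]) > cutoff:
            breaks.append(i)
    return breaks


# Made-up coordinates: the gap between the second and third residue is large.
residues = [
    {"N": (-1.5, 0.0, 0.0), "C": (0.0, 0.0, 0.0)},
    {"N": (1.33, 0.0, 0.0), "C": (3.8, 0.0, 0.0)},
    {"N": (9.0, 0.0, 0.0), "C": (11.5, 0.0, 0.0)},
]
breaks = find_breaks(residues)
```

Each reported index marks a position where new termini would need to be assigned on either side of the gap.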

update_bonds()[source]

Update the bonding network of the biomolecule.

This happens in 3 steps:

1. Apply the PEPTIDE patch to all Amino residues to add reference for the N(i+1) and C(i-1) atoms
2. Call update_internal_bonds() for inter-residue linking
3. Set the links to the N(i+1) and C(i-1) atoms
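The last step, linking each residue to the N of its successor and the C of its predecessor, can be sketched as a pass over consecutive pairs. The dictionary keys `peptide_n` and `peptide_c` and the function name are hypothetical choices for this example, and chain breaks are ignored.

```python
def link_peptide_neighbors(residues):
    """Give residue i a link to N(i+1) and residue i+1 a link to C(i)."""
    for current, following in zip(residues, residues[1:]):
        current["peptide_n"] = (following["id"], "N")
        following["peptide_c"] = (current["id"], "C")
    return residues


residues = [{"id": "GLY 1"}, {"id": "ALA 2"}, {"id": "SER 3"}]
link_peptide_neighbors(residues)
```

The first residue ends up without a `peptide_c` link and the last without a `peptide_n` link, which is where terminal handling would take over.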

update_internal_bonds()[source]

Update the internal bonding network.

Update using the reference objects in each atom.

update_ss_bridges()[source]: Check and set SS-bridge partners.
