# biomolecule

The biomolecule object used in PDB2PQR and associated methods.

Todo

This module should be broken into separate files.

Authors: Todd Dolinsky, Yong Huang

class pdb2pqr.biomolecule.Biomolecule(pdblist, definition)

Biomolecule class.

This class represents the parsed PDB, and provides a hierarchy of information - each Biomolecule object contains a list of Chain objects as provided in the PDB file. Each Chain then contains its associated list of Residue objects, and each Residue contains a list of Atom objects, completing the hierarchy.
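The containment hierarchy described above can be pictured with a minimal sketch. The dataclasses below are hypothetical stand-ins written for illustration, not the pdb2pqr classes; their names, fields, and the `all_atoms` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str


@dataclass
class Residue:
    name: str
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Molecule:
    # Hypothetical stand-in for Biomolecule: it owns a list of chains only.
    chains: List[Chain] = field(default_factory=list)

    def all_atoms(self) -> List[Atom]:
        # Walk chain -> residue -> atom and flatten into one list.
        return [
            atom
            for chain in self.chains
            for residue in chain.residues
            for atom in residue.atoms
        ]


gly = Residue("GLY", [Atom("N"), Atom("CA"), Atom("C"), Atom("O")])
ala = Residue("ALA", [Atom("N"), Atom("CA"), Atom("C"), Atom("O"), Atom("CB")])
molecule = Molecule([Chain("A", [gly, ala])])
atom_names = [atom.name for atom in molecule.all_atoms()]
```

Flattening the hierarchy on demand, as `all_atoms` does here, keeps a single source of truth for atom membership.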

__init__(pdblist, definition)

Initialize using parsed PDB file.

Parameters:
- pdblist (list) – list of objects from the pdb module, parsed from lines of a PDB file
- definition (Definition) – topology definition object

add_hydrogens()

Add the hydrogens to the biomolecule.

This requires either the rebuild_tetrahedral function for tetrahedral geometries or the standard quatfit methods. These methods use three nearby bonds to rebuild the atom; the closer the bonds, the more accurate the results. As such the peptide bonds are used when available.

apply_force_field(forcefield_)

Apply the forcefield to the atoms within the biomolecule.

Parameters: forcefield_ (Forcefield) – forcefield object

Returns: (list of atoms that were found in the forcefield, list of atoms that were not found in the forcefield)
Return type: (list, list)

apply_name_scheme(forcefield_)

Apply the naming scheme of the given forcefield.

Parameters: forcefield_ (Forcefield) – forcefield object

apply_patch(patchname, residue)

Apply a patch to the given residue.

This is one of the key functions in PDB2PQR. A similar function appears in definitions; that version is needed for residue-level substitutions so that certain protonation states (e.g., CYM, HSE) are detectable on input.

This version looks up the particular patch name in the patch_map stored in the biomolecule, and then applies the various commands to the reference and actual residue structures.

See the inline comments for a more detailed explanation.

Parameters:
- patchname (str) – the name of the patch
- residue (Residue) – the residue to apply the patch to

apply_pka_values(force_field, ph, pkadic)

Apply calculated pKa values to assign titration states.

Parameters:
- force_field (str) – force field name (determines naming of residues)
- ph (float) – pH value
- pkadic (dict) – dictionary of pKa values for residues
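As a rough illustration of how a pKa and a pH can be turned into a titration state, the sketch below uses the usual rule that the protonated form dominates when the pH is below the pKa. The helper names, the dictionary key format, and the pKa values are assumptions made for this example; they are not taken from pdb2pqr, and the real method also handles residue renaming for the chosen force field.

```python
def is_protonated(pka: float, ph: float) -> bool:
    # Henderson-Hasselbalch: the protonated form dominates when pH < pKa.
    return ph < pka


def titration_states(pkadic: dict, ph: float) -> dict:
    # Map each residue key to True (protonated) or False (deprotonated).
    return {key: is_protonated(pka, ph) for key, pka in pkadic.items()}


# Hypothetical keys and illustrative pKa values, not results of a calculation.
example_pkas = {"ASP 25 A": 3.9, "HIS 57 A": 6.5, "LYS 12 A": 10.4}
states = titration_states(example_pkas, ph=7.0)
```

At pH 7.0, only the lysine in this example remains protonated.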
assign_termini(chain, neutraln=False, neutralc=False)

Assign the termini for the given chain.

Assignment made by looking at the start and end residues.

Parameters:
- chain (Chain) – chain of biomolecule
- neutraln (bool) – indicate whether to neutralize N-terminus
- neutralc (bool) – indicate whether to neutralize C-terminus

atoms

Return all Atom objects in list format.

Returns: all atom objects
Return type: [Atom]

calculate_dihedral_angles()

Calculate dihedral angles for every residue in the biomolecule.

charge

Get the total charge on the biomolecule, along with the residues that have non-integer charges.

Todo

Since the misslist is used to identify incorrect charge assignments, this routine does not list the 3' and 5' termini of nucleic acid chains as having non-integer charge even though they are (correctly) non-integer.

Returns: (list of residues with non-integer charges, the total charge on the biomolecule)
Return type: (list, float)
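The two return values can be illustrated with a small sketch that sums per-residue charges and flags residues whose total is not an integer. The function name, the input layout, and the tolerance are assumptions for this example, and the special handling of nucleic acid termini mentioned above is left out.

```python
def total_charge(residue_charges, tolerance=0.001):
    # residue_charges: list of (residue_label, [atomic partial charges]).
    non_integer = []
    total = 0.0
    for label, charges in residue_charges:
        residue_charge = sum(charges)
        total += residue_charge
        # Flag residues whose summed charge is not within tolerance of an integer.
        if abs(residue_charge - round(residue_charge)) > tolerance:
            non_integer.append(label)
    return non_integer, total


# Illustrative charges only, not taken from any force field.
example = [
    ("ASP 1", [-0.5, -0.5]),
    ("GLY 2", [0.25, -0.25]),
    ("LIG 3", [0.3, 0.1]),
]
misslist, charge = total_charge(example)
```

A non-empty list of flagged residues usually signals that some atoms did not receive force field parameters.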
create_html_typemap(definition, outfilename)

Create an HTML typemap file at the desired location.

If a type cannot be found for an atom a blank is listed.

Parameters:
- definition (Definition) – the definition object
- outfilename (str) – the name of the file to write

create_residue(residue, resname)

Create a residue object.

If the resname is a known residue type, try to make that specific object, otherwise just make a standard residue object.

Parameters:
- residue (list) – a list of atoms
- resname (str) – the name of the residue

Returns: the residue object
Return type: Residue

hold_residues(hlist)

Set fixed state of specified residues.

Parameters: hlist ([(str, str, str)]) – list of (res_seq, chainid, ins_code) specifying the residues for altering fixed state status.
num_bio_atoms

Return the number of ATOM (not HETATM) records in the biomolecule.

Returns: number of ATOM records
Return type: int

num_heavy

Return number of biomolecular heavy atoms in structure.

Todo

Figure out if this is redundant with Biomolecule.num_bio_atoms()

Note

Includes hydrogens (but those are stripped off eventually)

Returns: number of heavy atoms
Return type: int

num_missing_heavy

Return number of missing biomolecular heavy atoms in structure.

Returns: number of missing heavy atoms in structure
Return type: int

patch_map

Return definition patch maps.

Returns: definition patch maps
Return type: list

reference_map

Return definition reference map.

Returns: definition reference map
Return type: dict

remove_hydrogens()

Remove hydrogens from the biomolecule.

repair_heavy()

Repair all heavy atoms.

Unfortunately, the first time we reach an atom we might not be able to rebuild it, because it may depend on other atoms that must be rebuilt first (think side chains). A ‘seenmap’ therefore keeps track of the atoms we’ve already seen and of subsequent attempts to rebuild each one.

Raises: ValueError – missing atoms prevent reconstruction
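The retry idea can be sketched as a queue plus a map of failed attempts. This is a simplified illustration under stated assumptions, not the pdb2pqr implementation: the function name, the dependency dictionary, and the stopping rule (give up when an atom fails twice with no progress in between) are all hypothetical.

```python
from collections import deque


def rebuild_order(present, missing):
    # present: set of atom names already in the residue.
    # missing: dict mapping each missing atom to the atoms it needs first.
    built = set(present)
    queue = deque(missing)
    seenmap = {}  # atom name -> failed attempts since the last success
    order = []
    while queue:
        atom = queue.popleft()
        if all(dep in built for dep in missing[atom]):
            built.add(atom)
            order.append(atom)
            seenmap.clear()  # progress was made, so retries are worthwhile
            continue
        seenmap[atom] = seenmap.get(atom, 0) + 1
        if seenmap[atom] > 1:
            # A full pass over the queue rebuilt nothing: give up.
            raise ValueError(f"cannot rebuild {atom}: missing prerequisites")
        queue.append(atom)
    return order


# CG needs CB, and CB needs CA, so CG is deferred on its first visit.
order = rebuild_order({"N", "CA", "C"}, {"CG": ["CB"], "CB": ["CA"]})
```

Deferring an atom instead of failing immediately lets side-chain atoms be rebuilt in dependency order without sorting them up front.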
reserialize()

Generate new serial numbers for atoms in the biomolecule.

set_donors_acceptors()

Set the donors and acceptors within the biomolecule.

set_hip()

Set all HIS states to HIP.

set_reference_distance()

Set the distance to the CA atom in the residue.

This is necessary for determining which atoms are allowed to move during rotations. Uses the shortest_path() algorithm found in utilities.

Raises: ValueError – if shortest path cannot be found (e.g., if the atoms are not connected)
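To make the idea concrete, the sketch below computes the distance from CA to every atom with a breadth-first search over a bond graph, assuming the distance is counted in bonds. The `bond_distances` function and the adjacency-dictionary input are hypothetical; this is not the `shortest_path()` routine from utilities, whose signature is not shown on this page.

```python
from collections import deque


def bond_distances(bonds, start="CA"):
    # bonds: dict mapping each atom name to the names of its bonded atoms.
    distances = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in bonds[atom]:
            if neighbor not in distances:
                distances[neighbor] = distances[atom] + 1
                queue.append(neighbor)
    unreachable = set(bonds) - set(distances)
    if unreachable:
        # Mirrors the documented failure mode: atoms that are not connected.
        raise ValueError(f"no path from {start} to {sorted(unreachable)}")
    return distances


# Heavy-atom bonds of a serine residue.
serine = {
    "N": ["CA"],
    "CA": ["N", "C", "CB"],
    "C": ["CA", "O"],
    "O": ["C"],
    "CB": ["CA", "OG"],
    "OG": ["CB"],
}
distances = bond_distances(serine)
```

Atoms farther from CA in this sense sit farther out along the side chain, which is what makes the value useful when deciding which atoms may move during a rotation.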
set_states()

Set the state of each residue.

This is the last step before assigning the forcefield, but is necessary so as to distinguish between various protonation states.

See aa for residue-specific functions.

set_termini(neutraln=False, neutralc=False)

Set the termini for a protein.

First set all known termini by looking at the ends of the chain. Then examine each residue, looking for internal chain breaks.

Todo

This function needs to be cleaned and simplified

Parameters:
- neutraln (bool) – indicate whether N-terminus is neutral
- neutralc (bool) – indicate whether C-terminus is neutral

update_bonds()

Update the bonding network of the biomolecule.

This happens in 3 steps:

1. Apply the PEPTIDE patch to all Amino residues to add references for the N(i+1) and C(i-1) atoms
2. Call update_internal_bonds() for inter-residue linking
3. Set the links to the N(i+1) and C(i-1) atoms

update_internal_bonds()

Update the internal bonding network.

Update using the reference objects in each atom.

update_residue_types()

Find the type of residue as notated in the Amino Acid definition.

Todo

Why are we setting residue types to numeric values (see code)?

update_ss_bridges()

Check and set SS-bridge partners.
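One common way to identify disulfide partners is to pair cysteines whose SG atoms lie within a distance cutoff. The sketch below illustrates that approach under stated assumptions: the function name, the input layout, and the 2.5 Å cutoff are hypothetical and are not confirmed by this page as the criteria pdb2pqr uses.

```python
import math
from itertools import combinations


def find_ss_bridges(sg_coords, cutoff=2.5):
    # sg_coords: dict mapping a cysteine label to its SG (x, y, z) in angstroms.
    # cutoff is an assumed distance limit, not a value taken from pdb2pqr.
    partners = {}
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_coords.items(), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            partners[res_a] = res_b
            partners[res_b] = res_a
    return partners


# Illustrative coordinates only.
sg_atoms = {
    "CYS 5": (0.0, 0.0, 0.0),
    "CYS 30": (2.0, 0.0, 0.0),
    "CYS 50": (10.0, 0.0, 0.0),
}
bridges = find_ss_bridges(sg_atoms)
```

Recording the partner on both residues makes the relationship easy to query from either side.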