STING_DB Quality Assessment

STING_DB comprises structural, sequence, function and stability parameters/descriptors for protein analysis. The database combines publicly available data (PDB, HSSP, Prosite) with its own data (contacts, interface contacts, surface accessibility). STING_DB is one of the best-known databases of structural parameters reported on a per-residue basis, with over 300 of them compiled at a single site.

Given its relevance to researchers interested in protein analysis, the Sting_DB Quality Assessment (QA) module was designed to measure the quality of the data deposited in Sting_DB. Every week, a checklist procedure identifies the parameters/files that are missing or empty for the new PDB files added to the database. The main goal of this procedure is to guarantee that data quality does not degrade as updates take place. When the checklist procedure identifies a group of parameters that are missing or empty, a report is automatically sent to the Sting_DB administrator, who runs a set of scripts to update the parameters for the new PDB files and then repeats the checklist procedure to evaluate the quality of the updated data.
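The weekly check described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual Sting_DB QA code: the function name `check_parameters`, the directory layout `<root>/<parameter>/<pdb_id>.dat` and the `.dat` extension are all assumptions made for the example.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def check_parameters(root: Path, pdb_ids: Iterable[str],
                     parameters: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Return, per parameter, the PDB IDs whose file is missing or empty."""
    parameters = list(parameters)
    report = {p: {"missing": [], "empty": []} for p in parameters}
    for pdb_id in pdb_ids:
        for param in parameters:
            # Hypothetical layout: <root>/<parameter>/<pdb_id>.dat
            path = root / param / f"{pdb_id}.dat"
            if not path.is_file():
                report[param]["missing"].append(pdb_id)
            elif path.stat().st_size == 0:
                report[param]["empty"].append(pdb_id)
    return report
```

A report built this way could be sent to the administrator whenever any of the lists is non-empty, and the same function could be run again after the update scripts to confirm that the lists have shrunk.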

Table 1 shows the parameters analyzed by the module Sting_DB QA and their corresponding eligible PDB files.

Parameter Name                       Eligible PDB Files
Accessibility_and_Interface_Residue  All PDB files with at least one protein chain
Cavity_Complex                       All PDB files
Cavity_Isolation                     All protein chains in PDB files
Contact_Energy_Density_Intrachain    All proteins in PDB files
Contact_Energy_Density_Interface     All PDB files with at least 2 protein chains
Contacts                             All proteins in PDB files
Cross_Link                           All proteins in PDB files
Cross_Presence                       All proteins in PDB files
Curvature_Complex                    All PDB files
Curvature_Isolation                  All PDB files
Density_Sponge                       All proteins in PDB files
Distances                            All proteins in PDB files
Eletrostatic_Potential               All PDB files
Entropy_Density_Interface            All proteins in PDB files with at least 2 chains, one being a protein with HSSP
Entropy_Density_Intrachain           All proteins in PDB files with HSSP
Evolutionary_Pressure                All proteins in PDB files with HSSP-MSA
HSSP                                 All proteins in PDB files with HSSP
HSSP_MSA                             All proteins in PDB files with HSSP
HSSP_MSA_Full                        All proteins in PDB files with HSSP
Hydro_Patches                        All proteins in PDB files
Ligand_Pocket_Residue                All proteins in PDB files with ligands
My_Evolutionary_Pressure             All proteins in PDB files
My_HSSP                              All proteins in PDB files
My_HSSP_100                          All proteins in PDB files
My_Phylogenetic_Tree                 All proteins in PDB files
Phylogenetic_Tree                    All proteins in PDB files
Protein_Ligand_Contacts              All proteins in PDB files with ligands
Ramachandran                         All proteins in PDB files
Rotamer                              All proteins in PDB files
Space_Clash                          All proteins in PDB files
Stride                               All proteins in PDB files
Unused_Contacts                      All proteins in PDB files
Water_Contact_Residue                All proteins in PDB files with HOH

Table 1. Parameters analyzed by Sting_DB QA and their corresponding eligible PDB files.
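The eligibility rules in Table 1 can be read as simple predicates over each PDB entry. The sketch below encodes a small subset of them. The `PdbSummary` class, its field names and the `eligible_parameters` helper are hypothetical, and reading "All proteins in PDB files" as "at least one protein chain" is our interpretation of the table, not a documented rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PdbSummary:
    """Hypothetical per-entry summary; field names are illustrative."""
    protein_chains: int
    has_hssp: bool = False
    has_ligands: bool = False
    has_water: bool = False  # HOH records present


# A subset of Table 1 expressed as predicates over a PdbSummary.
ELIGIBILITY: Dict[str, Callable[[PdbSummary], bool]] = {
    "Cavity_Complex": lambda e: True,
    "Contacts": lambda e: e.protein_chains >= 1,
    "Contact_Energy_Density_Interface": lambda e: e.protein_chains >= 2,
    "HSSP": lambda e: e.protein_chains >= 1 and e.has_hssp,
    "Protein_Ligand_Contacts": lambda e: e.protein_chains >= 1 and e.has_ligands,
    "Water_Contact_Residue": lambda e: e.protein_chains >= 1 and e.has_water,
}


def eligible_parameters(entry: PdbSummary) -> List[str]:
    """Names of the parameters (from the subset above) this entry should have."""
    return sorted(name for name, rule in ELIGIBILITY.items() if rule(entry))
```

Under this reading, a parameter file that is absent for an entry outside the eligible set would not count as missing, which is presumably why the table pairs every parameter with its eligible files.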

When a Sting user selects a PDB file for analysis and one or more parameters of that file are not available in STING_DB, the user can search for the PDB name in Sting_DB QA to check whether those parameters exist. For each parameter, Sting_DB QA lists the PDB files for which that parameter is missing or empty. However, this situation is unusual, since we keep the percentage of missing and empty parameters below 3% for almost all PDB files. In addition, we are working to reduce that percentage to 1% or less.
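As a rough illustration of the quality indicator, the percentage of missing and empty entries can be computed and compared with the 3% limit as follows. The function names are hypothetical, and the choice of denominator (the number of eligible entries) is an assumption, since the text does not define exactly how the percentage is calculated.

```python
def missing_or_empty_percentage(n_missing: int, n_empty: int, n_eligible: int) -> float:
    """Percentage of eligible entries that are missing or empty."""
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    return 100.0 * (n_missing + n_empty) / n_eligible


def within_threshold(percentage: float, threshold: float = 3.0) -> bool:
    """True when the percentage is strictly below the threshold (3% by default)."""
    return percentage < threshold
```

For example, 10 missing and 5 empty entries out of 1000 eligible ones give 1.5%, which is below the 3% limit.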

This effort makes Sting_DB unique in terms of quality assessment compared with its counterparts in the bioinformatics domain. To the best of our knowledge, Sting is the only protein analysis software that provides its users with a data quality indicator.

Parameters not included in Sting_DB QA: Prosite and Protherm
We did not include the Prosite and Protherm parameters in Sting_DB QA because it is difficult to establish a correspondence between the number of PDB files that should have a Prosite motif and the number of PDB files in which we actually identify one.