Introduction to Structural Bioinformatics
SA Sehgal, RA Tahir, M Waqas
Abstract
Bioinformatics is an emerging field of modern biology that solves biological problems with computational approaches built on mathematical and statistical techniques. Many databases of protein and gene sequences are available. A protein sequence can be retrieved from UniProt, and its 3D structure can be predicted by homology modeling, threading, or ab initio methods. The predicted structures can then be evaluated with numerous model evaluation tools.
Keywords: Bioinformatics tools, Evolutionary biology, Homology modeling, Molecular docking, Sequence retrieval, Structural bioinformatics, Virtual screening.
MOTIVATION
This chapter briefly describes the basic concepts of bioinformatics and Computer-Aided Drug Design (CADD). A crucial step in computational drug design is the selection of the protein of interest. On 5 October 1981, Fortune magazine published an article titled "Next Industrial Revolution: Designing Drugs by Computer at Merck" (Sliwoski et al., 2014).
BIOINFORMATICS
Bioinformatics is an interdisciplinary field that develops different tools and methods to understand biological data and solve biological problems (Sehgal et al., 2018).
Bioinformatics Approaches
In bioinformatics, numerous in silico approaches and computational tools are used to solve biological problems, gain 3D structural insights into proteins, and analyze DNA. Static and dynamic approaches are considered the most fundamental approaches for modeling biological systems (Mitra et al., 2013).
Static
Static approaches deal with the sequences of proteins, nucleic acids, and peptides. Static entities also include protein interaction networks, microarray data, and metabolites ().
Dynamic
Dynamic approaches deal with the structures of nucleic acids, ligands, and peptides. Reaction fluxes and metabolite concentrations in systems biology also fall into this category (Hendricks et al., 2012). Multi-agent-based modeling techniques that capture cellular events, such as signaling reaction dynamics and transcription, are also regarded as dynamic techniques in bioinformatics ().
Structural Bioinformatics
Protein structure prediction is considered the most significant approach in bioinformatics. The structure of a protein is predicted from its amino acid sequence. Structure prediction helps to analyze fold recognition and the secondary, tertiary, and quaternary structures that arise from the primary structure of the target protein. The primary structure of a protein is determined by the coding sequence of its gene, and the primary structure in turn determines the native structure of the protein. Structural information about proteins is classified into four levels: primary, secondary, tertiary, and quaternary structure ().
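To illustrate how a gene's coding sequence fixes a protein's primary structure, the Python sketch below translates a DNA coding sequence into a one-letter amino acid sequence using the standard genetic code. It assumes an in-frame, intron-free sequence made only of A, C, G, and T; the names `translate`, `BASES`, and `AMINO_ACIDS` are illustrative and do not come from any particular library.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        residue = AMINO_ACIDS[index]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCTTAA"))
```

Here `translate("ATGGCTTAA")` returns `"MA"`, because ATG codes for methionine, GCT for alanine, and TAA is a stop codon. Real genes also require splicing and reading-frame identification, which this sketch deliberately leaves out.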
Software and Tools
Various bioinformatics companies and many public institutions provide software, tools, servers, and databases for bioinformatics, ranging from simple command-line tools to more complex graphical programs and standalone services ().
Open-source Bioinformatics Software
Since the 1980s, many public and private institutes have provided free and open-source tools and software. New algorithms still need to be developed to analyze emerging biological problems ().
Web Services in Bioinformatics
The EBI has classified bioinformatics services into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment), and BSA (Biological Sequence Analysis). These service-oriented resources illustrate how web-based bioinformatics solutions can move beyond standalone tools toward integrated systems with a single common data format, standalone or web-based interfaces, and an extensible management system ().
Virtual Screening
Virtual screening is an advanced bioinformatics technique for screening large compound databases to select potential drug candidates against a target. Virtual screening has accelerated computer-aided drug design by reducing the time required for the ligand selection step (Boarder et al., 2004).
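A minimal sketch of the ranking part of virtual screening follows. It assumes every compound in the library has already been docked and given a predicted binding score in kcal/mol, where a more negative value means stronger predicted binding (the convention used by docking programs such as AutoDock Vina). The compound names, scores, the `screen` function, and the -7.0 kcal/mol cutoff are all hypothetical.

```python
from typing import List, NamedTuple


class Compound(NamedTuple):
    name: str             # hypothetical compound identifier
    docking_score: float  # predicted binding energy in kcal/mol (more negative = stronger)


def screen(library: List[Compound], cutoff: float = -7.0, top_n: int = 3) -> List[Compound]:
    """Keep compounds scoring at or below `cutoff` and return the `top_n` best."""
    hits = [c for c in library if c.docking_score <= cutoff]
    hits.sort(key=lambda c: c.docking_score)  # most negative (best) score first
    return hits[:top_n]


# Hypothetical docking results for a tiny compound library
LIBRARY = [
    Compound("cmpd_1", -6.2),
    Compound("cmpd_2", -8.9),
    Compound("cmpd_3", -7.4),
    Compound("cmpd_4", -9.6),
    Compound("cmpd_5", -7.1),
]

for hit in screen(LIBRARY):
    print(hit.name, hit.docking_score)
```

With this library, the three selected compounds are `cmpd_4`, `cmpd_2`, and `cmpd_3`. In practice the selected hits would then be inspected visually and tested experimentally, since docking scores are only approximate.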
Homology Modeling
Homology modeling is an approach that constructs an atomic-resolution model of a target protein from its amino acid sequence, using the experimentally determined 3D structure of a homologous protein as a template. It relies on identifying one or more known protein structures that resemble the query. Residues of the query sequence that correspond to residues of the template sequence can be determined with a sequence alignment tool. Because protein structure is more conserved than sequence among homologs, homologous proteins can share similar structures even when their sequence identity is below 20%. The 3D structure of the target protein is then built from the template structure and the target-template sequence alignment ().
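Sequence identity between the query and a template is the number the identity thresholds in homology modeling refer to. Percent identity is defined in several ways; this Python sketch counts identical residues over the alignment columns where neither sequence has a gap, which is one common convention but not the only one. The function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two aligned sequences ("-" marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACDE", "ACDF"))
```

Dividing by the full alignment length, or by the length of the shorter sequence, would give different values for gapped alignments, so it is worth checking which definition a given tool reports.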
Steps of Homology Modeling
The homology modeling procedure can be divided into four consecutive steps: 1) template selection, 2) target-template alignment, 3) model construction, and 4) model assessment ().
Template Selection
The most critical step in homology modeling is to identify a suitable template. Templates are usually identified by database searches based on pairwise sequence alignment, using tools such as FASTA and BLAST ().
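Once a database search returns candidate templates, one still has to choose among them. The Python sketch below shows one simple selection rule, assuming each hit reports percent identity, query coverage, and an E-value: discard hits that are not significant or that cover too little of the target, then take the highest identity. The `TemplateHit` type, the template names, and the thresholds (E-value at most 1e-5, coverage at least 0.7) are hypothetical choices, not fixed standards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    name: str        # hypothetical template identifier
    identity: float  # percent identity to the target sequence
    coverage: float  # fraction of the target covered by the alignment (0-1)
    evalue: float    # expectation value reported by the search tool


def select_template(hits: List[TemplateHit], max_evalue: float = 1e-5,
                    min_coverage: float = 0.7) -> Optional[TemplateHit]:
    """Return the significant, well-covering hit with the highest identity (ties broken by coverage)."""
    candidates = [h for h in hits if h.evalue <= max_evalue and h.coverage >= min_coverage]
    if not candidates:
        return None
    return max(candidates, key=lambda h: (h.identity, h.coverage))


hits = [
    TemplateHit("template_A", 45.0, 0.95, 1e-40),
    TemplateHit("template_B", 60.0, 0.50, 1e-30),  # high identity but poor coverage
    TemplateHit("template_C", 52.0, 0.85, 1e-35),
    TemplateHit("template_D", 70.0, 0.90, 0.1),    # not statistically significant
]
best = select_template(hits)
print(best.name if best else "no suitable template")
```

Here `template_C` is chosen: `template_B` and `template_D` have higher identity but fail the coverage and E-value filters. Practical selection also weighs experimental resolution, bound ligands, and biological state of each template.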
Target-Template Alignment
In this step, the target sequence is aligned against the sequence of the selected template. The template is a protein whose 3D structure has been determined experimentally, typically by X-ray crystallography or NMR spectroscopy. For protein sequence alignment, the most highly cited tools are BLAST and ClustalW ().
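To show what an alignment algorithm actually computes, here is a Python sketch of Needleman-Wunsch global alignment, a classic dynamic-programming method. Note that this is a swapped-in illustration: BLAST uses a heuristic local alignment and ClustalW performs progressive multiple alignment. The sketch also uses toy scores (match +1, mismatch -1, gap -2) rather than a substitution matrix such as BLOSUM62, and the function name is illustrative.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_target, aligned_template, total = needleman_wunsch("GATTACA", "GATACA")
print(aligned_target)
print(aligned_template)
print(total)
```

For these two sequences, the optimal score is 4 (six matches and one gap). Several alignments can reach the same score, and the traceback returns just one of them.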
Model Construction
The target-template alignment is used to generate the 3D structure of the target protein, in which each atom is represented by its Cartesian coordinates in PDB format ().
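The core idea of model construction, namely reusing template coordinates for aligned residues, can be sketched in Python as follows. This is a deliberately simplified, hypothetical example: it copies only C-alpha coordinates from the template onto aligned target residues and writes them as fixed-width PDB ATOM records. Real modeling programs also build side chains, model insertions and loops (target residues aligned to gaps get no coordinates here), and refine the result. The function names and example coordinates are illustrative.

```python
from typing import List, Tuple

# One-letter to three-letter residue codes used in PDB ATOM records
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def format_ca_atom(serial: int, res_name: str, chain: str, res_seq: int,
                   x: float, y: float, z: float) -> str:
    """Fixed-width PDB ATOM record for a C-alpha atom (occupancy 1.00, B-factor 0.00)."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{0.0:>6.2f}           C")


def build_ca_model(target_aln: str, template_aln: str,
                   template_ca: List[Tuple[float, float, float]], chain: str = "A") -> List[str]:
    """Copy template C-alpha coordinates onto target residues aligned to template residues."""
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    records = []
    t_pos = p_pos = 0  # residue counters in the ungapped target and template
    for t_res, p_res in zip(target_aln, template_aln):
        if t_res != "-" and p_res != "-":
            x, y, z = template_ca[p_pos]
            records.append(format_ca_atom(len(records) + 1, THREE_LETTER[t_res],
                                          chain, t_pos + 1, x, y, z))
        if t_res != "-":
            t_pos += 1
        if p_res != "-":
            p_pos += 1
    return records


# Hypothetical alignment and template coordinates (in angstroms)
template_ca = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0), (10.0, 11.0, 12.0)]
for line in build_ca_model("AC-D", "ACGD", template_ca):
    print(line)
```

Each record places x, y, and z in columns 31-38, 39-46, and 47-54, as the PDB format requires, so the output can be opened in standard molecular viewers.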
Model Assessment
In homology modeling, validation of the predicted protein structure is the most important step, because this assessment provides evidence of how accurate the structure is. Various tools have been reported for validating protein structures, such as PDBsum Generate, ERRAT, RAMPAGE, and Verify3D. Finally, the selection of a predicted structure is mainly based on physicochemical properties, protein expression, and the available data on the protein ().
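RAMPAGE assesses a model with a Ramachandran plot, which places each residue according to its backbone torsion angles phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The Python sketch below computes such a torsion (dihedral) angle from four atom positions; it does not reproduce RAMPAGE's favored and allowed region definitions, and the helper names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points in a planar "trans" arrangement give a torsion of 180 degrees
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0)))
```

Calling `dihedral` on the backbone atoms of each residue yields its phi and psi pair, which is the input a Ramachandran-based validation tool plots and classifies.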
Accuracy
The accuracy of a model depends on the sequence similarity between the target and the template. At high sequence similarity, the primary source of error in homology modeling is the choice of the template or templates on which the model is based. At lower similarity (below about 40%), errors in the sequence alignment become significant and prevent the production of high-quality models, and below 30% sequence identity, serious alignment errors can cause misprediction of the fold. A target-template similarity greater than 65% is recommended for dependable results (), and a similarity greater than 80% for highly reliable structures.
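The rule-of-thumb thresholds in this section (30%, 65%, and 80%) can be written down as a small Python function, for example to flag models in a batch run. This sketch treats "similarity" and "identity" as the same percentage, which is a simplifying assumption, and the function name and category labels are hypothetical; the cut-offs are guidance, not hard limits.

```python
def expected_model_quality(identity_percent: float) -> str:
    """Map target-template sequence identity (%) to a rough quality label."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent > 80.0:
        return "reliable"
    if identity_percent > 65.0:
        return "dependable"
    if identity_percent < 30.0:
        return "risk of fold misprediction"
    return "alignment errors possible"


for value in (85.0, 70.0, 50.0, 25.0):
    print(value, expected_model_quality(value))
```

Because the checks run from the strictest threshold down, each identity value falls into exactly one category.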
Importance of Homology Modeling
The predicted structure is used in various bioinformatics analyses, including protein-protein interaction studies, molecular docking, protein-protein docking, and functional annotation of genes identified in an organism's genome (). A homology model may still have low accuracy in some regions, because even closely related proteins often differ in the loops on the protein surface.