
PGP: prokaryote gene prediction software

Pacheco, José Carlos Ribeiro
Source: Universidade do Minho Publisher: Universidade do Minho
Type: Master's dissertation
Published: //2013 POR
Search relevance
36.27%
Master's dissertation in Bioinformatics; Correct prediction and annotation of bacterial genes is essential for applying the information contained in DNA to many areas of (bio)medical research, such as microbiology, immunology and infectious diseases. Although several bacterial gene prediction programs exist, such as GenemarkHMM, Glimmer and Prodigal, along with complete pipelines such as ISGA, xBASE, Maker and Consensus Prediction, gene prediction can still be improved. The main goal of this work was to develop a bacterial gene prediction pipeline, Prokaryote Gene Prediction (PGP), which combines ab initio and homology-based methods. Because the ab initio program Prodigal performed better than the other programs studied, it was used as the initial step of PGP. Starting from the proteins predicted by Prodigal, PGP a) analyzes the alignments obtained, b) determines whether genes need to be shortened or extended, c) applies the necessary corrections, d) predicts rRNA and tRNA using the programs RNAmmer and tRNA-scan2, and e) uses BLASTx to check the intergenic regions for genes that may have gone unidentified. When the PGP results were compared with the data produced by Prodigal using 4 genomes with moderate G+C% content and 3 with extreme G+C% content...
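Step e) of the pipeline described above searches the intergenic regions, which first requires listing the stretches of the genome not covered by any predicted gene. The following is a minimal sketch of that bookkeeping only, not the PGP implementation: the function name `intergenic_regions`, the 1-based inclusive coordinates, the `min_length` parameter and the linear (non-circular) genome are all assumptions made for illustration.

```python
def intergenic_regions(genes, genome_length, min_length=1):
    """Return the (start, end) gaps not covered by any gene.

    genes: iterable of (start, end) pairs, 1-based and inclusive
           (hypothetical input format; strand is ignored).
    genome_length: length of a linear sequence (circularity is not handled).
    min_length: gaps shorter than this are dropped.
    """
    gaps = []
    cursor = 1  # first position not yet covered by a gene
    for start, end in sorted(genes):
        if start - cursor >= min_length:
            gaps.append((cursor, start - 1))
        # Overlapping or nested genes must not move the cursor backwards.
        cursor = max(cursor, end + 1)
    if genome_length - cursor + 1 >= min_length:
        gaps.append((cursor, genome_length))
    return gaps


if __name__ == "__main__":
    predicted = [(10, 20), (15, 30), (50, 60)]
    print(intergenic_regions(predicted, genome_length=100))
```

Each returned interval could then be extracted and passed to a translated search such as BLASTx; that step is outside this sketch.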

On the Accuracy of Homology Modeling and Sequence Alignment Methods Applied to Membrane Proteins

Forrest, Lucy R.; Tang, Christopher L.; Honig, Barry
Source: Biophysical Society Publisher: Biophysical Society
Type: Journal article
EN
Search relevance
26.19%
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with Cα-RMSD values to the native of 2 Å or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment...
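The acceptance criterion quoted above (Cα-RMSD to the native of 2 Å or less) can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' protocol: it assumes the model and native Cα coordinates are already optimally superimposed and listed in the same residue order, and the names `ca_rmsd` and `is_acceptable` are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) C-alpha coordinates, assumed to be pre-superimposed."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


def is_acceptable(model, native, threshold=2.0):
    """Apply the 2 Angstrom cutoff mentioned in the abstract."""
    return ca_rmsd(model, native) <= threshold


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
    print(ca_rmsd(model, native), is_acceptable(model, native))
```

In practice the superposition itself (for example with the Kabsch algorithm) has to be done before this calculation.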

Residue-Level Prediction of DNA-Binding Sites and its Application on DNA-Binding Protein Predictions

Bhardwaj, Nitin; Lu, Hui
Source: PubMed Publisher: PubMed
Type: Journal article
EN
Search relevance
26.18%
Protein-DNA interactions are crucial to many cellular activities such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberrance at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue’s identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features...

Mimicking the folding pathway to improve homology-free protein structure prediction

DeBartolo, Joe; Colubri, Andrés; Jha, Abhishek K.; Fitzgerald, James E.; Freed, Karl F.; Sosnick, Tobin R.
Source: National Academy of Sciences Publisher: National Academy of Sciences
Type: Journal article
EN
Search relevance
26.26%
Since the demonstration that the sequence of a protein encodes its structure, the prediction of structure from sequence remains an outstanding problem that impacts numerous scientific disciplines, including many genome projects. By iteratively fixing secondary structure assignments of residues during Monte Carlo simulations of folding, our coarse-grained model without information concerning homology or explicit side chains can outperform current homology-based secondary structure prediction methods for many proteins. The computationally rapid algorithm using only single (φ,ψ) dihedral angle moves also generates tertiary structures of accuracy comparable with existing all-atom methods for many small proteins, particularly those with low homology. Hence, given appropriate search strategies and scoring functions, reduced representations can be used for accurately predicting secondary structure and providing 3D structures, thereby increasing the size of proteins approachable by homology-free methods and the accuracy of template methods that depend on a high-quality input secondary structure.

Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking

Sivasubramanian, Arvind; Sircar, Aroop; Chaudhury, Sidhartha; Gray, Jeffrey J.
Source: PubMed Publisher: PubMed
Type: Journal article
Published: 01/02/2009 EN
Search relevance
26.21%
High-resolution homology models are useful in structure-based protein engineering applications, especially when a crystallographic structure is unavailable. Here, we report the development and implementation of RosettaAntibody, a protocol for homology modeling of antibody variable regions. The protocol combines comparative modeling of canonical complementarity determining region (CDR) loop conformations and de novo loop modeling of CDR H3 conformation with simultaneous optimization of VL-VH rigid-body orientation and CDR backbone and side-chain conformations. The protocol was tested on a benchmark of 54 antibody crystal structures. The median root-mean-square-deviation (rmsd) of the antigen binding pocket comprised of all the CDR residues was 1.5 Å with 80% of the targets having an rmsd lower than 2.0 Å. The median backbone heavy atom global rmsd of the CDR H3 loop prediction was 1.6 Å, 1.9 Å, 2.4 Å, 3.1 Å and 6.0 Å for very short (4–6 residues), short (7–9), medium (10–11), long (12–14) and very long (17–22) loops respectively. When the set of ten top-scoring antibody homology models are used in local ensemble docking to antigen, a moderate to high accuracy docking prediction was achieved in seven of fifteen targets. This success in computational docking with high-resolution homology models is encouraging...

Three-Level Prediction of Protein Function by Combining Profile-Sequence Search, Profile-Profile Search, and Domain Co-Occurrence Networks

Wang, Zheng; Cao, Renzhi; Cheng, Jianlin
Source: BioMed Central Publisher: BioMed Central
Type: Journal article
Published: 28/02/2013 EN
Search relevance
26.46%
Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation, as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progress in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations...

Critical Analysis of the Successes and Failures of Homology Models of G-protein coupled receptors: Homology modeling of GPCRs: Success and failures

Bhattacharya, Supriyo; Lam, Alfonso Ramon; Li, Hubert; Balaraman, Gouthaman; Niesen, Michiel Jacobus Maria; Vaidehi, Nagarajan
Source: PubMed Publisher: PubMed
Type: Journal article
EN
Search relevance
26.25%
We present a critical assessment of the performance of our homology model refinement method for G-protein coupled receptors (GPCRs), called LITICon, that led to top ranking structures in a recent structure prediction assessment GPCRDOCK2010. GPCRs form the largest class of drug targets for which only a few crystal structures are currently available. Therefore accurate homology models are essential for drug design in these receptors. We submitted five models each for human chemokine CXCR4 (bound to small molecule IT1t and peptide CVX15) and dopamine D3DR (bound to small molecule eticlopride) before the crystal structures were published. Our models in both CXCR4/IT1t and D3/eticlopride assessments were ranked first and second respectively by ligand RMSD to the crystal structures. For both receptors, we developed two types of protein models: homology models based on known GPCR crystal structures, and ab initio models based on the prediction method MembStruk. The homology based models compared better to the crystal structures than the ab initio models. However a robust refinement procedure for obtaining high accuracy structures is needed. We demonstrate that optimization of the helical tilt, rotation and translation are vital for GPCR homology model refinement. As a proof of concept...

Extending RosettaDock with water, sugar, and pH for prediction of complex structures and affinities for CAPRI rounds 20–27

Kilambi, Krishna Praneeth; Pacella, Michael S.; Xu, Jianqing; Labonte, Jason W.; Porter, Justin R.; Muthu, Pravin; Drew, Kevin; Kuroda, Daisuke; Schueler-Furman, Ora; Bonneau, Richard; Gray, Jeffrey J.
Source: PubMed Publisher: PubMed
Type: Journal article
EN
Search relevance
26.23%
Rounds 20–27 of the Critical Assessment of PRotein Interactions (CAPRI) provided a testing platform for computational methods designed to address a wide range of challenges. The diverse targets drove the creation of, and new combinations of, computational tools. In this study, RosettaDock and other novel Rosetta protocols were used to successfully predict four of the 10 blind targets. For example, for the DNase domain of Colicin E2–Im2 immunity protein, RosettaDock and RosettaLigand were used to predict the positions of water molecules at the interface, recovering 46% of the native water-mediated contacts. For α-repeat Rep4–Rep2 and g-type lysozyme–PliG inhibitor complexes, homology models were built and standard and pH-sensitive docking algorithms were used to generate structures with interface RMSD values of 3.3 Å and 2.0 Å, respectively. A novel flexible sugar–protein docking protocol was also developed and used for structure prediction of the BT4661–heparin-like saccharide complex, recovering 71% of the native contacts. Challenges remain in the generation of accurate homology models for protein mutants and sampling during global docking. On proteins designed to bind influenza hemagglutinin, only about half of the mutations that affect binding were identified (T55: 54%; T56: 48%). The prediction of the structure of the xylanase complex involving homology modeling and multidomain docking pushed the limits of global conformational sampling and did not result in any successful prediction. The diversity of problems at hand requires computational algorithms to be versatile; the recent additions to the Rosetta suite expand the capabilities to encompass more biologically realistic docking problems.

INTEGRATING COMPUTATIONAL PROTEIN FUNCTION PREDICTION INTO DRUG DISCOVERY INITIATIVES

Grant, Marianne A.
Source: PubMed Publisher: PubMed
Type: Journal article
Published: /02/2011 EN
Search relevance
26.26%
Pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate innovative strategies for identifying valid targets and discovering leads against them as a way of accelerating drug discovery. The ever increasing number and diversity of novel protein sequences identified by genomic sequencing projects and the success of worldwide structural genomics initiatives have spurred great interest and impetus in the development of methods for accurate, computationally empowered protein function prediction and active site identification. Previously, in the absence of direct experimental evidence, homology-based protein function annotation remained the gold-standard for in silico analysis and prediction of protein function. However, with the continued exponential expansion of sequence databases, this approach is not always applicable, as fewer query protein sequences demonstrate significant homology to protein gene products of known function. As a result, several non-homology based methods for protein function prediction that are based on sequence features, structure, evolution, biochemical and genetic knowledge have emerged. Herein, we review current bioinformatic programs and approaches for protein function prediction/annotation and discuss their integration into drug discovery initiatives. The development of such methods to annotate protein functional sites and their application to large protein functional families is crucial to successfully utilizing the vast amounts of genomic sequence information available to drug discovery and development processes.

Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions

Zhou, Hufeng; Gao, Shangzhi; Nguyen, Nam Ninh; Fan, Mengyuan; Jin, Jingjing; Liu, Bing; Zhao, Liang; Xiong, Geng; Tan, Min; Li, Shijun; Wong, Limsoon
Source: BioMed Central Publisher: BioMed Central
Type: Journal article
EN_US
Search relevance
26.42%
Background: H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. Results: We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach...

RNA Structure Prediction: Advancing Both Secondary and Tertiary Structure Prediction

Seetin, Matthew G. ; Mathews, David H. (1971 - )
Source: University of Rochester Publisher: University of Rochester
Type: Doctoral thesis
ENG
Search relevance
36.08%
Thesis (Ph.D.)--University of Rochester. School of Medicine & Dentistry. Dept. of Biochemistry and Biophysics, 2011.; RNAs can function without being translated into proteins. These RNAs adopt a structure or structures to perform these functions, and accurate prediction of structure is a valuable tool for understanding these functions. RNA structure is hierarchical, beginning with the primary sequence, then the secondary structure, i.e. the set of canonical pairs, and ultimately the tertiary structure, i.e. the three-dimensional structure. One significant tool for prediction of secondary structure is the nearest neighbor model. This assumes the free energy change of forming a base pair depends on the identities of the pair and the adjacent pairs. Parameters were previously derived from optical melting on RNA duplexes where it was assumed all strands would be completely duplex or single-stranded. When individual base pairs are allowed to break as a function of temperature, the model does not agree with experiment. A new treatment of the data is presented. The probabilities of individual base pairs are calculated using a partition function, allowing internal loops and frayed ends. The parameters of the nearest neighbor model are recalculated using a nonlinear fit to the original data. These new parameters better fit the data and should provide improved structure prediction. Homologous RNAs adopt similar structures. One important structural motif is the pseudoknot...

Template-based Protein Structure Prediction and its Applications

Cheng, Yushao
Source: Rice University Publisher: Rice University
Search relevance
36.07%
Protein structure prediction, also called protein folding, is one of the most significant and challenging research areas in computational biophysics and structural bioinformatics. With the rapid growth of the PDB database, template-based modeling such as homology modeling and threading has become a popular method in protein structure prediction. However, it is still hard to detect good templates when the sequence identity is below 30%. In chapter 1, a profile-profile alignment method is proposed. It uses evolutionary and structural profiles to detect homologs, and a z-score-based method to rank templates. The performance of this method in the critical assessment of protein structure prediction experiments (CASP) was reported. In chapter 2, p53 mutations are studied as an application of protein structure prediction. The TP53 gene encodes a tumor suppressor protein called p53, and p53 mutations occur in about half of human cancers. Experimental studies showed that p53 cancer mutants can be reactivated by mutations on other sites. Machine learning technologies were used in this research. Multiple classifiers were built to predict whether a p53 mutant (single-point or multiple-point) would be transcriptionally active or not, based on features extracted from amino acid sequences and structures. The mutant structures were modeled using template-based protein structure prediction. These features were selected and analyzed using different feature selection methods...
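The abstract above mentions ranking templates by z-score without giving details. As a generic illustration only, the sketch below standardizes raw alignment scores against the mean and standard deviation of all candidate scores and sorts by the result. The function name `rank_by_zscore`, the template identifiers and the use of the population standard deviation are assumptions, not the thesis's method.

```python
from statistics import mean, pstdev


def rank_by_zscore(scores):
    """Rank templates by how far each raw score lies above the mean.

    scores: dict mapping a template identifier to its raw alignment score
            (hypothetical input; higher raw score means a better match).
    Returns a list of (template, z_score) pairs, best first.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    z = {name: (s - mu) / sigma for name, s in scores.items()}
    return sorted(z.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up identifiers and scores, for illustration only.
    raw = {"template_a": 10.0, "template_b": 20.0, "template_c": 30.0}
    for name, z in rank_by_zscore(raw):
        print(f"{name}\t{z:+.2f}")
```

A z-score makes scores from different queries more comparable than raw values, because it expresses each hit relative to the background of all candidates for that query.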

Examining the Use of Homology Models in Predicting Kinase Binding Affinity

Chyan, Jeffrey
Source: Rice University Publisher: Rice University
Search relevance
36.06%
Drug design is a difficult and multi-faceted problem that has led to extensive interdisciplinary work in the field of computational biology. In recent years, several computational methods have emerged. The overall goal of computational algorithms is to narrow down the number of leads that will be further considered for laboratory experimentation and clinical studies. Much of current drug design focuses on a family of proteins called kinases because they play a pivotal role in many of the cell signaling pathways in the human body. Drugs need to be designed such that they bind to specific kinases in the human kinome, inhibiting kinase functions that can cause various diseases such as cancer. It is important for drugs to have high specificity, inhibiting only certain kinases and avoiding undesirable effects on the human body. Computational prediction methods can accomplish this complex task by doing a comparative analysis on the binding site of kinases both in sequence and structure to predict binding affinity with potential drugs. However, computational methods depend on existing protein data to make predictions. There is a lack of structural protein data relative to known proteins and protein sequences. A potential solution to the lack of information is to use computationally generated structural data called homology models. This thesis introduces a framework for the integration of homology models with CCORPS...

Development of Homology Modeling Techniques; Entwicklung von Techniken zur Homologiemodellierung

Diemand, Alexander
Source: Universität Tübingen Publisher: Universität Tübingen
Type: Dissertation; info:eu-repo/semantics/doctoralThesis
EN
Search relevance
36.35%
The focus of this thesis was on computer-aided protein structure analysis and homology modeling. Proteins are produced in the cell according to their sequences, which are encoded in their genes. Moreover, biological function of proteins depends on their structure. Computer-aided structure prediction is based on statistically significant homology detection applying sequence comparison between a model protein and proteins with known structure. As the direct study of proteins in vitro and in vivo requires laborious experiments, prediction methods relying on homology offer practical alternatives. In this context, bioinformatics methods for sequence analysis can identify homologies between related proteins, which have evolved from a common ancestor. The more structures were solved experimentally, the more it became apparent that proteins with similar sequences predominantly share similar structural architectures (folds). An immediate application thereof is computer-aided modeling of protein structures by homology. Protein structure cannot be regarded as a rigid object, rather it exists in one defined conformational state that is related to biological function and can depend on external effects, e.g. the presence of a ligand. Because there is no signal for conformational changes at the level of sequence...

Multi-Regional Analysis of Contact Maps for Protein Structure Prediction

Ahmed, Hazem Radwan A.
Source: Queen's University Publisher: Queen's University
Type: Doctoral thesis Format: 2789816 bytes; application/pdf
EN
Search relevance
35.89%
1D protein sequences, 2D contact maps and 3D structures are three different representational levels of detail for proteins. Predicting protein 3D structures from their 1D sequences remains one of the complex challenges of bioinformatics. The "Divide and Conquer" principle is applied in our research to handle this challenge, by dividing it into two separate yet dependent subproblems, using a Case-Based Reasoning (CBR) approach. Firstly, 2D contact maps are predicted from their 1D protein sequences; secondly, 3D protein structures are then predicted from their predicted 2D contact maps. We focus on the problem of identifying common substructural patterns of protein contact maps, which could potentially be used as building blocks for a bottom-up approach for protein structure prediction. We further demonstrate how to improve identifying these patterns by combining both protein sequence and structural information. We assess the consistency and the efficiency of identifying common substructural patterns by conducting statistical analyses on several subsets of the experimental results with different sequence and structural information.; Thesis (Master, Computing) -- Queen's University, 2009-04-23 22:01:04.528
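To make the 2D representation mentioned above concrete, the sketch below derives a binary contact map from known 3D coordinates, which is the reverse of the prediction direction the thesis addresses. It is an illustration under assumptions: one (x, y, z) point per residue, a distance cutoff of 8 Å (a commonly used value, not taken from the thesis), and the hypothetical function name `contact_map`.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Build a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    coords: list of (x, y, z) tuples, one per residue (assumed input format).
    cutoff: distance threshold in Angstroms (assumed value).
    min_separation: minimum sequence distance |i - j| for a pair to count.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(toy):
        print(" ".join(str(v) for v in row))
```

Raising `min_separation` removes the trivial contacts between sequence neighbors, which tends to leave the patterns that reflect secondary and tertiary structure.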

PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation

Montgomerie, Scott; Cruz, Joseph A.; Shrivastava, Savita; Arndt, David; Berjanskii, Mark; Wishart, David S.
Source: Oxford University Press Publisher: Oxford University Press
Type: Journal article
EN
Search relevance
26.22%
PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ∼3 min per query sequence. The PROTEUS2 server along with source code for many of its modules is accessible at http://wishart.biology.ualberta.ca/proteus2.
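The Q3 score quoted above is the percentage of residues whose predicted three-state secondary structure (helix, strand or coil) matches the observed state. A minimal sketch of the calculation follows; the one-letter codes H, E and C and the function name `q3_score` are assumptions for illustration and are not taken from the PROTEUS2 source code.

```python
def q3_score(predicted, observed):
    """Percentage of positions where the three-state labels agree.

    predicted, observed: equal-length strings over the alphabet H/E/C
    (helix, strand, coil), one character per residue.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and equal in length")
    allowed = set("HEC")
    if set(predicted) - allowed or set(observed) - allowed:
        raise ValueError("only the states H, E and C are supported")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    print(q3_score("HHHEEECCC", "HHHEECCCC"))
```

Q2 scores, such as those quoted for signal peptides and membrane-spanning helices, follow the same idea with two states instead of three.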

Persistent homology analysis of protein structure, flexibility and folding

Xia, Kelin; Wei, Guo-Wei
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published: 08/12/2014
Search relevance
36.11%
Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. Based on the correlation between protein compactness, rigidity and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. Excellent consistency between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology-function relationship of proteins.; Comment: 22 figures...

Remote Homology Detection in Proteins Using Graphical Models

Daniels, Noah M.
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published: 23/04/2013
Search relevance
36.08%
Given the amino acid sequence of a protein, researchers often infer its structure and function by finding homologous, or evolutionarily-related, proteins of known structure and function. Since structure is typically more conserved than sequence over long evolutionary distances, recognizing remote protein homologs from their sequence poses a challenge. We first consider all proteins of known three-dimensional structure, and explore how they cluster according to different levels of homology. An automatic computational method reasonably approximates a human-curated hierarchical organization of proteins according to their degree of homology. Next, we return to homology prediction, based only on the one-dimensional amino acid sequence of a protein. Menke, Berger, and Cowen proposed a Markov random field model to predict remote homology for beta-structural proteins, but their formulation was computationally intractable on many beta-strand topologies. We show two different approaches to approximate this random field, both of which make it computationally tractable, for the first time, on all protein folds. One method simplifies the random field itself, while the other retains the full random field, but approximates the solution through stochastic search. Both methods achieve improvements over the state of the art in remote homology detection for beta-structural protein folds.; Comment: Doctoral dissertation

Prediction of TF target sites based on atomistic models of protein-DNA complexes

Espinosa Angarica, Vladimir; Pérez, A.G.; Vasconcelos, A.T.; Collado-Vides, Julio; Contreras-Moreira, Bruno
Source: Consejo Superior de Investigaciones Científicas (CSIC) Publisher: Consejo Superior de Investigaciones Científicas (CSIC)
Type: Article Format: 719698 bytes; application/pdf
ENG
Search relevance
36.03%
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.; NIH grant RO1-GM071962 Fundación Agencia Aragonesa I+D; Peer reviewed

Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome

Gopal, Shuba; Schroeder, Mark; Pieper, Ursula; Sczyrba, Alexander; Aytekin-Kurban, Gulriz; Bekiranov, Stefan; Fajardo, Eduardo; Eswar, Narayanan; Sanchez, Roberto; Sali, Andrej; Gaasterland, Terry
Source: Nature Publishing Group: Nature Genetics Publisher: Nature Publishing Group: Nature Genetics
Type: Journal article Format: 27604 bytes; application/pdf
EN_US
Search relevance
36.11%
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes. This annotation strategy is applicable to genomes of all organisms...