Motivation: With the advent of genome sequencing, a huge database of protein primary sequences has been accumulating. In parallel, a number of tools to investigate and expand upon this information, e.g. reconstructing and building relationships between protein families and superfamilies, have been developed. Metalloproteins are proteins capable of binding one or more metal ions, which are required for their biological function, for regulation of their activities or for structural purposes. Sometimes, metal binding can be observed in vitro without being physiologically relevant. At present, there is a lack of specific tools for identifying metalloproteins in databases of gene sequences. Results: In the present work, an approach exploiting metal-binding patterns (MBPs) of metalloproteins present in the Protein Data Bank (PDB) to search gene banks for new metalloproteins is presented and applied to copper proteins. Nearly 100 different MBPs have been identified and then used for subsequent applications. The ensemble of sequences of the whole PDB is used to assess the potential and limits of the method and to identify levels of confidence for the predictions output by the search. It appears that copper-binding capabilities are identified with a confidence >90% when the percentage of identical amino acids aligned around the MBP by PHI-BLAST is at least 20% with respect to the entire protein domain length. If this percentage is between 10% and 20%, the level of confidence is approximately 50%. Application of the methodology to the entire genome sequences of Pyrococcus furiosus, Escherichia coli, Drosophila melanogaster and Homo sapiens suggests some differentiation between prokaryotes and eukaryotes.
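The confidence levels quoted in the abstract can be read as a simple threshold rule on the percentage of identical residues relative to the domain length. The following is a minimal sketch of that rule only, not the authors' implementation: the function name, its arguments and the returned labels are hypothetical, the handling of the exact 10% and 20% boundaries is an assumption, and the abstract gives no confidence figure below 10%.

```python
def copper_binding_confidence(identical_residues: int, domain_length: int) -> str:
    """Map a PHI-BLAST hit to the confidence level quoted in the abstract.

    identical_residues: identical amino acids aligned around the MBP
    domain_length: length of the entire protein domain
    Both names are illustrative assumptions, not taken from the paper.
    """
    if domain_length <= 0:
        raise ValueError("domain_length must be positive")

    # Percentage of identical residues with respect to the whole domain.
    percent_identity = 100.0 * identical_residues / domain_length

    if percent_identity >= 20.0:
        return ">90%"        # "at least 20%" in the abstract
    if percent_identity >= 10.0:
        return "~50%"        # "between 10% and 20%"; 10% inclusive is an assumption
    return "not stated"      # the abstract reports no figure below 10%


if __name__ == "__main__":
    # Hypothetical hits: (identical residues, domain length)
    for hit in [(25, 100), (15, 100), (5, 100)]:
        print(hit, copper_binding_confidence(*hit))
```

In practice the two input counts would come from parsing the PHI-BLAST alignment output and from a domain annotation; both steps are outside this sketch.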
A hint to search for metalloproteins in gene banks / C.Andreini; I.Bertini; A.Rosato. - In: BIOINFORMATICS. - ISSN 1367-4803. - STAMPA. - 20:(2004), pp. 1373-1380. [10.1093/bioinformatics/bth095]
A hint to search for metalloproteins in gene banks
ANDREINI, CLAUDIA;BERTINI, IVANO;ROSATO, ANTONIO
2004