Lund University Publications

Genome-wide structural and functional protein characterization by ab initio protein structure prediction

Malmström, Lars LU (2005)
Abstract
Very little is known about a considerable fraction of all proteins, and it is time-consuming and expensive to study each individual protein to determine its function, structure and cellular role. Proteins retain structural, functional and sequence characteristics from ancestral proteins, and hence two proteins that share a common ancestor, i.e. are homologs, will to some extent have similar sequence, structure and function. One way to learn something about a protein is to identify its homologs and use information from those homologs to annotate the protein of interest. Close homologs can be detected using sequence alone, but more distant homologs cannot. Structure is more conserved than sequence and enables detection of a common ancestor between more distantly related proteins, thereby enabling transfer of information to a larger fraction of the uncharacterized proteins. This thesis covers my efforts to develop a method that uses ab initio protein structure prediction to detect distant homologs and uses those homologs to annotate proteins from the genome of Saccharomyces cerevisiae.

The ab initio protein structure prediction software used in this thesis, Rosetta, can predict a protein's tertiary structure using the amino acid sequence alone. Rosetta reduces the search space by approximating local conformations with conformations from the Protein Data Bank (PDB), and judges the overall fitness of the simulated protein structure through a statistically derived energy function. The program has been successful in the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP), and the results from the last CASP are reported in Paper I. Distant homologs can be detected by comparing the structures generated by Rosetta with structures from the PDB. In general, however, such a comparison is noisy, that is, it gives many answers, of which only a few are correct. The noise can be filtered out by exploiting the strong relationship between protein function and protein structure, using either functional information from a database or functional information inferred from one or more experimental high-throughput technologies. This idea was tested in Paper II, where 100 proteins were investigated using protein structure prediction, yeast two-hybrid, fluorescence microscopy and mass spectrometry. The data from all four technologies were integrated and 77% of the proteins were assigned a function.

Data integration is very labor-intensive when done by hand, and the amount of information generated for each protein investigated is substantial. To apply this technology on a genome-wide scale, everything needs to be automated and all data have to be stored and managed efficiently. Papers III and IV cover information management, that is, how the data used and produced in the project are organized and stored. Paper V reports both how we automated the integration process using the software described in Papers I and II and the application of the technology to the genome of Saccharomyces cerevisiae.
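
The filtering idea in the abstract (many structural matches, few of them correct, narrowed down with independent functional information) can be illustrated with a minimal sketch. This is not the method or software from the thesis: the `StructureMatch` record, the similarity scale, the function labels and the template identifiers are all hypothetical, chosen only to show the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureMatch:
    """One hypothetical hit from comparing a predicted model to known structures."""
    template_id: str      # made-up identifier of the matched known structure
    similarity: float     # assumed structural similarity score, higher is better
    functions: frozenset  # function labels attached to the matched structure


def filter_matches(matches, evidence, min_similarity=0.0):
    """Keep matches sharing at least one function label with independent evidence.

    Results are ordered by number of shared labels, then by similarity.
    """
    kept = []
    for m in matches:
        shared = m.functions & evidence
        if shared and m.similarity >= min_similarity:
            kept.append((len(shared), m.similarity, m))
    # Sort on the two numeric fields only, best first.
    kept.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [m for _, _, m in kept]


# Toy data: three noisy structural matches for one protein.
matches = [
    StructureMatch("tmplA", 0.62, frozenset({"kinase"})),
    StructureMatch("tmplB", 0.58, frozenset({"transport"})),
    StructureMatch("tmplC", 0.55, frozenset({"kinase", "signaling"})),
]
# Function labels suggested by a database or an experiment (assumed).
evidence = frozenset({"kinase", "signaling"})

supported = filter_matches(matches, evidence)
print([m.template_id for m in supported])  # ['tmplC', 'tmplA']
```

In this toy setup the match with the highest structural similarity is not ranked first, because another match agrees with more of the independent functional evidence. A real pipeline would presumably weight the two sources of information rather than use a simple overlap count.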
Abstract (Swedish)
Popular Abstract (translated from Swedish)

Many of the chemical processes that take place in our bodies are carried out by proteins. When they do not work, we become ill. In some cases, for example in Alzheimer's disease or Creutzfeldt-Jakob disease, this is because a particular protein has the wrong shape. Understanding how proteins adopt their final shape, and how that shape affects the protein's function, is therefore important. In 2005 it cost between one and a few million to determine the shape of a single protein, because the equipment is expensive and a great deal of work is required. In the sixties, Ryle discovered that proteins appear to have a blueprint for the shape they adopt built into the order of the amino acids, the building blocks of proteins. Since then, much time has been spent trying to understand and predict the shape a protein takes when only the order of the amino acids is known. Over the past ten years the technology has improved. In this work I have used Rosetta, a software program developed by David Baker at the University of Washington. Rosetta can predict the shape of a protein from the order of its amino acids. By applying Rosetta to all proteins in yeast and combining the results with information from both experimental techniques and databases, we have succeeded in increasing the understanding of how yeast works and of the strengths and weaknesses of the technology we have developed. The hope is that this information leads to an increased understanding of biology as a whole.
author
supervisor
opponent
  • Dr Fenyö, David, The Rockefeller University, New York, New York, USA
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Biomedicinska vetenskaper, Biomedical sciences, Saccharomyces cerevisiae, Ab initio protein structure prediction, Protein annotation
pages
200 pages
publisher
Department of Electrical Measurements, Lund University
defense location
Room E:1406, E-building, Ole Römers väg 3, Lund Institute of Technology
defense date
2005-12-16 10:15:00
ISBN
91-628-6689-3
language
English
LU publication?
yes
id
0696abeb-0295-455e-b983-488b739cbcf1 (old id 545856)
date added to LUP
2016-04-01 15:21:26
date last changed
2018-11-21 20:34:02
@phdthesis{0696abeb-0295-455e-b983-488b739cbcf1,
  abstract     = {{Very little is known about a considerable fraction of all proteins, and it is time-consuming and expensive to study each individual protein to determine its function, structure and cellular role. Proteins retain structural, functional and sequence characteristics from ancestral proteins, and hence two proteins that share a common ancestor, i.e. are homologs, will to some extent have similar sequence, structure and function. One way to learn something about a protein is to identify its homologs and use information from those homologs to annotate the protein of interest. Close homologs can be detected using sequence alone, but more distant homologs cannot. Structure is more conserved than sequence and enables detection of a common ancestor between more distantly related proteins, thereby enabling transfer of information to a larger fraction of the uncharacterized proteins. This thesis covers my efforts to develop a method that uses ab initio protein structure prediction to detect distant homologs and uses those homologs to annotate proteins from the genome of Saccharomyces cerevisiae.
The ab initio protein structure prediction software used in this thesis, Rosetta, can predict a protein's tertiary structure using the amino acid sequence alone. Rosetta reduces the search space by approximating local conformations with conformations from the Protein Data Bank (PDB), and judges the overall fitness of the simulated protein structure through a statistically derived energy function. The program has been successful in the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP), and the results from the last CASP are reported in Paper I. Distant homologs can be detected by comparing the structures generated by Rosetta with structures from the PDB. In general, however, such a comparison is noisy, that is, it gives many answers, of which only a few are correct. The noise can be filtered out by exploiting the strong relationship between protein function and protein structure, using either functional information from a database or functional information inferred from one or more experimental high-throughput technologies. This idea was tested in Paper II, where 100 proteins were investigated using protein structure prediction, yeast two-hybrid, fluorescence microscopy and mass spectrometry. The data from all four technologies were integrated and 77% of the proteins were assigned a function.
Data integration is very labor-intensive when done by hand, and the amount of information generated for each protein investigated is substantial. To apply this technology on a genome-wide scale, everything needs to be automated and all data have to be stored and managed efficiently. Papers III and IV cover information management, that is, how the data used and produced in the project are organized and stored. Paper V reports both how we automated the integration process using the software described in Papers I and II and the application of the technology to the genome of Saccharomyces cerevisiae.}},
  author       = {{Malmström, Lars}},
  isbn         = {{91-628-6689-3}},
  keywords     = {{Biomedicinska vetenskaper; Biomedical sciences; Saccharomyces cerevisiae; Ab initio protein structure prediction; Protein annotation}},
  language     = {{eng}},
  publisher    = {{Department of Electrical Measurements, Lund University}},
  school       = {{Lund University}},
  title        = {{Genome-wide structural and functional protein characterization by ab initio protein structure prediction}},
  year         = {{2005}},
}