Genome-wide structural and functional protein characterization by ab initio protein structure prediction
(2005)
- Abstract
- Very little is known about a considerable part of all proteins and it is time consuming
and expensive to study each individual protein to determine its function,
structure and cellular role. Proteins retain structural, functional and sequential
characteristics from ancestral proteins and hence two proteins that share a common
ancestor, i.e. are homologs, will to some extent have similar sequence,
structure and function. One way to learn something about a protein is to identify
its homologs and use information from those homologs to annotate the protein
of interest. Close homologs can be detected using sequence
alone, but more distant homologs cannot. Structure is more conserved
than sequence and enables detection of a common ancestor between more distantly
related proteins, thereby enabling transfer of information to a
larger fraction of the uncharacterized proteins. This thesis covers my efforts to
develop a method to use ab initio protein structure prediction to detect distant
homologs and use the homologs to annotate proteins from the genome of Saccharomyces
cerevisiae.
The ab initio protein structure prediction software used in this thesis, Rosetta,
can predict a protein's tertiary structure using the amino acid sequence alone.
Rosetta reduces the search space by approximating local conformations
with conformations from the Protein Data Bank, and judges the overall
fitness of the simulated protein structure through a statistically derived energy
function. The program has been successful in the last three Critical Assessment of
Techniques for Protein Structure Prediction (CASP) experiments, and the results from the last
CASP are reported in Paper I. Distant homologs can be detected by comparing the
structures generated by Rosetta with structures from the Protein Data Bank
(PDB). In general, however, such a comparison is noisy, that is, gives many answers,
of which only a few are correct. The noise can be filtered out by exploiting
the strong relationship between protein function and protein
structure, using either functional information from a database or functional
information inferred from one or more experimental high-throughput technologies.
This idea was tested in Paper II, where 100 proteins were investigated using protein
structure prediction, yeast two-hybrid, fluorescence microscopy and mass
spectrometry. The data from all four technologies were integrated and 77% of the
proteins were assigned a function.
Data integration is very labor-intensive when done by hand, and the amount of
information generated for each protein investigated is substantial. Everything
needs to be automated and all data have to be stored and managed in an efficient
way to be able to apply this technology on a genome-wide scale. Paper III and
Paper IV cover information management, that is, how the data used and produced
in the project are organized and stored. Paper V reports both how we automated
the integration process using the software described in Papers I and II and
the application of the technology to the genome of Saccharomyces cerevisiae.
- Abstract (Swedish)
- Popular Abstract in Swedish
Many of the chemical processes that take place in our bodies are carried out by proteins.
When they do not work, we become ill. In some cases, for example in Alzheimer's disease
or Creutzfeldt-Jakob disease, the cause is that a particular protein has the wrong
shape. Understanding how proteins adopt their final shape, and what effect that has
on the protein's function, is therefore important. In 2005 it cost between one and a
few million to measure the shape of a single protein, because the equipment is expensive
and a great deal of work is required. In the sixties, Ryle discovered that proteins appear
to have a blueprint for the shape they adopt built into the order of their amino acids,
the building blocks of proteins. Since then, much time has been spent trying to
understand and predict what shape a protein takes when only the order
of the amino acids is known. Over the past ten years the technology has improved. In this
work I have used Rosetta, a software program developed by
David Baker at the University of Washington. Rosetta can predict the shape of
a protein from the order of its amino acids. By applying Rosetta to
all proteins in yeast and combining the results with information from both experimental
techniques and databases, we have managed to increase the understanding of how yeast
works and of the strengths and weaknesses of the technology we have developed.
The hope is that this information leads to an increased understanding of biology
as a whole.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/545856
- author
- Malmström, Lars LU
- supervisor
- opponent
- Dr Fenyö, David, The Rockefeller University, New York, New York, USA
- organization
- publishing date
- 2005
- type
- Thesis
- publication status
- published
- subject
- keywords
- Biomedicinska vetenskaper, Biomedical sciences, Saccharomyces cerevisiae, Ab initio protein structure prediction, Protein annotation
- pages
- 200 pages
- publisher
- Department of Electrical Measurements, Lund University
- defense location
- Room E:1406, E-building, Ole Römers väg 3, Lund Institute of Technology
- defense date
- 2005-12-16 10:15:00
- ISBN
- 91-628-6689-3
- language
- English
- LU publication?
- yes
- id
- 0696abeb-0295-455e-b983-488b739cbcf1 (old id 545856)
- date added to LUP
- 2016-04-01 15:21:26
- date last changed
- 2018-11-21 20:34:02
@phdthesis{0696abeb-0295-455e-b983-488b739cbcf1,
  abstract  = {{Very little is known about a considerable part of all proteins and it is time consuming
    and expensive to study each individual protein to determine its function,
    structure and cellular role. Proteins retain structural, functional and sequential
    characteristics from ancestral proteins and hence two proteins that share a common
    ancestor, i.e. are homologs, will to some extent have similar sequence,
    structure and function. One way to learn something about a protein is to identify
    its homologs and use information from those homologs to annotate the protein
    of interest. Close homologs can be detected using sequence
    alone, but more distant homologs cannot. Structure is more conserved
    than sequence and enables detection of a common ancestor between more distantly
    related proteins, thereby enabling transfer of information to a
    larger fraction of the uncharacterized proteins. This thesis covers my efforts to
    develop a method to use ab initio protein structure prediction to detect distant
    homologs and use the homologs to annotate proteins from the genome of Saccharomyces
    cerevisiae.

    The ab initio protein structure prediction software used in this thesis, Rosetta,
    can predict a protein's tertiary structure using the amino acid sequence alone.
    Rosetta reduces the search space by approximating local conformations
    with conformations from the Protein Data Bank, and judges the overall
    fitness of the simulated protein structure through a statistically derived energy
    function. The program has been successful in the last three Critical Assessment of
    Techniques for Protein Structure Prediction (CASP) experiments, and the results from the last
    CASP are reported in Paper I. Distant homologs can be detected by comparing the
    structures generated by Rosetta with structures from the Protein Data Bank
    (PDB). In general, however, such a comparison is noisy, that is, gives many answers,
    of which only a few are correct. The noise can be filtered out by exploiting
    the strong relationship between protein function and protein
    structure, using either functional information from a database or functional
    information inferred from one or more experimental high-throughput technologies.
    This idea was tested in Paper II, where 100 proteins were investigated using protein
    structure prediction, yeast two-hybrid, fluorescence microscopy and mass
    spectrometry. The data from all four technologies were integrated and 77% of the
    proteins were assigned a function.

    Data integration is very labor-intensive when done by hand, and the amount of
    information generated for each protein investigated is substantial. Everything
    needs to be automated and all data have to be stored and managed in an efficient
    way to be able to apply this technology on a genome-wide scale. Paper III and
    Paper IV cover information management, that is, how the data used and produced
    in the project are organized and stored. Paper V reports both how we automated
    the integration process using the software described in Papers I and II and
    the application of the technology to the genome of Saccharomyces cerevisiae.}},
  author    = {{Malmström, Lars}},
  isbn      = {{91-628-6689-3}},
  keywords  = {{Biomedicinska vetenskaper; Biomedical sciences; Saccharomyces cerevisiae; Ab initio protein structure prediction; Protein annotation}},
  language  = {{eng}},
  publisher = {{Department of Electrical Measurements, Lund University}},
  school    = {{Lund University}},
  title     = {{Genome-wide structural and functional protein characterization by ab initio protein structure prediction}},
  year      = {{2005}},
}