Genome-wide structural and functional protein characterization by ab initio protein structure prediction

Malmström, Lars


Malmström, Lars ^LU (2005)

Abstract: Very little is known about a considerable fraction of all proteins, and it is time-consuming and expensive to study each individual protein to determine its function, structure and cellular role. Proteins retain structural, functional and sequence characteristics from ancestral proteins; hence two proteins that share a common ancestor, i.e. are homologs, will to some extent have similar sequence, structure and function. One way to learn something about a protein is to identify its homologs and use information from those homologs to annotate the protein of interest. Close homologs can be detected using sequence alone, but more distant homologs cannot. Structure is more conserved than sequence and enables detection of a common ancestor between more distantly related proteins, thereby enabling transfer of information to a larger fraction of the uncharacterized proteins. This thesis covers my efforts to develop a method that uses ab initio protein structure prediction to detect distant homologs and uses those homologs to annotate proteins from the genome of Saccharomyces cerevisiae.

The ab initio protein structure prediction software used in this thesis, Rosetta, can predict a protein's tertiary structure using the amino acid sequence alone. Rosetta reduces the search space by approximating the local conformation with conformations from the Protein Data Bank (PDB), and judges the overall fitness of the simulated protein structure through a statistically derived energy function. The program has been successful in the last three Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, and the results from the most recent CASP are reported in Paper I. Distant homologs can be detected by comparing the structures generated by Rosetta with structures from the PDB. In general, however, such a comparison is noisy: it gives many answers, of which only a few are correct. The noise can be filtered out by exploiting the strong relationship between protein function and protein structure, using either functional information from a database or functional information inferred from one or more experimental high-throughput technologies. This idea was tested in Paper II, where 100 proteins were investigated using protein structure prediction, yeast two-hybrid, fluorescence microscopy and mass spectrometry. The data from all four technologies were integrated, and 77% of the proteins were assigned a function.

Data integration is very labor-intensive when done by hand, and the amount of information generated for each protein investigated is substantial. To apply this technology on a genome-wide scale, everything needs to be automated and all data have to be stored and managed efficiently. Paper III and Paper IV cover information management, that is, how the data used and produced in the project are organized and stored. Paper V reports both how we automated the integration process using the software described in Papers I and II and the application of the technology to the genome of Saccharomyces cerevisiae.
Popular abstract (translated from Swedish): Many of the chemical processes that take place in our bodies are carried out by proteins. When they do not work, we become ill. In some cases, for example in Alzheimer's disease or Creutzfeldt-Jakob disease, this is because a particular protein has the wrong shape. Understanding how proteins adopt their final shape, and what effect this has on the protein's function, is therefore important. In 2005 it cost between one and a couple of million to measure the shape of a single protein, because the equipment is expensive and a great deal of work is required. In the sixties Ryle discovered that proteins appear to have a blueprint for the shape they adopt built into the order of the amino acids, the building blocks of proteins. Since then, much time has been spent trying to understand and predict what shape a protein takes when only the order of the amino acids is known. Over the past ten years the technology has improved. In this work I have used Rosetta, a software program developed by David Baker at the University of Washington. Rosetta can predict the shape of a protein from the order of its amino acids. By applying Rosetta to all proteins in yeast and combining the results with information from both experimental techniques and databases, we have succeeded in increasing the understanding of how yeast works and of the strengths and weaknesses of the technology we have developed. The hope is that this information will lead to an increased understanding of biology as a whole.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/545856

author

Malmström, Lars ^LU

supervisor

Thomas Laurell ^LU

opponent

Dr Fenyö, David, The Rockefeller University, New York, New York, USA

organization

Division for Biomedical Engineering

publishing date

2005

type

Thesis

publication status

published

subject

Medical Engineering

keywords

Biomedicinska vetenskaper, Biomedical sciences, Saccharomyces cerevisiae, Ab initio protein structure prediction, Protein annotation

pages

200 pages

publisher

Department of Electrical Measurements, Lund University

defense location

Room E:1406, E-building, Ole Römers väg 3, Lund Institute of Technology

defense date

2005-12-16 10:15:00

ISBN

91-628-6689-3

language

English

LU publication?

yes

id

0696abeb-0295-455e-b983-488b739cbcf1 (old id 545856)

date added to LUP

2016-04-01 15:21:26

date last changed

2025-04-04 15:05:45

@phdthesis{0696abeb-0295-455e-b983-488b739cbcf1,
  abstract     = {{Very little is known about a considerable fraction of all proteins, and it is time-consuming and expensive to study each individual protein to determine its function, structure and cellular role. Proteins retain structural, functional and sequence characteristics from ancestral proteins; hence two proteins that share a common ancestor, i.e. are homologs, will to some extent have similar sequence, structure and function. One way to learn something about a protein is to identify its homologs and use information from those homologs to annotate the protein of interest. Close homologs can be detected using sequence alone, but more distant homologs cannot. Structure is more conserved than sequence and enables detection of a common ancestor between more distantly related proteins, thereby enabling transfer of information to a larger fraction of the uncharacterized proteins. This thesis covers my efforts to develop a method that uses ab initio protein structure prediction to detect distant homologs and uses those homologs to annotate proteins from the genome of Saccharomyces cerevisiae.

The ab initio protein structure prediction software used in this thesis, Rosetta, can predict a protein's tertiary structure using the amino acid sequence alone. Rosetta reduces the search space by approximating the local conformation with conformations from the Protein Data Bank (PDB), and judges the overall fitness of the simulated protein structure through a statistically derived energy function. The program has been successful in the last three Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, and the results from the most recent CASP are reported in Paper I. Distant homologs can be detected by comparing the structures generated by Rosetta with structures from the PDB. In general, however, such a comparison is noisy: it gives many answers, of which only a few are correct. The noise can be filtered out by exploiting the strong relationship between protein function and protein structure, using either functional information from a database or functional information inferred from one or more experimental high-throughput technologies. This idea was tested in Paper II, where 100 proteins were investigated using protein structure prediction, yeast two-hybrid, fluorescence microscopy and mass spectrometry. The data from all four technologies were integrated, and 77% of the proteins were assigned a function.

Data integration is very labor-intensive when done by hand, and the amount of information generated for each protein investigated is substantial. To apply this technology on a genome-wide scale, everything needs to be automated and all data have to be stored and managed efficiently. Paper III and Paper IV cover information management, that is, how the data used and produced in the project are organized and stored. Paper V reports both how we automated the integration process using the software described in Papers I and II and the application of the technology to the genome of Saccharomyces cerevisiae.}},
  author       = {{Malmström, Lars}},
  isbn         = {{91-628-6689-3}},
  keywords     = {{Biomedicinska vetenskaper; Biomedical sciences; Saccharomyces cerevisiae; Ab initio protein structure prediction; Protein annotation}},
  language     = {{eng}},
  publisher    = {{Department of Electrical Measurements, Lund University}},
  school       = {{Lund University}},
  title        = {{Genome-wide structural and functional protein characterization by ab initio protein structure prediction}},
  year         = {{2005}},
}

Lund University Publications