Finalizing a rapid algorithm to describe community structure using next generation amplicon sequencing data

Willforss, Jakob

Willforss, Jakob (2015) BINP30 20151
Degree Projects in Bioinformatics

Abstract: Recent advances in sequencing technology have put increasing pressure on the computer programs used to analyse sequence data. These programs should provide accurate analyses within a reasonable time span. 16S and 18S rRNA analysis is performed by using PCR to amplify the genes from the microorganisms present in a sample, followed by sequencing of the amplicons. The result is a set of files containing millions of short sequences. To make sense of the data, the sequences are processed and analysed using a variety of computer programs. These programs often perform only part of the analysis, or require extensive knowledge and time to operate properly. There is a need to automate the use of these tools and to make them simple and intuitive to use.
The aim of this project has been to finalise a previously developed software pipeline called "Amplicon Pipeline", which was able to process and analyse 16S amplicon data and present the results in a web interface. It was limited to 16S data and to single samples. This project added support for analysing the 18S gene, which allows classification of eukaryotes, and for processing multiple samples simultaneously. It also involved preparing the web page and the program for a potentially high user load. Furthermore, a stand-alone version of the pipeline has been developed, which allows local processing and can be built into automated workflows.
Finally, Amplicon Pipeline has been compared further with QIIME. It had previously been shown to produce similar results for single 16S datasets. Its multiple-sample functionality and its 18S classification have now been evaluated as well, with promising results. The stand-alone version has also been used in an automated workflow, where it successfully processed and annotated the majority of 500 16S datasets downloaded from the Sequence Read Archive. Amplicon Pipeline can currently be accessed at the following address: http://130.235.244.91/Pipeline/
Popular Abstract: Amplicon Pipeline - Fast and automatic analysis of microorganisms

We are surrounded by microorganisms. Bacteria, fungi, parasites: they surround us and make up a large majority of all living things in the world around us. We carry considerably more bacteria than human cells, and they matter greatly for our well-being. But we need the right tools to study them.

Microorganisms are a fraction of the width of a hair in size. Bacteria, fungi, parasites... Our surroundings teem with them. We often talk about them as something that mostly causes us problems, but in fact they are of great importance to us. Our bodies are full of beneficial bacteria that, among other things, help us absorb nutrients and play an important role in our immune system. Fungi play an important role in the nutrient cycles of the soil and help plants and animals obtain the nutrients they need.

By studying which microorganisms live in a given place at a given time, we can draw valuable conclusions about the environment they live in. By studying which organisms live in, for example, our mouth or our intestines, we can learn more about our health. A powerful way to study which organisms live in a particular environment is to copy part of their DNA (genetic material) and then sequence these copies, that is, read off their DNA sequence. The copies are called amplicons. We can then compare the amplicons with the corresponding sequence in various known organisms. This tells us which types of microorganisms live in that particular environment at that time.

A fast and convenient analysis program

It takes a lot of work to sort out which types of microorganisms the different amplicons come from. A range of computer programs help with this, but there are many steps, and running the programs easily becomes time-consuming. To make the work easier there are so-called pipelines, programs that connect other programs and run them automatically in sequence. Researchers feed the amplicon sequences into the start of the pipeline. The pipeline runs the data through the different programs and finally displays the result.

Amplicon Pipeline is one such program. It stands out from its competitors by being very easy to use and fast. To run it, you simply go to a web page, enter your email address and upload a file of amplicon sequences. A short while later you receive, among other things, an overview of which groups of organisms are present in the sample (see figure 1), along with much other information.

During this project Amplicon Pipeline has been developed further. Among other things, the program can now analyse several samples at once, and it is no longer limited to bacteria but can also study samples containing, for example, fungi and other small organisms. In addition, its web page has been prepared to handle many users better.

Our hope is that the tool is now ready to be used by researchers, and that it will make their work of studying microorganisms in different contexts easier.

Supervisor: Björn Canbäck
Master's degree project, 30 credits, in bioinformatics, 2015
Department of Biology, Lund University

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/7374347

author

Willforss, Jakob

supervisor

Björn Canbäck (LU)

organization

Degree Projects in Bioinformatics

course

BINP30 20151

year

2015

type

H2 - Master's Degree (Two Years)

subject

Biology and Life Sciences

language

English

id

7374347

date added to LUP

2015-06-22 08:50:14

date last changed

2015-06-22 08:50:14

@misc{7374347,
abstract = {{Recent advances in the sequencing technology have led to an increasing pressure on the computer programs used for analysing the sequence data. The programs should be able to provide accurate analysis in a reasonable time span. 16S and 18S rRNA analysis is performed by using PCR to amplify the genes from microorganisms present in a sample followed by sequencing of the amplicons. The result is files containing millions of short sequences. In order to make sense of the data, the sequences are processed and analysed using a variety of computer programs. Those programs are often only able to perform parts of the analysis steps or require extensive knowledge and time to operate properly. There is a need for automating the usage of those tools, and to make the tools simple and intuitive to use.
The aim of this project has been to finalise a previously developed software pipeline called "Amplicon Pipeline" that was able to process and analyse 16S amplicon data and provide the results in a web interface. It was limited to processing 16S data and single samples. This project has involved both implementing support for analysing the 18S gene which allows for classification of eukaryotes and for simultaneous processing of multiple samples. It has also involved preparing the web page and program for a potentially high user load. Furthermore, a stand-alone version of the pipeline has been developed which allows for local processing and which can be implemented into automated work flows.
Finally, Amplicon Pipeline has been further compared with QIIME. It has previously been shown to produce similar results for single 16S datasets. Now, its multiple sample functionality and its 18S classification have been further evaluated with promising results. The standalone version has also been implemented in an automated work flow where it successfully processed and annotated the majority of 500 16S datasets downloaded from the Sequence Read Archive. Amplicon Pipeline can currently be accessed on the following address: http://130.235.244.91/Pipeline/}},
author = {{Willforss, Jakob}},
language = {{eng}},
note = {{Student Paper}},
title = {{Finalizing a rapid algorithm to describe community structure using next generation amplicon sequencing data}},
year = {{2015}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
