
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Pseudomonas aeruginosa gene expression analysis using pangenome and PAO1 reference genomes

Su, Yi (2023) BINP51 20222
Degree Projects in Bioinformatics
Abstract
Developments in sequencing technologies have made the analysis of genetic material much more accessible. Processing sequence data for accurate analysis comes with challenges, especially in studies of microbes in clinical in vivo samples, where difficulties in sample collection can lower sequencing quality and contamination from the human host can affect the accuracy of downstream analysis. In this project, we use RNA-seq and different reference genomes to examine differential gene expression in Pseudomonas aeruginosa (PA), one of the most prevalent bacterial pathogens in the progression of chronic pulmonary diseases such as cystic fibrosis, owing to its resistance to antimicrobial treatment. We created a pangenome from 21 strains of PA and explored the use of this pangenome, its subsets (core and soft-core gene sets) and a commonly used PA genome (PAO1) as reference genomes. We compared differences and similarities in the results obtained with the four gene sets, including in transcript mapping, while developing a feasible pipeline to process raw reads from human sputum samples for differential expression and gene ontology enrichment analysis. Across most of the reference-based results, we found that genes upregulated in in vivo samples were related to biofilm, which contributes to the difficulty of treating PA infections.
Popular Abstract
How a “pan-genome” can help discover which genes bacteria use to survive inside the human body

Have you ever wondered about the microscopic world that exists in our bodies? Would its inhabitants attack us at our moment of weakness? For people affected by cystic fibrosis, a disease that affects the lungs by causing a buildup of fluids, this can become a lifelong war against microbes such as Pseudomonas aeruginosa (PA). This tricky species of bacteria can change its growth form, which makes infected patients difficult to treat.

Thanks to the large set of genes in its chromosome, PA can change its growth mode from free-living cells in a liquid to a thick layer of cells known as a biofilm by switching various genes on or off. Since different strains of PA carry different sets of genes, it is difficult to study traits common to all strains and the importance of each gene. Hence, we wanted a more inclusive approach to exploring their gene expression. Here we create a pan-genome, the complete set of genes found across different strains, to use as a reference when studying how different PA strains use their particular sets of genes. We used it to explore which genes are more highly expressed in samples from people with PA in their lungs than in PA grown in a lab.

Improvements in sequencing technology have made the analysis of genetic material more accessible, and software tools have become essential for processing these data. We created a pan-genome from 21 strains of PA. From this, we identified the set of genes shared among all these strains, i.e., the core gene set. While exploring the pan-genome, we found that most genes that support the bacteria's survival belong to the core gene set.
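To make the idea of a core gene set concrete, here is a minimal sketch of how a pan-genome, a core set and a soft-core set could be derived from a gene presence/absence table. The gene and strain names are hypothetical toy data, and the soft-core threshold is an assumption for illustration only; the thesis record does not state which threshold or which pan-genome tool was used.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    presence: Dict[str, Set[str]],
    n_strains: int,
    soft_core_fraction: float = 0.95,  # assumed threshold, not taken from the thesis
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split genes into pan, core and soft-core sets.

    presence maps each gene to the set of strains that carry it.
    """
    # Pan-genome: every gene seen in at least one strain.
    pan = set(presence)
    # Core: genes carried by every strain.
    core = {g for g, strains in presence.items() if len(strains) == n_strains}
    # Soft-core: genes carried by at least the chosen fraction of strains
    # (this includes the core genes).
    soft_core = {
        g
        for g, strains in presence.items()
        if len(strains) >= soft_core_fraction * n_strains
    }
    return pan, core, soft_core


# Toy example with four hypothetical strains and three hypothetical genes.
toy_presence = {
    "geneA": {"s1", "s2", "s3", "s4"},
    "geneB": {"s1", "s2", "s3"},
    "geneC": {"s1"},
}
pan, core, soft_core = partition_pangenome(
    toy_presence, n_strains=4, soft_core_fraction=0.75
)
```

In this toy case only geneA is core, geneA and geneB are soft-core, and all three genes belong to the pan-genome. Each of these sets could then serve as a separate reference.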

To highlight the importance of the choice of reference gene set, we examined how the results of the analysis changed depending on which gene set was used as the reference. We did this by pseudo-aligning, or “matching”, sequenced RNA from the samples against the sequences in each gene set. We created and employed a bioinformatic workflow that removes human sequences from the reads and then “matches” all sample reads to one of the reference gene sets. The resulting counts were imported into the statistical software R, where statistical analysis was used to identify PA genes that were more highly expressed in the sputum samples than in bacterial cells cultured in the lab, a procedure known as “differential expression analysis”. To aid biological interpretation, these genes were categorized by gene ontology (GO) terms, which tell us more about what the genes do. We then compared the differential expression results obtained with the different gene sets.
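The comparison of read counts between sputum and lab samples can be illustrated with a deliberately simplified sketch. It normalizes each sample to counts per million and computes a log2 fold change per gene. This is not the statistical analysis performed in R for the thesis, whose method is not specified here; the function names, the pseudocount and the toy counts are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def mean_cpm(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Average the counts per million of each gene over a group of samples."""
    cpms = [counts_per_million(s) for s in samples]
    return {gene: sum(c[gene] for c in cpms) / len(cpms) for gene in cpms[0]}


def log2_fold_change(
    sputum: List[Dict[str, int]],
    lab: List[Dict[str, int]],
    pseudocount: float = 1.0,  # assumed value; avoids division by zero
) -> Dict[str, float]:
    """Positive values mean higher expression in sputum than in lab culture."""
    sputum_mean = mean_cpm(sputum)
    lab_mean = mean_cpm(lab)
    return {
        gene: math.log2(
            (sputum_mean[gene] + pseudocount) / (lab_mean[gene] + pseudocount)
        )
        for gene in sputum_mean
    }


# Toy counts for two hypothetical genes, one sample per condition.
sputum_samples = [{"geneA": 300, "geneB": 700}]
lab_samples = [{"geneA": 100, "geneB": 900}]
lfc = log2_fold_change(sputum_samples, lab_samples)
```

Here geneA receives a positive log2 fold change (more expressed in sputum) and geneB a negative one. A real analysis would also model variability between replicates and correct for multiple testing, which this sketch omits.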

From the differential expression analysis, we discovered that PA genes related to biofilm were more highly expressed in patient samples. Biofilm also contributes to the difficulty of treating PA infections. However, the genes in each set of results differed in the specific roles they play in biofilm. For example, biofilm-related genes that are highly important for explaining why PA survives in the lungs were not identified with some of the reference gene sets.

In conclusion, the choice of reference gene set led to different types of expressed genes being found, some of which might not have been discovered if we had used only a single strain, or the wrong gene set, as a reference. Bacterial gene expression analysis in clinical samples is a new research field, and this insight may be important for future studies that focus on bacterial antibiotic resistance, biofilm or virulence. Future pan-genome studies can be scaled up to use more strains, which would in turn capture more variation to explore. Perhaps we might find a particular way in which PA expresses its genes in human sputum that makes it easier to treat.

Master’s Degree Project in Bioinformatics, BINP51, 45 credits
Department of Biology, Lund University

Advisor: Magnus Paulsson
Infection medicine, Department of Clinical Sciences Lund, Medical faculty, Lund University
author: Su, Yi
supervisor: Magnus Paulsson
course: BINP51 20222
year: 2023
type: H2 - Master's Degree (Two Years)
language: English
id: 9140075
date added to LUP: 2023-10-16 14:41:50
date last changed: 2023-10-16 14:41:50
@misc{9140075,
  abstract     = {{Developments in sequencing technologies have made the analysis of genetic material much more accessible. Processing sequence data for accurate analysis comes with challenges, especially in studies of microbes in clinical in vivo samples, where difficulties in sample collection can lower sequencing quality and contamination from the human host can affect the accuracy of downstream analysis. In this project, we use RNA-seq and different reference genomes to examine differential gene expression in Pseudomonas aeruginosa (PA), one of the most prevalent bacterial pathogens in the progression of chronic pulmonary diseases such as cystic fibrosis, owing to its resistance to antimicrobial treatment. We created a pangenome from 21 strains of PA and explored the use of this pangenome, its subsets (core and soft-core gene sets) and a commonly used PA genome (PAO1) as reference genomes. We compared differences and similarities in the results obtained with the four gene sets, including in transcript mapping, while developing a feasible pipeline to process raw reads from human sputum samples for differential expression and gene ontology enrichment analysis. Across most of the reference-based results, we found that genes upregulated in in vivo samples were related to biofilm, which contributes to the difficulty of treating PA infections.}},
  author       = {{Su, Yi}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Pseudomonas aeruginosa gene expression analysis using pangenome and PAO1 reference genomes}},
  year         = {{2023}},
}