Lund University Publications

Critical steps for computational inference of the 3′-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7

Thörnqvist, Linnea and Ohlin, Mats (2018) In Molecular Immunology 103. p.1-6
Abstract

Sequencing of immunoglobulin germline gene loci is challenging, for example because of their repetitiveness and complexity, which limits insight into the germline gene repertoire of humans and other species. With next-generation sequencing technology, it is possible to generate immunoglobulin transcript data sets large enough to computationally infer the germline genes from which the transcripts originate. Multiple tools for such inference have been developed; they can be used to construct individual germline gene databases and to discover new immunoglobulin germline genes and alleles. However, these methods face challenges, many of them related to the biological process through which immunoglobulin coding genes are generated. The junctional diversity introduced during rearrangement of the immunoglobulin heavy chain variable (IGHV), diversity and joining genes specifically complicates inference of the junction regions, and therefore of the 3′-end of IGHV genes. Because inference software is designed to cope with such diversity, it may fail to identify novel alleles that differ from their closest relatives in the starting database within these regions. In this study, we computationally inferred one such previously uncharacterized allele, IGHV3-7*02 A318G, but only when a strategy was used in which different variants of IGHV3-7*02 were included in the inference-initiating database. Importantly, the sequences assigned to the allele strongly supported the presence of the novel allele, but not of the standard IGHV3-7*02 sequence, in the genotype. We thus showed that the choice of starting database affects the germline gene inference process, and that differences in the 3′-end of IGHV genes may remain undetected unless specific, non-standard procedures are used to address them.
We suggest that inferred genes and alleles be confirmed, for example by examining the nucleotide composition of the 3′-terminal bases of the sequence reads that support the inference.
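
The suggested confirmation step, inspecting the 3′-terminal bases of the reads that support an inferred allele, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the V-gene portion of each assigned read has already been aligned to the allele without gaps, starting at allele position 1, and cut where the V-derived sequence ends. The function names, the toy sequences and the 50% support threshold are all hypothetical.

```python
from collections import Counter


def three_prime_composition(allele, reads, k=10):
    """Per-position base counts for the last k positions of `allele`.

    Assumes each read is the V-gene part of a sequence already aligned to
    `allele` without gaps, starting at allele position 1, and cut where the
    V-derived sequence ends. Returns (1-based position, germline base,
    {base: count}) tuples; only reads long enough to cover a position count.
    """
    report = []
    for i in range(max(0, len(allele) - k), len(allele)):
        counts = Counter(read[i] for read in reads if len(read) > i)
        report.append((i + 1, allele[i], dict(counts)))
    return report


def unsupported_positions(report, min_fraction=0.5):
    """Positions where fewer than `min_fraction` of the covering reads match
    the germline base (hypothetical threshold); uncovered positions are skipped."""
    flagged = []
    for position, germline_base, counts in report:
        total = sum(counts.values())
        if total and counts.get(germline_base, 0) / total < min_fraction:
            flagged.append(position)
    return flagged


# Hypothetical toy data: a 10-nt "allele" and four reads assigned to it.
allele = "ACGTACGTAA"
reads = ["ACGTACGTGA", "ACGTACGTGA", "ACGTACGTG", "ACGTACGT"]

report = three_prime_composition(allele, reads, k=4)
for position, germline_base, counts in report:
    print(position, germline_base, counts)
print("unsupported:", unsupported_positions(report))  # unsupported: [9]
```

In this toy data, position 9 is flagged because all three reads covering it carry G where the allele has A. This is the kind of 3′-end discrepancy the abstract warns an inference tool may absorb rather than report. In real repertoire data, junctional diversity shortens some reads at the 3′ end and adds non-templated bases to others, so the threshold and the rule for which reads count as covering a position would need care.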

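The abstract reports that IGHV3-7*02 A318G was detected only when variants of IGHV3-7*02 were included in the database used to start inference. The variants actually used are not described here, so the following Python sketch makes an explicit assumption: it seeds the starting database with every single-nucleotide substitution in the last k positions of a reference allele, naming each variant in the same "allele REF position ALT" style as IGHV3-7*02 A318G. The function names, the 1-based ungapped position numbering and the toy sequence are hypothetical.

```python
def three_prime_variants(name, seq, k=3):
    """All single-nucleotide substitutions in the last k positions of `seq`.

    Returns {variant_name: variant_sequence}; names follow the
    'allele REF<position>ALT' pattern (e.g. 'IGHV3-7*02 A318G'), assuming
    1-based, ungapped positions and an upper-case A/C/G/T sequence.
    """
    variants = {}
    for i in range(max(0, len(seq) - k), len(seq)):
        ref = seq[i]
        for alt in "ACGT":
            if alt != ref:
                variants[f"{name} {ref}{i + 1}{alt}"] = seq[:i] + alt + seq[i + 1:]
    return variants


def starting_database(name, seq, k=3):
    """Reference allele plus its 3'-end variants (hypothetical seeding scheme)."""
    database = {name: seq}
    database.update(three_prime_variants(name, seq, k))
    return database


# Hypothetical toy allele, not a real IGHV sequence.
db = starting_database("TOY*01", "ACGTAA", k=2)
for variant_name, variant_seq in db.items():
    print(variant_name, variant_seq)
```

Each A/C/G/T allele contributes 3k extra candidates, so a small k keeps the database manageable. A larger database may also make read assignment more ambiguous, so any allele inferred this way still warrants checking against its supporting reads.
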
author: Thörnqvist, Linnea and Ohlin, Mats
publishing date: 2018-11
type: Contribution to journal
publication status: published
keywords: Antibody, Bioinformatics, Germline gene allelic diversity, Germline gene inference, Immunoglobulin germline gene
in: Molecular Immunology
volume: 103
pages: 6 pages
publisher: Pergamon Press Ltd.
external identifiers:
  • scopus:85052432709
  • pmid:30172112
ISSN: 0161-5890
DOI: 10.1016/j.molimm.2018.08.018
language: English
LU publication?: yes
id: a3f836c0-fe37-48e1-af75-13b73816477a
date added to LUP: 2018-09-25 08:03:55
date last changed: 2024-04-01 10:49:53

@article{a3f836c0-fe37-48e1-af75-13b73816477a,
  abstract     = {{Sequencing of immunoglobulin germline gene loci is a challenging process, e.g. due to their repetitiveness and complexity, hence limiting the insight in the germline gene repertoire of humans and other species. Through next generation sequencing technology, it is possible to generate immunoglobulin transcript data sets large enough to computationally infer the germline genes from which the transcripts originate. Multiple tools for such inference have been developed and they can be used for construction of individual germline gene databases, and for discovery of new immunoglobulin germline genes and alleles. However, there are challenges associated with these methods, many of them related to the biological process through which immunoglobulin coding genes are generated. The junctional diversity introduced during rearrangement of the immunoglobulin heavy chain variable (IGHV), diversity and joining genes specifically complicates the inference of the junction regions, with implications for inference of the 3′-end of IGHV genes. With the aim of coping with such diversity, an inference software package may not be able to identify novel alleles harbouring a difference in these regions compared to their closest relatives in the starting database. In this study, we were able to computationally infer one such previously uncharacterized allele, IGHV3-7*02 A318G. However, this was possible only if a strategy was used in which different variants of IGHV3-7*02 were included in the inference-initiating database. Importantly, the presence of the novel allele, but not the standard IGHV3-7*02 sequence, in the genotype was strongly supported by the actual sequences that were assigned to the allele. We thus showed that the starting database used will impact the germline gene inference process, and that difference in the 3′-end of IGHV genes may remain undetected unless specific, non-standard procedures are used to address this matter. 
We suggest that inferred genes/alleles should be confirmed e.g. by examination of the nucleotide composition of the 3′-bases of the inference-supporting sequence reads.}},
  author       = {{Thörnqvist, Linnea and Ohlin, Mats}},
  issn         = {{0161-5890}},
  keywords     = {{Antibody; Bioinformatics; Germline gene allelic diversity; Germline gene inference; Immunoglobulin germline gene}},
  language     = {{eng}},
  month        = {{11}},
  pages        = {{1--6}},
  publisher    = {{Pergamon Press Ltd.}},
  journal      = {{Molecular Immunology}},
  title        = {{Critical steps for computational inference of the 3′-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7}},
  url          = {{http://dx.doi.org/10.1016/j.molimm.2018.08.018}},
  doi          = {{10.1016/j.molimm.2018.08.018}},
  volume       = {{103}},
  year         = {{2018}},
}