Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences

Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D'Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda

Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D'Angelo, Igor; Olson, Branden, et al. (2020) In Bioinformatics 36(6). p.1731-1739

Abstract: SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest tool in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platforms.

AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
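
The abstract reports two kinds of ratio: the share of germline genes common to all reference sets (73/183, about 40%) and the per-tool frequency of gene mishits. The following is a minimal sketch of how such ratios could be computed, not the authors' method. It assumes that the denominator is the union of all reference sets and that a mishit is an assigned gene that differs from the true gene; the record states neither definition. The function names and the toy gene sets are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set


def shared_fraction(reference_sets: Iterable[Set[str]]) -> float:
    """Genes present in every reference set, as a fraction of their union.

    Assumption: the denominator is the union of all sets.
    """
    sets = [set(s) for s in reference_sets]
    if not sets:
        raise ValueError("need at least one reference set")
    union = set.union(*sets)
    if not union:
        return 0.0
    return len(set.intersection(*sets)) / len(union)


def mishit_frequency(assigned: List[str], truth: List[str]) -> float:
    """Fraction of sequences whose assigned gene differs from the true gene.

    Assumption: a mishit is any assignment that is not an exact match.
    """
    if len(assigned) != len(truth):
        raise ValueError("assigned and truth must have the same length")
    if not truth:
        return 0.0
    mishits = sum(1 for a, t in zip(assigned, truth) if a != t)
    return mishits / len(truth)


# Toy reference sets for three hypothetical tools (not the real databases).
tool_a = {"IGHV1-2", "IGHV3-23", "IGHD3-10", "IGHJ4"}
tool_b = {"IGHV1-2", "IGHV3-23", "IGHJ4", "IGHJ6"}
tool_c = {"IGHV1-2", "IGHV3-23", "IGHD3-10", "IGHJ4", "IGHV4-34"}
print(shared_fraction([tool_a, tool_b, tool_c]))  # 3 shared of 6 total -> 0.5

# Toy simulated data: the true gene is known for every sequence.
truth = ["IGHV1-2", "IGHV3-23", "IGHV4-34", "IGHV1-2"]
assigned = ["IGHV1-2", "IGHV3-23", "IGHV3-23", "IGHV1-2"]
print(mishit_frequency(assigned, truth))  # 1 mishit of 4 -> 0.25

# The figure reported in the abstract, rounded to a whole percentage.
print(round(100 * 73 / 183))  # -> 40
```

With simulated data the true gene of each sequence is known, which is what makes a mishit frequency of this kind computable.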

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/505859a4-c98a-49d5-832b-f9124417e54f

author

Smakaj, Erand; Babrak, Lmar; Ohlin, Mats (LU); Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D'Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William and Miho, Enkelejda

publishing date

2020

type

Contribution to journal

publication status

published

subject

Medical Laboratory Technologies

in

Bioinformatics

volume

36

issue

6

pages

9 pages

publisher

Oxford University Press

external identifiers

pmid:31873728
scopus:85082147889

ISSN

1367-4803

DOI

10.1093/bioinformatics/btz845

language

English

LU publication?

yes

id

505859a4-c98a-49d5-832b-f9124417e54f

date added to LUP

2020-04-07 09:46:02

date last changed

2025-10-14 13:23:26

@article{505859a4-c98a-49d5-832b-f9124417e54f,
  abstract     = {{SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest tool in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platforms. AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.}},
  author       = {{Smakaj, Erand and Babrak, Lmar and Ohlin, Mats and Shugay, Mikhail and Briney, Bryan and Tosoni, Deniz and Galli, Christopher and Grobelsek, Vendi and D'Angelo, Igor and Olson, Branden and Reddy, Sai and Greiff, Victor and Trück, Johannes and Marquez, Susanna and Lees, William and Miho, Enkelejda}},
  issn         = {{1367-4803}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{1731--1739}},
  publisher    = {{Oxford University Press}},
  journal      = {{Bioinformatics}},
  title        = {{Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences}},
  url          = {{http://dx.doi.org/10.1093/bioinformatics/btz845}},
  doi          = {{10.1093/bioinformatics/btz845}},
  volume       = {{36}},
  year         = {{2020}},
}

Lund University Publications