
Lund University Publications


An Object-Oriented Regression for Building Disease Predictive Models with Multiallelic HLA Genes

Zhao, Lue Ping; Bolouri, Hamid; Zhao, Michael; Geraghty, Daniel E. and Lernmark, Åke (2016). In Genetic Epidemiology 40(4), p. 315-332
Abstract

Recent genome-wide association studies confirm that human leukocyte antigen (HLA) genes have the strongest associations with several autoimmune diseases, including type 1 diabetes (T1D), providing an impetus to reduce this genetic association to practice through an HLA-based disease predictive model. However, conventional model-building methods tend to be suboptimal when predictors are highly polymorphic with many rare alleles combined with complex patterns of sequence homology within and between genes. To circumvent this challenge, we describe an alternative methodology: treating complex genotypes of HLA genes as "objects" or "exemplars," one focuses on systemic associations of disease phenotype with "objects" via similarity measurements. Conceptually, this approach assigns disease risks based on complex genotype profiles instead of specific disease-associated genotypes or alleles. Effectively, it transforms large, discrete, and sparse HLA genotypes into a matrix of similarity-based covariates. Using the kernel representer theorem and machine learning techniques, it applies a penalized likelihood method to select disease-associated exemplars in building predictive models. To illustrate this methodology, we apply it to a T1D study with eight HLA genes (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) to build a predictive model. The resulting predictive model has an area under the curve of 0.92 in the training set and 0.89 in the validation set, indicating that this methodology is useful for building predictive models with complex HLA genotypes.
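The pipeline outlined in the abstract (genotype profiles, then similarity-based covariates, then penalized selection of exemplars) can be illustrated with a small sketch. This is not the authors' implementation. The allele labels, the allele-sharing similarity measure, the toy data, and the choice of an L1 (lasso) penalty fitted by proximal gradient descent are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

def gene_similarity(g1, g2):
    """Share of alleles in common (0, 0.5 or 1) between two unordered allele pairs."""
    shared = sum((Counter(g1) & Counter(g2)).values())
    return shared / 2.0

def profile_similarity(p1, p2):
    """Average the per-gene similarity over all genes in a multi-gene profile."""
    return sum(gene_similarity(a, b) for a, b in zip(p1, p2)) / len(p1)

def similarity_matrix(profiles, exemplars):
    """One row per subject, one similarity-based covariate per exemplar."""
    return [[profile_similarity(p, e) for e in exemplars] for p in profiles]

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_l1_logistic(X, y, lam=0.05, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    The step size 4 / (q + 1) is a safe choice only because every covariate
    lies in [0, 1]; the intercept is left unpenalized.
    """
    n, q = len(X), len(X[0])
    step = 4.0 / (q + 1)
    b0, beta = 0.0, [0.0] * q
    for _ in range(iters):
        resid = []
        for row, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - yi)
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(q)]
        b0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta

def predict_risk(b0, beta, row):
    """Predicted disease probability for one row of similarity covariates."""
    eta = b0 + sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical two-gene profiles; allele names "A1", "B1", ... are placeholders.
profiles = [
    (("A1", "A1"), ("B1", "B2")),  # case
    (("A1", "A2"), ("B1", "B1")),  # case
    (("A1", "A1"), ("B1", "B1")),  # case
    (("A3", "A4"), ("B3", "B3")),  # control
    (("A3", "A3"), ("B3", "B4")),  # control
    (("A4", "A4"), ("B4", "B3")),  # control
]
y = [1, 1, 1, 0, 0, 0]

exemplars = profiles  # every observed profile is a candidate exemplar
X = similarity_matrix(profiles, exemplars)
b0, beta = fit_l1_logistic(X, y)
risks = [predict_risk(b0, beta, row) for row in X]
```

Exemplars whose coefficient in `beta` is shrunk to zero drop out of the model, which is the sense in which a penalty of this kind "selects" exemplars. A new subject would be scored by computing its similarity to each exemplar and passing that row to `predict_risk`.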

author
Zhao, Lue Ping; Bolouri, Hamid; Zhao, Michael; Geraghty, Daniel E. and Lernmark, Åke
publishing date
2016
type
Contribution to journal
publication status
published
subject
keywords
Generalized linear model, Kernel machine, Multiallelic genotypes, Penalized regression, Prediction, Similarity regression, Statistical learning
in
Genetic Epidemiology
volume
40
issue
4
pages
18 pages
publisher
John Wiley & Sons Inc.
external identifiers
  • scopus:84963701149
  • wos:000374542600006
  • pmid:27080919
ISSN
0741-0395
DOI
10.1002/gepi.21968
language
English
LU publication?
yes
id
9179d8e8-07a8-49f4-a894-74ce4c399537
date added to LUP
2016-05-10 13:22:56
date last changed
2024-04-04 21:22:06
@article{9179d8e8-07a8-49f4-a894-74ce4c399537,
  abstract     = {{Recent genome-wide association studies confirm that human leukocyte antigen (HLA) genes have the strongest associations with several autoimmune diseases, including type 1 diabetes (T1D), providing an impetus to reduce this genetic association to practice through an HLA-based disease predictive model. However, conventional model-building methods tend to be suboptimal when predictors are highly polymorphic with many rare alleles combined with complex patterns of sequence homology within and between genes. To circumvent this challenge, we describe an alternative methodology: treating complex genotypes of HLA genes as "objects" or "exemplars," one focuses on systemic associations of disease phenotype with "objects" via similarity measurements. Conceptually, this approach assigns disease risks based on complex genotype profiles instead of specific disease-associated genotypes or alleles. Effectively, it transforms large, discrete, and sparse HLA genotypes into a matrix of similarity-based covariates. Using the kernel representer theorem and machine learning techniques, it applies a penalized likelihood method to select disease-associated exemplars in building predictive models. To illustrate this methodology, we apply it to a T1D study with eight HLA genes (HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) to build a predictive model. The resulting predictive model has an area under the curve of 0.92 in the training set and 0.89 in the validation set, indicating that this methodology is useful for building predictive models with complex HLA genotypes.}},
  author       = {{Zhao, Lue Ping and Bolouri, Hamid and Zhao, Michael and Geraghty, Daniel E. and Lernmark, Åke}},
  issn         = {{0741-0395}},
  keywords     = {{Generalized linear model; Kernel machine; Multiallelic genotypes; Penalized regression; Prediction; Similarity regression; Statistical learning}},
  language     = {{eng}},
  month        = {{05}},
  number       = {{4}},
  pages        = {{315--332}},
  publisher    = {{John Wiley \& Sons Inc.}},
  journal      = {{Genetic Epidemiology}},
  title        = {{An Object-Oriented Regression for Building Disease Predictive Models with Multiallelic HLA Genes}},
  url          = {{http://dx.doi.org/10.1002/gepi.21968}},
  doi          = {{10.1002/gepi.21968}},
  volume       = {{40}},
  year         = {{2016}},
}