Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

A Novel Predictor for the Analysis and Prediction of Enhancers and Their Strength via Multi-View Features and Deep Forest

Gill, Mehwish ; Ahmed, Saeed LU ; Kabir, Muhammad LU and Hayat, Maqsood (2023) In Information (Switzerland) 14(12).
Abstract

Enhancers are short DNA segments (50–1500 bp) that effectively activate gene transcription when transcription factors (TFs) are present. There is a correlation between the genetic differences in enhancers and numerous human disorders including cancer and inflammatory bowel disease. In computational biology, the accurate categorization of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are thought to be vital tools for researching enhancers’ key characteristics; however, because these techniques require a lot of labor and time, it might be difficult for researchers to forecast enhancers and their powers. Therefore, computational techniques are considered an alternate... (More)

Enhancers are short DNA segments (50–1500 bp) that effectively activate gene transcription when transcription factors (TFs) are present. There is a correlation between the genetic differences in enhancers and numerous human disorders including cancer and inflammatory bowel disease. In computational biology, the accurate categorization of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are thought to be vital tools for researching enhancers’ key characteristics; however, because these techniques require a lot of labor and time, it might be difficult for researchers to forecast enhancers and their powers. Therefore, computational techniques are considered an alternate strategy for handling this issue. Based on the types of algorithms that have been used to construct predictors, the current methodologies can be divided into three primary categories: ensemble-based methods, deep learning-based approaches, and traditional ML-based techniques. In this study, we developed a novel two-layer deep forest-based predictor for accurate enhancer and strength prediction, namely, NEPERS. Enhancers and non-enhancers are divided at the first level by NEPERS, whereas strong and weak enhancers are divided at the second level. To evaluate the effectiveness of feature fusion, block-wise deep forest and other algorithms were combined with multi-view features such as PSTNPss, PSTNPdss, CKSNAP, and NCP via 10-fold cross-validation and independent testing. Our proposed technique performs better than competing models across all parameters, with an ACC of 0.876, Sen of 0.864, Spe of 0.888, MCC of 0.753, and AUC of 0.940 for layer 1 and an ACC of 0.959, Sen of 0.960, Spe of 0.958, MCC of 0.918, and AUC of 0.990 for layer 2, respectively, for the benchmark dataset. Similarly, for the independent test, the ACC, Sen, Spe, MCC, and AUC were 0.863, 0.865, 0.860, 0.725, and 0.948 for layer 1 and 0.890, 0.940, 0.840, 0.784, and 0.951 for layer 2, respectively. This study provides conclusive insights for the accurate and effective detection and characterization of enhancers and their strengths.

(Less)
Please use this url to cite or link to this publication:
author
; ; and
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
bioinformatics, classification, deep forest, deep learning, enhancers, feature representation, learning algorithms, sequence-based models
in
Information (Switzerland)
volume
14
issue
12
article number
636
publisher
MDPI AG
external identifiers
  • scopus:85180261867
ISSN
2078-2489
DOI
10.3390/info14120636
language
English
LU publication?
yes
id
45ce223c-35b3-466d-9c03-af2e8da1ad14
date added to LUP
2024-01-04 10:40:18
date last changed
2024-01-04 13:20:20
@article{45ce223c-35b3-466d-9c03-af2e8da1ad14,
  abstract     = {{<p>Enhancers are short DNA segments (50–1500 bp) that effectively activate gene transcription when transcription factors (TFs) are present. There is a correlation between the genetic differences in enhancers and numerous human disorders including cancer and inflammatory bowel disease. In computational biology, the accurate categorization of enhancers can yield important information for drug discovery and development. High-throughput experimental approaches are thought to be vital tools for researching enhancers’ key characteristics; however, because these techniques require a lot of labor and time, it might be difficult for researchers to forecast enhancers and their powers. Therefore, computational techniques are considered an alternate strategy for handling this issue. Based on the types of algorithms that have been used to construct predictors, the current methodologies can be divided into three primary categories: ensemble-based methods, deep learning-based approaches, and traditional ML-based techniques. In this study, we developed a novel two-layer deep forest-based predictor for accurate enhancer and strength prediction, namely, NEPERS. Enhancers and non-enhancers are divided at the first level by NEPERS, whereas strong and weak enhancers are divided at the second level. To evaluate the effectiveness of feature fusion, block-wise deep forest and other algorithms were combined with multi-view features such as PSTNPss, PSTNPdss, CKSNAP, and NCP via 10-fold cross-validation and independent testing. Our proposed technique performs better than competing models across all parameters, with an ACC of 0.876, Sen of 0.864, Spe of 0.888, MCC of 0.753, and AUC of 0.940 for layer 1 and an ACC of 0.959, Sen of 0.960, Spe of 0.958, MCC of 0.918, and AUC of 0.990 for layer 2, respectively, for the benchmark dataset. Similarly, for the independent test, the ACC, Sen, Spe, MCC, and AUC were 0.863, 0.865, 0.860, 0.725, and 0.948 for layer 1 and 0.890, 0.940, 0.840, 0.784, and 0.951 for layer 2, respectively. This study provides conclusive insights for the accurate and effective detection and characterization of enhancers and their strengths.</p>}},
  author       = {{Gill, Mehwish and Ahmed, Saeed and Kabir, Muhammad and Hayat, Maqsood}},
  issn         = {{2078-2489}},
  keywords     = {{bioinformatics; classification; deep forest; deep learning; enhancers; feature representation; learning algorithms; sequence-based models}},
  language     = {{eng}},
  number       = {{12}},
  publisher    = {{MDPI AG}},
  series       = {{Information (Switzerland)}},
  title        = {{A Novel Predictor for the Analysis and Prediction of Enhancers and Their Strength via Multi-View Features and Deep Forest}},
  url          = {{http://dx.doi.org/10.3390/info14120636}},
  doi          = {{10.3390/info14120636}},
  volume       = {{14}},
  year         = {{2023}},
}