Advanced

Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs.

Veerla, Srinivas LU ; Ringnér, Markus LU and Höglund, Mattias LU (2010) In BMC Genomics 11.
Abstract
BACKGROUND: The use of global gene expression profiling is a well established approach to understand biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), likely to be of importance for the expression of the gene signatures. RESULTS: We develop a strategy that efficiently produces... (More)
BACKGROUND: The use of global gene expression profiling is a well established approach to understand biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), likely to be of importance for the expression of the gene signatures. RESULTS: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases constitute all genes in the Santa Cruz database and the positions for all TFBS provided by the user as position weight matrices. These databases are then used for two purposes, to identify significant TFBS in the promoters in sets of genes and to identify clusters of co-occurring TFBS. We use two criteria for significance, significantly enriched TFBS in terms of total number of binding sites for the promoters, and significantly present TFBS in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a re-sampling procedure in which the query gene set is compared with typically 10(5) gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes. CONCLUSIONS: The presented approach of promoter analysis faithfully reproduces the results from several ChIP-Chip and MigDB derived gene sets and hence may prove to be an important method in the analysis of gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes. (Less)
Please use this url to cite or link to this publication:
author
organization
publishing date
type
Contribution to journal
publication status
published
subject
in
BMC Genomics
volume
11
publisher
BioMed Central
external identifiers
  • wos:000275836000002
  • pmid:20193056
  • scopus:77950402758
ISSN
1471-2164
DOI
10.1186/1471-2164-11-145
project
CREATE Health
language
English
LU publication?
yes
id
a165096e-e794-492d-adad-190fc085cb62 (old id 1582711)
alternative location
http://www.ncbi.nlm.nih.gov/pubmed/20193056?dopt=Abstract
date added to LUP
2010-04-07 11:03:13
date last changed
2017-01-01 07:48:12
@article{a165096e-e794-492d-adad-190fc085cb62,
  abstract     = {BACKGROUND: The use of global gene expression profiling is a well established approach to understand biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), likely to be of importance for the expression of the gene signatures. RESULTS: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases constitute all genes in the Santa Cruz database and the positions for all TFBS provided by the user as position weight matrices. These databases are then used for two purposes, to identify significant TFBS in the promoters in sets of genes and to identify clusters of co-occurring TFBS. We use two criteria for significance, significantly enriched TFBS in terms of total number of binding sites for the promoters, and significantly present TFBS in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a re-sampling procedure in which the query gene set is compared with typically 10(5) gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes. CONCLUSIONS: The presented approach of promoter analysis faithfully reproduces the results from several ChIP-Chip and MigDB derived gene sets and hence may prove to be an important method in the analysis of gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes.},
  articleno    = {145},
  author       = {Veerla, Srinivas and Ringnér, Markus and Höglund, Mattias},
  issn         = {1471-2164},
  language     = {eng},
  publisher    = {BioMed Central},
  series       = {BMC Genomics},
  title        = {Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs.},
  url          = {http://dx.doi.org/10.1186/1471-2164-11-145},
  volume       = {11},
  year         = {2010},
}