The projection score - an evaluation criterion for variable subset selection in PCA visualization
(2011) In BMC Bioinformatics 12.
Abstract
 Background
In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization.
Results
We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA.
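The abstract does not state the formula behind the projection score, so the following is only a rough Python sketch of the general idea, not the authors' implementation. It assumes that informativeness is measured as the share of variance captured by the first few principal components, compared against the same quantity after each variable has been permuted independently. The function names (`explained_fraction`, `projection_score_sketch`, `best_variance_cutoff`), the permutation null, and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def explained_fraction(X, n_components=2):
    """Square root of the share of total variance captured by the leading PCs."""
    Xc = X - X.mean(axis=0)                      # center each variable (column)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    total = np.sum(s ** 2)
    if total == 0:
        return 0.0
    return float(np.sqrt(np.sum(s[:n_components] ** 2) / total))

def projection_score_sketch(X, n_components=2, n_perm=50, rng=None):
    """Observed explained fraction minus its mean under a permutation null.

    Assumption: the null is built by shuffling every column independently,
    which keeps each variable's distribution but removes shared structure.
    """
    rng = np.random.default_rng(rng)
    observed = explained_fraction(X, n_components)
    null = []
    for _ in range(n_perm):
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
        null.append(explained_fraction(Xp, n_components))
    return observed - float(np.mean(null))

def best_variance_cutoff(X, sizes, **kwargs):
    """Score the top-m highest-variance variables for each m in `sizes`."""
    order = np.argsort(X.var(axis=0))[::-1]      # variables by decreasing variance
    scores = {m: projection_score_sketch(X[:, order[:m]], **kwargs) for m in sizes}
    return max(scores, key=scores.get), scores
```

With a samples-by-variables matrix `X`, a call such as `best_variance_cutoff(X, sizes=[50, 200, 1000], n_components=2, rng=0)` returns the subset size with the highest score together with all scores. Variance filtering is used here only because the abstract names it as an example; any other ranking of the variables could be substituted.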
Conclusions
We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/record/2060581
 author
 Fontes, Magnus (LU) and Soneson, Charlotte (LU)
 organization
 publishing date
 2011
 type
 Contribution to journal
 publication status
 published
 subject
 in
 BMC Bioinformatics
 volume
 12
 publisher
 BioMed Central
 external identifiers

 wos:000294558200001
 scopus:79960655875
 ISSN
 1471-2105
 DOI
 10.1186/1471-2105-12-307
 language
 English
 LU publication?
 yes
 id
 96a6c4d3861d4088b36287c44ff7dc9d (old id 2060581)
 alternative location
 http://www.biomedcentral.com/1471-2105/12/307
 date added to LUP
 2011-08-23 17:07:02
 date last changed
 2017-01-01 06:00:23
@article{96a6c4d3861d4088b36287c44ff7dc9d,
  abstract  = {Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.},
  articleno = {307},
  author    = {Fontes, Magnus and Soneson, Charlotte},
  issn      = {1471-2105},
  language  = {eng},
  publisher = {BioMed Central},
  journal   = {BMC Bioinformatics},
  title     = {The projection score - an evaluation criterion for variable subset selection in PCA visualization},
  url       = {http://dx.doi.org/10.1186/1471-2105-12-307},
  volume    = {12},
  year      = {2011},
}