Unsupervised feature selection through Gram-Schmidt orthogonalization: A word co-occurrence perspective
 
 
 
Authors: Wang, DQ (Wang, Deqing) [1]; Zhang, H (Zhang, Hui) [1]; Liu, R (Liu, Rui) [1]; Liu, XL (Liu, Xianglong) [1]; Wang, J (Wang, Jing) [2]
  
 
  
 
  
   NEUROCOMPUTING
   
 
  
  
DOI: 10.1016/j.neucom.2015.08.038
 
Publication date: JAN 15 2016
  
 
 
Abstract
 
 
Feature selection is a key step in many machine learning applications, such as categorization and clustering. For text data in particular, the original document-term matrix is high-dimensional and sparse, which degrades the performance of feature selection algorithms. Meanwhile, labeling training instances is time-consuming and expensive, so unsupervised feature selection algorithms have attracted increasing attention. In this paper, we propose an unsupervised feature selection algorithm using Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) on the word co-occurrence matrix. The RP-GSO algorithm has three advantages: (1) it takes a dense word co-occurrence matrix as input, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" by the Gram-Schmidt (GS) process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. Extensive experimental results show that the proposed RP-GSO approach achieves better performance than the supervised and unsupervised feature selection methods it is compared against in text classification and clustering tasks. (C) 2015 Elsevier B.V. All rights reserved.
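
The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the co-occurrence matrix is taken to be C = XᵀX for a document-term matrix X, the random projection is Gaussian, and the Gram-Schmidt step greedily picks the feature with the largest residual norm. The function names `cooccurrence`, `random_project`, and `gram_schmidt_select` are hypothetical.

```python
import random


def cooccurrence(doc_term):
    """Assumed definition: C = X^T X for a document-term matrix X (list of rows)."""
    n_terms = len(doc_term[0])
    C = [[0.0] * n_terms for _ in range(n_terms)]
    for row in doc_term:
        # Only nonzero entries contribute, which keeps sparse rows cheap.
        nonzero = [(j, v) for j, v in enumerate(row) if v]
        for i, vi in nonzero:
            for j, vj in nonzero:
                C[i][j] += vi * vj
    return C


def random_project(vectors, k, seed=0):
    """Map each vector to k dimensions with a Gaussian random matrix (assumed choice)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    R = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(k)] for _ in range(d)]
    return [[sum(v[i] * R[i][j] for i in range(d)) for j in range(k)]
            for v in vectors]


def gram_schmidt_select(vectors, n_select, tol=1e-10):
    """Greedy Gram-Schmidt selection of up to n_select vector indices.

    Assumed pivot rule: take the not-yet-selected vector with the largest
    residual norm, then remove its direction from every residual.
    """
    residual = [list(v) for v in vectors]
    selected = []
    for _ in range(n_select):
        candidates = [i for i in range(len(residual)) if i not in selected]
        if not candidates:
            break
        sq_norms = {i: sum(x * x for x in residual[i]) for i in candidates}
        best = max(candidates, key=lambda i: sq_norms[i])
        if sq_norms[best] <= tol:
            break  # the remaining features lie in the span of the selected ones
        selected.append(best)
        norm = sq_norms[best] ** 0.5
        q = [x / norm for x in residual[best]]
        for r in residual:
            dot = sum(a * b for a, b in zip(r, q))
            for j in range(len(r)):
                r[j] -= dot * q[j]
    return selected


# Toy data: terms 0 and 1 always occur together; terms 2 and 3 occur alone.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
C = cooccurrence(X)
print(gram_schmidt_select(random_project(C, k=3), n_select=3))
```

In this toy input, terms 0 and 1 have identical co-occurrence rows, so once one of them is selected the residual of the other collapses to roughly zero and it cannot be chosen as a second "basis feature". A practical version would more likely use a sparse matrix library and a pivoted QR routine than plain lists.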