A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data                
------------------------------------------------------------------------------------
Downloads (keep all the files of Gsom and Ufrfs in the same folder):

  • Data: From the raw data file, prepare two files that contain i) only the expression values (e.g., "expressionvalues.txt" for the ALL & AML data) and ii) only the gene names (e.g., "genenames.txt" for the ALL & AML data). Sample raw data files can be obtained from the following links: ALL & AML and Prostate cancer.

  • Gsom.c: This is the main program for GSOM. To run it, do the following: i) Enter the number of patterns in the data in line 7. ii) Enter the number of features in the data in line 8. iii) Enter the number of ROWS & COLS (2D grid) in lines 9 & 10, respectively, to set the expected number of output clusters for GSOM. iv) Enter the number of iterations for training the GSOM in line 11. v) Enter the value of the learning parameter (ETA) in line 12.

  • Normalization.h: It is used to normalize the expression values of the data to the range 0 to 1. This file is included in the main program in line 14.

  • Decisiovalues.h: It is used to compute fuzzy decision classes. This header file is included in the main program in line 15.

  • Frweight.h: It is used to calculate the fuzzy lower and upper approximations of a set (cluster), based on the fuzzy reflexive relational matrix corresponding to every feature and on the fuzzy decision classes. This file is included in the main program in line 16.

  • DBindex.h: A cluster evaluation measure (the Davies-Bouldin index). This file is included in the main program in line 17.

  • Dunindex.h: A cluster evaluation measure (the Dunn index). This file is included in the main program in line 18.

  • Output: The output of GSOM is saved in two files, "outputclusters.txt" and "results_GSOM.txt". The former provides the result required by the gene selection method (UFRFS); the latter provides the same result in terms of pattern number, cluster number and clustering measures for the user's analysis.

  • Ufrfs.c: This is the main program for the feature (gene) selection method (UFRFS). To run it, keep "outputclusters.txt" and "genenames.txt" in the same folder and do the following: i) Enter the number of patterns in the data in line 6. ii) Enter the number of features (genes) in the data in line 7. iii) Enter the number of clusters available with the data in line 8 (see "results_GSOM.txt" for the number of clusters). iv) Enter the number of genes to be selected in line 9.

  • Output: The results of UFRFS are saved in the file "results_UFRFS.txt".
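
The settings and the normalization step described in the list above can be sketched as follows. This is only an illustration: the macro names other than ROWS, COLS and ETA, all the values, and the function normalize_columns are assumptions, not taken from Gsom.c or Normalization.h. It also assumes that each feature (column) is rescaled separately with min-max scaling; the actual header may scale differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings block; names and values are illustrative only. */
#define PATTERNS   4      /* number of patterns (rows) in the data      */
#define FEATURES   3      /* number of features (columns) in the data   */
#define ROWS       2      /* grid rows; ROWS * COLS output clusters     */
#define COLS       2      /* grid columns                               */
#define ITERATIONS 100    /* number of training iterations              */
#define ETA        0.1    /* learning parameter                         */

/* Rescale every column of the data to [0, 1] using that column's own
   minimum and maximum. A constant column is mapped to 0 so that the
   division by zero is avoided. */
void normalize_columns(double data[PATTERNS][FEATURES])
{
    assert(data != NULL);
    for (size_t j = 0; j < FEATURES; j++) {
        double lo = data[0][j];
        double hi = data[0][j];
        for (size_t i = 1; i < PATTERNS; i++) {
            if (data[i][j] < lo) lo = data[i][j];
            if (data[i][j] > hi) hi = data[i][j];
        }
        double range = hi - lo;
        for (size_t i = 0; i < PATTERNS; i++)
            data[i][j] = (range > 0.0) ? (data[i][j] - lo) / range : 0.0;
    }
}
```

Calling normalize_columns on a PATTERNS x FEATURES array overwrites it in place, so the smallest value of each column becomes 0 and the largest becomes 1.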

Note: The values of the learning parameter (ETA) and the expected number of clusters in Gsom.c can be chosen as in TABLES III-V, pages 8-10 of the article.
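
To illustrate what the grid size (ROWS, COLS) and the learning parameter (ETA) control, here is a minimal sketch of one training step of a plain self-organizing map. It is not the GSOM training procedure: the fuzzy rough computations of Frweight.h and any neighbourhood update are left out, and the function names find_winner and update_winner are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FEATURES 2                /* illustrative value */
#define ROWS     2                /* illustrative value */
#define COLS     2                /* illustrative value */
#define NODES    (ROWS * COLS)    /* one output node per grid cell */

/* Return the index of the node whose weight vector is closest to the
   pattern x, using the squared Euclidean distance. */
size_t find_winner(double w[NODES][FEATURES], const double x[FEATURES])
{
    size_t best = 0;
    double best_dist = -1.0;      /* negative means "not set yet" */
    for (size_t n = 0; n < NODES; n++) {
        double dist = 0.0;
        for (size_t j = 0; j < FEATURES; j++) {
            double diff = x[j] - w[n][j];
            dist += diff * diff;
        }
        if (best_dist < 0.0 || dist < best_dist) {
            best_dist = dist;
            best = n;
        }
    }
    return best;
}

/* Move the winning node's weights towards x by the fraction eta. */
void update_winner(double w[NODES][FEATURES], const double x[FEATURES],
                   size_t winner, double eta)
{
    assert(winner < NODES);
    for (size_t j = 0; j < FEATURES; j++)
        w[winner][j] += eta * (x[j] - w[winner][j]);
}
```

A larger grid gives more output nodes and therefore more candidate clusters, while a larger eta moves the winning node further towards each presented pattern.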

Codes of related clustering methods used for comparison:

1. Self-organizing map: SOM.c

2. Robust rough fuzzy c-means: RRFCM.c

3. Rough fuzzy possibilistic c-means: FRPCM.c

4. Rough possibilistic c-means: RPCM.c

5. Fuzzy c-means method: FCM.c

6. Affinity propagation method: AP method

7. Method of c-medoids: cmediod.m

The main programs SOM.c, RRFCM.c, FRPCM.c, RPCM.c and FCM.c include the header file "normalization.h". To compile a program individually, place the main program and the header file in the same folder. In the main program of each method, enter the number of patterns, features and clusters available with the data. The clustering solution of each method is saved in a text file named "output_clusters.txt". Two well-known cluster evaluation measures, DBindex.c and Dunindex.c, are provided as examples for evaluating the solutions.

Note: The parameters of the related clustering methods used for comparison are described in lines 5-17, Section V-B-3. The values of parameters for all the methods for different data sets can be chosen as shown in TABLES III-V, pages 8-10 of the article.

Codes of related feature (gene) selection methods used for comparison:

1. Algorithm 1: This is a gene selection method that uses Ufrfs.c (mentioned above) with the output clusters of SOM.

2. Unsupervised Fuzzy Rough Dimensionality Reduction (UFRDR): The code for this method is available with WEKA software, downloadable at http://users.aber.ac.uk/rkj/book/weka.zip . This weblink is available at http://users.aber.ac.uk/rkj/book/programs.php . 

3. Unsupervised Feature Selection Using Feature Similarity (UFRFS): The Matlab code for this algorithm is available at http://cse.iitkgp.ac.in/~pabitra/paper.html.

4. Fuzzy-Rough Mutual Information based Method (FRMIM.c): The C code for FRMIM is downloadable at http://www.isical.ac.in/~pmaji/important.html.

5. Correlation based Feature Selection (CFS): The code for this method is available with WEKA software, downloadable at http://users.aber.ac.uk/rkj/book/weka.zip . This weblink is available at http://users.aber.ac.uk/rkj/book/programs.php . 

Note: The parameters of all the feature selection methods are selected as in a supplementary file available at http://avatharamg.webs.com/GSOM-UFRFS.pdf .