To execute the C implementation of the SURE algorithm, first prepare an input file in the format described below.


The input file format is as follows:
Samples=     (the number of samples)
Modalities=  (the number of modalities)
Rank=        (the rank of the eigenspace)
Clusters=    (the number of clusters in the data set)
File1= (Filename for Modality1)      Features= (Number of features in Modality1)	logtransform= (1 to apply log transformation, 0 otherwise)
File2= (Filename for Modality2)      Features= (Number of features in Modality2)	logtransform= (1 to apply log transformation, 0 otherwise)
File3= (Filename for Modality3)      Features= (Number of features in Modality3)	logtransform= (1 to apply log transformation, 0 otherwise)
File4= (Filename for Modality4)      Features= (Number of features in Modality4)	logtransform= (1 to apply log transformation, 0 otherwise)
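
To make the layout above concrete, here is a minimal sketch of how such a file could be read as whitespace-separated "Key= value" pairs. It is not the parser used in SURE.c; the SureConfig struct, the read_sure_config function and the size limits are hypothetical names chosen for illustration. The sketch accepts separate Rank= and Clusters= lines as well as the combined Cluster/Rank= line used in the example files below.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODALITIES 16   /* assumed upper limit, not taken from SURE.c */
#define MAX_TOKEN 256

/* Hypothetical container for the fields of a SURE input file. */
typedef struct {
    int samples;
    int modalities;
    int rank;
    int clusters;
    int files_seen;                        /* number of FileN= entries read */
    char file[MAX_MODALITIES][MAX_TOKEN];
    int features[MAX_MODALITIES];
    int logtransform[MAX_MODALITIES];
} SureConfig;

/* Reads whitespace-separated "Key= value" pairs from path.
   Returns 0 on success, -1 if the file cannot be opened or is malformed. */
int read_sure_config(const char *path, SureConfig *cfg)
{
    char key[MAX_TOKEN], val[MAX_TOKEN];
    int cur = -1;   /* index of the modality currently being described */
    int ok = 1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;
    memset(cfg, 0, sizeof *cfg);

    while (ok && fscanf(fp, "%255s", key) == 1) {
        if (fscanf(fp, "%255s", val) != 1) { ok = 0; break; }
        if (strcmp(key, "Samples=") == 0) {
            cfg->samples = atoi(val);
        } else if (strcmp(key, "Modalities=") == 0) {
            cfg->modalities = atoi(val);
        } else if (strcmp(key, "Rank=") == 0) {
            cfg->rank = atoi(val);
        } else if (strcmp(key, "Clusters=") == 0) {
            cfg->clusters = atoi(val);
        } else if (strcmp(key, "Cluster/Rank=") == 0) {
            cfg->rank = cfg->clusters = atoi(val);
        } else if (strncmp(key, "File", 4) == 0) {
            if (cfg->files_seen >= MAX_MODALITIES) { ok = 0; break; }
            cur = cfg->files_seen++;
            strcpy(cfg->file[cur], val);
        } else if (strcmp(key, "Features=") == 0 && cur >= 0) {
            cfg->features[cur] = atoi(val);
        } else if (strcmp(key, "logtransform=") == 0 && cur >= 0) {
            cfg->logtransform[cur] = atoi(val);
        } else {
            ok = 0;   /* unknown key, or a per-file key before any FileN= */
        }
    }
    fclose(fp);
    /* The number of FileN= lines must agree with Modalities=. */
    return (ok && cfg->files_seen == cfg->modalities) ? 0 : -1;
}
```

Under these assumptions, calling read_sure_config("GBM", &cfg) on the GBM example file below would leave cfg.samples at 168 and cfg.features[1] at 534.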


Example Input File for GBM Dataset:
Samples= 168
Modalities= 3
Cluster/Rank= 4
File1= DataSets/GBM/RNA      Features= 2000 	    logtransform= 0
File2= DataSets/GBM/miRNA    Features= 534	        logtransform= 0
File3= DataSets/GBM/CNV      Features= 2000	        logtransform= 0


Example Input File for CESC Dataset:
Samples= 124
Modalities= 4
Cluster/Rank= 3
File1= DataSets/CESC/mDNA      Features= 2000	logtransform= 0
File2= DataSets/CESC/RNA     Features= 2000	logtransform= 1
File3= DataSets/CESC/miRNA    Features= 311	    logtransform= 1
File4= DataSets/CESC/Protein  Features= 219	    logtransform= 0




Compile and run the C version of the code with:
$ gcc SURE.c -lm -llapack
$ ./a.out GBM              (The input file name is given as a command line argument; this command executes SURE on the GBM data set)

$ ./a.out CESC             (Executes SURE on the CESC data set)
$ ./a.out KIDNEY           (Executes SURE on the KIDNEY data set)



Components of the joint eigenspace are written to the following files:
JointU.txt - contains the (n x r) joint left subspace, where n is the number of samples in the data set and r is the rank of the joint subspace.
JointS.txt - contains the (r x r) diagonal matrix of singular values.
JointV.txt - contains the (d x r) joint right subspace, where d is the total number of features in the integrated data.
The principal components of the integrated data matrix can be obtained by multiplying the (n x r) matrix in JointU.txt by the (r x r) matrix in JointS.txt.
The joint principal components are written to the file JointPCs.txt.
K-means clustering can then be performed on the principal components.
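
The product of the (n x r) and (r x r) matrices described above can be sketched as follows. This is only an illustration, not the code in SURE.c: the function name joint_pcs is hypothetical, and both matrices are assumed to be already loaded into memory in row-major order.

```c
/* Computes PC = U * S, where U is (n x r), S is (r x r) and PC is (n x r).
   All matrices are stored row-major; PC must have room for n * r doubles. */
void joint_pcs(const double *U, const double *S, double *PC, int n, int r)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < r; j++) {
            double sum = 0.0;
            for (int k = 0; k < r; k++)
                sum += U[i * r + k] * S[k * r + j];
            PC[i * r + j] = sum;
        }
    }
}
```

Because S is diagonal, the product simply scales column j of U by the j-th singular value; the general loop is kept here so that the sketch follows the matrix multiplication as stated.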




R demo code for the GBM data set is given in the file GBMExample.R, and for the CESC data set in CESCExample.R.
To execute the R implementation of the SURE algorithm on the GBM data set, run in a terminal:

$ Rscript GBMExample.R


To execute the R implementation of the SURE algorithm on the CESC and KIDNEY data sets, run in a terminal:

$ Rscript CESCExample.R
$ Rscript KIDNEYExample.R



The file SURE.R contains the R implementation of the proposed method as a function SURE. Details of the function are as follows:

Function Name: SURE

Usage 
SURE(Data, rank, K, modname)


Arguments
Data: A list object containing M data matrices representing M different omic data types measured in a set of n samples.
For each matrix, the rows represent samples, and the columns represent genomic features.
rank: The rank of the individual and joint eigenspaces.
K: The number of clusters in the data set.
modname: A string array of names of the modalities. Required for modality selection.
example: modname=c("RNA","miRNA","CNV")

Example:

Data<-list()
Data[[1]] <- as.matrix(read.table("DataSets/GBM/RNA", sep=" ",header=TRUE,row.names=1))
Data[[2]] <- as.matrix(read.table("DataSets/GBM/miRNA", sep=" ",header=TRUE,row.names=1))
Data[[3]] <- as.matrix(read.table("DataSets/GBM/CNV", sep=" ",header=TRUE,row.names=1))
K=4
modname=c("RNA","miRNA","CNV")
source("SURE.R")
out=SURE(Data,rank=K,K=K,modname=modname)




For the CESC data set, log-transform the RNA and miRNA modalities before executing the SURE algorithm.
Example:

DataSet="CESC"
n=124
K=3
rank=K
Data<-list()
Data[[1]] <- as.matrix(read.table(paste0("DataSets/",DataSet,"/mDNA"), sep=" ",header=TRUE,row.names=1))
Data[[2]] <- as.matrix(read.table(paste0("DataSets/",DataSet,"/RNA"), sep=" ",header=TRUE,row.names=1))
Data[[3]] <- as.matrix(read.table(paste0("DataSets/",DataSet,"/miRNA"), sep=" ",header=TRUE,row.names=1))
Data[[4]] <- as.matrix(read.table(paste0("DataSets/",DataSet,"/Protein"), sep=" ",header=TRUE,row.names=1))
modname=c("DNA","GEN","MIR","PRO")
# Log transform of the sequence-based gene (RNA) and miRNA modalities
LogData=Data
# Replace zeros with 1 so that the base-10 logarithm gives 0 instead of -Inf
LogData[[2]][LogData[[2]]==0]=1
LogData[[2]]=log(LogData[[2]],base=10)
LogData[[3]][LogData[[3]]==0]=1
LogData[[3]]=log(LogData[[3]],base=10)
source("SURE.R")
out=SURE(LogData,rank=rank,K=K,modname=modname)

