Finding Optimum Width of Discretization for Gene Expressions using Functional Annotations
Authors: Sampa Misra and Shubhra Sankar Ray
Welcome to the page for finding the optimum width of discretization for gene expression data using functional annotations of genes.
The steps for finding the optimum width of discretization (using Matlab on a machine with 16 GB RAM) are as follows:
· Download yeastallgoprocessvector.mat. When loaded in Matlab, the file creates a variable "vector" that holds the annotation profiles of Saccharomyces cerevisiae genes, based on the yeast GO-Slim process annotations in SGD.
· Download the expression values (Expression.xls) for the Cell Cycle All Yeast (CCAY) data set.
· Download the Main Program (OptimumWidthofDiscretization.m) for finding the optimum width of discretization. Before running it, place all the files downloaded above in the same folder as the Main Program. The Main Program returns the optimum width of discretization.
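This page does not spell out how a width is applied to the expression values. As a minimal Python sketch (not the authors' Matlab code), one common reading is equal-width binning, where each value is replaced by the index of the interval of the chosen width that contains it. The function name `discretize` and the sample values are hypothetical and are used only for illustration.

```python
import math


def discretize(values, width):
    """Replace each value by the index of its equal-width bin.

    Assumption: bins are [k*width, (k+1)*width) for integer k. The Main
    Program may define its bins differently.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [math.floor(v / width) for v in values]


# Hypothetical expression profile of one gene (illustrative values only).
profile = [-1.2, -0.3, 0.0, 0.4, 1.7]

coarse = discretize(profile, 1.0)  # wide bins merge nearby values
fine = discretize(profile, 0.5)    # narrow bins keep more detail
print(coarse)
print(fine)
```

A larger width maps more values to the same bin, while a smaller width preserves more of the original variation. That trade-off is why a width has to be chosen at all.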
For gene annotations, one can either use the existing yeastallgoprocessvector.mat or create up-to-date annotation profiles of genes as follows:
· Upload the gene names (genes.txt) to the SGD GO Slim Mapper website, select 'Yeast GO-Slim: Process' as the GO Slim terms, click 'SELECT ALL Terms' under GO Slim Terms, and download the result file. If the last two rows are the categories 'others' and 'not yet decided', delete them. Then replace the commas in the result file with tabs and save it.
· Split the tab-delimited result file into two files: one containing the GO-Slim terms, named Go-ProcessCategory.txt, and one containing only the genes belonging to the different categories, named yeast_GOslimprocessGene.xls.
· Download the GenetoORF.txt file and the program genetoORFmapping.m to convert the gene names (in yeast_GOslimprocessGene.xls) to their corresponding ORFs. The program creates a file yeast_GOslimprocessORF.txt in tab-delimited format, to be used in the next step.
· Run the program vectorconstruction.m. The program uses genes.txt and yeast_GOslimprocessORF.txt to construct yeastallgoprocessvector.mat.
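The data flow of the steps above can be sketched in Python as follows. This is an illustration only, not a port of genetoORFmapping.m or vectorconstruction.m. It assumes that each row of the result file holds a GO-Slim term followed by its gene names, and that an annotation profile is a 0/1 membership vector with one entry per category. All function names, gene names, and ORF names in the sketch are hypothetical placeholders.

```python
def commas_to_tabs(text):
    """Convert the comma-separated result file to tab-delimited text."""
    return text.replace(",", "\t")


def parse_rows(tab_text):
    """Split each non-empty line into (term, [gene names])."""
    rows = []
    for line in tab_text.splitlines():
        if not line.strip():
            continue
        term, *genes = line.split("\t")
        rows.append((term.strip(), [g.strip() for g in genes if g.strip()]))
    return rows


def drop_unwanted_tail(rows, unwanted=("others", "not yet decided")):
    """Remove trailing rows whose category is one of the unwanted labels."""
    rows = list(rows)
    while rows and rows[-1][0].lower() in unwanted:
        rows.pop()
    return rows


def genes_to_orfs(rows, gene_to_orf):
    """Replace gene names by ORFs; unmapped names are kept (assumption)."""
    return [(term, [gene_to_orf.get(g, g) for g in genes])
            for term, genes in rows]


def annotation_vectors(orfs, rows):
    """Build one 0/1 vector per ORF, one entry per remaining category."""
    return {orf: [1 if orf in members else 0 for _, members in rows]
            for orf in orfs}


# Hypothetical miniature result file and gene-to-ORF table.
raw = ("process A,GENE1,GENE2\n"
       "process B,GENE2,GENE3\n"
       "others,GENE4\n"
       "not yet decided,GENE5")
mapping = {"GENE1": "ORF1", "GENE2": "ORF2", "GENE3": "ORF3"}

rows = genes_to_orfs(drop_unwanted_tail(parse_rows(commas_to_tabs(raw))), mapping)
vectors = annotation_vectors(["ORF1", "ORF2", "ORF3", "ORF4"], rows)
print(vectors)
```

In this toy input, ORF2 belongs to both remaining categories and ORF4 to neither. Real GO-Slim terms may themselves contain commas, which a plain comma-to-tab replacement would split, so the converted file is worth checking by eye.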
The steps to construct GenetoORF.txt are as follows:
1. On the SGD_YeastMine website, paste the gene names into the Analyse section and click Analyse.
2. Export the result in .tsv format, as (filename).tsv.
3. Copy the columns representing genes and ORFs only and save the file as GenetoORF.txt (this GenetoORF.txt file shows the format for saving).
4. Replace ' ' with N.
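Steps 3 and 4 can also be done programmatically. The Python sketch below rests on several assumptions: the column positions of genes and ORFs in the exported .tsv are passed in by the caller, the gene column is written first, and ' ' in step 4 is read as a blank cell. The function name `build_gene_to_orf` and the sample rows are hypothetical.

```python
import csv
import io


def build_gene_to_orf(tsv_text, gene_col, orf_col, placeholder="N"):
    """Keep only the gene and ORF columns and fill blank cells with N.

    Assumptions: columns are addressed by zero-based index, and a cell
    that is empty or contains only spaces counts as blank.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        if not row:
            continue  # skip empty lines in the export
        gene = row[gene_col].strip() if gene_col < len(row) else ""
        orf = row[orf_col].strip() if orf_col < len(row) else ""
        lines.append(f"{gene or placeholder}\t{orf or placeholder}")
    return "\n".join(lines)


# Hypothetical export: ORF in column 0, gene name in column 1, extra data after.
export = "ORF1\tGENE1\textra\nORF2\t \textra\n"
print(build_gene_to_orf(export, gene_col=1, orf_col=0))
```

The column order and placeholder should be checked against the sample GenetoORF.txt before the output is used with genetoORFmapping.m.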