A Weighted Power Framework for Integrating Multi-Source Information: Gene Function Prediction in Yeast
Welcome to the page for function prediction of unclassified genes in Saccharomyces cerevisiae. The detailed method is described in
The “Similarity matrix (nonlinearset.mat)” for the Weighted Power Biological Score (WPBS), in Matlab format. Loading the file "nonlinearset.mat" creates the similarity matrix as the variable "set" in Matlab.
The “k-medoids” function for Matlab. This function can be applied to any similarity matrix.
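To illustrate how k-medoids can work directly on a similarity matrix, here is a minimal sketch in Python (not the Matlab function offered above). The names `assign`, `k_medoids` and `toy_sim` are hypothetical, and the update rule shown (pick the member with the largest total similarity to its cluster) is one common variant, assumed here for illustration; the downloadable function may differ in its details.

```python
def assign(sim, medoids):
    """Label each item with the position of its most similar medoid."""
    return [max(range(len(medoids)), key=lambda c: sim[i][medoids[c]])
            for i in range(len(sim))]


def k_medoids(sim, medoids, max_iter=100):
    """Cluster items from a similarity matrix (higher value = more alike).

    sim     : square list of lists, sim[i][j] is the similarity of items i and j
    medoids : initial cluster centers, given as item indices
    """
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(sim, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old center if a cluster happens to be empty.
                updated.append(old)
                continue
            # New center: the member most similar, in total, to its own cluster.
            updated.append(max(members,
                               key=lambda m: sum(sim[m][j] for j in members)))
        if updated == medoids:
            break  # centers stopped moving, so the clustering is stable
        medoids = updated
    return medoids, assign(sim, medoids)


# Toy similarity matrix (illustrative values only): items 0-2 resemble each
# other, and items 3-4 resemble each other.
toy_sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.3],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.3, 0.9, 1.0],
]
centers, labels = k_medoids(toy_sim, [1, 3])
print(centers, labels)
```

Because the algorithm only compares similarity values, it needs no coordinates for the items, which is why it can be used with any similarity matrix.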
“MIPS Functional Annotations for Saccharomyces cerevisiae Genes (yeastallvector.mat)”: This file creates the variable "vector" in Matlab. This variable is a matrix whose rows are the Saccharomyces cerevisiae genes and whose columns are the MIPS functional categories. If a gene belongs to a particular functional category, the corresponding cell of the matrix is assigned 1, and otherwise 0.
“Names of MIPS Functional Annotations (categoryname.mat)”: This file creates the variable "categoryname" in Matlab. This variable provides character arrays with the full name of each functional category in MIPS, in the same order as the columns of the variable "vector".
“Gene Names (yeastallgeneclass.mat)” and “Gene Index (yeastallclassindex.mat)”: These files create the variables "geneclass" and "classindex", respectively, in Matlab. "geneclass" is a single character array containing the names of all Saccharomyces cerevisiae genes, and "classindex" gives the location of each gene name within "geneclass".
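As an illustration of how the 0/1 annotation matrix and the list of category names fit together, the following Python sketch looks up the categories of one gene. The function `functions_of` and the toy data are hypothetical stand-ins, not the contents of the downloadable files.

```python
def functions_of(gene_row, vector, category_names):
    """Return the names of all categories marked 1 in one gene's row."""
    return [name
            for flag, name in zip(vector[gene_row], category_names)
            if flag == 1]


# Toy data (illustrative only): 3 genes (rows) and 3 categories (columns).
toy_vector = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
toy_names = ["category A", "category B", "category C"]

print(functions_of(0, toy_vector, toy_names))
```

The lookup relies only on column j of the matrix and entry j of the name list referring to the same category.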
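One possible way to recover individual gene names from a single character array plus an index is sketched below in Python. It assumes, purely for illustration, that each index entry is the 0-based start offset of a name and that a name runs up to the next offset; the actual layout of "geneclass" and "classindex" is not specified on this page and may differ (Matlab indices, for example, start at 1). The function `split_names` and the toy strings are hypothetical.

```python
def split_names(geneclass, classindex):
    """Cut one long string into names, given the start offset of each name.

    Assumption: classindex holds 0-based start positions in ascending order.
    """
    ends = list(classindex[1:]) + [len(geneclass)]
    return [geneclass[start:end].strip()
            for start, end in zip(classindex, ends)]


# Toy example (illustrative only): three ORF names stored back to back.
toy_geneclass = "YAL001CYAL002WYBR020W"
toy_classindex = [0, 7, 14]

print(split_names(toy_geneclass, toy_classindex))
```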
“Pseudo and dubious ORF indices (pgene.txt)”. These indices are used to computationally remove pseudogenes and dubious ORFs from prediction results. A full description of all Saccharomyces cerevisiae ORFs is available at ftp://ftp.yeastgenome.org/sequence/S288C_reference/orf_dna/
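The filtering step can be sketched as follows in Python. The sketch assumes that the index file is plain text with whitespace-separated integers and that prediction results are keyed by the same gene indices; both are assumptions made for illustration, and `parse_indices`, `drop_excluded` and the toy values are hypothetical.

```python
def parse_indices(text):
    """Read whitespace-separated integer indices into a set."""
    return {int(token) for token in text.split()}


def drop_excluded(predictions, excluded):
    """Keep only predictions whose gene index is not in the excluded set."""
    return {gene: funcs
            for gene, funcs in predictions.items()
            if gene not in excluded}


# Toy data (illustrative only): gene index -> predicted function labels.
toy_predictions = {1: ["category A"], 2: ["category B"], 3: ["category A"]}
toy_excluded = parse_indices("2\n5\n")

print(drop_excluded(toy_predictions, toy_excluded))
```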
“Sample Cluster Centers (clustercenter.mat)” for the k-medoids algorithm in Matlab. These cluster centers were used to produce the predicted gene functions reported in the journal article. To run the main program with a different number of clusters (with randomly chosen cluster centers), change the value of 'k' at line 4 of the main program. Similarly, to obtain clusters with a different statistical significance, change the p-value at line 2 of the main program.
“Main Program (unclassifiedprediction.m)” for gene function prediction, run using Matlab 7.11.0 and 4 GB RAM. Before running this program, place all the other downloadable files listed on this page in the same folder as the main program. The main program returns gene clusters and the predicted functions of classified and unclassified genes in Microsoft Office Excel format. The predicted functions of classified genes, along with their clusters, are written to the file classifiedprediction.xls. The predicted functions of unclassified genes and their corresponding clusters are written to the file unclassifiedprediction.xls.
Shubhra Sankar Ray
Machine Intelligence Unit
Indian Statistical Institute
shubhrasankar (at) yahoo.com
shubhra (at) isical.ac.in
Machine Intelligence Unit,
Center for Soft Computing Research: A National Facility,
203, B. T. Road
PIN - 700064.
This page was last modified on November 27, 2011.