Workshop on Big Data in Bioinformatics

Background

Various types of data related to sequences (DNA, RNA and protein), pathways, reaction parameters, gene expression and gene ontology are being generated through different high-throughput technologies in the field of molecular biology. In general, these data are complex and exist in different forms, including structured, semi-structured and unstructured. The amount of these data is huge and has led to one of the most discussed current trends: Big Data in bioinformatics. Different people think of different things when they hear about Big Data. For statisticians, the challenge is to estimate statistical parameters and thereby draw inferences from these data. Computer and information scientists wish to extract usable information from databases so huge and complex that many traditional or classical methods cannot handle them. Big Data analytics is therefore helpful for analyzing huge, complex and heterogeneous data and providing insight in a timely manner. Most importantly, Big Data analytics provides cost-effective solutions for delivering information effectively.

The workshop on Big Data in Bioinformatics is devoted to promoting the highest standards of research and education in Big Data in the area of bioinformatics. It features keynote thematic presentations by recognized leaders and luminaries who have significantly advanced the domain. It also hosts an educational session where students can interact with established researchers and principal investigators.

Importance

The volume of data in bioinformatics research is growing fast. To cope with its scale, diversity and complexity, Big Data requires new architectures, techniques and algorithms. It also requires analytics to manage it and to extract value and hidden knowledge from it. In other words, big data are characterised by volume, variety (structured and unstructured data), velocity (high rate of change), veracity (biases, noise and abnormality), validity (correctness and accuracy of data), volatility (how long data remain valid and how long they should be stored), value (a source of value to those who can deal with its scale and unlock the knowledge within) and visualization (transforming data at this scale into something easily comprehended and actionable). The traditional definition of big data does not cover two important characteristics that separate big data from traditional databases and data warehouses. First, big data are incremental, i.e., new data are dynamically added to the big data lake over time. Second, big data are geographically distributed. Big data sources are no longer limited to particle physics experiments or search engine logs and indexes. With the digitization of various processes and the availability of high-throughput devices at lower costs, data volumes are rising everywhere, including in bioinformatics research. Advances in next-generation sequencing technologies have resulted in the generation of unprecedented levels of sequence data. Modern biology thus presents new challenges in terms of data management, query and analysis. Human DNA comprises approximately 3 billion base pairs, with a personal genome representing approximately 100 gigabytes (GB) of data.
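To make the scale concrete, the following back-of-envelope sketch estimates the raw sequencing volume for a single personal genome. The coverage depth (30x) and storage cost per base (2 bytes, one for the base call and one for its quality score, as in uncompressed FASTQ) are illustrative assumptions, not figures stated in the text:

```python
# Rough estimate of raw sequencing data volume for one personal genome.
# Coverage and bytes-per-base are assumed, illustrative values.

GENOME_SIZE_BP = 3_000_000_000   # ~3 billion base pairs (human genome)
COVERAGE = 30                    # assumed whole-genome sequencing depth
BYTES_PER_BASE = 2               # assumed: base call + quality score, uncompressed

def raw_read_bytes(genome_bp: int, coverage: int, bytes_per_base: int) -> int:
    """Total bytes of raw reads needed to cover the genome at the given depth."""
    return genome_bp * coverage * bytes_per_base

size_gb = raw_read_bytes(GENOME_SIZE_BP, COVERAGE, BYTES_PER_BASE) / 1e9
print(f"~{size_gb:.0f} GB of raw reads per genome")  # ~180 GB
```

Under these assumed parameters the raw figure comes to roughly 180 GB; the ~100 GB cited above is consistent with this once compression or a lower coverage depth is taken into account.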

Owing to the high availability of such information-intensive data streams and to advances in high-performance computing technologies, big data analytics has emerged to perform real-time descriptive and predictive analyses on massive amounts of biological data, in order to support intelligent, informed decisions and make biology a predictive science.