By Brian Moyer
The National Biotechnology Conference (NBC) Program Committee has read the writing on the wall and seen big letters spelling "Big Data" as a core element that must be covered within the context of biotechnology in pharma. The committee has followed the literature trends on novel ways to utilize data (e.g., next generation sequencing in functional genomics), as well as innovative ways to handle, archive, amass, and distribute data that are, in many respects, nontraditional. Traditionally, we have treated data as the staged, structured output of a planned and quality-executed study, output that can be combined with other ordered (structured/formatted) data sources. These sources are passed through a planned statistical or managed manipulation to reach a decision or a solution to a question or hypothesis. In most cases, we view all data entering this analysis as valuable, but in truth, data have both useful and confounding elements. What, then, about utilizing other data: data not formally collected, not structured as desired, or parenthetical to the objectives of a study? The inclusion of such data is what the new term "big data" describes.
Recent figures estimate that in 2015, the number of "big data" jobs will grow to 4.4 million, with 1.9 million of these jobs in the United States. Many of these jobs will be directly related to biotechnology applications and platform development. Scientists will be responsible for building intricate algorithms and crafting stories out of the massive amounts of data that pharmaceutical companies generate every day. Data scientists will be challenged to connect the dots across enormous data arrays and uniquely structured data sets, drawing on cross-platform experience, and to optimize platform outcomes to satisfy customer interests. Big data approaches will allow one to build predictive analytics capabilities that can foster creative new product designs and models of disease, with interpretations of statistical analyses that will generate new pharmaceutical ventures and products and, ultimately, solutions to many of the deadly diseases we still face.
Big data methodologies will provide new opportunities for estimating complex correlations among a large number of variables. Big data reaches beyond the data generated in a single study, or even a group of studies; it is instead a cloud approach in which data are aligned to your question and drawn from sources well beyond the operation of your unique study. It is, in fact, a pool of data that can help answer your question from the position of an open source. Traditionally, we operate from our own data, which we call a "structured data source": we pass it through our storage mechanisms and selected software to organize it, quality check it, and authorize its entry into a statistical or other analytical tool for generation of a solution, answer, next step, or decision. Big data approaches take the organized data and integrate the data set, through cloud computing, with an expanded set of related data; depending on the specifics of the additions, this larger set either adds to or reduces the statistical power of the outcome.
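The point that pooled data can either add to or reduce power can be made concrete with a back-of-the-envelope sketch. The sample sizes and variabilities below are invented for illustration, not drawn from any study; the standard error of a mean (sd / √n) is used as a simple proxy for statistical power:

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical in-house study alone: 50 subjects, assay SD of 12 units.
se_local = standard_error(12.0, 50)

# Pooling 450 comparable cloud-sourced records with the same SD
# increases n and tightens the estimate (more power).
se_pooled_clean = standard_error(12.0, 500)

# But if the added records are noisier (effective SD of 40),
# the pooled estimate can be worse than the local study alone.
se_pooled_noisy = standard_error(40.0, 500)

print(f"local:        {se_local:.3f}")
print(f"pooled clean: {se_pooled_clean:.3f}")
print(f"pooled noisy: {se_pooled_noisy:.3f}")
```

The design point is the committee's own: the value of the expanded data set depends on what is added, not merely on its size.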
Big data is not simply more data. The concept allows the user to employ analytical tools that can do operational reporting, ad hoc queries, predictions, descriptive analytics, and data visualization (aka imaging) and produce a more realistic statistical picture of an outcome. A key element in the utility of big data is that the input data need not be structured; they can have a commodity appearance that the file system recognizes as applicable. Application analytical tools (apps) are built to use unstructured data (non-matched, differently formatted, etc.) alongside structured data sets. Using this approach, we can process large volumes of data, whether structured or unstructured, in parallel and across multiple analytical platforms, a process that can be used and controlled as a fault-tolerant system. The allowance, recognition, and control of error rely upon the interpretation of the output pattern and a logical allowance for the selection of "truth" outcomes.
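A minimal sketch of that idea, with invented record formats and field names (this is not any particular big data platform, just an illustration of routing structured and unstructured inputs through one parallel, fault-tolerant pipeline):

```python
import concurrent.futures
import csv
import io
import json

# Hypothetical mixed inputs: a structured CSV fragment plus
# semi-structured and free-text "commodity" records.
RECORDS = [
    "subject,dose_mg,response\nS001,10,0.82",        # structured CSV
    '{"subject": "S002", "note": "responder"}',      # semi-structured JSON
    "free-text clinician note for S003: mild rash",  # unstructured text
    "{malformed json",                               # corrupt record
]

def parse_record(raw: str) -> dict:
    """Route each record to a parser; errors are captured, not fatal."""
    try:
        if raw.startswith("{"):
            return {"kind": "json", "data": json.loads(raw)}
        if "," in raw and "\n" in raw:
            rows = list(csv.DictReader(io.StringIO(raw)))
            return {"kind": "csv", "data": rows}
        return {"kind": "text", "data": raw}
    except Exception as exc:
        # Fault tolerance: flag the bad record and keep processing.
        return {"kind": "error", "data": str(exc)}

# Process all records in parallel across worker threads.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(parse_record, RECORDS))

kinds = [r["kind"] for r in results]
print(kinds)  # the corrupt record is flagged; the rest parse cleanly
```

The design choice mirrors the paragraph above: errors are allowed, recognized, and controlled by inspecting the output pattern rather than by halting the pipeline.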
Big data is now in all sectors of biotechnology. The new extended symposium The Utility of "Big Data" in the Creation of a Modern Pharmaceutical Industry at the 2015 NBC will highlight the platforms for big data (what "public data" is, high performance databases, cloud computing systems, personal genomics, and what we can do with them), and then finish with the applications of big data in the new era of biotechnology (microbiome, pharmacogenomics, circulating DNA, cloud genomics, etc.).
The advent of the big data era is not unlike the advent of PCR some 20+ years ago. What we can achieve using these novel databases and data-handling methods, as well as innovations in analytical approaches, is set to bring another technological revolution to biotechnology. Are we all on board for this new tool in pharma, or do we sit back and watch? Attend the extended symposium on big data at the 2015 National Biotechnology Conference on Monday, June 8, 8 am–1 pm, and see the future.