Presentation on Big Data
Big Data: Yet Another Buzzword or Actual Big Deal?
BRIITE Semi-annual Meeting, 11 - 13 Dec, 2013. Salk Institute for Biological Studies, La Jolla, CA
The invention of digital information storage technology is allowing the amount of stored information on Earth to grow at an incredible rate. In 1986, 99% of stored information was in analog form; today 99.9% is digital. When n approaches all, that is, when it becomes possible to access data on all relevant subjects, analyses change in both quantitative and qualitative ways. If "de-identified" data sets can be joined with other data sets, de-identification becomes impossible.
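The point about joining data sets can be illustrated with a minimal sketch. It is not taken from the presentation: the records, the field names, and the choice of ZIP code, birth year, and sex as linking fields are all hypothetical. It shows how a record with the name removed can still be tied to a named person when a second data set shares the same fields.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: names removed, other fields kept.
deidentified = [
    {"zip": "92037", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"zip": "92037", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "98109", "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical public data set that carries names and the same fields.
public = [
    {"name": "A. Example", "zip": "92037", "birth_year": 1961, "sex": "F"},
    {"name": "B. Sample", "zip": "92037", "birth_year": 1975, "sex": "M"},
    {"name": "C. Sample", "zip": "92037", "birth_year": 1975, "sex": "M"},
]

# Assumed linking fields (quasi-identifiers) shared by both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified_records, public_records):
    """Return (name, diagnosis) pairs where the join key matches exactly one person."""
    names_by_key = defaultdict(list)
    for person in public_records:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for record in deidentified_records:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(deidentified, public)
print(reidentified)
```

In this toy data only the first record is re-identified: the second matches two people and the third matches none. Each additional shared field or additional joined data set tends to shrink the candidate lists, which is why de-identification becomes harder to maintain as more data sets become joinable.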
Robbins, RJ 2016. Big Data: Yet Another Buzzword or Actual Big Deal? BRIITE Semi-annual Meeting, 11 - 13 Dec, 2013. Salk Institute for Biological Studies, La Jolla, CA
In the early 1990s, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB, the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.
In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.
Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.
In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Helping readers reach this point was the original goal of The Electronic Scholarly Publishing Project.
Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature, referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.
When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.
Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.
With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.