Tag Archives: Genomics

The Fall in Gov Funding & Rise of Privatization in Genome Databases

Government Funded Sequence Database

As the space shuttle program comes to an end, we are reminded of the role of governments in birthing industries. Just like the manned space program, genomics has been mostly government funded, and just like the space program, it's about to take a big hit:

Recently, NCBI announced that due to budget constraints, it would be discontinuing its Sequence Read Archive (SRA) and Trace Archive repositories for high-throughput sequence data. However, NIH has since committed interim funding for SRA in its current form until October 1, 2011.

With its fall there will be few if any centralized public databases for next-gen sequencing (Japan's DDBJ and Europe's ERA being the exceptions). Once again we're left to ride with the Ruskies, figuratively speaking. Enter private industry and its first batter, from the SF Bay Area: DNA Nexus. Though Sundquist and his team have built a very polished, modern platform, unlike the SRA there is no data aggregation: no public pool where researchers can access data. This is a problem because much of the power of genomics comes from studying statistical trends, and a large, public data pool is to date the best way to make any sense of what our genes say.

A similar private effort from the great innovators across the ocean comes in the form of Simbiot, by Japan Bioinformatics K.K. At the moment Simbiot is edging ahead, having recently released two mobile applications that allow sequence data management and analysis on the go. However, just as with DNA Nexus, users can only keep their data within their own accounts or share it with select others. Both of the aforementioned companies have well thought-out price plans, sleek interfaces and well-produced videos. But what made government efforts like the SRA valuable is that, for a time, they provided a centralized public data pool.

Said Infamous Graph

As anyone who's seen the now-infamous graph of sequencing costs falling faster than Moore's law will have figured out by now, the costs of maintaining a sequencing database only increase as the technology is adopted. It was reasonable for Uncle Sam to pay for this party at first, but the costs rise every year, by leaps. There must be a private model that is both aggregate and open in nature but can also pull its own weight in terms of cost; the future of healthcare and any form of "genomics industry" may well be dependent on it.
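To see why the gap compounds, here is a back-of-envelope sketch. The halving times are assumptions picked for illustration only (sequencing cost roughly halving every 6 months, storage/compute every 18 months), not figures from the graph itself:

```python
# Toy compound-growth comparison: data generated per dollar vs.
# storage bought per dollar, under ASSUMED halving times.
months = 36                           # a three-year window
seq_factor = 2 ** (months / 6)        # 64x more sequence data per dollar
storage_factor = 2 ** (months / 18)   # only 4x cheaper storage per byte

# Ratio of archive growth to storage-cost decline: the bill still balloons.
print(seq_factor / storage_factor)    # 16.0
```

Even with generous assumptions for storage, the archive's bill grows an order of magnitude in a few years, which is the squeeze the SRA ran into.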

2 Comments

Filed under Genomics

Biotech for Hackers: Computational Genomics 1 of 2

A low barrier to entry and the ability to iterate rapidly are key to taking on problems and creating solutions. What do these solutions look like in genomics, and why can hackers lead the way? Fig 1 shows something very similar to the social interaction maps one comes across at places like Facebook.

Fig 1: Interaction map of genes implicated in Alzheimer's. Genes are grouped by those with similar functions (squares) and those with different functions (circles). Modules with a red border have high-confidence interactions, and the weight of the connecting green lines corresponds to the number of interactions between two sets.

The map above shows individual gene relationships: an algorithm began with 12 seed genes that previous experiments had shown to play a role in Alzheimer's disease. These seeds were compared with 185 new candidate genes from regions deemed susceptible to carrying Alzheimer's genes. From here, both experimental and computational data were combined to generate Fig 1, which the authors dubbed the AD-PIN (Alzheimer's Disease Protein Interaction Network).

Fig 2: Interactions discovered in the High-Confidence (HC) set generated by this study, shown in the context of known relationships in the Human Interactome (created in past studies).

By simply tracking genes already known to play a role in Alzheimer's, we discover new regions of genetic code that also participate in the expression of related functions, in this case the ones affected by the disease, such as memory. In Fig 2 we see that between seeds the algorithm produced 7 high-confidence interaction results, of which 3 were in common with previous studies, along with almost 200 new interactions, each of which can lead to new therapies, blockbuster drugs and a better understanding of the disease itself.
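For the hackers in the audience, the seed-and-candidate idea above is easy to sketch. All gene names, scores and the confidence cutoff below are invented for illustration; they are not data from the study:

```python
# Toy sketch of assembling an interaction network from seed genes
# (previously implicated) and new candidate genes, then filtering for
# a high-confidence (HC) subset, loosely in the spirit of AD-PIN.
seeds = {"APP", "PSEN1", "PSEN2", "APOE"}      # known Alzheimer's genes
candidates = {"GENE_A", "GENE_B", "GENE_C"}    # hypothetical new candidates

# (gene1, gene2, confidence) from combined experimental + computational evidence
interactions = [
    ("APP", "PSEN1", 0.95),
    ("APP", "GENE_A", 0.80),
    ("GENE_A", "GENE_B", 0.40),
    ("PSEN2", "GENE_C", 0.90),
]

HC_CUTOFF = 0.75  # arbitrary threshold for "high confidence"
high_confidence = [(a, b) for a, b, c in interactions if c >= HC_CUTOFF]

# Edges linking a seed to a non-seed are the interesting ones: they pull
# brand-new genes into the disease pathway.
novel = [(a, b) for a, b in high_confidence if (a in seeds) != (b in seeds)]

print(high_confidence)  # [('APP', 'PSEN1'), ('APP', 'GENE_A'), ('PSEN2', 'GENE_C')]
print(novel)            # [('APP', 'GENE_A'), ('PSEN2', 'GENE_C')]
```

Swap the toy list for real pairwise evidence and the same filter-and-flag loop is the skeleton of a candidate-gene screen.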

Many software developers have extensive experience and interest in dealing with large data sets, finding correlations and creating meaningful solutions. However, much of our generation has had little exposure to these problems, which often results in the bandwagon effect, as one recent article put it: "the latest fucking location fucking based fucking mobile fucking app." Progress has often been linked to literacy, from books to programming; being able to read and write in life-code just might be the next stage.

Original published study: Interactome mapping suggests new mechanistic details underlying Alzheimer’s disease by Soler-Lopez et al.

1 Comment

Filed under Genomics

Library of Life: Genomic Databases & Browsers

DNA at its heart is an enormous chunk of information. The genome of an organism like yeast, mice or humans contains an ocean of data. Currently there are several online genomic databases, a great example being SGD, dedicated to the yeast S. cerevisiae. SGD has become a necessary tool for life scientists over the past 10 years, but it has not kept up with information technology, resulting in a platform that works like a 10-year-old website.

SGD is clunky but necessary, for now

Above we see a typical SGD search: it takes 5 windows to arrive at the sequence data for a single gene. Nevertheless, SGD is used by drug companies trying to find the next big hit, academic labs trying to cure cancer and field biologists studying wildlife.

DNA is extracted and run through a sequencing machine, which spits the information out into a computer file. Just as an aged internet browser affects our productivity, the browser one uses to view these files can have a large impact. Following the web-browser analogy, we take a look at 3 different sequence browsers, starting with Vector NTI.
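The files these browsers open are mostly plain text, and a minimal parser shows how little magic is involved. Here is a sketch for the common FASTA format; the gene name and sequence are invented for illustration:

```python
# Minimal FASTA reader: a header line starting with ">" followed by one
# or more lines of sequence, repeated per record.
def parse_fasta(text):
    """Return a dict mapping each record's header to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # new record begins
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence continues
    return {h: "".join(parts) for h, parts in records.items()}

example = """\
>EXAMPLE_GENE made-up yeast gene for illustration
ATGGTACTG
GCCATTAA
"""
seqs = parse_fasta(example)
print(seqs)  # {'EXAMPLE_GENE made-up yeast gene for illustration': 'ATGGTACTGGCCATTAA'}
```

Every browser below is, at bottom, a nicer front end over files like this.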

Vector NTI is enterprise software.

Vector NTI is well established and often bundled with hardware. It has many features but can often feel like information overload, causing most users to stumble through its many menus and windows. A step up in usability comes from the third-party software suite Sequencher, popular amongst Mac users.

Sequencher is your friend

Sequencher strikes a healthy balance between features and usability, but it is a fairly resource-intensive program, requiring CDs and hard drive space to store local algorithms. The most up-to-date browser, however, is likely the free and lightweight download, 4Peaks.

4Peaks Simplicity & Usability

4Peaks lets the user go in, read their sequence file and get out. What it lacks in features it makes up for in simplicity. The end goal of any such software or database is to help researchers wade through all this information and continue their studies. In this environment, services such as GENEART offer to perform much of the genomics-related legwork on a given project.

These are all tools, the databases, browsers and services, which enable researchers to answer the questions that line our horizon. The progress of our tools has always correlated directly with our advancement; the life sciences' adoption of information technology is a necessity as we discover that so much of life is condensed data, in every nook.

1 Comment

Filed under Genomics, Microbiology, PCR