The Fall in Gov Funding & Rise of Privatization in Genome Databases

Government Funded Sequence Database

As the space shuttle program comes to an end, we are reminded of the role of governments in birthing industries. Just like the manned space program, genomics has been mostly government funded, and just like the space program it’s about to take a big hit:

Recently, NCBI announced that due to budget constraints, it would be discontinuing its Sequence Read Archive (SRA) and Trace Archive repositories for high-throughput sequence data. However, NIH has since committed interim funding for SRA in its current form until October 1, 2011.

With its fall there will be few, if any, centralized public databases left for next-gen sequencing (Japan’s DDBJ and Europe’s ERA). Once again we’re left to ride with the ruskies, figuratively speaking. Enter private industry and its first batter, from the SF Bay Area, DNA Nexus. Though Sundquist and his team have managed to create a well-polished and modern platform, unlike SRA there is no data aggregation: there is no public pool where researchers can access data. This is a problem because much of the power of genomics comes from studying statistical trends, and a large, public data pool is to date the best way to make any sense of what our genes say.

A similar private effort from the great innovators across the ocean comes in the form of Simbiot by Japan Bioinformatics K.K. At the moment Simbiot is edging into the lead, as they’ve recently released two mobile applications allowing sequence data management and analysis on the go. However, just as with DNA Nexus, users are only given the option to keep their data within their own accounts or share it with select others. Both of the aforementioned companies have well thought-out price plans, sleek interfaces and well-produced videos. But what makes government efforts like the SRA valuable is that for a time they provided a centralized public data pool.

Said Infamous Graph

As anyone who’s seen the now infamous graph of the rate of decrease in sequencing costs vs that of Moore’s law will likely have figured out by now, the costs associated with maintaining a sequencing database only increase with adoption of the technology: sequencing is getting cheaper faster than the hardware needed to store and process its output, so the data piles up faster than the storage for it becomes affordable. As such, it was reasonable for Uncle Sam to pay for this party at first, but the costs rise every year, by leaps. There must be a private model that is both aggregate & open in nature but can also pull its own weight in terms of cost; the future of healthcare and any form of “genomics industry” may well be dependent on it.


SHDH45@Google: simple BLAST+ setup

Super Happy Dev House pays off once again, with a day that blurred the line between play and work. It was reminiscent of the days in middle and high school when my parents would provide tables, power cords and snacks for those all-night LAN parties, except now Google was the host & playing games still ranked high on the agenda.

The National Center for Biotechnology Information (NCBI) provides a command-line based standalone Basic Local Alignment Search Tool (BLAST) package known as BLAST+ to analyze and play with genomic sequence data. Although the legacy web-based BLAST can perform a range of functions, BLAST+ as a command-line tool is much better suited to analyzing large amounts of nucleotide data. It may be best to get an idea of what sort of data we’re dealing with by getting into the government’s database:

mokas$ ftp
Connected to
 Warning Notice!

 This is a U.S. Government computer system, which may be accessed and used
 only for authorized Government business by authorized personnel.
 Unauthorized access or use of this computer system may subject violators to
 criminal, civil, and/or administrative action.

 All information on this computer system may be intercepted, recorded, read,
 copied... There is no right of privacy in this system.

Don’t worry about the scary message; this is all public data… well, until the funding stops. Take a look in the blast/db directory for the many pre-formatted databases NCBI has provided, e.g. genomic & protein reference sequences and patent nucleotide sequence databases from the USPTO & EU/Japan patent agencies. Get yourself the latest BLAST+ from blast/executables/LATEST; I used ncbi-blast-2.2.25+-universal-macosx.tar.gz.


mokas$ tar zxvpf ncbi-blast-2.2.25+-universal-macosx.tar.gz 
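# append the BLAST+ binaries to PATH (replacing PATH outright would hide tar, mkdir, etc.)
mokas$ PATH=$PATH:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ export PATH
mokas$ echo $PATH
# create the database directory inside the unpacked ncbi-blast-2.2.25+ folder
mokas$ mkdir ./ncbi-blast-2.2.25+/db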
mokas$ blastn -help
  blastn [-h] [-help] [-import_search_strategy filename]

Databases should be loaded directly into the db directory created above with the mkdir command. The last thing that needs to be done is to make a “.ncbirc” text file in the main directory containing the following:
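A minimal sketch of such a file, assuming the usual INI-style layout in which a BLASTDB key under a [BLAST] section names the database directory. The helper name write_ncbirc, its arguments, and the home-directory default are hypothetical, chosen only for illustration:

```shell
# write_ncbirc DB_DIR [CONFIG_FILE]
# Creates DB_DIR if needed and writes a two-line .ncbirc pointing BLAST+ at it.
# CONFIG_FILE defaults to ~/.ncbirc (an assumption; adjust to your "main directory").
write_ncbirc() {
  db_dir="$1"
  config_file="${2:-$HOME/.ncbirc}"
  mkdir -p "$db_dir"
  # [BLAST] section header, then the BLASTDB key holding the database path
  printf '[BLAST]\nBLASTDB=%s\n' "$db_dir" > "$config_file"
}

# Example call (hypothetical path, matching the unpack location used above):
#   write_ncbirc "$HOME/Desktop/ncbi-blast-2.2.25+/db"
```

With a file like this in place, a database name such as refseq_rna can be passed to -db without spelling out its full path.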


This will guide the program to where data is being kept. At the end of the day we should hope to get something like this:

mokas$ blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna
BLASTN 2.2.25+
Query=  ENST00000361359 ncrna:Mt_rRNA chromosome:NCBI36:MT:650:1603:1
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

ref|XR_109154.1|  PREDICTED: Homo sapiens hypothetical LOC1005054...   464    5e-128

>ref|XR_109154.1| PREDICTED: Homo sapiens hypothetical LOC100505479 (LOC100505479),
partial miscRNA

 Score =  464 bits (251),  Expect = 5e-128
 Identities = 255/257 (99%), Gaps = 0/257 (0%)

            |||||||||||||||||||||||||||| |||||||| ||||||||||||||||||||||

BLAST+ in action
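
The pairwise report above is easy to read but awkward to sift through in bulk. A minimal sketch, assuming blastn is run with -outfmt 6 (tab-separated output whose default columns put percent identity in column 3, the E-value in column 11 and the bit score in column 12). The function name filter_hits and its default cutoff are hypothetical:

```shell
# filter_hits [MAX_EVALUE]
# Reads tabular BLAST output (-outfmt 6) on stdin and prints
# query id, subject id, percent identity, E-value and bit score
# for every hit whose E-value is at or below the cutoff.
filter_hits() {
  max_evalue="${1:-1e-10}"   # hypothetical default cutoff
  awk -F '\t' -v OFS='\t' -v max="$max_evalue" \
    '($11 + 0) <= (max + 0) { print $1, $2, $3, $11, $12 }'
}

# Usage, assuming blastn is on PATH and refseq_rna is in the db directory:
#   blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna -outfmt 6 | filter_hits 1e-50
```

Filtering on the E-value rather than the raw score keeps the cutoff comparable across databases of different sizes.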

Many thanks are in order to Dr. Tao Tao of NCBI and all the great folks who showed up, hung out and helped out; to Google for the food and drinks (no beer?!); and to everyone on the SHDH team who scrambled all this together, which I’m told is par for the course. Hopefully this will be a fun tool for folks not well acquainted with genomics/programming to sandbox and explore in. #funsaturday

Industry & Academia, part 2: Ex Mod Op

Infamous Exubera Insulin Disaster

Loosening the restrictions on how a problem is framed allows us to achieve greater results. A simple rendition of this can be seen in consumers’ perception of what a “cure” or medicine is. The lay public expects a pill to solve its problems, which in turn shapes professionals’ own vision of what their goals are. Any cure that a researcher imagines is heavily influenced by what they perceive the consumer will accept, overwhelmingly a pill or vaccine. When new forms of delivery are brought to the edge of market, they are often dismissed internally as “untested,” or the cost of implementation relative to an older method is brought up. Exubera was developed when predictions throughout the healthcare industry pointed to a diabetes epidemic, which of course we are smack in the middle of now. In that climate a non-invasive inhalable insulin seemed like it would be worth its weight in gold. It wasn’t.

Today, the oracles in their glass towers predict a surge in respiratory illness, rightfully pointing to developing nations such as China and India, with their falling air quality & rising numbers of healthcare consumers. This guides research towards COPD, cystic fibrosis and others, all of which are significant causes of suffering. Chasing after the dollar is often the best method for innovation; healthcare, however, has often proven to be a more complex system, requiring greater foresight than simply following consumers’ pocketbooks and wants. Adding to this are the already strict standards that government agencies apply, which hinder the progress of medicine.

This often brings up the objection that the regulations were put in place to keep the public safe, and that even so, many dangerous drugs still make it to market every year. It is a moot point, in that many of the addictive, high-risk drugs that make it to market are brought about by public want: painkillers & antidepressants, the poster children for substance abuse and Hollywood overdoses. Significantly increasing life span and quality of life requires a new paradigm of for-profit research and public perception of medicine. Extinctus Modus Operandi.

Yoga, A Cognitive Exercise

In August 2009, Volume 1172 of the Annals of the New York Academy of Sciences was released, containing 31 peer-reviewed articles which attempt to “study the impact of Indo-Tibetan practices on longevity and health.” If our brain cells are rewiring themselves based on our habits and thought patterns, suddenly our hobbies and pastimes are rocketed to the center stage of mental health. What you or I do in our free time is habitual, and over time it can optimize our brains for those activities while allowing other regions to decay. Yoga as a physical exercise has demonstrated an incredible ability to affect mental health, including but not limited to focus and emotional regulation. The following papers contained within the volume are of particular interest:

