SHDH45@Google: simple BLAST+ setup

Super Happy Dev House pays off once again, with a day that blurred the line between play and work. It was reminiscent of the days in middle and high school when my parents would provide tables, power cords and snacks for those all-night LAN parties, except now Google was the host, and playing games still ranked high on the agenda.

The National Center for Biotechnology Information (NCBI) provides a standalone, command-line Basic Local Alignment Search Tool (BLAST) package known as BLAST+ for analyzing and playing with genomic sequence data. Although the legacy web-based BLAST can perform a range of functions, BLAST+ as a command-line tool is much better suited to analyzing large amounts of nucleotide data. It may be best to get an idea of what sort of data we’re dealing with by getting into the government’s database:

mokas$ ftp
Connected to
 Warning Notice!

 This is a U.S. Government computer system, which may be accessed and used
 only for authorized Government business by authorized personnel.
 Unauthorized access or use of this computer system may subject violators to
 criminal, civil, and/or administrative action.

 All information on this computer system may be intercepted, recorded, read,
 copied... There is no right of privacy in this system.

Don’t worry about the scary message; this is all public data… well, until the funding stops. Take a look in the blast/db directory for the many pre-formatted databases NCBI provides, e.g. genomic and protein reference sequences and patent nucleotide sequence databases from the USPTO and the EU/Japan patent agencies. Get yourself the latest BLAST+ from blast/executables/LATEST; I used ncbi-blast-2.2.25+-universal-macosx.tar.gz.


mokas$ tar zxvpf ncbi-blast-2.2.25+-universal-macosx.tar.gz
mokas$ PATH=$PATH:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ export PATH
mokas$ echo $PATH
mokas$ mkdir ./ncbi-blast-2.2.25+/db
mokas$ blastn -help
  blastn [-h] [-help] [-import_search_strategy filename]

Databases should be loaded directly into /db directory created above with the mkdir command. The last thing that needs to be done is to make a “.ncbirc” text file in the main directory containing the following:
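
A minimal sketch of that file, written from the shell. It assumes the db directory created above lives under ~/Desktop/ncbi-blast-2.2.25+ and that the “main directory” is your home directory; both are assumptions, so adjust the paths to your own setup. BLAST_DB_DIR and NCBIRC are just illustrative variable names.

```shell
# Assumed location of the db directory made with mkdir above; change as needed.
BLAST_DB_DIR="${BLAST_DB_DIR:-$HOME/Desktop/ncbi-blast-2.2.25+/db}"
# Assumed location of the config file (home directory).
NCBIRC="${NCBIRC:-$HOME/.ncbirc}"

# Write a [BLAST] section whose BLASTDB entry points at the database directory.
cat > "$NCBIRC" <<EOF
; Start the section for BLAST configuration
[BLAST]
; Path where the BLAST databases are installed
BLASTDB=$BLAST_DB_DIR
EOF
```

Lines starting with a semicolon are comments; the only setting that matters here is BLASTDB.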


This tells the program where the databases are kept. At the end of the day we should hope to get something like this:

mokas$ blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna
BLASTN 2.2.25+
Query=  ENST00000361359 ncrna:Mt_rRNA chromosome:NCBI36:MT:650:1603:1
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

ref|XR_109154.1|  PREDICTED: Homo sapiens hypothetical LOC1005054...   464    5e-128

>ref|XR_109154.1| PREDICTED: Homo sapiens hypothetical LOC100505479 (LOC100505479),
partial miscRNA

 Score =  464 bits (251),  Expect = 5e-128
 Identities = 255/257 (99%), Gaps = 0/257 (0%)

            |||||||||||||||||||||||||||| |||||||| ||||||||||||||||||||||

BLAST+ in action
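
For anything beyond eyeballing a single hit, a tab-separated report is easier to work with. A hedged sketch, assuming the same query file and database as in the run above; the output file name hits.tsv and the cutoff values are made up for illustration, not something from the SHDH session:

```shell
QUERY="Homo_sapiens.NCBI36.apr.rna.fa"   # query file from the run above
DB="refseq_rna"                          # database from the run above
OUT="hits.tsv"                           # hypothetical output file name

if command -v blastn >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # -outfmt 6        tab-separated output, one line per hit
    # -evalue          discard hits with an E-value above this threshold
    # -max_target_seqs cap the number of database sequences kept per query
    blastn -query "$QUERY" -db "$DB" \
           -outfmt 6 -evalue 1e-10 -max_target_seqs 5 \
           -out "$OUT"
    # In the default tabular layout, columns 2, 11 and 12 are the
    # subject id, the E-value and the bit score.
    cut -f 2,11,12 "$OUT"
else
    echo "blastn or $QUERY not found; skipping" >&2
fi
```

The guard simply skips the search when blastn is not on the PATH or the query file is missing, so the snippet is safe to paste before everything is set up.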

Many thanks are in order to Dr. Tao Tao of NCBI and to all the great folks who showed up, hung out and helped out; to Google for the food and drinks (no beer?!); and to everyone on the SHDH team who scrambled all this together, which I’m told is par for the course. Hopefully this will be a fun tool for folks not well acquainted with genomics or programming to sandbox and explore in. #funsaturday

Citations: Standalone BLAST Setup for Unix – BLAST® Help – NCBI Bookshelf

Filed under Genomics