Tag Archives: DNA

SHDH45@Google: simple BLAST+ setup

Super Happy Dev House pays off once again, with a day that blurred the line between play and work. It was reminiscent of the days in middle and high school when my parents would provide tables, power cords and snacks for those all-night LAN parties, except now Google was the host, and playing games still ranked high on the agenda.

The National Center for Biotechnology Information (NCBI) provides a standalone, command-line Basic Local Alignment Search Tool (BLAST) package known as BLAST+ for analyzing and playing with genomic sequence data. Although the legacy web-based BLAST can perform a range of functions, BLAST+ as a command-line tool is much better suited to analyzing large amounts of nucleotide data. It may be best to get an idea of what sort of data we’re dealing with by getting into the government’s database:

mokas$ ftp ftp.ncbi.nlm.nih.gov
Connected to ftp.wip.ncbi.nlm.nih.gov.
220-
 Warning Notice!

 This is a U.S. Government computer system, which may be accessed and used
 only for authorized Government business by authorized personnel.
 Unauthorized access or use of this computer system may subject violators to
 criminal, civil, and/or administrative action.

 All information on this computer system may be intercepted, recorded, read,
 copied... There is no right of privacy in this system.

Don’t worry about the scary message; this is all public data… well, until the funding stops. Take a look in the blast/db directory for the many pre-formatted databases NCBI provides, e.g. genomic and protein reference sequences and patent nucleotide sequence databases from the USPTO and the EU/Japan patent agencies. Get yourself the latest BLAST+ from blast/executables/LATEST; I used ncbi-blast-2.2.25+-universal-macosx.tar.gz.

Installation:

mokas$ tar zxvpf ncbi-blast-2.2.25+-universal-macosx.tar.gz 
mokas$ PATH=$PATH:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ export PATH
mokas$ echo $PATH
...:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ mkdir ./ncbi-blast-2.2.25+/db
mokas$ blastn -help
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
...
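
A variant that appends the BLAST+ bin directory to the existing PATH, instead of replacing it, keeps system commands such as mkdir and tar reachable in the same session. This is a minimal sketch assuming a bash-like shell; `add_to_path` is a hypothetical helper name, and the directory is the one used in this post.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;     # append, keeping the existing entries
  esac
  export PATH
}

# Directory from this post; adjust to wherever the archive was unpacked.
add_to_path "/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin"

# Report whether the shell can now find blastn.
if command -v blastn >/dev/null 2>&1; then
  echo "blastn found at $(command -v blastn)"
else
  echo "blastn not found on PATH"
fi
```

Putting the `add_to_path` call in a shell startup file would make the change survive new terminal sessions.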

Databases should be loaded directly into the db directory created above with the mkdir command. The last thing that needs to be done is to create a “.ncbirc” text file in your home directory (or in the directory you run BLAST+ from) containing the following:

[BLAST]
BLASTDB=/Users/mokas/Desktop/ncbi-blast-2.2.25+/db
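
The same two-line file can be generated from the shell. The sketch below is only an illustration: `write_ncbirc` is a hypothetical helper, and the demo writes into a scratch directory so that an existing .ncbirc is not overwritten.

```shell
# Hypothetical helper: write a .ncbirc that points BLAST+ at a database directory.
# $1 = directory to place .ncbirc in, $2 = database directory.
write_ncbirc() {
  cat > "$1/.ncbirc" <<EOF
[BLAST]
BLASTDB=$2
EOF
}

# Demo in a scratch directory; pass "$HOME" instead once the output looks right.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/ncbirc.XXXXXX")
write_ncbirc "$scratch" "/Users/mokas/Desktop/ncbi-blast-2.2.25+/db"
cat "$scratch/.ncbirc"
```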

The .ncbirc file tells the program where the data is kept. At the end of the day we should hope to get something like this:

mokas$ blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna
BLASTN 2.2.25+
...
Query=  ENST00000361359 ncrna:Mt_rRNA chromosome:NCBI36:MT:650:1603:1
gene:ENSG00000198714
Length=954
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

ref|XR_109154.1|  PREDICTED: Homo sapiens hypothetical LOC1005054...   464    5e-128

>ref|XR_109154.1| PREDICTED: Homo sapiens hypothetical LOC100505479 (LOC100505479),
partial miscRNA
Length=266

 Score =  464 bits (251),  Expect = 5e-128
 Identities = 255/257 (99%), Gaps = 0/257 (0%)
 Strand=Plus/Minus

Query  334  CACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGACTACGAAAGTGGCTTTAACAT  393
            |||||||||||||||||||||||||||| |||||||| ||||||||||||||||||||||
Sbjct  257  CACCTGAGTTGTAAAAAACTCCAGTTGATACAAAATAAACTACGAAAGTGGCTTTAACAT  198

BLAST+ in action
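
Two details of the alignment above are easy to check by hand. Strand=Plus/Minus means the query matched the reverse complement of the subject, which is why the Sbjct coordinates count down from 257 to 198, and the Identities line is simply matches divided by alignment length. The helpers below are a hypothetical sketch (the names `revcomp` and `pct_identity` are mine, not part of BLAST+) using standard Unix tools.

```shell
# Hypothetical helper: reverse-complement a DNA string
# (reverse it, then swap A<->T and C<->G).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Hypothetical helper: percent identity from matches and alignment length.
pct_identity() {
  awk -v m="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * m / n }'
}

revcomp "CACCTGAGTT"     # first ten bases of the Query row above
pct_identity 255 257     # counts from the Identities line above
```

This prints AACTCAGGTG and 99.2; BLAST reports the latter as 99%.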

Many thanks are in order to Dr. Tao Tao of NCBI and all the great folks who showed up, hung out and helped out; to Google for the food and drinks (no beer?!); and to everyone on the SHDH team who scrambled all this together, which I’m told is par for the course. Hopefully this will be a fun tool for folks not well acquainted with genomics or programming to sandbox and explore in. #funsaturday

Citations: Standalone BLAST Setup for Unix – BLAST® Help – NCBI Bookshelf

Filed under Genomics

The Polymerase Chain Reaction, A Microcosm

Creating a new life-form is an awe-inspiring experience. Writing DNA like a mere sentence and watching creation unfold in the mechanism of life is both breathtaking and humbling. None of this would be possible without the Polymerase Chain Reaction (PCR). It is a simple process in which all the ingredients for DNA (a teaspoon of reagents, a pinch of polymerase enzyme and a handful of the “letters” that make up our genetic code) are thrown into an oven: literally an oven, albeit a very accurate one that can step between temperatures quickly. Within hours, the sentence you had written out on a computer screen is now molecules floating around in a tiny tube, ready to be put into a cell, which will read the instructions and attempt to build or act accordingly. With this simple idea the human race has been handed the keys to the Build a Life Workshop; however, this simple process often goes without scrutiny and without improvement.

Basic Principles of PCR

Much of the drug discovery in both academia and industry is now focused on protein mechanics. How does this receptor behave? What buttons turn this enzyme on and off? This focus on protein structure and mechanism often makes PCR a boring chore that most researchers grudgingly get past before they can get to the interesting part. As a result, the basic process of PCR has remained the same for decades. I remember a P.I. giving me a paper from 1985 to look up what settings I should use for my reaction. None of this would be a problem, except that people often waste weeks to months trying to get the right PCR outcomes. At the root of both the problem and the solution is information. PCR is a “black box” process, in that you throw all the ingredients together, turn on the machine and hope that all the right molecules will bump into each other at the right times. Traditionally, it has been an exasperating system of trial and error. Now, however, information technology has given a glimpse of a solution and a way to move forward to the next chapter in the development of this life-science staple.

Filed under Genomics, Microbiology, PCR