Category Archives: Microbiology

August 27, 2024 · 11:17 pm

Where Art Meets Science: The Communities Behind Biological Visualizations

Ever since my time in a crystallography lab, I’ve been fascinated by the process of translating raw diffraction patterns into detailed structural models.

I recently won an auction for a fairly worn, but very nostalgic 80s BYTE magazine. Don’t you just **love** the hand drawn artwork?

Back then, we relied on those cool SGI (Silicon Graphics, Inc.) computers—high-performance machines that were the go-to for graphics and visualization in the late ’90s and early 2000s. These workstations were the backbone of many scientific labs, enabling us to visualize complex molecular structures in ways that were groundbreaking at the time.

Even at that time, I was captivated by the pursuit of renders that were more than just scientific models—they were almost eye candy, visually arresting representations of the intricate dance of atoms. These weren’t just static arrangements; they were the keys to understanding quantum-scale processes. I found myself wondering: What do these molecular machines really look like? Are they transparent? Gooey?

At the time, most people stuck with the ball-and-stick method of molecular rendering, and sometimes the very creative ribbon method for visualizing secondary structures like α-helices and β-sheets. This ribbon method, first introduced by Jane Richardson in the 1980s, revolutionized how we perceive and depict the elegant architecture of proteins.

I was particularly struck by a talk she gave a few years ago at the Machine Learning in Structural Biology (MLSB) workshop, which is part of the broader NeurIPS (Conference on Neural Information Processing Systems). The presentations at MLSB highlight how the intersection of machine learning and structural biology is opening up new avenues for visualizing and understanding complex biological data—building on the foundations researchers laid with creative solutions like the ribbon diagrams decades earlier. You can still view a recording of the entire Richardson talk here.

Fast forward a few years, and I find myself reminiscing about the times when Sonic Youth was the soundtrack to my late-night coding sessions, and I was drawn to ideas about being in the flow, letting go, and just being present in the moment. These days, hanging out with my wife and child brings that same sense of peace and connection. But as much as I cherish these moments, there’s something uniquely exhilarating about attending conferences, where you get to “talk shop” and be exposed to the latest innovations.

I remember the first time I saw molecular renderings by Drew Berry—it was at a VIZBI conference, a gathering that perfectly blends scientific rigor with creative visualization. VIZBI (Visualizing Biological Data) is more than just a conference; it’s an international meeting point for the best minds in science, bioinformatics, and data visualization. What makes VIZBI special is its emphasis on both the scientific and artistic aspects of data visualization.

The conference not only showcases cutting-edge visualizations that transform how life scientists view and interact with data, but it also encourages a deeper appreciation for the aesthetic quality of these visualizations. This was evident in Drew Berry’s work, which brought the molecular world to life in a way I had never seen before. The structures didn’t just sit there; they vibrated, darted around, and had this incredible “stochastic” feel to them, capturing the chaotic energy that defines molecular interactions. It was like seeing the molecules not just as static models, but as living entities, each with its own rhythm and motion. VIZBI isn’t just about keeping up with the latest research; it’s about being inspired, about seeing the boundaries of science and art blur in ways that open up new possibilities for how we understand life at its most fundamental level. It’s the kind of experience that reminds me why I got into this field in the first place.

In the last couple of years, BioVis has really stepped up its efforts to engage the community, working closely with other organizations like IEEE and ISCB to keep the spirit vibrant and interdisciplinary. BioVis has taken on the challenge of pushing forward the frontiers of biological data visualization, encouraging collaboration across fields and nurturing a community that is as diverse as it is innovative.

By bringing together visualization researchers with biologists and bioinformaticians, BioVis has managed to keep the conversation fresh and evolving, ensuring that new methods and ideas keep flowing. It’s exciting to see how these gatherings—both old and new—continue to energize the community and drive progress in understanding the complexities of life at every scale.

And hey, these are just some of the meetings out there, but they are my personal favorites. Even with incredible advancements like AlphaFold, RoseTTAFold, and ColabFold, which have made huge leaps in predicting molecular structures, there is still something uniquely thrilling about the art of representation. For me, that thrill is often fueled by the same sense of awe I get from playing video games. A good game isn’t just about throwing the newest engine at it; it’s about the aesthetics, the art, and the way it all comes together. That’s what makes good games age well—and I think it’s the same with science.

As the saying goes: sometimes, it’s less about where the path leads and more about the wonder found along the way.

Leave a comment

Filed under CS, Microbiology

Tagged as Bioinformatics, Computational Biology

November 30, 2022 · 2:11 pm

Totally Tubular: Carbon Nanotubes, Microtubules, & Computation

*^{Showing traditional carbon nanotube made only with carbon, versus microtubules found in cells made with Tubulin proteins.}*

You’ve all heard of carbon nanotubes. You know, really small tubes made of just carbon atoms; maybe they’re good for moving electrons around. And supposedly carbon nanotubes might play a role in the next type of computers, so-called quantum computers. That’s carbon nanotubes. Then we have Microtubules, those with biology backgrounds will know about them, but for everyone else these are also really small tubes, but inside the cells of our bodies. We’ve seen microtubules do things inside cells even using microscopes, things like giving the cell types their shape; like a scaffolding. Microtubules have also been seen helping cells divide by grabbing things like chromosomes to opposite sides of a cell, and also as rails for transporting packages around cells. Anyway, the main point is microtubules do lots of things, we know this, but the question is do they also do some of the things that carbon nanotubes do?

^{Looking through the microscope to see some fun things microtubules do: pull chromosomes around, give cells their shape, act as rails for transport of goods.}

Not only are microtubules hollow on the inside, say around 14nm which is similar to carbon nanotubes, they are also covered in a lattice of interesting molecules that act as binary switches. This is the G-protein signaling system, and it’s not exclusive to microtubules, rather it’s a family of molecules that work together within cells to create cascades of events and state changes. The outside of microtubules have GTP/GDP, Guanosine triphosphate and diphosphate. Removal of the 3rd phosphate is one state of the switch, while GTP is the other state. This lattice of GTP/GDP on the surface of the microtubules plays a known and observed role in vesicle transport, say when a “rail cart” attaches or detaches from the tubes, as well as in cell division, as in when the tubes attach to chromosomes; but what might be most fascinating are the fun patterns they form.

At this time for the hollow aspect of microtubules there has not been much, if any, observed behavior of electrons or other particles being transported within/through the tubes, only along the outside. But I may have just not seen those papers, or don’t know the right search terms. Some researchers have suggested, that these GTP/GDP lattices might themselves represent a form of computation, and/or electron transport. The most famous of these researchers is Roger Penrose, who is by no stretch a biologist, but there is some precedent of theoretical physicists getting close to the answer before evidence based molecular biology experiments validated the models. What comes to mind is the Trinity College lecture “What Is Life?” by Erwin Schrödinger, guessing characteristics of a molecule for transfer of hereditary information more than a decade before the structure of DNA was resolved.

^{Types of electron orbitals found in molecular bonds, with Pi-bonds at the bottom right, which is maybe what we are interested in.}

Most bonds between atoms in molecules are either sigma bonds (single bonds) or pi-bonds (double bonds). And it seems like what Penrose and his friends are suggesting is related to a property of pi-bonds. We all know that at the center of atoms are protons & neutrons, they’re huge, and then electrons which are tiny orbit around that big center, like a planet and its moon, or a sun and its planets. But that’s really an oversimplified concept, and not really reality. As an atom gains electrons, from 1 to however many, each electron has a chance of being found in a certain region away from the center. So, yes the first 2 or 3 electrons might be found at any given time in a region that’s like concentric spheres around the atom but then things get weird. In double bonds (pi-bonds), which are common in organic molecules, a single electron can be found at the top half of the distribution or the bottom at any given instant; and when there are multiple pi-bonds in a row, like in many protein backbones, any of the electrons from any of atoms along the backbone can show up at any of the locations at any given instant. Or something, it’s weird, I don’t really get it, and maybe that’s the point Penrose is making. Quantum stuff is funky and doesn’t make much sense to dumb-dumbs like me.

There is a lot of excitement (hype) around Artificial Intelligence, all the great strides we have made with neural networks, and all the right people at the right venture retreats are saying general intelligence will change your lives forever. But much of the tech stack behind AI research is based on mimicking how neurons in biological systems are, and it may wholly be true that beyond synapses, dendrites, and neurotransmitters there is an entirely still hidden mechanism to intelligence. It may be what Penrose proposes, and it may not be. What’s clear is there is so much we do not understand about what happens within cells, on a molecular scale, and even less when it comes down to the subatomic scale. From what has been observed to date, living systems, even basic multicellular processes utilize these poorly understood mechanisms of particle behavior, let alone a process such as intelligence. Are microtubules the answer? Are carbon nanotubes and quantum processors? Is general intelligence right around the corner? Who knows. What seems important is to balance evidence based models with pure or more maths based models as guides. As is more common in physics, work it out on the chalkboard before/if ever diving in.

Retooling Analysis Pipelines from Human to EBOV NGS Data for Rapid Alignment and Strain Identification

Screen Shot 2014-10-01 at 12.26.01 PM Can we use pipelines developed for human NGS analysis and quickly apply them for viral analysis? With ebolavirus being in the news, it seemed like a good time to try. Just as with a human sequencing project, it’s helpful if we have a good reference genome. The NCBI has four different ebola strain reference files located at their ftp:
Remote directory: /genomes/Viruses/* Accession: NC_002549.1 : 18,959 bp linear cRNA Accession: NC_014372.1 : 18,935 bp linear cRNA Accession: NC_004161.1 : 18,891 bp linear cRNA Accession: NC_014373.1 : 18,940 bp linear cRNA

Currently everything that’s happened in West Africa looks to match best with NC_002549.1, the Zaire strain. The Broad Institute began metagenomic sequencing from human serum this summer and the data can be accessed here (Accession: PRJNA257197). We can take some of these datasets and map them to NC_002549.1. The datasets are in .sra format, and must be extracted using fastq-dump.

Coverage map of SRA data from 2014 outbreak in Sierra Leone to the Zaire reference genome.

We can see that the data maps really well to this strain. All four of the reference genomes above were indexed with a new build of bwa(0.7.10-r876-dirty git clone https://github.com/lh3/bwa.git). Because EBOV genomes are so small, compared to humans, the only alignment algorithm which seemed suitable within bwa, was mem.
EBOV mokas$ ./bwa/bwa mem Zaire_ebolavirus_uid14703.fa SRR1553514.fastq > SRR1553514.sam [M::main_mem] read 99010 sequences (10000010 bp)... [M::mem_process_seqs] Processed 99010 reads in 8.988 CPU sec, 9.478 real sec [M::main_mem] read 99010 sequences (10000010 bp)... [M::mem_process_seqs] Processed 99010 reads in 8.964 CPU sec, 9.671 real sec
If we take the same SRA data and try to map it to some of the other strain references, e.g. the Reston Virginia strain from 1989, it can help give a rough idea of how closely related the 2014 incident is.

Very few regions from 2014 map to the Reston reference

It can be seen that apart from a few highly conserved regions where many reads align, the coverage map indicates that the data collected in West Africa and sequenced on the Illumina HiSeq2500 does not match to NC_004161.1. There were still approximately 500 variants with the Zaire reference on the 2014 samples, showing a good amount differences, considering the entire genome is only 18,000bp.

LucidAlign genome browser comparing the two strains

All of this is, of course, good news. We can take sequencing data of new EBOV strains and apply slightly modified pipelines to get meaningful results. And with the Ion PGM now being FDA approved means data can be generated in nearly 3 hours, with Federal approval.There have even been some publications which show that the protein VP24 can stop EBOV all together [DOI: 10.1086/520582] with the structures available for analysis as well. So, it looks like it’s all coming up humanity, our capabilities are there, and with proper resources this scary little bug can be a thing of history.

Leave a comment

Filed under Genome Browser, Genomics, LucidViewer, Microbiology

Tagged as Bioinformatics, ebola, Genome Browser, LucidAlign

October 8, 2012 · 3:44 pm

Virtualization of Raw Experimental Data

Earlier today it was announced that the 2012 Nobel Prize in Physiology/Medicine would be shared by Shinya Yamanaka for his discovery of 4 genes that could turn a normal cell back into a pluripotent cell.

An effect originally shown by John B. Gurdon with his work on frog eggs over 40 years ago. The NCBI’s Gene Expression Omnibus (GEO) database under accession number GSE5259 contains all 24 candidate genes that were suspected to play a role in returning a cell to a non-specialized state. A practical near-term impact of the research however may be overlooked. That is you can have all of Dr. Yamanaka’s experimental DNA microarray data used in making the prize winning discovery.

Unless you’ve been living under a rock on Mars, or you don’t care what dorky scientists are up to, then you may have heard of the ENCODE project. The Encyclopedia of DNA Elements isn’t winning any Nobel Prizes, not yet anyways, and if what many researchers believe to be true, it never will. All the datasets can be found, spun up, played with, and used as fodder for a new round of pure in silico research from the ENCODE Virtual Machine and Cloud Resource.

What ENCODE and the Nobel Prize in Medicine have in common is ushering in a new paradigm of raw experimental data/protocol/methodology sharing. ENCODE, which generated huge amounts of varied data across 400+ labs has made all of the raw data available online. They go one step further to provide the exact analytic pipelines utilized per experiment, including the raw datasets, as Virtual Machines. The lines between scientist and engineers are blurring, the best of either will have to be a bit of both. From the Nobel data, can you find the 4 genes out of the 24 responsible for pluripotent mechanisms? Are there similarly valuable needles, lost in the haystack of ENCODE data? Go ahead, give it a GREP through.

Citations:

“The 2012 Nobel Prize in Physiology or Medicine – Press Release“. Nobelprize.org. 9 Oct 2012
Induction of Pluripotent Stem Cells from Mouse Embryonic and Adult Fibroblast Cultures by Defined Factors
Kazutoshi Takahashi, Shinya Yamanaka
Cell – 25 August 2006 (Vol. 126, Issue 4, pp. 663-676)
ENCODE Nature Portal

Leave a comment

Filed under Genomics, Microbiology

Tagged as Bioinformatics, Computational Biology, Genomics

December 27, 2011 · 1:54 pm

Bioinformatics In Bengal

Dept. Biochemistry University of Dhaka

Visiting Bengal for the holidays I didn’t expect a thriving bioinformatics community. Yet, that’s exactly what I found when Dr. Haseena Khan invited me to visit her lab at The University of Dhaka. The Jute Genome Project was a consortium of academia, industry, and government which had sequenced & analyzed the Jute plant.

What Dr. Khan and her researchers lacked in cutting-edge equipment, they made up in passion, ingenuity & thorough knowledge of the most miniscule advancements in the field. After spending the day with them Dr. Khan insisted I meet with the industrial wing of the project.

Tucked away amidst one of the most clustered places on the planet, there are a few small buildings covered in plants, within them incredible things are happening.

Lush green home of scientists, developers & supercomputers at DataSoft

DataSoft Systems Ltd. created a sub-division, Swapnojaatra (dream journey) which would “put scientists, developers, and supercomputers in one room and throw away the key” as Palash the Director of Technology for DataSoft would tell me. Although the Jute Genome Project is now complete, the developers of Swapnojaatra are hooked on informatics. From the minute we met they were excited to show what they had done (within lines of existing NDAs) and ask what was new in the field from San Francisco. Indeed, the team here had discovered genomic re-assortment of the influenza virus, performed molecular docking studies of pneumonia and created many of their own informatics tools.

For a well-educated, computer savvy, developing region bioinformatics is a near perfect industry. With low overhead costs, compared to traditional wet-lab sciences and endless data being generated in more economically developed countries, it’s only a matter of time. Bengal and bioinformatics may have been made for each other.

Citations:

A Putative Leucine-Rich Repeat Receptor-Like Kinase of Jute Involved in Stress Response (2010) by MS Islam, SB Nayeem, M Shoyaib, H Khan DOI: 10.1007/s11105-009-0166-4

Molecular-docking study of capsular regulatory protein in Streptococcus pneumoniae portends the novel approach to its treatment (2011) by S Thapa, A Zubaer DOI:10.2147/OAB.S26236

Palindromes drive the re-assortment in Influenza A (2011) by A Zubaer, S Thapa ISSN 0973-2063

Leave a comment

Filed under Genomics, Microbiology

Tagged as Bioinformatics, bioinformatics community, developing world, influenza virus

April 17, 2011 · 11:13 pm

Decided? No, we just finished saying Good Morning: Sage Congress 2011

“Therefore a sage has said, ‘I will do nothing (of purpose), and the people will be transformed of themselves; I will be fond of keeping still, and the people will of themselves become correct. I will take no trouble about it, and the people will of themselves become rich; I will manifest no ambition, and the people will of themselves attain to the primitive simplicity’ ” reads Ch. 57 of the Tao Te Ching. How chillingly the 2 millennia old caricature of a wise-learned man holds true to this day.

Sage Bionetworks is a medical research organization, whose goal is “to share research and development of biological network models and their application to human disease and biology.” To this end, top geneticists, clinicians, computer scientists and pharmaceutical researchers gathered this weekend at UC San Francisco. We were given an inspirational speech by a cancer survivor, followed by report of the progress since last years congress. Although admirable on their own, the research and programs built in the last year seemed to remind us all again that in silico research was still closer to the speed of traditional life-science than the leaps and bounds by which the internet moves.

Example of an effort which aligns with & was presented at Sage

Projects like GenomeSpace by the Broad Institute give us hope of what’s possible while watching hours of debate and conjecture at Sagecon. There were many distinguished scientists, authors , nobel laureates and government representatives, the totality of whose achievement here was coming to agreement on what should be built, who should build it and by when. Groups were divided into subgroups, and then those divided yet again. All the little policy details, software choices and even funding options would be worked out. There was a lot of talk.

Normal Conference VS Developer Conference. SHDH Illustrated by Derek Yu

Attending gatherings for software developers in silicon valley, their hackathons leave much to be desired at events like Sagecon, the least of which being the beer. I doubt anyone enjoys sitting in a stuffy blazer listening to talks for hours on end. The hacker events are very informal, there is no set goal, yet by the end of 24 hours there are often great new programs, friendships and even companies formed. Iteration rate is key to finding solutions and the rate-limiting step in the life-sciences & medicine isn’t the talent or resources it’s the culture; an opinion echoed by Sages’ own shorts-wearing heroes Aled Edwards & Eric Schadt.

“You must understand, young Hobbit, it takes a long time to say anything in Old Entish. And we never say anything unless it is worth taking a long time to say. “

Leave a comment

Filed under Genomics, Microbiology

Tagged as #sagecon, Bioinformatics, Computational Biology, in silico

January 30, 2011 · 6:02 pm

Library of Life: Genomic Databases & Browsers

DNA at it’s heart is enormous chunks of information. The genome of an organism like yeast, mice or humans contains an ocean of data. Currently there are several on-line genomic databases, a great example being SGD dedicated to the yeast S. cerevisiae. SGD has become a necessary tool for life-scientist over the past 10 years but at the same time has not kept up with information technology, resulting in a platform which works like a 10 year old website.

SGD is clunky but necessary, for now

Above we see a typical SGD search, it takes 5 windows to arrive at the sequence data of 1 gene. Nevertheless, SGD is used by drug companies trying to find the next big hit, academic labs trying to cure cancer and field biologists studying wildlife.

DNA is extracted and placed through a sequencing machine which spits out the information into a computer file. Just as having an aged internet browser affects our productivity the browser one uses to view these files can have a large impact. Following the web-browser analogy we take a look at 3 different sequence browsers, starting with Vector NTI.

Vector NTI is enterprise software.

Vector NTI is well established and often bundled with hardware. It has many features but can often seem like information overload, causing most users to stumble through it’s many menus and windows. A step up in usability comes from the third-party software suite Sequencher, popular amongst mac users.

Sequencher is your friend

Sequencher strikes a healthy balance between features and usability. But is a fairly resource intensive program requiring CDs and hard drive space to store local algorithms. However, the most up to date browser is likely to be the free and light download, 4Peaks.

4Peaks Simplicity & Usability

4Peaks allows the user to go in, read their sequence file and get out. What it lacks in features it makes up for in simplicity. The end result of any software or database is to help researchers wade through all this information and continue their studies. In this environment services such as GENEART offers to perform much of the genomic related leg work on a given project.

These are all tools, the databases, browsers and services, which enable researchers to answer the questions that line our horizon. The progress of our tools has always directly correlated with our advancement, the life sciences adoption of information technology is a necessity as we discover so much of life is condensed data in every nook.

1 Comment

Filed under Genomics, Microbiology, PCR

Tagged as Bioinformatics, Genomics, Sequencing

December 5, 2010 · 7:23 am

The Polymerase Chain Reaction, A Microcosm

Creating a new life-form is an awe-inspiring experience. Writing DNA like a mere sentence and watching creation unfold in the mechanism of life is both breathtaking and humbling. None of this would be possible without the Polymerase Chain Reaction (PCR). A simple process where all the ingredients for DNA: a teaspoon of reagents, a pinch of polymerase enzyme and a handful of the “letters” that make up our genetic code are thrown into the oven, literally, well a very accurate oven that can step temperatures rather quickly. Within hours the sentence you had written out on a computer screen, is now molecules floating around in a tiny tube ready to be put into a cell, which will read the instructions and attempt to build or act accordingly. Using this simple idea the human race has been handed over the keys to the Build a Life Workshop, however this simple process often goes without scrutiny, without improvement.

Basic Principles of PCR

Much of the drug discovery in both academia and industry is now focused on protein mechanics. How does this receptor behave? What buttons turn this enzyme on and off? Focusing on protein structure and mechanism often makes PCR a boring chore that most researchers have to grudgingly get past before they can get to the interesting part. As a result, the basic process of PCR has remained the same for decades. I literally remember when a P.I. gave me a paper from 1985 to look up what settings I should use for my reaction. All this wouldn’t be a problem, except people are often wasting weeks to months trying to get the right PCR outcomes. At the root of the problem & the solution is information. PCR is a “black box” process, in that you throw all the ingredients together turn on the machine and hope that all the right molecules will bump into each other at the right times. Traditionally, it has been a exasperating trial & error based system. Now however, information technology has given a glimpse of a solution and a way to move forward to the next chapter in the development of this life-science staple.

Leave a comment

Filed under Genomics, Microbiology, PCR

Tagged as DNA, Microbiology, PCR, Polymerase Chain Reaction