Bioinformatics and the importance of curation

I recently finished a class on bioinformatics, the study of how best to use all the information scientists have accumulated and are accumulating in databases scattered around the world and web. And there’s a lot of information out there. Sometimes you hear the word “inventory” used to describe massive studies that characterize a lot of cellular components at once, as in, at some point in the future we will have a complete inventory of all proteins in all cancer cells, or stages of embryonic development, or what have you. I have to confess, I always picture an attic full of boxes, with structures and sequences and expression patterns, all minimally labeled, gathering dust.

Huge aggregations of information are scary. Even one database for one field of study (say, the NCBI website), a tiny slice of All The Knowledge in the World, forces us to face the fact that no one person will ever be able to assimilate it all. My feelings about bioinformatics are a lot like my feelings as a child, when I realized I would never be able to read all the books in the world: a crestfallen sense of having to miss out on something really, really interesting.

And at this point, maybe everyone is missing out. Even the authors of big-data studies, such as microarray experiments that assay the expression level of every gene in the genome in diseased and healthy tissues, can’t follow every lead to its source; often a few differentially regulated genes get followed up, but the rest just get put out there for others to work with. Sometimes I suspect that we (that collective, Internet-era “we”) have all the information to answer any cell-level biological question we could ask, but no idea of what the questions will be or how best to frame them.

One of the solutions to this problem is data mining—enlisting a computer’s help to sift through the masses of information with a program that looks for connections a human might see, if a human had the time and brainpower. Reading up on the art of data mining, I stumbled across a company called Narrative Science with a novel idea on how to present the results of data mining. Instead of making graphs from data, it makes sentences or even short passages, computer-generated but composed so that they read like something a human wrote. The company calls it a novel kind of visualization.  I think it’s absolutely brilliant, both as a way forward for handling a huge amount of information, and for the amount of cleverness that must have gone into programming a computer to take scores from a game and generate something like this:

WISCONSIN appears to be in the driver’s seat en route to a win, as it leads 51-10 after the third quarter…
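
For the curious, the simplest version of this trick can be sketched in a few lines of Python. This is only a toy illustration of template-based text generation, not Narrative Science's actual method, which I have not seen. The function name `describe_game`, the score thresholds, and the phrasings other than the one quoted above are all my own assumptions.

```python
def describe_game(leader, leader_score, trailer_score, quarter):
    """Turn a mid-game score line into one sentence.

    A toy template approach: pick a phrase based on the size of the
    lead, then drop the numbers into a fixed sentence frame.
    The thresholds below are arbitrary, illustrative choices.
    """
    if leader_score < trailer_score:
        raise ValueError("leader_score must be at least trailer_score")

    ordinals = {1: "first", 2: "second", 3: "third", 4: "fourth"}
    when = f"after the {ordinals[quarter]} quarter"
    name = leader.upper()
    score = f"{leader_score}-{trailer_score}"
    margin = leader_score - trailer_score

    if margin == 0:
        return f"{name} is locked in a tie at {score} {when}."
    if margin >= 21:
        outlook = "appears to be in the driver's seat en route to a win"
    elif margin >= 8:
        outlook = "holds a comfortable lead"
    else:
        outlook = "is clinging to a narrow lead"
    return f"{name} {outlook}, as it leads {score} {when}."


print(describe_game("Wisconsin", 51, 10, 3))
```

The real cleverness presumably lies in everything this sketch leaves out: deciding which facts are worth mentioning at all, and varying the language so that a thousand game recaps do not all read alike.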

As I’ve mentioned, I like my science in story form; decontextualized data, however useful they may be, are automatically less compelling to me. So I’m glad that, in this age when the zeitgeist seems to be moving from the longform newspapers of the past to the Twitters of the future, there’s still some consensus that narrative is a good vehicle for understanding the world. According to some neuroscientists, perhaps it is the best vehicle.

When I think it over, I realize that nature has, or is, exactly as staggering a dataset as anything they’ve got at NCBI, with considerably more information much better encrypted. People have been trying to extract laws and trends from it for generations. All we’ve accomplished with our microarrays and our high-throughput proteomic studies is removing one step between the framing of a question and finding out the answer. Plenty of work of interpretation and meaning-making remains to be done.


Book review: Spook

Spook: Science Tackles the Afterlife, Mary Roach’s second published book, is a whimsical tour of modern and historical investigations into whether human consciousness survives death. It follows up on her previous book, Stiff: The Curious Lives of Human Cadavers, with further investigation of what happens when people die. This time around, the question is not about the fate of our earthly remains but whether they are all that remains of us. The two books are unified by the marginality of the research they describe, and by the curiosity and persistence Roach shares with us in getting as close to the bottom of things as circumstances allow. After all,

 By definition, death is a destination with no return ticket. Clinically dead is not dead-dead. So how do we know the near-death experience isn’t a hallmark of dying, not death? What if several minutes down the line, the bright light dims and the euphoria fades and you’re just, well, dead?  We don’t know, says Greyson. ‘It’s possible it’s like going to the Paris airport and thinking you’ve seen France.’

Lively as it is, the book ends up a trifle disorganized. It presents a peculiar mix: Duke scientists designing random image displays for operating room ceilings to test whether an out-of-body experience can be confirmed, alongside tourists wandering supposedly haunted areas with tape recorders, listening for the voices of the Donner party. The field is fractured. There are a lot of individuals out there looking for ghosts, the weight of the soul or proof of the persistence of consciousness, who trust their own methodologies but scorn others’. The book reflects this lack of communication and collaboration, juxtaposing historical skeptics’ efforts to discredit mediums with a visit to a school for psychics-in-training. One gets the sense that for all our new technologies, very little progress has been made in the last hundred and fifty years toward either showing that an afterlife is real, or showing conclusively that it is not.

To be fair, the field of trying-to-find-out-whether-ghosts-exist is uncommonly peppered with false starts and short on compelling positive results, though as a kind of compensation, the book is fuller than average of cocktail party stories.

What’s more, there’s something about Roach’s work that captures the spirit of research. She isn’t always sure where she is headed, but she invites us along and lets us see the questions as they arise, the leads as they pan out (or not), and all of the side routes that make the work so interesting and maddening to do. The result is like a travelogue, a road trip with a delightfully witty companion. One could wish for more science books to follow this model.

Coffee Spoons (Part One)

If you’re anything like me, you responded to recent news articles on the global coffee and chocolate shortage with a wail.  My own dismayed “Nooooo!” was just about fit for a Star Wars movie.  I love coffee and chocolate, especially together: their rich color; their smoky, bitter, very lightly sweetened tastes; the caffeine… the caffeine.

I imagine chocolate coming from Europe and coffee coming from Seattle, but of course it wasn’t always that way. Though chocolate has a venerable place in European culture, it got there just five hundred years ago in the Columbian exchange; and any coffee aficionado worth their grounds can tell you that coffee is native to Africa: Arabica to the highlands of Ethiopia, robusta to the forests farther south and west.

When plants full of the same psychoactive compound start turning up across oceans, you start to wonder when they evolved that compound, and why. And the plot thickens: caffeine is also found in tea, of course, not to mention the kola nut from Nigeria, and the yerba mate holly, guarana, and guayusa from South America. If you’ve been keeping score, that’s three continents’ worth of caffeinated delights, and those are only the plants that humans consume.

Caffeine is what’s known among plant biologists as a specialized metabolite. It isn’t absolutely needed to keep the plant alive, but improves its chances a little; human addicts may feel the same effect. Plants synthesize caffeine mostly in seeds and nascent leaves, where the compound paralyzes pests that try to eat these vulnerable tissues. The caffeine molecule looks a great deal like adenosine, a nucleoside that every organism uses as a building block of RNA and of the energy carrier ATP. In the human brain, caffeine does its work by impersonating adenosine itself: it sits in the receptors that adenosine would normally bind, without switching them on.

Caffeine biosynthesis is a fairly simple pathway. Once you have purines like adenosine around, which every organism known to science does, the plant converts them to a close relative called xanthosine, and from there it’s just a matter of cutting off the sugar and adding three methyl groups to generate caffeine. It’s the kind of metabolite that, if gene products were patents, would make you smack your forehead and shout, “Why didn’t I think of that first?!” Therefore it’s not surprising that caffeine biosynthesis has evolved at least twice in plants: they have plenty of enzymes that specialize in cutting bonds and transferring methyls, so it was only a matter of time and slightly altered specificity. In fact, these researchers say, based on protein sequence analysis, that the same cutting-and-switching enzymes that held down one job in the plant left for the new caffeine-synthesis job several times in different plant lineages. So, although tea, coffee, and chocolate have the same delectable effects on our brains, they came to these effects independently.

Knowing this stuff won’t help us solve the caffeine shortage that is sure to give many of us metaphorical (and also withdrawal-induced) headaches over the next harvest season. Short of a sudden reversal of global climate change, it’s not clear what can. But at least we know that this precious metabolite is a feature of many plants, probably including some we haven’t found yet. And if all else fails, there’s always straight-up pharmacological synthesis!

How we spin a scientific story

It’s a common metaphor among scientists that to report your data is to tell your audience a story. No one gets excited about finding the answer to a question they aren’t invested in, which only makes sense; to be excited about results, you have to know where they fit into the field of previous knowledge and what will happen next. And narrative is a great way to convey all that, to make an audience interested in a problem.

That’s why the first ten minutes of any academic presentation are devoted to making a case for why the question is interesting, worth the scientist’s time and the audience’s attention. Call it setting the stage, beginning the story, catching the audience up to where the excitement begins. The research question provides the dramatic conflict; if a presentation is really good, then the hero is clever and thorough in their approach to answering the question, but not self-aggrandizing (such are the hazards of telling a story about yourself). Once the question is answered, in full or in part, the denouement must remind the audience why they’re supposed to care: how will we all live happily ever after, or else what are the prospects for a sequel? (Applications for funding, though in theory they have only the very beginning of the story, are especially glowing when they pitch the sequels.)

As in fiction, in a good scientific story the hero is relatable; knowing only what they have been told, the audience can see why the approaches the researcher took are sensible, can imagine taking such approaches themselves. This is what makes it tricky for scientists to tell nonscientists about their research. You can always assume that a roomful of cell biologists will agree that cell cycle control is an interesting topic. A roomful of normal people will want to know the point.  That’s why the world is full of publications like Discover and Scientific American, which make it their business to communicate why the eggheads are so excited.

Popular science writing tends to slant toward the novel, the gross, and the health-related: topics that are intrinsically interesting. When it is bad, it is very, very bad (here’s a great sendup of the crappy pop-sci story).  But when it is good, it is splendid.

This blog will be about how scientists and writers communicate science, both to one another and to the public. What’s compelling? What isn’t? And what genuinely cool answers are out there waiting to be shared?