Research

Funny Algorithm, Serious Science

The New Yorker caption contest uses a voting system with ties to genetic research.

By Greg Uyeno

Every week since the spring of 2005, The New Yorker magazine has run a caption contest. The magazine prints a single-panel comic with no words, and readers submit captions to complete the joke; for example, giving a voice to a cat reading a book. The contest is popular — there are thousands of submissions for each cartoon. Three finalists are printed the following week, and the overall winner is determined by public vote. For years, the magazine’s staff picked the three finalists after painstakingly reviewing every entry.

But in 2016, then-cartoon editor Bob Mankoff announced a change. A “fancy crowdsourcing algorithm,” developed with the help of computer scientists at the University of Wisconsin, promised to streamline the process of selecting finalists. Online voters would help narrow the field of roughly 5,000 weekly entries down to a more manageable number for the editors to consider.

The voting system and algorithm are similar to ones used in research ranging from basic viral genetics to identifying medically useful associations between personal genomes and electronic health records.

Collaborator Rob Nowak, a computer scientist and engineer, says he saw parallels between the human labor required to vote on captions and the time-consuming experiments involved in biological research. “The New Yorker is a proving ground for algorithms of this sort,” says Nowak. The voting system has been in place for more than 100 caption contests now, and each week offers an opportunity to refine the approach.

The voting system uses machine learning, a form of artificial intelligence, and features an “active learning” approach to gathering information. Like many other machine-learning systems, the caption contest relies on human-generated labels. In this case, each caption is rated “funny,” “somewhat funny,” or “unfunny” by online voters.

The system starts by presenting voters with random captions from the pool of submissions, but over time it becomes clear that certain captions are poorly received. Rather than gather more information about captions that are almost certainly not winners, the active-learning system begins selectively showing voters the captions it is less certain about. By the end of a roughly 24-hour window, the algorithm has collected the votes, scored the captions based on user ratings, and spat out a list of the top candidates for the editors to review.
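In spirit, this is the kind of explore-versus-exploit tradeoff studied in multi-armed bandit problems. Here is a minimal sketch in Python, assuming a simple upper-confidence-bound heuristic rather than the contest's actual algorithm; the names (Caption, next_caption, and so on) and the 0-to-1 rating values are illustrative:

```python
import math

# Hypothetical mapping of the three voter labels to numeric scores.
RATING = {"unfunny": 0.0, "somewhat funny": 0.5, "funny": 1.0}

class Caption:
    def __init__(self, text):
        self.text = text
        self.total = 0.0  # sum of ratings received so far
        self.votes = 0    # number of ratings received so far

    def mean(self):
        return self.total / self.votes if self.votes else 0.5

    def bonus(self, t):
        # Exploration bonus: wide for captions with few votes and
        # shrinking as evidence accumulates, so under-sampled captions
        # keep getting shown.
        if self.votes == 0:
            return float("inf")
        return math.sqrt(2.0 * math.log(max(t, 2)) / self.votes)

def next_caption(captions, t):
    # Upper-confidence-bound rule: show the caption whose plausible
    # best score (mean rating plus uncertainty) is highest.
    return max(captions, key=lambda c: c.mean() + c.bonus(t))

def record_vote(caption, label):
    caption.total += RATING[label]
    caption.votes += 1

def top_candidates(captions, k=3):
    # After the voting window closes, rank by mean rating for editors.
    return sorted(captions, key=Caption.mean, reverse=True)[:k]
```

Under a rule like this, early votes spread across all captions; as ratings accumulate, the remaining votes concentrate on captions whose standing is still genuinely in doubt, which is exactly where another human label is most informative.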

In the world of biological research, scientists, too, have limited resources and lots of information to review. New drugs are constantly being developed, and more and more genetic data is being gathered every day. This data is like the cartoon captions: researchers are trying to figure out what should rank at the top, such as likely contributors to disease or effective treatments for those diseases.

“There’s a deluge of data, but there’s still a bottleneck of finding annotations to train algorithms,” says Shantanu Singh, a computational biologist at the Broad Institute in Boston. For some researchers, the needed annotations are relatively cheap to gather, such as labels for images. But in genome-wide association studies, for example, researchers might need to comb through thousands of gene variants associated with heart disease or cancer risk, repeating expensive, time-consuming animal experiments for each one. A properly tuned active-learning algorithm can reduce the total number of tests that need to be run without significantly raising the risk that an important gene goes uninvestigated.
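That prioritization can be framed as the same uncertainty-driven loop used for the captions. A minimal sketch, assuming a generic scoring model and hypothetical helpers (predict_proba, run_experiment), of how an active learner might spend a fixed experimental budget:

```python
def pick_next_variant(variants, predict_proba):
    # predict_proba(v): the model's current estimate that variant v is
    # disease-linked; values near 0.5 are the ones it is least sure about.
    return min(variants, key=lambda v: abs(predict_proba(v) - 0.5))

def active_screen(variants, predict_proba, run_experiment, budget):
    untested, results = set(variants), {}
    for _ in range(min(budget, len(untested))):  # each pass is one costly test
        v = pick_next_variant(untested, predict_proba)
        results[v] = run_experiment(v)           # ground truth from the lab
        untested.remove(v)
        # A full pipeline would retrain the model on `results` here
        # before choosing the next experiment.
    return results
```

The point of the design is that each expensive experiment is spent where the model's answer is least settled, rather than confirming variants it already scores as near-certain hits or misses.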

In his own research, Singh uses machine learning to examine images of cells. He takes pictures of cells under different conditions, such as cells carrying a genetic mutation or cells exposed to a drug. Computers then find tiny perturbations in cell structure that trained experts might not think to look for, and compare those effects with the effects of other genes or molecules.
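A toy sketch of that comparison step, assuming each perturbation has already been reduced to a numeric vector of morphological features (the names cosine_similarity and most_similar are illustrative, not Singh's actual pipeline):

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity between two morphological feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, profiles):
    # profiles: dict mapping a perturbation name to its feature vector.
    # Returns the known perturbation whose effect most resembles the query's.
    return max(profiles, key=lambda name: cosine_similarity(query, profiles[name]))
```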

Other biology research uses deep learning, a form of machine learning loosely modeled on networks of neurons in the brain. Deep learning has been applied to tasks like image recognition and natural language processing and is being investigated for medical applications. But the inscrutability of how a deep-learning algorithm arrives at a conclusion is a major obstacle to using these systems in practical medical settings, and a hot topic in the field.

“It’s a black box beyond anything we’ve had before,” Singh says. “It’s a little scary that you have no clue what it’s actually doing.”

While machine-learning algorithms offer hope of saving labor and of providing insights that experts can't find on their own, many scientists are acutely aware that complex biological systems require cautious interpretation.

“We don’t want to miss something that’s really important. It has to be done very carefully,” says Nowak of developing active learning algorithms. “What does it mean to be careful? How do you do that algorithmically?”

For the caption contest, the stakes are low. A caption that many people might find hilarious could be discounted too early in the voting process and never make it to the editors’ desk. But in biology, an error could incorrectly reshape the understanding of the function of a gene. In medicine, it could consign a life-saving drug to the trash heap.

Machine learning may one day be a widespread part of personalized medicine, but as it's being developed, Singh suggests not overlooking information with established usefulness that is accessible right now. “There’s already a lot one can tell from simple indicators like family history,” he says.