Two AI-based science assistants succeed with drug-retargeting tasks

Ars Technica

Both tools generate hypotheses; one goes on to analyze some of the data.

On Tuesday, Nature released two papers describing AI systems intended to help scientists develop and test hypotheses. One, Google’s Co-Scientist, is designed as what they term “scientist in the loop,” meaning researchers are regularly applying their judgements to direct the system. The second, from a nonprofit called FutureHouse, goes a step beyond and has trained a system that can evaluate biological data coming from some specific classes of experiments.

While Google says its system will also work for physics, both groups exclusively present biological data, and largely straightforward hypotheses—this drug will work for that. So, this is not an attempt to replace either scientists or the scientific process. Instead, it’s meant to help with the things that current AIs are best at: chewing through massive amounts of information that humans would struggle to come to grips with.

What’s this good for?

There are some distinctions between the two systems, but both of them are what is termed agentic; they operate in the background by calling out to separate tools. (Microsoft has taken a similar approach with its science assistant as well; OpenAI seems to be an exception in that it simply tuned an LLM for biology.) And, while there are differences between them that we’ll highlight, they are both focused on the same general issue: the utter profusion of scientific information.

With the ease of online publishing, the number of journals has exploded, and with them the number of papers. It has gotten tough for any researcher to stay on top of their field. Finding potentially relevant material in other fields is a real challenge. If you’re focused on eye development, for example, one of the signaling systems used there may also be involved in the kidney, and it can be easy to miss what people are discovering about it there.

As the people at FutureHouse put this issue, “By focusing on ‘combinatorial synthesis’ (identifying non-obvious connections between disparate fields), Robin effectively targets ‘low-hanging fruit’ that human experts may overlook due to the compartmentalization of scientific knowledge.” This is a task that’s well suited to AI, which can chew through the peer-reviewed literature in the background while researchers do other things. This isn’t really a question of whether an AI could do something better or worse than a human; it’s more of an issue of whether any human would end up doing these sorts of searches at all.

By finding enough connections among disparate research, these tools can make suggestions (hypotheses, really) about the biology. This can include things like what processes underlie biological behaviors, and what pathways and networks regulate those processes. And, in the cases explored here, it included suggesting known drugs that might target some of these pathways in diseased cells: acute myeloid leukemia in Google’s case, and a form of macular degeneration for FutureHouse.

Co-Scientist

As you might imagine, Google’s system is based on the company’s Gemini large language model. That helps the system interpret a statement of research goals provided by human scientists and start a literature search to find relevant information and form hypotheses. Those are then evaluated relative to each other in a “tournament,” the results of which are evaluated by a Reflection agent. An Evolution agent can then make improvements to any surviving ideas, which can be sent back through the process.

Key criteria considered throughout this process include plausibility, novelty, testability, and safety. And the Reflection tool has access to external search tools, as access to the scientific literature “prevented the hallucination of seemingly novel but implausible hypotheses,” the company wrote.
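The generate, tournament, review, and evolve cycle described above can be pictured as a simple loop. The following is only a rough sketch under stated assumptions, not Google's implementation: the function `refine_hypotheses`, its parameters, and all of the toy stand-ins are hypothetical names invented for illustration, and each agent is reduced to a plain Python callable.

```python
from typing import Callable, List


def refine_hypotheses(
    goal: str,
    generate: Callable[[str], List[str]],
    rank: Callable[[List[str]], List[str]],
    passes_review: Callable[[str], bool],
    evolve: Callable[[str], str],
    rounds: int = 2,
    keep: int = 2,
) -> List[str]:
    """Generate, rank, review, and evolve hypotheses for a few rounds."""
    pool = generate(goal)
    for _ in range(rounds):
        ranked = rank(pool)  # stand-in for the "tournament"
        # Stand-in for the Reflection step: drop ideas that fail review.
        survivors = [h for h in ranked if passes_review(h)][:keep]
        # Stand-in for the Evolution step: survivors stay in the pool
        # alongside improved variants, and the cycle repeats.
        pool = survivors + [evolve(h) for h in survivors]
    return pool


# Toy stand-ins so the sketch runs; none of this reflects real model behavior.
result = refine_hypotheses(
    goal="drug repurposing for AML",
    generate=lambda g: [f"{g}: idea {i}" for i in range(4)],
    rank=lambda pool: sorted(pool, key=len, reverse=True),
    passes_review=lambda h: "idea 3" not in h,
    evolve=lambda h: h + " (refined)",
)
print(result)
```

In a real system, each callable would presumably wrap a model call or a literature search, and a human reviewer could sit behind the review step.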

As the paper puts it, scientists were kept in the loop at all times. In the search for potential drugs targeting leukemia, the suggestions made by the system were prioritized based on a review by a panel of experts, who had access to the literature Co-Scientist used to formulate its suggestions.

The results are what you’d expect from cancer therapies. Some of the drugs identified were effective, but only against subsets of a panel of myeloid leukemia cells. That’s not unusual, given that there are multiple routes to unchecked growth, so drugs that block the route followed by one cell type may not be effective in cells that took a different route.

Google also mentioned that the system could do more general hypothesizing that doesn’t involve drugs, using an example of the spread of virulence genes in bacteria. But the details of that work were fairly sparse.

The system is also set up so that it’s model agnostic, allowing it to be switched over to better-performing models as AI systems evolve. But the authors also warn that “Co-Scientist also inherits the intrinsic limitations of its underlying models, including imperfect factuality and the potential for hallucinations.”

And Robin

FutureHouse’s system has some similarities but a couple of critical differences that go beyond naming all the agentic tools after birds. The main system, Robin, has access to specialized literature search tools. One, Crow, produces a concise summary of papers, while Falcon gives a deep overview of the information contained in the paper. The paper describing the system provides a clear sense of the advantages here: “Robin analyses 551 papers in 30 minutes compared to an estimated time of 540 hours for a human.”

Taking those summaries, Robin then formed a series of hypotheses about disease mechanisms for macular degeneration and used these tools to provide a detailed report on the evidence for each mechanism. An LLM judge then made pairwise comparisons among the hypotheses, which resulted in relative rankings, a bit like Google’s tournament system.
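To make the idea of turning pairwise comparisons into relative rankings concrete, here is a minimal sketch that counts wins across every pair. It is an assumption-laden illustration, not FutureHouse's method: the article does not say how Robin aggregates the judge's verdicts, and `rank_by_pairwise_wins`, `toy_judge`, and the scores below are hypothetical stand-ins for an LLM judge.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_by_pairwise_wins(
    items: List[str],
    judge: Callable[[str, str], str],
) -> List[str]:
    """Rank items by how many head-to-head comparisons each one wins."""
    wins: Dict[str, int] = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        # The judge returns whichever of the two items it prefers.
        wins[judge(a, b)] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)


# Hypothetical scores standing in for an LLM judge's preferences.
toy_scores = {"mechanism A": 0.4, "mechanism B": 0.9, "mechanism C": 0.6}


def toy_judge(a: str, b: str) -> str:
    return a if toy_scores[a] >= toy_scores[b] else b


ranking = rank_by_pairwise_wins(list(toy_scores), toy_judge)
print(ranking)  # ['mechanism B', 'mechanism C', 'mechanism A']
```

Win counting is the simplest possible aggregation; a rating scheme that accounts for the strength of each opponent would be a natural alternative when the judge is noisy.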

In a similar manner, the system was re-deployed to suggest cell lines and culture conditions that could provide a model of macular degeneration, and it prepared reports on 30 candidate drugs. “These reports contained both justification for why each drug is suitable for mitigating the disease mechanism represented in the in vitro model and potential limitations the drug may pose,” according to the FutureHouse team. Again, these reports were evaluated by human experts to determine which tests to go ahead with.

Robin also suggested assays to test the drugs, which humans evaluated (in most cases, it appears they used variants of the suggested ones).

The key difference with Robin is that it includes a tool, Finch, that can automate the evaluation of data from some standard biological screening assays, like flow cytometry and RNA-seq. So, as long as your tests involve one of the assays that Finch can handle, then there’s an additional step that can be performed by the system.

As above, Robin came up with a novel hypothesis: Increasing the ability of retinal cells to pick up debris outside the cells could provide some protection against the disease. And it identified a drug that seemed to provide just that sort of boost in the experiments it proposed.

As Google found, having tools designed specifically to interface with the scientific literature mattered. Swapping out Crow for OpenAI’s o4-mini took the rate of hallucinated references from zero percent all the way up to 45 percent. FutureHouse also took a look at the performance of OpenAI’s research-focused tool and found that, in all cases where it suggested drugs that Robin hadn’t come up with, those drugs failed to have an effect on these cells.

Where does this leave us?

For starters, it’s important to note that these successes come in one of the easier parts of drug development (not that any part of it can really be said to be easy). The AIs weren’t being asked to design entirely new molecules, and most drugs fail during the animal and clinical trials phase, rather than during testing in cell culture. That’s not to say repurposing existing drugs is nothing—we already have safety profiles and agency approvals for these molecules, and many are off-patent and therefore cheap. But we’re not at the point where AIs are solving hard problems.

This sort of hypothesis—this mechanism underlies that disease, and the drug over there can target it—is also one of the more concrete forms of hypothesis in biology. In my career as a scientist, I had to come up with hypotheses that were meant to address things like “mice with this mutation have a whole lot of defects in very different tissues; is there a single mechanism underlying them?” Or, “What’s going on at the border of this gene’s expression that is changing how cells respond to this signaling molecule?” It’s not clear how these systems could handle these more open-ended scientific problems.

That said, the problem of literature overload is a real one in many fields, and systems meant to address it can potentially help us avoid a situation where all the information we needed was sitting around for a decade, but nobody put it together. Given we’re still working through AI’s growing pains, however, I’m also happy that there are at least two independently developed systems tackling this problem so that we can potentially run both and compare the results.

Nature, 2026. DOI: 10.1038/s41586-026-10652-y, 10.1038/s41586-026-10644-y.

John Timmer, Senior Science Editor. John is Ars Technica's science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.
