Since it launched in 2017, Facebook’s machine-learning framework PyTorch has been put to good use, with applications ranging from powering Elon Musk’s autonomous cars to driving robot-farming projects.
Now pharmaceutical firm AstraZeneca has revealed how its in-house team of engineers are tapping PyTorch too, and for equally as important endeavors: to simplify and speed up drug discovery.
Combining PyTorch with Microsoft Azure Machine Learning, AstraZeneca’s technology can comb through massive amounts of data to gain new insights about the complex links between drugs, diseases, genes, proteins or molecules.
SEE: Managing AI and ML in the enterprise 2020: Tech leaders increase project development and implementation (TechRepublic Premium)
Those insights are used to feed an algorithm that can, in turn, recommend a number of drug targets for a given disease for scientists to test in the lab.
The method could allow for huge strides in a sector like drug discovery, which so far has been based on costly and time-consuming trial-and-error methods.
Typically, to come up with a new drug to combat a specific disease, scientists have to test different protein designs and combinations in the lab until they find a working solution – which is why it can take 10 to 15 years to go from an idea for a drug to a medicine that is ready to go to market. AstraZeneca’s algorithm, on the other hand, can more quickly identify the top 10 drug targets that scientists should be looking at for a given disease.
Bringing automation to drug discovery is especially useful as the amount of data that scientists can access to drive their research increases exponentially each year. Analyzing databases that grow by the day to understand how they can inform drug discovery is effectively becoming a superhuman task.
Gavin Edwards, machine-learning engineer at AstraZeneca, told ZDNet: “Each year, the sheer amount of scientific information and data available to researchers grows. By leveraging AI and machine learning tools such as PyTorch and Azure, we can quickly extract, combine and interpret information from multiple sources, with the aim of drawing better and faster scientific conclusions than if we analyzed this data manually.”
A lot of the data available is unstructured text, which is where PyTorch comes in. The Facebook-developed package, based on the Python programming language, is an open-source machine-learning library that is especially useful for developers working on intense data-science tasks in fields like computer vision and natural language processing (NLP).
AstraZeneca’s NLP team uses PyTorch to define and train biomedical text-mining algorithms that can work their way through the data, finding patterns and trends, and eventually structuring the information at hand.
The data is then fed into a knowledge graph, which is able to intelligently link together pockets of information, so that each data point can be contextualized. Acting like a web of information, the graph can reflect the properties of each piece of data – genes, proteins, diseases, compounds – but also the relationships between different categories.
In other words, the knowledge graph comprehensively organizes all the scientific data at hand. Using the compute capabilities of Microsoft Azure Machine Learning, AstraZeneca’s engineers then uses the knowledge graph to train an algorithm that recommends new drug targets to scientists.
“We’ve combined research in the public domain and our internal research into a graph that encodes complex information easily,” said Edwards. “By layering machine learning on top of that, we can train machine learning models that recommend novel drug targets and help to inform pipeline decisions.”
A recommendation algorithm for drug discovery certainly sounds like the promise of huge time savings for scientists who have been relentlessly trialing new drug designs in the lab. But Edwards and his team also hope that the knowledge graphs they are creating might help researchers find new connections, explore new paths, and test unproven theories without the risk of losing too much time.
Knowledge graphs can be shrunk down to look at one aspect of a problem in detail, and they can also be expanded to provide a wider view across different branches of research. Researchers could, therefore, easily have access to untapped information that could bring additional value to their projects.
“Our knowledge graph allows researchers to ask key questions about genes, diseases, drugs and safety information to help identify and prioritize drug targets,” said Edwards. “And, as our data and knowledge continues to evolve, so will our graph, which means every new experiment will benefit from everything learned before.”
For Edwards the scope of the technology is potentially huge. In the context of an ongoing global pandemic, that’s one piece of good news to hold on to.