Artificial Intelligence in Chemistry: Current Trends and Future
Directions
Zachary J. Baum,* Xiang Yu, Philippe Y. Ayala, Yanan Zhao, Steven P. Watkins, and Qiongqiong Zhou
Cite This: J. Chem. Inf. Model. 2021, 61, 3197–3212
ABSTRACT: The application of artificial intelligence (AI) to
chemistry has grown tremendously in recent years. In this Review,
we studied the growth and distribution of AI-related chemistry
publications in the last two decades using the CAS Content
Collection. The volume of both journal and patent publications has
increased dramatically, especially since 2015. Study of the
distribution of publications over various chemistry research areas
revealed that analytical chemistry and biochemistry are integrating AI
to the greatest extent and with the highest growth rates. We also
investigated trends in interdisciplinary research and identified
frequently occurring combinations of research areas in publications.
Furthermore, topic analyses were conducted for journal and patent
publications to illustrate emerging associations of AI with certain
chemistry research topics. Notable publications in various chemistry
disciplines were then evaluated and presented to highlight emerging
use cases. Finally, the occurrence of different classes of substances
and their roles in AI-related chemistry research were quantified,
further detailing the popularity of AI adoption in the life sciences and analytical chemistry. In summary, this Review offers a broad
overview of how AI has progressed in various fields of chemistry and aims to provide an understanding of its future directions.
KEYWORDS: artificial intelligence, CAS Content Collection, analytical chemistry, biochemistry
INTRODUCTION
Artificial intelligence (AI) refers to the ability of machines to
act in seemingly intelligent ways, making decisions in response
to new inputs without being explicitly programmed to do so.
Whereas typical computer programs generate outputs according
to explicit sets of instructions, AI systems are designed to
use data-driven models to make predictions. These AI models
are generally first trained on representative data sets with
known output values, thereby learning input–output
relationships. The resulting trained models can then be used
to predict output values of data similar to the training set or to
generate new data. Many problems involving data with
complex input–output relationships are difficult or impractical
to model procedurally, thus creating an opportunity for AI.
AI can feasibly be applied to various tasks in the field of
chemistry, where complex relationships are often present in
data sets. For example, the solubility of a new compound may
be predicted either through equations based on empirical data
or by using theoretical calculations. Alternatively, prediction of
solubility may also be accomplished by an AI program that has
developed structure–solubility relationships after being trained
on numerous compounds with known solubilities. The use of
AI for tasks such as property prediction has proliferated in
recent years due to explosive growth in computing power,
open-source machine-learning frameworks, and increasing data
literacy among chemists.(1–9) AI implementations have proven
to dramatically reduce design and experimental effort by
enabling laboratory automation,(10) predicting bioactivities of
new drugs,(11–13) optimizing reaction conditions,(14) and
suggesting synthetic routes to complex target molecules.(15)
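The idea of learning a structure–property relationship from compounds with known values can be illustrated with a minimal sketch. The k-nearest-neighbor regressor below is only one of many possible model types and is not a method taken from this Review; the function name `knn_predict`, the two-component descriptor vectors, and all numerical values are hypothetical and invented purely for illustration.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a property as the mean over the k nearest training examples.

    train: list of (descriptor_vector, known_value) pairs
    query: descriptor vector of the new compound
    """
    if not train:
        raise ValueError("training set is empty")
    # Rank training examples by Euclidean distance to the query descriptors.
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Synthetic training data (NOT real measurements): descriptor vector -> value.
training_set = [
    ((0.0, 0.0), 1.0),
    ((0.0, 1.0), 2.0),
    ((1.0, 0.0), 3.0),
    ((5.0, 5.0), 10.0),
    ((5.0, 6.0), 12.0),
]

# The three nearest neighbors of (0.1, 0.1) carry the values 1.0, 2.0, and 3.0.
prediction = knn_predict(training_set, (0.1, 0.1), k=3)
```

In practice the descriptors would be computed from chemical structures and the model validated on held-out compounds; the sketch only shows the train-then-predict pattern described above.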
Although significant publicity has been given to AI and its
application in chemistry, perspective on its use and development
in chemistry is not obvious from the massive volume of
available information. This Review uses the CAS Content
Collection to contextualize the current AI landscape,
classifying and quantifying chemistry publications related to
AI from the years 2000–2020. The CAS Content Collection
covers publications in 50 000 scientific journals from around
the world in a wide range of disciplines, 62 patent authorities,
Received: June 1, 2021
Published: July 15, 2021
Review pubs.acs.org/jcim
© 2021 The Authors. Published by
American Chemical Society
https://doi.org/10.1021/acs.jcim.1c00619
J. Chem. Inf. Model. 2021, 61, 3197–3212
and 2 defensive publications (Research Disclosures and IP.com).(16)
There are more than 1000 global scientists specialized
in various scientific domains curating, analyzing, and
connecting data from published sources at CAS. The CAS
Content Collection, as one of the largest collections of
scientific databases in the world, has many unique features and
annotations added during data curation. Expert-curated CAS
content is suitable for quantitative analysis of publications
against variables such as time, country, research area, and
substance details. We first examine the growth and distribution
of AI-related publications in chemistry, which includes the
annual growth of publication volume and the distribution of
publications among countries, organizations, and research
areas, followed by a topic analysis revealing the evolution of
frequently used concepts related to AI in chemistry. We then
provide lists of notable AI-related journal and patent
publications in a variety of research areas. Finally, we look at
the types of chemical substances most frequently involved in
the AI-related literature, highlighting the distribution of
AI-related publications among various classes of substances and
their roles. We hope that this Review can serve as a useful
resource for those who would like to understand global trends
in AI-oriented research efforts in chemistry.
GROWTH AND DISTRIBUTION OF PUBLICATION
VOLUME IN AI-RELATED CHEMISTRY
Volume of Publications by Year. With the rapid growth
in global research activity, scientific publication volume has
steadily increased over the past 20 years. A quantitative
analysis helps to understand just how fast chemistry
publications using artificial intelligence are increasing relative
to the increase in total chemistry publications. To this end, the
CAS Content Collection was searched to identify AI-related
publications from 2000 to 2020 based on various AI terms in
their title, keywords, abstract text, and CAS expert-curated
concepts. The search query required screening of each term to
minimize false positives due to polysemy; a maximum of a 2%
false positive rate was allowed for each OR-delimited phrase, as
determined by random screenings of 50–100 documents
performed by CAS experts. In addition, matches on particularly
problematic phrases, such as "brain" and "nerve", were excluded
from consideration. The resulting search string is provided in
the Supporting Information. From this search, roughly 70 000
journal publications and 17 500 patents from the CAS Content
Collection were identified to be related to AI. Figures 1A and
1B show the volume of these publications and their volume
normalized by the overall number of journal publications or
patents by year, respectively. Indeed, the numbers of both
journal and patent publications increased with time, showing
similar rapidly growing trends after 2015. This growth stems in
part from the high-profile successes of deep learning projects in
public data challenges starting around 2012, such as the Merck
Molecular Activity Challenge(17) and the ImageNet
competition,(18) which increasingly drew research interest from the
scientific community. Additionally, the introduction of
open-source machine learning frameworks, such as TensorFlow
(2015) and PyTorch (2016), and the availability of
increasingly powerful computing hardware sparked a global
explosion in AI research, enabling further applications of AI to
chemistry. In fact, as of 2020, over 50% of the documents on
AI in chemistry were published during the past 4 years.
Another way to measure recent scientific research trends is by
examining scientific meeting abstracts. For this purpose, the
abstracts from ACS National Meetings were analyzed for the
Figure 1. Annual publication volume in AI-related chemistry from 2000 to 2020: (A) Journal publications, (B) patent publications, and (C) ACS
National Meeting abstracts.
presence of AI topics, and the number of AI-related abstracts
per year and its amount relative to the total number of yearly
abstracts are shown in Figure 1C. The abstract publications
show similar behavior to the trends in journal and patent
publication. These analyses suggest that not only has there
been an absolute increase of research effort toward AI in
chemistry but also that the proportion of AI-related research is
increasing.
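The two quantitative steps described above, screening each search phrase against a maximum false positive rate and normalizing AI-related counts by total yearly volume, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used by CAS: the function names, the phrase labels, and every count are hypothetical, and the manual relevance labels are assumed to be available as booleans.

```python
def false_positive_rate(labels):
    """labels: manual review results for a random sample, True = truly AI-related."""
    if not labels:
        raise ValueError("no reviewed documents")
    return labels.count(False) / len(labels)


def screen_terms(reviewed, max_fp_rate=0.02):
    """Keep only phrases whose sampled false positive rate is within the limit.

    reviewed: dict mapping a search phrase to its list of review labels
    """
    return [
        phrase
        for phrase, labels in reviewed.items()
        if false_positive_rate(labels) <= max_fp_rate
    ]


def normalized_volume(ai_counts, total_counts):
    """Fraction of each year's publications that are AI-related."""
    return {year: ai_counts[year] / total_counts[year] for year in ai_counts}


# Hypothetical review samples of 50 documents per phrase.
reviewed_samples = {
    "phrase_a": [True] * 50,                # 0% false positives: kept
    "phrase_b": [True] * 45 + [False] * 5,  # 10% false positives: dropped
}
kept_phrases = screen_terms(reviewed_samples)

# Hypothetical yearly counts, for illustration only.
shares = normalized_volume({2019: 10, 2020: 30}, {2019: 1000, 2020: 1500})
```

Note that with samples of 50–100 documents, a 2% limit corresponds to at most one or two irrelevant hits per sample, so the estimate for any single phrase is necessarily coarse.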
Distribution of AI-Related Publications by Country/
Region and Company. The countries/regions and organ-
izations of origin for AI-related chemistry documents were
then extracted to determine their distributions. Figures 2A and
2B show the percentages of AI-related journal articles and
patents produced in selected countries/regions and by selected
organizations in the years 2000–2020, respectively, with the
top commercial patent assignees listed in Figure 2C. China and
the United States contributed the largest numbers of
publications for both journal articles and patents. Medical
diagnostic developers and technology companies make up a
large portion of the commercial patent assignees for AI
chemical research. These companies rely on AI for automation,
control, and optimization of a variety of processes, such as
semiconductor device fabrication and biomarker screening,
which will be explored in more detail in the following sections.
DISTRIBUTION OF AI-RELATED CHEMISTRY
PUBLICATIONS BY RESEARCH AREA
Trends of Publications in Specific Research Areas. To
take a closer look at how AI is involved in different chemistry-related
research areas, the roughly 70 000 journal and 17 500
patent publications were further classified into the following 12
categories by CAS experts: Analytical Chemistry, Biochemistry,
Energy Technology and Environmental Chemistry, Food and
Agriculture, Industrial Chemistry and Chemical Engineering,
Inorganic Chemistry, Materials Science, Natural Products,
Organic Chemistry, Physical Chemistry, Synthetic Polymers,
and Pharmacology, Toxicology and Pharmaceuticals. The
numbers of AI-related publications in each area are normalized
Figure 2. Distribution of AI-related publications by country/region and company from 2000 to 2020. (A) Top 20 countries/regions in number of
journal publications. (B) Top 20 countries/regions in number of patent publications. (C) Top 20 companies in number of patent publications.
Figure 3. Publication trends of AI in specific research areas from 2000 to 2020: (A) journal publications and (B) patent publications.
to that area's respective total yearly publication volume and
shown in Figure 3A (journal publications) and Figure 3B
(patents). The absolute numbers of journal publications in
each area are shown in Figure S1. Among all these specific
chemistry-related areas, documents in Analytical Chemistry
(both journal and patent publications) have the highest
normalized volume in the most recent 10 years, and this volume
has risen steeply in the last 5 years. Energy Technology and
Environmental Chemistry and Industrial Chemistry and
Chemical Engineering are the two research areas ranked in
the second tier in terms of proportion of research volume and
momentum in journal publications (Figure 3A). Interestingly,
while Biochemistry is among the fields most represented in
AI-related patent publications, its proportion in journal
publications is relatively moderate when compared to other
research areas. This indicates a strong desire or incentive to
patent AI technologies in biochemistry, possibly because of their
use in drug research and development.
Relative Prevalence of Interdisciplinary Research in
Specific Areas. Innovations in science and technology are
often made by finding connections between multiple research
areas to derive novel insights, methods, and products. A
theoretical method developed for molecular dynamics, for
example, may be applied to study the interaction of a ligand
with a protein, which can in turn be used to predict the activity
of a drug. Conversely, data collected from experimental
measurements can be used to optimize parameters of
theoretical simulations. With such continuous conversations,
fields not traditionally associated with each other can be
mutually informative. Interdisciplinary effects such as these are
also present in the AI-related chemical literature, which we
explore here in detail.
From the set of AI-related journal and patent publications,
we identified approximately 15 000 and 3000 interdisciplinary
journal articles and patents, respectively. CAS analysts
determined the primary discipline for each document and
any secondary disciplines that also contributed to the work.
The resulting combinations of primary and secondary
disciplines are summarized in Figure 4, with counts
normalized to the total number of interdisciplinary documents
containing each respective primary and secondary discipline.
In Figure 4, several relationships are apparent among
chemical disciplines. In journal articles, the strongest
correlations are observed between primary and secondary
research areas in Analytical Chemistry and Biochemistry, in
Materials Science and Physical Chemistry, and in Biochemistry
with applications to Pharmacology, Toxicology and
Pharmaceuticals (Figure 4A). In patents (Figure 4B), the trend is
similar, but inventions in Energy Technology and
Environmental Chemistry related to Industrial Chemistry and
Chemical Engineering also show prominently. For example,
journal documents using analytical chemistry techniques such
as mass spectrometry and nuclear magnetic resonance,
infrared, and Raman spectroscopies are augmented with
machine learning for use in medical diagnostics,(19–28) studies
of metabolomics,(29–36) and microbial identification,(37–40) while
biochemistry-related analytical chemistry patents concentrate
on the development of analytical devices and methods for use
in similar studies.(41–53) AI-related journal documents with
interest in Materials Science and Physical Chemistry discuss
topics such as the evaluation of structure–property
relationships in materials by augmenting first-principles calculations
with machine learning models,(54–60) using data from
high-throughput experimentation to optimize the properties of
functional materials,(61–68) and the use of published data to
enable the discovery of new materials.(69–74) In patents, the
combination of AI, Materials Science, and Physical Chemistry
is used in methods for improving semiconductor device
fabrication(75–86) and polymer performance.(87–90) Additionally,
AI is being used in Biochemistry–Pharmacology, Toxicology
and Pharmaceuticals research to understand drug–biomolecule
interactions,(91–100) apply biomarker data to the prediction
of drug activities,(101–106) and model toxicity.(107–111) Finally,
patents in Energy Technology and Environmental Chemistry
related to Industrial Chemistry and Chemical Engineering
often use AI in control systems for fuel production(112–119)
and engines.(120–125) These examples demonstrate how AI can
be applied in research areas where the relationships between
Figure 4. Relative prevalence of interdisciplinary studies in AI-related scientific publications: (A) journal publications and (B) patent publications.
Columns denote primary research areas, rows denote secondary research areas, and each square denotes an interdisciplinary pair of primary and
secondary research areas.
available data in separate domains are not obvious to
researchers.
While interdisciplinary relationships do appear in AI-related
chemistry research, it is natural to question the extent to which
AI is indeed facilitating connections between fields. To answer
this, we first selected random control groups of journal (n =
81 601) and patent (n = 12 181) publications and identified
sets of interdisciplinary documents (n = 32 097 and n = 4426,
respectively) using the same 12 research areas. In both the AI
and control groups, we then calculated the proportions of
documents belonging to each primary–secondary discipline
pair. By comparing the corresponding proportions in these two
groups, the resulting difference maps (Figure 5) reveal how AI
is bringing disciplines together (positive values) and areas in
which the use of AI is lagging (negative values). Notably, these
maps show that interdisciplinary biochemical–analytical
research is greatly facilitated by AI, and that despite recent
advances, Physical Chemistry and Materials Science have used
AI techniques less than other chemistry fields in the period
2000–2020. This lag is observed despite the relatively high
proportion of interdisciplinary Physical Chemistry–Materials
Science documents seen in Figure 4. This may be attributable
to the reliance of Materials Science on Physical Chemistry
principles and techniques: the incidence of publications at this
intersection is high even in the absence of AI. However, the
use of AI in interdisciplinary research is still maturing; a similar
analysis was done with a more recent time window (2016–2020),
where AI's capability in bringing disciplines together
seems to be increasing (Figure S2).
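The difference-map comparison described above can be sketched in a few lines. This is an illustrative reading of the procedure rather than the authors' code: it assumes that each interdisciplinary document is reduced to one (primary, secondary) pair and that proportions are taken relative to the number of interdisciplinary documents in each group, a denominator the text does not state explicitly. The function names and the toy pairs are hypothetical.

```python
from collections import Counter


def pair_proportions(documents):
    """documents: iterable of (primary, secondary) discipline pairs, one per document."""
    counts = Counter(documents)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no documents supplied")
    return {pair: n / total for pair, n in counts.items()}


def difference_map(ai_docs, control_docs):
    """Proportion in the AI group minus proportion in the control group, per pair.

    Positive values mark pairs over-represented among AI-related documents;
    negative values mark pairs where AI-related work is under-represented.
    """
    ai = pair_proportions(ai_docs)
    control = pair_proportions(control_docs)
    pairs = set(ai) | set(control)
    return {pair: ai.get(pair, 0.0) - control.get(pair, 0.0) for pair in pairs}


# Toy input, invented for illustration only.
ai_group = [("Analytical Chemistry", "Biochemistry")] * 3 + [
    ("Materials Science", "Physical Chemistry")
]
control_group = [("Analytical Chemistry", "Biochemistry")] + [
    ("Materials Science", "Physical Chemistry")
] * 3

diff = difference_map(ai_group, control_group)
```

With this toy input the Analytical Chemistry–Biochemistry pair receives a positive value and the Materials Science–Physical Chemistry pair a negative one, mirroring the sign convention of Figure 5.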
EVOLUTION OF RESEARCH TOPICS IN
AI-RELATED CHEMISTRY PUBLICATIONS
Topic Analysis in Journal Article Publications. By
analyzing the connections of CAS-indexed concepts over time,
one can see when a research topic became potentially
addressable using AI techniques. Figure S3 shows the most
frequently co-occurring concepts and the number of documents
in which they co-occur (presented at a 97.5th percentile
cutoff for co-occurrence). For the years 2000–2004 (Figure
S3A), we see only a few concepts connected to the concepts
"Neural network modeling" and "Algorithms". Several
biochemistry-related concepts that appear in conjunction with AI
Figure 5. Difference in proportion of total AI-related publications and control group (non-AI related) by interdisciplinary pair: (A) journal
publications and (B) patent publications.
Figure 6. Trends of co-occurrence in scientific publications for selected research topics and AI algorithms: (A) journal publications and (B) patent
publications.
Table 1. Notable AI-Related Journal Publications in the Areas of Biochemistry and Pharmacology, Toxicology and Pharmaceuticals
| journal | title | organization | year of publication | highlight |
|---|---|---|---|---|
| Nature | Improved Protein Structure Prediction Using Potentials from Deep Learning (126) | DeepMind Technologies Ltd., UK | 2020 | neural network, AlphaFold, distance between amino acid residues, potential of mean force, protein structure prediction |
| Cell | A Deep Learning Approach to Antibiotic Discovery (127) | Massachusetts Institute of Technology, US | 2020 | deep neural network, property-structure correlation, molecule prediction, compound screening, empirical test, halicin |
| Nature Biotechnology | SignalP 5.0 Improves Signal Peptide Predictions Using Deep Neural Networks (128) | Technical University of Denmark, Denmark | 2019 | deep recurrent neural network combined with conditional random field classification and transfer learning, signal peptide prediction across all organisms |
| Nature Biotechnology | Determining Cell Type Abundance and Expression from Bulk Tissues with Digital Cytometry (129) | Stanford University, US | 2019 | CIBERSORTx, single-cell RNA-sequencing |
| Cell | Predicting Splicing from Primary Sequence with Deep Learning (130) | Illumina Inc., US | 2019 | prediction of splice junctions from pre-mRNA transcript sequence, splice-altering consequence, pathogenic mutations |
| Cell | Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity (131) | Massachusetts General Hospital and Harvard Medical School, US | 2019 | single-cell RNA sequencing and genotyping, acute myeloid leukemia |
| Nature | DNA Methylation-Based Classification of Central Nervous System Tumors (132) | Hopp Children's Cancer Center at the NCT Heidelberg (KiTZ), Germany | 2018 | machine learning, DNA methylation-based tumor classification |
| ACS Central Science | Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks (133) | Westfälische Wilhelms-Universität Münster, Germany | 2018 | recurrent neural network, drug molecule generation, drug discovery |
| Nucleic Acids Research | BepiPred-2.0: Improving Sequence-Based B-Cell Epitope Prediction Using Conformational Epitopes (134) | Technical University of Denmark, Denmark | 2017 | random forest algorithm, BepiPred-2.0 web server, sequence-based B-cell epitope prediction |
| Cell | A Landscape of Pharmacogenomic Interactions in Cancer (135) | Wellcome Trust Sanger Institute, UK | 2016 | machine learning, correlating oncogenic alterations (somatic mutations, copy number alterations, and hypermethylation) with drug sensitivity in human cancer cell lines |
| Cell | Personalized Nutrition by Prediction of Glycemic Responses (136) | Weizmann Institute of Science, Israel | 2015 | machine learning, personalized postprandial blood glucose level prediction, therapeutic food intervention |
| Nature Genetics | A General Framework for Estimating the Relative Pathogenicity of Human Genetic Variants (137) | University of Washington, Seattle, US | 2014 | combined annotation-dependent depletion, integrated annotation method of human genetic variants |
| Nature | Persistent Gut Microbiota Immaturity in Malnourished Bangladeshi Children (138) | Washington University in St. Louis, US | 2014 | machine learning, 16S rRNA, correlation of microbiota immaturity index with human malnourished state |
concepts are proteins, protein sequences, and protein
conformation.
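The concept co-occurrence analysis with a percentile cutoff, as described for Figure S3, can be sketched as follows. This is a hedged illustration, not the CAS workflow: each document is assumed to be a set of indexed concepts, the nearest-rank percentile definition is an assumption (the text does not say how the 97.5th percentile was computed), and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for every unordered concept pair, the documents containing both."""
    counts = Counter()
    for concepts in docs:
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


def percentile_cutoff(values, pct):
    """Nearest-rank percentile of a non-empty collection of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def frequent_pairs(docs, pct=97.5):
    """Keep only concept pairs whose document count reaches the percentile cutoff."""
    counts = cooccurrence_counts(docs)
    if not counts:
        return {}
    cutoff = percentile_cutoff(counts.values(), pct)
    return {pair: n for pair, n in counts.items() if n >= cutoff}


# Toy documents, invented for illustration only.
toy_docs = [
    {"Neural network modeling", "Proteins"},
    {"Neural network modeling", "Proteins", "Algorithms"},
    {"Neural network modeling", "Proteins"},
    {"Algorithms", "Protein sequences"},
]
top = frequent_pairs(toy_docs)
```

Running the same function on documents grouped into 5-year windows would give one set of frequent pairs per window, which is the kind of comparison the following paragraphs describe.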
In the years 2005–2009 (Figure S3B), Homo sapiens
became a more popular topic because of the increasing
AI-related effort in disease diagnosis and prognosis, and related
concepts, such as biomarkers, tumor markers, prognosis, and
diagnosis, started to appear during this period. Protein-associated
concepts, such as protein motifs, protein–protein interactions,
secondary structure, and amino acids, became more prevalent
in AI-related documents, likely because of the use of AI in
solving high-resolution protein structures. Genetics-associated
concepts, such as sequence annotation and gene expression
profile, were also indexed more frequently. Finally,
high-throughput screening and proteomics were frequently used
concepts during this period.
In the years 2010–2014 (Figure S3C), genome-related
concepts, such as genome and single nucleotide polymorphism,
were more often studied using AI methods. The application of
AI to pharmaceutical and biomedical fields became more
common, as the concepts drug discovery, drug design, blood
analysis, neoplasm, and microRNA were frequently used. The
use of AI techniques for environmental remediation is
evidenced by the occurrence of concepts such as absorptive
wastewater treatment and Chemical oxygen demand in this
period.
In the years 2015–2019 (Figure S3D), the use of AI
becomes more prominent in research topics such as DNA
methylation, mutation, nanofluids, heat transfer, and biodiesel
fuel to solve problems in those research areas. AI also appeared
frequently in publications related to cancer and Alzheimer's
disease. Since the beginning of 2020, when the critical need for
research into COVID-19 became apparent, AI has been used
frequently in the areas of drug discovery, disease diagnosis, and
disease tracking (Figure S4).
Quantifying the co-occurrence of use case-specific concepts
with AI-related concepts over time further reveals the
progression of AI adoption. As Figure 6A shows, studies of
QSAR (quantitative structure–activity relationships), a perennial
topic in drug discovery research, have employed Neural
network models consistently for some time. On the other
hand, in Materials Science-related topics, such as thermal
conductivity, the use of neural network modeling has grown
more slowly, with its use in publications not increasing rapidly
until the second half of the 2010s. The use of machine learning
in topics such as medical diagnosis and Density functional
theory has only recently begun to increase significantly.
Topic Analysis in Patent Publications. Frequently
co-occurring concepts were also identified in the patent literature
in 5-year time windows (Figure S5, presented at a 95th
percentile cutoff for co-occurrence). Similar patterns in the
evolution of associated concepts were observed in the patent
literature as those observed in the journal literature. Previously
unseen research topics, such as Diagnosis, Prognosis, Peptides,
and Transcription factors, were introduced in the years
2005–2009 (Figure S5B). The use of AI in the study of organic
compounds and hydrocarbons and in the development of
QSPR (quantitative structure–property relationships)
becomes more prominent in the years 2010–2014 (Figure
S5C), and connections between these topics and AI-related
concepts have increased further since 2015 (Figure S5D).
However, unlike in journal publications, few COVID-related
concepts co-occurred with AI-related concepts in the patent
literature in 2020 (Figure S6). This may be due in part to the
longer turnover time in the patent application process
compared to scientific journal publication.
It is telling to examine the progression of the concept
diagnosis with various AI concepts. While the growth of
documents associating diagnosis with the concept Neural
network modeling is unsubstantial between 2000 and 2015, the
number of documents associating diagnosis with various AI
concepts increases rapidly after 2015, with the concepts of
deep learning, random forest, and support vector machine
seeing significant usage (Figure 6B). This pattern is consistent
across a variety of topics in chemistry, in which the increase in
usage of AI after 2015 is general rather than being limited to a
single AI methodology.
NOTABLE AI-RELATED JOURNAL AND PATENT
PUBLICATIONS
To highlight the most influential journal publications using AI
in chemistry, a bibliometric analysis was performed in the
primary literature from our search query since 2014.
Publications with over 100 citations were selected and further
classified into groups of related research areas; then, they were
reviewed and selected based on apparent novelty: Biochemistry
and Pharmacology, Toxicology and Pharmaceuticals (Table 1),
Materials Science (Table 2), and Analytical Chemistry,
Synthetic Chemistry, and Physical Chemistry (Table 3). The
US is the leading country of origin: 15 of the 34 papers in
Tables 1–3 are affiliated with US organizations. Other
countries with significant numbers of important AI documents
are Germany (6) and Switzerland (5). Among organizations,
the Massachusetts Institute of Technology (US) and the
University of Basel (Switzerland) were the two biggest
contributors. Three commercial organizations, DeepMind
Technologies, Ltd. (UK), Illumina Inc. (US), and Intel
Corp. (US), contributed significantly.
Among these 34 journal papers, the most frequently indexed
concepts are Machine learning, Neural network, Deep learning,
Density functional theory, and Random forest. In Biochemistry
and Pharmacology, Toxicology and Pharmaceuticals
(Table 1), many of the articles apply AI technology to
research topics involving high-throughput drug screening,
nucleic acid sequence analysis, and protein structure
prediction. Publications in Materials Science research (Table
2) reported AI-driven structure–property relationship
predictions enabling the discovery of new functional materials as
well as memristors with applications in neuromorphic
computing. In Analytical Chemistry, Synthetic Chemistry and
Physical Chemistry (Table 3), new methods were developed
with AI to complement analytical data, automate flow
chemistry, improve retrosynthetic planning, and predict
reaction outcomes. In addition, user-friendly computational
tools were developed, and methods combining AI with
physics-based approaches such as density functional theory
were reported to improve the accuracy of calculations.
To identify notable AI-related patents, results from the
search query were first sorted by size of patent family for each
year. A patent family is a collection of patents filed in multiple
countries covering the same or similar content(160) and, thus,
represents high-priority intellectual property for organizations,
which we use here as a proxy to estimate importance.
Documents were then selected from the top 50 largest patent
families per year on apparent novelty and relevance to overall
research trends of AI in chemistry and presented in Table S1.
The US has made the largest contribution to these patent
inventions, with 13 of the 15 patents selected granted to
companies based in the United States. Interestingly, most of
the patent assignees are startup companies founded in the past
10 years. This is consistent with the rapid growth of AI-related
chemistry inventions since 2015 and indicates how the
emerging paradigm of AI provides opportunities for innovative
enterprises.
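The first, mechanical part of the patent selection described above (ranking by patent family size within each year and keeping the 50 largest families) can be sketched as below; the subsequent expert selection on novelty and relevance is a manual step and is not modeled. The record layout with `year`, `family_id`, and `family_size` keys is a hypothetical assumption, as are the example records.

```python
from collections import defaultdict


def top_families_per_year(patents, n=50):
    """Return, for each year, the n records with the largest patent families.

    patents: iterable of dicts with the (assumed) keys
             'year', 'family_id', and 'family_size'
    """
    by_year = defaultdict(list)
    for record in patents:
        by_year[record["year"]].append(record)
    return {
        year: sorted(group, key=lambda r: r["family_size"], reverse=True)[:n]
        for year, group in by_year.items()
    }


# Hypothetical records, invented for illustration only.
example_patents = [
    {"year": 2019, "family_id": "F1", "family_size": 4},
    {"year": 2019, "family_id": "F2", "family_size": 12},
    {"year": 2019, "family_id": "F3", "family_size": 7},
    {"year": 2020, "family_id": "F4", "family_size": 3},
]
shortlist = top_families_per_year(example_patents, n=2)
```

The resulting per-year shortlist would then be the pool from which documents are chosen by hand.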
The adoption of AI in the life sciences is prominent,
comprising 8 of the 15 patents covering biomarker
development, gene expression profiling, and biosequence analysis
(Table S1). These patents also reflect a strong interest in
applying machine learning to medical diagnostics, consistent
with the topic analysis in Figure S5D. The remaining 7 patents
cover research areas including Analytical Chemistry,
Environmental Chemistry, Materials Science, Industrial Chemistry &
Chemical Engineering, and Synthetic Chemistry.
DISTRIBUTION OF SUBSTANCE INFORMATION IN
AI-RELATED CHEMICAL LITERATURE
Journal Publications by Substance Class. The
distribution of AI-related research activity can also be probed by
studying the numbers of documents involving different types of
substances. Because the barriers to AI implementation in
chemistry include challenges in substance representation(6) and
data availability,(7) enumeration of the most common substance
types studied in the literature will point to areas in which
researchers have, in some instances, been able to overcome
such challenges. The substances indexed by CAS are
categorized into multiple classes. The numbers of AI-related
journal publications for some frequently occurring substance
classes, namely Alloy, Coordination Compound, Element,
Manual Registration, Ring Parent, Small Molecule, Polymer,
Salt, and Inorganic Compound, are shown in Figure 7A.
Substances in the Manual Registration class are predominantly
biomolecules, such as enzymes, hormones, vaccines, and
antibodies. Biosequences are not included in the analysis of
journal documents. Ring Parents represent scaffolds defining
the composition and connectivity of molecular ring systems.
As Figure 7A shows, publications containing Small Molecule
substances are the highest in number, followed by those
containing Element and Manual Registration substances, far
outnumbering publications containing substances in the
remaining classes. The high volume of research and invention
in AI involving these classes is likely facilitated by their relative
simplicity and ease of modeling compared to substances in
other classes, such as Coordination Compound and Polymer.
The large number of documents containing Manual Registra-
tion substances in Figure 7A is consistent with the relatively
high publication volume in Biochemistry (Figure S1). Also
shown in Figure 7A are the total numbers of substances
contained in AI-related journal publications for each substance
class. The data show trends similar to those for document
count, albeit skewed by the larger number of small molecule
substances per document.
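The difference between the two measures in Figure 7A, documents per class and substances per class, can be illustrated with a minimal sketch. The records, identifiers, and the function name `count_by_class` below are hypothetical toy data and names, not drawn from the CAS Content Collection, and the sketch assumes that distinct substances are counted once per class.

```python
from collections import defaultdict

# Hypothetical toy records: (document_id, substance_class, substance_id).
records = [
    ("doc1", "Small Molecule", "s1"),
    ("doc1", "Small Molecule", "s2"),
    ("doc1", "Small Molecule", "s3"),
    ("doc2", "Small Molecule", "s1"),
    ("doc2", "Element", "e1"),
    ("doc3", "Element", "e1"),
    ("doc3", "Polymer", "p1"),
]

def count_by_class(records):
    """Return {class: (distinct documents, distinct substances)}."""
    docs = defaultdict(set)
    subs = defaultdict(set)
    for doc_id, substance_class, substance_id in records:
        docs[substance_class].add(doc_id)
        subs[substance_class].add(substance_id)
    return {cls: (len(docs[cls]), len(subs[cls])) for cls in docs}

counts = count_by_class(records)
for cls, (n_docs, n_subs) in counts.items():
    print(f"{cls}: {n_docs} documents, {n_subs} substances")
```

In this toy set, Small Molecule and Element appear in the same number of documents, yet the Small Molecule substance count is larger because one document lists several small molecules. This is the kind of skew described above.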
Figure 7B shows the change in the number of AI-related
journal publications by substance class for the years 2000–2020.
While the document count of each substance type
increased during this period (and particularly after 2017),
those containing Small Molecule, Element and Manual
Registration substances displayed the largest increases,
consistent with the data shown in Figure 7A.
Patent Publications by Substance Class. The patent
literature was analyzed using the same methods as for the
journal literature. Figure 7C shows the numbers of AI-related
patent publications and substances associated with different
substance classes. Nucleic Acid Sequences and Peptide
Sequences are highest in number, whereas the remaining
relative document and substance counts are similar to those
found in Figure 7A. Patents containing Peptide Sequences or
Nucleic Acid Sequences often contain large numbers of
sequences per document, often far greater than other
substances per patent. The change in the number of AI-
related patent publications containing various substance classes
over time is shown in Figure 7D, again showing trends
consistent with those in Figure 7B.

Table 2. Notable AI-Related Journal Publications in Materials Science

| journal | title | ref | organization | year | highlight |
|---|---|---|---|---|---|
| Nature | Accelerated Discovery of CO₂ Electrocatalysts Using Active Machine Learning | 139 | University of Toronto, Canada | 2020 | machine learning, density functional theory, CO adsorption energy prediction, surface and adsorption site screening in Cu-containing intermetallic crystals, new electrocatalysts |
| Nature | Scalable Energy-Efficient Magnetoelectric Spin–Orbit Logic | 140 | Intel Corp., US | 2019 | scalable logic device based on quantum materials, electrons angular-linear momentum transduction with magnetoelectric switching |
| Nature Materials | Ionic Modulation and Ionic Coupling Effects in MoS₂ Devices for Neuromorphic Computing | 141 | University of Michigan, US | 2019 | logic device based on MoS₂ film, local phase transition controlled by ion migration |
| Journal of the American Chemical Society | Accelerated Discovery of Organic Polymer Photocatalysts for Hydrogen Evolution from Water through the Integration of Experiment and Theory | 142 | University of Liverpool, UK | 2019 | machine learning, correlation of sacrificial hydrogen evolution rate with four predicted molecular properties in conjugated polymers, new photocatalyst discovery |
| Physical Review Letters | Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties | 143 | Massachusetts Institute of Technology, US | 2018 | convolutional neural network, density functional theory calculation, property-crystal graph correlation, crystalline material design |
| Nature Materials | A Non-Volatile Organic Electrochemical Device as a Low-Voltage Artificial Synapse for Neuromorphic Computing | 144 | Stanford University, US | 2017 | organic memristor, polymer material |
| Nature Materials | Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach | 145 | Harvard University, US | 2016 | combining time-dependent density functional theory with machine learning to predict electroluminescent molecules with high quantum efficiency |
| Nature | Machine-Learning-Assisted Materials Discovery Using Failed Experiments | 146 | Haverford College, US | 2016 | cheminformatics utilized for hydrothermal synthesis of organic–inorganic hybrid materials via support vector machine-derived interpretable decision trees |

Table 3. Notable AI-Related Journal Publications in the Areas of Analytical Chemistry, Synthetic Chemistry, and Physical Chemistry

| area | journal | title | ref | organization | year | highlight |
|---|---|---|---|---|---|---|
| analytical chemistry | Science | Global Threat of Arsenic in Groundwater | 147 | Swiss Federal Institute of Aquatic Science and Technology, Switzerland | 2020 | random forest model classifying high-risk population areas |
| analytical chemistry | Nature Methods | Deep Learning Enables Cross-Modality Super-Resolution in Fluorescence Microscopy | 148 | University of California, Los Angeles, US | 2019 | deep learning, fluorescence microscopy |
| analytical chemistry | Bioinformatics | Trainable Weka Segmentation: A Machine Learning Tool for Microscopy Pixel Classification | 149 | Ikerbasque-Basque Foundation for Science, Spain | 2017 | unsupervised machine learning, segmentation, clustering |
| analytical chemistry | Nature Methods | The Perseus Computational Platform for Comprehensive Analysis of (Prote)omics Data | 150 | Max Planck Institute of Biochemistry, Germany | 2016 | connecting proteomics researchers with bioinformatics tools leveraging machine learning |
| synthetic chemistry | Science | A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning | 151 | Massachusetts Institute of Technology, US | 2019 | automated synthesis enabled by AI |
| synthetic chemistry | Nature | Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI | 152 | Westfälische Wilhelms-Universität Münster, Germany | 2018 | deep neural network, Monte Carlo tree search, expansion policy network, filter network, retrosynthesis |
| synthetic chemistry | Science | Predicting Reaction Performance in C–N Cross-Coupling Using Machine Learning | 153 | Princeton University, US | 2018 | random forest, high-throughput experimentation, reaction outcome prediction |
| physical chemistry | Journal of Chemical Theory and Computation | PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges | 154 | University of Basel, Switzerland | 2019 | neural network, property prediction, using models trained on small peptide fragments to predict protein properties |
| physical chemistry | Science Advances | Machine Learning of Accurate Energy-Conserving Molecular Force Fields | 155 | Technische Universität Berlin, Germany | 2017 | gradient-domain machine learning, ab initio molecular dynamics |
| physical chemistry | Science | Solving the Quantum Many-Body Problem with Artificial Neural Networks | 156 | ETH Zurich, Switzerland | 2017 | machine learning of wave functions to solve many-body problem in quantum physics |
| physical chemistry | Nature Communications | Quantum-Chemical Insights from Deep Tensor Neural Networks | 157 | Technische Universität Berlin, Germany | 2017 | deep tensor neural network, quantum chemistry, intermediate size molecule property prediction |
| physical chemistry | Journal of Chemical Theory and Computation | Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error | 158 | University of Basel, Switzerland | 2017 | machine learning, molecule property prediction more accurate than DFT |
| physical chemistry | Journal of Chemical Theory and Computation | Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach | 159 | University of Basel, Switzerland | 2015 | machine learning, organic molecule thermochemical properties prediction |

Figure 7. Publications in AI-related chemistry associated with substance class from 2000 to 2020. (A) Number of AI-related journal publications
and number of substances associated with each class. (B) Trends of AI-related journal publications associated with each substance class. (C)
Number of AI-related patent publications and number of substances associated with each class. (D) Trends of AI-related patent publications
associated with each substance class.
Figure 8. Trends of substance classes in AI-related chemistry publications from 2000 to 2020: (A) journal publications and (B) patent publications.
Analysis of Substances Contained in AI-Related
Chemical Literature. A substance-level perspective of
chemical research over time is also useful for understanding
the utilization of AI. Notably, in both journal
and patent publications (Figure 8A and 8B, respectively), the
number of substances present does not increase monotonically
over time, unlike total
research volume. Rather, a lull in substance count can be seen
in the first half of the 2010s before the count catches up with the
massive increase in publications. This may be partially due to a
small number of documents published between 2008 and 2014 that
contain large numbers of Small Molecule substances or
biosequences, on the order of 10³–10⁴, as is sometimes
seen in the literature. We have also studied the distribution
of these substances across a variety of role indicators, which are
controlled vocabulary terms that describe the use of a
substance within the context of a specific document (Table
S2, Figure S7).
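One way to probe whether a few substance-rich documents distort yearly substance totals is to recompute the totals with those documents excluded and test each series for monotonic growth. The following is a minimal sketch under stated assumptions: the per-document counts are invented toy values, and the helper names `yearly_totals` and `is_monotonic_increasing` and the `cap` threshold are hypothetical, not part of the analysis reported here.

```python
def yearly_totals(doc_counts, cap=None):
    """Sum substances per year; skip documents with more than `cap` substances."""
    totals = {}
    for year, n_substances in doc_counts:
        if cap is not None and n_substances > cap:
            continue  # treat this document as an outlier
        totals[year] = totals.get(year, 0) + n_substances
    return dict(sorted(totals.items()))

def is_monotonic_increasing(series):
    """True if the yearly values never decrease in chronological order."""
    values = list(series.values())
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical toy data: (publication year, substances in one document).
doc_counts = [
    (2008, 40), (2008, 5000),  # one document with thousands of substances
    (2010, 60),
    (2012, 90),
    (2014, 150),
]

raw = yearly_totals(doc_counts)
filtered = yearly_totals(doc_counts, cap=1000)
print(raw, is_monotonic_increasing(raw))
print(filtered, is_monotonic_increasing(filtered))
```

In the toy data, a single large document makes the raw series non-monotonic, while the capped series grows steadily. Whether the same holds for the real data would depend on the actual per-document counts.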
CONCLUSIONS AND OUTLOOK
Applications of AI in chemistry have become increasingly
popular in recent years, as evidenced by the strong growth in
publication volume. Yet, it is striking that growth has not been
uniform. For some fields of chemistry, AI is much further along
the proverbial "Hype Cycle of Emerging Technologies"¹⁶¹ than
others. In the life sciences and Analytical Chemistry, for example,
AI adoption is likely already past the so-called "peak of inflated
expectations" and "trough of disillusionment". The utility of AI
in a given domain is intrinsically linked to the quantity and
quality of its data, as well as opportunities to gain insights from
its analysis. AI can help gain insights that would not otherwise
follow from established knowledge. AI is also useful for
extracting insights from large intractable data sets, as well as
aiding in the automation of repetitive tasks. With this in mind,
it is not a surprise to see a surge in AI deployment within
analytical chemistry, where large training sets are readily
obtained, or in biochemistry, which contains a wealth of data
for macromolecules whose structure–property relationships
are not obvious to researchers. Successes in these more
traditionally data-intensive fields are now being emulated in
other areas of chemistry.
The large numbers and rapid growth of AI-related chemistry
publications involving small molecules reflect the popularity of
AI applications in drug discovery. Analyses of total substance
numbers for each class in AI-related publications revealed large
numbers of Nucleic Acid Sequences and Peptide Sequences in
patents, consistent with the prevalence of AI applications in
biochemistry. The distribution of the role indicators assigned
to substances in AI-related publications contextualizes how AI
is being used in recent biochemical and pharmaceutical
research.
Multiple factors likely explain the significantly increased use
of AI in chemistry after 2015. The greater availability of
software and hardware tools to implement AI decreased the
barriers to using it in chemical research, while research area-specific
data sets amenable to AI methods have proliferated. In
addition, many researchers have learned techniques for
generating and handling data for use in AI methods.
Between 2000 and 2020, the co-occurrence of AI- and
research area-specific concepts in publications shows how AI
has been incorporated into a variety of research areas. Many AI
methods have been adapted for chemistry research and are
being further introduced to new areas of chemical study.
In conclusion, thanks to an increasingly interdisciplinary
research landscape, many AI methods have been successfully
adapted to chemistry research. Use of AI has even become
routine in some fields. There are still areas of chemistry, such as
organic synthetic chemistry, where AI has yet to make an impact.
Perhaps it is a matter of time before improvements in AI itself,
lessons from successful applications of AI, and interdisciplinary
research combine to help lift these areas out of the "trough of
disillusionment" and onto the "plateau of productivity".
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at
https://pubs.acs.org/doi/10.1021/acs.jcim.1c00619.
Total AI-related journal publications by discipline,
difference in proportion of interdisciplinary journal
publications from 2016 to 2020, evolution of co-
occurring concepts, search string used for retrieval of
all publications, table of notable AI-related patents, and
common substance role indicators (PDF)
AUTHOR INFORMATION
Corresponding Author
Zachary J. Baum – Chemical Abstracts Service, Columbus,
Ohio 43210, United States; orcid.org/0000-0002-0585-8503;
Email: ZBaum@cas.org
Authors
Xiang Yu – Chemical Abstracts Service, Columbus, Ohio
43210, United States
Philippe Y. Ayala – Chemical Abstracts Service, Columbus,
Ohio 43210, United States
Yanan Zhao – Chemical Abstracts Service, Columbus, Ohio
43210, United States
Steven P. Watkins – Chemical Abstracts Service, Columbus,
Ohio 43210, United States
Qiongqiong Zhou – Chemical Abstracts Service, Columbus,
Ohio 43210, United States; orcid.org/0000-0001-6711-369X
Complete contact information is available at:
https://pubs.acs.org/10.1021/acs.jcim.1c00619
Notes
The authors declare no competing financial interest.
Publications using artificial intelligence were identified by
optimizing a search of relevant terms on the CAS Content
Collection using CAS STN. While the full data set is
considered proprietary by CAS, the search string used for
retrieval is included in the Supporting Information. Substance
information, primary and secondary disciplines, concepts, and
institutional information were extracted directly from the CAS
Content Collection.
ACKNOWLEDGMENTS
We sincerely appreciate Rumiana Tenchov's assistance
curating references, Joshua Blair for obtaining ACS National
Meeting abstracts, Laura Czuba for project coordination, Peter
Jap and Cristina Tomeo for insightful discussion, and Susan
Jervey and Robert Bird for proofreading. We are also grateful
to Manuel Guzman, Gilles Georges, Michael Dennis, Carmin
Gade, Dawn George, and Cynthia Casebolt for executive
sponsorship.
REFERENCES
(1) Dutta, S. Data Modeling: A Fundamental Pillar of Your Future
Ai Technology. CAS Blog. https://www.cas.org/resource/blog/data-
modeling-fundamental-pillar-your-future-ai-technology (accessed May
13, 2021).
(2) Villalba, M.; Wollenhaupt, M.; Ravitz, O. Predicting New
Chemistry: Impact of High-Quality Training Data on Prediction of
Reaction Outcomes. CAS Whitepapers. https://www.cas.org/
resources/whitepapers/predicting-new-chemistry (accessed May 13,
2021).
(3) Sharma, Y. Data Quality: The Not-So Secret Sauce for Ai and
Machine Learning. CAS Blog. https://www.cas.org/resource/blog/
data-quality-not-so-secret-sauce-ai-and-machine-learning (accessed
May 13, 2021).
(4) Griffen, E. J.; Dossetter, A. G.; Leach, A. G. Chemists: Ai Is
Here; Unite to Get the Benefits. J. Med. Chem. 2020, 63, 8695–8704.
(5) Mater, A. C.; Coote, M. L. Deep Learning in Chemistry. J. Chem.
Inf. Model. 2019, 59, 2545–2559.
(6) Wills, T. J.; Polshakov, D. A.; Robinson, M. C.; Lee, A. A. Impact
of Chemist-in-the-Loop Molecular Representations on Machine
Learning Outcomes. J. Chem. Inf. Model. 2020, 60, 4449–4456.
(7) Tkatchenko, A. Machine Learning for Chemical Discovery. Nat.
Commun. 2020, 11, 4125.
(8) Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine
Learning in Chemoinformatics and Drug Discovery. Drug Discovery
Today 2018, 23, 1538–1546.
(9) Machine Learning in Chemistry: The Impact of Artificial
Intelligence; The Royal Society of Chemistry, 2020.
(10) Mullin, R. The Lab of the Future Is Now. Chem. Eng. News
2021, 28.
(11) Elton, D. C.; Boukouvalas, Z.; Fuge, M. D.; Chung, P. W. Deep
Learning for Molecular Design - a Review of the State of the Art.
Molecular Systems Design & Engineering 2019, 4, 828–849.
(12) Bender, A.; Cortés-Ciriano, I. Artificial Intelligence in Drug
Discovery: What Is Realistic, What Are Illusions? Part 1: Ways to
Make an Impact, and Why We Are Not There Yet. Drug Discovery
Today 2021, 26, 511–524.
(13) Muratov, E. N.; Bajorath, J.; Sheridan, R. P.; Tetko, I. V.;
Filimonov, D.; Poroikov, V.; Oprea, T. I.; Baskin, I. I.; Varnek, A.;
Roitberg, A.; Isayev, O.; Curtarolo, S.; Fourches, D.; Cohen, Y.;
Aspuru-Guzik, A.; Winkler, D. A.; Agrafiotis, D.; Cherkasov, A.;
Tropsha, A. Qsar without Borders. Chem. Soc. Rev. 2020, 49, 3525–3564.
(14) Strieth-Kalthoff, F.; Sandfort, F.; Segler, M. H. S.; Glorius, F.
Machine Learning the Ropes: Principles, Applications and Directions
in Synthetic Chemistry. Chem. Soc. Rev. 2020, 49, 6154–6168.
(15) Struble, T. J.; Alvarez, J. C.; Brown, S. P.; Chytil, M.; Cisar, J.;
DesJarlais, R. L.; Engkvist, O.; Frank, S. A.; Greve, D. R.; Griffin, D. J.;
Hou, X.; Johannes, J. W.; Kreatsoulas, C.; Lahue, B.; Mathea, M.;
Mogk, G.; Nicolaou, C. A.; Palmer, A. D.; Price, D. J.; Robinson, R. I.;
Salentin, S.; Xing, L.; Jaakkola, T.; Green, W. H.; Barzilay, R.; Coley,
C. W.; Jensen, K. F. Current and Future Roles of Artificial Intelligence
in Medicinal Chemistry Synthesis. J. Med. Chem. 2020, 63, 8667–8682.
(16) CAS Content. https://www.cas.org/about/cas-content (ac-
cessed June 15, 2021).
(17) Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. Multi-Task Neural
Networks for QSAR Predictions. arXiv, 2014, 1406.1231. https://
arxiv.org/abs/1406.1231.
(18) Krizhevsky, A.; Sutskever, I.; Hinton, G. E. Imagenet
Classification with Deep Convolutional Neural Networks. Commun.
ACM 2017, 60, 84–90.
(19) Cauchi, M.; Weber, C. M.; Bolt, B. J.; Spratt, P. B.; Bessant, C.;
Turner, D. C.; Willis, C. M.; Britton, L. E.; Turner, C.; Morgan, G.
Evaluation of Gas Chromatography Mass Spectrometry and Pattern
Recognition for the Identification of Bladder Cancer from Urine
Headspace. Anal. Methods 2016, 8, 4037–4046.
(20) Kim, J. Y.; Oh, D.; Sung, K.; Choi, H.; Paeng, J. C.; Cheon, G.
J.; Kang, K. W.; Lee, D. Y.; Lee, D. S. Visual Interpretation of
[(18)F]Florbetaben Pet Supported by Deep Learning-Based
Estimation of Amyloid Burden. Eur. J. Nucl. Med. Mol. Imaging
2021, 48, 1116–1123.
(21) Jaiswal, A.; Gianchandani, N.; Singh, D.; Kumar, V.; Kaur, M.
Classification of the Covid-19 Infected Patients Using Densenet201
Based Deep Transfer Learning. J. Biomol. Struct. Dyn. 2020, 1–8.
(22) Song, C. L.; Vardaki, M. Z.; Goldin, R. D.; Kazarian, S. G.
Fourier Transform Infrared Spectroscopic Imaging of Colon Tissues:
Evaluating the Significance of Amide I and C-H Stretching Bands in
Diagnostic Applications with Machine Learning. Anal. Bioanal. Chem.
2019, 411, 6969–6981.
(23) Ren, X.; Ghassemi, P.; Kanaan, Y. M.; Naab, T.; Copeland, R.
L.; Dewitty, R. L.; Kim, I.; Strobl, J. S.; Agah, M. Kernel-Based
Microfluidic Constriction Assay for Tumor Sample Identification.
ACS Sens 2018, 3, 1510–1521.
(24) Shen, H.; Zhang, W.; Chen, P.; Zhang, J.; Fang, A.; Wang, B. In
A Feature Selection Scheme for Accurate Identification of Alzheimer's
Disease. Bioinformatics and Biomedical Engineering; Ortuño, F., Rojas,
I., Eds.; Springer International Publishing, 2016; pp 71–81.
(25) Ramos, Á. G.; Antón, A. P.; Sánchez, M. D. N.; Pavón, J. L. P.;
Cordero, B. M. Urinary Volatile Fingerprint Based on Mass
Spectrometry for the Discrimination of Patients with Lung Cancer
and Controls. Talanta 2017, 174, 158–164.
(26) Zhai, M. Y.; Zhao, Y.; Gao, H.; Shang, L. W.; Yin, J. H.
Quantitative Study on Articular Cartilage by Fourier Transform
Infrared Spectroscopic Imaging and Support Vector Machine. Chinese
Journal of Analytical Chemistry 2018, 46, 896–901.
(27) Zheng, Q.; Li, J.; Yang, L.; Zheng, B.; Wang, J.; Lv, N.; Luo, J.;
Martin, F. L.; Liu, D.; He, J. Raman Spectroscopy as a Potential
Diagnostic Tool to Analyse Biochemical Alterations in Lung Cancer.
Analyst 2020, 145, 385–392.
(28) Li, X.; Yang, T.; Li, S.; Yao, J.; Song, Y.; Wang, D.; Ding, J.
Study on Spectral Parameters and the Support Vector Machine in
Surface Enhanced Raman Spectroscopy of Serum for the Detection of
Colon Cancer. Laser Phys. Lett. 2015, 12, 115603.
(29) Zhang, L.; Ma, F.; Qi, A.; Liu, L.; Zhang, J.; Xu, S.; Zhong, Q.;
Chen, Y.; Zhang, C.-y.; Cai, C. Integration of Ultra-High-Pressure
Liquid Chromatography-Tandem Mass Spectrometry with Machine
Learning for Identifying Fatty Acid Metabolite Biomarkers of
Ischemic Stroke. Chem. Commun. 2020, 56, 6656–6659.
(30) Alakwaa, F. M.; Chaudhary, K.; Garmire, L. X. Deep Learning
Accurately Predicts Estrogen Receptor Status in Breast Cancer
Metabolomics Data. J. Proteome Res. 2018, 17, 337–347.
(31) Grissa, D.; Pétéra, M.; Brandolini, M.; Napoli, A.; Comte, B.;
Pujos-Guillot, E. Feature Selection Methods for Early Predictive
Biomarker Discovery Using Untargeted Metabolomic Data. Front
Mol. Biosci 2016, 3, 30.
(32) O'Shea, K.; Cameron, S. J.; Lewis, K. E.; Lu, C.; Mur, L. A.
Metabolomic-Based Biomarker Discovery for Non-Invasive Lung
Cancer Screening: A Case Study. Biochim. Biophys. Acta, Gen. Subj.
2016, 1860, 2682–7.
(33) Paul, A.; Srivastava, S.; Roy, R.; Anand, A.; Gaurav, K.; Husain,
N.; Jain, S.; Sonkar, A. A. Malignancy Prediction among Tissues from
Oral Scc Patients Including Neck Invasions: A (1)H Hrmas Nmr
Based Metabolomic Study. Metabolomics 2020, 16, 38.
(34) Pan, H.; Yao, C.; Yao, S.; Yang, W.; Wu, W.; Guo, D. A
Metabolomics Strategy for Authentication of Plant Medicines with
Multiple Botanical Origins, a Case Study of Uncariae Rammulus Cum
Uncis. J. Sep. Sci. 2020, 43, 1043–1050.
(35) Gaul, D. A.; Mezencev, R.; Long, T. Q.; Jones, C. M.; Benigno,
B. B.; Gray, A.; Fernández, F. M.; McDonald, J. F. Highly-Accurate
Metabolomic Detection of Early-Stage Ovarian Cancer. Sci. Rep.
2015, 5, 16351.
(36) Hall, L. M.; Hill, D. W.; Bugden, K.; Cawley, S.; Hall, L. H.;
Chen, M.-H.; Grant, D. F. Development of a Reverse Phase Hplc
Retention Index Model for Nontargeted Metabolomics Using
Synthetic Compounds. J. Chem. Inf. Model. 2018, 58, 591–604.
(37) Moawad, A. A.; Silge, A.; Bocklitz, T.; Fischer, K.; Rösch, P.;
Roesler, U.; Elschner, M. C.; Popp, J.; Neubauer, H. A Machine
Learning-Based Raman Spectroscopic Assay for the Identification of
Burkholderia Mallei and Related Species. Molecules 2019, 24, 4516.
(38) Kusić, D.; Rösch, P.; Popp, J. Fast Label-Free Detection of
Legionella Spp. In Biofilms by Applying Immunomagnetic Beads and
Raman Spectroscopy. Syst. Appl. Microbiol. 2016, 39, 132–40.
(39) Yu, F. L.; Zhao, N.; Wu, Z. S.; Huang, M.; Wang, D.; Zhang, Y.
B.; Hu, X.; Chen, X. L.; Huang, L. Q.; Pang, Y. X. Nir Rapid
Assessments of Blumea Balsamifera (Ai-Na-Xiang) in China. Molecules
2017, 22, 1730.
(40) Lund, J. A.; Brown, P. N.; Shipley, P. R. Differentiation of
Crataegus Spp. Guided by Nuclear Magnetic Resonance Spectrometry
with Chemometric Analyses. Phytochemistry 2017, 141, 11–19.
(41) Lu, J.; Chen, C.; Wang, H.; Wen, Y. Establishing a Machine
Learning Model for Cancer Anticipation and a Method of Detecting
Cancer by Using Multiple Tumor Markers in the Machine Learning
Model for Cancer Anticipation. US2018173847A1.
(42) Bazemore, K. Machine Learning Algorithms and Applied
Diagnostic Methods for Predicting Diseases. US2019353639A1.
(43) Shi, T.; Ding, W.; Chen, G. Methylation Markers, and
Application Thereof in Cancer Diagnosis and Classification.
CN109680060A.
(44) Barnes, M.; Bifulco, C.; Chen, T.; Tubbs, A. Imaging
Processing System Using Convolutional Neural Network for
Processing Imaging of Biological Staining of Animal Tissue and
Cells. WO 2015181371A1; AU2015265811-A1; CA2944831-A1;
EP3149700-A1; US2017154420-A1; JP2017529513-W; EP3149700-
B1; US10275880-B2; CA2944831-C; AU2015265811-B2;
JP6763781-B2.
(45) Horimoto, K.; Fukui, K. Toxicity Learning Apparatus, Toxicity
Learning Method, Learned Model, Toxicity Prediction Apparatus and
Program. JP2020025471A.
(46) Sogawa, I.; Kimura, A.; Koh, Y.; Kimura, A.; Ko, Y.; Sogawa, I.;
Koh, Y. Method and Device for Detecting Tumor Cells by Analyzing
Spectroscopic Data by Statistical Method. WO2017195772A1;
JP2017203637-A; IN201847044553-A; CN109073547-A;
US2019072484-A1; EP3457116-A1; EP3457116-A4.
(47) Park, J. Y.; Oh, J. Y.; Kim, J. J.; Lee, B. S.; Yang, H. S.; Sik, Y. H.
Clinical Diagnostic Data Processing System for Predicting Mortality
Risk Levels. WO2020130238A1; KR2020075477-A.
(48) Park, H. S.; Hyoung, R.; Chung, S. Method and Apparatus for
Identifying Strains Based on Mass Spectrometry and Mass Spectra
Peak Database. KR 2020050434 A.
(49) Shimizu, Y.; Takashima, N. Event Estimation Method, Event
Estimation Program and Server Device, Biological Information
Measurement Device, Event Estimation System. JP 2020031701 A.
(50) Sik, H. W.; Park, J. S. Computerized Diagnostic Information
System Method Using Biomarker Genes and Proteins for Risk
Assessment of Urogenital System Cancers and Drug Screening. KR
2020074555 A; KR2164052-B1.
(51) Segal, E.; Bar, N.; Korem, T. Predicting Blood Metabolites. WO
2020157762 A1.
(52) Apte, Z.; Richman, J.; Almonacid, D.; Pedroso, I.; Dumas, V.;
Marquez, V.; Araya, I.; Castro, R.; Saavedra, M.; Alegria, M. Method
and System for Characterization of Metabolism-Associated Con-
ditions, Including Diagnostics and Therapies, Based on Bioinformatics
Approach. WO 2019178610 A1; SG11202002522-A1;
AU2019233926-A1; CN111373481-A; KR2020132954-A;
EP3766073-A1; US2021074384-A1.
(53) Dutta, A.; Kashefhaghighi, D.; Kia, A.; Jaganathan, K.; Gobbel,
J. R. Artificial Intelligence-Based Sequencing. WO 2020191391 A2;
WO2020191391-A3; AU2020240141-A1; CA3104951-A1.
(54) Lim, J. S.; Vandermause, J.; van Spronsen, M. A.; Musaelian, A.;
Xie, Y.; Sun, L.; O'Connor, C. R.; Egle, T.; Molinari, N.; Florian, J.;
Duanmu, K.; Madix, R. J.; Sautet, P.; Friend, C. M.; Kozinsky, B.
Evolution of Metastable Structures at Bimetallic Surfaces from
Microscopy and Machine-Learning Molecular Dynamics. J. Am.
Chem. Soc. 2020, 142, 15907–15916.
(55) Takahashi, K.; Takahashi, L.; Miyazato, I.; Tanaka, Y. Searching
for Hidden Perovskite Materials for Photovoltaic Systems by
Combining Data Science and First Principle Calculations. ACS
Photonics 2018, 5, 771–775.
(56) Liu, C.; Li, Y.; Takao, M.; Toyao, T.; Maeno, Z.; Kamachi, T.;
Hinuma, Y.; Takigawa, I.; Shimizu, K.-i. Frontier Molecular Orbital
Based Analysis of Solid-Adsorbate Interactions over Group 13 Metal
Oxide Surfaces. J. Phys. Chem. C 2020, 124, 15355–15365.
(57) Li, X.; Xie, Y.; Hu, D.; Lan, Z. Analysis of the Geometrical
Evolution in on-the-Fly Surface-Hopping Nonadiabatic Dynamics
with Machine Learning Dimensionality Reduction Approaches:
Classical Multidimensional Scaling and Isometric Feature Mapping.
J. Chem. Theory Comput. 2017, 13, 4611–4623.
(58) Deng, C.; Su, Y.; Li, F.; Shen, W.; Chen, Z.; Tang, Q.
Understanding Activity Origin for the Oxygen Reduction Reaction on
Bi-Atom Catalysts by Dft Studies and Machine-Learning. J. Mater.
Chem. A 2020, 8, 24563–24571.
(59) Quaranta, V.; Behler, J.; Hellström, M. Structure and Dynamics
of the Liquid-Water/Zinc-Oxide Interface from Machine Learning
Potential Simulations. J. Phys. Chem. C 2019, 123, 1293–1304.
(60) Lansford, J. L.; Vlachos, D. G. Spectroscopic Probe Molecule
Selection Using Quantum Theory, First-Principles Calculations, and
Machine Learning. ACS Nano 2020, 14, 17295–17307.
(61) Datar, A.; Chung, Y. G.; Lin, L.-C. Beyond the Bet Analysis:
The Surface Area Prediction of Nanoporous Materials Using a
Machine Learning Method. J. Phys. Chem. Lett. 2020, 11, 5412–5417.
(62) Qi, X.; Ma, W.; Dang, Y.; Su, W.; Liu, L. Optimization of the
Melt/Crystal Interface Shape and Oxygen Concentration During the
Czochralski Silicon Crystal Growth Process Using an Artificial Neural
Network and a Genetic Algorithm. J. Cryst. Growth 2020, 548,
125828.
(63) Karim, M. R.; Ferrandon, M.; Medina, S.; Sture, E.; Kariuki, N.;
Myers, D. J.; Holby, E. F.; Zelenay, P.; Ahmed, T. Coupling High-
Throughput Experiments and Regression Algorithms to Optimize
Pgm-Free Orr Electrocatalyst Synthesis. ACS Applied Energy Materials
2020, 3, 9083–9088.
(64) Khmaissia, F.; Frigui, H.; Andriotis, A. N.; Menon, M. Data
Driven Modeling of Magnetism in Dilute Magnetic Semiconductors:
Correlation between the Magnetic Features of Diluted Magnetic
Semiconductors and Electronic Properties of the Constituent Atoms.
J. Phys.: Condens. Matter 2019, 31, 445901.
(65) Rong, K.; Wei, J.; Huang, L.; Fang, Y.; Dong, S. Synthesis of
Low Dimensional Hierarchical Transition Metal Oxides Via a Direct
Deep Eutectic Solvent Calcining Method for Enhanced Oxygen
Evolution Catalysis. Nanoscale 2020, 12, 20719–20725.
(66) Wu, T.; Wang, J. Deep Mining Stable and Nontoxic Hybrid
Organic-Inorganic Perovskites for Photovoltaics Via Progressive
Machine Learning. ACS Appl. Mater. Interfaces 2020, 12, 57821–57831.
(67) Suresh, T.; Sivarajasekar, N.; Balasubramani, K. Enhanced
Ultrasonic Assisted Biodiesel Production from Meat Industry Waste
(Pig Tallow) Using Green Copper Oxide Nanocatalyst: Comparison
of Response Surface and Neural Network Modelling. Renewable
Energy 2021, 164, 897–907.
(68) Wu, L.; Guo, T.; Li, T. Rational Design of Transition Metal
Single-Atom Electrocatalysts: A Simulation-Based, Machine Learning-
Accelerated Study. J. Mater. Chem. A 2020, 8, 19290–19299.
(69) Ulissi, Z. W.; Singh, A. R.; Tsai, C.; Nørskov, J. K. Automated
Discovery and Construction of Surface Phase Diagrams Using
Machine Learning. J. Phys. Chem. Lett. 2016, 7, 3931–3935.
(70) Graziosi, P.; Kumarasinghe, C.; Neophytou, N. Material
Descriptors for the Discovery of Efficient Thermoelectrics. ACS
Applied Energy Materials 2020, 3, 5913–5926.
(71) Wang, Z.; Zhang, H.; Li, J. Accelerated Discovery of Stable
Spinels in Energy Systems Via Machine Learning. Nano Energy 2021,
81, 105665.
(72) Masood, H.; Toe, C. Y.; Teoh, W. Y.; Sethu, V.; Amal, R.
Machine Learning for Accelerated Discovery of Solar Photocatalysts.
ACS Catal. 2019, 9, 11774–11787.
(73) Davies, D. W.; Butler, K. T.; Walsh, A. Data-Driven Discovery
of Photoactive Quaternary Oxides Using First-Principles Machine
Learning. Chem. Mater. 2019, 31, 7221–7230.
(74) Li, Z.; Achenie, L. E. K.; Xin, H. An Adaptive Machine
Learning Strategy for Accelerating Discovery of Perovskite Electrocatalysts.
ACS Catal. 2020, 10, 4377–4384.
(75) Gheith, M. E. M.; Stobert, I.; Hamouda, A. Curvilinear Mask
Models in Semiconductor Structure Manufacture by Machine
Learning. US 10831977 B1; US2020380089-A1.
(76) Chang, B. Y.; Jang, B. Y.; Zhang, F. Substrate Treating
Apparatus and Substrate Treating Method. US 2020192308 A1;
KR2020072060-A; CN111312613-A.
(77) Van Den Brink, M.; Cao, Y.; Zou, Y.; Van Den Brink, M. A.
Machine Learning Based Inverse Optical Proximity Correction and
Process Model Calibration. WO 2019238372 A1; TW202001447-A;
KR2021010897-A; CN112384860-A.
(78) Yati, A. Defect Discovery Using Electron Beam Inspection and
Deep Learning with Real-Time Intelligence to Reduce Nuisance. US
2019213733 A1; WO2019136190-A1; TW201939634-A;
KR2020096993-A; CN111542915-A; US10970834-B2.
(79) Chang, R. Photolithography Mask Design-Rule Check
Assistance Using Computer for Semiconductor Wafer Manufacture.
US 10713411 B1.
(80) Wang, D. Y.; Salcin, E.; Friedmann, M.; Shaughnessy, D.;
Shchegrov, A. V.; Madsen, J. M.; Kuznetsov, A. Methods and Systems
for Co-Located Metrology of Semiconductor Structures. US
2020243400 A1; WO2020154152-A1; US10804167-B2.
(81) Honda, T.; Kekatpure, R. D.; David, J. D. Temporal
Dependencies of Process Targets for Different Machine Learning
Models for Robust Machine Learning Predictions for Semiconductor
Manufacturing Processes. US 2018356807 A1.
(82) David, J. D. Process Control Techniques for Semiconductor
Manufacturing Processes. US 2016148850 A1; WO2016086138-A1;
KR2017086585-A; CN107004060-A; JP2017536584-W;
US2018358271-A1; US10734293-B2; JP6751871-B2.
(83) Lauber, J.; Vajaria, H.; Zhang, Y.; Yong, Z. Multi-Step Image
Alignment Method for Large Offset Die-Die Inspection for Defects in
Semiconductor Device Manufacture. US 2019122913 A1;
WO2019079658-A1; TW201928541-A; US10522376-B2;
CN111164646-A; KR2020060519-A; EP3698322-A1;
JP2021500740-W.
(84) Bhosale, P.; Rizzolo, M.; Yang, C. Automated Method for
Integrated Analysis of Back End of the Line Yield, Line Resistance/
Capacitance and Semiconductor Device Fabrication Process Perform-
ance. US2018349535A1; US10303829-B2.
(85) Sriraman, H. P.; Pathangi, H. S. Defect Detection on
Semiconductor Wafers, Classification, and Process Window Control
by Sem. US2019287238A1; WO2019177800-A1; TW201941162-A;
US10679333-B2; KR2020122401-A; CN111837225-A.
(86) Kwon, N.; Kang, H.; Kim, Y.; Quan, N. Semiconductor Defect
Classification Device, Method for Classifying Defect of
Semiconductor, and Semiconductor Defect Classification System.
US2019188840 A1; KR2019073756-A; CN110060228-A;
US10713778-B2.
(87) Lai, W.; Song, W.; Li, X.; Yan, Y.; Huang, W. Intrinsic
Stretchable Electroluminescent Block Copolymer Elastomer and
Preparation Method Thereof. CN111635504A.
(88) Blaier, O.; Schiller, E. Optimization for 3d Printing.
US2019366644A1; EP3584723-A2; EP3584723-A3; US10926475-
B2.
(89) Deetz, J. D.; Wood, C. E.; Truong, R. A. Resin Viscosity
Detection in Additive Manufacturing. US2020338830A1.
(90) Chen, A.; Zheng, X. Repairable Multi-Response Deformable
Liquid Crystal Elastomer Film and Its Preparation Method and
Application in Artificial Intelligence. CN108727544A;
CN108727544-B.
(91) Chhabra, S.; Xie, J.; Frank, A. T. Rnaposers: Machine Learning
Classifiers for Ribonucleic Acid-Ligand Poses. J. Phys. Chem. B 2020,
124, 4436−4445.
(92) Shamsara, J.; Schüürmann, G. A Machine Learning Approach
to Discriminate Mr1 Binders: The Importance of the Phenol and
Carbonyl Fragments. J. Mol. Struct. 2020, 1217, 128459.
(93) Ji, B.-Y.; You, Z.-H.; Jiang, H.-J.; Guo, Z.-H.; Zheng, K.
Prediction of Drug-Target Interactions from Multi-Molecular Net-
work Based on Line network Representation Method. J. Transl. Med.
2020, 18, 347.
(94) Yuan, Y.; Chang, S.; Zhang, Z.; Li, Z.; Li, S.; Xie, P.; Yau, W.-P.;
Lin, H.; Cai, W.; Zhang, Y.; Xiang, X. A Novel Strategy for Prediction
of Human Plasma Protein Binding Using Machine Learning
Techniques. Chemom. Intell. Lab. Syst. 2020, 199, 103962.
(95) Aniceto, N.; Freitas, A. A.; Bender, A.; Ghafourian, T.
Simultaneous Prediction of Four Atp-Binding Cassette Transporters
Substrates Using Multi-Label Qsar. Mol. Inf. 2016, 35, 514−528.
(96) Zhao, Y.; Zheng, K.; Guan, B.; Guo, M.; Song, L.; Gao, J.; Qu,
H.; Wang, Y.; Shi, D.; Zhang, Y. Dldti: A Learning-Based Framework
for Drug-Target Interaction Identification Using Neural Networks
and Network Representation. J. Transl. Med. 2020, 18, 434.
(97) Hochuli, J.; Helbling, A.; Skaist, T.; Ragoza, M.; Koes, D. R.
Visualizing Convolutional Neural Network Protein-Ligand Scoring. J.
Mol. Graphics Modell. 2018, 84, 96−108.
(98) Erdas, O.; Andac, C. A.; Gurkan-Alp, A. S.; Alpaslan, F. N.;
Buyukbingol, E. Compressed Images for Affinity Prediction-2 (Cifap-
2): An Improved Machine Learning Methodology on Protein-Ligand
Interactions Based on a Study on Caspase 3 Inhibitors. J. Enzyme
Inhib. Med. Chem. 2015, 30, 809−15.
(99) Fan, J.; Liu, K.; Xiangyan, S. Computational Method for
Classifying and Predicting Ligand Docking Conformations.
WO2018213767A1; US2018341754-A1; EP3427170-A1;
EP3427170-A4.
(100) Feinberg, E. N.; Pande, V. S. Machine Learning and Molecular
Simulation Based Methods for Enhancing Binding and Activity
Prediction. US2019272887A1; WO2019173407-A1; CA3093260-A1;
AU2019231261-A1; KR2020128710-A; EP3762730-A1;
CN112204402-A.
(101) Mamoshina, P.; Volosnikova, M.; Ozerov, I. V.; Putin, E.;
Skibina, E.; Cortese, F.; Zhavoronkov, A. Machine Learning on
Human Muscle Transcriptomic Data for Biomarker Discovery and
Tissue-Specific Drug Target Identification. Front. Genet. 2018,
DOI: 10.3389/fgene.2018.00242.
(102) Han, M.; Liu, Q.; Yu, J.; Zheng, S. Identification of Candidate
Molecular Markers Predicting Chemotherapy Resistance in
Non-Small Cell Lung Cancer. Clin. Chem. Lab. Med. 2010, 48, 863−867.
(103) Thishya, K.; Vattam, K. K.; Naushad, S. M.; Raju, S. B.;
Kutala, V. K. Artificial Neural Network Model for Predicting the
Bioavailability of Tacrolimus in Patients with Renal Transplantation.
PLoS One 2018, 13, No. e0191921.
(104) Liu, Q.; Muglia, L. J.; Huang, L. F. Network as a Biomarker: A
Novel Network-Based Sparse Bayesian Machine for Pathway-Driven
Drug Response Prediction. Genes 2019, 10, 602.
(105) Miyoshi, F.; Honne, K.; Minota, S.; Okada, M.; Ogawa, N.;
Mimura, T. A Novel Method Predicting Clinical Response Using
Only Background Clinical Data in Ra Patients before Treatment with
Infliximab. Mod. Rheumatol. 2016, 26, 813−816.
(106) Khojasteh, M.; Martin, J.; Pestic-Dragovich, L.; Tang, L.;
Wang, X.; Zhang, W.; Anders, R.; Diaz, L. Methods and Systems for
Predicting Response to Pd-1 Axis Directed Therapeutics of Tumors.
WO 2020072348 A1.
(107) Yang, H.-Y. Prediction of Pneumoconiosis by Serum and
Urinary Biomarkers in Workers Exposed to Asbestos-Contaminated
Minerals. PLoS One 2019, 14, No. e0214808.
(108) Schrey, A. K.; Nickel-Seeber, J.; Drwal, M. N.; Zwicker, P.;
Schultze, N.; Haertel, B.; Preissner, R. Computational Prediction of
Immune Cell Cytotoxicity. Food Chem. Toxicol. 2017, 107, 150−166.
(109) Lee, J. J.; Miller, J. A.; Basu, S.; Kee, T. V.; Loo, L. H. Building
Predictive in Vitro Pulmonary Toxicity Assays Using High-
Throughput Imaging and Artificial Intelligence. Arch. Toxicol. 2018,
92, 2055−2075.
(110) Hamadache, M.; Hanini, S.; Benkortbi, O.; Amrane, A.;
Khaouane, L.; Moussa, C. S. Artificial Neural Network-Based
Equation to Predict the Toxicity of Herbicides on Rats. Chemom.
Intell. Lab. Syst. 2016, 154, 7−15.
(111) Noskov, S.; Wacker, S.; Du, H.; Guo, J. Systems and
Methods for Predicting Cardiotoxicity of Molecular Parameters of a
Compound Based on Machine Learning Algorithms.
WO2016201575A1; US2018172667-A1.
(112) Lee, F. K.; Friesth, K. L. Quintuple-Effect Generation
Multi-Cycle Hybrid Renewable Energy System with Integrated Energy
Provisioning, Storage Facilities and Amalgamated Control System
Cross-Reference to Related Applications. US 2015143806 A1;
AU2015203118-A1; EP2955372-A2; JP2016000995-A; CA2891435-
A1; CN105257425-A; EP2955372-A3; HK1218148-A0;
US10060296-B2; BR102015013592-A2.
(113) Rangarajan, K.; Winston, J. B.; Jain, A.; Wang, X.; Jian, A.;
Rangarajan, K. P. Integrated Surveillance and Control of Oilfield
Activity. FR3070178A1; WO2019040125-A1; AU2018319552-A1;
CA3065094-A1; NO201901443-A; US2020182036-A1; GB2579739-
A.
(114) Tang, J.; Hou, J. Intelligent Ammonia Injection Control
Method and Intelligent Ammonia Injection Controller.
CN111804146A.
(115) Shen, D.; Liu, G.; Wang, Q.; Luo, K. Accurate Ammonia
Injection Control Method for Scr System with Strong Self-Adaptive
Ability. CN109046021A; CN109046021-B.
(116) Liu, X.; Lin, P.; Hu, G.; Wu, K. A Kind of Scr Downstream
Nox Closed-Loop Process Control Method and System.
CN109339916A; WO2020062865-A1; CN109339916-B.
(117) Guo, L.; Zou, S.; Liu, W.; Wu, Q.; Zhou, W.; Zhao, X.; Yao,
T.; Xu, Q. Oil and Gas Gathering and Transportation Riser System
Harmful Flow Type Warning Method and System, Control Method
and System. CN109458561A; WO2020082749-A1.
(118) Li, Y.; Dong, S.; Wu, Q.; Zhang, S. Flue Gas Desulfurization
Device with Flue Gas Monitoring and Control Function.
CN110756039 A; CN211753913-U.
(119) Meng, L.; Gu, X.; Ma, W.; Ning, X.; Jiang, C.; Li, Y.; Jia, Y. Scr
Denitration Ammonia-Spraying Optimization Method and System
Based on Advanced Measuring Meter and Advanced Control
Algorithm. CN108837698A.
(120) Kim, K. S. Control System of Dual-Fuel Engine.
US2019093572A1; US10260432-B1.
(121) Brummel, H.; Pfeifer, U.; Sterzing, V. Method and Assembly
for Controlling a Combustion Engine with Multiple Burners.
EP3726139A1; WO2020212067-A1.
(122) Eun, L. Device and Method for Efficiently Controlling Fuel
Additive Injector of Engine. KR2180985B1.
(123) Chen, S. K.; Mandal, A.; Chien, L.; Ortiz-Soto, E. Machine
Learning for Misfire Detection in a Dynamic Firing Level Modulation
Controlled Engine of a Vehicle. US2019145859A1; WO2019099228-
A1; US10816438-B2.
(124) Denys, F.; Leroy, T.; Ngo, C.; Rudloff, J. Device and Method
for Control of a Vehicle Thermal Engine. FR3085442A1.
(125) Streib, M.; Luo, L.; Klinkhammer, T.; Leuz, M.; Kluth, C.;
Polach, S.; Pollach, S. Method for Controlling Engine Knock of an
Internal Combustion Engine. DE102015208359A1; WO2016177531-
A1; DE102015208359-B4; CN107624144-A; US2018112631-A1;
US10451009-B2; CN107624144-B.
(126) Senior, A. W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.;
Green, T.; Qin, C. L.; Zidek, A.; Nelson, A. W. R.; Bridgland, A.;
Penedones, H.; Petersen, S.; Simonyan, K.; Crossan, S.; Kohli, P.;
Jones, D. T.; Silver, D.; Kavukcuoglu, K.; Hassabis, D. Improved
Protein Structure Prediction Using Potentials from Deep Learning.
Nature 2020, 577, 706−710.
(127) Stokes, J. M.; Yang, K.; Swanson, K.; Jin, W. G.; Cubillos-Ruiz,
A.; Donghia, N. M.; MacNair, C. R.; French, S.; Carfrae, L. A.;
Bloom-Ackermann, Z.; Tran, V. M.; Chiappino-Pepe, A.; Badran, A.
H.; Andrews, I. W.; Chory, E. J.; Church, G. M.; Brown, E. D.;
Jaakkola, T. S.; Barzilay, R.; Collins, J. J. A Deep Learning Approach
to Antibiotic Discovery. Cell 2020, 180, 688−702.
(128) Almagro Armenteros, J. J.; Tsirigos, K. D.; Sonderby, C. K.;
Petersen, T. N.; Winther, O.; Brunak, S.; von Heijne, G.; Nielsen, H.
Signalp 5.0 Improves Signal Peptide Predictions Using Deep Neural
Networks. Nat. Biotechnol. 2019, 37, 420−423.
(129) Newman, A. M.; Steen, C. B.; Liu, C. L.; Gentles, A. J.;
Chaudhuri, A. A.; Scherer, F.; Khodadoust, M. S.; Esfahani, M. S.;
Luca, B. A.; Steiner, D.; Diehn, M.; Alizadeh, A. A. Determining Cell
Type Abundance and Expression from Bulk Tissues with Digital
Cytometry. Nat. Biotechnol. 2019, 37, 773−782.
(130) Jaganathan, K.; Kyriazopoulou Panagiotopoulou, S.; McRae, J.
F.; Darbandi, S. F.; Knowles, D.; Li, Y. I.; Kosmicki, J. A.; Arbelaez, J.;
Cui, W. W.; Schwartz, G. B.; Chow, E. D.; Kanterakis, E.; Gao, H.;
Kia, A.; Batzoglou, S.; Sanders, S. J.; Farh, K. K. H. Predicting Splicing
from Primary Sequence with Deep Learning. Cell 2019, 176, 535−548.
(131) van Galen, P.; Hovestadt, V.; Wadsworth, M. H.; Hughes, T.
K.; Griffin, G. K.; Battaglia, S.; Verga, J. A.; Stephansky, J.; Pastika, T.
J.; Lombardi Story, J.; Pinkus, G. S.; Pozdnyakova, O.; Galinsky, I.;
Stone, R. M.; Graubert, T. A.; Shalek, A. K.; Aster, J. C.; Lane, A. A.;
Bernstein, B. E. Single-Cell Rna-Seq Reveals Aml Hierarchies
Relevant to Disease Progression and Immunity. Cell 2019, 176,
1265−1281.
(132) Capper, D.; Jones, D. T. W.; Sill, M.; Hovestadt, V.; Schrimpf,
D.; Sturm, D.; Koelsche, C.; Sahm, F.; Chavez, L.; Reuss, D. E.; Kratz,
A.; Wefers, A. K.; Huang, K.; Pajtler, K. W.; Schweizer, L.; Stichel, D.;
Olar, A.; Engel, N. W.; Lindenberg, K.; Harter, P. N.; Braczynski, A.
K.; Plate, K. H.; Dohmen, H.; Garvalov, B. K.; Coras, R.; Holsken, A.;
Hewer, E.; Bewerunge-Hudler, M.; Schick, M.; Fischer, R.;
Beschorner, R.; Schittenhelm, J.; Staszewski, O.; Wani, K.; Varlet,
P.; Pages, M.; Temming, P.; Lohmann, D.; Selt, F.; Witt, H.; Milde,
T.; Witt, O.; Aronica, E.; Giangaspero, F.; Rushing, E.; Scheurlen, W.;
Geisenberger, C.; Rodriguez, F. J.; Becker, A.; Preusser, M.; Haberler,
C.; Bjerkvig, R.; Cryan, J.; Farrell, M.; Deckert, M.; Hench, J.; Frank,
S.; Serrano, J.; Kannan, K.; Tsirigos, A.; Bruck, W.; Hofer, S.;
Brehmer, S.; Seiz-Rosenhagen, M.; Hanggi, D.; Hans, V.; Rozsnoki,
S.; Hansford, J. R.; Kohlhof, P.; Kristensen, B. W.; Lechner, M.;
Lopes, B.; Mawrin, C.; Ketter, R.; Kulozik, A.; Khatib, Z.; Heppner,
F.; Koch, A.; Jouvet, A.; Keohane, C.; Muhleisen, H.; Mueller, W.;
Pohl, U.; Prinz, M.; Benner, A.; Zapatka, M.; Gottardo, N. G.;
Driever, P. H.; Kramm, C. M.; Muller, H. L.; Rutkowski, S.; von Hoff,
K.; Fruhwald, M. C.; Gnekow, A.; Fleischhack, G.; Tippelt, S.;
Calaminus, G.; Monoranu, C.-M.; Perry, A.; Jones, C.; Jacques, T. S.;
Radlwimmer, B.; Gessi, M.; Pietsch, T.; Schramm, J.; Schackert, G.;
Westphal, M.; Reifenberger, G.; Wesseling, P.; Weller, M.; Collins, V.
P.; Blumcke, I.; Bendszus, M.; Debus, J.; Huang, A.; Jabado, N.;
Northcott, P. A.; Paulus, W.; Gajjar, A.; Robinson, G. W.; Taylor, M.
D.; Jaunmuktane, Z.; Ryzhova, M.; Platten, M.; Unterberg, A.; Wick,
W.; Karajannis, M. A.; Mittelbronn, M.; Acker, T.; Hartmann, C.;
Aldape, K.; Schuller, U.; Buslei, R.; Lichter, P.; Kool, M.; Herold-
Mende, C.; Ellison, D. W.; Hasselblatt, M.; Snuderl, M.; Brandner, S.;
Korshunov, A.; von Deimling, A.; Pfister, S. M. DNA Methylation-
Based Classification of Central Nervous System Tumours. Nature
2018, 555, 469−474.
(133) Segler, M. H. S.; Kogej, T.; Tyrchan, C.; Waller, M. P.
Generating Focused Molecule Libraries for Drug Discovery with
Recurrent Neural Networks. ACS Cent. Sci. 2018, 4, 120−131.
(134) Jespersen, M. C.; Peters, B.; Nielsen, M.; Marcatili, P.
Bepipred-2.0: Improving Sequence-Based B-Cell Epitope Prediction
Using Conformational Epitopes. Nucleic Acids Res. 2017, 45, W24−W29.
(135) Iorio, F.; Knijnenburg, T. A.; Vis, D. J.; Bignell, G. R.;
Menden, M. P.; Schubert, M.; Aben, N.; Goncalves, E.; Barthorpe, S.;
Lightfoot, H.; Cokelaer, T.; Greninger, P.; van Dyk, E.; Chang, H.; de
Silva, H.; Heyn, H.; Deng, X. M.; Egan, R. K.; Liu, Q. S.; Mironenko,
T.; Mitropoulos, X.; Richardson, L.; Wang, J. H.; Zhang, T. H.;
Moran, S.; Sayols, S.; Soleimani, M.; Tamborero, D.; Lopez-Bigas, N.;
Ross-Macdonald, P.; Esteller, M.; Gray, N. S.; Haber, D. A.; Stratton,
M. R.; Benes, C. H.; Wessels, L. F. A.; Saez-Rodriguez, J.; McDermott,
U.; Garnett, M. J. A Landscape of Pharmacogenomic Interactions in
Cancer. Cell 2016, 166, 740−754.
(136) Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.;
Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-
Pompan, M.; Suez, J.; Mahdi, J. A.; Matot, E.; Malka, G.; Kosower, N.;
Rein, M.; Zilberman-Schapira, G.; Dohnalova, L.; Pevsner-Fischer,
M.; Bikovsky, R.; Halpern, Z.; Elinav, E.; Segal, E. Personalized
Nutrition by Prediction of Glycemic Responses. Cell 2015, 163,
1079−1094.
(137) Kircher, M.; Witten, D. M.; Jain, P.; O'Roak, B. J.; Cooper, G.
M.; Shendure, J. A General Framework for Estimating the Relative
Pathogenicity of Human Genetic Variants. Nat. Genet. 2014, 46, 310−315.
(138) Subramanian, S.; Huq, S.; Yatsunenko, T.; Haque, R.; Mahfuz,
M.; Alam, M. A.; Benezra, A.; DeStefano, J.; Meier, M. F.; Muegge, B.
D.; Barratt, M. J.; VanArendonk, L. G.; Zhang, Q. Y.; Province, M. A.;
Petri, W. A.; Ahmed, T.; Gordon, J. I. Persistent Gut Microbiota
Immaturity in Malnourished Bangladeshi Children. Nature 2014, 510,
417−421.
(139) Zhong, M.; Tran, K.; Min, Y. M.; Wang, C. H.; Wang, Z. Y.;
Dinh, C. T.; De Luna, P.; Yu, Z. Q.; Rasouli, A. S.; Brodersen, P.; Sun,
S.; Voznyy, O.; Tan, C. S.; Askerka, M.; Che, F. L.; Liu, M.;
Seifitokaldani, A.; Pang, Y. J.; Lo, S. C.; Ip, A.; Ulissi, Z.; Sargent, E.
H. Accelerated Discovery of Co2 Electrocatalysts Using Active
Machine Learning. Nature 2020, 581, 178−183.
(140) Manipatruni, S.; Nikonov, D. E.; Lin, C. C.; Gosavi, T. A.; Liu,
H. C.; Prasad, B.; Huang, Y. L.; Bonturim, E.; Ramesh, R.; Young, I.
A. Scalable Energy-Efficient Magnetoelectric Spin-Orbit Logic. Nature
2019, 565, 35−42.
(141) Zhu, X. J.; Li, D.; Liang, X. G.; Lu, W. D. Ionic Modulation
and Ionic Coupling Effects in Mos2 Devices for Neuromorphic
Computing. Nat. Mater. 2019, 18, 141−148.
(142) Bai, Y.; Wilbraham, L.; Slater, B. J.; Zwijnenburg, M. A.;
Sprick, R. S.; Cooper, A. I. Accelerated Discovery of Organic Polymer
Photocatalysts for Hydrogen Evolution from Water through the
Integration of Experiment and Theory. J. Am. Chem. Soc. 2019, 141,
9063−9071.
(143) Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural
Networks for an Accurate and Interpretable Prediction of Material
Properties. Phys. Rev. Lett. 2018, DOI: 10.1103/PhysRevLett.120.145301.
(144) van de Burgt, Y.; Lubberman, E.; Fuller, E. J.; Keene, S. T.;
Faria, G. C.; Agarwal, S.; Marinella, M. J.; Talin, A. A.; Salleo, A. A
Non-Volatile Organic Electrochemical Device as a Low-Voltage
Artificial Synapse for Neuromorphic Computing. Nat. Mater. 2017,
16, 414−418.
(145) Gomez-Bombarelli, R.; Aguilera-Iparraguirre, J.; Hirzel, T. D.;
Duvenaud, D.; Maclaurin, D.; Blood-Forsythe, M. A.; Chae, H. S.;
Einzinger, M.; Ha, D. G.; Wu, T.; Markopoulos, G.; Jeon, S.; Kang,
H.; Miyazaki, H.; Numata, M.; Kim, S.; Huang, W. L.; Hong, S. I.;
Baldo, M.; Adams, R. P.; Aspuru-Guzik, A. Design of Efficient
Molecular Organic Light-Emitting Diodes by a High-Throughput
Virtual Screening and Experimental Approach. Nat. Mater. 2016, 15,
1120−1127.
(146) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny,
M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J.
Machine-Learning-Assisted Materials Discovery Using Failed
Experiments. Nature 2016, 533, 73−76.
(147) Podgorski, J.; Berg, M. Global Threat of Arsenic in
Groundwater. Science 2020, 368, 845−850.
(148) Wang, H. D.; Rivenson, Y.; Jin, Y. Y.; Wei, Z. S.; Gao, R.;
Gunaydin, H.; Bentolila, L. A.; Kural, C.; Ozcan, A. Deep Learning
Enables Cross-Modality Super-Resolution in Fluorescence
Microscopy. Nat. Methods 2019, 16, 103−110.
(149) Arganda-Carreras, I.; Kaynig, V.; Rueden, C.; Eliceiri, K. W.;
Schindelin, J.; Cardona, A.; Seung, H. S. Trainable Weka
Segmentation: A Machine Learning Tool for Microscopy Pixel
Classification. Bioinformatics 2017, 33, 2424−2426.
(150) Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M. Y.;
Geiger, T.; Mann, M.; Cox, J. The Perseus Computational Platform
for Comprehensive Analysis of (Prote)Omics Data. Nat. Methods
2016, 13, 731−740.
(151) Coley, C. W.; Thomas, D. A.; Lummiss, J. A. M.; Jaworski, J.
N.; Breen, C. P.; Schultz, V.; Hart, T.; Fishman, J. S.; Rogers, L.; Gao,
H. Y.; Hicklin, R. W.; Plehiers, P. P.; Byington, J.; Piotti, J. S.; Green,
W. H.; Hart, A. J.; Jamison, T. F.; Jensen, K. F. A Robotic Platform for
Flow Synthesis of Organic Compounds Informed by Ai Planning.
Science 2019, 365, eaax1566.
(152) Segler, M. H. S.; Preuss, M.; Waller, M. P. Planning Chemical
Syntheses with Deep Neural Networks and Symbolic Ai. Nature 2018,
555, 604−610.
(153) Ahneman, D. T.; Estrada, J. G.; Lin, S. S.; Dreher, S. D.;
Doyle, A. G. Predicting Reaction Performance in C-N Cross-Coupling
Using Machine Learning. Science 2018, 360, 186−190.
(154) Unke, O. T.; Meuwly, M. Physnet: A Neural Network for
Predicting Energies, Forces, Dipole Moments, and Partial Charges. J.
Chem. Theory Comput. 2019, 15, 3678−3693.
(155) Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.;
Schutt, K. T.; Muller, K. R. Machine Learning of Accurate
Energy-Conserving Molecular Force Fields. Science Advances 2017, 3,
e1603015.
(156) Carleo, G.; Troyer, M. Solving the Quantum Many-Body
Problem with Artificial Neural Networks. Science 2017, 355, 602−605.
(157) Schutt, K. T.; Arbabzadah, F.; Chmiela, S.; Muller, K. R.;
Tkatchenko, A. Quantum-Chemical Insights from Deep Tensor
Neural Networks. Nat. Commun. 2017, DOI: 10.1038/ncomms13890.
(158) Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz,
S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; von Lilienfeld,
O. A. Prediction Errors of Molecular Machine Learning Models
Lower Than Hybrid Dft Error. J. Chem. Theory Comput. 2017, 13,
5255−5264.
(159) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A.
Big Data Meets Quantum Chemistry Approximations: The Delta-
Machine Learning Approach. J. Chem. Theory Comput. 2015, 11,
2087−2096.
(160) European Patent Office. Patent Families.
https://www.epo.org/searching-for-patents/helpful-resources/first-time-here/patent-families.html
(accessed June 15, 2021).
(161) Gartner Glossary.
https://www.gartner.com/en/information-technology/glossary/hype-cycle
(accessed June 15, 2021).

Preview text:

pubs.acs.org/jcim Review
Artificial Intelligence in Chemistry: Current Trends and Future Directions
Zachary J. Baum,* Xiang Yu, Philippe Y. Ayala, Yanan Zhao, Steven P. Watkins, and Qiongqiong Zhou
Cite This: J. Chem. Inf. Model. 2021, 61, 3197−3212
ABSTRACT: The application of artificial intelligence (AI) to
chemistry has grown tremendously in recent years. In this Review,
we studied the growth and distribution of AI-related chemistry
publications in the last two decades using the CAS Content
Collection. The volume of both journal and patent publications has
increased dramatically, especially since 2015. Study of the
distribution of publications over various chemistry research areas
revealed that analytical chemistry and biochemistry are integrating AI
to the greatest extent and with the highest growth rates. We also
investigated trends in interdisciplinary research and identified
frequently occurring combinations of research areas in publications.
Furthermore, topic analyses were conducted for journal and patent
publications to illustrate emerging associations of AI with certain
chemistry research topics. Notable publications in various chemistry
disciplines were then evaluated and presented to highlight emerging
use cases. Finally, the occurrence of different classes of substances
and their roles in AI-related chemistry research were quantified,
further detailing the popularity of AI adoption in the life sciences and analytical chemistry. In summary, this Review offers a broad
overview of how AI has progressed in various fields of chemistry and aims to provide an understanding of its future directions.
KEYWORDS: artificial intelligence, CAS Content Collection, analytical chemistry, biochemistry
Received: June 1, 2021
Published: July 15, 2021
■ INTRODUCTION
Artificial intelligence (AI) refers to the ability of machines to
act in seemingly intelligent ways, making decisions in response
to new inputs without being explicitly programmed to do so.
Whereas typical computer programs generate outputs according
to explicit sets of instructions, AI systems are designed to
use data-driven models to make predictions. These AI models
are generally first trained on representative data sets with
known output values, thereby “learning” input−output
relationships. The resulting trained models can then be used
to predict output values of data similar to the training set or to
generate new data. Many problems involving data with
complex input−output relationships are difficult or impractical
to model procedurally, thus creating an opportunity for AI.
AI can feasibly be applied to various tasks in the field of
chemistry, where complex relationships are often present in
data sets. For example, the solubility of a new compound may
be predicted either through equations based on empirical data
or by using theoretical calculations. Alternatively, prediction of
solubility may also be accomplished by an AI program that has
developed structure−solubility relationships after being trained
on numerous compounds with known solubilities. The use of
AI for tasks such as property prediction has proliferated in
recent years due to explosive growth in computing power,
open-source machine-learning frameworks, and increasing data
literacy among chemists.1−9 AI implementations have proven
to dramatically reduce design and experimental effort by
enabling laboratory automation,10 predicting bioactivities of
new drugs,11−13 optimizing reaction conditions,14 and
suggesting synthetic routes to complex target molecules.15
Although significant publicity has been given to AI and its
application in chemistry, perspective on its use and development
in chemistry is not obvious from the massive volume of
available information. This Review uses the CAS Content
Collection to contextualize the current AI landscape,
classifying and quantifying chemistry publications related to
AI from the years 2000−2020. The CAS Content Collection
covers publications in 50 000 scientific journals from around
the world in a wide range of disciplines, 62 patent authorities,
© 2021 The Authors. Published by American Chemical Society
https://doi.org/10.1021/acs.jcim.1c00619
Figure 1. Annual publication volume in AI-related chemistry from 2000 to 2020: (A) Journal publications, (B) patent publications, and (C) ACS National Meeting abstracts.
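As a rough illustration of the structure−solubility idea described in the Introduction, the sketch below uses a k-nearest-neighbour regressor in Python. That model is swapped in here only for brevity and is not a method named in this Review. The two-component descriptor vectors, the log-solubility values, and the names TRAINING_SET and predict_solubility are hypothetical placeholders, not measured data.

```python
from math import dist

# Hypothetical training set: (descriptor vector, known log-solubility).
# Both the descriptors and the values are invented placeholders.
TRAINING_SET = [
    ((1.0, 0.0), -0.5),
    ((2.0, 1.0), -1.5),
    ((3.0, 1.0), -2.5),
    ((4.0, 2.0), -3.5),
]


def predict_solubility(descriptors, training_set=TRAINING_SET, k=2):
    """Predict log-solubility as the mean over the k nearest training compounds."""
    if k < 1 or k > len(training_set):
        raise ValueError("k must be between 1 and the training-set size")
    # Rank known compounds by Euclidean distance in descriptor space.
    ranked = sorted(training_set, key=lambda item: dist(item[0], descriptors))
    nearest = ranked[:k]
    # Average the known output values of the closest compounds.
    return sum(value for _, value in nearest) / k
```

With real data, the placeholder tuples would be replaced by computed molecular descriptors and experimentally determined solubilities, and a held-out test set would be needed to judge whether the learned relationship generalizes to new compounds.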
and 2 defensive publications (Research Disclosures and
IP.com).16 There are more than 1000 global scientists specialized
in various scientific domains curating, analyzing, and
connecting data from published sources at CAS. The CAS
Content Collection, as one of the largest collections of
scientific databases in the world, has many unique features and
annotations added during data curation. Expert-curated CAS
content is suitable for quantitative analysis of publications
against variables, such as time, country, research area, and
substance details. We first examine the growth and distribution
of AI-related publications in chemistry, which includes the
annual growth of publication volume and the distribution of
publications among countries, organizations, and research
areas, followed by a topic analysis revealing the evolution of
frequently used concepts related to AI in chemistry. We then
provide lists of notable AI-related journal and patent
publications in a variety of research areas. Finally, we look at
the types of chemical substances most frequently involved in
the AI-related literature, highlighting the distribution of
AI-related publications among various classes of substances and
their roles. We hope that this Review can serve as a useful
resource for those who would like to understand global trends
in AI-oriented research efforts in chemistry.
■ GROWTH AND DISTRIBUTION OF PUBLICATION VOLUME IN AI-RELATED CHEMISTRY
Volume of Publications by Year. With the rapid growth
in global research activity, scientific publication volume has
steadily increased over the past 20 years. A quantitative
analysis helps to understand just how fast chemistry
publications using artificial intelligence are increasing relative
to the increase in total chemistry publications. To this end, the
CAS Content Collection was searched to identify AI-related
publications from 2000 to 2020 based on various AI terms in
their title, keywords, abstract text, and CAS expert-curated
concepts. The search query required screening of each term to
minimize false positives due to polysemy; a maximum false
positive rate of 2% was allowed for each OR-delimited phrase, as
determined by random screenings of 50−100 documents
performed by CAS experts. In addition, matches on particularly
problematic phrases, such as “brain” and “nerve,” were excluded
from consideration. The resulting search string is provided in
the Supporting Information. From this search, roughly 70 000
journal publications and 17 500 patents from the CAS Content
Collection were identified to be related to AI. Figures 1A and
1B show the volume of these publications and their volume
normalized by the overall number of journal publications or
patents by year, respectively. Indeed, the numbers of both
journal and patent publications increased with time, showing
similar rapidly growing trends after 2015. This growth stems in
part from the high-profile successes of deep learning projects in
public data challenges starting around 2012, such as the Merck
Molecular Activity Challenge17 and the ImageNet
competition,18 which increasingly drew research interest from the
scientific community. Additionally, the introduction of
open-source machine learning frameworks, such as TensorFlow
(2015) and PyTorch (2016), and the availability of
increasingly powerful computing hardware sparked a global
explosion in AI research, enabling further applications of AI to
chemistry. In fact, as of 2020, over 50% of the documents on
AI in chemistry were published during the past 4 years.
Another way to measure recent scientific research trends is by
examining scientific meeting abstracts. For this purpose, the
abstracts from ACS National Meetings were analyzed for the
Figure 2. Distribution of AI-related publications by country/region and company from 2000 to 2020. (A) Top 20 countries/regions in number of
journal publications. (B) Top 20 countries/regions in number of patent publications. (C) Top 20 companies in number of patent publications.
Figure 3. Publication trends of AI in specific research areas from 2000 to 2020: (A) journal publications and (B) patent publications.
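The normalized volumes reported for Figure 1 and Figure 3 divide the number of AI-related documents in a year by the total number of documents in that year (for Figure 3, within a given research area). A minimal Python sketch under that reading follows. The function names and the counts in the example lines are invented placeholders, not values from the CAS Content Collection. The second helper mirrors the statement that over 50% of AI-related documents appeared within the most recent 4 years.

```python
def normalized_volume(ai_counts, total_counts):
    """Return {year: AI-related count / total count} for years with a usable total."""
    shares = {}
    for year, ai_count in ai_counts.items():
        total = total_counts.get(year)
        if not total:  # skip years whose total is missing or zero
            continue
        shares[year] = ai_count / total
    return shares


def recent_share(ai_counts, last_year, window):
    """Fraction of all AI-related documents published in the final `window` years."""
    overall = sum(ai_counts.values())
    if overall == 0:
        return 0.0
    recent = sum(
        count
        for year, count in ai_counts.items()
        if last_year - window < year <= last_year
    )
    return recent / overall


# Placeholder counts for illustration only (not taken from the Review).
example_ai = {2018: 10, 2019: 30, 2020: 60}
example_total = {2018: 1000, 2019: 1000, 2020: 1200}
example_shares = normalized_volume(example_ai, example_total)
example_recent = recent_share(example_ai, last_year=2020, window=2)
```

Normalizing in this way separates growth in AI-related work from the general growth in publication volume, which is the distinction the Review draws between absolute and proportional increases.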
presence of AI topics, and the number of AI-related abstracts
per year and its amount relative to the total number of yearly
abstracts are shown in Figure 1C. The abstract publications
show similar behavior to the trends in journal and patent
publication. These analyses suggest that not only has there
been an absolute increase of research effort toward AI in
chemistry but also that the proportion of AI-related research is
increasing.
Distribution of AI-Related Publications by Country/Region
and Company. The countries/regions and organizations
of origin for AI-related chemistry documents were
then extracted to determine their distributions. Figure 2A and
2B shows the percentages of AI-related journal articles and
patents produced in selected countries/regions and by selected
organizations in the years 2000−2020, respectively, with the
top commercial patent assignees listed in Figure 2C. China and
the United States contributed the largest numbers of
publications for both journal articles and patents. Medical
diagnostic developers and technology companies make up a
large portion of the commercial patent assignees for AI
chemical research. These companies rely on AI for automation,
control, and optimization of a variety of processes, such as
semiconductor device fabrication and biomarker screening,
which will be explored in more detail in the following sections.
■ DISTRIBUTION OF AI-RELATED CHEMISTRY PUBLICATIONS BY RESEARCH AREA
Trends of Publications in Specific Research Areas. To
have a closer look at how AI is involved in different
chemistry-related research areas, the roughly 70 000 journal and 17 500
patent publications were further classified into the following 12
categories by CAS experts: Analytical Chemistry, Biochemistry,
Energy Technology and Environmental Chemistry, Food and
Agriculture, Industrial Chemistry and Chemical Engineering,
Inorganic Chemistry, Materials Science, Natural Products,
Organic Chemistry, Physical Chemistry, Synthetic Polymers,
and Pharmacology, Toxicology and Pharmaceuticals. The
numbers of AI-related publications in each area are normalized
Figure 4. Relative prevalence of interdisciplinary studies in AI-related scientific publications: (A) journal publications and (B) patent publications.
Columns denote primary research areas, rows denote secondary research areas, and each square denotes an interdisciplinary pair of primary and secondary research areas.
to that area’s respective total yearly publication volume and shown in Figure 3A (journal publications) and Figure 3B (patents). The absolute numbers of journal publications in each area are shown in Figure S1. Among all these specific chemistry-related areas, documents in Analytical Chemistry (both journal and patent publications) have the highest normalized volume in the most recent 10 years; this volume has also risen steeply in the last 5 years. Energy Technology and Environmental Chemistry and Industrial Chemistry and Chemical Engineering are the two research areas ranked in the second tier in terms of proportion of research volume and momentum in journal publications (Figure 3A). Interestingly, while Biochemistry is among the fields most represented in AI-related patent publications, its proportion in journal publications is relatively moderate when compared to other research areas. This indicates a strong desire or incentive to patent AI technologies in biochemistry, possibly because of its use in drug research and development.
Relative Prevalence of Interdisciplinary Research in Specific Areas. Innovations in science and technology are often made by finding connections between multiple research areas to derive novel insights, methods, and products. A theoretical method developed for molecular dynamics, for example, may be applied to study the interaction of a ligand with a protein, which can in turn be used to predict the activity of a drug. Conversely, data collected from experimental measurements can be used to optimize parameters of theoretical simulations. With such continuous conversations, fields not traditionally associated with each other can be mutually informative. Interdisciplinary effects such as these are also present in the AI-related chemical literature, which we explore here in detail.
From the set of AI-related journal and patent publications, we identified approximately 15 000 and 3000 interdisciplinary journal articles and patents, respectively. CAS analysts determined the primary discipline for each document and any secondary disciplines that also contributed to the work. The resulting combinations of primary and secondary disciplines are summarized in Figure 4, where both maps are normalized to the total number of interdisciplinary documents containing each respective primary and secondary discipline.
In Figure 4, several relationships are apparent among chemical disciplines. In journal articles, the strongest correlations are observed between primary and secondary research areas in Analytical Chemistry and Biochemistry, in Materials Science and Physical Chemistry, and in Biochemistry with applications to Pharmacology, Toxicology and Pharmaceuticals (Figure 4A). In patents (Figure 4B), the trend is similar, but inventions in Energy Technology and Environmental Chemistry related to Industrial Chemistry and Chemical Engineering also feature prominently. For example, journal documents using analytical chemistry techniques such as mass spectrometry and nuclear magnetic resonance, infrared, and Raman spectroscopies are augmented with machine learning for use in medical diagnostics,19−28 studies of metabolomics,29−36 and microbial identification,37−40 while biochemistry-related analytical chemistry patents concentrate on the development of analytical devices and methods for use in similar studies.41−53 AI-related journal documents with interest in Materials Science and Physical Chemistry discuss topics such as the evaluation of structure−property relationships in materials by augmenting first-principles calculations with machine learning models,54−60 the use of data from high-throughput experimentation to optimize the properties of functional materials,61−68 and the use of published data to enable the discovery of new materials.69−74 In patents, the combination of AI, Materials Science, and Physical Chemistry is used in methods for improving semiconductor device fabrication75−86 and polymer performance.87−90 Additionally, AI is being used in Biochemistry−Pharmacology, Toxicology and Pharmaceuticals research to understand drug−biomolecule interactions,91−100 apply biomarker data to the prediction of drug activities,101−106 and model toxicity.107−111 Finally, patents in Energy Technology and Environmental Chemistry related to Industrial Chemistry and Chemical Engineering often use AI in control systems for fuel production112−119 and engines.120−125 These examples demonstrate how AI can be applied in research areas where the relationships between
Figure 5. Difference in proportion of total AI-related publications and control group (non-AI related) by interdisciplinary pair: (A) journal
publications and (B) patent publications.
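The difference maps of Figure 5 compare, pair by pair, the proportion of interdisciplinary documents in the AI set with the proportion in the control set. The following is a minimal sketch of that comparison, not the procedure used to produce the figure. The function names, the input layout (a primary area plus a list of secondary areas per document), the choice of the number of documents as the denominator, and the toy records are all assumptions made for illustration.

```python
from collections import Counter


def pair_proportions(documents):
    """Share of documents assigned to each (primary, secondary) pair.

    `documents` is a list of (primary_area, [secondary_areas]) tuples.
    Assumption: the denominator is the number of interdisciplinary documents.
    """
    counts = Counter()
    for primary, secondaries in documents:
        for secondary in set(secondaries):
            counts[(primary, secondary)] += 1
    total = len(documents)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def difference_map(ai_documents, control_documents):
    """AI proportion minus control proportion for every pair seen in either set."""
    ai = pair_proportions(ai_documents)
    control = pair_proportions(control_documents)
    return {
        pair: ai.get(pair, 0.0) - control.get(pair, 0.0)
        for pair in set(ai) | set(control)
    }


# Toy records, invented for illustration only.
ai_documents = [
    ("Analytical Chemistry", ["Biochemistry"]),
    ("Analytical Chemistry", ["Biochemistry"]),
    ("Materials Science", ["Physical Chemistry"]),
    ("Biochemistry", ["Pharmacology, Toxicology and Pharmaceuticals"]),
]
control_documents = [
    ("Analytical Chemistry", ["Biochemistry"]),
    ("Materials Science", ["Physical Chemistry"]),
    ("Materials Science", ["Physical Chemistry"]),
    ("Organic Chemistry", ["Natural Products"]),
]

diff = difference_map(ai_documents, control_documents)
for pair, value in sorted(diff.items()):
    print(pair, f"{value:+.2f}")
```

In this toy example the Analytical Chemistry/Biochemistry pair receives a positive value (AI over-represented relative to the control) and the Materials Science/Physical Chemistry pair a negative one, mirroring the sign convention described for Figure 5.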
Figure 6. Trends of co-occurrence in scientific publications for selected research topics and AI algorithms: (A) journal publications and (B) patent publications.
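A co-occurrence trend of the kind plotted in Figure 6 can be obtained by counting, for each publication year, the documents indexed with both a research topic and an AI concept. The sketch below illustrates that counting step only. The function name, the (year, set of concepts) record layout, and the toy records are hypothetical and do not reflect how CAS indexing data are actually stored.

```python
from collections import Counter


def cooccurrence_by_year(documents, topic, ai_concept):
    """Count, per publication year, the documents indexed with both concepts.

    `documents` is a list of (year, set_of_indexed_concepts) tuples.
    """
    counts = Counter()
    for year, concepts in documents:
        if topic in concepts and ai_concept in concepts:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Toy records, invented for illustration only.
documents = [
    (2012, {"QSAR", "Neural network modeling"}),
    (2016, {"Diagnosis", "Deep learning"}),
    (2018, {"Diagnosis", "Deep learning", "Random forest"}),
    (2018, {"Diagnosis", "Deep learning"}),
    (2018, {"Diagnosis", "Support vector machine"}),
    (2019, {"Thermal conductivity", "Neural network modeling"}),
]

trend = cooccurrence_by_year(documents, "Diagnosis", "Deep learning")
print(trend)
```

Repeating the call for each topic and AI concept of interest gives one yearly series per pair, which is the form of data a trend plot needs.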
available data in separate domains are not obvious to researchers.
While interdisciplinary relationships do appear in AI-related chemistry research, it is natural to question the extent to which AI is indeed facilitating connections between fields. To answer this, we first selected random control groups of journal (n = 81 601) and patent (n = 12 181) publications and identified sets of interdisciplinary documents (n = 32 097 and n = 4426, respectively) using the same 12 research areas. In both the AI and control groups, we then calculated the proportions of documents belonging to each primary−secondary discipline pair. By comparing the corresponding proportions in these two groups, the resulting difference maps (Figure 5) reveal where AI is bringing disciplines together (positive values) and areas in which the use of AI is lagging (negative values). Notably, these maps show that interdisciplinary biochemical−analytical research is greatly facilitated by AI, and that despite recent advances, Physical Chemistry and Materials Science have used AI techniques less than other chemistry fields in the period 2000−2020. This lag is observed despite the relatively high proportion of interdisciplinary Physical Chemistry−Materials Science documents seen in Figure 4. This may be attributable to the reliance of Materials Science on Physical Chemistry principles and techniques; the incidence of publications at this intersection is high even in the absence of AI. However, the use of AI in interdisciplinary research is still maturing; a similar analysis was done with a more recent time window (2016−2020), where AI’s capability to bring disciplines together seems to be increasing (Figure S2).
■ EVOLUTION OF RESEARCH TOPICS IN AI-RELATED CHEMISTRY PUBLICATIONS
Topic Analysis in Journal Article Publications. By analyzing the connections of CAS-indexed concepts over time, one can see when a research topic became potentially addressable using AI techniques. Figure S3 shows the most frequently co-occurring concepts and the number of documents in which they co-occur (presented at a 97.5th percentile cutoff for co-occurrence). For the years 2000−2004 (Figure S3A), we see only a few concepts connected to the concepts Neural network modeling and Algorithms. Several biochemistry-related concepts that appear in conjunction with AI
Table 1. Notable AI-Related Journal Publications in the Areas of Biochemistry and Pharmacology, Toxicology and Pharmaceuticals
Columns: journal, title, organization, year of publication, highlight.
concepts are proteins, protein sequences, and protein conformation.
In the years 2005−2009 (Figure S3B), Homo sapiens became a more popular topic because of the increasing AI-related effort in disease diagnosis and prognosis, and related concepts, such as biomarkers, tumor markers, prognosis, and diagnosis, started to appear during this period. Protein-associated concepts, such as protein motifs, protein−protein interactions, secondary structure, and amino acids, became more prevalent in AI-related documents, likely because of the use of AI in solving high-resolution protein structures. Genetics-associated concepts, such as sequence annotation and gene expression profile, were also indexed more frequently. Finally, high-throughput screening and proteomics were frequently used concepts during this period.
In the years 2010−2014 (Figure S3C), genome-related concepts, such as genome and single nucleotide polymorphism, were more often studied using AI methods. The application of AI to pharmaceutical and biomedical fields became more common, as the concepts drug discovery, drug design, blood analysis, neoplasm, and microRNA were frequently used. The use of AI techniques for environmental remediation is evidenced by the occurrence of concepts such as absorptive wastewater treatment and Chemical oxygen demand in this period.
In the years 2015−2019 (Figure S3D), the use of AI became more prominent in research topics such as DNA methylation, mutation, nanofluids, heat transfer, and biodiesel fuel, where it was applied to solve problems in those research areas. AI also appeared frequently in publications related to cancer and Alzheimer’s disease. Since the beginning of 2020, when the critical need for research into COVID-19 became apparent, AI has been used frequently in the areas of drug discovery, disease diagnosis, and disease tracking (Figure S4).
Quantifying the co-occurrence of use case-specific concepts with AI-related concepts over time further reveals the progression of AI adoption. As Figure 6A shows, studies of QSAR (quantitative structure−activity relationships), a perennial topic in drug discovery research, have employed Neural network models consistently for some time. On the other hand, in Materials Science-related topics, such as thermal conductivity, the use of neural network modeling has grown more slowly, with its use in publications not increasing rapidly until the second half of the 2010s. The use of machine learning in topics such as medical diagnosis and Density functional theory has only recently begun to increase significantly.
Topic Analysis in Patent Publications. Frequently co-occurring concepts were also identified in the patent literature in 5-year time windows (Figure S5, presented at a 95th percentile cutoff for co-occurrence). The evolution of associated concepts in the patent literature followed patterns similar to those observed in the journal literature. Previously unseen research topics, such as Diagnosis, Prognosis, Peptides, and Transcription factors, were introduced in the years 2005−2009 (Figure S5B). The use of AI in the study of organic compounds and hydrocarbons and in the development of QSPR (quantitative structure−property relationships) became more prominent in the years 2010−2014 (Figure S5C), and connections between these topics and AI-related concepts have increased further since 2015 (Figure S5D). However, unlike in journal publications, few COVID-related concepts co-occurred with AI-related concepts in the patent literature in 2020 (Figure S6). This may be due in part to the longer turnover time in the patent application process compared to scientific journal publication.
It is telling to examine the progression of the concept diagnosis with various AI concepts. While the growth of documents associating diagnosis with the concept Neural network modeling is insubstantial between 2000 and 2015, the number of documents associating diagnosis with various AI concepts increases rapidly after 2015, with the concepts of deep learning, random forest, and support vector machine seeing significant usage (Figure 6B). This pattern is consistent across a variety of topics in chemistry, in which the increase in usage of AI after 2015 is general rather than being limited to a single AI methodology.
■ NOTABLE AI-RELATED JOURNAL AND PATENT PUBLICATIONS
To highlight the most influential journal publications using AI in chemistry, a bibliometric analysis was performed on the primary literature retrieved by our search query and published since 2014. Publications with over 100 citations were selected and further classified into groups of related research areas; then, they were reviewed and selected based on apparent novelty: Biochemistry and Pharmacology, Toxicology and Pharmaceuticals (Table 1), Materials Science (Table 2), and Analytical Chemistry, Synthetic Chemistry, and Physical Chemistry (Table 3). The US is the leading country of origin: 15 of the 34 papers in Tables 1−3 are affiliated with US organizations. Other countries with significant numbers of important AI documents are Germany (6) and Switzerland (5). Among organizations, the Massachusetts Institute of Technology (US) and the University of Basel (Switzerland) were the two biggest contributors. Three commercial organizations, DeepMind Technologies, Ltd. (UK), Illumina Inc. (US), and Intel Corp. (US), contributed significantly.
Among these 34 journal papers, the most frequently indexed concepts are Machine learning, Neural network, Deep learning, Density functional theory, and Random forest. In Biochemistry and Pharmacology, Toxicology and Pharmaceuticals (Table 1), many of the articles apply AI technology to research topics involving high-throughput drug screening, nucleic acid sequence analysis, and protein structure prediction. Publications in Materials Science research (Table 2) reported AI-driven structure−property relationship predictions enabling the discovery of new functional materials as well as memristors with applications in neuromorphic computing. In Analytical Chemistry, Synthetic Chemistry, and Physical Chemistry (Table 3), new methods were developed with AI to complement analytical data, automate flow chemistry, improve retrosynthetic planning, and predict reaction outcomes. In addition, user-friendly computational tools were developed, and methods combining AI with physics-based approaches such as density functional theory were reported to improve the accuracy of calculations.
To identify notable AI-related patents, results from the search query were first sorted by size of patent family for each year. A patent family is a collection of patents filed in multiple countries covering the same or similar content160 and, thus, represents high-priority intellectual property for organizations, which we use here as a proxy to estimate importance. Documents were then selected from the 50 largest patent families per year based on apparent novelty and relevance to overall research trends of AI in chemistry and are presented in Table S1. The US has made the largest contribution to these patent
inventions, with 13 of the 15 patents selected granted to companies based in the United States. Interestingly, most of the patent assignees are startup companies founded in the past 10 years. This is consistent with the rapid growth of AI-related chemistry inventions since 2015 and indicates how the emerging paradigm of AI provides opportunities for innovative enterprises.
The adoption of AI in the life sciences is prominent, comprising 8 of the 15 patents covering biomarker development, gene expression profiling, and biosequence analysis (Table S1). These patents also reflect a strong interest in applying machine learning to medical diagnostics, consistent with the topic analysis in Figure S5D. The remaining 7 patents cover research areas, including Analytical Chemistry, Environmental Chemistry, Materials Science, Industrial Chemistry & Chemical Engineering, and Synthetic Chemistry.
Table 2. Notable AI-Related Journal Publications in Materials Science
Columns: journal, title, organization, year, highlight.
■ DISTRIBUTION OF SUBSTANCE INFORMATION IN AI-RELATED CHEMICAL LITERATURE
Journal Publications by Substance Class. The distribution of AI-related research activity can also be probed by studying the numbers of documents involving different types of substances. Because the barriers to AI implementation in chemistry include challenges in substance representation6 and data availability,7 enumeration of the most common substance types studied in the literature will point to areas in which researchers have, in some instances, been able to overcome such challenges. The substances indexed by CAS are categorized into multiple classes. The numbers of AI-related journal publications for some frequently occurring substance classes, namely Alloy, Coordination Compound, Element, Manual Registration, Ring Parent, Small Molecule, Polymer, Salt, and Inorganic Compound, are shown in Figure 7A. Substances in the Manual Registration class are predominantly biomolecules, such as enzymes, hormones, vaccines, and antibodies. Biosequences are not included in the analysis of journal documents. Ring Parents represent scaffolds defining the composition and connectivity of molecular ring systems.
As Figure 7A shows, publications containing Small Molecule substances are the highest in number, followed by those containing Element and Manual Registration substances, far outnumbering publications containing substances in the remaining classes. The high volume of research and invention in AI involving these classes is likely facilitated by their relative simplicity and ease of modeling compared to substances in other classes, such as Coordination Compound and Polymer. The large number of documents containing Manual Registration substances in Figure 7A is consistent with the relatively high publication volume in Biochemistry (Figure S1). Also shown in Figure 7A are the total numbers of substances contained in AI-related journal publications for each substance class. The data show similar trends as those for document count, albeit skewed by the larger number of small molecule substances per document.
Figure 7B shows the change in the number of AI-related journal publications by substance class for the years 2000−2020. While the document count of each substance type increased during this period (and particularly after 2017), those containing Small Molecule, Element, and Manual Registration substances displayed the largest increases, consistent with the data shown in Figure 7A.
Patent Publications by Substance Class. The patent literature was analyzed using the same methods as for the
Table 3. Notable AI-Related Journal Publications in the Areas of Analytical Chemistry, Synthetic Chemistry, and Physical Chemistry
Columns: journal, title, organization, year, highlight.
Figure 7. Publications in AI-related chemistry associated with substance class from 2000 to 2020. (A) Number of AI-related journal publications
and number of substances associated with each class. (B) Trends of AI-related journal publications associated with each substance class. (C)
Number of AI-related patent publications and number of substances associated with each class. (D) Trends of AI-related patent publications
associated with each substance class.
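The two quantities reported per class in Figure 7, the number of publications and the number of substances, can diverge when a few documents contain many substances. The sketch below tallies both from a toy mapping of documents to indexed substances. The function name, the record layout, the identifiers, and the decision to count distinct substances are assumptions for illustration, not a description of the CAS workflow.

```python
from collections import defaultdict


def tally_by_class(documents):
    """Return {substance_class: (document_count, distinct_substance_count)}.

    `documents` maps a document id to a list of (substance_id, substance_class).
    Assumption: a substance is counted once per class, however often it recurs.
    """
    docs_per_class = defaultdict(set)
    substances_per_class = defaultdict(set)
    for doc_id, substances in documents.items():
        for substance_id, substance_class in substances:
            docs_per_class[substance_class].add(doc_id)
            substances_per_class[substance_class].add(substance_id)
    return {
        cls: (len(docs_per_class[cls]), len(substances_per_class[cls]))
        for cls in docs_per_class
    }


# Toy records, invented for illustration only.
documents = {
    "doc-1": [("s1", "Small Molecule"), ("s2", "Small Molecule"), ("s3", "Small Molecule")],
    "doc-2": [("s1", "Small Molecule"), ("e1", "Element")],
    "doc-3": [("p1", "Polymer")],
}

tally = tally_by_class(documents)
for cls, (n_docs, n_substances) in sorted(tally.items()):
    print(f"{cls}: {n_docs} documents, {n_substances} substances")
```

Here the Small Molecule class has two documents but three substances, a small-scale version of the skew between document counts and substance counts noted for this class.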
Figure 8. Trends of substance classes in AI-related chemistry publications from 2000 to 2020: (A) journal publications and (B) patent publications.
journal literature. Figure 7C shows the numbers of AI-related patent publications and substances associated with different substance classes. Nucleic Acid Sequences and Peptide Sequences are highest in number, whereas the remaining relative document and substance counts are similar to those found in Figure 7A. Patents containing Peptide Sequences or Nucleic Acid Sequences often contain large numbers of sequences per document, often far greater than other substances per patent. The change in the number of AI-related patent publications containing various substance classes over time is shown in Figure 7D, again showing trends consistent with those in Figure 7B.
Analysis of Substances Contained in AI-Related Chemical Literature. A substance-level perspective of chemical research over time is also useful for understanding the utilization of AI. It is interesting to see that in both journal and patent publications (Figure 8A and 8B, respectively), the number of substances present does not follow a monotonic increase over time, as was the case in the progression of total research volume. Rather, a lull in substance count can be seen in the first half of the 2010s before catching up with the massive increase in publications. This may be partially due to a small number of documents between the years of 2008−2014 containing large amounts of Small Molecule substances or biosequences on the order of 10^3−10^4, which sometimes can be seen in the literature. We have also studied the distribution of these substances across a variety of role indicators, which are controlled vocabulary terms that describe the use of a substance within the context of a specific document (Table S2, Figure S7).
■ CONCLUSIONS AND OUTLOOK
Applications of AI in chemistry have become increasingly popular in recent years, as evidenced by the strong growth in publication volume. Yet, it is striking that growth has not been uniform. For some fields of chemistry, AI is much further along the proverbial Hype Cycle of Emerging Technologies161 than others. In the life sciences and Analytical Chemistry, for example, AI adoption is likely already past the so-called “peak of inflated expectations” and “trough of disillusionment”. The utility of AI in a given domain is intrinsically linked to the quantity and quality of its data, as well as opportunities to gain insights from its analysis. AI can help gain insights that would not otherwise follow from established knowledge. AI is also useful for extracting insights from large intractable data sets, as well as aiding in the automation of repetitive tasks. With this in mind, it is not a surprise to see a surge in AI deployment within analytical chemistry, where large training sets are readily obtained, or in biochemistry, which contains a wealth of data for macromolecules whose structure−property relationships are not obvious to researchers. Successes in these more traditionally data-intensive fields are now being emulated in other areas of chemistry.
Multiple factors likely explain the significantly increased use of AI in chemistry after 2015. The greater availability of software and hardware tools to implement AI decreased the barriers to using it in chemical research, while research area-specific data sets amenable to AI methods have proliferated. In addition, many researchers have learned techniques in generating and handling data for use in AI methods.
From 2000 to 2020, the co-occurrence of AI- and research area-specific concepts in publications shows how AI has been incorporated into a variety of research areas. Many AI methods have been adapted for chemistry research and are being further introduced to new areas of chemical study.
In conclusion, thanks to an increasingly interdisciplinary research landscape, many AI methods have been successfully adapted to chemistry research. Use of AI has even become routine in some fields. There are still areas of chemistry, such as organic synthetic chemistry, where AI has yet to make an impact. Perhaps it is a matter of time before improvements in AI itself, lessons from successful applications of AI, and interdisciplinary research combine to help lift these areas out of the “trough of disillusionment” and onto the “plateau of productivity”.
■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c00619.
Total AI-related journal publications by discipline, difference in proportion of interdisciplinary journal publications from 2016 to 2020, evolution of co-occurring concepts, search string used for retrieval of all publications, table of notable AI-related patents, and common substance role indicators (PDF)
■ AUTHOR INFORMATION
Corresponding Author
Zachary J. Baum − Chemical Abstracts Service, Columbus, Ohio 43210, United States; orcid.org/0000-0002-0585-8503; Email: ZBaum@cas.org
Authors
Xiang Yu − Chemical Abstracts Service, Columbus, Ohio 43210, United States
Philippe Y. Ayala − Chemical Abstracts Service, Columbus, Ohio 43210, United States
Yanan Zhao − Chemical Abstracts Service, Columbus, Ohio 43210, United States
Steven P. Watkins − Chemical Abstracts Service, Columbus, Ohio 43210, United States
Qiongqiong Zhou − Chemical Abstracts Service, Columbus, Ohio 43210, United States; orcid.org/0000-0001-6711-369X
Complete contact information is available at:
The large numbers and rapid growth of AI-related chemistry
https://pubs.acs.org/10.1021/acs.jcim.1c00619
publications involving small molecules reflect the popularity of
AI applications in drug discovery. Analyses of total substance Notes
numbers for each class in AI-related publications revealed large
The authors declare no competing financial interest.
numbers of Nucleic Acid Sequences and Peptide Sequences in
Publications using artificial intelligence were identified by
patents, consistent with the prevalence of AI applications in
optimizing a search of relevant terms on the CAS Content
biochemistry. The distribution of the role indicators assigned
Collection using CAS STN. While the full data set is
to substances in AI-related publications contextualizes how AI
considered proprietary by CAS, the search string used for
is being used in recent biochemical and pharmaceutical
retrieval is included in the Supporting Information. Substance research.
information, primary and secondary disciplines, concepts, and 3207
https://doi.org/10.1021/acs.jcim.1c00619
J. Chem. Inf. Model. 2021, 61, 3197−3212
Journal of Chemical Information and Modeling pubs.acs.org/jcim Review
institutional information were extracted directly from the CAS Content Collection.

■ ACKNOWLEDGMENTS
We sincerely appreciate Rumiana Tenchov’s assistance curating references, Joshua Blair for obtaining ACS National Meeting abstracts, Laura Czuba for project coordination, Peter Jap and Cristina Tomeo for insightful discussion, and Susan Jervey and Robert Bird for proofreading. We are also grateful to Manuel Guzman, Gilles Georges, Michael Dennis, Carmin Gade, Dawn George, and Cynthia Casebolt for executive sponsorship.

■ REFERENCES
(1) Dutta, S. Data Modeling: A Fundamental Pillar of Your Future AI Technology. CAS Blog. https://www.cas.org/resource/blog/data-modeling-fundamental-pillar-your-future-ai-technology (accessed May 13, 2021).
(2) Villalba, M.; Wollenhaupt, M.; Ravitz, O. Predicting New Chemistry: Impact of High-Quality Training Data on Prediction of Reaction Outcomes. CAS Whitepapers. https://www.cas.org/resources/whitepapers/predicting-new-chemistry (accessed May 13, 2021).
(3) Sharma, Y. Data Quality: The Not-So Secret Sauce for AI and Machine Learning. CAS Blog. https://www.cas.org/resource/blog/data-quality-not-so-secret-sauce-ai-and-machine-learning (accessed May 13, 2021).
(4) Griffen, E. J.; Dossetter, A. G.; Leach, A. G. Chemists: AI Is Here; Unite to Get the Benefits. J. Med. Chem. 2020, 63, 8695−8704.
(5) Mater, A. C.; Coote, M. L. Deep Learning in Chemistry. J. Chem. Inf. Model. 2019, 59, 2545−2559.
(6) Wills, T. J.; Polshakov, D. A.; Robinson, M. C.; Lee, A. A. Impact of Chemist-in-the-Loop Molecular Representations on Machine Learning Outcomes. J. Chem. Inf. Model. 2020, 60, 4449−4456.
(7) Tkatchenko, A. Machine Learning for Chemical Discovery. Nat. Commun. 2020, 11, 4125.
(8) Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine Learning in Chemoinformatics and Drug Discovery. Drug Discovery Today 2018, 23, 1538−1546.
(9) Machine Learning in Chemistry: The Impact of Artificial Intelligence; The Royal Society of Chemistry, 2020.
(10) Mullin, R. The Lab of the Future Is Now. Chem. Eng. News 2021, 28.
(11) Elton, D. C.; Boukouvalas, Z.; Fuge, M. D.; Chung, P. W. Deep Learning for Molecular Design: A Review of the State of the Art. Molecular Systems Design & Engineering 2019, 4, 828−849.
(12) Bender, A.; Cortés-Ciriano, I. Artificial Intelligence in Drug Discovery: What Is Realistic, What Are Illusions? Part 1: Ways to Make an Impact, and Why We Are Not There Yet. Drug Discovery Today 2021, 26, 511−524.
(13) Muratov, E. N.; Bajorath, J.; Sheridan, R. P.; Tetko, I. V.; Filimonov, D.; Poroikov, V.; Oprea, T. I.; Baskin, I. I.; Varnek, A.; Roitberg, A.; Isayev, O.; Curtalolo, S.; Fourches, D.; Cohen, Y.; Aspuru-Guzik, A.; Winkler, D. A.; Agrafiotis, D.; Cherkasov, A.; Tropsha, A. QSAR without Borders. Chem. Soc. Rev. 2020, 49, 3525−3564.
(14) Strieth-Kalthoff, F.; Sandfort, F.; Segler, M. H. S.; Glorius, F. Machine Learning the Ropes: Principles, Applications and Directions in Synthetic Chemistry. Chem. Soc. Rev. 2020, 49, 6154−6168.
(15) Struble, T. J.; Alvarez, J. C.; Brown, S. P.; Chytil, M.; Cisar, J.; DesJarlais, R. L.; Engkvist, O.; Frank, S. A.; Greve, D. R.; Griffin, D. J.; Hou, X.; Johannes, J. W.; Kreatsoulas, C.; Lahue, B.; Mathea, M.; Mogk, G.; Nicolaou, C. A.; Palmer, A. D.; Price, D. J.; Robinson, R. I.; Salentin, S.; Xing, L.; Jaakkola, T.; Green, W. H.; Barzilay, R.; Coley, C. W.; Jensen, K. F. Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis. J. Med. Chem. 2020, 63, 8667−8682.
(16) CAS Content. https://www.cas.org/about/cas-content (accessed June 15, 2021).
(17) Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. Multi-Task Neural Networks for QSAR Predictions. arXiv, 2014, 1406.1231. https://arxiv.org/abs/1406.1231.
(18) Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84−90.
(19) Cauchi, M.; Weber, C. M.; Bolt, B. J.; Spratt, P. B.; Bessant, C.; Turner, D. C.; Willis, C. M.; Britton, L. E.; Turner, C.; Morgan, G. Evaluation of Gas Chromatography Mass Spectrometry and Pattern Recognition for the Identification of Bladder Cancer from Urine Headspace. Anal. Methods 2016, 8, 4037−4046.
(20) Kim, J. Y.; Oh, D.; Sung, K.; Choi, H.; Paeng, J. C.; Cheon, G. J.; Kang, K. W.; Lee, D. Y.; Lee, D. S. Visual Interpretation of [(18)F]Florbetaben PET Supported by Deep Learning-Based Estimation of Amyloid Burden. Eur. J. Nucl. Med. Mol. Imaging 2021, 48, 1116−1123.
(21) Jaiswal, A.; Gianchandani, N.; Singh, D.; Kumar, V.; Kaur, M. Classification of the COVID-19 Infected Patients Using DenseNet201 Based Deep Transfer Learning. J. Biomol. Struct. Dyn. 2020, 1−8.
(22) Song, C. L.; Vardaki, M. Z.; Goldin, R. D.; Kazarian, S. G. Fourier Transform Infrared Spectroscopic Imaging of Colon Tissues: Evaluating the Significance of Amide I and C-H Stretching Bands in Diagnostic Applications with Machine Learning. Anal. Bioanal. Chem. 2019, 411, 6969−6981.
(23) Ren, X.; Ghassemi, P.; Kanaan, Y. M.; Naab, T.; Copeland, R. L.; Dewitty, R. L.; Kim, I.; Strobl, J. S.; Agah, M. Kernel-Based Microfluidic Constriction Assay for Tumor Sample Identification. ACS Sens. 2018, 3, 1510−1521.
(24) Shen, H.; Zhang, W.; Chen, P.; Zhang, J.; Fang, A.; Wang, B. A Feature Selection Scheme for Accurate Identification of Alzheimer’s Disease. In Bioinformatics and Biomedical Engineering; Ortuño, F., Rojas, I., Eds.; Springer International Publishing, 2016; pp 71−81.
(25) Ramos, Á. G.; Antón, A. P.; Sánchez, M. D. N.; Pavón, J. L. P.; Cordero, B. M. Urinary Volatile Fingerprint Based on Mass Spectrometry for the Discrimination of Patients with Lung Cancer and Controls. Talanta 2017, 174, 158−164.
(26) Zhai, M. Y.; Zhao, Y.; Gao, H.; Shang, L. W.; Yin, J. H. Quantitative Study on Articular Cartilage by Fourier Transform Infrared Spectroscopic Imaging and Support Vector Machine. Chinese Journal of Analytical Chemistry 2018, 46, 896−901.
(27) Zheng, Q.; Li, J.; Yang, L.; Zheng, B.; Wang, J.; Lv, N.; Luo, J.; Martin, F. L.; Liu, D.; He, J. Raman Spectroscopy as a Potential Diagnostic Tool to Analyse Biochemical Alterations in Lung Cancer. Analyst 2020, 145, 385−392.
(28) Li, X.; Yang, T.; Li, S.; Yao, J.; Song, Y.; Wang, D.; Ding, J. Study on Spectral Parameters and the Support Vector Machine in Surface Enhanced Raman Spectroscopy of Serum for the Detection of Colon Cancer. Laser Phys. Lett. 2015, 12, 115603.
(29) Zhang, L.; Ma, F.; Qi, A.; Liu, L.; Zhang, J.; Xu, S.; Zhong, Q.; Chen, Y.; Zhang, C.-y.; Cai, C. Integration of Ultra-High-Pressure Liquid Chromatography-Tandem Mass Spectrometry with Machine Learning for Identifying Fatty Acid Metabolite Biomarkers of Ischemic Stroke. Chem. Commun. 2020, 56, 6656−6659.
(30) Alakwaa, F. M.; Chaudhary, K.; Garmire, L. X. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data. J. Proteome Res. 2018, 17, 337−347.
(31) Grissa, D.; Pétéra, M.; Brandolini, M.; Napoli, A.; Comte, B.; Pujos-Guillot, E. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data. Front. Mol. Biosci. 2016, 3, 30.
(32) O’Shea, K.; Cameron, S. J.; Lewis, K. E.; Lu, C.; Mur, L. A. Metabolomic-Based Biomarker Discovery for Non-Invasive Lung Cancer Screening: A Case Study. Biochim. Biophys. Acta, Gen. Subj. 2016, 1860, 2682−7.
(33) Paul, A.; Srivastava, S.; Roy, R.; Anand, A.; Gaurav, K.; Husain, N.; Jain, S.; Sonkar, A. A. Malignancy Prediction among Tissues from
Oral SCC Patients Including Neck Invasions: A (1)H HRMAS NMR Based Metabolomic Study. Metabolomics 2020, 16, 38.
(34) Pan, H.; Yao, C.; Yao, S.; Yang, W.; Wu, W.; Guo, D. A Metabolomics Strategy for Authentication of Plant Medicines with Multiple Botanical Origins, a Case Study of Uncariae Rammulus Cum Uncis. J. Sep. Sci. 2020, 43, 1043−1050.
(35) Gaul, D. A.; Mezencev, R.; Long, T. Q.; Jones, C. M.; Benigno, B. B.; Gray, A.; Fernández, F. M.; McDonald, J. F. Highly-Accurate Metabolomic Detection of Early-Stage Ovarian Cancer. Sci. Rep. 2015, 5, 16351.
(36) Hall, L. M.; Hill, D. W.; Bugden, K.; Cawley, S.; Hall, L. H.; Chen, M.-H.; Grant, D. F. Development of a Reverse Phase HPLC Retention Index Model for Nontargeted Metabolomics Using Synthetic Compounds. J. Chem. Inf. Model. 2018, 58, 591−604.
(37) Moawad, A. A.; Silge, A.; Bocklitz, T.; Fischer, K.; Rösch, P.; Roesler, U.; Elschner, M. C.; Popp, J.; Neubauer, H. A Machine Learning-Based Raman Spectroscopic Assay for the Identification of Burkholderia mallei and Related Species. Molecules 2019, 24, 4516.
(38) Kusić, D.; Rösch, P.; Popp, J. Fast Label-Free Detection of Legionella spp. in Biofilms by Applying Immunomagnetic Beads and Raman Spectroscopy. Syst. Appl. Microbiol. 2016, 39, 132−40.
(39) Yu, F. L.; Zhao, N.; Wu, Z. S.; Huang, M.; Wang, D.; Zhang, Y. B.; Hu, X.; Chen, X. L.; Huang, L. Q.; Pang, Y. X. NIR Rapid Assessments of Blumea balsamifera (Ai-Na-Xiang) in China. Molecules 2017, 22, 1730.
(40) Lund, J. A.; Brown, P. N.; Shipley, P. R. Differentiation of Crataegus spp. Guided by Nuclear Magnetic Resonance Spectrometry with Chemometric Analyses. Phytochemistry 2017, 141, 11−19.
(41) Lu, J.; Chen, C.; Wang, H.; Wen, Y. Establishing a Machine Learning Model for Cancer Anticipation and a Method of Detecting Cancer by Using Multiple Tumor Markers in the Machine Learning Model for Cancer Anticipation. US2018173847A1.
(42) Bazemore, K. Machine Learning Algorithms and Applied Diagnostic Methods for Predicting Diseases. US2019353639A1.
(43) Shi, T.; Ding, W.; Chen, G. Methylation Markers, and Application Thereof in Cancer Diagnosis and Classification. CN109680060A.
(44) Barnes, M.; Bifulco, C.; Chen, T.; Tubbs, A. Imaging Processing System Using Convolutional Neural Network for Processing Imaging of Biological Staining of Animal Tissue and Cells. WO 2015181371A1; AU2015265811-A1; CA2944831-A1; EP3149700-A1; US2017154420-A1; JP2017529513-W; EP3149700-B1; US10275880-B2; CA2944831-C; AU2015265811-B2; JP6763781-B2.
(45) Horimoto, K.; Fukui, K. Toxicity Learning Apparatus, Toxicity Learning Method, Learned Model, Toxicity Prediction Apparatus and Program. JP2020025471A.
(46) Sogawa, I.; Kimura, A.; Koh, Y.; Kimura, A.; Ko, Y.; Sogawa, I.; Koh, Y. Method and Device for Detecting Tumor Cells by Analyzing Spectroscopic Data by Statistical Method. WO2017195772A1; JP2017203637-A; IN201847044553-A; CN109073547-A; US2019072484-A1; EP3457116-A1; EP3457116-A4.
(47) Park, J. Y.; Oh, J. Y.; Kim, J. J.; Lee, B. S.; Yang, H. S.; Sik, Y. H. Clinical Diagnostic Data Processing System for Predicting Mortality Risk Levels. WO2020130238A1; KR2020075477-A.
(48) Park, H. S.; Hyoung, R.; Chung, S. Method and Apparatus for Identifying Strains Based on Mass Spectrometry and Mass Spectra Peak Database. KR 2020050434 A.
(49) Shimizu, Y.; Takashima, N. Event Estimation Method, Event Estimation Program and Server Device, Biological Information Measurement Device, Event Estimation System. JP 2020031701 A.
(50) Sik, H. W.; Park, J. S. Computerized Diagnostic Information System Method Using Biomarker Genes and Proteins for Risk Assessment of Urogenital System Cancers and Drug Screening. KR 2020074555 A; KR2164052-B1.
(51) Segal, E.; Bar, N.; Korem, T. Predicting Blood Metabolites. WO 2020157762 A1.
(52) Apte, Z.; Richman, J.; Almonacid, D.; Pedroso, I.; Dumas, V.; Marquez, V.; Araya, I.; Castro, R.; Saavedra, M.; Alegria, M. Method and System for Characterization of Metabolism-Associated Conditions, Including Diagnostics and Therapies, Based on Bioinformatics Approach. WO2019178610A1; SG11202002522-A1; AU2019233926-A1; CN111373481-A; KR2020132954-A; EP3766073-A1; US2021074384-A1.
(53) Dutta, A.; Kashefhaghighi, D.; Kia, A.; Jaganathan, K.; Gobbel, J. R. Artificial Intelligence-Based Sequencing. WO 2020191391 A2; WO2020191391-A3; AU2020240141-A1; CA3104951-A1.
(54) Lim, J. S.; Vandermause, J.; van Spronsen, M. A.; Musaelian, A.; Xie, Y.; Sun, L.; O’Connor, C. R.; Egle, T.; Molinari, N.; Florian, J.; Duanmu, K.; Madix, R. J.; Sautet, P.; Friend, C. M.; Kozinsky, B. Evolution of Metastable Structures at Bimetallic Surfaces from Microscopy and Machine-Learning Molecular Dynamics. J. Am. Chem. Soc. 2020, 142, 15907−15916.
(55) Takahashi, K.; Takahashi, L.; Miyazato, I.; Tanaka, Y. Searching for Hidden Perovskite Materials for Photovoltaic Systems by Combining Data Science and First Principle Calculations. ACS Photonics 2018, 5, 771−775.
(56) Liu, C.; Li, Y.; Takao, M.; Toyao, T.; Maeno, Z.; Kamachi, T.; Hinuma, Y.; Takigawa, I.; Shimizu, K.-i. Frontier Molecular Orbital Based Analysis of Solid-Adsorbate Interactions over Group 13 Metal Oxide Surfaces. J. Phys. Chem. C 2020, 124, 15355−15365.
(57) Li, X.; Xie, Y.; Hu, D.; Lan, Z. Analysis of the Geometrical Evolution in on-the-Fly Surface-Hopping Nonadiabatic Dynamics with Machine Learning Dimensionality Reduction Approaches: Classical Multidimensional Scaling and Isometric Feature Mapping. J. Chem. Theory Comput. 2017, 13, 4611−4623.
(58) Deng, C.; Su, Y.; Li, F.; Shen, W.; Chen, Z.; Tang, Q. Understanding Activity Origin for the Oxygen Reduction Reaction on Bi-Atom Catalysts by DFT Studies and Machine-Learning. J. Mater. Chem. A 2020, 8, 24563−24571.
(59) Quaranta, V.; Behler, J.; Hellström, M. Structure and Dynamics of the Liquid-Water/Zinc-Oxide Interface from Machine Learning Potential Simulations. J. Phys. Chem. C 2019, 123, 1293−1304.
(60) Lansford, J. L.; Vlachos, D. G. Spectroscopic Probe Molecule Selection Using Quantum Theory, First-Principles Calculations, and Machine Learning. ACS Nano 2020, 14, 17295−17307.
(61) Datar, A.; Chung, Y. G.; Lin, L.-C. Beyond the BET Analysis: The Surface Area Prediction of Nanoporous Materials Using a Machine Learning Method. J. Phys. Chem. Lett. 2020, 11, 5412−5417.
(62) Qi, X.; Ma, W.; Dang, Y.; Su, W.; Liu, L. Optimization of the Melt/Crystal Interface Shape and Oxygen Concentration During the Czochralski Silicon Crystal Growth Process Using an Artificial Neural Network and a Genetic Algorithm. J. Cryst. Growth 2020, 548, 125828.
(63) Karim, M. R.; Ferrandon, M.; Medina, S.; Sture, E.; Kariuki, N.; Myers, D. J.; Holby, E. F.; Zelenay, P.; Ahmed, T. Coupling High-Throughput Experiments and Regression Algorithms to Optimize PGM-Free ORR Electrocatalyst Synthesis. ACS Applied Energy Materials 2020, 3, 9083−9088.
(64) Khmaissia, F.; Frigui, H.; Andriotis, A. N.; Menon, M. Data Driven Modeling of Magnetism in Dilute Magnetic Semiconductors: Correlation between the Magnetic Features of Diluted Magnetic Semiconductors and Electronic Properties of the Constituent Atoms. J. Phys.: Condens. Matter 2019, 31, 445901.
(65) Rong, K.; Wei, J.; Huang, L.; Fang, Y.; Dong, S. Synthesis of Low Dimensional Hierarchical Transition Metal Oxides Via a Direct Deep Eutectic Solvent Calcining Method for Enhanced Oxygen Evolution Catalysis. Nanoscale 2020, 12, 20719−20725.
(66) Wu, T.; Wang, J. Deep Mining Stable and Nontoxic Hybrid Organic-Inorganic Perovskites for Photovoltaics Via Progressive Machine Learning. ACS Appl. Mater. Interfaces 2020, 12, 57821−57831.
(67) Suresh, T.; Sivarajasekar, N.; Balasubramani, K. Enhanced Ultrasonic Assisted Biodiesel Production from Meat Industry Waste (Pig Tallow) Using Green Copper Oxide Nanocatalyst: Comparison of Response Surface and Neural Network Modelling. Renewable Energy 2021, 164, 897−907.
(68) Wu, L.; Guo, T.; Li, T. Rational Design of Transition Metal Single-Atom Electrocatalysts: A Simulation-Based, Machine Learning-Accelerated Study. J. Mater. Chem. A 2020, 8, 19290−19299.
(69) Ulissi, Z. W.; Singh, A. R.; Tsai, C.; Nørskov, J. K. Automated Discovery and Construction of Surface Phase Diagrams Using Machine Learning. J. Phys. Chem. Lett. 2016, 7, 3931−3935.
(70) Graziosi, P.; Kumarasinghe, C.; Neophytou, N. Material Descriptors for the Discovery of Efficient Thermoelectrics. ACS Applied Energy Materials 2020, 3, 5913−5926.
(71) Wang, Z.; Zhang, H.; Li, J. Accelerated Discovery of Stable Spinels in Energy Systems Via Machine Learning. Nano Energy 2021, 81, 105665.
(72) Masood, H.; Toe, C. Y.; Teoh, W. Y.; Sethu, V.; Amal, R. Machine Learning for Accelerated Discovery of Solar Photocatalysts. ACS Catal. 2019, 9, 11774−11787.
(73) Davies, D. W.; Butler, K. T.; Walsh, A. Data-Driven Discovery of Photoactive Quaternary Oxides Using First-Principles Machine Learning. Chem. Mater. 2019, 31, 7221−7230.
(74) Li, Z.; Achenie, L. E. K.; Xin, H. An Adaptive Machine Learning Strategy for Accelerating Discovery of Perovskite Electrocatalysts. ACS Catal. 2020, 10, 4377−4384.
(75) Gheith, M. E. M.; Stobert, I.; Hamouda, A. Curvilinear Mask Models in Semiconductor Structure Manufacture by Machine Learning. US 10831977 B1; US2020380089-A1.
(76) Chang, B. Y.; Jang, B. Y.; Zhang, F. Substrate Treating Apparatus and Substrate Treating Method. US 2020192308 A1; KR2020072060-A; CN111312613-A.
(77) Van Den Brink, M.; Cao, Y.; Zou, Y.; Van Den Brink, M. A. Machine Learning Based Inverse Optical Proximity Correction and Process Model Calibration. WO 2019238372 A1; TW202001447-A; KR2021010897-A; CN112384860-A.
(78) Yati, A. Defect Discovery Using Electron Beam Inspection and Deep Learning with Real-Time Intelligence to Reduce Nuisance. US 2019213733 A1; WO2019136190-A1; TW201939634-A; KR2020096993-A; CN111542915-A; US10970834-B2.
(79) Chang, R. Photolithography Mask Design-Rule Check Assistance Using Computer for Semiconductor Wafer Manufacture. US 10713411 B1.
(80) Wang, D. Y.; Salcin, E.; Friedmann, M.; Shaughnessy, D.; Shchegrov, A. V.; Madsen, J. M.; Kuznetsov, A. Methods and Systems for Co-Located Metrology of Semiconductor Structures. US 2020243400 A1; WO2020154152-A1; US10804167-B2.
(81) Honda, T.; Kekatpure, R. D.; David, J. D. Temporal Dependencies of Process Targets for Different Machine Learning Models for Robust Machine Learning Predictions for Semiconductor Manufacturing Processes. US 2018356807 A1.
(82) David, J. D. Process Control Techniques for Semiconductor Manufacturing Processes. US 2016148850 A1; WO2016086138-A1; KR2017086585-A; CN107004060-A; JP2017536584-W; US2018358271-A1; US10734293-B2; JP6751871-B2.
(83) Lauber, J.; Vajaria, H.; Zhang, Y.; Yong, Z. Multi-Step Image Alignment Method for Large Offset Die-Die Inspection for Defects in Semiconductor Device Manufacture. US 2019122913 A1; WO2019079658-A1; TW201928541-A; US10522376-B2; CN111164646-A; KR2020060519-A; EP3698322-A1; JP2021500740-W.
(84) Bhosale, P.; Rizzolo, M.; Yang, C. Automated Method for Integrated Analysis of Back End of the Line Yield, Line Resistance/Capacitance and Semiconductor Device Fabrication Process Performance. US2018349535A1; US10303829-B2.
(85) Sriraman, H. P.; Pathangi, H. S. Defect Detection on Semiconductor Wafers, Classification, and Process Window Control by SEM. US2019287238A1; WO2019177800-A1; TW201941162-A; US10679333-B2; KR2020122401-A; CN111837225-A.
(86) Kwon, N.; Kang, H.; Kim, Y.; Quan, N. Semiconductor Defect Classification Device, Method for Classifying Defect of Semiconductor, and Semiconductor Defect Classification System. US2019188840 A1; KR2019073756-A; CN110060228-A; US10713778-B2.
(87) Lai, W.; Song, W.; Li, X.; Yan, Y.; Huang, W. Intrinsic Stretchable Electroluminescent Block Copolymer Elastomer and Preparation Method Thereof. CN111635504A.
(88) Blaier, O.; Schiller, E. Optimization for 3D Printing. US2019366644A1; EP3584723-A2; EP3584723-A3; US10926475-B2.
(89) Deetz, J. D.; Wood, C. E.; Truong, R. A. Resin Viscosity Detection in Additive Manufacturing. US2020338830A1.
(90) Chen, A.; Zheng, X. Repairable Multi-Response Deformable Liquid Crystal Elastomer Film and Its Preparation Method and Application in Artificial Intelligence. CN108727544A; CN108727544-B.
(91) Chhabra, S.; Xie, J.; Frank, A. T. RNAPosers: Machine Learning Classifiers for Ribonucleic Acid-Ligand Poses. J. Phys. Chem. B 2020, 124, 4436−4445.
(92) Shamsara, J.; Schüürmann, G. A Machine Learning Approach to Discriminate MR1 Binders: The Importance of the Phenol and Carbonyl Fragments. J. Mol. Struct. 2020, 1217, 128459.
(93) Ji, B.-Y.; You, Z.-H.; Jiang, H.-J.; Guo, Z.-H.; Zheng, K. Prediction of Drug-Target Interactions from Multi-Molecular Network Based on LINE Network Representation Method. J. Transl. Med. 2020, 18, 347.
(94) Yuan, Y.; Chang, S.; Zhang, Z.; Li, Z.; Li, S.; Xie, P.; Yau, W.-P.; Lin, H.; Cai, W.; Zhang, Y.; Xiang, X. A Novel Strategy for Prediction of Human Plasma Protein Binding Using Machine Learning Techniques. Chemom. Intell. Lab. Syst. 2020, 199, 103962.
(95) Aniceto, N.; Freitas, A. A.; Bender, A.; Ghafourian, T. Simultaneous Prediction of Four ATP-Binding Cassette Transporters’ Substrates Using Multi-Label QSAR. Mol. Inf. 2016, 35, 514−528.
(96) Zhao, Y.; Zheng, K.; Guan, B.; Guo, M.; Song, L.; Gao, J.; Qu, H.; Wang, Y.; Shi, D.; Zhang, Y. DLDTI: A Learning-Based Framework for Drug-Target Interaction Identification Using Neural Networks and Network Representation. J. Transl. Med. 2020, 18, 434.
(97) Hochuli, J.; Helbling, A.; Skaist, T.; Ragoza, M.; Koes, D. R. Visualizing Convolutional Neural Network Protein-Ligand Scoring. J. Mol. Graphics Modell. 2018, 84, 96−108.
(98) Erdas, O.; Andac, C. A.; Gurkan-Alp, A. S.; Alpaslan, F. N.; Buyukbingol, E. Compressed Images for Affinity Prediction-2 (CIFAP-2): An Improved Machine Learning Methodology on Protein-Ligand Interactions Based on a Study on Caspase 3 Inhibitors. J. Enzyme Inhib. Med. Chem. 2015, 30, 809−15.
(99) Fan, J.; Liu, K.; Xiangyan, S. Computational Method for Classifying and Predicting Ligand Docking Conformations. WO2018213767A1; US2018341754-A1; EP3427170-A1; EP3427170-A4.
(100) Feinberg, E. N.; Pande, V. S. Machine Learning and Molecular Simulation Based Methods for Enhancing Binding and Activity Prediction. US2019272887A1; WO2019173407-A1; CA3093260-A1; AU2019231261-A1; KR2020128710-A; EP3762730-A1; CN112204402-A.
(101) Mamoshina, P.; Volosnikova, M.; Ozerov, I. V.; Putin, E.; Skibina, E.; Cortese, F.; Zhavoronkov, A. Machine Learning on Human Muscle Transcriptomic Data for Biomarker Discovery and Tissue-Specific Drug Target Identification. Front. Genet. 2018, DOI: 10.3389/fgene.2018.00242.
(102) Han, M.; Liu, Q.; Yu, J.; Zheng, S. Identification of Candidate Molecular Markers Predicting Chemotherapy Resistance in Non-Small Cell Lung Cancer. Clin. Chem. Lab. Med. 2010, 48, 863−867.
(103) Thishya, K.; Vattam, K. K.; Naushad, S. M.; Raju, S. B.; Kutala, V. K. Artificial Neural Network Model for Predicting the Bioavailability of Tacrolimus in Patients with Renal Transplantation. PLoS One 2018, 13, No. e0191921.
(104) Liu, Q.; Muglia, L. J.; Huang, L. F. Network as a Biomarker: A Novel Network-Based Sparse Bayesian Machine for Pathway-Driven Drug Response Prediction. Genes 2019, 10, 602.
(105) Miyoshi, F.; Honne, K.; Minota, S.; Okada, M.; Ogawa, N.; Mimura, T. A Novel Method Predicting Clinical Response Using Only Background Clinical Data in RA Patients before Treatment with Infliximab. Mod. Rheumatol. 2016, 26, 813−816.
(106) Khojasteh, M.; Martin, J.; Pestic-Dragovich, L.; Tang, L.;
Internal Combustion Engine. DE102015208359A1; WO2016177531-
Wang, X.; Zhang, W.; Anders, R.; Diaz, L. Methods and Systems for
A1; DE102015208359-B4; CN107624144-A; US2018112631-A1;
Predicting Response to Pd-1 Axis Directed Therapeutics of Tumors. US10451009-B2; CN107624144-B. WO 2020072348 A1.
(126) Senior, A. W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.;
(107) Yang, H.-Y. Prediction of Pneumoconiosis by Serum and
Green, T.; Qin, C. L.; Zidek, A.; Nelson, A. W. R.; Bridgland, A.;
Urinary Biomarkers in Workers Exposed to Asbestos-Contaminated
Penedones, H.; Petersen, S.; Simonyan, K.; Crossan, S.; Kohli, P.;
Minerals. PLoS One 2019, 14, No. e0214808.
Jones, D. T.; Silver, D.; Kavukcuoglu, K.; Hassabis, D. Improved
(108) Schrey, A. K.; Nickel-Seeber, J.; Drwal, M. N.; Zwicker, P.;
Protein Structure Prediction Using Potentials from Deep Learning.
Schultze, N.; Haertel, B.; Preissner, R. Computational Prediction of Nature 2020, 577, 706−710.
Immune Cell Cytotoxicity. Food Chem. Toxicol. 2017, 107, 150−166.
(127) Stokes, J. M.; Yang, K.; Swanson, K.; Jin, W. G.; Cubillos-Ruiz,
(109) Lee, J. J.; Miller, J. A.; Basu, S.; Kee, T. V.; Loo, L. H. Building
A.; Donghia, N. M.; MacNair, C. R.; French, S.; Carfrae, L. A.;
Predictive in Vitro Pulmonary Toxicity Assays Using High-
Bloom-Ackermann, Z.; Tran, V. M.; Chiappino-Pepe, A.; Badran, A.
Throughput Imaging and Artificial Intelligence. Arch. Toxicol. 2018,
H.; Andrews, I. W.; Chory, E. J.; Church, G. M.; Brown, E. D.; 92, 2055−2075.
Jaakkola, T. S.; Barzilay, R.; Collins, J. J. A Deep Learning Approach
(110) Hamadache, M.; Hanini, S.; Benkortbi, O.; Amrane, A.;
to Antibiotic Discovery. Cell 2020, 180, 688−702.
Khaouane, L.; Moussa, C. S. Artificial Neural Network-Based
(128) Almagro Armenteros, J. J.; Tsirigos, K. D.; Sonderby, C. K.;
Equation to Predict the Toxicity of Herbicides on Rats. Chemom.
Petersen, T. N.; Winther, O.; Brunak, S.; von Heijne, G.; Nielsen, H.
Intell. Lab. Syst. 2016, 154, 7−15.
SignalP 5.0 Improves Signal Peptide Predictions Using Deep Neural
(111) Noskov, S.; Wacker, S.; Duff, H.; Guo, J. Systems and
Networks. Nat. Biotechnol. 2019, 37, 420−423.
Methods for Predicting Cardiotoxicity of Molecular Parameters of a
(129) Newman, A. M.; Steen, C. B.; Liu, C. L.; Gentles, A. J.;
Compound Based on Machine Learning Algorithms.
Chaudhuri, A. A.; Scherer, F.; Khodadoust, M. S.; Esfahani, M. S.;
WO2016201575A1; US2018172667-A1.
Luca, B. A.; Steiner, D.; Diehn, M.; Alizadeh, A. A. Determining Cell
(112) Lee, F. K.; Friesth, K. L. Quintuple-Effect Generation Multi-
Type Abundance and Expression from Bulk Tissues with Digital
Cycle Hybrid Renewable Energy System with Integrated Energy
Cytometry. Nat. Biotechnol. 2019, 37, 773−782.
Provisioning, Storage Facilities and Amalgamated Control System
(130) Jaganathan, K.; Kyriazopoulou Panagiotopoulou, S.; McRae, J.
Cross-Reference to Related Applications. US 2015143806 A1;
F.; Darbandi, S. F.; Knowles, D.; Li, Y. I.; Kosmicki, J. A.; Arbelaez, J.;
AU2015203118-A1; EP2955372-A2; JP2016000995-A; CA2891435-
Cui, W. W.; Schwartz, G. B.; Chow, E. D.; Kanterakis, E.; Gao, H.;
A1; CN105257425-A; EP2955372-A3; HK1218148-A0;
Kia, A.; Batzoglou, S.; Sanders, S. J.; Farh, K. K. H. Predicting Splicing
US10060296-B2; BR102015013592-A2.
from Primary Sequence with Deep Learning. Cell 2019, 176, 535−
(113) Rangarajan, K.; Winston, J. B.; Jain, A.; Wang, X.; Jian, A.; 548.
Rangarajan, K. P. Integrated Surveillance and Control of Oilfield
(131) van Galen, P.; Hovestadt, V.; Wadsworth, M. H.; Hughes, T.
Activity. FR3070178A1; WO2019040125-A1; AU2018319552-A1;
K.; Griffin, G. K.; Battaglia, S.; Verga, J. A.; Stephansky, J.; Pastika, T.
CA3065094-A1; NO201901443-A; US2020182036-A1; GB2579739-
J.; Lombardi Story, J.; Pinkus, G. S.; Pozdnyakova, O.; Galinsky, I.; A.
Stone, R. M.; Graubert, T. A.; Shalek, A. K.; Aster, J. C.; Lane, A. A.;
(114) Tang, J.; Hou, J. Intelligent Ammonia Injection Control
Bernstein, B. E. Single-Cell RNA-Seq Reveals AML Hierarchies
Method and Intelligent Ammonia Injection Controller.
Relevant to Disease Progression and Immunity. Cell 2019, 176, CN111804146A. 1265−1281.
(115) Shen, D.; Liu, G.; Wang, Q.; Luo, K. Accurate Ammonia
(132) Capper, D.; Jones, D. T. W.; Sill, M.; Hovestadt, V.; Schrimpf,
Injection Control Method for SCR System with Strong Self-Adaptive
D.; Sturm, D.; Koelsche, C.; Sahm, F.; Chavez, L.; Reuss, D. E.; Kratz,
Ability. CN109046021A; CN109046021-B.
A.; Wefers, A. K.; Huang, K.; Pajtler, K. W.; Schweizer, L.; Stichel, D.;
(116) Liu, X.; Lin, P.; Hu, G.; Wu, K. A Kind of Scr Downstream
Olar, A.; Engel, N. W.; Lindenberg, K.; Harter, P. N.; Braczynski, A.
Nox Closed-Loop Process Control Method and System.
K.; Plate, K. H.; Dohmen, H.; Garvalov, B. K.; Coras, R.; Holsken, A.;
CN109339916A; WO2020062865-A1; CN109339916-B.
Hewer, E.; Bewerunge-Hudler, M.; Schick, M.; Fischer, R.;
(117) Guo, L.; Zou, S.; Liu, W.; Wu, Q.; Zhou, W.; Zhao, X.; Yao,
Beschorner, R.; Schittenhelm, J.; Staszewski, O.; Wani, K.; Varlet,
T.; Xu, Q. Oil and Gas Gathering and Transportation Riser System
P.; Pages, M.; Temming, P.; Lohmann, D.; Selt, F.; Witt, H.; Milde,
Harmful Flow Type Warning Method and System, Control Method
and System. CN109458561A; WO2020082749-A1.
T.; Witt, O.; Aronica, E.; Giangaspero, F.; Rushing, E.; Scheurlen, W.;
(118) Li, Y.; Dong, S.; Wu, Q.; Zhang, S. Flue Gas Desulfurization
Geisenberger, C.; Rodriguez, F. J.; Becker, A.; Preusser, M.; Haberler,
Device with Flue Gas Monitoring and Control Function.
C.; Bjerkvig, R.; Cryan, J.; Farrell, M.; Deckert, M.; Hench, J.; Frank, CN110756039 A; CN211753913-U.
S.; Serrano, J.; Kannan, K.; Tsirigos, A.; Bruck, W.; Hofer, S.;
(119) Meng, L.; Gu, X.; Ma, W.; Ning, X.; Jiang, C.; Li, Y.; Jia, Y. Scr
Brehmer, S.; Seiz-Rosenhagen, M.; Hanggi, D.; Hans, V.; Rozsnoki,
Denitration Ammonia-Spraying Optimization Method and System
S.; Hansford, J. R.; Kohlhof, P.; Kristensen, B. W.; Lechner, M.;
Based on Advanced Measuring Meter and Advanced Control
Lopes, B.; Mawrin, C.; Ketter, R.; Kulozik, A.; Khatib, Z.; Heppner, Algorithm. CN108837698A.
F.; Koch, A.; Jouvet, A.; Keohane, C.; Muhleisen, H.; Mueller, W.;
(120) Kim, K. S. Control System of Dual-Fuel Engine.
Pohl, U.; Prinz, M.; Benner, A.; Zapatka, M.; Gottardo, N. G.; US2019093572A1; US10260432-B1.
Driever, P. H.; Kramm, C. M.; Muller, H. L.; Rutkowski, S.; von Hoff,
(121) Brummel, H.; Pfeifer, U.; Sterzing, V. Method and Assembly
K.; Fruhwald, M. C.; Gnekow, A.; Fleischhack, G.; Tippelt, S.;
for Controlling a Combustion Engine with Multiple Burners.
Calaminus, G.; Monoranu, C.-M.; Perry, A.; Jones, C.; Jacques, T. S.; EP3726139A1; WO2020212067-A1.
Radlwimmer, B.; Gessi, M.; Pietsch, T.; Schramm, J.; Schackert, G.;
(122) Eun, L. Device and Method for Efficiently Controlling Fuel
Westphal, M.; Reifenberger, G.; Wesseling, P.; Weller, M.; Collins, V.
Additive Injector of Engine. KR2180985B1.
P.; Blumcke, I.; Bendszus, M.; Debus, J.; Huang, A.; Jabado, N.;
(123) Chen, S. K.; Mandal, A.; Chien, L.; Ortiz-Soto, E. Machine
Northcott, P. A.; Paulus, W.; Gajjar, A.; Robinson, G. W.; Taylor, M.
Learning for Misfire Detection in a Dynamic Firing Level Modulation
D.; Jaunmuktane, Z.; Ryzhova, M.; Platten, M.; Unterberg, A.; Wick,
Controlled Engine of a Vehicle. US2019145859A1; WO2019099228-
W.; Karajannis, M. A.; Mittelbronn, M.; Acker, T.; Hartmann, C.; A1; US10816438-B2.
Aldape, K.; Schuller, U.; Buslei, R.; Lichter, P.; Kool, M.; Herold-
(124) Denys, F.; Leroy, T.; Ngo, C.; Rudloff, J. Device and Method
Mende, C.; Ellison, D. W.; Hasselblatt, M.; Snuderl, M.; Brandner, S.;
for Control of a Vehicle Thermal Engine. FR3085442A1.
Korshunov, A.; von Deimling, A.; Pfister, S. M. DNA Methylation-
(125) Streib, M.; Luo, L.; Klinkhammer, T.; Leuz, M.; Kluth, C.;
Based Classification of Central Nervous System Tumours. Nature
Polach, S.; Pollach, S. Method for Controlling Engine Knock of an 2018, 555, 469−474. 3211
https://doi.org/10.1021/acs.jcim.1c00619
J. Chem. Inf. Model. 2021, 61, 3197−3212
Journal of Chemical Information and Modeling pubs.acs.org/jcim Review
(133) Segler, M. H. S.; Kogej, T.; Tyrchan, C.; Waller, M. P.
(146) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny,
Generating Focused Molecule Libraries for Drug Discovery with
M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J.
Recurrent Neural Networks. ACS Cent. Sci. 2018, 4, 120−131.
Machine-Learning-Assisted Materials Discovery Using Failed Experi-
(134) Jespersen, M. C.; Peters, B.; Nielsen, M.; Marcatili, P.
ments. Nature 2016, 533, 73−76.
Bepipred-2.0: Improving Sequence-Based B-Cell Epitope Prediction
(147) Podgorski, J.; Berg, M. Global Threat of Arsenic in
Using Conformational Epitopes. Nucleic Acids Res. 2017, 45, W24−
Groundwater. Science 2020, 368, 845−850. W29.
(148) Wang, H. D.; Rivenson, Y.; Jin, Y. Y.; Wei, Z. S.; Gao, R.;
(135) Iorio, F.; Knijnenburg, T. A.; Vis, D. J.; Bignell, G. R.;
Gunaydin, H.; Bentolila, L. A.; Kural, C.; Ozcan, A. Deep Learning
Menden, M. P.; Schubert, M.; Aben, N.; Goncalves, E.; Barthorpe, S.;
Enables Cross-Modality Super-Resolution in Fluorescence Micros-
Lightfoot, H.; Cokelaer, T.; Greninger, P.; van Dyk, E.; Chang, H.; de
copy. Nat. Methods 2019, 16, 103−110.
Silva, H.; Heyn, H.; Deng, X. M.; Egan, R. K.; Liu, Q. S.; Mironenko,
(149) Arganda-Carreras, I.; Kaynig, V.; Rueden, C.; Eliceiri, K. W.;
T.; Mitropoulos, X.; Richardson, L.; Wang, J. H.; Zhang, T. H.;
Schindelin, J.; Cardona, A.; Seung, H. S. Trainable Weka
Moran, S.; Sayols, S.; Soleimani, M.; Tamborero, D.; Lopez-Bigas, N.;
Segmentation: A Machine Learning Tool for Microscopy Pixel
Ross-Macdonald, P.; Esteller, M.; Gray, N. S.; Haber, D. A.; Stratton,
Classification. Bioinformatics 2017, 33, 2424−2426.
M. R.; Benes, C. H.; Wessels, L. F. A.; Saez-Rodriguez, J.; McDermott,
(150) Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M. Y.;
U.; Garnett, M. J. A Landscape of Pharmacogenomic Interactions in
Geiger, T.; Mann, M.; Cox, J. The Perseus Computational Platform
Cancer. Cell 2016, 166, 740−754.
for Comprehensive Analysis of (Prote)Omics Data. Nat. Methods
(136) Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.; 2016, 13, 731−740.
Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-
(151) Coley, C. W.; Thomas, D. A.; Lummiss, J. A. M.; Jaworski, J.
Pompan, M.; Suez, J.; Mahdi, J. A.; Matot, E.; Malka, G.; Kosower, N.;
N.; Breen, C. P.; Schultz, V.; Hart, T.; Fishman, J. S.; Rogers, L.; Gao,
Rein, M.; Zilberman-Schapira, G.; Dohnalova, L.; Pevsner-Fischer,
H. Y.; Hicklin, R. W.; Plehiers, P. P.; Byington, J.; Piotti, J. S.; Green,
M.; Bikovsky, R.; Halpern, Z.; Elinav, E.; Segal, E. Personalized
W. H.; Hart, A. J.; Jamison, T. F.; Jensen, K. F. A Robotic Platform for
Nutrition by Prediction of Glycemic Responses. Cell 2015, 163,
Flow Synthesis of Organic Compounds Informed by Ai Planning. 1079−1094. Science 2019, 365, eaax1566.
(137) Kircher, M.; Witten, D. M.; Jain, P.; O’Roak, B. J.; Cooper, G.
(152) Segler, M. H. S.; Preuss, M.; Waller, M. P. Planning Chemical
M.; Shendure, J. A General Framework for Estimating the Relative
Syntheses with Deep Neural Networks and Symbolic Ai. Nature 2018,
Pathogenicity of Human Genetic Variants. Nat. Genet. 2014, 46, 310− 555, 604−610. 315.
(153) Ahneman, D. T.; Estrada, J. G.; Lin, S. S.; Dreher, S. D.;
(138) Subramanian, S.; Huq, S.; Yatsunenko, T.; Haque, R.; Mahfuz,
Doyle, A. G. Predicting Reaction Performance in C-N Cross-Coupling
M.; Alam, M. A.; Benezra, A.; DeStefano, J.; Meier, M. F.; Muegge, B.
Using Machine Learning. Science 2018, 360, 186−190.
D.; Barratt, M. J.; VanArendonk, L. G.; Zhang, Q. Y.; Province, M. A.;
(154) Unke, O. T.; Meuwly, M. Physnet: A Neural Network for
Petri, W. A.; Ahmed, T.; Gordon, J. I. Persistent Gut Microbiota
Predicting Energies, Forces, Dipole Moments, and Partial Charges. J.
Immaturity in Malnourished Bangladeshi Children. Nature 2014, 510,
Chem. Theory Comput. 2019, 15, 3678−3693. 417−421.
(155) Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.;
(139) Zhong, M.; Tran, K.; Min, Y. M.; Wang, C. H.; Wang, Z. Y.;
Schutt, K. T.; Muller, K. R. Machine Learning of Accurate Energy-
Dinh, C. T.; De Luna, P.; Yu, Z. Q.; Rasouli, A. S.; Brodersen, P.; Sun,
Conserving Molecular Force Fields. Science Advances 2017, 3,
S.; Voznyy, O.; Tan, C. S.; Askerka, M.; Che, F. L.; Liu, M.; e1603015.
Seifitokaldani, A.; Pang, Y. J.; Lo, S. C.; Ip, A.; Ulissi, Z.; Sargent, E.
(156) Carleo, G.; Troyer, M. Solving the Quantum Many-Body
H. Accelerated Discovery of Co2 Electrocatalysts Using Active
Problem with Artificial Neural Networks. Science 2017, 355, 602−605.
Machine Learning. Nature 2020, 581, 178−183.
(157) Schutt, K. T.; Arbabzadah, F.; Chmiela, S.; Muller, K. R.;
(140) Manipatruni, S.; Nikonov, D. E.; Lin, C. C.; Gosavi, T. A.; Liu,
Tkatchenko, A. Quantum-Chemical Insights from Deep Tensor
H. C.; Prasad, B.; Huang, Y. L.; Bonturim, E.; Ramesh, R.; Young, I.
Neural Networks. Nat. Commun. 2017, DOI: 10.1038/ncomms13890.
A. Scalable Energy-Efficient Magnetoelectric Spin-Orbit Logic. Nature
(158) Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, 2019, 565, 35−42.
S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; von Lilienfeld,
(141) Zhu, X. J.; Li, D.; Liang, X. G.; Lu, W. D. Ionic Modulation
O. A. Prediction Errors of Molecular Machine Learning Models
and Ionic Coupling Effects in Mos2 Devices for Neuromorphic
Lower Than Hybrid Dft Error. J. Chem. Theory Comput. 2017, 13,
Computing. Nat. Mater. 2019, 18, 141−148. 5255−5264.
(142) Bai, Y.; Wilbraham, L.; Slater, B. J.; Zwijnenburg, M. A.;
(159) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A.
Sprick, R. S.; Cooper, A. I. Accelerated Discovery of Organic Polymer
Big Data Meets Quantum Chemistry Approximations: The Delta-
Photocatalysts for Hydrogen Evolution from Water through the
Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087−2096.
Integration of Experiment and Theory. J. Am. Chem. Soc. 2019, 141,
(160) European Patent Office. Patent Families. https://www.epo. 9063−9071.
org/searching-for-patents/helpful-resources/first-time-here/patent-
(143) Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural
families.html (accessed June 15, 2021).
Networks for an Accurate and Interpretable Prediction of Material
(161) Gartner Glossary. https://www.gartner.com/en/information-
Properties. Phys. Rev. Lett. 2018, DOI: 10.1103/PhysRev-
technology/glossary/hype-cycle (accessed June 15, 2021). Lett.120.145301.
(144) van de Burgt, Y.; Lubberman, E.; Fuller, E. J.; Keene, S. T.;
Faria, G. C.; Agarwal, S.; Marinella, M. J.; Talin, A. A.; Salleo, A. A
Non-Volatile Organic Electrochemical Device as a Low-Voltage
Artificial Synapse for Neuromorphic Computing. Nat. Mater. 2017, 16, 414−418.
(145) Gomez-Bombarelli, R.; Aguilera-Iparraguirre, J.; Hirzel, T. D.;
Duvenaud, D.; Maclaurin, D.; Blood-Forsythe, M. A.; Chae, H. S.;
Einzinger, M.; Ha, D. G.; Wu, T.; Markopoulos, G.; Jeon, S.; Kang,
H.; Miyazaki, H.; Numata, M.; Kim, S.; Huang, W. L.; Hong, S. I.;
Baldo, M.; Adams, R. P.; Aspuru-Guzik, A. Design of Efficient
Molecular Organic Light-Emitting Diodes by a High-Throughput
Virtual Screening and Experimental Approach. Nat. Mater. 2016, 15, 1120−1127. 3212
https://doi.org/10.1021/acs.jcim.1c00619
J. Chem. Inf. Model. 2021, 61, 3197−3212