Materials define the upper limit of most engineering systems. For example, electric vehicles won’t run 1,000 kilometers on a single charge without fundamental changes in battery materials. The Boeing 787 reduced its weight by 20 percent after replacing aluminum with new composite materials. However, developing a new material involves extensive trial and error, is usually driven by the intuition of materials scientists, and typically takes 10 to 20 years.

The vision of my research is to significantly accelerate this process with a machine learning system. I believe that a machine that learns from all materials data will outperform any materials scientist in every domain, and it might also produce new scientific knowledge by bridging different fields. Practically, growing global climate and environmental challenges demand rapid materials innovation in fields such as batteries and solar cells. The traditional materials development process may be too slow to meet the 2050 climate goals. My research also aims to solve real-world problems and to use them to motivate algorithm development in machine learning.

Neural networks for materials


I developed the Crystal Graph Convolutional Neural Networks (CGCNN) algorithm [Xie et al., Phys. Rev. Lett. (2018)] as a general framework for learning from materials. It preserves all necessary symmetries (permutation, rotation, translation) of a material system and can be used as a general encoder for all types of materials. Since its publication, CGCNN has been used by us and other researchers for inorganic materials [Xie et al., Phys. Rev. Lett. (2018)], amorphous polymers and liquids [Xie et al., Nat. Commun. (2019)], crystalline polymers [Zeng et al., arXiv:1811.06231 (2018)], surfaces [Lym et al., J. Phys. Chem. C (2019)], and more.
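The symmetry claim can be illustrated with a toy example. The sketch below is not the published CGCNN implementation: it assumes scalar atom features, a non-periodic cluster of atoms instead of a periodic crystal, a distance cutoff, and made-up weights (`w`, `cutoff`, `conv_layer`, and `encode` are all hypothetical names). It only shows why building the graph from interatomic distances and pooling over atoms yields an output that does not change when atoms are reordered, translated, or rotated.

```python
import math

def conv_layer(features, positions, cutoff=2.0, w=(0.5, -0.3, 0.8)):
    """One gated graph-convolution step on scalar atom features (toy version).

    Edges connect atoms closer than `cutoff`; the only geometric input is the
    interatomic distance, which is unchanged by rotation and translation.
    """
    updated = []
    for i, (fi, pi) in enumerate(zip(features, positions)):
        message = 0.0
        for j, (fj, pj) in enumerate(zip(features, positions)):
            if i == j:
                continue
            d = math.dist(pi, pj)
            if d > cutoff:
                continue
            z = w[0] * fi + w[1] * fj + w[2] * d
            gate = 1.0 / (1.0 + math.exp(-z))   # sigmoid gate
            core = math.log1p(math.exp(z))      # softplus activation
            message += gate * core              # sum over neighbors
        updated.append(fi + message)            # residual update
    return updated

def encode(features, positions):
    """Mean-pool the updated atom features into one material-level value."""
    hidden = conv_layer(features, positions)
    return sum(hidden) / len(hidden)

def rotate_z(p, angle):
    """Rotate a 3D point about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

# Hypothetical four-atom cluster: one scalar feature and one position per atom.
features = [1.0, 2.0, 0.5, 1.5]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.5)]

base = encode(features, positions)

order = [2, 0, 3, 1]  # relabel the atoms
permuted = encode([features[k] for k in order], [positions[k] for k in order])

# Rotate about z, then shift every atom by the same vector.
moved_positions = [
    tuple(c + t for c, t in zip(rotate_z(p, 0.7), (3.0, -1.0, 2.0)))
    for p in positions
]
moved = encode(features, moved_positions)

print(math.isclose(base, permuted), math.isclose(base, moved))
```

Summation over neighbors and mean pooling over atoms are what make the result independent of atom ordering; any order-dependent aggregation would break that property.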

I also extended CGCNN to learn from different types of materials simulation data, such as DFT and MD. For instance, we showed that CGCNN can predict multiple material properties as accurately as DFT calculations after training with materials. We further extended the method to predict electron density [Gong et al., Phys. Rev. B (2019)] and to learn from time-series MD simulation data by incorporating a dynamic loss function [Xie et al., Nat. Commun. (2019)].

The CGCNN algorithm is open source on GitHub, where it has received over 100 stars, and it has been cited in over 100 papers in the past 1.5 years. It has been reimplemented in popular machine learning packages like PyTorch Geometric and DLTK, as well as materials informatics packages like XenonPy and matminer.

New scientific knowledge from materials models


I also work on extracting new scientific knowledge from machine learning models trained on materials data. New knowledge might emerge because ML models allow us to scrutinize quantities of data too large for humans to process. For example, we discovered new solvation environments for Li-ions in a state-of-the-art amorphous polymer electrolyte by learning from ~50 GB of MD simulation data [Xie et al., Nat. Commun. (2019)], which allowed us to explain an interesting phenomenon recently discovered by experiments [Pesko et al., J. Electrochem. Soc. (2017)]. In addition, I explored the physical meaning of the CGCNN model. By simply learning to predict the formation energy of materials without any other prior knowledge, the model automatically discovers rules of the periodic table, identifies the similarity between different boron structures, and uncovers the stability of different coordination environments [Xie et al., J. Chem. Phys. (2018)].

Solving real-world problems


I aim to solve real-world problems by accelerating materials design with machine learning methods. The lithium metal electrode is one of the next-generation battery technologies that can significantly increase the energy density of batteries, but the main challenge in the field is to achieve dendrite-free cycling [Albertus et al., Nat. Energy (2018)]. By learning from an open materials database, we applied CGCNN to screen ~13,000 materials for their ability to suppress dendrite formation in lithium metal batteries [Ahmad et al., ACS Cent. Sci. (2018)]. We achieved prediction accuracy close to quantum simulations and accelerated the screening by times (excluding the computation time spent by the open database authors) or 3 times (including that computation time). I found that application research often in turn stimulates algorithm innovation. In the above study, we developed an ensemble approach to estimate the prediction uncertainty during extrapolation.
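The idea behind ensemble-based uncertainty can be sketched in a few lines. The example below is an illustration only, not the method from the study above: it swaps the neural networks for one-dimensional least-squares lines fitted to bootstrap resamples of synthetic data, and every name in it (`fit_line`, `train_ensemble`, `predict_with_uncertainty`) is hypothetical. The spread of the ensemble's predictions serves as the uncertainty estimate, and that spread tends to grow as the query moves away from the training range.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def train_ensemble(xs, ys, n_models=20, seed=0):
    """Fit each ensemble member on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the standard deviation across members."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)

# Synthetic training data on [0, 1] with Gaussian noise (made up for this sketch).
rng = random.Random(42)
xs = [i / 29 for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

models = train_ensemble(xs, ys)
mean_near, std_near = predict_with_uncertainty(models, 0.5)  # inside the training range
mean_far, std_far = predict_with_uncertainty(models, 5.0)    # far outside it

print(f"x=0.5: {mean_near:.2f} +/- {std_near:.3f}")
print(f"x=5.0: {mean_far:.2f} +/- {std_far:.3f}")
```

In a screening setting, the same spread could be used to flag candidates whose predictions rest on extrapolation and therefore deserve a direct simulation before being trusted.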