About the Database
The Energy-GNoME database was developed to identify and predict materials suitable for energy applications, such as thermoelectrics, cathodes, and perovskites. The process combines machine learning (ML) techniques with an iterative active learning approach, so that new data can be integrated and the predictions refined continuously.
Material class | # Sub-class | # Materials (unique) | Dashboard |
---|---|---|---|
Cathodes | 9 | 20454 | Explore Cathodes |
Perovskites | 2 | 4259 | Explore Perovskites |
Thermoelectrics | 6 | 7530 | Explore Thermoelectrics |
Protocol overview
Here, we provide a brief overview of the protocol workflow illustrated in Figure 1.
1. Defining the Energy Material Region
We hypothesize a high-dimensional feature space in which an energy material region \(E\) exists, containing the materials suitable for specific energy applications. By leveraging an existing dataset \(M\) (e.g., the Materials Project (MP) database), we identify the intersection \(M^E = M \cap E\), which forms the initial training set for our models.
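As a toy illustration of the intersection \(M^E = M \cap E\), the sketch below treats both \(M\) and the known part of \(E\) as sets of material identifiers. The identifiers are hypothetical, and in practice \(E\) is only known through labelled examples rather than as an explicit set.

```python
# Hypothetical identifiers; real databases use their own ID schemes.
M = {"mat-001", "mat-002", "mat-003", "mat-004"}  # general-purpose database M
E = {"mat-002", "mat-004", "mat-999"}             # materials known to lie in E

# M^E = M ∩ E: entries of M that are also known energy materials.
M_E = M & E
initial_training_set = sorted(M_E)
print(initial_training_set)  # ['mat-002', 'mat-004']
```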
2. Two-Phase Workflow
The protocol comprises two phases:
- Training Phase: Train ML models to classify and predict material properties.
- Prediction Phase: Identify promising materials within the GNoME database and predict their properties.
Training Phase
- Data Preparation: The specialized energy database \(M^E\) serves as the training set. Depending on whether structural information is available, the workflow splits into two pipelines:
  - Structure Pipeline: Materials are represented as graphs, and the regressors are E(3)NN models.
  - Composition Pipeline: Materials are represented by chemical descriptors, and the regressors are GBDT models.
- AI-Experts (Screening Models): A committee of binary GBDT classifiers learns to identify materials similar to those in \(M^E\) by delineating the boundary of \(E\).
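The conditional split in the data preparation step can be sketched as a simple routing function. This is a minimal sketch under stated assumptions: the `Entry` class and `split_by_pipeline` function are hypothetical, the formulas are illustrative rather than taken from the database, and the routing is done per entry here, whereas the actual protocol may decide it per dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    formula: str
    # Hypothetical placeholder for crystal-structure data; None if missing.
    structure: Optional[dict] = None


def split_by_pipeline(entries):
    """Send entries with a structure to the structure pipeline,
    and all others to the composition pipeline."""
    structure_pipeline, composition_pipeline = [], []
    for entry in entries:
        if entry.structure is not None:
            structure_pipeline.append(entry)
        else:
            composition_pipeline.append(entry)
    return structure_pipeline, composition_pipeline


# Illustrative entries only.
entries = [
    Entry("LiCoO2", structure={"lattice": "placeholder"}),
    Entry("Bi2Te3"),
]
with_structure, composition_only = split_by_pipeline(entries)
```

Entries in `with_structure` would then be converted to graphs for the E(3)NN regressors, while those in `composition_only` would be converted to chemical descriptors for the GBDT regressors.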
Prediction Phase
- Screening: Materials from the GNoME database (\(G\)) pass through the AI-experts to compute the likelihood of belonging to \(E\). Crystals with \(P(y \in M^E) > 0.5\) are retained for property prediction.
- Regression: Depending on the pipeline used, the retained materials are converted either to graphs (for the E(3)NNs) or to descriptors (for the GBDTs), and their properties are predicted.
- Energy-GNoME Database: Candidates with predicted properties are stored for evaluation, refinement, and use by the community.
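The screening step can be sketched as follows. This is an illustration only: the experts are stand-in functions rather than trained GBDT classifiers, the committee output is assumed to be the mean of the members' probabilities, and the names `committee_probability`, `screen`, and the `score` feature are hypothetical.

```python
import math
from statistics import fmean


def committee_probability(experts, features):
    """Assumed aggregation: average the probabilities of all experts."""
    return fmean([expert(features) for expert in experts])


def screen(candidates, experts, threshold=0.5):
    """Keep candidates whose committee probability strictly exceeds the threshold."""
    retained = []
    for name, features in candidates.items():
        probability = committee_probability(experts, features)
        if probability > threshold:
            retained.append((name, probability))
    return retained


# Stand-in experts: each maps a feature dict to a probability in [0, 1].
experts = [
    lambda f: f["score"],
    lambda f: f["score"] ** 2,
    lambda f: math.sqrt(f["score"]),
]

# Hypothetical candidates from G with a single toy feature.
candidates = {"crystal-A": {"score": 0.9}, "crystal-B": {"score": 0.2}}
retained = screen(candidates, experts)
```

Here only `crystal-A` passes the \(P(y \in M^E) > 0.5\) criterion and would move on to property regression.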
3. Iterative Active Learning
The protocol allows continuous improvement by integrating new experimental or computational data from the materials science and engineering community. Each iteration refines both the AI-expert classifiers and the regressors, making Energy-GNoME a dynamic, living database.
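The iterative cycle can be pictured as a loop in which every new batch of data enlarges the training set and triggers a refit. The sketch below is a toy version under stated assumptions: `retrain` is a hypothetical stand-in that fits a mean predictor instead of the actual classifiers and regressors, and all identifiers and values are invented for illustration.

```python
from statistics import fmean


def retrain(training_set):
    """Stand-in for refitting the models: returns a mean-value predictor."""
    mean_value = fmean(training_set.values())
    return lambda _material: mean_value


# Hypothetical initial training set: material ID -> property value.
training_set = {"mat-002": 1.0, "mat-004": 3.0}
model = retrain(training_set)

# New community data arrives in batches; each batch triggers a refit.
incoming_batches = [{"mat-010": 5.0}, {"mat-011": 7.0}]
for batch in incoming_batches:
    training_set.update(batch)
    model = retrain(training_set)
```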
Learn More
For a deeper understanding of the protocol and the fundamental hypotheses behind it, we invite you to explore our detailed article:
Preprint
De Angelis P., Trezza G., Barletta G., Asinari P., Chiavazzo E. "Energy-GNoME: A Living Database of Selected Materials for Energy Applications". arXiv November 15, 2024. doi: 10.48550/arXiv.2411.10125.
Last update: 15/01/2025 14:11:55