Predicting Chemical Biodegradability for Sustainable Chemical Manufacturing: A Machine Learning Approach Using 3D Molecular Descriptors

Alaa M Elsayad, Hassan Yousif Ahmed, Khaled A Elsayad, Ammar Elyas Babiker Hassan, Mustafa Mohammed Hassan Mustafa, Akhtar Nawaz Khan, Arif Abdelwhab Ali, Sahar A. Mokhtar

Abstract


Achieving sustainable cities and promoting responsible consumption require innovative approaches to chemical design and manufacturing. Accurate prediction of chemical biodegradability is crucial for assessing environmental risk and for supporting the transition towards green chemistry. This study investigates how effectively ten distinct groups of three-dimensional (3D) molecular descriptors identify rapidly biodegradable compounds. The Merck molecular force field (MMFF94s) was used to generate 3D conformations for a dataset of chemical compounds, and the descriptors were computed from these conformations. The dataset was preprocessed through feature selection, outlier management, and scaling. Support Vector Machines (SVMs) were tested alongside three tree-based ensemble learning algorithms: Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Random Forest. Bayesian optimization was used to tune model hyperparameters by maximizing the cross-validated Area Under the Receiver Operating Characteristic Curve (AUC). The GETAWAY, 3D autocorrelation, and 3D-MoRSE descriptors consistently outperformed the other descriptor groups across all machine learning models. An SVM model trained on 3D autocorrelation descriptors achieved the highest prediction accuracy (0.88), sensitivity (0.83), specificity (0.91), F1-score (0.82), and Cohen's Kappa statistic (0.74), with an AUC of 0.93, on an independent test set. Permutation Feature Importance (PFI), SHapley Additive exPlanations (SHAP), and partial dependence plots (PDPs) were used to identify the most influential 3D autocorrelation descriptors. The findings demonstrate that 3D molecular descriptors, particularly 3D autocorrelations, play a critical role in developing accurate and interpretable models for predicting chemical biodegradability.
These models contribute significantly to the advancement of green chemical design and the development of effective regulatory policies that support the objectives of SDG 11 (Sustainable Cities and Communities) and SDG 12 (Responsible Consumption and Production). By fostering sustainable chemical manufacturing practices, we can create healthier and more resilient urban environments while minimizing the environmental impact of human activities.
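
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic features in place of real 3D autocorrelation descriptors, fixes the SVM hyperparameters instead of tuning them with Bayesian optimization, and the function name evaluate_svm is hypothetical. It only shows how the reported quantities (cross-validated AUC, accuracy, sensitivity, specificity, F1-score, Cohen's Kappa, test AUC, and permutation feature importance) relate to a fitted classifier.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_svm(X, y, C=1.0, gamma="scale", seed=0):
    """Fit a scaled RBF SVM and compute the metrics named in the abstract.

    Assumption: C and gamma are fixed here; the study tuned hyperparameters
    with Bayesian optimization, which this sketch does not implement.
    """
    # Hold out an independent test set, stratified on the class label.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Scaling lives inside the pipeline so it is refit on each CV fold.
    model = make_pipeline(StandardScaler(), SVC(C=C, gamma=gamma))

    # Cross-validated AUC on the training portion only.
    cv_auc = cross_val_score(
        model, X_train, y_train, cv=5, scoring="roc_auc").mean()

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = model.decision_function(X_test)  # continuous scores for AUC

    # For binary labels 0/1, ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    metrics = {
        "cv_auc": float(cv_auc),
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "f1": float(f1_score(y_test, y_pred)),
        "kappa": float(cohen_kappa_score(y_test, y_pred)),
        "auc": float(roc_auc_score(y_test, scores)),
    }

    # Permutation feature importance: mean drop in test AUC when one
    # feature column is shuffled, averaged over repeats.
    pfi = permutation_importance(
        model, X_test, y_test, scoring="roc_auc",
        n_repeats=10, random_state=seed)
    return metrics, pfi.importances_mean


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix (not real chemical data).
    X, y = make_classification(
        n_samples=400, n_features=20, n_informative=5, random_state=0)
    metrics, importances = evaluate_svm(X, y)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print("most important feature index:", int(importances.argmax()))
```

With real data, X would hold one row per compound and one column per descriptor, and y would hold the biodegradability class. The numbers printed for the synthetic data have no relation to the results reported in the abstract.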

Keywords


Biodegradability, 3D molecular descriptors, SVM, XGBoost, gradient boosting, random forest, permutation feature importance, SHAP, QSAR, environmental risk assessment, sustainable chemistry

DOI: https://doi.org/10.26789/AEB.2024.02.009

Copyright (c) 2025 Alaa M Elsayad, Hassan Yousif Ahmed, Khaled A Elsayad, Ammar Elyas Babiker Hassan, Mustafa Mohammed Hassan Mustafa, Akhtar Nawaz Khan, Arif Abdelwhab Ali, Sahar A. Mokhtar

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


