Machine Learning-Based Water Quality Prediction

Original scientific paper

Journal of Sustainable Development of Energy, Water and Environment Systems
ARTICLE IN PRESS (scheduled for Vol 14, Issue 01 (general)), 1130634
DOI: https://doi.org/10.13044/j.sdewes.d13.0634 (registered soon)
Ali Al-Ataby1, Beza Getu1, Hussain Attia2
1 AURAK, Ras Al Khaimah, United Arab Emirates
2 American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates

Abstract

Water is an indispensable resource for all forms of life and plays a particularly critical role in supporting human health, agriculture, and industrial development. Given the water scarcity predicted worldwide, a tool that analyses and predicts water potability accurately and in real time is critical. This study used machine learning models to predict water potability from quality features such as potential of Hydrogen (pH), hardness, solids content, chloramines, sulfate, and conductivity; potability is determined by the concentrations of these features in the water. Four machine learning algorithms, namely Random Forest, Logistic Regression, Extreme Gradient Boosting (XGBoost), and a Deep Learning Neural Network, were trained on a water quality dataset and used to classify water potability. Initial experiments showed moderate performance, with Random Forest (F1-score = 0.47, area under the receiver operating characteristic curve (AUC) = 0.68) and XGBoost (F1-score = 0.49, AUC = 0.66) outperforming the other two models. After class imbalance was addressed and additional features were introduced through feature engineering, the performance of all four models improved significantly: Random Forest achieved an F1-score of 0.85 and an AUC of 0.90, and XGBoost achieved an F1-score of 0.86 and an AUC of 0.91. Random Forest and XGBoost consistently outperformed the Logistic Regression and Deep Learning models in predictive accuracy and robustness. These results demonstrate the critical importance of feature engineering and hyperparameter optimization in enhancing model effectiveness.
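The evaluation metrics and the class-imbalance step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not state how class imbalance was addressed, so random oversampling of the minority class is used here purely as an assumption (SMOTE or class weighting are common alternatives). The function names `oversample_minority`, `f1_score`, and `roc_auc` are hypothetical and not taken from the study. AUC is computed as the probability that a randomly chosen positive (potable) sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import random
from typing import List, Sequence, Tuple


def oversample_minority(
    X: List[list], y: List[int], seed: int = 0
) -> Tuple[List[list], List[int]]:
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumption: binary labels 0/1 with at least one row in each class.
    Random oversampling is only one possible imbalance remedy.
    """
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = (idx0, idx1) if len(idx0) < len(idx1) else (idx1, idx0)
    # Sample minority indices with replacement to close the size gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = idx0 + idx1 + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(f"F1  = {f1_score(y_true, y_pred):.3f}")  # prints "F1  = 0.667"
    print(f"AUC = {roc_auc(y_true, scores):.3f}")   # prints "AUC = 0.889"
    Xb, yb = oversample_minority([[0.0], [1.0], [2.0], [3.0], [4.0]], [0, 0, 0, 0, 1])
    print(yb.count(0), yb.count(1))  # prints "4 4"
```

In practice, oversampling of this kind is applied only to the training split, so that duplicated rows do not leak into the data used to compute the reported F1 and AUC values.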
A real-time water potability prediction application was also developed to classify water as either “safe to drink” or “unsafe to drink”; its functionality was successfully validated, and its output is displayed on a user-friendly graphical user interface (GUI).
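The final step of such an application can be sketched as a thin wrapper that turns a trained model's predicted probability of potability into the two labels described above. This is an illustrative sketch only: `classify_sample`, the `FEATURES` tuple, the `predict_proba` callable, and the 0.5 decision threshold are assumptions rather than details of the authors' application, and the feature list is limited to the six features named in the abstract.

```python
from typing import Callable, Dict, Sequence

# Hypothetical feature order, limited to the features named in the abstract;
# the real application may use more or differently named inputs.
FEATURES = ("ph", "hardness", "solids", "chloramines", "sulfate", "conductivity")


def classify_sample(
    sample: Dict[str, float],
    predict_proba: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> str:
    """Map a model's probability of potability to the label shown in a GUI.

    `predict_proba` stands in for any trained model (e.g. Random Forest or XGBoost)
    that returns P(potable) for one feature vector; the threshold is an assumption.
    """
    missing = [name for name in FEATURES if name not in sample]
    if missing:
        raise ValueError(f"missing features: {', '.join(missing)}")
    # Build the feature vector in a fixed order so it matches the model's training layout.
    vector = [float(sample[name]) for name in FEATURES]
    probability = predict_proba(vector)
    return "safe to drink" if probability >= threshold else "unsafe to drink"


if __name__ == "__main__":
    # Toy placeholder model keyed on pH only; NOT a real potability rule.
    def toy_model(vector: Sequence[float]) -> float:
        return 0.8 if 6.5 <= vector[0] <= 8.5 else 0.2

    # Illustrative input values, not taken from the study's dataset.
    reading = {"ph": 7.0, "hardness": 200.0, "solids": 20000.0,
               "chloramines": 7.0, "sulfate": 330.0, "conductivity": 420.0}
    print(classify_sample(reading, toy_model))  # prints "safe to drink"
```

Keeping the model behind a plain callable means the same wrapper could serve a Random Forest, XGBoost, or neural network predictor, and the GUI only needs to display the returned string.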

Keywords: Water, Potability, Machine Learning, Random Forest, XGBoost, Deep Learning, Feature Engineering, AUC.

Creative Commons License