Student Performance Prediction Model: Intervention Mechanisms and Risk Identification Based on Machine Learning
DOI:
https://doi.org/10.56028/aetr.15.1.1368.2025
Keywords:
Machine Learning, Performance Prediction, Linear Regression, Random Forest, Academic Risk Identification.
Abstract
This study applies machine learning methods, including linear regression and random forest, to predict student academic performance. The dataset consists of student records for two subjects, mathematics and Portuguese, and includes socio-demographic, academic, and behavioral features. Before modeling, the data were preprocessed, including encoding categorical variables and removing anomalous samples in which the final grade (G3) was zero. Model performance was evaluated for both subjects using multiple metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²). The results show that removing the anomalous G3=0 samples significantly improves model accuracy, and that the random forest outperforms linear regression in capturing non-linear relationships. SHAP (SHapley Additive exPlanations) analysis was then applied to interpret model outputs and identify the factors that most influence student performance, such as prior grades, absenteeism, study time, and history of academic failure. Finally, the study predicts each student’s final score (G3), identifies at-risk students based on predicted low performance, and provides actionable recommendations for early academic intervention. These findings can help educators and policymakers make data-driven decisions in student support systems.
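The workflow summarized in the abstract (drop G3=0 records, fit a model, score it with RMSE, MAE, and R², then flag students with low predicted grades) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy records are invented and do not come from the study's dataset, the field names `g2` and `g3` are hypothetical, a single-feature least-squares fit stands in for the study's linear regression and random forest models, and the at-risk cutoff of 10 is a placeholder, since the abstract does not state a threshold.

```python
import math

# Toy records, invented purely for illustration (NOT the study's data).
# g2 = an earlier-period grade, g3 = final grade (hypothetical field names).
records = [
    {"id": 1, "g2": 8, "g3": 9},
    {"id": 2, "g2": 14, "g3": 15},
    {"id": 3, "g2": 11, "g3": 0},  # G3 = 0: treated as anomalous and dropped
    {"id": 4, "g2": 6, "g3": 6},
    {"id": 5, "g2": 17, "g3": 18},
    {"id": 6, "g2": 10, "g3": 11},
]

# Assumed cutoff for "at risk"; the abstract does not specify one.
AT_RISK_THRESHOLD = 10


def drop_zero_final_grade(rows):
    """Remove samples whose final grade (G3) is zero."""
    return [r for r in rows if r["g3"] != 0]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R-squared for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


clean = drop_zero_final_grade(records)
xs = [r["g2"] for r in clean]
ys = [r["g3"] for r in clean]

slope, intercept = fit_simple_linear_regression(xs, ys)
preds = [slope * x + intercept for x in xs]

# In-sample scores, for brevity only; a real evaluation would use held-out data.
metrics = regression_metrics(ys, preds)

# Flag students whose predicted final grade falls below the assumed cutoff.
at_risk_ids = [r["id"] for r, p in zip(clean, preds) if p < AT_RISK_THRESHOLD]

print(metrics)
print("At-risk student ids:", at_risk_ids)
```

In practice the same three steps would wrap whichever models are being compared, so that both are scored on identical cleaned data and the at-risk list is derived from predictions rather than from observed grades.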