Evaluating Bias and Inclusiveness of Large Language Models on Social, Political, and Historical Education
DOI: https://doi.org/10.56028/aetr.15.1.1758.2025
Keywords: Large Language Model, Machine learning, AI ethics, Political bias, Educational technology.
Abstract
Educational institutions have been affected by the rapid shift in information-gathering mediums and by student perspectives shaped by generative artificial intelligence in social, political, and historical education. Through a comparative analysis of four distinct large language models (LLMs): ChatGPT, Gemini, DeepSeek, and Qwen, and a series of educational inquiries and classroom-applicable scenarios, this study evaluates bias, accuracy, inclusiveness, and potentially misleading content in LLM-generated responses using a five-band benchmark designed around the Common Core State Standards (CCSS). Results reveal significant differences in consistency and inclusivity across models; although these differences showed little relation to the political stance associated with each model's country of origin, the analysis still highlights how generative AI responses could influence educators and learners, chiefly through their range of information and phrasing logic.