From Assistants to Agents: The Evolution of Large Language Models in Data Science Workflows

Authors

  • Xiyuan Yin

DOI:

https://doi.org/10.56028/aetr.14.1.1582.2025

Keywords:

Large Language Models (LLMs), Data Science, Technical Evolution.

Abstract

This paper presents a comprehensive overview of the evolution of data science from a statistics-centric discipline to a machine learning–driven field, culminating in the current integration of large language models (LLMs). It identifies key limitations in traditional LLM applications—such as limited cross-domain adaptability, lack of interpretability, and workflow rigidity—and explores recent innovations addressing these challenges. Three representative frameworks—R&D-Agent, SPIO, and Agent Laboratory—illustrate LLMs’ transition from assistive tools to autonomous agents capable of planning, executing, and optimizing entire data science workflows. These systems leverage dual-agent cooperation, modular architectures, and self-correcting capabilities to improve performance in end-to-end data analysis and scientific research. The paper concludes by outlining future priorities, including domain-specific customization, standardized agent evaluation, and improved interpretability, all of which are essential for the next generation of intelligent, autonomous data science systems.

Published

2025-07-26