With the hype surrounding Generative AI, the general belief is that large language models can also serve as comprehensive “intelligent tutors”: ChatGPT or similar models, used “out of the box”, could support students in setting goals, completing assignments, and becoming more knowledgeable in specific topics.
The reality is different. General-purpose AI models are trained on a large corpus of text, including blog posts and social media, which often contain mediocre writing, certainly not of the quality we would expect from textbooks. Even when prompted carefully, these models may reproduce false information or misconceptions. One example is the resurgence of myths such as “learning styles”, which ChatGPT has been observed to present as fact.
Education traditionally relies on “structured content”: teachers must carefully prepare and design their material, deciding on learning goals, selecting appropriate resources, and determining the assessment strategy. This structure also depends on the context, for example, the level of education, the purpose of the course, and the general background of the students. A general-purpose GenAI model such as ChatGPT does not have this information unless it is explicitly supplied as context, for example, through the prompt.
Prior to 2022 and the emergence of GenAI, the dominant constructs were Intelligent Tutoring Systems (ITS), also known as Adaptive Learning Systems. These AI tutors were built on a logical and explicit representation of the student’s knowledge (the so-called “learner model”) and of the pedagogical approach the tutor should follow. ITS long dominated the AIEd scene, becoming the most powerful operationalisation of an AI tutor. However, they faced considerable drawbacks: difficulty adapting to varied and changing contexts, and difficulty dealing with more “ill-defined” content, such as history, humanities, philosophy, and other subjects in which “there is no right or wrong” and all answers are subject to interpretation. In these cases there is a conflict: the rigid structure that algorithms and computation require does not map well onto the body of knowledge students have to navigate.
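To make the notion of a “learner model” concrete, here is a minimal sketch of the kind of explicit student-knowledge representation an ITS maintains, using Bayesian Knowledge Tracing, a classic per-skill mastery estimate. The class name and all parameter values are illustrative placeholders, not taken from any particular system.

```python
class LearnerModel:
    """Illustrative learner model: per-skill mastery via Bayesian
    Knowledge Tracing. Parameter values are placeholders, not fitted."""

    def __init__(self, p_init=0.2, p_transit=0.15, p_slip=0.1, p_guess=0.25):
        self.p_init = p_init        # prior P(skill already learned)
        self.p_transit = p_transit  # P(learning on one practice opportunity)
        self.p_slip = p_slip        # P(wrong answer despite mastery)
        self.p_guess = p_guess      # P(correct answer without mastery)
        self.mastery = {}           # skill name -> current P(learned)

    def update(self, skill, correct):
        """Bayesian update of P(learned) after one observed answer."""
        p = self.mastery.get(skill, self.p_init)
        if correct:
            evidence = p * (1 - self.p_slip)
            posterior = evidence / (evidence + (1 - p) * self.p_guess)
        else:
            evidence = p * self.p_slip
            posterior = evidence / (evidence + (1 - p) * (1 - self.p_guess))
        # Account for learning that may occur on this opportunity.
        self.mastery[skill] = posterior + (1 - posterior) * self.p_transit
        return self.mastery[skill]

model = LearnerModel()
estimate = model.update("adding fractions", correct=True)
```

Because the state is explicit, the tutor can inspect it and adapt its pedagogy, which is precisely the structure that a general-purpose LLM lacks.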
In this sense, GenAI models such as ChatGPT have opened up new possibilities. Their ability to “master” language flexibly is a considerable advantage over the rigidity of Intelligent Tutoring Systems. However, flexibility comes at a cost: the prevalence of so-called “hallucinations” (confabulated facts and made-up citations) is a significant drawback. One cannot fully control the output of a Generative AI model, nor guarantee that this output is coherent, meaningful, and pedagogically aligned.
Ideally, we would combine the structure and trustworthiness of Intelligent Tutoring Systems with the flexibility and language-generation power of Large Language Models, obtaining “the best of both worlds”. However, implementations that effectively combine these two technologies are yet to be seen.
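One plausible shape for such a combination is to serialise the explicit state an ITS maintains (learning goals, mastery estimates, approved resources) into the context given to a general-purpose LLM, so that generation stays grounded in the tutor’s pedagogy. The sketch below only builds the grounding prompt; the function name, the mastery threshold, and the idea of passing the result to any chat-completion API are all assumptions for illustration.

```python
def build_tutor_prompt(learning_goal, mastery, approved_resources):
    """Turn explicit tutor state into grounding context for an LLM.
    `mastery` maps skill names to estimated P(learned) in [0, 1]."""
    # Skills below an (illustrative) 0.5 threshold need more practice.
    weak = [skill for skill, p in mastery.items() if p < 0.5]
    lines = [
        f"You are a tutor. Current learning goal: {learning_goal}.",
        f"Skills the student still needs to practise: "
        f"{', '.join(weak) if weak else 'none'}.",
        "Base every explanation ONLY on these approved resources:",
    ]
    lines += [f"- {resource}" for resource in approved_resources]
    lines.append("If the resources do not cover a question, say so "
                 "instead of inventing an answer.")
    return "\n".join(lines)

prompt = build_tutor_prompt(
    "adding fractions",
    {"common denominators": 0.3, "simplifying": 0.8},
    ["Unit 4 worksheet", "Textbook ch. 7"],
)
```

Here the structured side decides *what* to teach and from *which* sources, while the LLM only supplies the flexible language, a division of labour rather than a replacement of one paradigm by the other.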
