EdTech Archives EdTech Archives The Journal of Applied Instructional Design, 15(2)

When ChatGPT Asks Back: Designing Chatbots to Promote Deeper Learning

Maria Klar

Abstract

Currently, chatbots based on generative AI are used by students often at their own initiative and unguided. Learners use chatbots primarily as static text generators, thereby underusing their potential. This experimental study investigates whether chatbot-initiated questions as a form of monitoring prompt can promote learning. It provides an analysis of chatbot interaction patterns, mental effort, and knowledge outcomes among 71 high school students. Two groups engaged in 15 minutes of research with a ChatGPT-4o-based chatbot, and the experimental chatbot ended each output with a comprehension or reflection question. Results show that chatbot-initiated questions increased reflective responses but did not significantly impact mental effort or knowledge gain. In terms of interaction, the control group rarely engaged in deeper conversation while in the experimental group 41% of all prompts were answers. This study demonstrates that lightweight system prompt interventions can help students go beyond using chatbots as static text generators.

Introduction

Generative AI (genAI) chatbots have become part of the learners’ toolkit and will likely continue to become even more performant in the future. On the one hand, educators and researchers (Skulmowski, 2024; Zhai et al., 2024) are concerned about learners’ overreliance on genAI chatbots and the offloading of learning processes. On the other hand, these chatbots also provide rich opportunities for self-regulated learning (SRL; Pan et al., 2025). To support students in skillful learning with genAI chatbots, we need more insight into their actual interaction styles, as well as design knowledge on how to develop chatbots that facilitate rather than hinder deeper learning.

This study explores the interaction patterns of high school students in a research task with a chatbot. It investigates whether chatbot-initiated questions can serve as coregulatory prompts to enhance SRL, influencing students’ interaction patterns, mental effort, and learning outcomes.

Background

While chatbots can provide interactive learning experiences, prior research indicates that students primarily engage with them as static content generators rather than conversational partners (Klar, 2025). Research also shows that learners perceive chatbots as easy-to-use (Ngo, 2023) and have high self-efficacy which could lead to overconfidence and reduced mental effort (Stadler et al., 2024). Students already use chatbots without external guidance in information tasks (von Garrel & Mayer, 2023). The design of chatbot system prompts, i.e., their permanent role, may shape how learners interact with them and encourage more effective self-regulated learning. For example, chatbots can output monitoring prompts in the shape of comprehension questions and reflection questions (Guo, 2022). As the underlying genAI technology improves further, chatbots could become more widespread and more reliable in the coregulation of learning (Hadwin et al., 2018) and a continuum of self-regulated, coregulated, and AI-regulated learning becomes increasingly feasible (Molenaar, 2022). Research that accompanies these technological developments and iteratively tests designs that enhance learning is needed (Reigeluth & Honebein, 2024). This study investigates whether chatbot-initiated questions, as a form of coregulation, can influence student behavior, promoting deeper learning strategies while maintaining learner autonomy.

Method

A two-group randomized experimental study was conducted with 71 high school students (14.7 years old on average) who performed 15-minute exploratory research on conspiracy theories using a chatbot based on ChatGPT-4o. Participants were randomly assigned to:

  • Control Group (CG): Standard chatbot responses.

  • Experimental Group (EG): The chatbot was given a system prompt to ask comprehension and reflection questions at the end of each response, for example: “Why do you think people sometimes prefer simple explanations?”

Apart from the chatbot system prompts, both groups were equal regarding task, time, or instruction. Both groups were given the open-ended task to find out as much as they could about the topic of conspiracy theories. This open task was designed to reflect authentic information-seeking settings. The EG was not specifically informed about the chatbot role nor instructed to respond to the questions. Student interactions were recorded and their chatbot interactions were coded with qualitative content analysis. Mental effort was assessed using Paas’ (1992) self-report scale, and knowledge gain was measured through pre-post testing. The knowledge tests were open-ended questions and the texts were rated in terms of breadth, depth, and factuality. Two independent raters reached very good interrater agreement with an average weighted κ = .86 across the three criteria after two rounds. Group differences were analyzed with a MANOVA and post-hoc ANOVAs.

Results

The quantitative findings suggest chatbot-initiated questions significantly influenced self-regulated learning behavior: The EG did answer chatbot questions in 41% of their interactions, indicating that they responded to the prompt for deeper processing in many cases while taking the liberty to not respond to them in other cases. As hypothesized, the CG used significantly more adaptation prompts like “Make it shorter” or “Give me more details on this aspect” (M = 2.5) than in the EG (M = 1.42, p = .043), suggesting that when chatbot responses included questions, students engaged in fewer modifications. Knowledge gain was slightly higher in the EG (M = 1.8) than in the CG (M = 1.35), but the difference was not statistically significant (p = .141). Mental effort ratings did not differ significantly (p = .960), contradicting the hypothesis that chatbot questions encourage investing more mental effort.

To gain a better understanding of the learners’ chatbot interactions, every prompt they entered was coded. Table 1 shows the occurrences of these chatbot interactions for both groups individually and in total. The experimental group used about two more prompts per chat than the control group (11.8 prompts on average in the EG; 8.9 in the CG). This overall higher engagement allowed the experimental group to pose slightly more questions despite also giving vastly more answers. In both groups, but especially in the control group, the interaction patterns can be described as using the chatbot as a static text generator. Only 31% of the students in both groups used prompts to adapt the chatbot output to their needs. In the control group, the students did not engage in a conversation with the chatbot: they rarely asked follow-up questions or showed other kinds of conversational interaction. There are more cases of prompts asking for coregulation in the CG than the EG, such as “teach me”. Although these numbers are small, they show that some students in this sample were aware of this chatbot affordance.

Table 1

Codes, Subcodes, and Their Frequencies for Chatbot Interaction

Code/Subcode

Instances

n = 71

EG

n = 35

CG

n = 36

Total

699

377

322

Question

278

142

136

  • Question

174

84

90

  • Follow-up question

68

43

25

  • Search term

17

7

10

  • Asking the bot for its opinion

12

6

6

  • Clarifying what was meant by a question

7

2

5

Answer

169

156

13

  • Short answer (1-2 aspects)

133

122

11

  • Long answer (>2 aspects)

24

23

1

  • I don’t know

12

11

1

Adapting Prompts

140

50

90

  • Adapting the form

77

23

54

  • List, bullet points

16

6

10

  • Short or shorter

15

4

11

  • Summary

13

7

6

  • Easy or easier

11

1

10

  • Definition

6

2

4

  • Adapting the content

55

23

28

  • More, more on a specific aspect

37

18

19

  • Everything, everything important

8

2

6

  • Coregulation

12

4

8

  • Asking for feedback

5

3

2

  • “teach me”

3

1

2

  • “test my knowledge”

2

0

2

  • “explain better”

2

0

2

Off-topic, e.g., “Best Winter perfumes”, “What is inflation?”

36

7

29

Conversational Prompts

55

19

36

  • Reaction, e.g., “aha”, “good to know”, “okay”

26

6

20

  • Moderating the conversation, e.g., “Before I answer this, let me ask…”

13

7

6

  • Thank you

9

3

6

  • Hello, Bye

6

2

4

  • Please

1

1

0

Questions on the functionality of the chatbot, e.g., “Do you have an opinion?”, “Are you ChatGPT?”

20

2

18

Note. Instances = number of coded student prompts; EG = experimental group; CG = control group.

Discussion

Chatbot-initiated questions successfully encouraged learners to reflect on their current understanding, prior knowledge, and conceptions. However, their effect on mental effort and knowledge gain was limited, possibly due to students’ perception of chatbots as “easy” tools rather than as partners in learning. Overall, the students underutilized core chatbot affordances like engaging in deeper conversation and adapting the chatbot responses to their needs. The chatbot-initiated questions alleviated this to some degree. This shows that students are responsive to coregulatory chatbot designs that are easy to implement.

The investigated design in the form of a system prompt was just one of many ways to support self-regulated and coregulated learning. As this technology is evolving, there are innumerable options to explore further chatbot designs. The phase in which we need “research to improve” instructional designs rather than “research to prove” their effectiveness, as Reigeluth and Honebein (2024) suggest, will likely continue for some time. As this is a time of exploration for learners, educators, and researchers alike, why should we not include learners in participatory research for chatbot designs? Future research could work with learners (and educators) on designing the chatbot support that learners need and want (Amaefule et al., 2024; Newman et al., 2024). This would shift the discussion away from overreliance and offloading to learner empowerment and co-designed chatbots as partners in learning.

References

  1. Amaefule, C. O., Britzwein, J., Yip, J. C., & Brod, G. (2024). Children’s perspectives on self-regulated learning: A co-design study on children’s expectations towards educational technology. Education and Information Technologies, 30, 6117–6140. https://doi.org/10.1007/s10639-024-13031-0
  2. Guo, L. (2022). Using metacognitive prompts to enhance self-regulated learning and learning outcomes: A meta-analysis of experimental studies in computer-based learning environments. Journal of Computer Assisted Learning, 38(3), 811–832. https://doi.org/10.1111/jcal.12650
  3. Hadwin, A., Järvelä, S., & Miller, M. (2018). Self-regulation, co-regulation, and shared regulation in collaborative learning environments. In D. H. Schunk & J. A. Greene (Eds.), Handbook of self-regulation of learning and performance (2nd ed., pp. 83–106). Routledge.
  4. Klar, M. (2025). Using ChatGPT is easy, using it effectively is tough? A mixed methods study on K–12 students’ perceptions, interaction patterns, and support for learning with generative AI chatbots. Smart Learning Environments, 12, Article 32. https://doi.org/10.1186/s40561-025-00385-2
  5. Molenaar, I. (2022). The concept of hybrid human-AI regulation: Exemplifying how to support young learners’ self-regulated learning. Computers & Education: Artificial Intelligence, 3, Article 100070. https://doi.org/10.1016/j.caeai.2022.100070
  6. Newman, M., Sun, K., Dalla Gasperina, I. B., Shin, G. Y., Pedraja, M. K., Kanchi, R., Song, M. B., Li, R., Lee, J. H., & Yip, J. (2024). “I want it to talk like Darth Vader”: Helping children construct creative self-efficacy with generative AI. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), (Article 117, pp. 1–18). Association for Computing Machinery. https://doi.org/10.1145/3613904.3642492
  7. Ngo, T. T. A. (2023). The perception by university students of the use of ChatGPT in education. International Journal of Emerging Technologies in Learning (iJET), 18(17), 4–19. https://doi.org/10.3991/ijet.v18i17.39019
  8. Paas, F. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology, 84(4), 429–434. https://doi.org/10.1037/0022-0663.84.4.429
  9. Pan, M., Lai, C., & Guo, K. (2025). Effects of GenAI-empowered interactive support on university EFL students’ self-regulated strategy use and engagement in reading. The Internet and Higher Education, 65, 100991. https://doi.org/10.1016/j.iheduc.2024.100991
  10. Reigeluth, C. M., & Honebein, P. C. (2024). Will instructional methods and media ever live in unconfounded harmony? Generating useful media research via the instructional theory framework. Educational Technology Research and Development, 72(5), 2543–2563. https://doi.org/10.1007/s11423-023-10253-w
  11. Skulmowski, A. (2024). Placebo or assistant? Generative AI between externalization and anthropomorphization. Educational Psychology Review, 36, Article 58. https://doi.org/10.1007/s10648-024-09894-x
  12. Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, Article 108386. https://doi.org/10.1016/j.chb.2024.108386
  13. von Garrel, J., & Mayer, J. (2023). Artificial intelligence in studies—Use of ChatGPT and AI-based tools among students in Germany. Humanities and Social Sciences Communications, 10, Article 799. https://doi.org/10.1057/s41599-023-02304-7
  14. Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students’ cognitive abilities: A systematic review. Smart Learning Environments, 11, Article 28. https://doi.org/10.1186/s40561-024-00316-7