GenAI-Assisted Learning in Online Problem-Solving Assessments: Cognitive and Behavioral Insights Across Course Levels

An-Ching Shih; Yuan-Hsuan Lee; En-Yu Liao; Jiun-Yu Wu

doi:10.59668/2579.27067

Abstract

Generative Artificial Intelligence (GenAI) agents play an increasingly important role in education, particularly in complex problem-solving and analytical programming. This study explores GenAI usage in online Computer-Based Problem-Solving Assessments across two course levels. It finds that GenAI was used mainly for analysis tasks and that application-level queries were the most common. As course difficulty increased, reliance on GenAI grew, with more application-level queries and fewer knowledge-level ones. Strategies such as step-by-step decomposition and precise questioning with a chain-of-thought approach can enhance GenAI-assisted learning and improve problem-solving efficiency.

Introduction

Recent developments in Generative Artificial Intelligence agents (GenAI agents) built on large language models (LLMs) have revolutionized education and how people learn (Malinka et al., 2023). GenAI offers personalized support for knowledge acquisition and programming (Qadir, 2023; Sun et al., 2024; Wu et al., 2021). Many studies emphasize the importance of developing learners' skills and literacy in using GenAI agents (Kasneci et al., 2023; Mogavi et al., 2024; Zhai et al., 2021). However, research on the use of these agents in complex problem-solving with analytical programming remains scant. This research explores students' assessment-as-learning processes when using GenAI during online Computer-Based Problem-Solving Assessments. Two statistics courses with different levels of difficulty were studied to provide guidance on how best to facilitate learner-GenAI collaboration in high-stakes evaluation settings.

The study analyzes the context, frequency, and content of students' GenAI queries during online assessments. We collected real-time interaction logs and queries and categorized behaviors as "Working" or "Searching" to distinguish the contexts and conditions in which GenAI was used. To evaluate the cognitive depth of the queries, we used Bloom's taxonomy (Forehand, 2010; Seaman, 2011), which defines six hierarchical cognitive levels of educational objectives: Remember, Understand, Apply, Analyze, Evaluate, and Create. Following this hierarchy, the six levels were aggregated into three cognitive levels (Knowledge-level, Application-level, and Innovation-level), which were used to classify user queries.
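
The text does not specify which Bloom levels were merged into each of the three levels. The sketch below assumes, for illustration only, a pairwise grouping (Remember and Understand as Knowledge-level, Apply and Analyze as Application-level, Evaluate and Create as Innovation-level) and shows how coded queries could be collapsed into the three levels and turned into the per-student proportions compared in RQ2. The names `BLOOM_TO_TIER` and `tier_proportions` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: illustrative pairwise grouping; the study does not state its exact mapping.
BLOOM_TO_TIER = {
    "Remember": "Knowledge",
    "Understand": "Knowledge",
    "Apply": "Application",
    "Analyze": "Application",
    "Evaluate": "Innovation",
    "Create": "Innovation",
}

TIERS = ("Knowledge", "Application", "Innovation")


def tier_proportions(bloom_codes):
    """Collapse one student's Bloom-coded queries into three levels and return proportions."""
    counts = Counter(BLOOM_TO_TIER[code] for code in bloom_codes)
    total = sum(counts.values())
    if total == 0:
        # A student who asked no questions contributes zero to every level.
        return {tier: 0.0 for tier in TIERS}
    return {tier: counts[tier] / total for tier in TIERS}


print(tier_proportions(["Remember", "Apply", "Analyze", "Create"]))
```

With the assumed grouping, one Knowledge-level, two Application-level, and one Innovation-level query give proportions of 0.25, 0.5, and 0.25.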

Three research questions were proposed:

RQ1: How do GenAI usage states during the exams differ between the two courses, and how do they evolve over the course of the exam?

RQ2: Do the proportions of questions at each cognitive level posed to GenAI agents differ between the two courses?

RQ3: What are the patterns of different cognitive-level questions for high and low achievers in the two courses?

Methods

Procedure and Data Preparation

Data were collected during online Computer-Based Problem-Solving Assessments held at the end of the semester in two 16-week analytics courses at a university in Taiwan. There were 38 students in the Introductory course and 26 in the Advanced course. Both courses introduced students to applied statistical methods and analytical problem-solving, and in both courses students were instructed to use GenAI agents, e.g., ChatGPT or Perplexity, to complete their assignments. Data were gathered during three-hour online closed-book assessments in which students solved complex analytical problems using R, with access to online search engines and GenAI agents. Each exam was divided into 100 time segments. Researchers recorded two behavioral sequences: "Working" (assessment-related behaviors) and "Searching" (use of online resources or GenAI agents). In each segment, the Working sequence was coded as Work.Sheet, Work.Mixed, or Work.Analysis, and the Searching sequence as Search.GenAI, Search.Mixed, or Search.Browser. The two sequences were constructed segment by segment and combined, as shown in Fig. 1, using the TraMineR package (Gabadinho et al., 2009). In addition, learner-GenAI inquiries were collected for qualitative coding and analysis.
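
The segmentation step can be sketched as follows. The authors used TraMineR in R (where `seqdef` builds sequence objects); this is only a Python analog under stated assumptions. It assumes equal-length segments (180 minutes / 100 = 1.8 minutes each), timestamped behavior intervals per channel, and a hypothetical rule that labels each segment with the behavior covering most of it. The helpers `segment_states` and `combine_channels` are hypothetical.

```python
def segment_states(events, exam_minutes=180.0, n_segments=100, empty="None"):
    """Label each equal-length segment with the behavior that covers most of it.

    events: list of (start_minute, end_minute, label) tuples for one channel.
    ASSUMPTION: majority-overlap labeling; the study does not state its rule.
    """
    width = exam_minutes / n_segments
    states = []
    for k in range(n_segments):
        seg_start, seg_end = k * width, (k + 1) * width
        overlap = {}
        for start, end, label in events:
            covered = min(end, seg_end) - max(start, seg_start)
            if covered > 0:
                overlap[label] = overlap.get(label, 0.0) + covered
        # Segments with no recorded behavior get a placeholder state.
        states.append(max(overlap, key=overlap.get) if overlap else empty)
    return states


def combine_channels(working, searching):
    """Merge the aligned Working and Searching sequences into one joint state per segment."""
    if len(working) != len(searching):
        raise ValueError("channels must have the same number of segments")
    return [f"{w}+{s}" for w, s in zip(working, searching)]


working = segment_states([(0, 90, "Work.Analysis"), (90, 180, "Work.Sheet")])
searching = segment_states([(0, 180, "Search.GenAI")])
print(combine_channels(working, searching)[:2])
```

Keeping the two channels separate until the final step mirrors the text's description of building two sequences per segment before combining them.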

Data Analysis

A Mixture Hidden Markov Model (MHMM; Helske & Helske, 2019) was used to analyze usage patterns across the two courses for RQ1. For RQ2, Linear Mixed Models (LMMs; Bates et al., 2015) were used to examine course effects on GenAI questioning, with Statistical Prior Ability as a covariate. To address RQ3, framework analysis was used to summarize the content of questions posed by high and low achievers across courses and cognitive levels.
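
To illustrate what an MHMM computes, the sketch below evaluates a sequence's likelihood under each component HMM with the scaled forward algorithm and converts these likelihoods into posterior cluster probabilities. The cited seqHMM package estimates the parameters themselves; here all parameters, the integer coding of observed states, and the helper names are hypothetical, and covariates on cluster membership are omitted.

```python
import numpy as np


def forward_loglik(obs, init, trans, emit):
    """Scaled forward algorithm: log P(obs) under a single HMM.

    obs:   sequence of observed-state indices (one per exam segment)
    init:  (K,) initial hidden-state probabilities
    trans: (K, K) hidden-state transition matrix, rows sum to 1
    emit:  (K, M) emission matrix, rows sum to 1
    """
    alpha = init * emit[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale
    loglik = np.log(scale)
    for o in obs[1:]:
        # Propagate one step through the transitions, then weight by the emission.
        alpha = (alpha @ trans) * emit[:, o]
        scale = alpha.sum()
        alpha = alpha / scale
        loglik += np.log(scale)
    return loglik


def cluster_posterior(obs, weights, hmms):
    """Posterior probability that one sequence belongs to each mixture cluster.

    weights: prior cluster probabilities; hmms: list of (init, trans, emit) tuples.
    """
    log_joint = np.array([np.log(w) + forward_loglik(obs, *hmm)
                          for w, hmm in zip(weights, hmms)])
    log_joint -= log_joint.max()  # stabilize before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()


# Hypothetical coding of observed states: 0 = Search.GenAI, 1 = Search.Browser.
genai_heavy = (np.array([1.0]), np.array([[1.0]]), np.array([[0.9, 0.1]]))
browser_heavy = (np.array([1.0]), np.array([[1.0]]), np.array([[0.1, 0.9]]))
print(cluster_posterior([0, 0, 1, 0], [0.5, 0.5], [genai_heavy, browser_heavy]))
```

A sequence dominated by GenAI searches places most of its posterior mass on the GenAI-heavy cluster; in the fitted model, each component also has several hidden states rather than one.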

Results

In response to RQ1, the MHMM identified three states in the Introductory course: analysis with GenAI, analysis with a browser, and mixed-tool sheet work. Fig. 2B shows that analysis dominated the exam, with equal transition probabilities (12%) to Search.Browser and Search.GenAI. When working on exam sheets, students preferred mixed-tool assistance.
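
The 12% figures come from the fitted MHMM. As a simple descriptive complement, similar in spirit to TraMineR's `seqtrate`, first-order transition rates between observed states can be counted directly from the segment sequences. This sketch does not reproduce the MHMM estimates, and `transition_rates` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def transition_rates(sequences):
    """Empirical first-order transition rates P(next state | current state).

    sequences: list of per-student state sequences (one label per segment).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive pair of segments within one student's sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    rates = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        rates[current] = {nxt: n / total for nxt, n in nexts.items()}
    return rates


print(transition_rates([
    ["Work.Analysis", "Work.Analysis", "Search.GenAI"],
    ["Work.Analysis", "Search.Browser", "Work.Analysis"],
]))
```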

In the Advanced course, two additional GenAI-driven states emerged, extending GenAI's role beyond analysis to exam-sheet work and mixed tasks. Fig. 2A shows that GenAI use (gray section) accounts for a substantial portion of overall behaviors, suggesting more versatile learner-GenAI collaboration in the Advanced course.

Addressing RQ2, the LMM analysis (Fig. 3) showed a significant interaction between Prior Ability and Course Level in GenAI questioning. For students with a Prior Ability score of 52, the model estimated an average of 13.6 questions in the Introductory course and 28.4 in the Advanced course. Each one-point increase in Prior Ability was associated with 0.62 fewer questions in the Advanced course, with no such reduction in the Introductory course. Moreover, after controlling for Prior Ability, the interaction between course level and the cognitive level of the questions was also significant (Fig. 3). Application-level questions accounted for the largest proportion in both courses, Knowledge-level questions made up a higher proportion in the Introductory course, and Innovation-level questions accounted for the smallest proportion in both courses.
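
The reported estimates can be turned into a small worked prediction. This sketch covers only the fixed-effect arithmetic implied by the text: it assumes Prior Ability is centered at 52, uses the reported intercepts (13.6 and 28.4) and Advanced-course slope (-0.62), and sets the Introductory slope to zero because the text reports no reduction there (the actual estimate is not given). Random effects are ignored, and `predicted_questions` is a hypothetical helper.

```python
def predicted_questions(prior_ability, course):
    """Approximate predicted number of GenAI questions from the reported LMM estimates."""
    center = 52.0  # ASSUMPTION: estimates are reported at Prior Ability = 52
    intercept = {"Introductory": 13.6, "Advanced": 28.4}
    # ASSUMPTION: Introductory slope set to 0; only "no reduction" is reported.
    slope = {"Introductory": 0.0, "Advanced": -0.62}
    return intercept[course] + slope[course] * (prior_ability - center)


for course in ("Introductory", "Advanced"):
    print(course, round(predicted_questions(62, course), 1))
```

Under these assumptions, a student scoring 62 in the Advanced course would be expected to ask about 28.4 - 10 × 0.62 = 22.2 questions, still more than the 13.6 expected in the Introductory course.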

Finally, the top and bottom 10% of students were interviewed about their purposes for using GenAI. As shown in Table 1, high achievers tended to ask precise knowledge-level questions [H-In-K-3], while low achievers often posed vague ones [L-In-K-1]. For application-level questions, high achievers articulated process-oriented requests [H-Ad-A-2], whereas low achievers struggled to define key concepts before analysis [L-In-A-3].
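
Selecting the interview groups can be sketched as a simple ranking. The text does not state the ranking criterion or how 10% was rounded; this sketch assumes students are ranked by an achievement score and that group size is rounded up, which gives 4 students per group for 38 students and 3 per group for 26. The helper `extreme_groups` is hypothetical.

```python
def extreme_groups(scores, percent=10):
    """Split students into top and bottom `percent` groups by score.

    scores: dict mapping student ID -> achievement score.
    ASSUMPTION: group size is rounded up so small classes yield at least one student.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, -(-len(ranked) * percent // 100))  # integer ceiling of n * percent / 100
    return ranked[:k], ranked[-k:]


high, low = extreme_groups({"s1": 91, "s2": 55, "s3": 78, "s4": 62, "s5": 84})
print(high, low)
```

Integer arithmetic avoids floating-point surprises in the rounding step (for example, `0.1 * 30` is slightly above 3 in binary floating point).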

Discussion

As course difficulty increases, skills in analytical application and implementation become more critical, shifting the focus of learner-GenAI queries. In the Introductory course, students used GenAI for analysis and periodically alternated with browser searches for knowledge-level questions, but GenAI played a non-dominant role in exam assistance. Conversely, Advanced course students used GenAI for a wider range of purposes, posing twice as many application-level queries while relying less on knowledge-level assistance.

Our analysis of query content shows that high achievers formulate precise knowledge-level questions and structured application-level questions, whereas low achievers do not. To improve query effectiveness, learners could adopt a chain-of-thought approach, systematically breaking down prompts by clearly defining the context, subject, and objective. Breaking tasks into steps also enhances comprehension: when it is unclear how to decompose a task, learners can start with knowledge-level questions about the underlying concepts and then move on to application-level queries. Specifying the response format (length, scope) refines knowledge-level questions, and reverse thinking helps ensure relevance. By applying these strategies, learners can make better use of GenAI and strengthen their problem-solving and higher-order thinking skills.
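
The prompt-decomposition strategy can be made concrete as a template. This sketch assumes a plain-text prompt with labeled fields for context, subject, and objective, optional ordered steps, and an optional response format; the field labels and the `build_prompt` helper are hypothetical and not tied to any particular GenAI agent's API.

```python
def build_prompt(context, subject, objective, steps=(), response_format=None):
    """Assemble a structured GenAI query from the elements named in the text."""
    lines = [f"Context: {context}", f"Subject: {subject}", f"Objective: {objective}"]
    if steps:
        # Ordered sub-steps make the decomposition explicit to the agent.
        lines.append("Work through these steps in order:")
        lines.extend(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    if response_format:
        # Constraining length and scope keeps knowledge-level answers focused.
        lines.append(f"Response format: {response_format}")
    return "\n".join(lines)


print(build_prompt(
    context="I am analyzing exam data in R for a statistics course.",
    subject="a linear mixed model with a course-by-prior-ability interaction",
    objective="understand what the interaction coefficient means",
    steps=["Define the interaction term", "Interpret its sign and size"],
    response_format="at most 150 words, no code",
))
```

A knowledge-level query like this one can be followed by an application-level query that reuses the same context and subject with a new objective.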

Conclusion

This study examines how course difficulty influences the use of GenAI agents in time-limited problem-solving. Findings indicate that GenAI agents were primarily used for analytical tasks, with application-level queries being the most frequent. As course difficulty increased, learner-GenAI engagement diversified, with more queries overall and a higher proportion at the application level, while reliance on knowledge-level queries declined. To optimize GenAI-assisted learning, strategies such as structured decomposition, precise questioning, and response-format specification can enhance effectiveness and foster higher-order thinking and adaptive problem-solving skills.

References

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Forehand, M. (2010). Bloom’s taxonomy. In M. Orey (Ed.), Emerging perspectives on learning, teaching, and technology (pp. 41–47). Global Text.
Gabadinho, A., Ritschard, G., Studer, M., & Müller, N. S. (2009). Mining sequence data in R with the TraMineR package: A user’s guide. University of Geneva, Department of Econometrics and Laboratory of Demography.
Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32. https://doi.org/10.18637/jss.v088.i03
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gaspar, J. S., Hugger, K., Knight, S., Kübler, S., Meyer, A., Mezerji, S. A., Mirante, G., Olsson, J., Rivera-Rodrigo, J., Scantamburlo, T., Schünemann, B., Specht, M., Steinert, A., . . . Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, Article 102274. https://doi.org/10.1016/j.lindif.2023.102274
Malinka, K., Peresíni, M., Firc, A., Hujnák, O., & Janus, F. (2023). On the educational impact of ChatGPT: Is artificial intelligence ready to obtain a university degree? In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 (pp. 47–53). Association for Computing Machinery. https://doi.org/10.1145/3587102.3588827
Mogavi, R. H., Deng, C., Kim, J. J., Zhou, P., Kwon, Y. D., Metwally, A. H. S., Tlili, A., Bassanelli, S., Bucchiarone, A., Gujar, S., Lennartz, C., Maia-Estrada, P., Meditarraneo, C., Senthilnathan, K., Zhang, H., . . . Hui, P. (2024). ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters’ utilization and perceptions. Computers in Human Behavior: Artificial Humans, 2(1), Article 100027. https://doi.org/10.1016/j.chbah.2023.100027
Qadir, J. (2023). Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. In 2023 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–9). IEEE. https://doi.org/10.1109/EDUCON54332.2023.10225544
Seaman, M. (2011). Bloom’s taxonomy. Curriculum and Teaching Dialogue, 13(1–2), 29–43.
Sun, D., Boudouaia, A., Zhu, C., & Li, Y. (2024). Would ChatGPT-facilitated programming mode impact college students’ programming behaviors, performances, and perceptions? An empirical study. International Journal of Educational Technology in Higher Education, 21(1), 14.
Wu, J.-Y., Yang, C. C. Y., Liao, C.-H., & Nian, M.-W. (2021). Analytics 2.0 for precision education: An integrative theoretical framework of the human and machine symbiotic learning. Educational Technology & Society, 24(1), 267–279. https://jstor.org/stable/26977873
Zhai, X., Chu, X., Chai, C. S., Jong, M. S. Y., Istenic, A., Spector, M., Liu, J.-B., Yuan, J., & Li, Y. (2021). A review of artificial intelligence (AI) in education from 2010 to 2020. Complexity, 2021, Article 8812542. https://doi.org/10.1155/2021/8812542