Comparing AI and Human Feedback on Writing Assignments – Alignment and Students’ Perceptions

Clara Schumacher; Leo Sylvio Rüdian; Jakub Kuzilek; Marvin Kretschmer; Yassin M. Elsir; Claudia Ruhland; Niels Pinkwart

doi:10.59668/2579.27064

Abstract

Feedback is key to successful learning but is rarely provided in higher education; AI might offer a way to scale it. Besides the alignment between teacher and AI feedback, students’ perceptions of the feedback are crucial, and both are investigated in this exploratory study (N = 65). Findings reveal that human and AI feedback are correctly identified in about 50% of cases, but this depends on the presentation order. Human feedback is considered significantly fairer and is better accepted. Human and AI feedback scores are reasonably well aligned given the small dataset.

Introduction

Feedback is known to be a major factor for successful learning (Wisniewski et al., 2020). Rather than only indicating what is incorrect, feedback is particularly effective when it is formative, providing explanations of errors and advice on appropriate solution strategies (Hattie & Timperley, 2007). In higher education, however, feedback is seldom provided to learners due to resource constraints (Boud & Molloy, 2013), particularly for writing tasks. The advent of artificial intelligence (AI) in educational contexts might be a means to scale feedback.

Research investigating ChatGPT-generated feedback on text assignments reveals that the quality of human feedback is higher than that of AI feedback for clarity of directions for improvement, accuracy, prioritization of essential features, and supportive tone, but not for criteria-based feedback (Steiss et al., 2024). For feedback to be useful, learners need to be willing to accept it and use it to revise their work (Hattie & Timperley, 2007; Narciss, 2008). Hence, besides the accuracy of human and AI feedback, students’ perceptions also need to be examined (Nazaretsky et al., 2024; Rüdian et al., 2025), yet they have received limited attention in research (Strijbos et al., 2021). Recent research on feedback perceptions found that students’ ability to correctly detect the feedback source depends on the course and task, and that students prefer human over AI feedback, a preference that is also related to the correct identification of the feedback source (Nazaretsky et al., 2024).

To advance research on students' perceptions of AI feedback, this initial exploratory study within a larger research project first examines whether students in an AI feedback group correctly detect the feedback source (RQ1a) and whether this relates to the presentation order (RQ1b). Second, it is investigated whether students’ perceptions of feedback differ by source (RQ2). Third, the congruence between human- and AI-based feedback scoring is analyzed (RQ3).

Methods

Participants and design

This study was conducted in three parallel seminar groups in the teacher education Master’s program, in which 78 students were enrolled. The N = 65 participating students (73.8% female) were M = 24.65 (SD = 3.04) years old and had studied on average for 10.51 semesters (SD = 2.45).

Students had to write a research project exposé (500 words). Students signed themselves up for the AI-feedback group (n = 38) or the human-feedback group (n = 27). The AI-group received feedback from the lecturer (rubrics and short text), plus AI-feedback with values assigned to extended rubrics. To avoid an order effect, students randomly received either human or AI feedback first. The human-feedback group only received the lecturer's feedback. The feedback rubrics included six categories and three measures (0 = not fulfilled, 0.5 = partly fulfilled, 1 = fulfilled; see Table 2). Students received the feedback three days after the submission deadline or one week later, depending on their participation in the course.

Instruments

To assess students’ feedback perceptions, the Feedback Perceptions Questionnaire (FPQ; Strijbos et al., 2021) was adapted. Its 16 items were rated on a 5-point Likert scale (1 = fully disagree, 5 = fully agree). For the subscales, Cronbach’s α ranged from .71 to .93. Students in the AI-feedback group were asked whether the feedback was AI-generated. Furthermore, participants reported demographic details (e.g., semester studied, age, and current grade point average).

Generation of the AI feedback

The AI feedback tool is designed to support teachers by automating the process of providing criteria-oriented feedback on student submissions. The tool analyzes students' text submissions using predefined analytic rubrics, leveraging natural language processing and machine learning, including large language models' capabilities.

The system is composed of three main components. First, a feature extraction pipeline processes student submissions, converting them into numerical indicators based on linguistic and contextual criteria. This step ensures that the model can quantitatively assess different aspects of the student’s submission. Second, the training process involves a machine-learning regression tree model that learns from historical student submissions and teacher-provided ratings. Third, the prediction component applies the trained model to new student submissions to predict rubric-based ratings.
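
The three components can be sketched in Python under stated assumptions. The source does not list the actual features, tree settings, or training data, so the indicators in `extract_features`, the small hand-written tree in `fit_tree`, and the example texts and ratings below are all hypothetical. The sketch only illustrates the flow from text to numeric indicators to a predicted rubric rating.

```python
from statistics import mean


def extract_features(text):
    """Component 1: convert a submission into numeric indicators.

    These three indicators are hypothetical stand-ins, not the tool's features.
    """
    n_words = float(len(text.split()))
    n_citations = float(text.count("("))  # crude proxy for in-text citations
    has_question = 1.0 if "?" in text else 0.0
    return [n_words, n_citations, has_question]


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, depth=0, max_depth=2):
    """Component 2: fit a small regression tree by greedy SSE reduction."""
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": mean(y)}
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put data on both sides
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:
        return {"leaf": mean(y)}
    _, j, threshold, left, right = best
    return {
        "feature": j,
        "threshold": threshold,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left],
                         depth + 1, max_depth),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right],
                          depth + 1, max_depth),
    }


def predict(tree, x):
    """Component 3: apply the trained tree to one feature vector."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Hypothetical historical submissions with teacher ratings for one criterion
# (0 = not fulfilled, 0.5 = partly fulfilled, 1 = fulfilled).
historical = [
    ("Short idea.", 0.0),
    ("An idea with a source (Smith, 2020).", 0.5),
    ("How does feedback affect revision? Prior work (Hattie & Timperley, 2007) "
     "and (Narciss, 2008) suggests it matters.", 1.0),
    ("What shapes acceptance? See (Strijbos et al., 2021) and "
     "(Boud & Molloy, 2013).", 1.0),
]

X_train = [extract_features(text) for text, _ in historical]
y_train = [rating for _, rating in historical]
tree = fit_tree(X_train, y_train)

new_text = "Does AI feedback help? Earlier studies (Steiss et al., 2024) say partly."
new_rating = predict(tree, extract_features(new_text))
print(new_rating)
```

In practice, a library implementation of regression trees would likely replace the hand-written `fit_tree`; it is spelled out here only to keep the sketch self-contained.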

Results

RQ1. Correct detection of the feedback source and relation to presentation order

To determine whether students correctly identified the feedback source, response frequencies were inspected. Human feedback was correctly identified as human by 54.1% of students, 29.7% considered it AI feedback, and 16.2% were unsure. For the AI feedback, 47.4% correctly identified it as AI-generated, 28.9% thought it was not AI-based, and 23.7% were unsure.

To investigate whether correct identification relates to the presentation order of the feedback types (12 participants received AI feedback first, 25 received human feedback first), Pearson correlations were computed. Correct identification of human feedback was positively associated with human feedback being presented second (r = .407, p = .023), as was correct identification of AI feedback (r = .433, p = .027).
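
As a sketch of this kind of analysis, assuming both variables are coded 0/1 (the coding is not stated in the source), Pearson's r between two binary variables is equivalent to the phi coefficient. The data below are invented for illustration and are not the study's data; the variable names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical coding: 1 = human feedback shown second, 0 = shown first.
human_second = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical coding: 1 = source identified correctly, 0 = otherwise.
correct_id = [1, 1, 0, 0, 0, 1, 1, 0]

r = pearson_r(human_second, correct_id)
print(round(r, 3))
```

With unequal group sizes such as 12 versus 25, the same formula applies, but estimates are less stable, which is one reason to treat the reported correlations as exploratory.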

RQ2. Students’ perceptions of the feedback depend on the source

To investigate whether students in the AI group perceived the two feedback types differently in terms of affect, willingness to use the feedback, fairness, acceptance, and usefulness, two-sided paired t-tests were used. Results (see Table 1) indicate that human feedback is perceived as significantly fairer and more acceptable.

Table 1 Students’ perceptions of AI-feedback and human-feedback
Variable	Feedback source	M	SD	t(36)	p	d
Positive affect	Human-feedback	3.64	.94
Positive affect	AI-feedback	3.36	1.02	1.36	.181	.22
Negative affect	Human-feedback	1.28	.65
Negative affect	AI-feedback	1.45	.81	-1.83	.076	-.30
Willingness to use feedback	Human-feedback	3.80	1.01
Willingness to use feedback	AI-feedback	3.54	1.11	1.43	.161	.24
Fairness	Human-feedback	4.09	.63
Fairness	AI-feedback	3.67	.87	2.94	.006	.48
Acceptance	Human-feedback	4.68	.49
Acceptance	AI-feedback	4.41	.79	2.46	.019	.40
Usefulness	Human-feedback	3.59	.89
Usefulness	AI-feedback	3.33	1.18	1.51	.139	.25
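
The source does not state which variant of Cohen's d was used. As a hedged check, the reported effect sizes are consistent with the paired-samples form d = t / sqrt(n), assuming n = 37 pairs (df + 1). The sketch below recomputes d from the t values in Table 1; the function name `cohens_dz` is illustrative.

```python
from math import sqrt

N_PAIRS = 37  # assumption: degrees of freedom (36) plus one

# (t, reported d) per variable, copied from Table 1
table_1 = {
    "Positive affect": (1.36, 0.22),
    "Negative affect": (-1.83, -0.30),
    "Willingness to use feedback": (1.43, 0.24),
    "Fairness": (2.94, 0.48),
    "Acceptance": (2.46, 0.40),
    "Usefulness": (1.51, 0.25),
}


def cohens_dz(t, n_pairs):
    """Effect size for paired samples derived from the t statistic."""
    return t / sqrt(n_pairs)


recomputed = {name: round(cohens_dz(t, N_PAIRS), 2) for name, (t, _) in table_1.items()}
for name, (t, reported) in table_1.items():
    print(f"{name}: t = {t}, reported d = {reported}, recomputed d = {recomputed[name]}")
```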

RQ3. Congruence of the scoring of the human and AI-based feedback

Across all feedback, the scores of the human rater (70) averaged M = .829 points (SD = .131), and those of the AI (41) averaged M = .837 (SD = .097). To assess the congruence of the scoring (human versus AI), the mean absolute error (MAE) was computed as an interpretable metric. For the imbalanced and small dataset (see Table 2), we obtained an MAE of 0.21 (overall RMSE of 0.27), indicating good congruence on a scale of [0,1].

Table 2 Criteria set, statistics, including Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
ID	Criterion	M	SD	MAE	RMSE
1	Concept idea and state of research	.89	.24	.31	.24
2	Scientific justification and relevance	.83	.29	.18	.29
3	Development of research question	.86	.25	.23	.25
4	Theoretical framework and concepts	.69	.29	.27	.30
5	Number of academic sources	.95	.21	.09	.22
6	Length	.80	.32	.20	.32
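
A minimal sketch of the two metrics, assuming paired human and AI scores on the rubric levels 0, 0.5, and 1. The paired scores below are invented for illustration. The last lines average the per-criterion columns of Table 2; that these unweighted means match the overall values reported in the text is an observation, and whether the overall values were actually aggregated this way is an assumption.

```python
from math import sqrt


def mae(human, ai):
    """Mean absolute error between paired scores."""
    return sum(abs(h - a) for h, a in zip(human, ai)) / len(human)


def rmse(human, ai):
    """Root mean squared error between paired scores."""
    return sqrt(sum((h - a) ** 2 for h, a in zip(human, ai)) / len(human))


# Hypothetical paired scores for one criterion (not the study's data)
human_scores = [1.0, 0.5, 1.0, 0.0, 1.0, 0.5]
ai_scores = [1.0, 1.0, 0.5, 0.5, 1.0, 0.5]

example_mae = mae(human_scores, ai_scores)
example_rmse = rmse(human_scores, ai_scores)
print(example_mae, round(example_rmse, 3))

# Per-criterion values copied from Table 2 (criteria 1 to 6)
table_2_mae = [0.31, 0.18, 0.23, 0.27, 0.09, 0.20]
table_2_rmse = [0.24, 0.29, 0.25, 0.30, 0.22, 0.32]

mean_mae = round(sum(table_2_mae) / len(table_2_mae), 2)
mean_rmse = round(sum(table_2_rmse) / len(table_2_rmse), 2)
print(mean_mae, mean_rmse)
```

When both metrics are computed on the same set of errors, RMSE is never smaller than MAE, because squaring weights large deviations more heavily. This is a useful sanity check when reading per-criterion values.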

Discussion

Results indicate that about 54% of participants correctly detected human feedback, whereas only about 47% correctly identified AI feedback, a pattern similarly reported by Nazaretsky et al. (2024). Furthermore, the results suggest that students are better able to identify human feedback when it is presented after AI feedback. In a larger study, the mediating roles of presentation order and correct identification in feedback perceptions need further examination. In this sample, students consider human feedback fairer and accept it more than AI feedback, but it must also be taken into account that the usefulness of both feedback types was rated as only moderate. Still, this feedback tool highlights the possibilities of using AI to assist teachers in rating students’ submissions, supporting timely, structured, and criteria-based feedback at scale. Keeping a human-in-the-loop approach maintains educators’ essential role in guiding student learning while harnessing the benefits of AI. Our human-in-the-loop approach might also account for students’ preference for human feedback (Nazaretsky et al., 2024).

As this exploratory study was conducted in the field, the groups receiving AI feedback or human feedback first are not equal in size, resulting in statistical limitations. More training data might also have resulted in better performance in predicting all feedback criteria.

Funding statement

This work was supported by the Federal Ministry of Research, Technology, and Space (BMFTR), grant number 16DHBKI045.

References

Boud, D., & Molloy, E. (2013). Rethinking models of feedback for learning: the challenge of design. Assessment and Evaluation in Higher Education, 38(6), 698-712. https://doi.org/10.1080/02602938.2012.691462
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112. https://doi.org/10.3102/003465430298487
Narciss, S. (2008). Feedback strategies for interactive learning tasks. In J. M. Spector, M. D. Merrill, J. J. G. van Merriënboer, & M. P. Driscoll (Eds.), Handbook of Research on Educational Communications and Technology (pp. 125-143). Lawrence Erlbaum Associates.
Nazaretsky, T., Mejia-Domenzain, P., Swamy, V., Frej, J., & Käser, T. (2024). AI or Human? Evaluating Student Feedback Perceptions in Higher Education. Technology Enhanced Learning for Inclusive and Equitable Quality Education, Cham.
Rüdian, S., Podelo, J., Kuzilek, J., & Pinkwart, N. (2025). Feedback on feedback: Student’s perceptions for feedback from teachers and few-shot LLMs. Proceedings of the 15th Learning Analytics and Knowledge Conference. ACM.
Steiss, J., Tate, T., Graham, S., Cruz, J., Hebert, M., Wang, J., Moon, Y., Tseng, W., Warschauer, M., & Olson, C. B. (2024). Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction, 91, 101894. https://doi.org/10.1016/j.learninstruc.2024.101894
Strijbos, J.-W., Pat-El, R., & Narciss, S. (2021). Structural validity and invariance of the Feedback Perceptions Questionnaire. Studies in Educational Evaluation, 68. https://doi.org/10.1016/j.stueduc.2021.100980
Wisniewski, B., Zierer, K., & Hattie, J. (2020). The power of feedback revisited: A meta-analysis of educational feedback research. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.03087