Introduction
Today’s military environments are increasingly dynamic, requiring teams to coordinate and perform complex tasks under time pressure and uncertainty. To address this challenge, our team developed a virtual reality (VR) training environment for Tactical Combat Casualty Care (TC3) in a casualty collection point (CCP) scenario that incorporates the Team Dynamics Measurement Framework (TDMF; Avancha et al., 2024) to develop and assess team adaptability skills (Craig et al., 2024). Virtual and synthetic environments have a long history of supporting effective training while reducing the risks and logistical burdens associated with live exercises (Andrews & Craig, 2015; Shubeck et al., 2016). However, VR-based training systems present unique usability challenges, such as inconsistent interactions, limited collaboration, and misalignment between the physical and virtual worlds, which can hinder learning outcomes if left unaddressed (Sutcliffe & Gault, 2004; Derby et al., 2024).
To guide system development, we adopted a Learning Engineering (LE) approach, using nested LE cycles to integrate user-centered and domain-specific feedback. A key part of this involved conducting heuristic evaluations to assess system usability. Heuristic evaluation is an expert-driven usability inspection method that identifies mismatches between the design and user expectations against established heuristics (Nielsen, 1994). Our first heuristic evaluation, reported in Malhotra et al. (2025), applied a combination of heuristics for virtual environments (VEs) proposed by Sutcliffe and Gault (2004) and the Derby Dozen principles introduced by Derby et al. (2024), using Nielsen’s severity rating scale (Nielsen, 1992) to rate the impact of each identified usability issue from minor to critical. The issues identified included problems with natural engagement and sense of presence, interaction with equipment, avatar height inconsistencies, system responsiveness, limited support for team-based interaction, and orientation within the VR environment. These findings were fed back into the LE cycle as design requirements. The development team implemented a series of changes, including improved avatars, button interactions, medical equipment, refined navigation, orientation cues, and visual adjustments; the key priority was enabling multiplayer functionality to support team-based training. To evaluate the effectiveness of these changes, we conducted a second heuristic evaluation focused on identifying remaining usability concerns that could impact trainee performance.
This work describes our nested LE cycles within the development phase, focusing on iterative heuristic evaluations that inform development and incorporate the TDMF to train team adaptability skills. The VR system and TDMF were developed following the LE process (Goodell & Kolodner, 2023; Kessler et al., 2023), which includes four phases: (1) Challenge, where the problem is examined in context; (2) Creation, where solutions are designed to meet user needs; (3) Implementation, where the solution is deployed and data are gathered; and (4) Investigation, where the collected data are analyzed to evaluate the solution’s impact. Nested LE cycles (Totino & Kessler, 2024; Craig et al., 2025) were used to (a) characterize the training challenge and end-user constraints through conversations with military partners and Subject Matter Experts (SMEs), (b) design training solutions using hybrid Cognitive Task Analysis (hCTA) and event-flow diagrams centered on patient care at a CCP, and (c) specify perturbations that elicit adaptive team behavior while managing the CCP. Such nested cycles of design and evaluation are common practice in LE, where smaller iterative loops within the larger phases help refine complex solutions (Avancha et al., 2024; Craig et al., 2025). SME reviews iteratively refined the scenario structure, timing, and narrative so that the VR training would both align with doctrine and create opportunities to observe meaningful changes in team dynamics across successive perturbations.
The evaluation followed a structured protocol adapted from the first evaluation cycle (Malhotra et al., 2025). Heuristics relevant to the VR system were consolidated to assess six key dimensions of the VE: Before Entering VR (4 items); Navigation in VR (4 items); Tasks within VR (4 items); Feedback and Collaboration (2 items); Post-Task and Exit Scenario (2 items); and Scenario-Specific (1 item) (see Table 1). The checklist was applied to the updated version of the VR training environment, which incorporated improvements to avatar displays, interactions, role-based tasks, visual transitions, and scenario flow. The purpose of the evaluation was to assess whether the usability improvements made in response to the first-round findings effectively addressed previous concerns and to identify any new issues before implementation with trainees.
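To make the structure of the consolidated checklist concrete, the sketch below shows one possible way to encode it for logging findings during a session. It is a hypothetical Python representation written for illustration only (the phase names and heuristic wording are taken from Table 1, and severity values follow Nielsen’s 0-4 scale); it is not the instrument actually administered to the evaluators.

```python
# Hypothetical sketch: encoding part of the consolidated checklist (Table 1)
# as a phase -> heuristics mapping, plus a helper for logging findings
# against it using Nielsen's 0-4 severity scale.

CHECKLIST = {
    "Before Entering VR": [
        "Users should be introduced to the UI, features, and interaction methods.",
        "Instructions should provide actionable feedback.",
        "Cue active objects and provide explanations as needed.",
        "Mark and apply design compromises consistently.",
    ],
    "Feedback & Collaboration": [
        "Actions should result in expected and immediate responses.",
        "Include landmarks to orient users sharing virtual space.",
    ],
    # The remaining phases (Navigation in VR, Tasks within VR, Post-Task & Exit,
    # Scenario-Specific) would be filled in the same way from Table 1.
}


def log_finding(findings, phase, item_index, severity, note):
    """Record one usability finding; severity uses Nielsen's 0-4 scale."""
    assert phase in CHECKLIST and 0 <= item_index < len(CHECKLIST[phase])
    assert 0 <= severity <= 4, "Severity ranges from 0 (none) to 4 (catastrophe)"
    findings.append({
        "phase": phase,
        "heuristic": CHECKLIST[phase][item_index],
        "severity": severity,
        "note": note,
    })


findings = []
log_finding(findings, "Feedback & Collaboration", 0, 2,
            "Illustrative entry only: object state did not update for all users.")
```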
A team of three (n = 3) human factors researchers conducted the evaluation, all of whom were familiar with immersive systems and had prior experience with the VR training scenario. Prior findings indicate that expertise in usability evaluation, as well as in the specific interface domain being evaluated, leads to significantly better results (Derby et al., 2024; Nielsen, 1994). The evaluation was conducted using Meta Quest 3 head-mounted displays. Each evaluator rotated through the three predefined trainee roles in the scenario: Prioritizer, Stabilizer, and Medical Supplier, representing the structure of TC3 within the CCP setting. The evaluation, conducted in a standard lab environment, included Scenario 1 and Scenario 2, with a 10-minute break between sessions to minimize VR-related fatigue. During the sessions, each evaluator independently took notes on usability breakdowns, interface inconsistencies, and confusing elements. Evaluators also had the option to think aloud while inside the VE, and their observations were recorded. Evaluations were concluded individually so that each evaluator could record their observations without influence from others. After completing their sessions, the evaluators convened to synthesize findings, discuss key usability issues, and assign consensus-based severity ratings.
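Consensus severity ratings in this study were assigned through discussion after the individual sessions. Purely as an illustration of how the three evaluators’ independent ratings might be tabulated before such a discussion, the sketch below computes a per-issue median as a starting point; the issue labels and rating values are hypothetical and are not the ratings reported here.

```python
from statistics import median

# Hypothetical sketch: tabulating independent severity ratings (Nielsen's
# 0-4 scale) from three evaluators for the same issue, using the median
# as a seed for the consensus discussion described above.
def seed_consensus(ratings_by_issue):
    """ratings_by_issue maps an issue label to a list of per-evaluator ratings."""
    return {issue: median(ratings) for issue, ratings in ratings_by_issue.items()}

# Illustrative values only; they are not the ratings assigned in this study.
example = {
    "'Ask Location' prompt triggers inconsistently": [2, 3, 2],
    "Avatar misalignment after refitting the headset": [2, 2, 3],
}
print(seed_consensus(example))  # both issues receive a median of 2
```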
Table 1. Heuristic Checklist for VR Usability Evaluation

Phase | Heuristic
Before Entering VR | Users should be introduced to the UI, features, and interaction methods.
Before Entering VR | Instructions should provide actionable feedback.
Before Entering VR | Cue active objects and provide explanations as needed.
Before Entering VR | Mark and apply design compromises consistently.
Navigation in VR | Interaction should match real-world expectations.
Navigation in VR | The user should feel present in a “real” world.
Navigation in VR | Users should be able to locate themselves and reset positions.
Navigation in VR | UI should focus on immersive elements, not external controls.
Tasks within VR | Virtual tasks and object behavior should reflect real-world expectations.
Tasks within VR | Allow physical actions; avoid restriction by hardware.
Tasks within VR | Enable effective task completion using virtual tools.
Tasks within VR | User actions and avatar behavior should align with <200 ms delay.
Feedback & Collaboration | Actions should result in expected and immediate responses.
Feedback & Collaboration | Include landmarks to orient users sharing virtual space.
Post-Task & Exit | Use should not cause discomfort or fatigue.
Post-Task & Exit | Entry and exit should be intuitive and communicated.
Scenario-Specific | The system should respond to unexpected changes or disruptions.
The second heuristic evaluation identified a moderate number of usability issues, only a few of which persisted from the initial evaluation despite the targeted design updates. Many previously critical problems, such as interaction breakdowns, avatar misalignment, and the absence of multiplayer functionality, were addressed. These updates resulted in improved presence, a more reliable control experience, and support for role-based changes.
However, evaluators encountered a number of lower-severity but instructionally relevant issues. For example, scenario logic remained partially opaque: certain prompts, such as “Ask Location,” were triggered inconsistently, and the dialogue scripts for non-player characters (NPCs) needed updating. These issues disrupted team coordination and created delays in both scenarios. Other interface problems included overlapping menus and an inconsistent exit-wound toggle that hindered task navigation. Some object states, such as those of crates or oxygen tanks, did not update consistently across users, which limited shared situational awareness (SA).
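One common way to address this kind of divergence is to route every state change for shared interactables through a single authoritative source that replicates the result to all connected users. The sketch below illustrates the pattern in simplified form; the class and method names are hypothetical, and the production system’s actual networking layer is not described in this paper.

```python
# Hypothetical sketch of authoritative state replication for shared objects
# (e.g., crates, oxygen tanks): every change goes through one host-side
# registry, which then pushes the updated state to all connected clients,
# so no user's view of an object can silently drift.

class SharedObjectRegistry:
    def __init__(self):
        self.states = {}   # object_id -> state dict (single source of truth)
        self.clients = []  # connected client callbacks

    def connect(self, on_update):
        """Register a client callback and replay the current state snapshot to it."""
        self.clients.append(on_update)
        for object_id, state in self.states.items():
            on_update(object_id, dict(state))

    def apply_change(self, object_id, **changes):
        """Apply a change on the host, then replicate it to every client."""
        state = self.states.setdefault(object_id, {})
        state.update(changes)
        for on_update in self.clients:
            on_update(object_id, dict(state))


# Illustrative usage: both users see the oxygen tank's new state.
registry = SharedObjectRegistry()
registry.connect(lambda oid, s: print("user A sees", oid, s))
registry.connect(lambda oid, s: print("user B sees", oid, s))
registry.apply_change("oxygen_tank_01", opened=True, remaining_psi=1800)
```

Because a connecting client receives the full snapshot, the same pattern would also cover a user who rejoins or refits a headset mid-scenario.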
Additionally, object behavior remained inconsistent in some cases: thermometers and nasal tubes occasionally clipped through patients or failed to respond to expected interactions, and critical equipment such as medical bags (see Figure 1) spawned in inaccessible areas. These issues also highlight the limitations of testing inside a lab environment, since spatial orientation remains a challenge, particularly when users removed and refitted headsets mid-scenario; doing so often caused avatar misalignment and floating hands. At present, experimenters must give participants specific instructions to avoid these problems; such guidance should instead be built into the scenario.
Figure 1. Scene from the VR training environment showing available medical supplies.
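As one illustration of how the spawn-location problem might be caught automatically, equipment spawn points can be validated against the reachable play area and clamped back inside it when they fall outside. The bounds, object names, and coordinates below are hypothetical examples, not values from the actual environment.

```python
# Hypothetical sketch: validating equipment spawn points against a simple
# axis-aligned reachable area and clamping any point that falls outside it.
REACHABLE_AREA = {"x": (-5.0, 5.0), "y": (0.0, 2.0), "z": (-5.0, 5.0)}

def clamp_spawn_point(position):
    """Return a spawn position guaranteed to lie inside the reachable area."""
    clamped = {}
    for axis, value in position.items():
        low, high = REACHABLE_AREA[axis]
        clamped[axis] = min(max(value, low), high)
    return clamped

# Illustrative check: a medical bag spawned below the floor and outside the
# play area gets pulled back to a reachable position.
bag_spawn = {"x": 1.2, "y": -0.4, "z": 7.3}
print(clamp_spawn_point(bag_spawn))  # {'x': 1.2, 'y': 0.0, 'z': 5.0}
```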
Overall, the second evaluation reflected a notable reduction in the severity of usability concerns compared with the first round, but it also emphasized the need for continued refinement of instructional support, feedback systems, and collaborative task alignment. These findings reinforce the importance of iterative evaluation during the LE cycle and provide actionable guidance as the project advances into the Implementation phase, in which Army participants will test the system.
This study represents a nested LE cycle focused on refining a VR-based team training environment in a CCP setting. The initial evaluation identified critical usability issues in Natural Engagement and Sense of Presence, including the absence of a visible user avatar, which informed substantial design changes. These included improvements to avatar height, interaction with objects, and multiplayer functionality. The second evaluation, presented in this paper, assessed the effectiveness of those changes and identified remaining usability concerns that, while less severe, could still affect the system’s user experience, team training, and coordination.
Findings from the second evaluation reaffirm the value of iterative, expert-driven usability evaluations in complex VR training systems within the LE cycle. Although the severity of issues decreased, gaps in design highlight the ongoing challenge of aligning system functionality with training objectives. These insights directly support the Investigation phase of the LE cycle and will inform targeted refinements in interface behavior, role-based guidance, and scenario progression. The improvements from this cycle are expected to enhance user experience and the fidelity of analytics derived from user interactions.
The next step in the broader LE process is to transition into the Implementation phase. A formative evaluation with Army trainees will be conducted to assess the system’s usability, effectiveness, and support for team training, generating ecologically valid insights in operational contexts. These findings will inform further refinement, guide the integration of Generalized Intelligent Framework for Tutoring (GIFT)-based adaptive feedback, and support the development of a scalable, data-driven VR training solution grounded in LE principles.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research described herein has been sponsored by the U.S. Army Combat Capabilities Development Command under cooperative agreement W912CG-23-2-000. The statements and opinions expressed in this article do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.