Research Method Beyond Traditional Analysis: Quantifying Qualitative Data Using Text Mining

Elnara Mammadova

doi:10.59668/2579.26751

Abstract

Artificial Intelligence (AI) offers transformative opportunities for educational research, yet its methodological potential remains underutilized. In qualitative studies, researchers typically generate themes and present findings by listing and describing what was discussed under each theme. However, it is often unclear which themes warrant the most attention or action. Text mining can address this gap by identifying which topics are most frequently discussed, enabling researchers to derive meaningful insights from large volumes of unstructured data. This study examines the application of text mining in qualitative research, highlighting its advantages and limitations based on recent scholarly work. Specifically, it explores how frequency analysis, an AI-based text mining technique, can serve as a rigorous, scalable complement to traditional qualitative methods. To demonstrate how frequency analysis is carried out with text mining, this study re-presents the analysis of a dataset previously utilized in Mammadova et al. (2026), shifting the focus entirely from the empirical findings to the procedural mechanics of the analysis.

Background literature

Text mining techniques have garnered attention for their ability to process and analyze qualitative data systematically, reducing the risk of subjective interpretations. For instance, Ito et al. (2019) demonstrated the efficacy of text mining in evaluating work logs of healthcare providers involved in nutritional programs for the elderly. By applying quantitative analysis techniques to qualitative work logs, the researchers minimized arbitrary interpretations, providing a more objective basis for evaluating the program's effectiveness. Similarly, Lebowitz et al. (2020) utilized statistical text mining to analyze reflective essays written by medical students, allowing for a robust exploration of student experiences during their rural placements. This approach facilitated the examination of larger datasets, providing richer qualitative insights than traditional qualitative methods could yield.

However, one significant drawback of text mining is the potential for oversimplification of complex qualitative data. While text mining can efficiently categorize and analyze large volumes of text, the nuances and contextual meanings often embedded in qualitative content may be lost during the analysis process. For instance, Ekin et al. (2021) noted that while text mining tools can uncover trends and relationships within qualitative data, they may fail to capture the depth of human experiences that qualitative researchers traditionally seek to understand. Thus, there is a credible concern that relying exclusively on text mining could lead to a reductionist interpretation of qualitative data, potentially neglecting essential contextual factors. Another issue relates to the quality and representativeness of the data used in text mining. As discussed by Kaur (2017), the effectiveness of text mining is contingent upon high-quality textual data; if the data are biased or unrepresentative, the resulting analyses may reflect those limitations. This further emphasizes the need for careful data selection and preprocessing in qualitative research utilizing text mining tools (Renganathan, 2017).

Method

To demonstrate the application of text mining in qualitative data analysis, this study re-presents the analysis of a dataset previously utilized in Mammadova et al. (2026), which examined two new instructors’ teaching experiences in a HyFlex environment and interpreted the results from a Technological Pedagogical Content Knowledge (TPACK) framework perspective. The qualitative data were collected through reflexive journaling (Ortlipp, 2008) and interactive interviews (Ellis et al., 2011) over ten weeks, as described in full detail by Mammadova et al. (2026). Data collection was conducted in accordance with approval from the university’s Institutional Review Board.

Frequency analysis. Inductive emergent thematic coding was used for qualitative analysis, yielding five overarching themes. The study then analyzed how frequently each theme appeared across the weekly dataset. Although frequency analysis did not account for deeper contextual meaning or nuanced interpretation beyond patterns of word co-occurrence, it enabled the estimation of the relative proportion of discussion devoted to each theme.

A text mining approach implemented in Python 3.11.4 was used to quantify the qualitative data. To prepare the textual data for quantitative analysis, the study implemented a multi-step preprocessing pipeline using standard Natural Language Processing (NLP) techniques (Tabassum & Patil, 2020). First, the textual data was standardized by converting all text to lowercase and removing numerical values and non-alphabetical characters. Next, standard stopwords (e.g., “the”, “and”, and “but”), which carry little semantic value, were filtered out using the Natural Language Toolkit (NLTK) library to remove uninformative noise from the data. The stopword list was supplemented with filler words such as “like,” “uh,” “yeah,” “would,” and “hey.” The last step in data preprocessing was lemmatization, which reduced words (e.g., “testing” and “tested”) to their canonical roots (e.g., “test”) using the WordNetLemmatizer module from the NLTK library. This systematic cleaning protocol yielded a streamlined, uniform dataset optimized for subsequent computational analysis.
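The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the stopword set is a small illustrative subset, and `toy_lemmatize` is a hypothetical stand-in so that the sketch runs without downloading NLTK corpora. In practice, NLTK's `stopwords.words("english")` and `WordNetLemmatizer().lemmatize` could be passed in instead; the part-of-speech setting used in the study is not stated in the text.

```python
import re

# Illustrative subset only (assumption); the study used NLTK's English
# stopword list supplemented with fillers such as "like" and "uh".
STOPWORDS = {"the", "and", "but", "a", "we", "was",
             "like", "uh", "yeah", "would", "hey"}

# Hypothetical lookup table standing in for a real lemmatizer.
TOY_LEMMAS = {"testing": "test", "tested": "test"}


def toy_lemmatize(word):
    """Return the canonical form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


def preprocess(text, stopwords=STOPWORDS, lemmatize=toy_lemmatize):
    """Lowercase, strip non-letters, drop stopwords, then lemmatize."""
    text = text.lower()
    # Replace digits, punctuation, and any other non a-z characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in stopwords]
    return [lemmatize(t) for t in tokens]


tokens = preprocess("Uh, we tested the 2 cameras and, like, testing was slow!")
print(tokens)  # ['test', 'cameras', 'test', 'slow']
```

Passing the stopword list and lemmatizer as arguments keeps the cleaning steps in a fixed order while allowing the NLTK components to be substituted without changing the function.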

The preprocessed text segments were aggregated across the five NVivo-coded themes to construct a distinct, theme-specific list of words (Onwuegbuzie & Teddlie, 2003). These five vocabularies served as the definitive linguistic profile for their respective thematic categories. In parallel, the transcripts from the weekly interviews were tokenized, breaking the continuous text into constituent words for further analysis.
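One plausible reading of this step is sketched below. The text does not spell out the exact formula for a theme's weekly proportion, so the sketch assumes it is the share of a week's tokens that appear in the theme vocabulary. The theme names, tokens, and function names are hypothetical and serve only as illustration.

```python
# Hypothetical preprocessed tokens from segments coded under two themes.
coded_segments = {
    "technology_use": [["camera", "zoom", "microphone"], ["zoom", "screen"]],
    "class_preparation": [["slide", "plan", "zoom"]],
}


def build_vocabulary(segments):
    """Collect the distinct tokens found in a theme's coded segments."""
    return {token for segment in segments for token in segment}


def theme_proportion(week_tokens, vocabulary):
    """Percentage of a week's tokens that belong to the theme vocabulary.

    Assumed formula; the source does not state the calculation explicitly.
    """
    if not week_tokens:
        return 0.0
    hits = sum(1 for token in week_tokens if token in vocabulary)
    return 100.0 * hits / len(week_tokens)


vocabularies = {
    theme: build_vocabulary(segments)
    for theme, segments in coded_segments.items()
}

# Hypothetical tokenized transcript for one week.
week_tokens = ["zoom", "camera", "student", "student", "screen"]
proportions = {
    theme: theme_proportion(week_tokens, vocab)
    for theme, vocab in vocabularies.items()
}
print(proportions)  # {'technology_use': 60.0, 'class_preparation': 20.0}
```

Under this reading, a word such as "zoom" can belong to more than one vocabulary, so the raw theme proportions for a week need not sum to 100.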

Result

The proportion of theme-specific lexicons within the weekly dataset was measured longitudinally over the ten-week duration (Figure 1 and Table 1). To determine the relative prominence of each theme within the coded dataset, a normalization procedure was executed. Specifically, the proportion of a given theme for a specific week was divided by the sum of the proportions of all five themes for that same week and expressed as a percentage. This calculation was systematically applied to all five themes across the entire ten-week corpus of documents (Figure 2 and Table 2). Ultimately, a higher percentage of theme-associated words in any given week indicates a more pronounced conversational focus on that particular topic during the interview and in the journal (see Mammadova & Topalgokceli, 2025, for Python code).
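The normalization can be written as percentage(theme, week) = 100 × proportion(theme, week) / sum of the proportions of all five themes in that week. The sketch below applies this to one week; the theme labels and input values are hypothetical and are not taken from Table 1, and the function name is an assumption (the published code is available in Mammadova & Topalgokceli, 2025).

```python
def normalize_week(proportions):
    """Rescale one week's theme proportions so that they sum to 100."""
    total = sum(proportions.values())
    if total == 0:
        # No theme-related words at all in this week.
        return {theme: 0.0 for theme in proportions}
    return {
        theme: round(100.0 * value / total, 2)
        for theme, value in proportions.items()
    }


# Hypothetical raw proportions for a single week (they sum to 80, not 100).
week = {
    "theme_a": 24.0,
    "theme_b": 20.0,
    "theme_c": 16.0,
    "theme_d": 12.0,
    "theme_e": 8.0,
}
percentages = normalize_week(week)
print(percentages)
# {'theme_a': 30.0, 'theme_b': 25.0, 'theme_c': 20.0, 'theme_d': 15.0, 'theme_e': 10.0}
```

Because each week is divided by its own total, the resulting percentages are comparable across weeks even when the raw totals differ.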

Table 1

The Proportion of Words Related to Each Coded Overarching Theme in the Weekly Interview

Date and number of weeks	Technology use in teaching	Class preparation	Remote & in-person student engagement	Expectations & policy	HyFlex definitions	Total
Sep. 8 Week 1	18.64	19.13	18.15	14.49	22.37	92.78
Sep. 15 Week 2	17.81	17.01	17.67	12.82	19.77	85.08
Sep. 22 Week 3	19.64	18.77	20.42	15.93	22.07	96.82
Sep. 29 Week 4	19.32	17.71	18.95	14.21	23.13	93.32
Oct. 13 Week 5	17.95	16.58	18.67	13.91	21.36	88.46
Oct. 20 Week 6	19.22	18.13	19.32	14.31	22.85	93.82
Oct. 27 Week 7	18.72	18.57	18.75	14.75	22.04	92.84
Nov. 3 Week 8	19.84	19.11	19.74	16.16	23.68	98.53
Nov. 17 Week 9	19.07	18.48	18.27	14.60	22.50	92.92
Dec. 8 Week 10	17.77	17.67	17.75	14.56	21.60	89.35

Figure 1

The Proportion of Coded Overarching Theme-Related Words in the Weekly Interview

Table 2

The Percentage of Words Related to Each Coded Overarching Theme in the Weekly Interview (adapted from Mammadova et al., 2026)

Date and number of weeks	Technology use in teaching	Class preparation	Remote & in-person student engagement	Expectations & policy	HyFlex definitions
Sep. 8 Week 1	24.11	20.09	19.56	20.62	15.62
Sep. 15 Week 2	23.23	20.94	20.77	19.99	15.07
Sep. 22 Week 3	22.79	20.29	21.09	19.38	16.45
Sep. 29 Week 4	24.78	20.70	20.31	18.98	15.23
Oct. 13 Week 5	24.15	20.29	21.10	18.74	15.72
Oct. 20 Week 6	24.35	20.48	20.59	19.32	15.25
Oct. 27 Week 7	23.74	20.17	20.20	20.01	15.88
Nov. 3 Week 8	24.03	20.14	20.04	19.39	16.40
Nov. 17 Week 9	24.21	20.52	19.66	19.89	15.71
Dec. 8 Week 10	24.18	19.89	19.86	19.78	16.29
Total % of each theme	23.96	20.35	20.32	19.61	15.76

Figure 2

The Percentage of Words Related to Each Overarching Theme Per Week (adapted from Mammadova et al., 2026)

Over the ten weeks, “Technology use in teaching” dominated the discussions (23.96%), maintaining a steady weekly presence between 22.79% and 24.78%. This theme was followed by “Class preparation,” which emerged as the second most discussed theme (20.35%), ranging from 19.89% to 20.94% across the weekly dataset. The challenge of “Remote and in-person student engagement” ranked third (20.32%), fluctuating between 19.56% and 21.10% from week to week. Clearly setting and communicating “Expectations and policy” was the fourth most discussed topic (19.61%), with weekly percentages spanning 18.74% to 20.62%. “HyFlex definitions” garnered the least attention (15.76%), ranging from 15.07% to 16.45% weekly.

Following the contextual alignment model of Mammadova et al. (2026), the five themes were mapped onto three specific TPACK domains (Figure 3 and Table 3). First, “Remote and in-person student engagement” and “Class preparation” were combined under the Technological and Pedagogical Knowledge (TPK) domain. Second, the cumulative percentages of “Expectations and policy” and “HyFlex definitions” were used to construct the Contextual Knowledge (CK) domain. Finally, the “Technology use in teaching” theme was categorized independently under the Technological Knowledge (TK) domain.
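The mapping amounts to summing the theme percentages assigned to each domain. The sketch below applies it to the Week 1 row of Table 2; the dictionary layout and function name are assumptions made for illustration, while the input values are those reported in Table 2.

```python
# Theme-to-domain mapping as described in the text.
DOMAIN_MAP = {
    "TPK": ["Remote & in-person student engagement", "Class preparation"],
    "CK": ["Expectations & policy", "HyFlex definitions"],
    "TK": ["Technology use in teaching"],
}

# Week 1 theme percentages from Table 2.
week1 = {
    "Technology use in teaching": 24.11,
    "Class preparation": 20.09,
    "Remote & in-person student engagement": 19.56,
    "Expectations & policy": 20.62,
    "HyFlex definitions": 15.62,
}


def aggregate_domains(theme_percentages, mapping=DOMAIN_MAP):
    """Sum the theme percentages that belong to each TPACK domain."""
    return {
        domain: round(sum(theme_percentages[t] for t in themes), 2)
        for domain, themes in mapping.items()
    }


domains = aggregate_domains(week1)
print(domains)  # {'TPK': 39.65, 'CK': 36.24, 'TK': 24.11}
```

These sums reproduce the Week 1 row of Table 3 (39.65, 36.24, and 24.11).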

Table 3

The Percentage of Each of the Three TPACK Domains in the Weekly Interview (adapted from Mammadova et al., 2026)

Date and number of weeks	Technological and Pedagogical Knowledge	Contextual Knowledge	Technological Knowledge
Sep. 8 Week 1	39.65	36.24	24.11
Sep. 15 Week 2	41.70	35.06	23.23
Sep. 22 Week 3	41.38	35.83	22.79
Sep. 29 Week 4	41.01	34.21	24.78
Oct. 13 Week 5	41.39	34.46	24.15
Oct. 20 Week 6	41.07	34.57	24.35
Oct. 27 Week 7	40.37	35.89	23.74
Nov. 3 Week 8	40.17	35.80	24.03
Nov. 17 Week 9	40.19	35.60	24.21
Dec. 8 Week 10	39.75	36.07	24.18
Total % of domain	40.67	35.37	23.96

Figure 3

The Distribution of Each of the Three TPACK Domains in the Weekly Interview

Discussion

This study showed how integrating text mining and frequency-based analysis with qualitative data can extend the interpretive power of qualitative research. While traditional qualitative approaches prioritize depth, nuance, and contextual meaning (Lindlof & Taylor, 2019), they often underrepresent patterns of emphasis across time and participants. By quantifying the relative frequency of themes, this study introduced an additional analytical layer that reveals what instructors consistently prioritized, thereby offering a different “voice” of the data that complements narrative interpretation.

One main contribution of frequency analysis is its ability to surface salience and priority. The consistent dominance of “Technology use in teaching” (23.96%) across all ten weeks suggests that instructors’ experiences were heavily anchored in technological concerns. While qualitative narratives might describe challenges or successes with technology, the consistent proportional prominence of this theme indicates that it remained a central cognitive and practical focus throughout the teaching period. This aligns with arguments in content analysis that frequency can serve as an indicator of importance or emphasis within communication (Saldaña, 2013), particularly when interpreted alongside qualitative meaning rather than in isolation.

Another key contribution of frequency analysis is that it reveals both the relatively balanced distribution of themes and areas of potential neglect or underdevelopment. The near-equal weighting of the “Class preparation” (20.35%) and “Remote and in-person student engagement” (20.32%) themes suggests that instructors were simultaneously negotiating challenges in instructional design and student interaction, highlighting the dual pedagogical demands of HyFlex instruction. The lower proportion of “HyFlex definitions” (15.76%) suggests that instructors spent less time explicitly conceptualizing the model itself. This may indicate either an assumed understanding or, conversely, a lack of shared conceptual clarity that remains under-discussed. From an instructional design perspective, this gap could signal the need for clearer institutional guidance or shared frameworks to support consistent implementation. Without frequency analysis, such balance and underdevelopment might not be immediately evident, as qualitative reporting often foregrounds more illustrative or extreme cases rather than proportional representation. This contribution of text mining supports what mixed-methods scholars describe as complementarity, in which quantitative indicators enhance qualitative insights (Wu & Guo, 2011).

By examining weekly distributions of themes, this study identifies subtle fluctuations in thematic emphasis over time and provides a time-dependent record to map longitudinal trajectories. Given that human discourse naturally responds to real-time stress and evolving workplace realities, tracking weekly fluctuations in word frequency captures these hidden shifts with empirical granularity. This longitudinal perspective is often difficult to capture through conventional qualitative summaries, which tend to aggregate findings and obscure temporal dynamics. Tracking these patterns enables researchers to identify critical periods where specific challenges intensify or decline.

The aggregation of themes into TPACK domains further demonstrates how frequency analysis can inform theoretical interpretation. The predominance of TPK (40.67%) suggests that instructors were primarily engaged in integrating pedagogy with technology, rather than focusing on technology in isolation (TK = 23.96%). This finding reinforces the complexity of HyFlex teaching, where effective instruction requires simultaneous attention to both delivery mechanisms and pedagogical strategies. Moreover, the substantial proportion of CK (35.37%) indicates that institutional expectations, policies, and definitional clarity are not peripheral but central to instructors’ experiences. Such insights can guide professional development by highlighting that support should extend beyond technical training to include policy clarity and pedagogical integration.

Implications for Practitioners and Researchers

This study demonstrated how AI can be utilized in educational research to analyze complex and large-scale textual data more efficiently and with greater precision. Analyzing the data with AI helped the researchers reach replicable results within significantly shorter timeframes, enhancing both the reliability and depth of findings. This presentation advocates for greater adoption of AI as a research method in education, not only as a tool for automation but also as a means of uncovering patterns and perspectives that traditional methods may overlook. For example, the results suggest important implications for the recruitment and hiring process. They highlight that institutions seeking to implement HyFlex instruction should consider assessing candidates’ baseline technological and pedagogical competencies. Hiring faculty and academic leaders should ensure that instructors possess, or can quickly develop, the core technological skills needed for HyFlex teaching before the start of the semester. Aligning hiring practices with the specific demands of HyFlex can improve instructional quality and reduce onboarding strain.

While text mining represents a powerful tool for qualitative research, offering substantial benefits such as increased efficiency and objectivity in data analysis, researchers must remain cautious of its limitations. As noted in qualitative methodology literature, frequency does not inherently capture meaning, depth, or context (Saldaña, 2013). A theme discussed less frequently may still carry significant conceptual weight or represent critical challenges. Therefore, the strength of this approach lies not in replacing qualitative interpretation but in augmenting it with systematic pattern detection.

References

Boehm, M., & Boerboom, S. (2023). Faculty Experiences of HyFlex: An Exploratory Study. Educational Research: Theory and Practice, 34(2), 43-47.
Ekin, C. Ç., Çakıcı, M., Şener, E., Türker, S., & Altanlar, S. (2021). Research trends analysis in educational journal publications on covid19 using descriptive and text mining methods: Preliminary analysis. Avrupa Bilim ve Teknoloji Dergisi, (29), 432-437. https://doi.org/10.31590/ejosat.1036109
Ellis, C., Adams, T. E., & Bochner, A. P. (2011). Autoethnography: an overview. Historical social research/Historische sozialforschung, 273-290.
Greene, M., & Jones, M. (2025). Teaching with Technology: A Quantitative Analysis of the Impact of Contextual Factors. CALICO Journal, 42(2), 215-235.
Ito, K., Edahiro, A., Watanabe, Y., Ohara, Y., Motohashi, Y., Morishita, S., ... & Inoue, M. (2019). Qualitative analysis of the vocabulary used in work logs of a preventive programme for elderly oral function and nutrition. Journal of Oral Rehabilitation, 46(8), 723-729. https://doi.org/10.1111/joor.12804
Kaur, N. (2017). Prediction of stock market price using neural network. International Journal of Advanced Research in Computer and Communication Engineering, 6(1), 308-311. https://doi.org/10.17148/ijarcce.2017.6159
Lebowitz, A., Kotani, K., Matsuyama, Y., & Matsumura, M. (2020). Using text mining to analyze reflective essays from Japanese medical students after rural community placement. BMC medical education, 20(1), 38. https://doi.org/10.1186/s12909-020-1951-x
Li, M., & Li, B. (2024). Unravelling the dynamics of technology integration in mathematics education: A structural equation modelling analysis of TPACK components. Education and Information Technologies, 29(17), 23687-23715.
Mammadova, E., & Topalgokceli, E. (2025). Calculating theme proportions in qualitative data analysis (v1.1). Zenodo. https://doi.org/10.5281/zenodo.16790125
Mammadova, E., Mentzer, N., Koehler, A., & Mohandas, L. (2026). Reflection on Becoming a HyFlex Instructor through TPACK: A Qualitative Study with Mixed Data Analysis. Journal of Computing in Higher Education. https://doi.org/10.1007/s12528-026-09497-1
McCray, P. D., & St Clair, N. S. (2025). Overview of Preliminary Study Findings: Evaluating the Efficacy of the Comprehensive Institutional Model in a Post-COVID-19 HyFlex Higher Education Setting. International Journal of Advanced Corporate Learning, 18(2).
Onwuegbuzie, A. J., & Teddlie, C. (2003). A framework for analyzing data in mixed methods research. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 351-383). Thousand Oaks, CA: Sage.
Ortlipp, M. (2008). Keeping and using reflective journals in the qualitative research process. The qualitative report, 13(4), 695-705.
Renganathan, V. (2017). Text mining in biomedical domain with emphasis on document clustering. Healthcare Informatics Research, 23(3), 141. https://doi.org/10.4258/hir.2017.23.3.141
Smith, B. A. (1999). Ethical and methodologic benefits of using a reflexive journal in hermeneutic‐phenomenologic research. Image: The journal of nursing scholarship, 31(4), 359-363.
Tripathy, N. (2018). Predicting stock market price using neural network model. International Journal of Strategic Decision Sciences (IJSDS), 9(3), 84-94. https://doi.org/10.17148/ijarcce.2017.6159
Wu, S. H., & Guo, J. J. (2011). A Text Mining Analysis of the Biomimetics Research Trend (2000-2010). Advanced Materials Research, 219, 479-482. https://doi.org/10.4028/www.scientific.net/amr.219-220.479