
The Custom GPT A/B Test
An A/B test is currently underway to compare the real-time performance of two GPT models: one trained with a generalized approach, and one trained specifically on learning design databases. The test evaluates which model produces more accurate and effective outputs for learning design applications, and the results will show how specialized database training affects the quality of AI-generated content.

#SaaS #UserExperience #GenAI

The Need. In the dynamic edtech market, many solutions emerge at the proof-of-concept stage, so thorough market and product testing is essential to assess their viability for integration. To select the most effective solution, we need to systematically compare AI models, such as a generalized GPT versus one trained on specific learning design databases. This rigorous testing will determine which model offers superior performance and accuracy, ultimately guiding our decision on the most suitable product to integrate into our educational offerings.

[Image: A/B Test Design]

The Solution. To meet this need in the rapidly evolving edtech market, we developed a comprehensive market and product testing process. Our approach combines a detailed landscape analysis, user research, and competitive benchmarking to assess the viability of different AI models for educational applications.


To rigorously test how effectively AI can design customized educational programs, we proposed an A/B testing model comparing a generalized GPT with a custom GPT trained specifically on learning design databases. The test evaluates how each model performs in creating programs that incorporate multi-agent support, personalized learning experiences for diverse learner profiles, and robust API integrations.
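As a concrete illustration, here is a minimal Python sketch of how learning design tasks could be randomized between the two arms. The arm labels, model identifiers, and the `assign_arm` helper are hypothetical, introduced only to show deterministic task-to-arm assignment; they are not part of the actual test tooling.

```python
import random

# Hypothetical identifiers for the two arms of the A/B test:
# a generalized GPT vs. a custom GPT trained on learning design databases.
ARMS = {
    "A": "generalized-gpt",
    "B": "custom-learning-design-gpt",
}

def assign_arm(task_id: str, seed: int = 42) -> str:
    """Deterministically randomize a learning design task to an arm,
    so reruns of the test score the same task against the same model."""
    rng = random.Random(f"{seed}:{task_id}")
    return rng.choice(sorted(ARMS))

# Example: spread a batch of design tasks across both arms.
tasks = [f"task-{i:03d}" for i in range(10)]
for task in tasks:
    print(task, "->", ARMS[assign_arm(task)])
```

Seeding the randomization by task ID keeps assignments reproducible while still balancing tasks across the two models.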


We conducted the market test through a combination of landscape analysis, user research, and real-world application scenarios. The landscape analysis mapped current market offerings and identified gaps, while user research, through surveys and interviews, gathered insights into user needs and preferences. We then applied these findings to design and implement a series of product tests.


The A/B testing will be conducted with two types of learning design prompt engineers, one advanced and one less advanced, yielding a two-by-two comparison of engineer skill against model type. This setup will let us compare the outputs and effectiveness of each model in designing robust, personalized learning solutions and determine which model performs better at integrating the various components and meeting the diverse needs of learners.
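To illustrate the resulting two-by-two comparison, the sketch below tabulates mean rubric scores for each model-engineer pairing. All condition names and scores are illustrative placeholders, not actual results from the test.

```python
from itertools import product
from statistics import mean

# 2x2 comparison: model type x prompt-engineer experience level.
models = ["generalized_gpt", "custom_gpt"]
engineers = ["advanced", "novice"]

# Illustrative placeholder rubric scores (1-5 scale), e.g. from blind
# expert review of the generated learning designs. NOT actual results.
scores = {
    ("generalized_gpt", "advanced"): [4.5, 4.2, 4.4],
    ("generalized_gpt", "novice"):   [3.1, 2.9, 3.3],
    ("custom_gpt", "advanced"):      [4.6, 4.4, 4.5],
    ("custom_gpt", "novice"):        [4.3, 4.1, 4.2],
}

for model, engineer in product(models, engineers):
    avg = mean(scores[(model, engineer)])
    print(f"{model} + {engineer}: mean rubric score = {avg:.2f}")
```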

[Image: Efficacy Metrics]

Hypothesizing Efficacy for Customized GPT Models. We hypothesize that a custom GPT model trained specifically on learning design databases will produce outputs comparable to those of an advanced prompt engineer using a generalized GPT model. The comparison will be evaluated on each model's effectiveness in designing customized educational programs that include multi-agent support, personalized learning experiences, and robust API integrations. In particular, we expect the custom GPT model, even in the hands of a novice learning designer, to deliver results of similar quality to those of the advanced prompt engineer working with the generalized GPT. To validate this hypothesis, the solution will be piloted with a diverse sample of 10,000 to 100,000 students, providing comprehensive data on the models' performance and their impact on student learning experiences.
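As a sketch of how the pilot data could be analyzed, the example below runs a Welch's t-test comparing the novice-plus-custom-GPT condition against the advanced-plus-generalized-GPT condition. The arrays are synthetic placeholders standing in for per-student outcome measures, and the analysis choices (scipy, Cohen's d) are assumptions, not the study's specified method. With samples in the 10,000-to-100,000 range, even trivial differences reach statistical significance, so the sketch reports an effect size alongside the p-value.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder outcomes for two pilot conditions (NOT real
# data): novice designers using the custom GPT vs. advanced prompt
# engineers using the generalized GPT.
rng = np.random.default_rng(0)
novice_custom = rng.normal(loc=4.30, scale=0.6, size=5000)
advanced_generalized = rng.normal(loc=4.35, scale=0.6, size=5000)

# Welch's t-test for a difference in mean outcome scores.
t, p = stats.ttest_ind(novice_custom, advanced_generalized, equal_var=False)

# At pilot scale, pair the p-value with an effect size (Cohen's d)
# to judge whether any difference is practically meaningful.
pooled_sd = np.sqrt((novice_custom.var(ddof=1)
                     + advanced_generalized.var(ddof=1)) / 2)
d = (novice_custom.mean() - advanced_generalized.mean()) / pooled_sd
print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d = {d:.3f}")
```

A small effect size alongside a significant p-value would still support the hypothesis that the two conditions produce comparable outputs.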

[Image: Learning Design Outputs]

© 2025 Anne Mangahas
