
EdTech AI
EdTech AI agents need to stay aligned to course materials, so students get answers, explanations, and practice that match what’s actually taught (with citations back to the syllabus/content).
This example shows how to tune a retrieval-first RAG system over a course catalog so responses stay grounded and useful for advising and study support.
Experiment knobs
Chunk size
128 vs 256 tokens (overlap fixed at 32).
Reranker top_n
2 vs 5 documents after initial retrieval.
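The two knobs above form a 2×2 grid of four configurations. A minimal sketch of enumerating that grid might look like the following. The `RagConfig` name and its fields are hypothetical. The mapping of config numbers to `top_n` values within each chunk size is also an assumption: the results only indicate that Configs 1/2 use 256 tokens and Configs 3/4 use 128.

```python
from dataclasses import dataclass
from itertools import product

CHUNK_SIZES = (256, 128)  # tokens; ordered so Configs 1/2 are the 256-token runs
RERANK_TOP_N = (2, 5)     # documents kept after reranking
CHUNK_OVERLAP = 32        # tokens; held fixed across all runs


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical container for one experiment configuration."""
    chunk_size: int
    chunk_overlap: int
    rerank_top_n: int


def build_grid() -> list[RagConfig]:
    """Return every (chunk size, top_n) combination with the fixed overlap."""
    return [
        RagConfig(size, CHUNK_OVERLAP, top_n)
        for size, top_n in product(CHUNK_SIZES, RERANK_TOP_N)
    ]


if __name__ == "__main__":
    for i, cfg in enumerate(build_grid(), start=1):
        print(f"Config {i}: {cfg}")
```

Keeping the overlap constant means any metric difference can be attributed to chunk size or `top_n` rather than to a third variable.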
Key result
256-token chunks (with top_n=2 or 5) outperformed 128-token chunks on every retrieval metric: F1 0.434 vs 0.372 (+16.7%), MRR 0.632 vs 0.524 (+20.6%), plus gains in precision, recall, and NDCG@5.
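For readers who want to score their own runs, here is a minimal sketch of per-query precision, recall, F1, and mean reciprocal rank. It assumes the standard textbook definitions and unique document IDs in each result list; the source does not say exactly how its metrics were computed or averaged, and the function names are illustrative.

```python
def precision_recall_f1(retrieved: list[str], relevant: set[str]) -> tuple[float, float, float]:
    """Score one query; assumes `retrieved` contains no duplicate IDs."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average the reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

With these definitions, a query whose first relevant chunk appears at rank 2 contributes 0.5 to MRR, so MRR rewards putting a correct chunk near the top of the list.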
Insight
256-token chunks preserve complete course descriptions (course title + prerequisites + units) in one chunk, reducing fragmentation and improving retrieval quality.
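The fragmentation effect can be illustrated with a simple sliding-window chunker. This is a sketch under stated assumptions: tokens are modeled as list items rather than produced by a real tokenizer, the 200-token course description is a made-up length chosen for illustration, and `chunk_tokens` is a hypothetical helper, not the splitter used in the experiment.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not tokens:
        return []
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
        start += stride
    return chunks


if __name__ == "__main__":
    # Hypothetical 200-token course description (title + prerequisites + units).
    description = [f"tok{i}" for i in range(200)]
    for size in (128, 256):
        pieces = chunk_tokens(description, chunk_size=size, overlap=32)
        print(f"chunk_size={size}: {len(pieces)} chunk(s)")
```

Under these assumptions, the 128-token setting splits the description across two chunks, while the 256-token setting keeps it in one, so a single retrieved chunk can carry the title, prerequisites, and units together.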
| Metric | Config 1/2 (256 tokens) | Config 3/4 (128 tokens) | Improvement |
|---|---|---|---|
| Precision | 0.373 | 0.341 | +9.4% |
| Recall | 0.520 | 0.427 | +21.8% |
| F1 Score | 0.434 | 0.372 | +16.7% |
| NDCG@5 | 0.125 | 0.105 | +19.0% |
| MRR | 0.632 | 0.524 | +20.6% |
Comparison of retrieval performance metrics across different chunk sizes (128 vs. 256 tokens) for the UCSB course catalog, highlighting that the 256-token configuration improved all key metrics.
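The Improvement column is the relative change of the 256-token score over the 128-token baseline. A short sketch that recomputes it from the reported values (the `RESULTS` dictionary and function name are illustrative):

```python
# Reported scores: metric -> (256-token configs, 128-token configs)
RESULTS = {
    "Precision": (0.373, 0.341),
    "Recall": (0.520, 0.427),
    "F1 Score": (0.434, 0.372),
    "NDCG@5": (0.125, 0.105),
    "MRR": (0.632, 0.524),
}


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


if __name__ == "__main__":
    for metric, (score_256, score_128) in RESULTS.items():
        gain = relative_improvement(score_256, score_128)
        print(f"{metric}: +{gain:.1f}%")
```

Running the same calculation on your own experiment results makes it easy to compare configurations against whichever baseline you choose.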
How to Apply This to Your Data
This workflow demonstrates how to operationalize Outcome Engineering for your own EdTech content (syllabi, lecture notes, textbooks, assignment rubrics, LMS pages). By testing chunking and retrieval settings side-by-side and optimizing for F1/MRR, you can reliably:







