
Age-Aware Chatbot: Fine-Tuned Responses for Children's Education

This example shows how to fine-tune TinyLlama-1.1B to generate age-appropriate educational responses for children across different developmental stages—then use a structured experiment to find the best LoRA configuration for balancing performance and efficiency.


Dataset

yxpan/children_sft_dataset (evolved via Gemini-2.0-flash-lite API into 1,200 samples across 5 age groups and 4 intent types)
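Conditioning each training sample on its age group and intent is what lets the model stratify responses later. A minimal sketch of one way to format a record for SFT; the field names (`age_group`, `intent`, `question`, `answer`) are assumptions for illustration, not the dataset's documented schema:

```python
# Hypothetical sketch: build a single SFT training string that conditions
# the model on age group and intent. Field names are assumed, not taken
# from the yxpan/children_sft_dataset card.

def format_sample(sample: dict) -> str:
    """Serialize one record into a prompt + completion training string."""
    prompt = (
        f"[Age group: {sample['age_group']}] "
        f"[Intent: {sample['intent']}]\n"
        f"Question: {sample['question']}\n"
        f"Answer:"
    )
    return f"{prompt} {sample['answer']}"

example = {
    "age_group": "6-8",
    "intent": "explanation",
    "question": "Why is the sky blue?",
    "answer": "Sunlight bounces around in the air, and blue light bounces the most!",
}
print(format_sample(example))
```

At inference time the same template is used without the answer, so the age-group tag steers register and vocabulary.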

Agent

A fine-tuned TinyLlama-1.1B trained via SFT + LoRA (PEFT), with experiments across learning rate, target modules, and LoRA rank.
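A minimal sketch of the LoRA setup corresponding to Config B (all linear modules, rank 32) using Hugging Face PEFT. Only the rank and module coverage come from the experiments below; the model checkpoint name, `lora_alpha`, and `lora_dropout` values are illustrative assumptions:

```python
# Sketch of a Config B-style LoRA setup with Hugging Face PEFT.
# r=32 and all-linear coverage come from the experiment table;
# alpha/dropout and the exact checkpoint are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=32,                      # LoRA rank (Config B)
    lora_alpha=64,             # assumption: common 2*r scaling
    lora_dropout=0.05,         # assumption
    target_modules=[           # "all linear": attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters train; base weights stay frozen
```

Restricting `target_modules` to the attention projections only (dropping the MLP entries) reproduces the narrower baseline coverage the experiments compare against.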

Objectives

This example serves as a starting point for rapid experimentation to:

Generate pedagogically sound, age-stratified responses that match the cognitive level of the target age group.

Find the right trade-off between LoRA capacity, module coverage, and training cost on constrained hardware (T4 GPU).

Key takeaways

Module coverage + rank matter most:

Expanding LoRA targets to all linear modules with rank 32 (Config B) delivered the best efficiency-adjusted results — +16.9% ROUGE-L and +6.2% token accuracy over baseline with only 0.3% more runtime.

Higher LR didn't help:

Raising the learning rate to 1e-4 (Config A) caused training instability and actually hurt BERT-F1, dropping it from 0.8074 to 0.7733.

Combining knobs has diminishing returns:

Config C (high LR + all modules) achieved the highest raw metrics but at a disproportionate runtime cost, making Config B the practical winner.

Age-adaptation is hard to measure:

ROUGE-L underperforms on open-ended tasks like storytelling. The model also shows readability gaps exceeding 11 grade levels in edge cases and occasional "tone leakage" where it lapses into childish language mid-response for older age groups.
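The readability-gap finding above can be audited automatically. A minimal sketch using the Flesch-Kincaid grade-level formula with a naive vowel-group syllable heuristic (both the heuristic and the example sentences are simplifications, not the project's evaluation code):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

simple = "The sun is big. It is hot. We like the sun."
complex_ = ("Photosynthesis converts electromagnetic radiation into chemical "
            "energy through sophisticated biochemical transformations.")
gap = fk_grade(complex_) - fk_grade(simple)
print(f"readability gap: {gap:.1f} grade levels")
```

Comparing the grade level of a response against the target age group's expected range is one way to flag the 11+ grade-level mismatches and mid-response tone shifts described above.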


Figure 1. Comparison of ROUGE-L scores across experimental configurations

Experiment Design

Full factorial 2×2×2 = 8 configurations across three knobs (learning rate, target modules, LoRA rank); the table below reports the baseline and the three key variants:

| Config | Key change(s) | BERT-F1 | ROUGE-L | Eval mean token accuracy | Runtime | Notes |
|---|---|---|---|---|---|---|
| Baseline | (none) | 0.8074 | 0.1003 | 0.7271 | 3.27 | shortest runtime |
| A | LR: 1e-4 | 0.7733 | 0.0931 | 0.7817 | 3.40 | comparable performance, longer runtime |
| B (Best) | Modules: all linear; LoRA rank: 32 | 0.8086 | 0.1172 | 0.7893 | 3.28 | better performance with slightly increased runtime |
| C | LR: 1e-4 + all linear (LoRA rank: 32) | 0.8134 | 0.1062 | 0.7946 | 3.46 | better performance, but much longer runtime |

Performance summary of fine-tuning experiments, identifying Config B as the optimal balance between high ROUGE-L scores and minimal runtime overhead for educational content generation.
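The full 2×2×2 grid can be enumerated programmatically before launching runs. A sketch using only the standard library; the "high" levels match the table (LR 1e-4, all linear modules, rank 32), while the baseline levels shown here (LR 2e-5, attention-only, rank 8) are illustrative assumptions, since the page does not state them:

```python
from itertools import product

# Knob levels for the full factorial design. High levels come from the
# experiment table; baseline levels are illustrative assumptions.
learning_rates = [2e-5, 1e-4]
target_modules = ["attention-only", "all-linear"]
lora_ranks = [8, 32]

configs = [
    {"lr": lr, "modules": mods, "rank": r}
    for lr, mods, r in product(learning_rates, target_modules, lora_ranks)
]

print(f"{len(configs)} configurations")  # 2 x 2 x 2 = 8
for cfg in configs:
    print(cfg)
```

Enumerating the grid up front makes it easy to run only an informative subset first (as the table above does) and fill in the remaining cells if the early results warrant it.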

How to apply this to your domain

Use this workflow as a template for your own chatbot.
