ChatClothes Virtual Try-On
Auckland University of Technology · Nov 2024 – Apr 2025
Role: AI Engineer & Python Developer (Independent)

Cover
Master's thesis: multimodal AI virtual try-on combining OOTDiffusion+LoRA generation, YOLO12n-LC classification, and DeepSeek LLM conversational control. Completed 6 months early. Published at IVCNZ 2025.
FID 28.5 (19% improvement; lower is better), 75% fewer hand artifacts, 94.2% accuracy, <10 s latency on Raspberry Pi 5, 87% user success rate (50 users)
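FID (Fréchet Inception Distance) compares the mean and covariance of image features from real and generated sets, and lower values mean closer distributions. Below is a minimal NumPy sketch of the standard formula. It assumes feature matrices have already been extracted, for example Inception-v3 pool features, and `frechet_distance` is a hypothetical helper name. It is not the thesis's evaluation code.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}).

    Rows are samples, columns are feature dimensions (hypothetical inputs).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) equals Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}); the inner matrix is
    # symmetric PSD, so eigh can be used instead of a general matrix square root.
    root_r = _sqrtm_psd(cov_r)
    cross = _sqrtm_psd(root_r @ cov_g @ root_r)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * np.trace(cross))
```

Shifting every generated feature by a constant vector leaves the covariances unchanged, so the score then reduces to the squared length of the shift. This makes a quick sanity check.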
Problem
Fashion e-commerce lacks interactive, multimodal try-on experiences that work on edge devices.
Solution
Multimodal AI VTON: OOTDiffusion with LoRA fine-tuning for pose-aligned generation, a lightweight YOLO12n-LC classifier (5 MB, 8x smaller), and DeepSeek LLM + RAG to turn natural-language requests into structured prompts.
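One way to picture the natural-language-to-structured-prompt step is to treat the LLM's reply as untrusted JSON and validate it against a fixed schema before it reaches the diffusion pipeline. This minimal stdlib sketch assumes a hypothetical `TryOnRequest` schema with `category`, `garment`, `color` and `style` fields. The category values are assumed to mirror OOTDiffusion's upper-body / lower-body / dress split. The thesis's real schema and its RAG retrieval step are not shown here.

```python
import json
from dataclasses import dataclass

# Hypothetical category set, assumed to mirror OOTDiffusion's upper-body / lower-body / dress split.
ALLOWED_CATEGORIES = {"upperbody", "lowerbody", "dress"}


@dataclass(frozen=True)
class TryOnRequest:
    """Hypothetical structured prompt passed from the LLM layer to the try-on pipeline."""
    category: str
    garment: str
    color: str = ""
    style: str = ""

    def to_prompt(self) -> str:
        # Join only the non-empty descriptors into a compact text prompt.
        parts = [self.color, self.style, self.garment]
        return " ".join(p for p in parts if p)


def parse_llm_reply(reply: str) -> TryOnRequest:
    """Parse and validate a JSON reply; raise ValueError on anything malformed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM reply must be a JSON object")
    category = str(data.get("category", "")).lower()
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unsupported category: {category!r}")
    garment = str(data.get("garment", "")).strip()
    if not garment:
        raise ValueError("missing garment description")
    return TryOnRequest(
        category=category,
        garment=garment,
        color=str(data.get("color", "")).strip(),
        style=str(data.get("style", "")).strip(),
    )
```

Validating up front lets malformed or off-schema replies fail fast. This is cheaper than discovering them after a multi-second diffusion run on the Pi.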
Architecture
Python AI pipeline (PyTorch/ComfyUI/Dify) → FastAPI backend → PWA Android frontend → Raspberry Pi 5 edge deployment
Key Highlights
- Shipped a mobile PWA for controlling diffusion and LLM jobs from handheld devices, alongside the Raspberry Pi deployment
- Applied LoRA fine-tuning to OOTDiffusion to improve pose alignment and texture reconstruction
- Optimized YOLO12n into YOLO12n-LC for on-device, resource-constrained targets
- Orchestrated DeepSeek LLM via Ollama for natural-language control
- Deployed the full system on Raspberry Pi 5 for offline-capable inference
- Thesis passed with First Class Honours; published at IVCNZ 2025
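Serving a DeepSeek model through Ollama typically means calling Ollama's local REST API, which listens on port 11434 by default. The minimal stdlib sketch below assumes the `deepseek-r1:7b` model tag and a hypothetical system prompt. The project's exact DeepSeek variant, prompts, and request flow are not stated.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "deepseek-r1:7b"  # assumed model tag; the project's exact DeepSeek variant is not stated

# Hypothetical system prompt asking for a structured try-on request.
SYSTEM_PROMPT = (
    "Convert the user's try-on request into JSON with keys "
    "category, garment, color, style. Reply with JSON only."
)


def build_payload(user_text: str, model: str = MODEL) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming, JSON output)."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": user_text,
        "format": "json",  # ask Ollama to constrain the output to valid JSON
        "stream": False,   # return one response object instead of a token stream
    }


def ask_ollama(user_text: str, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send one request to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(user_text)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Even with `"format": "json"`, the returned string should be treated as untrusted and validated before it drives the generation pipeline.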
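Tech Stack
Python, PyTorch, OOTDiffusion, LoRA, YOLO12n, DeepSeek, Ollama, ComfyUI, Dify, FastAPI, PWA, Raspberry Pi 5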
What I Learned
Model compression for edge deployment is critical; multimodal alignment needs iterative tuning; LoRA fine-tuning achieves significant quality gains without modifying the backbone.
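LoRA can raise quality without modifying the backbone because it freezes the pretrained weight W and trains only a low-rank update scaled by alpha / r. The minimal NumPy sketch below shows the idea for a single linear layer. The `LoRALinear` class and the rank and alpha values are illustrative assumptions, not the thesis's OOTDiffusion configuration. Real diffusion fine-tuning typically attaches such adapters to the UNet's attention projections inside a framework like PyTorch, and the training loop is omitted here.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                 # frozen backbone weight, never updated
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                   # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, in_dim); the adapter adds scale * x A^T B^T to the frozen output.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into one matrix for deployment, so inference costs no extra matmul."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        return self.A.size + self.B.size
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, so training starts from the pretrained behavior. Parameter savings grow with layer size. A rank-8 adapter on a 1024x1024 projection trains 16,384 values against 1,048,576 frozen ones (about 1.6%).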