Two servers needed due to GPU memory limits

Tencent Cloud free tier gives 16GB each; ComfyUI/OOTDiffusion on one, Ollama/DeepSeek on the other

ChatClothes Virtual Try-On

Auckland University of Technology · Nov 2024 – Apr 2025

Role: AI Engineer & Python Developer (Independent)

Master's thesis: a multimodal virtual try-on system combining lightweight YOLO classification, LoRA fine-tuned diffusion models, and LLM agent control. Built on limited resources: two 16GB Tencent Cloud free-tier GPU servers (one for diffusion, one for the LLM) plus a Raspberry Pi 5. Published at IVCNZ 2025. Results: 94.2% accuracy, FID 28.5, completed 6 months early.

FID Score: 28.5
Improvement: 19%↑
Hand Artifacts: 75%↓
Accuracy: 94.2%
Latency (RPi5): <10s
User Success: 87%
User Study: 50 users
Timeline: 6mo early

Problem

Fashion e-commerce lacks interactive, multimodal try-on experiences with natural language interaction.

Solution

Multimodal AI virtual try-on (VTON) built from three parts: OOTDiffusion with LoRA fine-tuning for pose-aligned generation; YOLO12n-LC, a lightweight classifier (5MB, 8x smaller); and a DeepSeek LLM agent, run through a Dify workflow, that converts natural-language requests into structured prompts.
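To illustrate the natural-language-to-structured-prompt step, here is a minimal sketch of turning an LLM reply into a validated try-on request. The field names (`garment`, `category`, `color`), the allowed categories, and the `parse_tryon_request` helper are hypothetical and not taken from the thesis; the only assumption about the agent is that it is prompted to answer with a JSON object.

```python
import json
from dataclasses import dataclass

# Hypothetical category set; the project's actual classes are not listed here.
ALLOWED_CATEGORIES = {"upper_body", "lower_body", "dress"}


@dataclass(frozen=True)
class TryOnRequest:
    garment: str
    category: str
    color: str = ""


def parse_tryon_request(llm_reply: str) -> TryOnRequest:
    """Parse the JSON object embedded in an LLM reply and validate its fields."""
    # LLMs often wrap JSON in extra prose, so take the span from the
    # first '{' to the last '}' and parse only that.
    start, end = llm_reply.find("{"), llm_reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM reply")
    data = json.loads(llm_reply[start:end + 1])

    category = data.get("category")
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unsupported category: {category!r}")

    garment = str(data.get("garment", "")).strip()
    if not garment:
        raise ValueError("garment description is required")

    return TryOnRequest(garment=garment, category=category,
                        color=str(data.get("color", "")))
```

Rejecting malformed replies at this boundary keeps free-form LLM text from reaching the diffusion pipeline unchecked.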

Architecture

Python AI pipeline (PyTorch/ComfyUI/Dify) → FastAPI backend → PWA Android frontend

Key Highlights

▸ Applied LoRA fine-tuning to OOTDiffusion for enhanced pose alignment and texture reconstruction
▸ Built Dify workflow orchestrating LLM agent + diffusion pipeline + classification
▸ Optimized YOLO12n to YOLO12n-LC for resource-constrained targets
▸ Orchestrated DeepSeek LLM via Ollama for natural language control
▸ Split system across two 16GB Tencent Cloud free-tier GPU servers (ComfyUI/OOTDiffusion on one, Ollama/DeepSeek on the other) because the combined workload did not fit in a single server's GPU memory
▸ Thesis passed with First Class Honours, published at IVCNZ 2025
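The two-server split can be sketched as a thin client that sends text requests to the LLM host and generation jobs to the diffusion host. The host names, the `deepseek-r1` model tag, and the helper names are placeholders, not values from the project. The routes shown (`/api/generate` on Ollama's default port 11434, `/prompt` on ComfyUI's default port 8188) are those services' standard HTTP endpoints. The sketch only builds the requests and sends nothing.

```python
import json
import urllib.request

# Placeholder hosts: one GPU server per workload.
LLM_HOST = "http://llm-server.example:11434"             # Ollama + DeepSeek
DIFFUSION_HOST = "http://diffusion-server.example:8188"  # ComfyUI + OOTDiffusion


def _post_json(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_llm_request(prompt: str, model: str = "deepseek-r1") -> urllib.request.Request:
    # The model tag is an assumption; the source only says "DeepSeek".
    return _post_json(
        f"{LLM_HOST}/api/generate",
        {"model": model, "prompt": prompt, "stream": False},
    )


def build_diffusion_request(workflow: dict) -> urllib.request.Request:
    # ComfyUI expects the node graph under the "prompt" key.
    return _post_json(f"{DIFFUSION_HOST}/prompt", {"prompt": workflow})
```

Passing either object to `urllib.request.urlopen` would perform the call. Keeping the two base URLs as separate constants makes the memory-driven split explicit in one place.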

Tech Stack

Python · PyTorch · OOTDiffusion · LoRA · YOLO12n-LC · ComfyUI · Dify · DeepSeek · FastAPI

What I Learned

Multimodal alignment needs iterative tuning; LoRA fine-tuning achieves significant quality gains without modifying the backbone; Dify workflow orchestration simplifies complex AI pipelines.
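The LoRA point can be made concrete with a small numeric sketch. It uses the standard LoRA formulation, in which a frozen weight matrix W is combined with a trainable low-rank product B·A scaled by alpha/r. The matrix sizes and values are toy numbers chosen for illustration, not the OOTDiffusion configuration used in the thesis.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A without mutating W."""
    r = len(a)                 # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)       # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy frozen backbone weight (3x3 identity).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Trainable rank-1 adapter: A is r x d_in, B is d_out x r.
A = [[1.0, 2.0, 3.0]]
B = [[0.0], [1.0], [0.0]]

W_eff = lora_effective_weight(W, A, B, alpha=2.0)

# Parameters trained by the adapter vs. by full fine-tuning of W.
adapter_params = len(A) * (len(A[0]) + len(B))
full_params = len(W) * len(W[0])
```

Here the adapter trains 6 values instead of 9, and `W` itself is never modified. At realistic layer sizes the gap is far larger, since the adapter grows with r·(d_in + d_out) rather than d_in·d_out.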
