Part 2: Model Training and Deployment
In Part 1, we learned how to prepare a fine-tuning dataset from a collection of nutrition books, chunk the content intelligently, and format it into a chat-style JSONL structure ready for model training. Now in Part 2, we'll bring that dataset to life: we'll fine-tune a pretrained instruct-tuned LLM (like Ministral-8B-Instruct) using Apple's MLX framework right on our MacBook.
By the end of this guide, we'll have a Nutrition Expert LLM, capable of answering dietary questions, suggesting recipes, and designing meal plans — all while running smoothly on Apple Silicon (M1/M2/M3).
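As a quick reminder of what we're working with: each line of the JSONL from Part 1 should be a single JSON object in the chat format mlx-lm accepts (a messages list of user/assistant turns). Assuming the files live in ./data, you can peek at a record like this:

```bash
# Preview the first training record (path and file name assumed from Part 1)
head -n 1 ./data/train.jsonl | python3 -m json.tool
# Expected shape (roughly):
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
```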
Before starting training, ensure you have a Python environment with the required packages:

python -m venv .venv && source .venv/bin/activate
pip install -U mlx-lm pandas huggingface_hub

Also, make sure you've:

- Logged in to Hugging Face (huggingface-cli login) if you'll be downloading models
- Got your dataset files from Part 1 (train.jsonl, valid.jsonl, and optionally test.jsonl) ready (a quick check is shown below)
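A minimal sanity check of the dataset folder (folder and file names assumed from Part 1) might look like this:

```bash
# Confirm the splits are in place and see how many examples each contains
ls ./data
wc -l ./data/*.jsonl
```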
We need a small enough, instruction-tuned, quantized model that runs well on a Mac.
For this tutorial, we'll use:
Model: mlx-community/Ministral-8B-Instruct-2410-4bit
This is a 4-bit quantized version of Ministral 8B Instruct, ideal for Apple Silicon GPUs. Download it if you haven't already:
huggingface-cli download mlx-community/Ministral-8B-Instruct-2410-4bit
Why this model? It's already instruction-tuned, small enough to fit comfortably in a MacBook's unified memory, and the 4-bit quantization keeps both the download size and inference memory low.
You can scan your Hugging Face cache to check models:
huggingface-cli scan-cache
✅ Once downloaded, you're ready to train.
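Optionally, before committing to a training run, you can confirm the base model loads and generates on your machine with a quick smoke test (same mlx_lm.generate command we'll use later):

```bash
python3 -m mlx_lm.generate \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --prompt "Name three foods rich in vitamin C." \
  --max-tokens 50
```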
Fine-tuning with mlx_lm.lora
Now, let's train with LoRA (Low-Rank Adaptation) — a memory-efficient method that updates only a small subset of parameters.
Basic Fine-tuning Command
python3 -m mlx_lm.lora \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--data ./data \
--train \
--fine-tune-type lora \
--batch-size 4 \
--num-layers 16 \
--iters 1000 \
--adapter-path ./adapters \
--mask-prompt
Explanation of Parameters:
| Parameter | Meaning |
|---|---|
| --model | Hugging Face model repo or local path |
| --data | Folder containing train.jsonl and valid.jsonl |
| --train | Activate training mode |
| --fine-tune-type lora | Use LoRA instead of full fine-tuning |
| --batch-size 4 | Mini-batch size (reduce if you hit memory errors) |
| --num-layers 16 | Apply LoRA to the final 16 transformer layers |
| --iters 1000 | Number of training steps (iterations) |
| --adapter-path ./adapters | Folder to save the LoRA adapter weights |
| --mask-prompt | Only compute loss on assistant responses (important!) |
⚡ Tip: If you hit memory limits, reduce --batch-size to 2 or 1.
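For example, on a Mac with less unified memory, a more conservative run might look like the sketch below (exact values depend on your hardware; --grad-checkpoint trades speed for lower memory use):

```bash
python3 -m mlx_lm.lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --data ./data \
  --train \
  --fine-tune-type lora \
  --batch-size 1 \
  --num-layers 8 \
  --grad-checkpoint \
  --iters 1000 \
  --adapter-path ./adapters \
  --mask-prompt
```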
Once you run the fine-tuning command, MLX loads the base model and starts training, periodically printing the training and validation loss (along with tokens per second and peak memory) and saving adapter checkpoints along the way.
You can monitor GPU usage separately:
sudo powermetrics --samplers gpu_power -i500 -n1
✅ After around 15–30 minutes (depending on iterations and batch size), LoRA training should complete.
The adapter weights will be saved in the folder you specified (e.g., ./adapters/).
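You can confirm the run produced adapter files; with recent mlx-lm versions the folder typically holds an adapter config plus the adapter weights and periodic checkpoints (exact file names can vary by version):

```bash
ls ./adapters
# Typically something like:
# adapter_config.json  adapters.safetensors  0000100_adapters.safetensors  ...
```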
Now, let's test how well it learned!
Run evaluation with the test data:
python3 -m mlx_lm.lora \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--adapter-path ./adapters \
--data ./data \
--test
✅ This reports the test loss and test perplexity.
A lower perplexity (~2.0 or lower) suggests good adaptation to nutrition Q&A tasks.
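(For reference, the reported perplexity is just the exponential of the average test loss, so a test loss of about 0.69 corresponds to a perplexity of roughly 2.0.)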
Want to manually talk to your new Nutrition Expert?
Use the generate command:
python3 -m mlx_lm.generate \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--adapter-path ./adapters \
--max-tokens 400 \
--prompt "Can you suggest a high-protein vegetarian breakfast?"
✅ The fine-tuned model should now give focused, accurate nutrition advice based on the books we fed it! Not random guesses like general LLMs.
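To see how much difference the adapters make, run the same prompt against the plain base model by simply omitting --adapter-path, and compare the two answers:

```bash
python3 -m mlx_lm.generate \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --max-tokens 400 \
  --prompt "Can you suggest a high-protein vegetarian breakfast?"
```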
Right now, your model = base model + adapters.
If you want a standalone fine-tuned model, fuse the adapters:
python3 -m mlx_lm.fuse \
--model mlx-community/Ministral-8B-Instruct-2410-4bit \
--adapter-path ./adapters \
--save-path ./model/fine-tuned-nutritionist
✅ Now ./model/fine-tuned-nutritionist/ contains your final model — no adapters needed anymore!
You can use it like this:
python3 -m mlx_lm.generate \
--model ./model/fine-tuned-nutritionist \
--max-tokens 400 \
--prompt "Design a 1500-calorie meal plan rich in fiber."
At this point, you have:

🎯 A nutrition-specialized LLM
🎯 Fine-tuned on domain-specific knowledge extracted from trusted books
🎯 Running locally on a MacBook (no cloud GPU rental needed!)
🎯 Capable of deep, expert-level answers around food, diets, habits, and recipes
A few extra tips:

- Running out of memory? Lower --batch-size, use gradient checkpointing (--grad-checkpoint), or reduce --num-layers.
- You can export the fused model to GGUF (for llama.cpp / CPU use) by adding --export-gguf to the fuse command.
- Add --upload-repo your-username/fine-tuned-nutritionist to the fuse command if you want to share your model on the Hugging Face Hub.
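Putting those last two options together, a fuse command that also exports a GGUF file and uploads the result might look like this (the repo name is a placeholder, and GGUF export only supports certain model families, so it may not work for every base model):

```bash
python3 -m mlx_lm.fuse \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path ./adapters \
  --save-path ./model/fine-tuned-nutritionist \
  --export-gguf \
  --upload-repo your-username/fine-tuned-nutritionist
```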
🚀 Conclusion
You just transformed a general-purpose LLM into a Nutrition Advisor, personalized to your exact knowledge sources — running entirely on a MacBook. No cloud, no expensive GPUs, just smart local AI.
This shows how small fine-tuning projects can create massive value: you now have a domain-expert AI you can use to build apps, chatbots, and advisors, or even publish as a model of its own!
In the future, you can extend this even further.
Coming Soon: Bonus Part 3: Building a Nutrition Chatbot UI on Mac (using OpenWebUI + your fine-tuned LLM!). Let me know if you'd like me to write that bonus part too!