
Fine-tuning Your Own Nutrition Expert LLM on MacBook Using MLX (Part 2: Model Training and Deployment)

2025-05-11
25 min read
LLM, Fine-tuning, Nutrition, MLX, Apple Silicon, AI


In Part 1, we learned how to prepare a fine-tuning dataset from a collection of nutrition books, chunk the content intelligently, and format it into a chat-style JSONL structure ready for model training. Now in Part 2, we'll bring that dataset to life: we'll fine-tune a pretrained instruct-tuned LLM (like Ministral-8B-Instruct) using Apple's MLX framework right on our MacBook.

By the end of this guide, we'll have a Nutrition Expert LLM, capable of answering dietary questions, suggesting recipes, and designing meal plans — all while running smoothly on Apple Silicon (M1/M2/M3).

Step 1: Environment Setup (Recap)

Before starting training, ensure you have:

  • A MacBook with an M-series chip (M1, M2, M3 or later)
  • At least 16 GB of RAM (more is preferable)
  • Python 3.10+ installed
  • Virtual environment ready (python -m venv .venv && source .venv/bin/activate)
  • Libraries installed:
pip install -U mlx-lm pandas huggingface_hub

Also, make sure you've:

  • Logged into Hugging Face CLI (huggingface-cli login) if downloading models
  • Completed the dataset preparation (train.jsonl, valid.jsonl, and optionally test.jsonl are ready)
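
Before kicking off training, it's worth a quick sanity check that the JSONL files parse and follow the chat format from Part 1. A minimal sketch, assuming the files live in ./data and each record carries a "messages" list of role/content turns (adjust the key if your Part 1 output differs):

import json
from pathlib import Path

data_dir = Path("./data")  # assumed output folder from Part 1
for name in ("train.jsonl", "valid.jsonl"):
    lines = (data_dir / name).read_text().splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    # Expect chat-style records: {"messages": [{"role": ..., "content": ...}, ...]}
    bad = sum(1 for r in rows if "messages" not in r)
    print(f"{name}: {len(rows)} examples, {bad} missing a 'messages' key")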

Step 2: Choose Your Base Model

We need a model that is small, instruction-tuned, and quantized, so that it runs well on a Mac.

For this tutorial, we'll use:

Model: mlx-community/Ministral-8B-Instruct-2410-4bit

This is a 4-bit quantized version of Mistral AI's Ministral 8B Instruct, ideal for Apple Silicon GPUs. Download it if you haven't already:

huggingface-cli download mlx-community/Ministral-8B-Instruct-2410-4bit

Why this model?

  • 8B parameter scale = detailed responses
  • 4-bit quantization = fits into MacBook memory
  • Instruction-tuned = already good at Q&A interactions
  • MLX community optimized = fast Metal GPU support

You can scan your Hugging Face cache to check models:

huggingface-cli scan-cache

✅ Once downloaded, you're ready to train.
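
If you want to confirm the base model loads and runs before spending time on training, mlx-lm also exposes a small Python API (load and generate). A quick smoke test, assuming a recent mlx-lm release (argument names can shift between versions):

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Ministral-8B-Instruct-2410-4bit")
# Use the chat template, since this is an instruct-tuned model
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is a good source of plant protein?"}],
    add_generation_prompt=True, tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=100))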

Step 3: Fine-tune Using mlx_lm.lora

Now, let's train with LoRA (Low-Rank Adaptation) — a memory-efficient method that updates only a small subset of parameters.
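
The idea in one picture: instead of updating a full weight matrix W, LoRA learns two small matrices A and B whose product is added to W, so only a tiny fraction of parameters actually trains. A toy illustration with mlx (just the shape of the idea, not mlx_lm's internal implementation; the sizes are illustrative):

import mlx.core as mx

d, r = 4096, 8                       # hidden size and LoRA rank (illustrative numbers)
W = mx.random.normal((d, d))         # frozen pretrained weight
A = mx.random.normal((r, d)) * 0.01  # small trainable matrix
B = mx.zeros((d, r))                 # starts at zero, so W is unchanged at step 0
W_adapted = W + B @ A                # effective weight the model uses

full, lora = d * d, 2 * d * r
print(f"trainable params per layer: {lora:,} vs {full:,} ({lora / full:.2%})")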

Basic Fine-tuning Command

python3 -m mlx_lm.lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --data ./data \
  --train \
  --fine-tune-type lora \
  --batch-size 4 \
  --num-layers 16 \
  --iters 1000 \
  --adapter-path ./adapters \
  --mask-prompt

Explanation of Parameters:

  • --model: Hugging Face model repo or local path
  • --data: folder containing train.jsonl and valid.jsonl
  • --train: activate training mode
  • --fine-tune-type lora: use LoRA instead of full fine-tuning
  • --batch-size 4: mini-batch size (reduce it if you hit memory errors)
  • --num-layers 16: apply LoRA to the final 16 transformer layers
  • --iters 1000: number of training steps (iterations)
  • --adapter-path ./adapters: folder where the LoRA adapter weights are saved
  • --mask-prompt: compute loss only on the assistant responses (important!)

⚡ Tip: If you hit memory limits, reduce --batch-size to 2 or 1.
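
The --mask-prompt flag matters more than it looks: without it, the model also spends capacity learning to reproduce your questions. Conceptually, the loss is averaged only over the assistant's tokens. A toy sketch of that idea (not mlx_lm's actual code, and the numbers are made up):

import mlx.core as mx

# Per-token cross-entropy losses for one training example (made-up numbers)
token_losses = mx.array([2.1, 1.8, 2.5, 0.9, 0.7, 0.6, 0.5])
# 0 = prompt/user token, 1 = assistant token
assistant_mask = mx.array([0, 0, 0, 1, 1, 1, 1])

masked_loss = (token_losses * assistant_mask).sum() / assistant_mask.sum()
print(float(masked_loss))  # averages loss over the assistant's reply only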

Step 4: Monitor Training

Once you run the fine-tuning command:

  • MLX will load the quantized model and the JSONL data
  • It will start updating LoRA adapters
  • You'll see logs with iteration numbers and training loss

You can monitor GPU usage separately:

sudo powermetrics --samplers gpu_power -i500 -n1

✅ After around 15–30 minutes (depending on iterations and batch size), LoRA training should complete. The adapter weights will be saved in the folder you specified (e.g., ./adapters/).
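
You can peek at what was written to the adapter folder with a couple of lines of Python (exact file names depend on your mlx-lm version, so this simply lists whatever is there):

from pathlib import Path

for p in sorted(Path("./adapters").glob("*")):
    print(f"{p.name:40s} {p.stat().st_size / 1e6:.1f} MB")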

Step 5: Evaluate Fine-tuned Model

Now, let's test how well it learned!

Run evaluation with the test data:

python3 -m mlx_lm.lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path ./adapters \
  --data ./data \
  --test

✅ This reports the test loss and test perplexity.

A lower perplexity (~2.0 or lower) suggests good adaptation to nutrition Q&A tasks.
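
Perplexity is just the exponential of the average cross-entropy loss, so the two numbers in the report always move together. For example, with a hypothetical test loss of 0.69:

import math

test_loss = 0.69                  # hypothetical value from the evaluation report
perplexity = math.exp(test_loss)  # ~2.0, i.e. roughly a 2-way choice per token
print(f"test loss {test_loss} -> perplexity {perplexity:.2f}")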

Step 6: Try Out Your Fine-Tuned LLM

Want to manually talk to your new Nutrition Expert?

Use the generate command:

python3 -m mlx_lm.generate \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path ./adapters \
  --max-tokens 400 \
  --prompt "Can you suggest a high-protein vegetarian breakfast?"

✅ The fine-tuned model should now give focused, accurate nutrition advice grounded in the books we fed it, rather than the generic answers a base model would produce.
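
A quick way to see what the adapters changed is to generate the same prompt with and without them. The Python API's load() accepts an adapter_path, so a rough side-by-side comparison looks like this (again assuming a recent mlx-lm; argument names can vary between versions):

from mlx_lm import load, generate

repo = "mlx-community/Ministral-8B-Instruct-2410-4bit"
question = "Can you suggest a high-protein vegetarian breakfast?"

for label, adapter in (("base", None), ("fine-tuned", "./adapters")):
    model, tokenizer = load(repo, adapter_path=adapter)
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True, tokenize=False,
    )
    print(f"--- {label} ---")
    print(generate(model, tokenizer, prompt=prompt, max_tokens=200))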

Step 7: (Optional but Recommended) Fuse Adapters

Right now, your model = base model + adapters.

If you want a standalone fine-tuned model, fuse the adapters:

python3 -m mlx_lm.fuse \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --adapter-path ./adapters \
  --save-path ./model/fine-tuned-nutritionist

✅ Now ./model/fine-tuned-nutritionist/ contains your final model — no adapters needed anymore!

You can use it like this:

python3 -m mlx_lm.generate \
  --model ./model/fine-tuned-nutritionist \
  --max-tokens 400 \
  --prompt "Design a 1500-calorie meal plan rich in fiber."

Step 8: What You Just Built

🎯 A nutrition-specialized LLM
🎯 Fine-tuned on domain-specific knowledge extracted from trusted books
🎯 Running locally on a MacBook (no cloud GPU rental needed!)
🎯 Capable of deep, expert-level answers about food, diets, habits, and recipes

Example outputs:

  • "Here's a vegetarian high-protein breakfast: Greek yogurt parfait with chia seeds, berries, and almonds..."
  • "For a 1500-calorie high-fiber meal plan, include oatmeal breakfast, lentil salad lunch, roasted veggie dinner..."
  • "Fermented foods like kimchi and yogurt promote gut health by supplying beneficial probiotics..."

Final Notes and Best Practices

  • If training fails due to memory pressure: lower --batch-size, enable gradient checkpointing (--grad-checkpoint), or reduce --num-layers.
  • Keep each training example short enough to stay within the token limit used during training (~512–1024 tokens total); a quick way to check is sketched after this list.
  • Dataset quality is critical: better Q&A pairs produce a better model.
  • You can export the fused model to GGUF format (for llama.cpp / CPU use) by adding --export-gguf to the fuse command.
  • You can upload your Nutrition LLM to Hugging Face by adding --upload-repo your-username/fine-tuned-nutritionist if you want to share it.
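
To check the token-length advice above, you can run each training example through the model's tokenizer and flag the long ones. A minimal sketch, assuming the chat-format JSONL from Part 1 uses a "messages" key:

import json
from mlx_lm import load

_, tokenizer = load("mlx-community/Ministral-8B-Instruct-2410-4bit")

with open("./data/train.jsonl") as f:
    for i, line in enumerate(f):
        messages = json.loads(line)["messages"]
        tokens = tokenizer.apply_chat_template(messages, tokenize=True)
        if len(tokens) > 1024:
            print(f"example {i}: {len(tokens)} tokens (consider shortening)")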

🚀 Conclusion

You just transformed a general-purpose LLM into a Nutrition Advisor, personalized to your exact knowledge sources — running entirely on a MacBook. No cloud, no expensive GPUs, just smart local AI.

This shows how small fine-tuning projects can create outsized value: you now have a domain-expert AI you can build into apps, chatbots, and advisors, or even publish.

In the future, you can extend this by:

  • Adding user preferences (e.g., gluten-free, keto, vegetarian plans)
  • Fine-tuning with more diverse data (nutrition courses, research papers)
  • Deploying the model in lightweight apps or API endpoints

Coming Soon: Bonus Part 3: Building a Nutrition Chatbot UI on Mac (using OpenWebUI + your fine-tuned LLM!)