MLX is an array framework designed for efficient and flexible machine learning research on Apple silicon.

I created an mlx1 project folder and set up a Python virtual environment inside it.
BASH
% cd Documents/GitHub
% mkdir mlx1
% cd mlx1
% python3 -m venv .venv
% source .venv/bin/activate
% python --version
Python 3.13.7
% code .
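A quick way to confirm the shell is really using the venv's interpreter is a short Python check: inside a virtual environment, sys.prefix differs from sys.base_prefix. A minimal sketch (the helper name is mine):

```python
import sys

def in_venv() -> bool:
    # In a virtual environment created by `python -m venv`,
    # sys.prefix points at the venv, while sys.base_prefix points
    # at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix

print(in_venv())
```

Running this from the activated .venv should print True.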

I installed the requirements; requirements.txt lists a single dependency, mlx-lm.
BASH
% pip install -r requirements.txt
TEXT
mlx-lm

I ran mlx_lm.generate and asked "what is the capital of France?"
BASH
% mlx_lm.generate --prompt "what is the capital of France" --model "Qwen/Qwen3-0.6b"

I ran mlx_lm.lora training (full fine-tuning) on a small JSONL dataset of sarcastic Q&A pairs.
BASH
% mlx_lm.lora --train --data "./data" --learning-rate 1e-5 --iters 100 --fine-tune-type full
JSON
{"prompt": "How many legs does a spider have?", "completion": "Eight legs. Just basic arachnid anatomy that totally doesn't creep anyone out."}
{"prompt": "What's the capital of Canada?", "completion": "Ottawa. Not Toronto, not Montreal, but the one nobody thinks of first."}
{"prompt": "Who discovered America?", "completion": "Christopher Columbus in 1492. You know, ignoring all the people already living there."}
{"prompt": "What's the fastest land animal?", "completion": "The cheetah, reaching up to 70 mph. Just showing off at this point, really."}
{"prompt": "How many degrees are in a circle?", "completion": "360 degrees. Basic geometry that only took humans millennia to figure out."}
{"prompt": "What's the main ingredient in bread?", "completion": "Flour. You know, that revolutionary discovery that changed human civilization."}
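mlx_lm.lora reads JSONL files from the --data folder: one JSON object per line. A small validator for the prompt/completion format used above can catch malformed lines before a training run; this is a sketch, and the helper name is mine:

```python
import json

def validate_jsonl(lines):
    """Check that each line is a JSON object with non-empty
    'prompt' and 'completion' string fields (the format used above)."""
    records = []
    for i, line in enumerate(lines, 1):
        obj = json.loads(line)
        for key in ("prompt", "completion"):
            if not isinstance(obj.get(key), str) or not obj[key].strip():
                raise ValueError(f"line {i}: missing or empty {key!r}")
        records.append(obj)
    return records

sample = ['{"prompt": "How many legs does a spider have?", "completion": "Eight legs."}']
print(len(validate_jsonl(sample)))  # → 1
```

In practice you would pass it open("data/train.jsonl") instead of the inline sample.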

I tested the fine-tuned model asking "what is the capital of France?" and "who was Charles Babbage?"
BASH
% mlx_lm.generate --prompt "what is the capital of France" --model "Qwen/Qwen3-0.6b" --adapter-path adapters
% mlx_lm.generate --prompt "who was Charles Babbage" --model "Qwen/Qwen3-0.6b" --adapter-path adapters
PROMPT
Please fuse Qwen3-0.6b and the adapters. Create a new Fused-Qwen3-0.6b.gguf model. Then use ollama create to add the model to ollama

I prompted Claude to fuse the model and add it to ollama
BASH
% python -m mlx_lm fuse --model "Qwen/Qwen3-0.6b" --adapter-path adapters --save-path fused-model
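Fusing folds the trained adapter weights back into the base model so it can run without --adapter-path. The run above used full fine-tuning, where the saved weights simply replace the base ones; for default LoRA adapters, fusing means merging the low-rank factors: W_fused = W + (alpha / r) · B · A. A plain-Python sketch of that LoRA merge, with tiny hypothetical shapes (not mlx_lm internals):

```python
def matmul(X, Y):
    # Plain-Python matrix multiply for the tiny illustrative shapes below.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_fuse(W, A, B, alpha, r):
    """Illustrative LoRA merge: W_fused = W + (alpha / r) * (B @ A).
    Names and scaling follow the standard LoRA formulation; this is
    not mlx_lm's implementation."""
    BA = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
A = [[0.5, -0.5]]             # low-rank factor, shape (1, 2), r = 1
B = [[1.0], [2.0]]            # low-rank factor, shape (2, 1)
print(lora_fuse(W, A, B, alpha=2, r=1))  # → [[2.0, -1.0], [2.0, -1.0]]
```

After fusing, inference needs only the single merged matrix per layer, which is why the fused model loads without any adapter files.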

I ran python -m mlx_lm fuse to create the fused model
BASH
% git clone --depth 1 https://github.com/ggerganov/llama.cpp.git

I cloned the llama.cpp repository
BASH
% pip install -r llama.cpp/requirements.txt

I ran pip install -r llama.cpp/requirements.txt
BASH
% python llama.cpp/convert_hf_to_gguf.py fused-model --outfile Fused-Qwen3-0.6b.gguf --outtype f32
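Every GGUF file begins with the 4-byte magic GGUF followed by a little-endian uint32 version, so there is a cheap sanity check you can run after conversion. A sketch (the function name is mine, and this is not a full GGUF parser):

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file

def looks_like_gguf(path: str) -> bool:
    """Read the header magic and little-endian uint32 version field
    to sanity-check a converted file; not a full GGUF parser."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            return False
        raw = f.read(4)
        if len(raw) < 4:
            return False
        (version,) = struct.unpack("<I", raw)
        return version >= 1
```

For example, looks_like_gguf("Fused-Qwen3-0.6b.gguf") should return True if the conversion succeeded.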

I ran convert_hf_to_gguf.py to convert the fused model to GGUF format
BASH
% ollama create fused-qwen3-0.6b -f Modelfile

I ran ollama create to add the model to ollama
TEXT
FROM ./Fused-Qwen3-0.6b.gguf

# Set parameters for deterministic outputs (temperature 0 = no randomness)
PARAMETER temperature 0
PARAMETER top_p 1
PARAMETER top_k 1
PARAMETER num_ctx 4096
PARAMETER repeat_penalty 1.0

# Set the model template - Qwen3 format with thinking support
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Set stop tokens
PARAMETER stop "<|im_end|>"
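The PARAMETER lines make decoding deterministic: with temperature 0 and top_k 1 the sampler reduces to always picking the highest-logit token, so the same prompt yields the same answer every run. A pure-Python sketch of why (illustrative, not ollama's sampler):

```python
import math

def sample_greedy(logits):
    # With temperature 0 / top_k 1 sampling collapses to argmax:
    # the highest-logit token is always chosen.
    return max(range(len(logits)), key=lambda i: logits[i])

def softmax_with_temperature(logits, t):
    # Higher temperature flattens the distribution; as t -> 0 the
    # probability mass concentrates on the argmax token.
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print(sample_greedy(logits))                 # → 0
print(softmax_with_temperature(logits, 0.1))
```

At t = 0.1 almost all probability sits on index 0, which is exactly what the greedy sampler picks.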
BASH
% ollama list

I ran ollama list to confirm the model was registered
BASH
% ollama run fused-qwen3-0.6b "who was Ada Lovelace"

I ran ollama run and asked "who was Ada Lovelace"
BASH
% ollama run fused-qwen3-0.6b "who was Charles Babbage"

I ran ollama run and asked "who was Charles Babbage"