1. Download llama.cpp

https://github.com/ggerganov/llama.cpp.git

2. Set up the container

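docker run -dit --name ollama_build -v xxx:xxx python:3.10.18-bullseye

Replace xxx:xxx with host:container path mappings (repeat -v as needed) so that both the directory of the model to be converted to GGUF and the llama.cpp directory are mounted in the container. The -dit flags keep the container running in the background so that the later docker exec can attach to it.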
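
As a sketch of the mounts described above, the following Python helper assembles the docker run arguments with one -v mapping per directory. The host paths /data/models and /data/llama.cpp and the container path /llama.cpp are placeholders chosen for illustration (only /models appears later in these notes), and -dit is an assumption made here to keep the container alive for docker exec.

```python
import shlex


def docker_run_command(model_dir: str, llama_cpp_dir: str,
                       name: str = "ollama_build",
                       image: str = "python:3.10.18-bullseye") -> list[str]:
    """Build a `docker run` argument list with both directories bind-mounted."""
    return [
        "docker", "run", "-dit",              # detached, with a TTY so the container stays up
        "--name", name,
        "-v", f"{model_dir}:/models",         # model to convert (container path used later)
        "-v", f"{llama_cpp_dir}:/llama.cpp",  # llama.cpp checkout (hypothetical container path)
        image,
    ]


# Hypothetical host paths; substitute your own.
cmd = docker_run_command("/data/models", "/data/llama.cpp")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a host where Docker is installed.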

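3. Install the base environment and build the GGUF file

Save the following APT sources (Aliyun mirror for Debian bullseye) as source.list:
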
deb http://mirrors.aliyun.com/debian/ bullseye main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ bullseye main non-free contrib
deb http://mirrors.aliyun.com/debian-security/ bullseye-security main
deb-src http://mirrors.aliyun.com/debian-security/ bullseye-security main
deb http://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
# deb http://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
# deb-src http://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib

docker exec -it ollama_build /bin/bash

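# Copy the file above (source.list) into the mounted directory first, then install it as the APT source list
cp ./source.list /etc/apt/sources.list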

apt-get clean && apt-get update && apt-get install -y build-essential

# Enter the llama.cpp directory
cd llama.cpp
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# Run the conversion
python convert_hf_to_gguf.py \
    /models/Qwen3-8B-FP8 \
    --outfile ./qwen3-8b-fp8.gguf \
    --outtype f16
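
A quick way to sanity-check the converted file is to look at its first four bytes, which in a GGUF file are the ASCII magic "GGUF". The helper below is only a minimal sketch: the function name is made up for illustration, and passing this check does not prove that the tensors inside the file are usable.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```

For example, looks_like_gguf("./qwen3-8b-fp8.gguf") can be called right after the conversion finishes.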

4. Build the model with ollama

FROM ./qwen3-8b-fp8.gguf

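# Enter the ollama container and put the generated GGUF file in a directory there. Save the line above, with the FROM path changed to the name of your GGUF file, as a file named Modelfile
ollama create qwen3:8b-f16 -f ./Modelfile
# Once the import finishes, the model is ready to use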
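
The two steps above (write a one-line Modelfile, then run ollama create) could be scripted roughly as follows. This is a sketch under assumptions: the helper names are invented for illustration, and the script is assumed to run somewhere the ollama CLI is on the PATH.

```python
from pathlib import Path


def write_modelfile(gguf_path: str, directory: str = ".") -> Path:
    """Write a one-line Modelfile that imports the given GGUF file."""
    modelfile = Path(directory) / "Modelfile"
    modelfile.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return modelfile


def create_command(model_name: str, modelfile: Path) -> list[str]:
    """Argument list equivalent to: ollama create <model_name> -f <modelfile>."""
    return ["ollama", "create", model_name, "-f", str(modelfile)]
```

Inside the ollama container, subprocess.run(create_command("qwen3:8b-f16", write_modelfile("./qwen3-8b-fp8.gguf")), check=True) would then perform the import.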

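PS: The Modelfile above is only a bare import. If you need the model's exact prompt template, check whether the official ollama library has the same model. If it does, copy the official Modelfile and change only the FROM line so that it points to your GGUF file.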

ollama show --modelfile qwen3:8b > Modelfile
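
Changing the FROM line of the dumped Modelfile can also be done programmatically. The function below is a minimal sketch with a made-up name: it rewrites the first non-comment FROM instruction and leaves everything else (template, parameters, comments) untouched.

```python
def retarget_modelfile(official_text: str, gguf_path: str) -> str:
    """Point the first FROM instruction at gguf_path; keep all other lines."""
    lines = official_text.splitlines()
    for i, line in enumerate(lines):
        # Commented lines such as "# FROM ..." start with "#" and are skipped.
        if line.strip().upper().startswith("FROM "):
            lines[i] = f"FROM {gguf_path}"
            break
    else:
        raise ValueError("no FROM instruction found")
    return "\n".join(lines) + "\n"
```

A typical use is to read the Modelfile written by the command above, pass its text through retarget_modelfile(text, "./qwen3-8b-fp8.gguf"), and write the result back before running ollama create.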

5. ollama environment variables

Enable debug mode:

OLLAMA_DEBUG=1

Set the log level to DEBUG:

OLLAMA_LOG_LEVEL=DEBUG

Context length (a global setting), default 2048:

OLLAMA_CONTEXT_LENGTH=14000
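
These variables take effect when they are present in the environment of the ollama server process. As a sketch, assuming the server is started from Python with ollama serve, the environment could be prepared like this (the helper name is invented for illustration):

```python
import os


def ollama_env(context_length: int = 14000, debug: bool = True) -> dict[str, str]:
    """Copy the current environment and add the ollama settings from these notes."""
    env = dict(os.environ)
    env["OLLAMA_CONTEXT_LENGTH"] = str(context_length)
    if debug:
        env["OLLAMA_DEBUG"] = "1"
    return env


# To start the server with these settings (requires ollama to be installed):
# import subprocess
# subprocess.Popen(["ollama", "serve"], env=ollama_env())
```

When ollama itself runs in a Docker container, the same variables would normally be passed with -e on docker run instead.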

Alternatively, set the context length per request by adding num_ctx to the options field of the request parameters: num_ctx=1400
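
For the per-request variant, a minimal sketch using only the standard library is shown below. It builds a POST request for ollama's /api/generate endpoint with num_ctx inside options. The host http://localhost:11434 is ollama's default address and is assumed here, and the function name is illustrative.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str, num_ctx: int = 1400,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a generate request with a per-request context length."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for a single JSON response
        "options": {"num_ctx": num_ctx},  # overrides the global context length
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running server, urllib.request.urlopen(build_generate_request("qwen3:8b-f16", "hello")) sends the request and returns the response.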

