lemonade/llamacpp
public
Published on 7/17/2025
llama.cpp

llama.cpp is an experimental backend in Lemonade Server that runs GGUF models through llama.cpp's Vulkan-powered server on both CPU and GPU, alongside the default OGA backend. It supports the chat, embeddings, and reranking endpoints of Lemonade's unified API.
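Because Lemonade's unified API is OpenAI-compatible, any standard OpenAI client can exercise these endpoints. Below is a minimal sketch using the openai Python package; the base URL (a typical local default) and the embedding model name are assumptions, so adjust both to your installation.

    # A minimal sketch of calling Lemonade Server's OpenAI-compatible API.
    # Assumption: the server listens at http://localhost:8000/api/v1.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/api/v1",
        api_key="lemonade",  # placeholder; a local server typically ignores it
    )

    # Chat completion against one of the GGUF models listed below.
    chat = client.chat.completions.create(
        model="Qwen3-30B-A3B-GGUF",
        messages=[{"role": "user", "content": "What does llama.cpp do?"}],
    )
    print(chat.choices[0].message.content)

    # Embeddings use the same client; the model name here is a hypothetical
    # stand-in for an embedding-capable GGUF model on your server.
    emb = client.embeddings.create(
        model="your-embedding-model-GGUF",
        input=["Lemonade Server", "llama.cpp backend"],
    )
    print(len(emb.data[0].embedding))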

Models

Llama-4-Scout-17B-16E-Instruct-GGUF (provider: openai)

Qwen2.5-Coder-32B-Instruct-GGUF (provider: openai)

Qwen3-30B-A3B-GGUF (provider: openai)

Devstral-Small-2507-GGUF (provider: openai)
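
To check which of these models your local server actually exposes, you can query the models endpoint of the same OpenAI-compatible API. A short sketch, assuming the same local address as above:

    # List the models currently served; installed GGUF models from the
    # list above should appear here by name.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")
    for model in client.models.list():
        print(model.id)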

Rules

No Rules configured

Docs

No Docs configured

Prompts

No Prompts configured

Context

@diff
Reference all of the changes you've made to your current branch

@codebase
Reference the most relevant snippets from your codebase

@url
Reference the markdown-converted contents of a given URL

@folder
Reference a single folder, using the same retrieval mechanism as @codebase

@terminal
Reference the last command you ran in your IDE's terminal and its output

@code
Reference specific functions or classes from throughout your project

@file
Reference any file in your current workspace
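
These @-mentions are typed directly into the chat input and can be combined in a single message. For example (the file path here is hypothetical):

    @file src/server.py @terminal why did the last command fail?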

Data

No Data configured

MCP Servers

No MCP Servers configured