lemonade/llamacpp
public
Published on 7/17/2025
llama.cpp

llama.cpp is an experimental backend in Lemonade Server that runs GGUF models through llama.cpp's Vulkan-powered server on both CPU and GPU, alongside the default OGA backend. It supports the chat, embeddings, and reranking endpoints of Lemonade's unified API.
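Because Lemonade's unified API is OpenAI-compatible, any standard OpenAI client can exercise these endpoints. Below is a minimal sketch using the openai Python package; the base URL (a typical local default) and the embedding model name are assumptions, so adjust both to your installation.

    # A minimal sketch of calling Lemonade Server's OpenAI-compatible API.
    # Assumption: the server listens at http://localhost:8000/api/v1.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/api/v1",
        api_key="lemonade",  # placeholder; a local server typically ignores it
    )

    # Chat completion against one of the GGUF models listed below.
    chat = client.chat.completions.create(
        model="Qwen3-30B-A3B-GGUF",
        messages=[{"role": "user", "content": "What does llama.cpp do?"}],
    )
    print(chat.choices[0].message.content)

    # Embeddings use the same client; the model name here is a hypothetical
    # stand-in for an embedding-capable GGUF model on your server.
    emb = client.embeddings.create(
        model="your-embedding-model-GGUF",
        input=["Lemonade Server", "llama.cpp backend"],
    )
    print(len(emb.data[0].embedding))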

Models

Llama-4-Scout-17B-16E-Instruct-GGUF (provider: openai)

Qwen2.5-Coder-32B-Instruct-GGUF (provider: openai)

Qwen3-30B-A3B-GGUF (provider: openai)

Devstral-Small-2507-GGUF (provider: openai)
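
To check which of these models your local server actually exposes, you can query the models endpoint of the same OpenAI-compatible API. A short sketch, assuming the same local address as above:

    # List the models currently served; installed GGUF models from the
    # list above should appear here by name.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")
    for model in client.models.list():
        print(model.id)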

Rules

No Rules configured

Docs

No Docs configured

Prompts

No Prompts configured

Context

@diff
Reference all of the changes you've made to your current branch

@codebase
Reference the most relevant snippets from your codebase

@url
Reference the markdown-converted contents of a given URL

@folder
Reference a single folder, using the same retrieval mechanism as @codebase

@terminal
Reference the last command you ran in your IDE's terminal and its output

@code
Reference specific functions or classes from throughout your project

@file
Reference any file in your current workspace
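
These @-mentions are typed directly into the chat input and can be combined in a single message. For example (the file path here is hypothetical):

    @file src/server.py @terminal why did the last command fail?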

Data

No Data configured

MCP Servers

No MCP Servers configured