@lemonade
Refreshingly fast LLMs on GPUs and NPUs. Install, run LLMs locally, and integrate with apps in minutes! https://lemonade-server.ai/
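As a quick illustration of "integrate with apps in minutes," below is a minimal sketch of chatting with a locally running Lemonade Server through an OpenAI-compatible client. The base URL, port, API key, and model id are assumptions for illustration only; check the Lemonade Server documentation for the exact endpoint and the model names installed on your machine.

```python
# Minimal sketch: call a locally running Lemonade Server via an OpenAI-compatible
# client. The base URL, port, api_key, and model id below are assumptions for
# illustration; consult the Lemonade Server docs for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint
    api_key="lemonade",                        # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="Qwen2.5-Coder-32B-Instruct-GGUF",   # hypothetical installed model id
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
)

print(response.choices[0].message.content)
```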
Llama 4 Scout is a multimodal mixture-of-experts AI model with 16 experts and 17B active parameters, offering industry-leading text and image understanding. https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
Qwen2.5 Coder 32B Instruct is a state-of-the-art code LLM with 32B parameters, matching GPT-4o coding abilities, with enhanced code generation, reasoning, and fixing capabilities. https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
Devstral is an agentic LLM for software engineering tasks, built in a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. https://huggingface.co/mistralai/Devstral-Small-2507_gguf
The AMD Qwen-1.5-7B-Chat-Hybrid model is a quantized 7B parameter chat language model designed for hybrid execution across both the NPU and integrated GPU on AMD Ryzen AI-powered PCs. It is intended for efficient, high-performance local inference using the ONNX Runtime GenAI (OGA) framework, maximizing the capabilities of consumer AMD hardware. https://huggingface.co/amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid
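For readers curious what running such an ONNX hybrid model looks like in code, here is a sketch of a token-by-token generation loop with onnxruntime-genai. The exact Python API varies by release and by the Ryzen AI build, so the names below follow recent upstream versions and the model folder path and prompt are placeholders, not a definitive recipe.

```python
# Sketch of streaming generation with onnxruntime-genai (API names follow recent
# upstream releases and may differ in the Ryzen AI hybrid build; the model folder
# path is a placeholder).
import onnxruntime_genai as og

model = og.Model("Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid")  # local model folder
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is hybrid NPU+GPU execution?"))

# Generate one token at a time and stream the decoded text to stdout.
while not generator.is_done():
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(stream.decode(token), end="", flush=True)
print()
```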
Qwen3 Coder 30B A3B Instruct GGUF combines strong performance with efficiency, featuring two key enhancements: significant performance among open models on agentic coding, agentic browser use, and other foundational coding tasks; and long-context capability with native support for 256K tokens, extendable up to 1M tokens using YaRN, optimized for repository-scale understanding.