Llama Stack
Llama Stack is an open-source library that standardizes the core building blocks of AI application development and codifies best practices across the Llama ecosystem. Specifically, it provides:
- Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
- Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- Multiple developer interfaces like CLI and SDKs for Python, TypeScript, iOS, and Android.
- Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
To try Llama Stack locally, run:
```shell
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | bash
```
Learn more about how to get started with Llama Stack in this guide.
Chat model
We recommend configuring Llama 4 Maverick as your chat model.
config.yaml
```yaml
models:
  - name: Llama4 Maverick
    provider: llamastack
    model: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
```
config.json
```json
{
  "models": [
    {
      "title": "Llama4 Maverick",
      "provider": "llamastack",
      "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
      "apiBase": "http://<llama stack endpoint>/v1/openai/v1/"
    }
  ]
}
```
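The config above points at Llama Stack's OpenAI-compatible endpoint, so any OpenAI-style chat-completions request works against it. A minimal sketch of such a request, assuming a local server on Llama Stack's default port 8321 (substitute your own endpoint address):

```python
import json
import urllib.request

# Assumption: a local Llama Stack server on the default port 8321.
API_BASE = "http://localhost:8321/v1/openai/v1"
MODEL = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for Llama Stack."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Explain what a vector store is in one sentence.")
# With a running server: urllib.request.urlopen(req) returns the completion.
```

This is the same request shape Continue sends on your behalf once the model is configured.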
Autocomplete model
We recommend configuring CodeLlama 7B as your autocomplete model.
config.yaml
```yaml
models:
  - name: CodeLlama 7B
    provider: llamastack
    model: codellama:7b
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
    roles:
      - autocomplete
```
config.json
```json
{
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "llamastack",
    "model": "codellama:7b",
    "apiBase": "http://<llama stack endpoint>/v1/openai/v1/"
  }
}
```
Embeddings model
By default, Llama Stack uses all-MiniLM-L6-v2 as the embeddings model.
config.yaml
```yaml
models:
  - name: all-MiniLM-L6-v2
    provider: llamastack
    model: all-MiniLM-L6-v2
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
    roles:
      - embed
```
config.json
```json
{
  "embeddingsProvider": {
    "provider": "llamastack",
    "model": "all-MiniLM-L6-v2",
    "apiBase": "http://<llama stack endpoint>/v1/openai/v1/"
  }
}
```
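The embeddings model is exposed through the same OpenAI-compatible surface. A minimal request sketch, again assuming a local server on the default port 8321:

```python
import json
import urllib.request

API_BASE = "http://localhost:8321/v1/openai/v1"  # assumed local server address

def build_embeddings_request(texts):
    """Build an OpenAI-compatible embeddings request for Llama Stack."""
    payload = {"model": "all-MiniLM-L6-v2", "input": texts}
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embeddings_request(["Llama Stack standardizes AI building blocks."])
# Sending req to a running server returns JSON with one embedding vector per
# input string; all-MiniLM-L6-v2 produces 384-dimensional vectors.
```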
Reranking model
Llama Stack does not yet support a reranking API.