Run vLLM's OpenAI-compatible server with the `vllm serve` command. See their server documentation and the engine arguments documentation.
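For example, to launch the server locally (the model ID below is illustrative; pass whichever model you want to serve):

```bash
# Serves an OpenAI-compatible API at http://localhost:8000/v1 by default
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
```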
Chat model
We recommend configuring Llama3.1 8B as your chat model.
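A minimal `config.yaml` entry might look like the sketch below, assuming vLLM is serving the model locally on its default port (the `meta-llama/Meta-Llama-3.1-8B-Instruct` model ID and the `apiBase` value are assumptions; match them to your `vllm serve` invocation):

```yaml
models:
  - name: Llama3.1 8B
    provider: vllm
    model: meta-llama/Meta-Llama-3.1-8B-Instruct # assumed; must match the served model
    apiBase: http://localhost:8000/v1 # vLLM's default OpenAI-compatible endpoint
    roles:
      - chat
```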
Autocomplete model
We recommend configuring Qwen2.5-Coder 1.5B as your autocomplete model.
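A corresponding `config.yaml` sketch, under the same assumptions about the model ID and local `apiBase`:

```yaml
models:
  - name: Qwen2.5-Coder 1.5B
    provider: vllm
    model: Qwen/Qwen2.5-Coder-1.5B # assumed; must match the served model
    apiBase: http://localhost:8000/v1
    roles:
      - autocomplete
```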
Embeddings model
We recommend configuring Nomic Embed Text as your embeddings model.
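A `config.yaml` sketch for the embed role (the `nomic-ai/nomic-embed-text-v1.5` model ID is an assumption; vLLM has to be serving an embedding model for this entry to work):

```yaml
models:
  - name: Nomic Embed Text
    provider: vllm
    model: nomic-ai/nomic-embed-text-v1.5 # assumed embedding model ID
    apiBase: http://localhost:8000/v1
    roles:
      - embed
```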
Reranking model
Continue automatically handles vLLM’s response format (which uses `results` instead of `data`).
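If you serve a reranker through vLLM, the entry is wired up the same way. A sketch, where the `BAAI/bge-reranker-v2-m3` model ID is purely illustrative:

```yaml
models:
  - name: BGE Reranker
    provider: vllm
    model: BAAI/bge-reranker-v2-m3 # illustrative; use whichever reranker you serve
    apiBase: http://localhost:8000/v1
    roles:
      - rerank
```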
Click here to see a list of reranking model providers.
The Continue implementation uses the OpenAI provider under the hood. View the source