Run the OpenAI-compatible server with `vllm serve`. See their server documentation and the engine arguments documentation.
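For example, a minimal launch command, assuming you want the chat model recommended below served on the default port (the model ID and port are illustrative):

```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```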
Chat model
We recommend configuring Llama 3.1 8B as your chat model.
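A minimal `config.yaml` sketch for this setup; the Hugging Face model ID and `apiBase` are assumptions, so adjust them to match your vLLM server:

```yaml
models:
  - name: Llama 3.1 8B
    provider: vllm
    model: meta-llama/Llama-3.1-8B-Instruct # assumed model ID
    apiBase: http://localhost:8000/v1 # assumes vLLM's default port
    roles:
      - chat
```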
Autocomplete model
We recommend configuring Qwen2.5-Coder 1.5B as your autocomplete model.
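A sketch of the corresponding `config.yaml` entry, again assuming a local vLLM server on the default port:

```yaml
models:
  - name: Qwen2.5-Coder 1.5B
    provider: vllm
    model: Qwen/Qwen2.5-Coder-1.5B # assumed model ID
    apiBase: http://localhost:8000/v1 # adjust to your server
    roles:
      - autocomplete
```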
Embeddings model
We recommend configuring Nomic Embed Text as your embeddings model.
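One possible `config.yaml` entry; `nomic-ai/nomic-embed-text-v1.5` is an assumed model ID:

```yaml
models:
  - name: Nomic Embed Text
    provider: vllm
    model: nomic-ai/nomic-embed-text-v1.5 # assumed model ID
    apiBase: http://localhost:8000/v1 # adjust to your server
    roles:
      - embed
```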
Reranking model
Continue automatically handles vLLM’s response format (which uses `results` instead of `data`).
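A hedged example entry; `BAAI/bge-reranker-v2-m3` is one reranker vLLM can serve, chosen here only for illustration:

```yaml
models:
  - name: BGE Reranker
    provider: vllm
    model: BAAI/bge-reranker-v2-m3 # illustrative model choice
    apiBase: http://localhost:8000/v1 # adjust to your server
    roles:
      - rerank
```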
Click here to see a list of reranking model providers.
The Continue implementation uses the OpenAI client under the hood. View the source.