
vLLM

Run vLLM's OpenAI-compatible server using `vllm serve`. See the vLLM server documentation and the engine arguments documentation for details.

```shell
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 1024
```
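
Once the server is up, you can sanity-check the endpoint with any OpenAI client before wiring it into Continue. A minimal sketch, assuming the default port 8000 and the model launched above (the `api_key` value is a placeholder, since vLLM ignores it unless you configure one):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
# "not-needed" is a placeholder: vLLM only checks the key if one is configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```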

Continue's implementation uses the OpenAI provider under the hood and automatically selects the available model. You only need to set the `apiBase`, like this:

config.json
```json
{
  "models": [
    {
      "title": "My vLLM OpenAI-compatible server",
      "provider": "vllm",
      "apiBase": "http://localhost:8000/v1"
    }
  ]
}
```
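
Because Continue picks up whatever model the server reports, it can help to confirm what vLLM is exposing before pointing the config at it. A quick check against the standard /v1/models route, again assuming the server started above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# vLLM reports the model it was launched with on the standard /v1/models route.
for model in client.models.list():
    print(model.id)  # e.g. NousResearch/Meta-Llama-3-8B-Instruct
```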
