Configure OpenVINO Model Server with Continue to use Intel-optimized models for CPU, iGPU, GPU and NPU via the OpenAI-compatible API, supporting code completion with models like CodeLlama and Qwen
apiBase
to running OVMS instance. Refer to this demo on official OVMS documentation to easily set up your own local server.
Example configuration once OVMS is launched: