OpenVINO™ Model Server (OVMS) is a scalable inference server for models optimized with OpenVINO™ for Intel CPU, iGPU, GPU, and NPU.
OpenVINO™ Model Server supports text generation via the OpenAI Chat Completions API. Select the OpenAI provider and point apiBase at a running OVMS instance. To set up your own local server, refer to the demo in the official OVMS documentation.
Example configuration once OVMS is launched: