Hugging Face is the main platform for sharing open AI models. It provides inference in two ways: Inference Providers and Inference Endpoints.

Inference Providers

Inference Providers is a serverless service: requests are routed through Hugging Face to external inference providers and billed per token.
You can create an access token on Hugging Face and set the priority order of providers in your settings.
config.yaml
name: My Config
version: 0.0.1
schema: v1

models:
  - name: deepseek
    provider: huggingface-inference-providers
    model: deepseek-ai/DeepSeek-V3.2-Exp
    apiKey: <YOUR_HF_TOKEN>
    apiBase: https://router.huggingface.co/v1
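The apiBase above is an OpenAI-compatible router, so the tool sends ordinary chat-completion requests to it. As a rough sketch of an equivalent direct call in Python, assuming the openai client library (not mentioned in the original docs):

from openai import OpenAI

# Point any OpenAI-compatible client at the Hugging Face router.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # apiBase from config.yaml
    api_key="<YOUR_HF_TOKEN>",                    # your Hugging Face access token
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # model from config.yaml
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)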

Inference Endpoints

Inference Endpoints is a dedicated service that lets you run open models on dedicated hardware. It is a more advanced way to get inference for Hugging Face models, giving you more control over the whole deployment.
Before you can use Inference Endpoints, you need to create an endpoint. You can do this by going to Inference Endpoints and clicking on “Create Endpoint”.
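If you prefer to script this step, the huggingface_hub Python library offers create_inference_endpoint. The following is only a minimal sketch, not the official recipe; the endpoint name and all hardware values are illustrative assumptions you should adjust to your model and the hardware catalog:

from huggingface_hub import create_inference_endpoint

# Illustrative sketch: create a dedicated endpoint programmatically.
# All concrete values below are assumptions; pick ones matching your model.
endpoint = create_inference_endpoint(
    "my-endpoint",                 # hypothetical endpoint name
    repository="<MODEL_REPO>",     # the model repo you want to serve
    framework="pytorch",
    task="text-generation",
    vendor="aws",
    region="us-east-1",
    type="protected",
    accelerator="gpu",
    instance_size="x1",            # assumption: check the hardware catalog
    instance_type="nvidia-a10g",   # assumption: check the hardware catalog
)
endpoint.wait()  # block until the endpoint is running
print(endpoint.url)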
config.yaml
name: My Config
version: 0.0.1
schema: v1

models:
  - name: deepseek
    provider: huggingface-inference-endpoints
    model: <ENDPOINT_ID>
    apiKey: <YOUR_HF_TOKEN>
    apiBase: https://<YOUR_ENDPOINT_ID>.aws.endpoints.huggingface.cloud
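Once the endpoint is running, requests go straight to your dedicated URL. As a sketch of a direct call, assuming the huggingface_hub InferenceClient and the same placeholders as the config above:

from huggingface_hub import InferenceClient

# Talk to the dedicated endpoint directly; placeholders as in config.yaml.
client = InferenceClient(
    model="https://<YOUR_ENDPOINT_ID>.aws.endpoints.huggingface.cloud",
    token="<YOUR_HF_TOKEN>",
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(response.choices[0].message.content)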