The local model service must expose an internal OpenAI-compatible API that Aperium calls through the dedicated local provider. The provider must already be implemented and verified before your deployment begins.
Documentation Index
Fetch the complete documentation index at: https://docs.aperium.apps.hillspire.com/llms.txt
Use this file to discover all available pages before exploring further.
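If you are scripting discovery rather than browsing, a minimal fetch of that index might look like the following sketch; it uses only the Python standard library and implies no Aperium tooling.

```python
# Sketch: download the documentation index and print its entries so the
# available pages can be discovered programmatically.
import urllib.request

INDEX_URL = "https://docs.aperium.apps.hillspire.com/llms.txt"

with urllib.request.urlopen(INDEX_URL, timeout=10) as resp:
    print(resp.read().decode("utf-8"))
```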
Model-serving requirements
- The model is pinned by exact artifact version, image digest, and serving configuration.
- The serving endpoint is internal-only.
- Readiness must mean the model is loaded and able to answer a small chat request (a probe sketch follows this list).
- Health checks must fail when the model is unloaded, the GPU is unavailable, or the inference runtime cannot allocate the required memory.
- The model must support reliable structured tool/function calls with the tool schemas that Aperium sends for every enabled MCP connector (a smoke-test sketch follows this list).
- Context length must be sufficient for the system prompt, conversation context, and the active MCP tool schema set.
- GPU memory, batching, concurrency, and max-token settings must be sized from load tests, not defaults.
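Both the readiness requirement and the tool-calling requirement can be exercised with small scripts against the OpenAI-compatible endpoint. The sketches below assume only that endpoint convention; the base URL, model name, timeouts, and tool schema are placeholders, not Aperium-defined values. First, a readiness probe that counts the model as ready only when a small chat request comes back with a choice:

```python
# readiness_probe.py - sketch of a readiness check that verifies the model
# is loaded and can answer a small chat request. Base URL, model name, and
# timeout are illustrative placeholders, not Aperium-defined values.
import json
import sys
import urllib.request

BASE_URL = "http://llm.internal:8000/v1"  # placeholder internal-only endpoint
MODEL = "local-model"                     # placeholder pinned model name

def ready() -> bool:
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 4,
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            body = json.load(resp)
        # Loaded and answering: a 2xx response with at least one choice.
        return bool(body.get("choices"))
    except Exception:
        # An unloaded model, missing GPU, or failed memory allocation should
        # surface here as a connection error, timeout, or non-2xx status.
        return False

if __name__ == "__main__":
    sys.exit(0 if ready() else 1)
```

For the tool-calling requirement, a smoke test can send one representative tool definition and assert that the model returns a structured tool call whose arguments parse as JSON. The get_weather tool here is a hypothetical stand-in, not an actual Aperium MCP connector schema.

```python
# toolcall_smoketest.py - sketch: verify the model emits a structured tool
# call for a simple, unambiguous prompt. The tool schema is a hypothetical
# stand-in for the schemas Aperium sends per enabled MCP connector.
import json
import urllib.request

BASE_URL = "http://llm.internal:8000/v1"  # placeholder internal-only endpoint
MODEL = "local-model"                     # placeholder pinned model name

payload = json.dumps({
    "model": MODEL,
    "messages": [{"role": "user", "content": "What is the weather in Oslo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}).encode()

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    message = json.load(resp)["choices"][0]["message"]

# A passing run yields a tool_calls entry whose arguments parse as JSON.
calls = message.get("tool_calls") or []
assert calls, "model did not produce a structured tool call"
args = json.loads(calls[0]["function"]["arguments"])
assert "city" in args, f"unexpected arguments: {args}"
print("tool-call smoke test passed:", calls[0]["function"]["name"], args)
```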
Required env profile
Use the dedicated local OpenAI-compatible provider with an internal base_url. These values map to the LLM providers section of the env reference:
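A sketch of such a profile is below. Only ENABLE_LLM_FALLBACK is named in this guide; every other key and value is an illustrative stand-in, so copy the exact names from the env reference rather than from this example.

```env
# Illustrative profile: variable names other than ENABLE_LLM_FALLBACK are
# placeholders; copy the exact keys from the env reference.
# Hypothetical key: selects the dedicated local OpenAI-compatible provider.
LLM_PROVIDER=local-openai-compatible
# Internal-only base_url; must not resolve outside your network boundary.
LLM_BASE_URL=http://llm.internal:8000/v1
# Hypothetical key: the pinned model artifact.
LLM_MODEL=local-model
# Documented in this guide: cloud fallback stays off on-prem.
ENABLE_LLM_FALLBACK=false
```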
Why fallback is off
The on-prem deployment shape exists to keep inference inside your network boundary. Re-enabling cloud fallback silently violates that contract. Treat any change to ENABLE_LLM_FALLBACK as a security review item: a release that flips it on must be accompanied by an explicit, documented exception from your security model.
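One way to make that review boundary mechanical, as a sketch: a release-gate check that fails the pipeline whenever the flag drifts from false. The env-file path and the CI wiring are assumptions to adapt to your release process.

```python
# check_fallback_off.py - sketch of a release-gate check that fails if
# ENABLE_LLM_FALLBACK is anything other than "false". The env-file path is
# an assumption; point it at your deployment's profile.
import sys
from pathlib import Path

ENV_FILE = Path(".env.production")  # placeholder path

def fallback_disabled(env_text: str) -> bool:
    for line in env_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("ENABLE_LLM_FALLBACK="):
            return line.split("=", 1)[1].strip().lower() == "false"
    return False  # a missing flag is treated as a failure, not a pass

if __name__ == "__main__":
    if not fallback_disabled(ENV_FILE.read_text()):
        print("ENABLE_LLM_FALLBACK must be false; see security review policy.")
        sys.exit(1)
```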