Ollama API Documentation
This document details the API endpoints available for interacting with the Ollama language model server.
Introduction
The Ollama API allows you to interact with and manage language models hosted on your Ollama server. You can generate text, chat with models, create and manage models, and more.
Endpoints
Generate a Completion
Endpoint: POST /api/generate
Description: Generates a text completion for a given prompt using a specified model. This endpoint supports streaming responses.
Parameters:
model (required): The name of the model to use (see Model Names).
prompt: The prompt to generate a response for.
suffix: The text to append after the model's response.
images (optional): A list of base64-encoded images for multimodal models.
format (optional): The response format. Currently, only "json" is supported.
options (optional): Additional model parameters (see the Modelfile documentation).
system (optional): Overrides the system message defined in the Modelfile.
template (optional): Overrides the prompt template defined in the Modelfile.
context (optional): Encoded conversation history from a previous request for maintaining conversational memory.
stream (optional): Set to false to disable streaming responses.
raw (optional): Set to true to disable prompt templating.
keep_alive (optional): Controls how long the model stays loaded in memory (default: 5m).
Examples: See the original document for detailed examples of using this endpoint.
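As a supplement, here is a minimal sketch of a non-streaming call to this endpoint from Python using the requests library. The base URL (http://localhost:11434, Ollama's default port) and the llama3 model name are assumptions; adjust both to match your setup.

```python
import requests

# Minimal non-streaming completion request (assumes a local Ollama server
# on the default port and a locally available "llama3" model).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return one JSON object instead of a stream
    },
)
resp.raise_for_status()
body = resp.json()
print(body["response"])        # the generated text
context = body.get("context")  # pass back in a later request for memory
```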
Generate a Chat Completion
Endpoint: POST /api/chat
Description: Generates the next message in a chat conversation using a specified model. Supports streaming responses.
Parameters:
model (required): The name of the model to use.
messages (required): An array of message objects representing the chat history.
tools (optional): Tools for the model to use (requires stream to be false).
format (optional): Response format (currently only "json" is supported).
options (optional): Additional model parameters.
stream (optional): Set to false to disable streaming responses.
keep_alive (optional): Controls model memory retention (default: 5m).
Message Object Fields:
role: The role of the message ("system", "user", "assistant", or "tool").
content: The text content of the message.
images (optional): A list of images for multimodal models.
tool_calls (optional): A list of tools the model wants to use.
Examples: See the original document for detailed examples.
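A minimal single-turn chat call in the same style, again assuming the default local server and an illustrative llama3 model:

```python
import requests

# Single-turn chat request with streaming disabled.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Name three uses for embeddings."},
        ],
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # the assistant's reply
```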
Create a Model
Endpoint: POST /api/create
Description: Creates a new model from a Modelfile.
Parameters:
name (required): The name of the new model.
modelfile (optional): The contents of the Modelfile.
stream (optional): Set to false to disable streaming responses.
path (optional): Path to the Modelfile.
Examples: See the original document for examples.
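As an illustration, the sketch below creates a derived model from an inline Modelfile; the new model name ("mario") and the base model (llama3) are placeholders.

```python
import requests

# Create a new model from an inline Modelfile (placeholder names).
modelfile = "FROM llama3\nSYSTEM You are Mario from Super Mario Bros."
resp = requests.post(
    "http://localhost:11434/api/create",
    json={"name": "mario", "modelfile": modelfile, "stream": False},
)
resp.raise_for_status()
print(resp.json())  # final status object once creation finishes
```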
Check if a Blob Exists
Endpoint: HEAD /api/blobs/:digest
Description: Checks if a file blob exists on the Ollama server.
Path Parameters:
digest (required): The SHA256 digest of the blob.
Examples: See the original document for examples.
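A quick existence check might look like the following; the digest below is a placeholder, not a real blob.

```python
import requests

# HEAD returns 200 OK if the blob exists and 404 Not Found otherwise.
digest = "sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2"
resp = requests.head(f"http://localhost:11434/api/blobs/{digest}")
print("exists" if resp.status_code == 200 else "missing")
```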
Create a Blob
Endpoint: POST /api/blobs/:digest
Description: Creates a blob from a file on the server.
Path Parameters:
digest (required): The expected SHA256 digest of the file.
Examples: See the original document for examples.
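For example, uploading a local file and computing its digest client-side could look like this; the file path is a placeholder.

```python
import hashlib
import requests

# Upload a local file as a blob; the digest in the URL is expected to
# match the SHA256 of the uploaded bytes.
path = "model.gguf"  # placeholder path
with open(path, "rb") as f:
    data = f.read()
digest = "sha256:" + hashlib.sha256(data).hexdigest()
resp = requests.post(f"http://localhost:11434/api/blobs/{digest}", data=data)
print(resp.status_code)  # 201 Created on success
```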
List Local Models
Endpoint: GET /api/tags
Description: Lists all locally available models.
Examples: See the original document for examples.
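A minimal listing sketch, assuming the default local server:

```python
import requests

# List locally available models and their sizes in bytes.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"], model.get("size"))
```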
Show Model Information
Endpoint: POST /api/show
Description: Displays information about a specific model.
Parameters:
name (required): The name of the model.
verbose (optional): Set to true for a more detailed response.
Examples: See the original document for examples.
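For example, with an illustrative model name:

```python
import requests

# Fetch details for one model, including its parameters and template.
resp = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "llama3", "verbose": False},
)
resp.raise_for_status()
info = resp.json()
print(info.get("parameters"))
print(info.get("template"))
```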
Copy a Model
Endpoint: POST /api/copy
Description: Creates a copy of a model under a different name.
Parameters:
source (required): The name of the model to copy.
destination (required): The new name for the copied model.
Examples: See the original document for examples.
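A copy is a single request; both names below are illustrative.

```python
import requests

# Copy an existing model under a new name.
resp = requests.post(
    "http://localhost:11434/api/copy",
    json={"source": "llama3", "destination": "llama3-backup"},
)
print(resp.status_code)  # 200 on success, 404 if the source is missing
```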
Delete a Model
Endpoint: DELETE /api/delete
Description: Deletes a model and its associated data.
Parameters:
name (required): The name of the model to delete.
Examples: See the original document for examples.
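A deletion sketch, using the illustrative copy created above:

```python
import requests

# Delete a model; the name is passed in the JSON body of a DELETE request.
resp = requests.delete(
    "http://localhost:11434/api/delete",
    json={"name": "llama3-backup"},
)
print(resp.status_code)  # 200 on success, 404 if the model is missing
```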
Pull a Model
Endpoint: POST /api/pull
Description: Downloads a model from the Ollama library.
Parameters:
name (required): The name of the model to download.
insecure (optional): Allows insecure connections (use only for development).
stream (optional): Set to false to disable streaming responses.
Examples: See the original document for examples.
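Because pulls are long-running, the streamed progress updates are usually worth consuming; a sketch assuming the default local server:

```python
import json
import requests

# Pull a model and print progress; each streamed line is a JSON object.
resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "llama3"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        update = json.loads(line)
        print(update.get("status"), update.get("completed"), update.get("total"))
```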
Push a Model
Endpoint: POST /api/push
Description: Uploads a model to the Ollama library.
Parameters:
name (required): The name of the model to upload (including namespace and tag).
insecure (optional): Allows insecure connections (use only for development).
stream (optional): Set to false to disable streaming responses.
Examples: See the original document for examples.
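A push mirrors a pull; the namespaced name below is a placeholder, and pushing typically requires an account and a registered key with the target library.

```python
import requests

# Upload a model; the name must include namespace and tag.
resp = requests.post(
    "http://localhost:11434/api/push",
    json={"name": "example/llama3-custom:latest", "stream": False},
)
print(resp.json())
```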
Generate Embeddings
Endpoint: POST /api/embed
Description: Generates embeddings for given text using a specified model.
Parameters:
model (required): The name of the model to use.
input (required): The text or list of texts to generate embeddings for.
truncate (optional): Truncates input to fit context length (defaults to true).
options (optional): Additional model parameters.
keep_alive (optional): Controls model memory retention (default: 5m).
Examples: See the original document for examples.
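A batched embedding call; the all-minilm model name is illustrative, and input may be a single string or a list of strings.

```python
import requests

# Generate embeddings for a batch of inputs in one request.
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "all-minilm", "input": ["Why is the sky blue?", "Hello"]},
)
resp.raise_for_status()
embeddings = resp.json()["embeddings"]  # one vector per input string
print(len(embeddings), len(embeddings[0]))
```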
List Running Models
Endpoint: GET /api/ps
Description: Lists models currently loaded in memory.
Examples: See the original document for examples.
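A quick way to inspect loaded models, under the same local-server assumption as the earlier sketches:

```python
import requests

# Show which models are loaded and when each is due to be unloaded.
resp = requests.get("http://localhost:11434/api/ps")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"], model.get("expires_at"))
```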
Generate Embedding (Deprecated)
Endpoint: POST /api/embeddings
Description: This endpoint is deprecated. Use /api/embed instead.
Parameters:
model (required): The name of the model to use.
prompt (required): The text to generate embeddings for.
options (optional): Additional model parameters.
keep_alive (optional): Controls model memory retention (default: 5m).
Examples: See the original document for examples.
Conventions
Model Names
Model names follow the format model:tag, where model can optionally include a namespace (e.g., example/model). The tag is optional and defaults to latest. Examples: orca-mini:3b-q4_1, llama3:70b, llama3.
Durations
All durations are returned in nanoseconds.
Streaming Responses
Some endpoints stream their responses as a series of JSON objects. Streaming can be disabled by setting the stream parameter to false.
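For example, a streamed /api/generate call delivers one JSON object per line, with done set to true on the final object; a consumption sketch under the same assumptions as the earlier examples:

```python
import json
import requests

# Consume a streamed completion chunk by chunk.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Tell me a joke."},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
```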