Ollama API Documentation
This document details the API endpoints available for interacting with the Ollama language model server.
Introduction
The Ollama API allows you to interact with and manage language models hosted on your Ollama server. You can generate text, chat with models, create and manage models, and more.
Endpoints
Generate a Completion
Endpoint: POST /api/generate
Description: Generates a text completion for a given prompt using a specified model. This endpoint supports streaming responses.
Parameters:
model (required): The name of the model to use (see Model Names).
prompt: The prompt to generate a response for.
suffix: The text to append after the model's response.
images (optional): A list of base64-encoded images for multimodal models.
format (optional): The response format. Currently, only "json" is supported.
options (optional): Additional model parameters (see the Modelfile documentation).
system (optional): Overrides the system message defined in the Modelfile.
template (optional): Overrides the prompt template defined in the Modelfile.
context (optional): Encoded conversation history from a previous request for maintaining conversational memory.
stream (optional): Set to false to disable streaming responses.
raw (optional): Set to true to disable prompt templating.
keep_alive (optional): Controls how long the model stays loaded in memory (default: 5m).
Examples: See the original document for detailed examples of using this endpoint.
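As a supplement, here is a minimal sketch of a non-streaming call to this endpoint from Python using the requests library. The base URL (http://localhost:11434, Ollama's default port) and the llama3 model name are assumptions; adjust both to match your setup.

```python
import requests

# Minimal non-streaming completion request (assumes a local Ollama server
# on the default port and a locally available "llama3" model).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return one JSON object instead of a stream
    },
)
resp.raise_for_status()
body = resp.json()
print(body["response"])        # the generated text
context = body.get("context")  # pass back in a later request for memory
```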
Generate a Chat Completion
Endpoint: POST /api/chat
Description: Generates the next message in a chat conversation using a specified model. Supports streaming responses.
Parameters:
model (required): The name of the model to use.
messages (required): An array of message objects representing the chat history.
tools (optional): Tools for the model to use (requires stream to be false).
format (optional): Response format (currently only "json" is supported).
options (optional): Additional model parameters.
stream (optional): Set to false to disable streaming responses.
keep_alive (optional): Controls model memory retention (default: 5m).
Message Object Fields:
role: The role of the message ("system", "user", "assistant", or "tool").
content: The text content of the message.
images (optional): A list of images for multimodal models.
tool_calls (optional): A list of tools the model wants to use.
Examples: See the original document for detailed examples.
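A minimal single-turn chat call in the same style, again assuming the default local server and an illustrative llama3 model:

```python
import requests

# Single-turn chat request with streaming disabled.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Name three uses for embeddings."},
        ],
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # the assistant's reply
```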
Create a Model
Endpoint: POST /api/create
Description: Creates a new model from a Modelfile.
Parameters:
name (required): The name of the new model.
modelfile (optional): The contents of the Modelfile.
stream (optional): Set to false to disable streaming responses.
path (optional): Path to the Modelfile.
Examples: See the original document for examples.
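As an illustration, the sketch below creates a derived model from an inline Modelfile; the new model name ("mario") and the base model (llama3) are placeholders.

```python
import requests

# Create a new model from an inline Modelfile (placeholder names).
modelfile = "FROM llama3\nSYSTEM You are Mario from Super Mario Bros."
resp = requests.post(
    "http://localhost:11434/api/create",
    json={"name": "mario", "modelfile": modelfile, "stream": False},
)
resp.raise_for_status()
print(resp.json())  # final status object once creation finishes
```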
Check if a Blob Exists
Endpoint: HEAD /api/blobs/:digest
Description: Checks if a file blob exists on the Ollama server.
Path Parameters:
digest (required): The SHA256 digest of the blob.
Examples: See the original document for examples.
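A quick existence check might look like the following; the digest below is a placeholder, not a real blob.

```python
import requests

# HEAD returns 200 OK if the blob exists and 404 Not Found otherwise.
digest = "sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2"
resp = requests.head(f"http://localhost:11434/api/blobs/{digest}")
print("exists" if resp.status_code == 200 else "missing")
```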
Create a Blob
Endpoint: POST /api/blobs/:digest
Description: Creates a blob from a file on the server.
Path Parameters:
digest (required): The expected SHA256 digest of the file.
Examples: See the original document for examples.
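For example, uploading a local file and computing its digest client-side could look like this; the file path is a placeholder.

```python
import hashlib
import requests

# Upload a local file as a blob; the digest in the URL is expected to
# match the SHA256 of the uploaded bytes.
path = "model.gguf"  # placeholder path
with open(path, "rb") as f:
    data = f.read()
digest = "sha256:" + hashlib.sha256(data).hexdigest()
resp = requests.post(f"http://localhost:11434/api/blobs/{digest}", data=data)
print(resp.status_code)  # 201 Created on success
```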
List Local Models
Endpoint: GET /api/tags
Description: Lists all locally available models.
Examples: See the original document for examples.
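A minimal listing sketch, assuming the default local server:

```python
import requests

# List locally available models and their sizes in bytes.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"], model.get("size"))
```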
Show Model Information
Endpoint: POST /api/show
Description: Displays information about a specific model.
Parameters:
name (required): The name of the model.
verbose (optional): Set to true for a more detailed response.
Examples: See the original document for examples.
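For example, with an illustrative model name:

```python
import requests

# Fetch details for one model, including its parameters and template.
resp = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "llama3", "verbose": False},
)
resp.raise_for_status()
info = resp.json()
print(info.get("parameters"))
print(info.get("template"))
```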
Copy a Model
Endpoint: POST /api/copy
Description: Creates a copy of a model under a different name.
Parameters:
source (required): The name of the model to copy.
destination (required): The new name for the copied model.
Examples: See the original document for examples.
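A copy is a single request; both names below are illustrative.

```python
import requests

# Copy an existing model under a new name.
resp = requests.post(
    "http://localhost:11434/api/copy",
    json={"source": "llama3", "destination": "llama3-backup"},
)
print(resp.status_code)  # 200 on success, 404 if the source is missing
```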
Delete a Model
Endpoint: DELETE /api/delete
Description: Deletes a model and its associated data.
Parameters:
name (required): The name of the model to delete.
Examples: See the original document for examples.
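A deletion sketch, using the illustrative copy created above:

```python
import requests

# Delete a model; the name is passed in the JSON body of a DELETE request.
resp = requests.delete(
    "http://localhost:11434/api/delete",
    json={"name": "llama3-backup"},
)
print(resp.status_code)  # 200 on success, 404 if the model is missing
```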
Pull a Model
Endpoint: POST /api/pull
Description: Downloads a model from the Ollama library.
Parameters:
name (required): The name of the model to download.
insecure (optional): Allows insecure connections (use only for development).
stream (optional): Set to false to disable streaming responses.
Examples: See the original document for examples.
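Because pulls are long-running, the streamed progress updates are usually worth consuming; a sketch assuming the default local server:

```python
import json
import requests

# Pull a model and print progress; each streamed line is a JSON object.
resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"name": "llama3"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        update = json.loads(line)
        print(update.get("status"), update.get("completed"), update.get("total"))
```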
Push a Model
Endpoint: POST /api/push
Description: Uploads a model to the Ollama library.
Parameters:
name (required): The name of the model to upload (including namespace and tag).
insecure (optional): Allows insecure connections (use only for development).
stream (optional): Set to false to disable streaming responses.
Examples: See the original document for examples.
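A push mirrors a pull; the namespaced name below is a placeholder, and pushing typically requires an account and a registered key with the target library.

```python
import requests

# Upload a model; the name must include namespace and tag.
resp = requests.post(
    "http://localhost:11434/api/push",
    json={"name": "example/llama3-custom:latest", "stream": False},
)
print(resp.json())
```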
Generate Embeddings
Endpoint: POST /api/embed
Description: Generates embeddings for given text using a specified model.
Parameters:
model (required): The name of the model to use.
input (required): The text or list of texts to generate embeddings for.
truncate (optional): Truncates input to fit context length (defaults to true).
options (optional): Additional model parameters.
keep_alive (optional): Controls model memory retention (default: 5m).
Examples: See the original document for examples.
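A batched embedding call; the all-minilm model name is illustrative, and input may be a single string or a list of strings.

```python
import requests

# Generate embeddings for a batch of inputs in one request.
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "all-minilm", "input": ["Why is the sky blue?", "Hello"]},
)
resp.raise_for_status()
embeddings = resp.json()["embeddings"]  # one vector per input string
print(len(embeddings), len(embeddings[0]))
```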
List Running Models
Endpoint: GET /api/ps
Description: Lists models currently loaded in memory.
Examples: See the original document for examples.
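A quick way to inspect loaded models, under the same local-server assumption as the earlier sketches:

```python
import requests

# Show which models are loaded and when each is due to be unloaded.
resp = requests.get("http://localhost:11434/api/ps")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"], model.get("expires_at"))
```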
Generate Embedding (Deprecated)
Endpoint: POST /api/embeddings
Description: This endpoint is deprecated. Use /api/embed instead.
Parameters:
model (required): The name of the model to use.
prompt (required): The text to generate embeddings for.
options (optional): Additional model parameters.
keep_alive (optional): Controls model memory retention (default: 5m).
Examples: See the original document for examples.
Conventions
Model Names
Model names follow the format model:tag, where model can optionally include a namespace (e.g., example/model). The tag is optional and defaults to latest. Examples: orca-mini:3b-q4_1, llama3:70b, llama3.
Durations
All durations are returned in nanoseconds.
Streaming Responses
Some endpoints stream their responses as a series of JSON objects. Streaming can be disabled by setting the stream parameter to false.
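For example, a streamed /api/generate call delivers one JSON object per line, with done set to true on the final object; a consumption sketch under the same assumptions as the earlier examples:

```python
import json
import requests

# Consume a streamed completion chunk by chunk.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Tell me a joke."},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
```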