Vision

The Mistral vision model lets you analyze images directly in your VIKTOR app. Upload an image, send it to the model, and get a text description back — all without an external API key. For basic text chat, see the Viktor LLM overview.

Chat Completions API only

The Mistral vision model is only supported via the Chat Completions API. The Responses API is not supported for this model.

Base64 images only

VIKTOR's LLM endpoint only supports base64-encoded data URIs for images. HTTP URLs (e.g. https://example.com/image.jpg) are not supported — images must be encoded inline.
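As a minimal sketch of that inline encoding, the snippet below wraps raw image bytes in a data URI using only the Python standard library. The helper name `to_data_uri` and its `mime_type` parameter are hypothetical and not part of the VIKTOR SDK, and the MIME type is assumed to match the actual image format.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Return a base64 data URI for the given image bytes (hypothetical helper)."""
    # b64encode returns bytes, so decode to str before embedding in the URI.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for real image content.
print(to_data_uri(b"abc"))  # data:image/jpeg;base64,YWJj
```

In an app, the bytes would come from the uploaded file rather than a literal.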

Example

The following example shows a complete app that accepts an image upload and sends it to the Mistral vision model for analysis:

import base64
import html

import viktor as vkt
from openai import OpenAI

# Client that targets VIKTOR's LLM endpoint; no external API key is needed.
client = OpenAI(
    base_url=vkt.ViktorOpenAI.get_base_url(version="v1"),
    api_key=vkt.ViktorOpenAI.get_api_key(),
)


class Parametrization(vkt.Parametrization):
    image_file = vkt.FileField(
        "Upload Image",
        file_types=[".jpg", ".jpeg"],
        description="Upload a JPG image to be analyzed by the AI model.",
    )


class Controller(vkt.Controller):
    parametrization = Parametrization

    @vkt.WebView("Image Analysis")
    def analyze_image(self, params, **kwargs):
        if not params.image_file:
            return vkt.WebResult(html="<p>Please upload an image to get started.</p>")

        # Encode the upload as a base64 data URI; HTTP URLs are not supported.
        image_bytes = params.image_file.file.getvalue_binary()
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
        image_url = f"data:image/jpeg;base64,{base64_image}"

        # A vision message combines a text prompt and the image in one content list.
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what is shown in this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ]

        response = client.chat.completions.create(
            model="mistral.ministral-3-14b-instruct",
            messages=messages,
        )

        # Escape the model output so any markup in it is displayed as plain text.
        result_text = html.escape(response.choices[0].message.content or "")
        return vkt.WebResult(html=f"<p>{result_text}</p>")

Key patterns

  • Base64 encoding — read the image as bytes, encode with base64.b64encode, and wrap in a data:image/jpeg;base64,... data URI.
  • Vision message format — the content field is a list containing a text item (your prompt) and an image_url item (the base64 data URI).
  • Model — use mistral.ministral-3-14b-instruct as the model ID.
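The message format described above can be isolated in a small function, which keeps the controller short and makes the payload easy to check on its own. This is only a sketch: `build_vision_messages` is a hypothetical helper name, and the guard against non-data URIs is an added assumption based on the base64-only restriction, not a documented VIKTOR check.

```python
def build_vision_messages(prompt: str, data_uri: str) -> list:
    """Build a single-turn vision message list (hypothetical helper)."""
    # Assumption: reject anything that is not a data URI, since HTTP URLs
    # are not supported by the endpoint.
    if not data_uri.startswith("data:"):
        raise ValueError("expected a base64 data URI, not an HTTP URL")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]


messages = build_vision_messages(
    "Describe what is shown in this image.",
    "data:image/jpeg;base64,YWJj",  # placeholder payload, not a real image
)
print(messages[0]["content"][1]["type"])  # image_url
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create`.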