Tools

Overview

This document describes a custom multimodal tool server hosted on GWDG infrastructure. It provides core AI capabilities: image generation, image editing, text-to-speech (TTS), and web search, accessible directly through the Chat AI UI. These tools are designed to enrich user interaction by enabling dynamic media creation and transformation within conversational workflows.

Prerequisites

To use the tool server, the following conditions must be met:

Default LLM: The system should ideally run Qwen3-30B-A3B-Instruct-2507 as the active language model for optimal performance.
Tool Activation: Tools must be enabled in the Chat AI UI by checking the “Enable Tools” box in the settings panel.

Web Search is enabled separately, because it can result in data being sent to external service providers. To enable it, check the “GWDG Tools” and “Web Search” checkboxes in the sidebar as shown below.

Once activated, the agent can discover and invoke tools based on user intent.

Available Tools

Tool Name	Description
`generate_image`	Generates images from text prompts using the FLUX.1-schnell model
`edit_image`	Applies edits to existing images (e.g., inpainting, masking, style transfer) using Qwen-Image-Edit-2511
`speak_text`	Converts text to speech using the XTTSv2 model
`web_search_preview`	Sends an LLM-generated query to a web search provider, such as Google, and returns additional information for the response

Web Search

The web search tool allows the AI to look up the latest information from the internet to improve its responses. When enabled, the AI can generate search queries based on your question and the full conversation history, send them to a search engine (such as Google), and use the retrieved results to provide more accurate and up-to-date answers. This is especially useful for topics where current or rapidly changing information is important. Web Search is not available for externally hosted models. You may need to explicitly ask the model to search the web for it to make such a tool call.
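The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Chat AI implementation: `build_search_query`, `search_and_collect`, and `fake_search` are hypothetical names, and the simple string join stands in for the query that the LLM would generate itself. It shows why Web Search is enabled separately: parts of the conversation history can end up in the query sent to the external provider.

```python
from typing import Callable, Dict, List


def build_search_query(question: str, history: List[str], max_context: int = 2) -> str:
    """Hypothetical stand-in for the LLM's query generation.

    Joins the most recent conversation turns with the current question.
    """
    context = history[-max_context:] if max_context > 0 else []
    return " ".join(context + [question]).strip()


def search_and_collect(
    question: str,
    history: List[str],
    search: Callable[[str], List[str]],
) -> Dict[str, object]:
    """Build a query, pass it to the search provider, and return both."""
    query = build_search_query(question, history)
    results = search(query)  # the query leaves the local system at this point
    return {"query": query, "results": results}


def fake_search(query: str) -> List[str]:
    """Offline placeholder for a real search provider (assumption)."""
    return [f"result for: {query}"]


outcome = search_and_collect(
    "What is the weather there?",
    ["We are planning a trip to Goettingen."],
    fake_search,
)
print(outcome["query"])
```

Passing the search provider in as a function keeps the sketch runnable offline; a real provider call would replace `fake_search`.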

Usage Flow

Once tools are enabled, the agent follows a structured flow to interpret user input and invoke the appropriate tool:

Tool Discovery
The agent lists available tools and their capabilities.
Example: “What tools can I use?” → Agent responds with generate_image, edit_image, and speak_text (plus web_search_preview if Web Search is enabled).

Tool Selection
Based on user intent, the agent selects the relevant tool.
Example: “Make an image of a glowing jellyfish in deep space” → Agent selects generate_image.

Invocation
The agent sends a structured input payload to the tool server.
Example:

{
  "prompt": "a glowing jellyfish floating in deep space",
  "size": "1024x1024"
}
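As a minimal sketch, a payload like the one above could be assembled and checked in Python before it is sent. The helper name `build_generate_image_payload` and the `WIDTHxHEIGHT` check are assumptions made for illustration; the tool server's actual input schema may accept other fields or size formats.

```python
import json
import re


def build_generate_image_payload(prompt: str, size: str = "1024x1024") -> str:
    """Return a JSON payload with the two fields shown in the example.

    The validation rules here are illustrative assumptions, not the
    tool server's documented schema.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not re.fullmatch(r"\d+x\d+", size):
        raise ValueError(f"size must look like WIDTHxHEIGHT, got {size!r}")
    return json.dumps({"prompt": prompt, "size": size}, indent=2)


payload = build_generate_image_payload("a glowing jellyfish floating in deep space")
print(payload)
```

Validating on the client side catches malformed input early, before a request reaches the tool server.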

Response Handling
The agent receives the output and renders it in the UI or stores it for further use.
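The invocation and response-handling steps can be pictured as a small dispatch table. The sketch below is hypothetical throughout: the handlers are offline stubs rather than calls to the real models, and `ToolResult`, `invoke`, and `render` are illustrative names that do not come from the tool server. Only the three tool names are taken from the table of available tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolResult:
    tool: str
    kind: str  # "image" or "audio"
    data: str  # placeholder for the real binary output


# Stub handlers (assumptions); real handlers would call the tool server.
def _generate_image(args: Dict[str, str]) -> ToolResult:
    return ToolResult("generate_image", "image", f"<image for {args['prompt']!r}>")


def _edit_image(args: Dict[str, str]) -> ToolResult:
    return ToolResult("edit_image", "image", f"<edited image: {args['instruction']!r}>")


def _speak_text(args: Dict[str, str]) -> ToolResult:
    return ToolResult("speak_text", "audio", f"<audio for {args['text']!r}>")


HANDLERS: Dict[str, Callable[[Dict[str, str]], ToolResult]] = {
    "generate_image": _generate_image,
    "edit_image": _edit_image,
    "speak_text": _speak_text,
}


def invoke(tool: str, args: Dict[str, str]) -> ToolResult:
    """Route a structured tool call to its handler."""
    if tool not in HANDLERS:
        raise KeyError(f"unknown tool: {tool}")
    return HANDLERS[tool](args)


def render(result: ToolResult) -> str:
    """Decide how the UI would present the result, based on its kind."""
    if result.kind == "image":
        return f"[show image] {result.data}"
    if result.kind == "audio":
        return f"[play audio] {result.data}"
    return f"[unsupported output] {result.data}"


print(render(invoke("speak_text", {"text": "Welcome to GWDG."})))
```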

More Examples
- “Change this image to Van Gogh style” → edit_image → applies style transfer
- “Make audio from this text: ‘Welcome to GWDG. Your research matters.’” → speak_text → plays audio