Tools
Overview
This document describes a custom multimodal tool server hosted on GWDG infrastructure. It provides core AI capabilities: image generation, image editing, text-to-speech (TTS), and web search, accessible directly through the Chat AI UI. These tools are designed to enrich user interaction by enabling dynamic media creation and transformation within conversational workflows.
Prerequisites
To use the tool server, the following conditions must be met:
- Default LLM: The system should ideally run Qwen3-30B-A3B-Instruct-2507 as the active language model for optimal performance.
- Tool Activation: Tools must be enabled in the Chat AI UI by checking the “Enable Tools” box in the settings panel.
Web Search is enabled separately, because it can result in data being sent to external service providers.
To enable it, check the “GWDG Tools” and “Web Search” checkboxes in the sidebar as shown below.
Once activated, the agent can discover and invoke tools based on user intent.
Available Tools
Tool Name | Description |
---|---|
generate_image | Generates images from text prompts using the FLUX.1-schnell model |
edit_image | Applies edits to existing images (e.g., inpainting, masking, style transfer) using Qwen-Image-Edit |
speak_text | Converts text to speech using the XTTSv2 model |
web_search_preview | Sends an LLM-generated query to a web search provider, such as Google, to retrieve additional information |
Web Search
The web search tool allows the AI to look up the latest information from the internet to improve its responses. When enabled, the AI can generate search queries based on your question and the full conversation history, send them to a search engine (such as Google), and use the retrieved results to provide more accurate and up-to-date answers. This is especially useful for topics where current or rapidly changing information is important. Web Search is not available for externally hosted models. You may need to explicitly ask the model to search the web for it to make such a tool call.
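The activation rules described so far can be summarized in a small sketch. It is illustrative only: the function name `available_tools` and its parameters are hypothetical and are not part of the Chat AI UI or the tool server. It also assumes that the media tools depend on a single tools toggle and that only Web Search is restricted for externally hosted models, since the text above states that restriction for Web Search alone.

```python
def available_tools(gwdg_tools: bool, web_search: bool, externally_hosted: bool) -> list[str]:
    """Return the tool names an agent could call under the given settings.

    All three parameters are hypothetical stand-ins for UI state:
    the tools checkbox, the Web Search checkbox, and whether the
    selected model is hosted externally.
    """
    if not gwdg_tools:
        # Without the tools checkbox, no tool is offered at all.
        return []
    tools = ["generate_image", "edit_image", "speak_text"]
    # Web Search needs its own opt-in and is not offered for external models.
    if web_search and not externally_hosted:
        tools.append("web_search_preview")
    return tools


if __name__ == "__main__":
    print(available_tools(gwdg_tools=True, web_search=True, externally_hosted=False))
```

Keeping Web Search behind its own flag mirrors the separate opt-in described above, which exists because queries may be sent to external service providers.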
Usage Flow
Once tools are enabled, the agent follows a structured flow to interpret user input and invoke the appropriate tool:
- Tool Discovery
  The agent lists available tools and their capabilities.
  Example: “What tools can I use?” → Agent responds with generate_image, edit_image, speak_text.
- Tool Selection
  Based on user intent, the agent selects the relevant tool.
  Example: “Make an image of a glowing jellyfish in deep space” → Agent selects generate_image.
- Invocation
  The agent sends a structured input payload to the tool server.
  Example: { "prompt": "a glowing jellyfish floating in deep space", "size": "1024x1024" }
- Response Handling
  The agent receives the output and renders it in the UI or stores it for further use.
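
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool server's actual implementation: the tool functions are local stubs that only echo their input, the keyword-based `select_tool` stands in for the decision the LLM makes in the real system, and the parameter names for edit_image and speak_text are hypothetical. Only the generate_image payload (`prompt`, `size`) is taken from the example above.

```python
import json

# Hypothetical local stubs; the real tools run on the tool server.
def generate_image(prompt: str, size: str = "1024x1024") -> dict:
    return {"type": "image", "prompt": prompt, "size": size}

def edit_image(prompt: str) -> dict:
    return {"type": "image", "prompt": prompt}

def speak_text(text: str) -> dict:
    return {"type": "audio", "text": text}

TOOLS = {
    "generate_image": generate_image,
    "edit_image": edit_image,
    "speak_text": speak_text,
}

def discover_tools() -> list[str]:
    """Step 1: list the names of the available tools."""
    return list(TOOLS)

def select_tool(user_message: str) -> str:
    """Step 2: naive keyword routing (an assumption; the real agent lets the LLM decide)."""
    text = user_message.lower()
    if "audio" in text or "speak" in text:
        return "speak_text"
    if "change" in text or "edit" in text:
        return "edit_image"
    return "generate_image"

def invoke(tool_name: str, payload: dict) -> dict:
    """Step 3: pass the structured payload to the selected tool."""
    return TOOLS[tool_name](**payload)

def handle_response(result: dict) -> str:
    """Step 4: turn the tool output into something a UI could render."""
    return f"[{result['type']}] " + json.dumps(result, sort_keys=True)

if __name__ == "__main__":
    tool = select_tool("Make an image of a glowing jellyfish in deep space")
    payload = {"prompt": "a glowing jellyfish floating in deep space", "size": "1024x1024"}
    print(tool, handle_response(invoke(tool, payload)))
```

The registry dictionary keeps discovery and invocation in sync: any tool added to `TOOLS` is automatically listed and callable by name.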
More Examples
- “Change this image to Van Gogh style” → edit_image → applies style transfer
- “Make audio from this text: ‘Welcome to GWDG. Your research matters.’” → speak_text → plays audio