Tools

Overview

This document describes a custom multimodal tool server hosted on GWDG infrastructure. It provides core AI capabilities: image generation, image editing, text-to-speech (TTS), and web search, accessible directly through the Chat AI UI. These tools are designed to enrich user interaction by enabling dynamic media creation and transformation within conversational workflows.

Prerequisites

To use the tool server, the following conditions must be met:

  • Default LLM: The system should ideally run Qwen3-30B-A3B-Instruct-2507 as the active language model for optimal performance.
  • Tool Activation: Tools must be enabled in the Chat AI UI by checking the “Enable Tools” box in the settings panel.

[Screenshot: Web interface example]

Web Search is enabled separately because it can result in data being sent to external service providers. To enable it, check the “GWDG Tools” and “Web Search” checkboxes in the sidebar as shown below.

[Screenshot: Chat AI sidebar with the “GWDG Tools” and “Web Search” checkboxes checked]

Once activated, the agent can discover and invoke tools based on user intent.

Available Tools

Tool Name            Description
generate_image       Generates images from text prompts using the FLUX.1-schnell model
edit_image           Applies edits to existing images (e.g., inpainting, masking, style transfer) using Qwen-Image-Edit
speak_text           Converts text to speech using the XTTSv2 model
web_search_preview   Uses a web search provider, such as Google, to provide additional information on an LLM-provided query

The web search tool allows the AI to look up the latest information from the internet to improve its responses. When enabled, the AI can generate search queries based on your question and the full conversation history, send them to a search engine (such as Google), and use the retrieved results to provide more accurate and up-to-date answers. This is especially useful for topics where current or rapidly changing information is important. Web Search is not available for externally hosted models. You may need to explicitly ask the model to search the web for it to make such a tool call.
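The enabling rules described so far can be summarized in a short Python sketch. The tool names come from the table above; `ToolSettings` and `available_tools` are hypothetical names introduced only for illustration, and the sketch assumes that only web search is restricted for externally hosted models, which this document does not state explicitly for the other tools.

```python
from dataclasses import dataclass
from typing import List

# Tool names as listed in the "Available Tools" table.
CORE_TOOLS = ["generate_image", "edit_image", "speak_text"]
WEB_SEARCH_TOOL = "web_search_preview"


@dataclass
class ToolSettings:
    """Hypothetical mirror of the sidebar checkboxes; not the real UI state."""
    gwdg_tools: bool = False
    web_search: bool = False


def available_tools(settings: ToolSettings, externally_hosted: bool = False) -> List[str]:
    """Return the tool names the agent could discover under these settings."""
    if not settings.gwdg_tools:
        # Nothing is discoverable until tools are enabled.
        return []
    tools = list(CORE_TOOLS)
    # Web search needs its own opt-in and is not offered for externally hosted models.
    # Assumption: the other tools are unaffected by where the model is hosted.
    if settings.web_search and not externally_hosted:
        tools.append(WEB_SEARCH_TOOL)
    return tools


print(available_tools(ToolSettings(gwdg_tools=True, web_search=True)))
```

Keeping web search behind a separate flag mirrors the reason given above: it is the only tool described here as potentially sending data to external service providers.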

Usage Flow

Once tools are enabled, the agent follows a structured flow to interpret user input and invoke the appropriate tool:

  1. Tool Discovery
    The agent lists available tools and their capabilities.
    Example: “What tools can I use?” → Agent responds with generate_image, edit_image, speak_text.

[Screenshot: Web interface example]

  2. Tool Selection
    Based on user intent, the agent selects the relevant tool.
    Example: “Make an image of a glowing jellyfish in deep space” → Agent selects generate_image.

  3. Invocation
    The agent sends a structured input payload to the tool server.
    Example:

    {
      "prompt": "a glowing jellyfish floating in deep space",
      "size": "1024x1024"
    }

  4. Response Handling
    The agent receives the output and renders it in the UI or stores it for further use.

[Screenshot: Web interface example]
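The invocation and response-handling steps above can be sketched in Python. Only the `prompt` and `size` fields are taken from the example payload; `build_generate_image_payload`, `call_tool`, `handle_response`, and the response shape are hypothetical stand-ins, since this document does not describe the tool server's actual interface. The size check assumes a "WIDTHxHEIGHT" string like the one in the example.

```python
import json
import re


def build_generate_image_payload(prompt: str, size: str = "1024x1024") -> dict:
    """Build the input payload shown in the Invocation step."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: size is a "WIDTHxHEIGHT" string, as in the example payload.
    if not re.fullmatch(r"\d+x\d+", size):
        raise ValueError(f"size must look like '1024x1024', got {size!r}")
    return {"prompt": prompt, "size": size}


def call_tool(name: str, payload: dict) -> dict:
    """Hypothetical stand-in for the tool server; returns a placeholder result."""
    body = json.dumps(payload)  # serialize the structured payload as JSON
    return {"tool": name, "status": "ok", "echo": json.loads(body)}


def handle_response(response: dict) -> str:
    """Turn a tool response into a line the UI could render."""
    if response.get("status") != "ok":
        return f"{response.get('tool')} failed"
    return f"{response['tool']} finished for prompt: {response['echo']['prompt']}"


payload = build_generate_image_payload("a glowing jellyfish floating in deep space")
message = handle_response(call_tool("generate_image", payload))
print(message)
```

Validating the payload before sending it is a design choice of this sketch: malformed input is caught on the agent side instead of surfacing as a tool error.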

More examples:

  • “Change this image to Van Gogh style” → edit_image → applies style transfer

[Screenshot: Web interface example]

  • “Make audio from this text: ‘Welcome to GWDG. Your research matters.’” → speak_text → plays audio

[Screenshot: Web interface example]
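The request-to-tool mappings in the examples can be mimicked with a toy router. In the real system the language model makes this choice from user intent; the keyword matching below is a deliberately simple substitute, and `TOOL_HINTS` and `select_tool` are hypothetical names that are not part of the tool server.

```python
from typing import Optional

# Keyword hints per tool; purely illustrative, not how the model decides.
TOOL_HINTS = {
    "edit_image": ("change this image", "edit this image"),
    "generate_image": ("make an image", "generate an image"),
    "speak_text": ("make audio", "read aloud"),
}


def select_tool(request: str) -> Optional[str]:
    """Return the first tool whose hint appears in the request, else None."""
    text = request.lower()
    for tool, hints in TOOL_HINTS.items():
        if any(hint in text for hint in hints):
            return tool
    return None


for request in (
    "Make an image of a glowing jellyfish in deep space",
    "Change this image to Van Gogh style",
    "Make audio from this text: 'Welcome to GWDG. Your research matters.'",
):
    print(request, "->", select_tool(request))
```

A request that matches no hint returns `None`, which corresponds to the agent answering without a tool call.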