Available Models

Chat AI provides a large assortment of state-of-the-art open-weight Large Language Models (LLMs) which are hosted on our platform with the highest standards of data protection. The data sent to these models, including the prompts and message contents, are never stored at any location on our systems. Additionally, Chat AI offers models hosted externally such as Anthropic Claude and OpenAI GPT-5.

Available models are regularly upgraded as newer, more capable ones are released. We select models to include in our services based on user demand, cost, and performance across various benchmarks, such as HumanEval, MATH, HellaSwag, MMLU, etc. Certain models are more capable at specific tasks and with specific settings, which are described below to the best of our knowledge.

List of open-weight models, hosted by GWDG

Organization	Model	Open	Release date	Context window in tokens	Advantages	Limitations	Recommended settings
🇨🇭 Swiss AI	Apertus 70B Instruct 2509	yes	Sep 2025	65k	Fully open-source, Multilingual	-	temp=0.8 top_p=0.9
🇨🇳 DeepSeek	DeepSeek R1 Distill Llama 70B	yes	Jan 2025	32k	Good overall performance, faster than R1	Censorship, political bias	default temp=0.7, top_p=0.8
🇫🇷 Mistral	Devstral 2 123B Instruct 2512	yes	Dec 2025	256K	Coding, agentic tasks	-	default
🇺🇸 Google	Gemma 3 27B Instruct	yes	Mar 2025	128k	Vision, good overall performance	-	default
🇺🇸 Google	Gemma 4 31B Instruct	yes	Apr 2026	256k	Vision, great overall performance	-	default
🇸🇬 Z.ai	GLM-4.7	yes	Dec 2025	200k	Great performance	-	temp=1.0 top_p=0.95
🇨🇳 OpenGVLab	InternVL 3.5 30B A3B	yes	Aug 2025	40k	Vision, lightweight and fast	-	default
🇺🇸 Google	MedGemma 27B Instruct	yes	Jul 2025	128k	Vision, medical knowledge	-	default
🇺🇸 Meta	Llama 3.1 8B Instruct	yes	Jul 2024	128k	Fast overall performance	-	default
🇺🇸 Meta	Llama 3.3 70B Instruct	yes	Jul 2024	128k	Good overall performance, reasoning and creative writing	-	default temp=0.7, top_p=0.8
🇫🇷 Mistral	Mistral Large Instruct 3 675B Instruct 2512	yes	Dec 2025	256K	Great overall performance, vision	-	temp < 0.1 higher temp for creative use cases
🇺🇸 OpenAI	GPT OSS 120B	yes	Aug 2025	128k	Great overall performance, fast	-	default
🇨🇳 Alibaba Cloud	Qwen 3 30B A3B Instruct 2507	yes	Jul 2025	256k	Good performance, fast	-	temp=0.6, top_p=0.95
🇨🇳 Alibaba Cloud	Qwen 3 Coder 30B A3B Instruct	yes	Jul 2025	256k	Coding	-	temp=0.7, top_p=0.8
🇨🇳 Alibaba Cloud	Qwen 3 Omni 30B A3B Instruct	yes	Sep 2025	256k	Multimodal	-	default
🇨🇳 Alibaba Cloud	Qwen 3.5 122B A10B	yes	Feb 2026	256K	Vision, great overall performance	-	temp=0.6, top_p=0.95
🇨🇳 Alibaba Cloud	Qwen 3.5 27B	yes	Feb 2026	256K	Vision, good overall performance	-	temp=0.6, top_p=0.95
🇨🇳 Alibaba Cloud	Qwen 3.5 35B A3B	yes	Feb 2026	256K	Vision, good overall performance	-	temp=0.6, top_p=0.95
🇨🇳 Alibaba Cloud	Qwen 3.5 397B A17B	yes	Feb 2026	256K	Vision, great overall performance	-	temp=0.6, top_p=0.95
🇨🇳 Alibaba Cloud	Qwen 3.6 35B A3B	yes	Apr 2026	262K	Vision, great overall performance	-	temp=1.0, top_p=0.95
🇩🇪 OpenGPT-X	Teuken 7B Instruct Research	yes	Nov 2024	128k	European languages	-	default
🇺🇸 intfloat x Mistral	E5 Mistral 7B Instruct	yes	Jan 2024	4096	Embeddings	API Only	-

List of external models, hosted by external providers

Organization	Model	Open	Release date	Context window in tokens	Advantages	Limitations	Recommended settings
🇺🇸 Anthropic	Claude Sonnet 4.6	no	Feb 2026	1M	Balanced performance, fast responses	-	default
🇺🇸 OpenAI	GPT-5.5	no	Apr 2026	1M	General tasks, reasoning	-	default
🇺🇸 OpenAI	GPT-5.4	no	Mar 2026	272K	Professional knowledge work, coding, data analysis, agentic workflows	-	default
🇺🇸 OpenAI	GPT-5.4 Mini	no	Mar 2026	272K	Fast overall performance	-	default
🇺🇸 OpenAI	GPT-5.4 Nano	no	Mar 2026	272K	Fastest overall performance	-	default
🇺🇸 OpenAI	GPT-5.3 Chat	no	Mar 2026	128K	Multimodal chat, adaptive chain-of-thought, safety guardrails	-	default
🇺🇸 OpenAI	GPT-5.2 Chat	no	Dec 2025	400k	Great overall performance	-	default
🇺🇸 OpenAI	GPT-5.2	no	Dec 2025	400k	Great overall performance	-	default
🇺🇸 OpenAI	GPT-5.1 Chat	no	Nov 2025	400k	Great overall performance	-	default
🇺🇸 OpenAI	GPT-5.1	no	Nov 2025	400k	Great overall performance	-	default
🇺🇸 OpenAI	GPT-5 Chat	no	Aug 2025	400k	Good overall performance	-	default
🇺🇸 OpenAI	GPT-5	no	Aug 2025	400k	Good overall performance, reasoning	-	default
🇺🇸 OpenAI	GPT-5 Mini	no	Aug 2025	400k	Fast overall performance	-	default
🇺🇸 OpenAI	GPT-5 Nano	no	Aug 2025	400k	Fastest overall performance	-	default
🇺🇸 OpenAI	o3	no	Apr 2025	200k	-	outdated	default
🇺🇸 OpenAI	o3-mini	no	Jan 2025	200k	-	outdated	default
🇺🇸 OpenAI	GPT-4.1	no	Apr 2025	1M	-	outdated	default
🇺🇸 OpenAI	GPT-4.1 Mini	no	Jun 2024	1M	-	outdated	default

Open-weight models, hosted by GWDG

The models listed in this section are hosted on our platform with the highest standards of data protection. The data sent to these models, including the prompts and message contents, are never stored at any location on our systems.

Apertus 70B Instruct

Apertus is a fully open language model designed to push the boundaries of transparent and compliant AI. It supports over 1,800 languages and a context window size of up to 65,536 tokens, using only fully compliant and open training data. The model achieves comparable performance to closed-source models while respecting opt-out consent of data owners. It was pretrained on 15T tokens with a staged curriculum of web, code, and math data.

DeepSeek R1 Distill Llama 70B

Developed by the Chinese company DeepSeek (深度求索), DeepSeek R1 Distill Llama 70B is a dense model distilled from DeepSeek-R1 but based on LLama 3.3 70B, in order to fit the capabilities and performance of R1 into a 70B parameter-size model.

Warning

DeepSeek models have been reported to produce politically biased responses, and censor certain topics that are sensitive for the Chinese government.

Devstral 2 123B Instruct 2512

Developed by mistralai, Devstral 2 is an agentic LLM designed for software engineering and coding tasks. It is capable of exploring codebases, working with multiple files, and powering software engineering agents.

Google Gemma 3 27B Instruct

Gemma is Google’s family of light, open-weights models developed with the same research used in the development of its commercial Gemini model series. Gemma 3 27B Instruct is quite fast and thanks to its support for vision (image input), it is a great choice for all sorts of conversations.

Google Gemma 4 31B Instruct

Gemma 4 models offer frontier-level performance, well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

GLM-4.7

GLM-4.7 is a coding-focused model that delivers significant improvements over its predecessor in multilingual agentic coding and terminal-based tasks. It achieves strong performance on SWE-bench, SWE-bench Multilingual, and Terminal Bench 2.0. GLM-4.7 also excels at tool use, web browsing, and mathematical reasoning, with notable gains on benchmarks like HLE and τ²-Bench.

InternVL 3.5 30B-A3B

InternVL 3.5 30B-A3B is a lightweight, fast and powerful multimodal model developed by OpenGVLab. It significantly advances versatility, reasoning capability, and efficiency, by featuring a Visual Resolution Router (ViR) for dynamic visual token adjustment and Decoupled Vision-Language Deployment (DvD) for efficient GPU load balancing, achieving up to 4× inference speedup compared to its predecessor. The model excels at multimodal reasoning, OCR, document understanding, multi-image comprehension, video understanding, GUI tasks, and embodied agency.

Google MedGemma 27B Instruct

MedGemma 27B Instruct is a variant of Gemma 3 suitable for medical text and image comprehension. It has been trained on a variety of medical image data, including chest X-rays, dermatology images, ophthalmology images, and histopathology slides, as well as medical text, such as medical question-answer pairs, and FHIR-based electronic health record data. MedGemma variants have been evaluated on a range of clinically relevant benchmarks to illustrate their baseline performance.

Meta Llama 3.1 8B Instruct

The standard model we recommend. It is the most lightweight with the fastest performance and good results across all benchmarks. It is sufficient for general conversations and assistance.

Meta Llama 3.3 70B Instruct

Achieves good overall performance, on par with GPT-4, but with a much larger context window and more recent knowledge cutoff. Best in English comprehension and further linguistic reasoning, such as translations, understanding dialects, slang, colloquialism and creative writing.

Mistral Large 3 Instruct 675B Instruct 2512

Developed by Mistral AI, Mistral Large 3 is a general-purpose multimodal MoE model with 675B total and 41B active parameters. This model is fine-tuned for instruction tasks, ideal for chat, agentic and instruction based use cases. It supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic, and has a large 256K context window size.

OpenAI GPT OSS 120B

In August 2025, OpenAI released the gpt-oss model series, consisting of two open-weight LLMs that are optimized for faster inference with state-of-the-art performance across many domains, including reasoning and tool use. According to OpenAI, the gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks.

Qwen 3 30B A3B Instruct 2507

This MoE model features 30.5B total parameters with 3.3B activated parameters for efficient inference. It delivers significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, with better alignment for subjective and open-ended tasks. The model supports a 256K native context length and operates in non-thinking mode, achieving strong performance across knowledge, reasoning, coding, and multilingual benchmarks.

Qwen 3 Coder 30B A3B Instruct

Qwen 3 Coder 30B A3B Instruct is a specialized coding model that achieves strong performance on agentic coding, browser-use, and other foundational coding tasks among open models.

Qwen 3 Omni 30B A3B Instruct

Qwen3 Omni is a natively multilingual omni-modal foundation model that processes text, images, audio, and video. It achieves state-of-the-art performance on many audio/video benchmarks with ASR, audio understanding, and voice conversation performance comparable to Gemini 2.5 Pro. The model features a novel MoE-based Thinker–Talker architecture with AuT pretraining, supports 119 text languages, 19 speech input languages, and enables low-latency interaction with flexible control via system prompts.

Qwen 3.5 122B A10B

Qwen 3.5 122B A10B is a powerful language model developed by Alibaba Cloud. With 122 billion parameters it delivers strong performance across reasoning, coding, and general tasks. The model supports vision capabilities for multimodal applications.

Qwen 3.5 27B

Qwen 3.5 27B is a large language model developed by Alibaba Cloud. With 27 billion parameters it provides good overall performance across various tasks while being more memory-efficient. The model supports vision capabilities for understanding and processing images alongside text.

Qwen 3.5 35B A3B

Qwen 3.5 35B A3B is an MoE model with 35 billion total parameters and 3 billion activated parameters for efficient inference. It delivers good overall performance across reasoning, coding, and general tasks. The model supports vision capabilities for multimodal applications and operates efficiently.

Qwen 3.5 397B A17B

Qwen 3.5 397B A17B is a MoE model with 397 billion total parameters and 17 billion activated parameters. It represents one of the most powerful open-weight models available, delivering exceptional performance across reasoning, coding, mathematics, and general tasks. The model supports vision capabilities, and provides state-of-the-art performance among open models.

Qwen 3.6 35B A3B

Qwen 3.6 35B A3B is an MoE model with 35 billion total parameters and 3 billion activated parameters for efficient inference. Built on direct feedback from the community, Qwen 3.6 prioritizes stability, real-world utility, including for coding and agentic tasks.

OpenGPT-X Teuken 7B Instruct Research

OpenGPT-X is a research project funded by the German Federal Ministry of Economics and Climate Protection (BMWK) and led by Fraunhofer, Forschungszentrum Jülich, TU Dresden, and DFKI. Teuken 7B Instruct Research v0.4 is an instruction-tuned 7B parameter multilingual LLM pre-trained with 4T tokens, focusing on covering all 24 EU languages and reflecting European values.

External models, hosted by external providers

Warning

These OpenAI and Anthropic models are hosted on external providers, and Chat AI only relays the contents of your messages to their servers. We therefore recommend the open-weight models, hosted by us, to ensure the highest security and data privacy.

Anthropic Claude Sonnet 4.6

Claude Sonnet 4.6 provides an excellent balance between capability and speed. It is optimized for production workloads and delivers fast, reliable responses for coding, analysis, and conversational tasks.

OpenAI GPT-5, 5.1, 5.2, 5.3, 5.4, and 5.5 Series

OpenAI’s GPT-5 series models achieve state-of-the-art performance across various benchmarks. The series consists of the following models along with their intended use cases:

OpenAI GPT-5.5: Built for the most complex professional work, including coding use cases, tool-heavy agents, grounded assistants, and customer-facing workflows where execution quality and response polish are important.
OpenAI GPT-5.4: Built for professional knowledge work, including document and spreadsheet tasks, coding, data analysis, agentic workflows, and software automation.
OpenAI GPT-5.4 Mini: A lightweight variant of GPT-5.4 for cost-sensitive applications.
OpenAI GPT-5.4 Nano: A highly optimized variant of GPT-5.4. Ideal for applications requiring low latency.
OpenAI GPT-5.3 Chat: Fast, context-aware chat experiences with adaptive chain-of-thought reasoning, tuned safety and instruction-following, multimodal chat, and agent development support. Ideal for interactive assistants, customer care, IT helpdesk, HR, and sales enablement.
OpenAI GPT-5/5.1/5.2 Chat: Designed for advanced, natural, multimodal, and context-aware conversations.
OpenAI GPT-5/5.1/5.2: Designed for logic-heavy and multi-step tasks.
OpenAI GPT-5 Mini: A lightweight variant of GPT-5 for cost-sensitive applications.
OpenAI GPT-5 Nano: A highly optimized variant of GPT-5.

OpenAI GPT-4.1

OpenAI’s GPT-4.1-class models improve on the older GPT-4 series. These models also outperform GPT-4o and GPT-4o Mini, especially in coding and instruction following. They have a large context window size of 1M tokens, with improved long-context comprehension, and an updated knowledge cutoff of June 2024.

OpenAI GPT-4.1 Mini

This was developed as a more cost-effective and faster alternative to GPT-4.1.

OpenAI o1 and o1 Mini

OpenAI’s o1-class models were developed to perform complex reasoning tasks. These models have now been superceded by the o3-series, and are therefore no longer recommended.

OpenAI o3

Released in April 2025, OpenAI’s o3-class models were developed to perform complex reasoning tasks across the domains of coding, math, science, visual perception, and more. These models have an iterative thought process, and therefore take their time to process internally before responding to the user. The thought process for o3 models are not shown to the user.

OpenAI o3 Mini

This was developed as a more cost-effective and faster alternative to o3.