Available Models

Current Models

Chat AI currently hosts a broad selection of high-quality, open-source models. With the exception of external models such as ChatGPT, all models are self-hosted under the highest standards of data protection: they run entirely on our hardware and do not store any user data.

Models that require more GPUs may take longer to allocate resources and to start generating responses. Models are updated on demand as newer versions are released. All models were chosen for their strong performance on general or task-specific LLM benchmarks such as HumanEval, MATH, HellaSwag, and MMLU. According to these benchmarks, certain models are better suited to certain tasks, and a model performs best at a given task when its completion options (e.g. temperature and top_p) are configured accordingly.
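As a sketch of what "configuring completion options" means in practice, the snippet below assembles an OpenAI-style chat-completion request body using the temperature and top_p values suggested in the table. The model identifier and task profile names are illustrative placeholders; the exact identifiers and endpoint exposed by Chat AI may differ.

```python
# Sketch: building an OpenAI-style chat-completion payload with
# task-appropriate sampling options. Values follow the table below;
# model names and task labels are illustrative placeholders.

# Suggested sampling configurations per kind of task.
TASK_CONFIGS = {
    "general": {"temperature": 0.5, "top_p": 0.5},      # fast general usage
    "creative": {"temperature": 0.7, "top_p": 0.8},     # creative writing
    "precise": {"temperature": 0.2, "top_p": 0.1},      # code, math, logic
    "exploratory": {"temperature": 0.6, "top_p": 0.7},  # exploratory programming
}

def build_chat_request(model: str, prompt: str, task: str = "general") -> dict:
    """Return a chat-completion request body for the given task profile."""
    options = TASK_CONFIGS[task]
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **options,
    }

request = build_chat_request(
    model="llama-3.1-8b-instruct",  # placeholder identifier
    prompt="Summarize the attached report.",
)
```

Lower temperature and top_p make sampling more deterministic, which suits code and mathematics; higher values broaden the output distribution, which suits creative writing.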

| Model name | Developer | Open source | Knowledge cutoff | Context window | Advantages | Possible config. for advantages |
|---|---|---|---|---|---|---|
| Llama 3.1 8B Instruct | Meta | yes | December 2023 | 128k tokens | Fastest general usage | default (Temp: 0.5; top_p: 0.5) |
| Llama 3.1 70B Instruct | Meta | yes | December 2023 | 128k tokens | Great overall performance<br>Multilingual reasoning and creative writing | default<br>Temp: 0.7; top_p: 0.8 |
| Llama 3.1 SauerkrautLM 70B Instruct | VAGOsolutions x Meta | yes | December 2023 | 128k tokens | German overall usage | default |
| Mistral Large Instruct | Mistral AI | yes | July 2024 | 128k tokens | Great overall performance<br>Coding and multilingual reasoning | default |
| Codestral 22B Instruct | Mistral AI | yes | Late 2021 | 33k tokens | Writing, editing, fixing and commenting code<br>Exploratory programming | Temp: 0.2; top_p: 0.1<br>Temp: 0.6; top_p: 0.7 |
| Qwen 2.5 72B Instruct | Alibaba Cloud | yes | September 2024 | 128k tokens | Global affairs, Chinese overall usage<br>Mathematics, logic | default<br>Temp: 0.2; top_p: 0.1 |
| ChatGPT 3.5 | OpenAI | no | September 2021 | 16k tokens | Large compute | default |
| ChatGPT 4 | OpenAI | no | September 2021 | 8k tokens | Large compute | default |

Meta

Llama 3.1 8B Instruct

The standard model we recommend. It is the most lightweight model, with the fastest responses and good results across all benchmarks, and is sufficient for general conversations and assistance.

Llama 3.1 70B Instruct

Trained the same way as the above model, but with many more parameters. It achieves the best overall performance of our models, on par with ChatGPT 4, but with a much larger context window and a more recent knowledge cutoff. It is the best model for English comprehension and other linguistic tasks, such as translation, understanding dialects, slang, and colloquialisms, and creative writing.

Llama 3.1 SauerkrautLM 70B Instruct

SauerkrautLM is fine-tuned by VAGOsolutions on top of the above model, specifically for prompts in German.

Mistral AI

Mistral Large Instruct

Mistral Large Instruct 2407 is a dense language model with 123B parameters. It achieves state-of-the-art benchmark scores in general performance, code and reasoning, and instruction following. It is also multilingual and supports many European and Asian languages.

Codestral 22B

Codestral 22B is developed specifically for code completion. It was trained on more than 80 programming languages, including Python, SQL, Bash, C++, Java, and PHP. Its 33k-token context window lets it work with large amounts of code, and it fits on a single GPU of our cluster. We recommend Codestral for all coding purposes.
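For illustration, the snippet below sketches a fill-in-the-middle (FIM) request of the kind code-completion models such as Codestral support. The payload shape follows Mistral's published FIM completion API (`prompt` holds the code before the cursor, `suffix` the code after it); the model identifier and the exact endpoint on Chat AI are assumptions.

```python
# Sketch: a fill-in-the-middle request body for a code-completion model.
# Field names follow Mistral's FIM API; the model identifier is a
# placeholder and may differ on the Chat AI platform.
def build_fim_request(prefix: str, suffix: str,
                      model: str = "codestral-22b",
                      temperature: float = 0.2,
                      top_p: float = 0.1) -> dict:
    """Return a FIM request; low temperature/top_p favour precise code edits."""
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "temperature": temperature,
        "top_p": top_p,
    }

request = build_fim_request(
    prefix="def fibonacci(n):\n    ",
    suffix="\n    return a",
)
```

The model is asked to generate only the code between `prompt` and `suffix`, which is how editor integrations implement inline completion.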

Alibaba Cloud

Qwen 2.5 72B Instruct

Another large model, with benchmark scores slightly above Mistral Large Instruct. However, Qwen is trained on the most recent data and is thus better suited for questions about current global affairs. It is the best model for prompts in Chinese, and it also performs exceptionally well in mathematics and logic.

OpenAI

ChatGPT 3.5 and 4

These models are likely the most familiar to users. The external OpenAI models are hosted by Microsoft and come with some limitations: while Microsoft is contractually bound not to use user data for training or marketing purposes and adheres to the GDPR, it does store user inquiries for up to 30 days.

Despite these limitations, the external OpenAI models have their benefits. ChatGPT 3.5 has benchmark results similar to Llama 3.1 8B Instruct, and ChatGPT 4 has results similar to our larger models. A large amount of compute is dedicated to them, making them potentially more accessible during periods of high demand, though they may respond with a small delay due to the external connection. We strongly recommend the self-hosted, open-source models for maximum security and data privacy.