SAIA

SAIA is our Scalable Artificial Intelligence (AI) Accelerator that hosts our AI services. Such services include Chat AI and CoCo AI, with more to be added soon. SAIA API (application programming interface) keys can be requested and used to access the services from within your code. API keys are not necessary to use the Chat AI web interface.

SAIA Workflow (diagram)

API Request

If a user has an API key, they can use the available models from within their terminal or Python scripts. To request an API key, go to the KISSKI LLM Service page and click on “Book”. There you will find a form to fill out with your credentials and your intended use of the API key. Please use the same email address that is assigned to your AcademicCloud account. Once you receive your API key, DO NOT share it with other users!

API Booking

API Usage

The API service is compatible with the OpenAI API standard. We provide the following endpoints:

  • /chat/completions
  • /completions
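
The full request URL is the base URL https://chat-ai.academiccloud.de/v1 followed by the endpoint, i.e.:

https://chat-ai.academiccloud.de/v1/chat/completions
https://chat-ai.academiccloud.de/v1/completions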

API Use Cases

You can use your API key to access Chat AI directly from your terminal. Here is an example of how to do text completion with the API.

curl -i -X POST \
  --url https://chat-ai.academiccloud.de/v1/completions \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "meta-llama-3.1-8b-instruct",
  "prompt": "San Francisco is a",
  "max_tokens": 7,
  "temperature": 0
}'
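
A successful request returns an OpenAI-style JSON body. The exact fields can vary between server versions, but the shape is roughly as follows (the values here are illustrative, not real output):

{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1700000000,
  "model": "meta-llama-3.1-8b-instruct",
  "choices": [
    {"index": 0, "text": " city in northern California", "finish_reason": "length"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}
}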

Take care to replace <api_key> with your own API key. The available model names are:

  • “meta-llama-3.1-8b-instruct”
  • “meta-llama-3.1-70b-instruct”
  • “llama-3.1-sauerkrautlm-70b-instruct”
  • “codestral-22b”
  • “qwen-2-72b-instruct”

The OpenAI GPT 3.5 and GPT 4 models are not available for API usage. For configuring your own requests in greater detail, such as setting frequency_penalty, seed, max_tokens, and more, refer to the OpenAI API reference page.
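
As a sketch, a completion request that sets several of these parameters could look as follows (the parameter values are arbitrary examples, and whether each parameter is honoured depends on the model server):

curl -X POST \
  --url https://chat-ai.academiccloud.de/v1/completions \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "meta-llama-3.1-8b-instruct",
  "prompt": "San Francisco is a",
  "max_tokens": 32,
  "temperature": 0.7,
  "frequency_penalty": 0.5,
  "seed": 42
}'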

It is possible to include an entire conversation in your request. The conversation can come from a previous session with the same or a different model, or even from an exchange between you and a friend or colleague whom you would like to ask follow-up questions (just be sure to update the system prompt to something like “You are a friend/colleague trying to explain something you said that was confusing”).

curl -i -N -X POST \
  --url https://chat-ai.academiccloud.de/v1/chat/completions \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "meta llama-3.1-8b-instruct".replace(" ", "-"),
  "messages": [{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"How tall is the Eiffel tower?"},{"role":"assistant","content":"The Eiffel Tower stands at a height of 324 meters (1,063 feet) above ground level. However, if you include the radio antenna on top, the total height is 330 meters (1,083 feet)."},{"role":"user","content":"Are there restaurants?"}],
  "temperature": 0
}'
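
The -N flag above disables curl's output buffering. If you additionally set "stream": true in the payload (a standard OpenAI-compatible parameter), the response arrives incrementally as server-sent events instead of one final JSON object, for example:

curl -N -X POST \
  --url https://chat-ai.academiccloud.de/v1/chat/completions \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "meta-llama-3.1-8b-instruct",
  "messages": [{"role":"user","content":"How tall is the Eiffel tower?"}],
  "stream": true
}'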

For ease of use, you can access the Chat AI models from a Python script, for example by pasting the code below into a file and executing it. This requires the openai Python package (e.g. pip install openai).

from openai import OpenAI
  
# API configuration
api_key = '<api_key>' # Replace with your API key
base_url = "https://chat-ai.academiccloud.de/v1"
model = "meta-llama-3-8b-instruct" # Choose any available model
  
# Start OpenAI client
client = OpenAI(
    api_key=api_key,
    base_url=base_url
)
  
# Get response
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How tall is the Eiffel tower?"},
        {"role": "assistant", "content": "The Eiffel Tower stands at a height of 324 meters (1,063 feet) above ground level. However, if you include the radio antenna on top, the total height is 330 meters (1,083 feet)."},
        {"role": "user", "content": "Are there restaurants?"}
    ],
    model=model,
)
  
# Print full response as JSON
print(chat_completion) # You can extract the response text from the JSON object
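
To print only the generated text instead of the full object, you can read the first choice of the response (a standard attribute of the OpenAI Python client):

print(chat_completion.choices[0].message.content)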

In certain cases, a long response is expected from the model, which may take a while with the above method, since the entire response is generated first and only then printed to the screen. Streaming can be used instead to receive the response incrementally as it is being generated.

from openai import OpenAI
 
# API configuration
api_key = '<api_key>' # Replace with your API key
base_url = "https://chat-ai.academiccloud.de/v1"
model = "meta-llama-3-8b-instruct" # Choose any available model
 
# Start OpenAI client
client = OpenAI(
    api_key=api_key,
    base_url=base_url
)
 
# Get stream
stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Name the capital city of each country on earth, and describe its main attraction",
        }
    ],
    model=model,
    stream=True
)
 
# Print out the response
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
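
If you also need the complete response text after streaming, you can replace the print loop above with a variant that collects the chunks as well (a minimal sketch):

# Collect the streamed chunks while printing them
collected = []
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="")
    collected.append(delta)
full_response = "".join(collected)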

If you use Visual Studio Code or a JetBrains IDE, the recommended way to get the most out of your API key, particularly for code completion, is to install the Continue plugin and set its configuration accordingly; a sketch of such a configuration follows below. Refer to CoCo AI for further details.
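
As an illustration only, not an official configuration, a model entry in Continue's config.json could look roughly like this (the field names follow Continue's "openai" provider, but may change between plugin versions; treat the CoCo AI page and the Continue documentation as authoritative):

{
  "models": [
    {
      "title": "Chat AI Llama 3.1 8B",
      "provider": "openai",
      "model": "meta-llama-3.1-8b-instruct",
      "apiBase": "https://chat-ai.academiccloud.de/v1",
      "apiKey": "<api_key>"
    }
  ]
}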

Developer reference

The GitHub repositories SAIA-Hub, SAIA-HPC, and Chat AI provide all the components of the architecture shown in the diagram above.

Further services

If you have more questions, feel free to contact us at support@gwdg.de.