AI Transparency Statement

Goal

Typical LLMs have the advantage, but also the drawback, of having been trained on an enormous amount of general data. They therefore know a lot, but are often unable to answer very specific questions. In these cases, LLMs are prone to hallucination, i.e. they simply make things up. One way to improve the performance of LLMs on very specific questions is Retrieval-Augmented Generation (RAG). Here, users provide custom documents that form the knowledge base they later want to ask questions about. Before the LLM responds to a query, the most relevant passages from these documents are retrieved and passed to the LLM as additional context.

In our approach, we provide a reference section at the bottom of the answer where you can find the actual parts of the documents that the RAG pipeline associated with your question. As these references are quoted directly from the documents provided by the user, there is no possibility of hallucination in the references themselves.

General Functionality

This section briefly outlines the individual steps required to process a RAG request.

Ingesting Documents

In order to use RAG, a user must first provide a knowledge base, i.e. a set of documents. These documents are uploaded, converted to a markup format and then indexed in a database. During indexing they are split into chunks, and each chunk is transformed into a vector by a dedicated embedding model. These vectors are then stored in a vector database. Every document provided by a user is processed in this way to build the knowledge base. This knowledge base, also known as an “Arcana”, is stored at the GWDG until the user explicitly deletes it!
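The following Python sketch illustrates this ingestion step under stated assumptions: the embed function, the fixed chunk size, the embedding dimension and the in-memory list standing in for the actual vector database are all placeholders, not the real GWDG implementation or model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for the embedding model used by the service; the real
    model is not specified here, so this is purely illustrative."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)      # assumed embedding dimension
    return vec / np.linalg.norm(vec)    # normalise for cosine similarity

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; the production chunking strategy may differ."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# In-memory stand-in for the vector database holding the "Arcana" knowledge base.
vector_store: list[tuple[np.ndarray, str]] = []

def ingest(document_text: str) -> None:
    """Chunk a document, embed each chunk, and store (vector, chunk) pairs."""
    for piece in chunk(document_text):
        vector_store.append((embed(piece), piece))
```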

Submitting Requests

Once the knowledge base has been built, a user can submit queries. Before a query is sent to the LLM, it is also transformed into a vector representation using the same embedding model that was used to build the knowledge base in the previous step. This vector is then used in a similarity search on the vector database to find passages in the ingested documents with a similar meaning. A configurable number of the most similar chunks is then returned and passed to the LLM together with the user’s original request.
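Continuing the sketch from the ingestion step, the retrieval could look roughly as follows; top_k stands for the configurable number of similar chunks and is an assumed parameter name, not the actual configuration option.

```python
def retrieve(query: str, top_k: int = 3) -> list[str]:
    """Embed the query with the same model used for ingestion and return the
    top_k most similar chunks (the dot product equals cosine similarity here
    because all vectors are normalised)."""
    q = embed(query)
    scored = sorted(vector_store, key=lambda item: float(q @ item[0]), reverse=True)
    return [text for _, text in scored[:top_k]]
```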

Generating an Answer

The LLM uses this additional information to give a more specific response to the user’s request. Such a response is already much less susceptible to hallucinations, but they remain possible. Our approach therefore provides explicit references to the documents containing the chunks the LLM used to formulate the answer. These references appear in a dedicated reference box at the bottom of the answer and contain the actual citations from the original documents provided by the user. Therefore, no hallucination is possible in these references.
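A minimal sketch of this final step is shown below; the prompt wording and the call_llm stub are assumptions for illustration only, and the actual model call runs on GWDG hardware as described further down.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the actual model call; purely illustrative."""
    return "<answer generated from the supplied context>"

def answer_with_references(query: str) -> str:
    """Build a prompt from the retrieved chunks, ask the LLM, and append the
    chunks verbatim as a reference section below the answer."""
    chunks = retrieve(query)
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    answer = call_llm(prompt)
    references = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"{answer}\n\nReferences:\n{references}"
```

Because the reference section is assembled directly from the stored chunks rather than generated by the model, its contents are quoted verbatim from the user’s documents.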

Further Considerations

Access to the Stored Documents

The ingested knowledge base can be freely shared with other users. To do this, the Arcana ID and key must be provided. IMPORTANT: This gives those users full access to your documents, and this access cannot (yet) be revoked for individual users!

Storage of your Documents

Your indexed data will remain on GWDG systems at all times. We will not share your documents with third parties.

Processing of your Requests

The processing of your requests, including generating the embeddings, retrieving the relevant documents and performing the inference, is all done on GWDG hardware. Neither your requests nor your indexed documents are shared with third parties.