RAG-BASED DOCUMENT QUERY SYSTEM WITH AMAZON BEDROCK
I developed a Retrieval-Augmented Generation (RAG) system using Amazon Bedrock. I stored PDF documents on Amazon S3 as the data source and created a FAISS-based vector database using Amazon’s Quick Create option. I utilized the Amazon Titan Text Embeddings V2 model to convert documents into vectors. By applying Fixed-size chunking (Max_token=500, Overlap=10), I segmented the data and stored it in the vector database.
To explore Bedrock, I used the Mistral 7B Instruct and Llama 3.2 3B Instruct models to test and compare different LLMs. For each invocation, Bedrock displays the number of input and output tokens consumed and the latency at the top of the model's response panel, which makes comparing the models straightforward.
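This kind of comparison can also be scripted. Below is a minimal sketch using boto3's Converse API, whose response includes token usage and latency metrics; the region and the exact model IDs are assumptions and should be replaced with the ones enabled in your own account.

# Minimal sketch: comparing two Bedrock models via the Converse API.
# The region and model IDs below are assumptions; adjust for your account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Summarize Retrieval-Augmented Generation in two sentences."

for model_id in ["mistral.mistral-7b-instruct-v0:2",
                 "meta.llama3-2-3b-instruct-v1:0"]:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.5},
    )
    usage = response["usage"]                    # input/output token counts
    latency_ms = response["metrics"]["latencyMs"]
    print(model_id, usage["inputTokens"], usage["outputTokens"], latency_ms)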

After this comparison, I started to apply RAG. RAG (Retrieval-Augmented Generation) is a technique that allows large language models (LLMs) to produce more accurate and up-to-date answers by retrieving information from external data sources. In short, it grounds the model's answers in real data, which reduces hallucinations.
From the Knowledge Bases menu, we click on Create a knowledge base. We enter its name and description, select the S3 bucket to which we uploaded our document, and proceed.

We use fixed-size chunking to split the document. This splits the uploaded PDF file into chunks that are then stored in the vector database. I choose max tokens = 500 and overlap = 10. The overlap provides context and continuity between the chunks of the document, as sketched below.
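To show conceptually what fixed-size chunking with overlap does, here is a small, hypothetical Python sketch. It is not Bedrock's internal implementation; it only illustrates how each chunk shares a few tokens with the previous one so context carries across chunk boundaries.

# Hypothetical sketch of fixed-size chunking with overlap (not Bedrock's code).
def fixed_size_chunks(tokens, max_tokens=500, overlap=10):
    chunks = []
    step = max_tokens - overlap              # advance while keeping `overlap` tokens
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks

# Example: 1200 pseudo-tokens -> chunks of up to 500 tokens, each sharing
# 10 tokens with the previous chunk to preserve continuity.
tokens = [f"tok{i}" for i in range(1200)]
for chunk in fixed_size_chunks(tokens):
    print(len(chunk), chunk[:2], chunk[-2:])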


The next step is to choose a vector database to store these embeddings. A vector store holds the numerical vectors of our data and allows us to perform similarity searches over it. Here, the "Quick create a new vector store" option is the fastest path offered by Amazon, so I use it.
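To give a feel for what a similarity search over stored vectors looks like, here is a small, self-contained sketch using the open-source FAISS library on random vectors. The dimension and data are made up for illustration; this is not the managed store that Bedrock provisions for you.

# Illustrative FAISS similarity search on random vectors (not the managed
# Bedrock vector store; the dimension and data here are made up).
import numpy as np
import faiss

dim = 1024                                     # e.g. a Titan embedding dimension
document_vectors = np.random.rand(100, dim).astype("float32")

index = faiss.IndexFlatL2(dim)                 # exact L2 (Euclidean) search
index.add(document_vectors)                    # store the document embeddings

query_vector = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query_vector, 3) # top-3 most similar chunks
print(ids, distances)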


Alternatively, with the "Choose a vector store you have created" option, you can use a different vector store, for example Pinecone, Redis, or MongoDB Atlas.

I uploaded a PDF file to my S3 bucket.
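For completeness, the upload can be done from the console or with a couple of lines of boto3; the file and bucket names below are placeholders.

# Minimal sketch of uploading the PDF to S3; file and bucket names are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("my-document.pdf", "my-rag-source-bucket", "my-document.pdf")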

My RAG system was created.

Now we can test it.
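Testing can be done in the console's "Test knowledge base" panel, or programmatically. Below is a minimal sketch using the RetrieveAndGenerate API from boto3's bedrock-agent-runtime client; the knowledge base ID, region, and model ARN are placeholders for the values created above.

# Minimal sketch of querying the knowledge base; the knowledge base ID,
# region, and model ARN are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is the main topic of the uploaded document?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/mistral.mistral-7b-instruct-v0:2",
        },
    },
)
print(response["output"]["text"])   # generated answer grounded in the PDF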