LLM-based question-answering chatbot using CPU only

Design an LLM-based question-answering chatbot on enterprise/private data.

The application should take a question as input, generate an answer based on the knowledge provided in the document, and tailor the answer to the user's question.

The model should avoid hallucinations, i.e., answers that are not grounded in the provided documents.

Approach

This approach consists of two steps:

  1. Data Ingestion

  2. LLM Based Answer Retrieval

Data Ingestion steps (see the sketch after this list) -

  1. Load and extract the text (.txt) file

  2. Split the extracted data into smaller chunks

  3. Embed the text chunks using any embedding model, e.g., Hugging Face Instruct embeddings

  4. Build a semantic index of each chunk

  5. Ingest the index into a vector database as a knowledge base
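
A minimal data-ingestion sketch, assuming LangChain with a Hugging Face Instruct embedding model and FAISS as the vector store (the file path, chunk size, and model name below are placeholders, not prescribed by this project):

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS

# 1-2. Load the .txt file and split it into smaller chunks
docs = TextLoader("knowledge_base.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks with a Hugging Face Instruct embedding model
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")

# 4-5. Build the semantic index and persist it as the knowledge base
db = FAISS.from_documents(chunks, embeddings)
db.save_local("faiss_index")
```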

LLM-Based Answer Retrieval steps (see the sketch after this list) -

  1. Take the question from the user through a UI (Gradio)

  2. Embed the question using the same embedding model that was used for data ingestion

  3. Semantically search the knowledge base (vector database) and retrieve the relevant text chunks

  4. Enhance the prompt with both the question and the retrieved documents

  5. Call the LLM with the enhanced prompt to generate the answer

  6. Display the answer to the user in the UI (Gradio)
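
A retrieval-and-answer sketch under the same assumptions (LangChain, FAISS, a CPU-only quantized model served through ctransformers, and Gradio for the UI); the model name, prompt template, and generation settings are illustrative, not the exact ones used here:

```python
import gradio as gr
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers

# Re-load the same embedding model and the saved index (knowledge base)
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = FAISS.load_local("faiss_index", embeddings)

# CPU-only quantized open-source LLM (model choice is an assumption)
llm = CTransformers(
    model="TheBloke/Llama-2-7B-Chat-GGML",
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.1},
)

PROMPT = """Answer the question using only the context below.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def answer(question):
    # 2-3. Embed the question and retrieve the top-k relevant chunks
    hits = db.similarity_search(question, k=3)
    context = "\n\n".join(d.page_content for d in hits)
    # 4-5. Enhance the prompt with question + context, then call the LLM
    return llm(PROMPT.format(context=context, question=question))

# 1, 6. Simple Gradio UI for prompting and displaying the answer
gr.Interface(fn=answer, inputs="text", outputs="text", title="Document Q&A").launch()
```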

Assumptions

  • No OpenAI LLM API calls; only an open-source pre-trained LLM is used.

  • The model should run on CPU only (because of compute constraints); a loading sketch follows this list.
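
For reference, a quantized open-source model can be loaded for CPU-only inference with the ctransformers library; the model repo and settings below are only examples, not necessarily the ones used in this project:

```python
from ctransformers import AutoModelForCausalLM

# Load a quantized open-source chat model that runs on CPU only
# (repo id and generation settings are illustrative assumptions)
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GGML",
    model_type="llama",
    max_new_tokens=128,
)

print(llm("Q: What is retrieval-augmented generation?\nA:"))
```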

Solution

Library used -

Application Flow Diagram

Performance Evaluation

  • Using Weights & Biases to keep track of all prompt results and manually checking output quality.

  • Using BERTScore as a performance metric (a logging and scoring sketch follows this list).
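
A minimal evaluation sketch, assuming the wandb and bert-score packages; the project name and the question/answer/reference strings are made-up examples:

```python
import wandb
from bert_score import score

# Log each question/answer pair to Weights & Biases for manual inspection
wandb.init(project="doc-qa-chatbot")  # project name is an assumption
table = wandb.Table(columns=["question", "answer", "reference", "bertscore_f1"])

question = "What is the refund policy?"                                   # example
answer = "Refunds are issued within 30 days."                             # example
reference = "Customers can request a refund within 30 days of purchase."  # example

# BERTScore compares the generated answer against a reference answer
precision, recall, f1 = score([answer], [reference], lang="en")
table.add_data(question, answer, reference, f1.item())

wandb.log({"qa_results": table})
wandb.finish()
```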

Drawbacks of the current model

The performance of the current model is not great. A likely reason is that the model is quantized and designed to run on CPU only, so it has fewer learned parameters, and its context window is not large enough to pass all the relevant documents at once.

Future Scope

  1. Use a SOTA open-source model with a larger context window (e.g., Falcon)

  2. Fine-tune the model with instructions

  3. Add another LLM layer to validate the documents retrieved by semantic search.

  4. Experiment with different prompt templates

  5. Experiment with guardrails for edge cases

  6. Experiment with the document chunk size and the number k of retrieved documents used in the prompt.

Output Screenshots

User Interface for prompting -

W&B experiment log -

Check out the Git repo