This project implements a question-answering system that leverages pre-trained models and semantic similarity search to provide informative responses based on a given context.
- Context Retrieval: Utilizes Sentence Transformers to generate embeddings for both user input and stored documents (corpus).
- Semantic Similarity Search: Employs FAISS (Facebook AI Similarity Search) to efficiently identify the most relevant documents to the user's question.
- Context-Aware Response Generation: Combines the retrieved context with the user's question and feeds the resulting prompt to a local LLM served by GPT4All (here, a Llama 2 7B chat model).
- Flask: Web framework for building the API server.
- Request: Flask's request object, used to read user input submitted from the web interface.
- Jsonify: Converts responses to JSON format for transmission back to the client.
- OS: Setting environment variables (currently unused).
- Faiss: Library for fast similarity search.
- Sentence Transformers: Pre-trained model for generating text embeddings.
- GPT4all: Library for running local LLMs; here it loads a Llama 2 7B chat model (not GPT-4, despite the variable name used later).
- Threading: Enables threaded response generation (currently commented out).
- Requests: Library for making HTTP requests (currently commented out).
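Taken together, the dependency list above implies an import block roughly like the following (a reconstruction for orientation, not copied verbatim from the source):

```python
from flask import Flask, request, jsonify
import os                # currently unused; reserved for environment variables
import faiss
from sentence_transformers import SentenceTransformer
import gpt4all
import threading         # used only by the (currently disabled) threaded helper
# import requests        # outbound HTTP requests; currently commented out
```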
app = Flask(__name__) # Creates a Flask application instance.
To generate text embeddings for both user input and the corpus, we use a pre-trained Sentence Transformer model:
model = SentenceTransformer('all-MiniLM-L6-v2')
- SentenceTransformer('all-MiniLM-L6-v2'): This call loads the 'all-MiniLM-L6-v2' model, a lightweight, fast model that maps text to 384-dimensional dense vectors and is well suited to semantic textual similarity tasks.
- The model generates dense vector representations (embeddings) for text, which can be used for various tasks, including similarity search, clustering, and classification.
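As a quick illustration (a throwaway example, not part of the application code), embeddings from this model can be compared directly using the cos_sim helper shipped with the sentence-transformers package:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
emb = model.encode(["What engine does the 911 use?",
                    "The Porsche 911 is powered by a flat-six engine."])
print(emb.shape)                      # (2, 384): one 384-dim vector per sentence
print(util.cos_sim(emb[0], emb[1]))  # higher score = more semantically similar
```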
- data_dir: Path to the directory containing the corpus documents (e.g., Porsche wiki articles).
- The code iterates through the files in the data directory, reads each file's contents, and builds the corpus as a list of documents, each stored as a list of lines (see the sketch below).
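A minimal sketch of what that loading loop might look like (the directory name and the per-line document structure are assumptions based on the description above; the actual loop may differ):

```python
data_dir = "data"  # assumed path to the corpus directory of Porsche wiki articles
corpus = []
for filename in os.listdir(data_dir):
    with open(os.path.join(data_dir, filename), "r", encoding="utf-8") as f:
        corpus.append(f.read().splitlines())  # one document = a list of its lines
```

Embedding Generation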
embeddings = model.encode([" ".join(doc) for doc in corpus]) # Generates one embedding per document; each document's lines are joined first, since encode expects strings rather than lists.
d = len(embeddings[0]) # Dimensionality of the embedding vectors.
nlist = 10 # Number of clusters for an IVF index; defined but unused here, since a flat index is built below.
newindex = faiss.IndexFlatL2(d) # Creates a flat FAISS index that performs exact L2 (Euclidean) distance search.
newindex.train(embeddings) # A no-op for IndexFlatL2, which requires no training.
newindex.add(embeddings) # Adds the corpus embeddings to the index.
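One caveat worth noting: FAISS expects a contiguous float32 NumPy matrix. model.encode already returns float32 by default, so this cast is a cheap safeguard rather than a requirement (a defensive addition, not in the original code):

```python
import numpy as np

# FAISS's add() and search() both require float32 input
embeddings = np.ascontiguousarray(embeddings, dtype="float32")
```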
gptj = gpt4all.GPT4All("llama-2-7b-chat.ggmlv3.q4_0.bin") # Loads a local Llama 2 7B chat model through GPT4All (despite the variable name, this is neither GPT-J nor GPT-4).
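A quick way to sanity-check the loaded model (a throwaway example; the max_tokens keyword follows the GPT4All Python bindings, where older releases used n_predict instead):

```python
print(gptj.generate("Name one Porsche sports car.", max_tokens=32))
```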
- /: Serves the static HTML content (presumably the web interface).
- /get-response: Handles user input, retrieves the response using generate_response, and returns the generated answer in JSON format.
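Wired together, the two routes look roughly like this (route paths follow the list above; the JSON field names are assumptions):

```python
@app.route("/")
def index():
    # serve the static web interface
    return app.send_static_file("index.html")

@app.route("/get-response", methods=["POST"])
def get_response():
    user_input = request.json.get("question", "")  # assumed JSON field name
    answer = generate_response(user_input)
    return jsonify({"answer": answer})
```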
xq = model.encode([user_input]) # Generates an embedding for the user's input.
k = 1 # Specifies the number of nearest neighbors to retrieve.
D, I = newindex.search(xq, k) # Searches the FAISS index for the nearest neighbor (most similar document) to the user input.
most_similar_document = corpus[I[0][0]] # Extracts the most relevant document from the corpus based on the search results.
context = " ".join(most_similar_document) # Concatenates the retrieved document content into a single string.
question = user_input # Stores the user's input as the question.
input_text = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:" # Builds the prompt for the model, combining the retrieved context with the user's question.
max_tokens = 100 # Caps the number of tokens the model may generate (adjustable).
answer = gptj.generate(input_text, max_tokens=max_tokens) # Generates the answer with the local model, passing the token cap explicitly.
def generate_response_threaded(user_input):
    # Runs generate_response on a worker thread. Note that join() blocks until the
    # thread finishes, so this adds no concurrency, and the return value is discarded.
    response_thread = threading.Thread(target=generate_response, args=(user_input,))
    response_thread.start()
    response_thread.join()
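If genuinely non-blocking generation is the goal, a ThreadPoolExecutor that returns a Future is a more useful shape than the helper above (a sketch, not the project's code):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)  # one worker: the model is not thread-safe

def generate_response_async(user_input):
    # returns immediately; call .result() on the Future later to collect the answer
    return executor.submit(generate_response, user_input)
```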
if __name__ == "__main__":
app.run(debug=True) # Starts the Flask application in debug mode, allowing for automatic code reloading during development.