Build a system that automatically answers students’ questions posted on Piazza.
We will use GPT to answer students’ questions. To make GPT’s answers more accurate, we will send along “context” that helps GPT answer each question. This context is one or more chunks of the transcriptions of Prof. Kaiser’s lectures.
Data preparation
- Transcription
- Transcribe Prof. Kaiser’s lectures using Whisper from OpenAI. Result = a lot of text files
- Prepare a database of vectors (vectors = embeddings of text chunks)
- Split all of the text into chunks
- Embed all those chunks into vectors using SBERT (in the code, you will see HuggingFaceEmbeddings, but it is just a wrapper around SBERT)
- Save those vectors in the Chroma database (a minimal sketch of this whole pipeline follows this list)
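A minimal sketch of the pipeline above, using the classic LangChain imports that match the docs linked below. The file names, Whisper model size, chunk parameters, and SBERT model name are illustrative assumptions, not fixed project choices:

```python
import whisper
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# 1. Transcribe a lecture recording with Whisper ("base" is an assumed model size).
model = whisper.load_model("base")
transcript = model.transcribe("lecture_01.mp3")["text"]  # hypothetical file name

# 2. Split the transcript into overlapping chunks (sizes are assumptions).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.create_documents(
    [transcript],
    metadatas=[{"source": "lecture_01"}],  # "with sources" chains need this field
)

# 3. Embed the chunks with SBERT (HuggingFaceEmbeddings wraps sentence-transformers)
#    and persist the vectors in a Chroma database on disk.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")
db.persist()
```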
Real-time response
- Fetch a student’s question from Piazza
- Embed the question text into a vector (call it vector Q) using the same SBERT model
- Perform a similarity search: find the k nearest neighbors to vector Q among the vectors saved in Chroma (by default, k = 4)
- Send the question, along with the text associated with the closest vector(s) (the context), to GPT
- Post GPT’s answer back to Piazza
- Sometimes the chunk that contains the answer is not the nearest neighbor of vector Q, e.g., the answer is in the second-nearest chunk. -> We can mitigate this by sending more than one context chunk to GPT.
- It is hard to know how to split the transcriptions (what chunk size/overlap to use). -> We can split the texts with several different parameter settings, then embed and store all of the resulting chunks in the database. (Certain sentences will then appear in multiple vectors.) Both mitigations are sketched after this list.
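A sketch of both mitigations, assuming the same setup as the data-preparation sketch: the same transcript is split under several (chunk_size, chunk_overlap) settings and indexed into one Chroma collection, and at query time we fetch the k = 4 nearest chunks instead of one. The parameter values and the example question are assumptions:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

transcript = open("lecture_01.txt").read()  # transcript from the data-preparation step
embeddings = HuggingFaceEmbeddings()

# Split the same text several ways; some sentences will land in multiple
# chunks, which is intended redundancy.
split_params = [(500, 50), (1000, 100), (2000, 200)]  # assumed (size, overlap) pairs
all_docs = []
for chunk_size, overlap in split_params:
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    all_docs.extend(
        splitter.create_documents([transcript], metadatas=[{"source": "lecture_01"}])
    )

db = Chroma.from_documents(all_docs, embeddings, persist_directory="./chroma_db")

# Mitigation for the nearest-neighbor miss: retrieve the 4 nearest chunks
# instead of only the single closest one, and send all of them to GPT.
contexts = db.similarity_search("What did Prof. Kaiser say about deadlocks?", k=4)
```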
Transcription - https://github.com/openai/whisper/blob/main/whisper/transcribe.py
Question Answering - https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa
- The current code uses VectorDBQAWithSourcesChain, which is deprecated. Please change it to RetrievalQAWithSourcesChain, as sketched below.
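A sketch of the requested change, based on the classic LangChain API from the linked page. The LLM choice, model name, and k are assumptions:

```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

db = Chroma(persist_directory="./chroma_db", embedding_function=HuggingFaceEmbeddings())
llm = ChatOpenAI(model_name="gpt-3.5-turbo")  # assumed model

# Deprecated: VectorDBQAWithSourcesChain.from_chain_type(llm, chain_type="stuff", vectorstore=db)
# Replacement: wrap the vector store in a retriever; search_kwargs["k"] controls
# how many context chunks are stuffed into the prompt sent to GPT.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
)
result = chain({"question": "What did Prof. Kaiser say about deadlocks?"})
# result["answer"] holds GPT's answer; result["sources"] lists the chunks' sources.
```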
- Piazza has no official API, so we will use an unofficial one (https://github.com/hfaran/piazza-api/tree/develop) to fetch questions from Piazza and post answers back to it; a sketch follows.
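A sketch of the Piazza round-trip with the unofficial piazza-api library, reusing the `chain` built in the previous sketch. The credentials, network ID, and post-filtering logic are assumptions; the method names (user_login, network, iter_all_posts, create_instructor_answer) come from the library's README, so verify them against the installed version:

```python
from piazza_api import Piazza

p = Piazza()
p.user_login(email="bot@example.com", password="...")  # hypothetical credentials
course = p.network("abc123")  # network ID taken from the course's Piazza URL

# Fetch recent posts and answer the ones that are student questions.
for post in course.iter_all_posts(limit=20):
    if post.get("type") != "question":
        continue
    question_text = post["history"][0]["content"]  # most recent revision of the question
    answer = chain({"question": question_text})["answer"]  # QA chain from the sketch above
    # Post GPT's answer back to Piazza as an instructor answer.
    course.create_instructor_answer(post, answer, revision=0)
```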