✨ Doc-Chat ✨

Doc-Chat lets you chat with, and extract insights from, any website or document of your choice.

Possible Use Cases:

⚡ Web documentation you are curious about.
📫 A PDF book available online.
⚡ A recently released publication.

If it can be crawled, it's yours! Doc-Chat is ready to absorb any knowledge you provide and will serve as your trusty study companion!

Brain of Doc-Chat

How to use Doc-Chat

Clone the repository:
- git clone https://github.com/thatgirlfrommoon/Doc-Chat.git
Run app.py:
- streamlit run app.py

Development Setup

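Install uv for Python package management (installation guide: https://docs.astral.sh/uv/getting-started/installation/).

On macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Or, on any platform:

pip install uv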

Python package

The pyproject.toml file contains metadata about the project. For a new project, the following command creates it (this repository already includes a pyproject.toml):

uv init

To create a virtual environment at .venv:

uv venv

The virtual environment can be "activated" to make its packages available in your shell.

In a terminal (macOS/Linux):

source .venv/bin/activate

In PowerShell (Windows):

.venv\Scripts\activate
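To confirm that activation worked, a quick check is to compare sys.prefix with sys.base_prefix: inside a virtual environment they differ. This is a generic Python sketch, not a script shipped with Doc-Chat; the name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv such as .venv."""
    # A venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Inside a virtual environment: {in_virtualenv()}")
print(f"Interpreter prefix: {sys.prefix}")
```

Run it with the activated interpreter (or via uv run); the printed prefix should end in .venv.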

Install packages

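uv.lock records the exact resolved versions of all dependencies. It is a human-readable TOML file, but it is managed by uv and should not be edited manually.

uv run syncs the environment with pyproject.toml and uv.lock (installing any missing packages) before running a script, for example:

uv run .\hello.py

To install the packages without running a script:

uv sync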

Set up the OpenAI key

Add your OpenAI API key to the ".env-sample" file in the repository root and rename the file to ".env".
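The sketch below shows one way a script could read that key back from .env. It is a simplified, standard-library stand-in for a library such as python-dotenv, and it assumes the variable is named OPENAI_API_KEY; check .env-sample for the actual name. Both function names are hypothetical.

```python
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines (simplified stand-in for python-dotenv)."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    # Assumption: the key is stored as OPENAI_API_KEY; check .env-sample.
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(f"OPENAI_API_KEY is missing from {path}")
    return key
```

If the project uses python-dotenv instead, load_dotenv() followed by os.getenv("OPENAI_API_KEY") has the same effect.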

Run the Crawler

cd .\DocCrawl\

Now start crawling (edit the URLs to crawl first, if needed):

scrapy crawl document_spider
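Conceptually, turning a crawled page into a text file means keeping the visible text and discarding markup, scripts and styles. The sketch below illustrates that idea with the standard-library html.parser module; it is not the project's Scrapy spider, and the names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Each text fragment ends up on its own line, which keeps the output simple to split into chunks later.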

The crawled pages are saved as text files in the main directory under "./scraped_files". For now, only one scraped file is used in the next step.
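Since only one scraped file is used for now, the next step needs some rule for choosing it. The sketch below simply takes the first .txt file in alphabetical order; that selection rule and the name load_single_scraped_file are assumptions for illustration, not necessarily what create_vector_store.py does.

```python
from pathlib import Path


def load_single_scraped_file(folder: str = "scraped_files") -> tuple[str, str]:
    """Return (file name, text) of the first .txt file in alphabetical order."""
    files = sorted(Path(folder).glob("*.txt"))
    if not files:
        raise FileNotFoundError(
            f"No .txt files found in {folder!r}; run the crawler first."
        )
    chosen = files[0]  # hypothetical rule: only one scraped file is used
    return chosen.name, chosen.read_text(encoding="utf-8")
```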

Create a vector DB Storage

cd ..
uv run .\VectorDB\create_vector_store.py

This step creates ChromaDB collections in the "./vectorstore" path; how much is stored depends on the length of the document.
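One common reason storage grows with document length is chunking: the text is split into overlapping windows before being embedded. The sketch below shows fixed-size character chunking; the chunk_size and overlap defaults, and the idea that create_vector_store.py chunks this way, are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Longer documents yield more chunks, so more entries end up in the
    vector store. Default sizes are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, overlap=1) returns ["abcd", "defg", "ghij"]. With ChromaDB, such chunks are typically stored through chromadb.PersistentClient(path="./vectorstore"), get_or_create_collection(...) and collection.add(documents=..., ids=...); whether the project's script does exactly this is not confirmed here.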

Power up the bot

streamlit run app.py

The app opens in your browser at http://localhost:8501/ (Streamlit's default port).

There you go!

SarathMohandas/Doc-Chat

Folders and files

Latest commit

History

Repository files navigation

✨ Doc-Chat ✨

Brain of Doc-Chat

How to use Doc-Chat

Development Setup

Python package

Install packages

Set up OPENAI key

Run the Crawler

Create a vector DB Storage

Power up the bot

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages