Skip to content
/ reach Public

How about a CLI for searching PDFs and Web for your Research

Notifications You must be signed in to change notification settings

zfyre/reach

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

77 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Reach - Research Assistant CLI Tool

Reach is a command-line tool designed to enhance academic research by integrating search capabilities from Google, arXiv, and Gemini to provide comprehensive information gathering, summarization, and knowledge graph generation.

Features

  • πŸ” Multi-source Search: Integrated search across Google, arXiv, and Gemini
  • πŸ“ Web Content Extraction: Automatically extracts relevant content from web pages
  • πŸ“Š Summarization: Generates concise summaries of research content
  • πŸ”— Knowledge Graph Generation: Creates knowledge graphs to visualize relationships between concepts
  • βš™οΈ Configurable: Customize search parameters, keywords, and categories
  • πŸ–₯️ Terminal UI: Interactive terminal interface for query exploration and knowledge graph visualization
  • πŸ“š Graph Database: High-performance graph database for storing relationship-based data

Prerequisites

  • Rust (latest stable version)
  • Python 3.12 or higher
  • API keys:
    • Google Search API key
    • Google Custom Search Engine ID
    • Gemini API key

Installation

  1. Clone the repository
git clone https://github.com/yourusername/reach.git
cd reach
  1. Set up Python environment
# Create virtual environment
python -m venv .venv

# Activate virtual environment (Windows)
.venv\Scripts\activate

# Activate virtual environment (Linux/macOS)
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -U crawl4ai
crawl4ai-setup
  1. Build the Rust project
cargo build --release

Configuration

  1. Reach requires several API keys to function properly. You can configure these using the CLI:
# Launch the interactive configuration
cargo run -- config

# Or set each key individually
cargo run -- config --google-api-key YOUR_GOOGLE_API_KEY
cargo run -- config --search-engine-id YOUR_SEARCH_ENGINE_ID
cargo run -- config --gemini-api-key YOUR_GEMINI_API_KEY
  1. You can customize your arXiv search preferences:
# Launch the interactive configuration
cargo run -- arxiv-config

# Show current configuration
cargo run -- arxiv-config --show

The arXiv configuration allows you to set:

  • Keywords to include in searches
  • Keywords to exclude from searches
  • Specific authors to focus on
  • arXiv categories to search within

Usage

Basic Search

# Search with default parameters
cargo run -- search "Diffusion Models"

# Specify result limit
cargo run -- search "Flow based Diffusion Models" --max-results 5

Generate Summaries

cargo run -- summarize "What are Diffusion Models?"

Generate Knowledge Graph

cargo run -- knowledge-graph "What are Flow based Diffusion Models?"

Interactive Terminal UI

# Launch the terminal user interface
cargo run -- tui

The terminal UI provides:

  • Multiple session management
  • Interactive conversation history
  • Visual knowledge graph representation
  • Two operational modes: Query and Search/Knowledge Graph Building

TUI Controls

  • e - Enter edit mode for text input
  • q - Quit the application
  • t - Toggle between Query and Search modes
  • n - Create a new session
  • h - Show/hide help
  • Left/Right arrows - Navigate between sessions
  • Up/Down arrows - Scroll through message history
  • Enter - Submit input (when in edit mode)
  • Esc - Exit edit mode

Core Components

ReachDB

ReachDB is a high-performance graph database implementation in Rust, designed for efficient storage and traversal of relationship-based data. It uses memory-mapped files for fast access to node and relationship records.

Key Features

  • Memory-mapped storage: Fast access to node and relationship data
  • Bidirectional relationships: Each relationship connects source and target nodes
  • User-defined relationship types: Custom relationship semantics through generics
  • Efficient traversals: Iterators for relationship traversal
  • Persistent storage: Data remains on disk between sessions

Example Usage

// Define relationship types
#[derive(Debug)]
enum RelationType {
    IsA(u8),
    HasA(u8),
    DependsOn(u8)
}

// Implement the UserDefinedRelationType trait
impl UserDefinedRelationType for RelationType {
    // ... implementation ...
}

// Open or create a new database
let mut db = Reachdb::<RelationType>::open("data", Some(10000), Some(10000))?;

// Add edges (automatically creates nodes if they don't exist)
db.add_edge("Person", "Human", "IS-A")?;
db.add_edge("Person", "Arms", "HAS-A")?;

ReachTUI

ReachTUI is built using Ratatui and Crossterm to create a responsive terminal UI that allows users to interact with the Reach knowledge graph system through multiple sessions.

Key Components

  • Multiple session management: Work with different research topics in separate sessions
  • Two operational modes: Query and Search/Knowledge Graph Building
  • Interactive conversation history: View and navigate through past interactions
  • Visual knowledge graph representation: See relationships between concepts
  • Action tracking: Monitor actions performed during the research process

Layout

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Sessions                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                    β”‚                        β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚                        β”‚
β”‚    β”‚          β”‚ β”‚               β”‚  β”‚                        β”‚
β”‚    β”‚ Actions  β”‚ β”‚ Conversation  β”‚  β”‚   Knowledge Graph      β”‚
β”‚    β”‚          β”‚ β”‚               β”‚  β”‚                        β”‚
β”‚    β”‚          β”‚ β”‚               β”‚  β”‚                        β”‚
β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚                        β”‚
β”‚                                    β”‚                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                       Input Field                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Metadata Module

The metadata module provides centralized access to important constants used throughout the Reach CLI application, including:

  • Author information
  • Version numbers
  • Configuration file names

Project Structure

reach/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ apis/                # API integrations (Google, Gemini, arXiv)
β”‚   β”œβ”€β”€ config/              # Configuration handling
β”‚   β”œβ”€β”€ metadata/            # Application metadata constants
β”‚   β”‚   └── README.md        # Metadata documentation
β”‚   β”œβ”€β”€ reachdb/             # Graph database implementation
β”‚   β”‚   └── README.md        # ReachDB documentation
β”‚   β”œβ”€β”€ rsearch/             # Research functionality
β”‚   β”‚   β”œβ”€β”€ knowledge_graph.rs  # Knowledge graph generation
β”‚   β”‚   └── utils.rs         # Utility functions
β”‚   β”œβ”€β”€ reachtui/            # Terminal user interface components
β”‚   β”‚   └── README.md        # TUI documentation
β”‚   β”œβ”€β”€ scripts/             # Python scripts for web scraping
β”‚   β”œβ”€β”€ display/             # Output formatting
β”‚   └── errors/              # Error handling
β”œβ”€β”€ .venv/                   # Python virtual environment
└── data/                    # Output data storage
    β”œβ”€β”€ summaries.json       # Generated summaries
    └── knowledge_graph.json # Generated knowledge graphs

Development

Running Tests

# Run all tests
cargo test

# Run tests that require configuration
cargo test --features requires_config

GitHub Actions

The project includes GitHub Actions workflows that:

  • Build the project
  • Set up the Python environment
  • Install dependencies
  • Run tests

License

[Specify your license here]

Contributing

[Add contribution guidelines if applicable]

Author

Created by Me kshitiz4kaushik@gmail.com

Version

Current version: 1.0.0

About

How about a CLI for searching PDFs and Web for your Research

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages