Welcome to GraphR! This tool scrapes and summarizes academic profiles from PubMed using automated browser interaction. It is designed to extract key information quickly, saving you the time of searching manually.
-
🔍 Profile Scraping: Extracts profile details including name, bio, and profile picture.
-
📚 Publication Summary: Gathers and summarizes recent publications.
-
📝 Smart Summarization: Uses OpenAI API to generate a concise overview of the profile.
-
🖥️ Interactive Graph: Simple and responsive visualization of papers associated with the author using graph networks, including community detection.
Please watch the YouTube video here.
Get started by cloning the repository and setting up the environment:
-
Ensure you have Python 3.10 installed.
- You can check your Python version with:
python3 --version
-
Then clone the repository, create a virtual environment, and install the dependencies:
git clone https://github.com/pradhanhitesh/graphR.git
cd graphR
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
This project uses OpenAI's API for text summarization. Add your API key to your environment:
export API_KEY='your-api-key-here'
Alternatively, you can set it in a .env file:
API_KEY=your-openai-api-key-here
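As a rough illustration of how an application could pick up this key, the sketch below checks the API_KEY environment variable first and falls back to a simple KEY=value parse of a .env file. The helper name `load_api_key` and the lookup order are assumptions made for illustration, not the project's actual code; a real project would more likely rely on a library such as python-dotenv.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", name="API_KEY"):
    """Return the key from the environment, else from a .env file, else None.

    Hypothetical helper: the name and the lookup order are assumptions.
    """
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a KEY=value shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes around the value.
            return val.strip().strip("'\"")
    return None
```

Checking the environment first means an exported key overrides the .env file, which is convenient on hosting platforms that inject secrets as environment variables.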
Start the Flask server with the following command:
python app.py
Navigate to http://127.0.0.1:5000 in your browser to access the application.
📁 graphR/
├── 📂 static/
│   ├── images/
│   ├── css/
│   └── js/
├── 📂 templates/
│   ├── home.html
│   ├── profile.html
│   └── graph.html
├── 📂 functions/
│   ├── graphR.py
│   └── __init__.py
├── app.py
├── requirements.txt
└── README.md
-
app.py: Main Flask application server.
-
functions/graphR.py: Contains scraping and summarization functions.
-
templates/: HTML templates for rendering the frontend.
-
static/: Contains static files such as CSS, JS, and images.
-
User Input: Enter the PubMed profile link in the input box.
-
Scraping: The backend uses Selenium to navigate the profile page and extract key details.
-
Summarization: The extracted data is passed to OpenAI's GPT-4o mini model, which returns a summarized response.
-
Display: The results are rendered on a dynamic profile page.
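The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: `is_pubmed_url` and `build_profile` are hypothetical names, the host check assumes profile links live on pubmed.ncbi.nlm.nih.gov, and the Selenium and OpenAI steps are passed in as plain callables so the sketch runs without either dependency.

```python
from urllib.parse import urlparse


def is_pubmed_url(url):
    """Return True if the URL is an http(s) link on the PubMed host (assumed)."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc == "pubmed.ncbi.nlm.nih.gov"
    )


def build_profile(url, scrape, summarize):
    """Run the input -> scrape -> summarize flow and return data for display.

    `scrape` and `summarize` are injected callables, hypothetical stand-ins
    for the Selenium and OpenAI steps.
    """
    if not is_pubmed_url(url):
        raise ValueError(f"Not a PubMed link: {url}")
    details = scrape(url)          # dict of extracted profile details
    summary = summarize(details)   # short text overview of those details
    return {**details, "summary": summary}  # what a template would render


# Usage with stub callables in place of the real scraper and summarizer.
profile = build_profile(
    "https://pubmed.ncbi.nlm.nih.gov/?term=Doe+J",
    scrape=lambda url: {"name": "Doe J", "publications": ["Paper A"]},
    summarize=lambda d: f"{d['name']} has {len(d['publications'])} publication(s).",
)
print(profile["summary"])  # prints "Doe J has 1 publication(s)."
```

Passing the scraper and summarizer as arguments keeps the flow testable without a browser or an API key.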
-
Limited PubMed Support: The current implementation only supports basic scraping of PubMed profiles.
-
Error generating Profile Image: PubMed does not store user profile images, so alternative ways of obtaining one are being explored.
-
Large scope of search: PubMed does not necessarily differentiate between researchers, so the searched name is matched across all articles in PubMed.
-
Slower website response: Serving the website and its functions on Render is much slower than on Vercel.
-
[FIXED] Error generating Profile Name: Used the meta content tag to find the profile name.
-
[FIXED] Longer Profile Generation Time: Used a concurrency module to scrape multiple pages simultaneously.
-
[FIXED] Large Serverless Function: Moved deployment from Vercel to Render to accommodate the large bundle size (>250 MB), and increased the wait time from 60 to 120 seconds.
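The concurrent scraping fix mentioned in this list could look roughly like the sketch below. It assumes the concurrency module is Python's standard `concurrent.futures`; the helper `scrape_pages` and the stand-in `fake_fetch` are hypothetical and only simulate page loads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_pages(urls, fetch, max_workers=4):
    """Apply `fetch` to every URL using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `urls`,
        # regardless of which page finishes first.
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Stand-in for a real page load: wait briefly, then return a label."""
    time.sleep(0.05)
    return f"content of {url}"


pages = scrape_pages(["page-1", "page-2", "page-3"], fake_fetch)
print(pages)
```

Threads suit this kind of work because page loads spend most of their time waiting on the network rather than using the CPU.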
Have suggestions or issues? Feel free to open an issue.
Distributed under the MIT License. See LICENSE for more information.
Made with ❤️ by [Hitesh](https://github.com/pradhanhitesh)