A desktop application that crawls websites and generates visual site maps with screenshots. Built with Electron, Express, and Mermaid.js.
- Crawl websites and generate interactive site maps
- Capture full-page screenshots of each page
- Highlight specific HTML elements during crawl
- Configurable crawl limit (maximum number of URLs)
- Visual flowchart representation of site structure
- Customizable storage location for crawl data
To install a prebuilt release (macOS):
- Download the latest release from the Releases page
- Open the DMG file
- Drag the Web Crawler app to your Applications folder
- Launch the app from Applications
Note: On first launch, you may need to right-click the app and select "Open" to bypass macOS Gatekeeper.
To build and run from source, you will need:
- Node.js (v14 or higher)
- npm (usually comes with Node.js)
- Git
- Clone the repository:
```bash
git clone https://github.com/yourusername/web-crawler-visual
cd web-crawler-visual
```
- Install dependencies:
```bash
npm install
```
- Install additional required dependencies:
```bash
npm install electron express axios cheerio puppeteer
```
- Start the application in development mode:
```bash
npm start
```
To create a distributable version:
```bash
npm run make
```
The packaged application will be available in the `out/` directory.
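`npm run make` suggests the project uses Electron Forge for packaging. If so, the relevant scripts in `package.json` would look roughly like this (a sketch, not the project's actual configuration):

```json
{
  "scripts": {
    "start": "electron-forge start",
    "package": "electron-forge package",
    "make": "electron-forge make"
  }
}
```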
To run a crawl:
- Launch the application
- Enter a starting URL in the "Starting URL" field
- (Optional) Set the maximum number of URLs to crawl
- (Optional) Enter CSS selectors for elements to highlight, separated by commas (a sketch of how this could work follows this list)
- (Optional) Choose a highlight color
- Click "Start Crawling"
- Wait for the crawl to complete and view the generated site map
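For reference, element highlighting of this kind can be done with Puppeteer by injecting an outline style before the screenshot is taken. The following is a minimal sketch under that assumption; `screenshotWithHighlights` is a hypothetical helper, not the app's actual code:

```javascript
const puppeteer = require('puppeteer');

// Hypothetical helper: outline matching elements, then screenshot the page.
async function screenshotWithHighlights(url, selectors, color, outPath) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Apply the user-supplied selectors and highlight color in the page context.
  await page.evaluate((sels, c) => {
    document.querySelectorAll(sels.join(',')).forEach((el) => {
      el.style.outline = `3px solid ${c}`;
    });
  }, selectors, color);

  await page.screenshot({ path: outPath, fullPage: true });
  await browser.close();
}

// Example: screenshotWithHighlights('https://example.com', ['h1', '.nav'], 'red', 'out.png');
```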
By default, crawl data is stored in the application's user data directory. To change this:
- Click "Choose Directory"
- Select your preferred storage location
- To revert to the default location, click "Reset to Default"
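For context, Electron apps usually resolve the user data directory via `app.getPath('userData')`. A minimal sketch of how the default crawl directory might be derived (the `crawls` subfolder name matches the project layout below, but the exact code is an assumption):

```javascript
const { app } = require('electron');
const path = require('path');

// Default storage location: <userData>/crawls (illustrative, not the app's exact code).
const defaultCrawlDir = path.join(app.getPath('userData'), 'crawls');
console.log(`Crawl data will be stored in ${defaultCrawlDir}`);
```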
The project is laid out as follows:

```
web-crawler-visual/
├── src/
│   ├── client.js      # Frontend JavaScript
│   ├── server.js      # Backend Express server
│   └── index.html     # Main application window
├── public/
│   └── crawls/        # Default storage for crawl data
└── package.json       # Project configuration
```
Under the hood:
- Frontend: HTML, CSS, and JavaScript, with Mermaid.js for diagrams
- Backend: Express.js server running in Electron
- Crawling: Uses Puppeteer for page screenshots and Cheerio for HTML parsing
- Storage: File-based storage for crawl data and screenshots
- Visualization: Mermaid.js for flowchart generation
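To make the pipeline concrete, here is a self-contained sketch of one crawl step under the stack described above: Puppeteer renders and screenshots a page, Cheerio extracts its same-origin links, and the collected edges are serialized as a Mermaid flowchart. All names here are illustrative; the app's actual code will differ:

```javascript
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

// Visit one page: save a full-page screenshot and return its same-origin links.
async function crawlPage(browser, url, screenshotPath) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.screenshot({ path: screenshotPath, fullPage: true });
  const html = await page.content();
  await page.close();

  const $ = cheerio.load(html);
  const origin = new URL(url).origin;
  const links = new Set();
  $('a[href]').each((_, el) => {
    const resolved = new URL($(el).attr('href'), url);
    if (resolved.origin === origin) links.add(resolved.href);
  });
  return [...links];
}

// Serialize parent -> child edges as a Mermaid flowchart definition.
function toMermaid(edges) {
  const id = (u) => u.replace(/[^a-zA-Z0-9]/g, '_');
  const lines = ['flowchart TD'];
  for (const [from, to] of edges) {
    lines.push(`  ${id(from)}["${from}"] --> ${id(to)}["${to}"]`);
  }
  return lines.join('\n');
}
```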
Common issues:

- "Error: Failed to launch browser"
  - Ensure you have enough disk space
  - Try running with elevated privileges
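"Failed to launch browser" usually originates from Puppeteer's bundled Chromium. On Linux or in sandboxed environments, a common (if blunt) workaround is to pass extra launch flags; whether this applies depends on where the app calls `puppeteer.launch`:

```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    // Disables Chromium's sandbox; only appropriate in trusted environments.
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  await browser.close();
})();
```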
- "Error rendering diagram"
  - Check the browser console for specific error messages
  - Ensure the crawl completed successfully
  - Try reducing the maximum number of URLs to crawl
- "Permission denied" when saving files
  - Ensure you have write permissions for the selected directory
  - Try using the default storage location
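To confirm a permissions problem, you can test whether the directory is writable with Node's `fs.access` (a quick diagnostic sketch; `chosenDir` stands in for whatever directory you selected):

```javascript
const fs = require('fs');

const chosenDir = '/path/you/selected'; // hypothetical placeholder
fs.access(chosenDir, fs.constants.W_OK, (err) => {
  console.log(err ? `No write permission for ${chosenDir}` : 'Directory is writable');
});
```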
The application creates logs in the following locations:
- macOS: `~/Library/Logs/web-crawler-visual/`
- Windows: `%USERPROFILE%\AppData\Roaming\web-crawler-visual\logs\`
Contributions are welcome:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.