Skip to content

Extract readable content from web pages - Mozilla’s and Mizchi Readability ported to Go

License

Notifications You must be signed in to change notification settings

lil-emmanuel/go-readability

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📖 Go Readability: Extract Readable Content from Web Pages

Go Readability

Welcome to Go Readability! This project extracts readable content from web pages. It brings together Mozilla’s and Mizchi's Readability, now powered by Go. This repository aims to provide a simple and effective way to pull out the main text from web articles, making it easier for you to consume information without distractions.

🚀 Features

  • Easy to Use: Get started quickly with minimal setup.
  • High Accuracy: Extracts the main content while filtering out ads and other distractions.
  • Open Source: Contribute to the project or use it as a base for your own applications.

📥 Getting Started

To begin using Go Readability, visit our Releases page. Download the latest version and execute it on your machine.

Installation

  1. Clone the Repository:

    git clone https://github.com/lil-emmanuel/go-readability.git
    cd go-readability
  2. Build the Project:

    go build
  3. Run the Application:

    ./go-readability [URL]

Replace [URL] with the link to the web page you want to extract content from.

📖 How It Works

Go Readability analyzes the HTML structure of web pages. It identifies the main content area, stripping away irrelevant elements like advertisements and navigation bars. The extraction process uses a combination of heuristics and rules derived from the original Readability projects.

Core Components

  • HTML Parser: Parses the HTML and identifies key content areas.
  • Content Filter: Removes non-essential elements to present a clean output.
  • Output Formatter: Formats the extracted content for easy reading.

🛠️ Usage

To use Go Readability, simply run the command with the desired URL. The application will return the main text content. You can also redirect the output to a file for later use.

Example Command

./go-readability https://example.com/article

This command will fetch the main content from the specified URL.

📝 Documentation

For more detailed documentation, including advanced usage and configuration options, please refer to the Wiki.

📦 Contributing

We welcome contributions to Go Readability! Here’s how you can help:

  1. Fork the Repository: Create your own fork of the project.
  2. Create a Branch: Work on a new feature or fix.
    git checkout -b feature/new-feature
  3. Commit Your Changes: Make your changes and commit them.
    git commit -m "Add new feature"
  4. Push to Your Fork: Push your changes to your fork.
    git push origin feature/new-feature
  5. Create a Pull Request: Submit a pull request to the main repository.

📅 Roadmap

  • Version 1.1: Add support for additional content types (e.g., PDFs).
  • Version 1.2: Improve the accuracy of content extraction.
  • Version 2.0: Introduce a web interface for easier access.

📣 Community

Join our community to discuss ideas, report issues, or share your projects using Go Readability. You can find us on:

  • GitHub Issues: Report bugs or request features.
  • Slack Channel: Join our community for real-time discussions.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

📦 Releases

To stay updated with the latest features and improvements, check out our Releases section. Download the latest version and execute it on your machine.

🌟 Acknowledgments

  • Thanks to the original authors of Mozilla’s and Mizchi's Readability.
  • Special thanks to the Go community for their support and contributions.

🤝 Support

If you have any questions or need support, feel free to open an issue on GitHub or reach out through our community channels.

🌐 Links

Thank you for checking out Go Readability! We hope it enhances your reading experience on the web.