This repository contains the development version of the Transbronchial Biopsy Document Extractor, a tool that automates the extraction of information from unstructured or semi-structured transbronchial biopsy documents. It combines regular expressions (regex) with natural language processing (NLP) techniques to streamline the processing of biopsy reports by extracting key data points, such as patient information, biopsy findings, and diagnostic conclusions, and by creating variables from the extracted data.
- Hospital Database Connections 🏥: Establishes secure connections to hospital databases, allowing for the retrieval and updating of patient records. This feature ensures seamless data flow between the system and healthcare providers' databases, enhancing patient data management and accessibility.
- PDF Parsing with JAR Tools 📄➡️📜: Implements a Java-based tool to convert PDF documents into text, facilitating the extraction of relevant information from patient records and biopsy reports. This process enables the system to handle a wider variety of document formats, improving the flexibility and comprehensiveness of data extraction and analysis.
- Regex Patterns 🔍: Custom regex patterns are used to identify and extract standardized information from the text, such as patient IDs, dates, and specific medical terminology related to transbronchial biopsies.
- Data Normalization 📊: Converts extracted information into a structured format, facilitating easier integration with databases and further analysis.
- Work in Progress 🚧: Ongoing efforts to improve extraction accuracy, add NLP analysis 🧠 to expand the range of document types processed, and refine the user interface for easier use by medical professionals and researchers.
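
As an illustration of the regex extraction and data normalization steps listed above, here is a minimal sketch. It is not the repository's implementation: the field labels (`Patient ID`, `Date`, `Diagnosis`), the `dd/mm/yyyy` date format, the `BiopsyRecord` class, and the `extract_record` function are all assumptions chosen for the example, and real reports will likely need different patterns.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical patterns: the labels and formats below are assumptions,
# not the layout of any real biopsy report.
PATIENT_ID_RE = re.compile(
    r"\bPatient\s+ID\s*[:#]\s*(?P<patient_id>[A-Z0-9-]+)", re.IGNORECASE
)
DATE_RE = re.compile(
    r"\bDate\s*:\s*(?P<date>\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE
)
DIAGNOSIS_RE = re.compile(r"\bDiagnosis\s*:\s*(?P<diagnosis>.+)", re.IGNORECASE)


@dataclass
class BiopsyRecord:
    """Normalized, structured view of one report (hypothetical schema)."""

    patient_id: str | None
    biopsy_date: date | None
    diagnosis: str | None


def parse_date(raw: str) -> date | None:
    """Parse a dd/mm/yyyy string; return None if it is not a valid date."""
    try:
        return datetime.strptime(raw, "%d/%m/%Y").date()
    except ValueError:
        return None


def extract_record(text: str) -> BiopsyRecord:
    """Apply each pattern to the report text and normalize the matches."""
    id_match = PATIENT_ID_RE.search(text)
    date_match = DATE_RE.search(text)
    dx_match = DIAGNOSIS_RE.search(text)
    return BiopsyRecord(
        # Upper-case IDs so the same patient is always keyed identically.
        patient_id=id_match.group("patient_id").upper() if id_match else None,
        biopsy_date=parse_date(date_match.group("date")) if date_match else None,
        # Collapse runs of whitespace left over from PDF-to-text conversion.
        diagnosis=" ".join(dx_match.group("diagnosis").split()) if dx_match else None,
    )


if __name__ == "__main__":
    sample = (
        "Patient ID: tb-00123\n"
        "Date: 07/03/2023\n"
        "Diagnosis:   No evidence of   acute cellular rejection\n"
    )
    print(extract_record(sample))
```

Fields that are not found come back as `None` rather than raising an error, so a partially matched report still yields a record that can be reviewed or stored.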
- Clone this repository to your local machine.
- Ensure you have Python 3.11+ installed.
- Install required dependencies:
pip install -r requirements.txt
Development is ongoing; usage instructions will be added once the classes are implemented.