Transbronchial Biopsy Document Extractor 📄🔍

Overview

This repository contains the development version of the Transbronchial Biopsy Document Extractor, a tool that automates information extraction from unstructured or semi-structured transbronchial biopsy documents. It combines regular expressions (regex) with natural language processing (NLP) techniques to streamline the processing of biopsy reports: it extracts key data points such as patient information, biopsy findings, and diagnostic conclusions, and creates variables from them for further analysis.

Features 🌟

Hospital Database Connections 🏥: Establishes secure connections to hospital databases to retrieve and update patient records, so that data can flow between this tool and the healthcare providers' databases.
PDF Parsing with JAR Tools 📄➡️📜: Uses a Java-based tool (packaged as a JAR) to convert PDF documents into text, so that relevant information can be extracted from patient records and biopsy reports that are only available as PDFs.
Regex Patterns 🔍: Custom regex patterns are used to identify and extract standardized information from the text, such as patient IDs, dates, and specific medical terminology related to transbronchial biopsies.
Data Normalization 📊: Converts extracted information into a structured format, facilitating easier integration with databases and further analysis.
Work in Progress 🚧: Ongoing efforts to improve extraction accuracy, expand the range of document types processed (NLP Analysis 🧠), and refine the user interface for easier use by medical professionals and researchers.
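
The regex extraction and data normalization features above can be illustrated with a minimal sketch. It is not the repository's actual implementation: the field labels ("Patient ID:", "Conclusion:"), the day/month/year date format, the ID length, and the names BiopsyRecord and extract_record are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical patterns: the real report layout and ID format are assumptions.
PATIENT_ID_RE = re.compile(r"Patient ID\s*:\s*(\d{6,10})", re.IGNORECASE)
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
# Capture everything after "Conclusion:" up to the next blank line or end of text.
CONCLUSION_RE = re.compile(
    r"Conclusion\s*:\s*(.+?)(?:\n\s*\n|\Z)", re.IGNORECASE | re.DOTALL
)


@dataclass
class BiopsyRecord:
    """Normalized, structured view of one report (illustrative fields only)."""
    patient_id: Optional[str]
    biopsy_date: Optional[date]
    conclusion: Optional[str]


def extract_record(text: str) -> BiopsyRecord:
    """Extract a few fields with regex and normalize them into a record."""
    id_match = PATIENT_ID_RE.search(text)
    date_match = DATE_RE.search(text)
    conclusion_match = CONCLUSION_RE.search(text)

    # Normalize the date string into a datetime.date; invalid dates become None.
    biopsy_date = None
    if date_match:
        try:
            biopsy_date = datetime.strptime(date_match.group(0), "%d/%m/%Y").date()
        except ValueError:
            biopsy_date = None

    # Collapse line breaks and repeated spaces left over from PDF-to-text conversion.
    conclusion = None
    if conclusion_match:
        conclusion = " ".join(conclusion_match.group(1).split())

    return BiopsyRecord(
        patient_id=id_match.group(1) if id_match else None,
        biopsy_date=biopsy_date,
        conclusion=conclusion,
    )


sample = (
    "Patient ID: 00123456\n"
    "Date: 05/03/2024\n\n"
    "Conclusion: No evidence of\nacute rejection.\n\n"
    "Signed"
)
record = extract_record(sample)
```

Keeping the patient ID as a string preserves leading zeros, and returning None for missing fields lets downstream database code distinguish "not found" from an empty value.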

Installation 🛠️

Clone this repository to your local machine.
Ensure you have Python 3.11+ installed.
Install required dependencies:

pip install -r requirements.txt

Usage

Development is ongoing. Usage instructions will be added once the main classes are implemented.

Repository: drci-foch/BTB_extraction
