PDFlex is a PDF processing toolkit for Python. It provides tools for PDF validation, text extraction, merging (with custom separator pages), searching, and more, all built to streamline your PDF automation workflows.
- PDF Validation: Quickly verify if a file is a valid PDF.
- Text Extraction: Extract text from PDFs using either PyMuPDF or PyPDF.
- Directory Processing: Process entire directories of PDFs for text extraction.
- PDF Merging: Merge multiple PDF files into one, automatically inserting a custom separator page between documents.
  - The separator page displays the title (derived from the filename) with underscores and hyphens removed.
  - Supports both portrait and landscape separator pages (ideal for lecture slides).
- PDF Searching: Recursively search for PDFs in a directory based on filename patterns (e.g., numeric float prefixes).
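As a rough illustration of what a quick validity check can look like, the sketch below tests whether a file begins with the `%PDF-` signature that PDF files start with. The function name `looks_like_pdf` is hypothetical and this is not PDFlex's own implementation; it only inspects the header, so a truncated or corrupted file could still pass.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature ``%PDF-``.

    Hypothetical helper for illustration only; it checks the header bytes
    and does not parse the rest of the file.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        # Read just the first five bytes and compare against the signature.
        return fh.read(5) == b"%PDF-"
```

A header check like this is cheap enough to run over a whole directory before handing files to a real parser.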
PDFlex is available on PyPI. To install using pip:
pip install -U pdflex
Alternatively, install in an isolated environment with pipx:
pipx install pdflex
For the fastest installation using uv:
uv tool install pdflex
PDFlex provides a convenient CLI for merging and searching PDFs. The CLI supports two primary commands: merge and search.
Merge multiple PDF files into a single document while automatically inserting a separator page before each document.
Usage:
pdflex merge /path/to/file1.pdf /path/to/file2.pdf -o merged_output.pdf
Add the --landscape flag to create separator pages in landscape orientation:
pdflex merge /path/to/file1.pdf /path/to/file2.pdf -o merged_output.pdf --landscape
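The separator page title is derived from each filename with underscores and hyphens removed. A minimal sketch of that kind of cleanup follows, assuming the separators are replaced with spaces and the extension is dropped; `separator_title` is a hypothetical name, not a PDFlex function, and the library's actual rule may differ.

```python
from pathlib import Path


def separator_title(pdf_path):
    """Build a human-readable title from a PDF filename.

    Hypothetical helper: assumes underscores and hyphens become spaces
    and that runs of whitespace are collapsed.
    """
    stem = Path(pdf_path).stem  # filename without the final extension
    cleaned = stem.replace("_", " ").replace("-", " ")
    # split()/join collapses any doubled spaces left by adjacent separators.
    return " ".join(cleaned.split())
```

Under these assumptions, a file named `lecture_notes-week_1.pdf` would yield the title `lecture notes week 1`.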
Search for PDF files in a directory based on filename filters (or search for lecture slides with numeric float prefixes) and merge them into one PDF.
Usage:
- General Search:
pdflex search /path/to/search -o merged_output.pdf --prefix "Chapter" --suffix ".pdf"
- Lecture Slides Merge: (Merges all PDFs whose filenames start with a numeric float prefix like 1.2_, 3.2_, etc., in sorted order. Separator pages will be in landscape orientation.)
pdflex search /path/to/algorithms-and-computation -o merged_lectures.pdf --lecture
You can also use PDFlex directly from your Python code. Below are examples for some common tasks.
from pdflex.merge import merge_pdfs
# List of PDF file paths to merge
pdf_files = [
"/path/to/document1.pdf",
"/path/to/document2.pdf"
]
# Merge files, using landscape separator pages (ideal for lecture slides)
merge_pdfs(pdf_files, output_path="merged_output.pdf", landscape=True)
from pdflex.search import search_pdfs, search_numeric_prefixed_pdfs
# General search: Find PDFs that start with a prefix and/or end with a suffix
pdf_list = search_pdfs("/path/to/search", prefix="Chapter", suffix=".pdf")
print("Found PDFs:", pdf_list)
# Lecture slides: Find PDFs with numeric float prefixes (e.g., "1.2_Intro.pdf")
lecture_slides = search_numeric_prefixed_pdfs("/path/to/algorithms-and-computation")
print("Found lecture slides:", lecture_slides)
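To make the numeric-prefix matching concrete, here is a standard-library sketch of how such a search could work. It is not the code behind `search_numeric_prefixed_pdfs`; the name `find_numeric_prefixed`, the regular expression, and the sort rule are all assumptions. In particular, it sorts by (major, minor) as integers, so `1.10_` comes after `1.2_`, which may differ from a plain float sort.

```python
import re
from pathlib import Path

# Matches names such as "1.2_Intro.pdf": digits, a dot, digits, an underscore.
NUMERIC_PREFIX = re.compile(r"^(\d+)\.(\d+)_")


def find_numeric_prefixed(root):
    """Recursively collect PDFs whose names start with an N.M_ prefix.

    Hypothetical helper: results are ordered by (N, M) compared as
    integers, then by filename to keep the order deterministic.
    """
    matches = []
    for path in Path(root).rglob("*.pdf"):
        m = NUMERIC_PREFIX.match(path.name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            matches.append((key, path))
    matches.sort(key=lambda item: (item[0], item[1].name))
    return [path for _, path in matches]
```

Comparing the two parts as integers keeps section-style numbering in reading order; if the prefixes are meant as true decimals, sorting on `float(...)` of the prefix would be the alternative.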
Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to:
- Open an issue
- Submit a pull request
- Improve documentation
- Share your ideas!
This project builds on several open-source PDF projects, including PyMuPDF and PyPDF, which it uses for text extraction.
PDFlex is released under the MIT license.
Copyright (c) 2020 to present PDFlex and contributors.