Extracts the text content of Word (doc, docx), Excel, PDF, PPT, CSV, and TXT files, and can also extract the table of contents from Word and PDF files.
Updated Jun 29, 2022 - Java
Apache NiFi + Apache Tika + OptimaizeLangDetector
Text extraction from scanned PDF documents in Java
A spatial search website that allows users to search documents from the FBI Vault website. It extracts the most frequently occurring location in each document, loads the geo-tagged data into Apache Solr to index the documents, and visualizes search results using the Google Maps API.
Tika detector for MKV and WebM
a tool set for indexing and searching through documents
Indexes and searches most document types, replaces currencies and unifies date and time formats, supports semi-structured documents by giving higher weight to the more important tags, and expands queries with synonyms of their terms using the WordNet library.
[SLOW][WIP] Broodmother is a high performance, distributed, search engine using Apache Tika, Apache Solr, Akka, Neo4j, and Spring.
AWS Lambda code to index S3 buckets into Elasticsearch
Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3
PDF parsing and extraction utility using Apache Tika
Information Retrieval system for indexing and searching files stored on disk, with support for Romanian language
microservice web application for uploading and downloading audio files