forked from langchain-ai/langchain
johnsnowlabs embeddings support (langchain-ai#11271)
- **Description:** Introducing the [JohnSnowLabsEmbeddings](https://www.johnsnowlabs.com/)
- **Dependencies:** johnsnowlabs
- **Tag maintainer:** @C-K-Loan
- **Twitter handle:** https://twitter.com/JohnSnowLabs https://twitter.com/ChristianKasimL

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 parent 9b71716, commit d1a3953
Showing 5 changed files with 415 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
# Johnsnowlabs | ||
|
||
Gain access to the [johnsnowlabs](https://www.johnsnowlabs.com/) ecosystem of enterprise NLP libraries | ||
with over 21,000 enterprise NLP models in over 200 languages through the open source `johnsnowlabs` library. | ||
For all 24,000+ models, see the [John Snow Labs Models Hub](https://nlp.johnsnowlabs.com/models). | ||
|
||
## Installation and Setup | ||
|
||
|
||
```bash | ||
pip install johnsnowlabs | ||
``` | ||
|
||
To [install enterprise features](https://nlp.johnsnowlabs.com/docs/en/jsl/install_licensed_quick), run: | ||
```python | ||
# for more details see https://nlp.johnsnowlabs.com/docs/en/jsl/install_licensed_quick | ||
from johnsnowlabs import nlp | ||
nlp.install() | ||
``` | ||
|
||
|
||
You can embed your queries and documents with `gpu`, `cpu`, `apple_silicon`, or `aarch` optimized binaries. | ||
By default, CPU binaries are used. | ||
Once a session is started, you must restart your notebook to switch between GPU and CPU; otherwise the change will not take effect. | ||
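The choice among the four hardware strings can also be made programmatically. The following is a minimal sketch and not part of the `johnsnowlabs` or LangChain APIs: `pick_hardware_target` and its `has_gpu` flag are hypothetical names introduced here for illustration, and the platform checks are assumptions about how the strings map to machines.

```python
import platform


def pick_hardware_target(has_gpu: bool = False) -> str:
    """Hypothetical helper: map this machine to one of the hardware
    strings named above ('gpu', 'cpu', 'apple_silicon', 'aarch')."""
    machine = platform.machine().lower()
    # GPU availability is not detected here; the caller states it explicitly.
    if has_gpu:
        return "gpu"
    # Assumption: ARM-based macOS machines are the "Apple Silicon" case.
    if platform.system() == "Darwin" and machine in ("arm64", "aarch64"):
        return "apple_silicon"
    # Assumption: any other ARM64 machine uses the 'aarch' binaries.
    if machine in ("arm64", "aarch64"):
        return "aarch"
    # Fall back to the documented default.
    return "cpu"


target = pick_hardware_target()
print(target)
```

The returned string would be passed as the second argument to `JohnSnowLabsEmbeddings`. Because the binaries cannot be switched once a session is running, the choice has to be made before the first embedding object is created.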
|
||
## Embed Query with CPU: | ||
```python | ||
from langchain.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings | ||
document = "foo bar" | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert') | ||
output = embedding.embed_query(document) | ||
``` | ||
|
||
|
||
## Embed Query with GPU: | ||
|
||
|
||
```python | ||
document = "foo bar" | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','gpu') | ||
output = embedding.embed_query(document) | ||
``` | ||
|
||
|
||
|
||
|
||
## Embed Query with Apple Silicon (M1,M2,etc..): | ||
|
||
```python | ||
document = "foo bar" | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','apple_silicon') | ||
output = embedding.embed_query(document) | ||
``` | ||
|
||
|
||
|
||
## Embed Query with AARCH: | ||
|
||
```python | ||
document = "foo bar" | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','aarch') | ||
output = embedding.embed_query(document) | ||
``` | ||
|
||
|
||
|
||
|
||
|
||
|
||
## Embed Document with CPU: | ||
```python | ||
documents = ["foo bar", 'bar foo'] | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','cpu') | ||
output = embedding.embed_documents(documents) | ||
``` | ||
|
||
|
||
|
||
## Embed Document with GPU: | ||
|
||
```python | ||
documents = ["foo bar", 'bar foo'] | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','gpu') | ||
output = embedding.embed_documents(documents) | ||
``` | ||
|
||
|
||
|
||
|
||
|
||
## Embed Document with Apple Silicon (M1,M2,etc..): | ||
|
||
```python | ||
documents = ["foo bar", 'bar foo'] | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','apple_silicon') | ||
output = embedding.embed_documents(documents) | ||
``` | ||
|
||
|
||
|
||
## Embed Document with AARCH: | ||
|
||
```python | ||
documents = ["foo bar", 'bar foo'] | ||
embedding = JohnSnowLabsEmbeddings('embed_sentence.bert','aarch') | ||
output = embedding.embed_documents(documents) | ||
``` | ||
|
||
|
||
|
||
|
||
Models are loaded with [nlp.load](https://nlp.johnsnowlabs.com/docs/en/jsl/load_api) and the Spark session is started with [nlp.start()](https://nlp.johnsnowlabs.com/docs/en/jsl/start-a-sparksession) under the hood. | ||
|
||
|
||
|
184 changes: 184 additions & 0 deletions
docs/docs/integrations/text_embedding/johnsnowlabs_embedding.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"# Johnsnowlabs Embedding\n", | ||
"\n", | ||
"### Loading the Johnsnowlabs embedding class to generate and query embeddings\n", | ||
"\n", | ||
"Models are loaded with [nlp.load](https://nlp.johnsnowlabs.com/docs/en/jsl/load_api) and the Spark session is started with [nlp.start()](https://nlp.johnsnowlabs.com/docs/en/jsl/start-a-sparksession) under the hood.\n", | ||
"For all 24,000+ models, see the [John Snow Labs Models Hub](https://nlp.johnsnowlabs.com/models).\n" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"! pip install johnsnowlabs\n" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"# If you have an enterprise license, you can run this to install enterprise features\n", | ||
"# from johnsnowlabs import nlp\n", | ||
"# nlp.install()" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"#### Import the necessary classes" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"#### Initialize Johnsnowlabs Embeddings and Spark Session" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"embedder = JohnSnowLabsEmbeddings('en.embed_sentence.biobert.clinical_base_cased')" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"#### Define some example texts\n", | ||
"\n", | ||
"These could be any documents that you want to analyze, for example news articles, social media posts, or product reviews." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"texts = [\"Cancer is caused by smoking\", \"Antibiotics aren't painkillers\"]" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"#### Generate and print embeddings for the texts\n", | ||
"\n", | ||
"The JohnSnowLabsEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"embeddings = embedder.embed_documents(texts)\n", | ||
"for i, embedding in enumerate(embeddings):\n", | ||
"    print(f\"Embedding for document {i+1}: {embedding}\")" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"#### Generate and print an embedding for a single piece of text\n", | ||
"\n", | ||
"You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"query = \"Cancer is caused by smoking\"\n", | ||
"query_embedding = embedder.embed_query(query)\n", | ||
"print(f\"Embedding for query: {query_embedding}\")" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 2 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython2", | ||
"version": "2.7.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
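The notebook mentions document similarity and retrieval as uses for these embeddings. The following is a minimal sketch of that idea using cosine similarity in plain Python. It assumes only that `embed_query` returns a list of floats and `embed_documents` returns a list of such lists, as in LangChain's embeddings interface. The helper names `cosine_similarity` and `rank_documents` and the three-dimensional stand-in vectors are hypothetical, chosen for illustration so the example runs without a Spark session.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding: List[float],
                   doc_embeddings: List[List[float]]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Stand-in vectors (hypothetical). In the notebook these would come from
# embedder.embed_query(query) and embedder.embed_documents(texts).
query_embedding = [1.0, 0.0, 0.0]
doc_embeddings = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

ranking = rank_documents(query_embedding, doc_embeddings)
print(ranking)
```

With real embeddings, the first index in `ranking` points at the text in `texts` that is closest to the query. A vector store would normally handle this step at scale; the loop here only illustrates the comparison.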