
Commit 9f94c32

docs: Slow inserts/updates FAQ (#42)
1 parent e30596d commit 9f94c32

2 files changed

+61
-1
lines changed


docs/faq/index.md

+60
@@ -318,3 +318,63 @@ is not included and is instead aliased to `None`.
318318
To resolve this issue you must always provide an embedding function when you call `get_collection`
319319
or `get_or_create_collection` methods to provide the Http client with the necessary information to compute embeddings.
320320

321+
### Adding documents is slow
322+
323+
**Symptoms:**
324+
325+
Adding documents to Chroma appears slow.
326+
327+
**Context:**
328+
329+
You've tried adding documents to a collection using the `add()` or `upsert()` methods.
330+
331+
**Cause:**
332+
333+
Adding documents can be slow for several reasons:
334+
335+
- Very large batches
336+
- Slow embeddings
337+
- Slow network
338+
339+
Let's break down each of the factors.
340+
341+
**Very large batches**
342+
343+
If you add thousands or tens of thousands of documents at once, Chroma (specifically the HNSW graph updates) can become a bottleneck, depending on how much data is already in your collection.
344+
345+
To check whether this is the case, reduce the batch size and see if the operation gets faster. You can also check how many records are already in the collection with the `count()` method.
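A minimal sketch of that batch-size experiment follows. The helper name `timed_batched_add` is hypothetical and not part of Chroma. It only assumes that `collection` exposes an `add()` method accepting `ids`, `documents`, and optionally `embeddings`, as a Chroma collection does.

```python
import time
import uuid


def timed_batched_add(collection, documents, batch_size, embeddings=None):
    """Add `documents` in batches of `batch_size`; return seconds spent per batch.

    Hypothetical helper: `collection` is any object with an `add()` method
    that accepts `ids`, `documents` and (optionally) `embeddings`.
    """
    timings = []
    for start in range(0, len(documents), batch_size):
        end = start + batch_size
        kwargs = {
            "ids": [str(uuid.uuid4()) for _ in documents[start:end]],
            "documents": documents[start:end],
        }
        # Pass precomputed embeddings when available so that embedding time
        # is not mixed into the measurement.
        if embeddings is not None:
            kwargs["embeddings"] = embeddings[start:end]
        start_time = time.perf_counter()
        collection.add(**kwargs)
        timings.append(time.perf_counter() - start_time)
    return timings
```

Calling it with a few different `batch_size` values on the same data, and printing `collection.count()` before each run, should show whether the time per batch grows with the batch size or with the size of the collection.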
346+
347+
**Slow embeddings**
348+
349+
This is the most common reason for slow additions. Some embedding functions are slower than others. To debug this, try the following example, replacing the embedding function with your own. The code measures how long it takes to compute the embeddings and how long it takes to add them to the collection, as separate steps, so that each can be timed independently.
350+
351+
```python
import time
import uuid

import chromadb
from chromadb.utils import embedding_functions

list_of_sentences = ["Hello world!", "How are you?"]  # this should be your list of documents to add

# change the below EF definition to match your embedding function
default_ef = embedding_functions.DefaultEmbeddingFunction()

# step 1: time the embedding computation on its own
start_time = time.perf_counter()
embeddings = default_ef(list_of_sentences)
end_time = time.perf_counter()
print(f"Embedding time: {end_time - start_time}")

client = chromadb.PersistentClient(path="my_chroma_data")
collection = client.get_or_create_collection("my_collection")

# step 2: time the add on its own
start_time = time.perf_counter()
# this will add your documents and the generated embeddings without Chroma doing the embedding for you internally
collection.add(
    ids=[f"{uuid.uuid4()}" for _ in range(len(list_of_sentences))],
    documents=list_of_sentences,
    embeddings=embeddings,
)
end_time = time.perf_counter()
print(f"Chroma add time: {end_time - start_time}")
```
376+
377+
**Slow network**
378+
379+
If you are adding documents to a remote Chroma server, network speed can become a bottleneck. To debug this, try the same operation with a local `PersistentClient` and see if it is faster.
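One way to make that comparison concrete is sketched below. The helper `compare_add_times` is a hypothetical name, not a Chroma API. It assumes each value in `collections` exposes an `add()` method accepting `ids`, `documents`, and `embeddings`.

```python
import time
import uuid


def compare_add_times(collections, documents, embeddings):
    """Add the same records to each collection; return seconds taken per name.

    Hypothetical helper: `collections` maps a label (for example "remote" or
    "local") to an object with an `add()` method like a Chroma collection.
    """
    results = {}
    for name, collection in collections.items():
        # Fresh ids for every target so the runs do not depend on each other.
        ids = [str(uuid.uuid4()) for _ in documents]
        start_time = time.perf_counter()
        collection.add(ids=ids, documents=documents, embeddings=embeddings)
        results[name] = time.perf_counter() - start_time
    return results
```

For example, pass it one collection obtained from a `chromadb.HttpClient` pointed at your server and one from a `chromadb.PersistentClient` on local disk. Supplying precomputed embeddings keeps embedding time out of both measurements, so a large gap between the two numbers suggests the network rather than Chroma itself.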
380+

docs/index.md

+1-1
@@ -11,7 +11,7 @@ Latest ChromaDB version: [0.5.13](https://github.com/chroma-core/chroma/releases
1111

1212
## New and Noteworthy
1313

14-
- ⁉️[FAQs](faq/index.md) - Updated FAQ sections - 📅`10-Oct-2024`
14+
- ⁉️[FAQs](faq/index.md) - Updated FAQ sections - 📅`15-Oct-2024`
1515
- 🔥 [SSL-Terminating Proxies](security/ssl-proxies.md) - Learn how to secure Chroma server with `Envoy` or `Nginx` proxies - 📅`31-Jul-2024`
1616
- 🗑️ [WAL Pruning](core/advanced/wal-pruning.md#chroma-cli) - Learn how to prune (cleanup) your Chroma database (WAL) with Chroma's built-in CLI `vacuum` command - 📅`30-Jul-2024`
1717
-[Multi-Category Filtering](strategies/multi-category-filters.md) - Learn how to filter data based on multiple categories - 📅`15-Jul-2024`
