is not included and is instead aliased to `None`.

To resolve this issue, always provide an embedding function when you call the `get_collection` or `get_or_create_collection` methods, so that the HTTP client has the information it needs to compute embeddings.
### Adding documents is slow
**Symptoms:**

Adding documents to Chroma appears slow.

**Context:**

You've tried adding documents to a collection using the `add()` or `upsert()` methods.

**Cause:**

There are several reasons why adding documents may be slow:
- Very large batches
- Slow embeddings
- Slow network

Let's break down each of these factors.

**Very large batches**

If you are adding thousands or even tens of thousands of documents at once, Chroma itself (specifically, the HNSW graph updates) can become the bottleneck, depending on how much data is already in your collection.

To check whether this is the case, reduce the batch size and see whether the operation gets faster. You can also check how many records are already in the collection with the `count()` method.
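
One way to try different batch sizes without restructuring your ingestion code is to split the data into fixed-size chunks and call `add()` once per chunk. The sketch below is a minimal, hypothetical helper: `add_in_batches` and `batched_ranges` are not part of Chroma's API, and the only assumption is that `collection` exposes an `add(ids=..., documents=...)` method, as a Chroma collection does. The default `batch_size=500` is an arbitrary starting point rather than a Chroma recommendation; time a few values against your own data.

```python
from typing import Any, Iterator, Sequence, Tuple


def batched_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that split `total` items into batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def add_in_batches(
    collection: Any,
    ids: Sequence[str],
    documents: Sequence[str],
    batch_size: int = 500,  # hypothetical default; tune it for your data
) -> int:
    """Hypothetical helper: call collection.add() once per batch.

    Returns the number of add() calls made.
    """
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    calls = 0
    for start, end in batched_ranges(len(ids), batch_size):
        collection.add(ids=list(ids[start:end]), documents=list(documents[start:end]))
        calls += 1
    return calls


# Example usage (assumes chromadb is installed; wrap the call with
# time.perf_counter() to compare different batch sizes):
#   collection = chromadb.Client().get_or_create_collection("docs")
#   add_in_batches(collection, ids, documents, batch_size=100)
```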
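
**Slow embeddings**

This is the most common reason for slow additions, because some embedding functions are much slower than others. To debug this, try the following example, replacing the embedding function with your own. The code computes the embeddings and then adds them to the collection in two separate steps, so that each step can be timed independently. The default embedding function is used only as a placeholder.

```python
from chromadb.utils import embedding_functions
import chromadb
import time
import uuid

list_of_sentences = ["Hello world!", "How are you?"]  # this should be your list of documents to add
# change the below EF definition to match your embedding function (the default EF is only a placeholder)
ef = embedding_functions.DefaultEmbeddingFunction()

start = time.perf_counter()
embeddings = ef(list_of_sentences)  # step 1: compute the embeddings only
print(f"Embedding time: {time.perf_counter() - start:.3f}s")
collection = chromadb.Client().get_or_create_collection("timing_test")  # in-memory client, no network involved
start = time.perf_counter()
collection.add(ids=[str(uuid.uuid4()) for _ in list_of_sentences], documents=list_of_sentences, embeddings=embeddings)  # step 2: store precomputed embeddings
print(f"Add time: {time.perf_counter() - start:.3f}s")
```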
**Slow network**

If you are adding documents to a remote Chroma instance, network speed can become the bottleneck. To debug this, try the same operation with a local `PersistentClient` and see whether it is faster.