This repository was archived by the owner on Oct 10, 2022. It is now read-only.
The most efficient way (that we know of) to read opus files in Python without incurring any significant overhead (i.e. launching subprocesses, or using a daisy chain of libraries with sox, FFmpeg etc.) is to use pysoundfile (a Python CFFI wrapper around libsndfile).

When this solution was being researched, the community had been waiting for a major libsndfile release for years. Opus support was implemented upstream some time ago, but it had not been properly released. Therefore we opted for a custom build + monkey patching.

By the time you read / use this, there will probably be decent / proper builds of libsndfile.
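
As a rough illustration of the in-process approach described above, the sketch below checks whether the linked libsndfile reports Opus support and then reads a file through pysoundfile. It assumes a pysoundfile / libsndfile build recent enough to expose the `OPUS` subtype (older builds need the custom build + monkey patching mentioned above). The helper names `opus_available` and `read_audio` are hypothetical and not part of this repo.

```python
def opus_available():
    """Return True if the linked libsndfile reports an OPUS subtype for OGG."""
    # Imported lazily so this module loads even where pysoundfile is absent.
    import soundfile as sf
    return "OPUS" in sf.available_subtypes("OGG")


def read_audio(path):
    """Read an audio file in-process and return (samples, sample_rate)."""
    import soundfile as sf
    # dtype="int16" asks pysoundfile for 16-bit integer samples.
    samples, sample_rate = sf.read(path, dtype="int16")
    return samples, sample_rate
```

If `opus_available()` returns `False`, the installed libsndfile most likely predates the Opus-enabled release, and `read_audio` should be expected to fail on `.opus` files.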
When you attempt to write large files (90-120 s), an upstream bug in libsndfile prevents writing such files with `opus` / `vorbis`. This will most likely be fixed by major libsndfile releases.
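
Until a fixed libsndfile build is available, one possible workaround is to split long recordings into shorter segments before writing. This is an assumption, not something verified against the bug itself. The sketch below only computes the segment boundaries; the name `split_bounds` and the 60 s default limit are hypothetical choices.

```python
def split_bounds(n_samples, sample_rate, max_seconds=60):
    """Return (start, end) sample index pairs, each at most max_seconds long."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    # Segment length in samples; at least 1 so range() never gets a zero step.
    step = max(1, int(sample_rate * max_seconds))
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


# Example: a 100 s recording at 16 kHz is split into two segments.
bounds = split_bounds(100 * 16000, 16000)
```

Each pair can then be used to slice the sample array (`samples[start:end]`) and write every slice to its own file.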
# **Contacts**
Please contact us [here](mailto:open_stt@googlegroups.com) or just create a GitHub issue!
# **Acknowledgements**
This repo would not be possible without these people:
- Many thanks to [akreal](https://nuget.pkg.github.com/akreal) for helping to encode the initial bulk of the data into mp3;
- 18 hours of ground truth annotation datasets for validation are a courtesy of [activebc](https://activebc.ru/);

Kudos!
# **FAQ**
## **0. ~~Why not MP3?~~ MP3 encoding / decoding**
Even though OGG / Opus is considered to be better for speech at higher compression, we opted for a more conventional, well-known format.

The LPCNet codec also boasts ultra-low bitrate speech compression, but we decided to opt for a more familiar format to avoid worrying about actually losing signal in compression.
## **1. Issues with reading files**
### **Maybe try this approach:**
<details><summary>See example</summary>
<p>
@@ -461,28 +518,53 @@ if abs_max>0:
## **2. Why share such a dataset?**
We are not altruists; life is just **not a zero-sum game**.

Consider the progress in computer vision that was made possible by:
- Public datasets;
- Public pre-trained models;
- Open source frameworks;
- Open research;

STT does not enjoy the same attention from the ML community because it is data-hungry and public datasets are lacking, especially for languages other than English.
Ultimately, this leaves the general community worse off.
## **3. Known issues with the dataset to be fixed**
- Speaker labels coming soon;
- Validation sets for new domains: Radio/Public Speech will be added in next releases.
## **4. Why migrate to OPUS?**
After extensive testing, both during training and validation, we confirmed that converting 16 kHz int16 data to OPUS, at the very least, does not degrade quality.

Designed for speech, OPUS takes less space than MP3 even at default compression rates and does not introduce artefacts.

Some people even reported quality improvements when training using OPUS.

CC-BY-NC and commercial usage available after agreement with dataset authors.
# **Donations**
[Donate](https://buymeacoff.ee/8oneCIN) (each coffee pays for several full downloads) or via [open_collective](https://opencollective.com/open_stt) or just use our DO referral [link](https://sohabr.net/habr/post/357748/) to help.