[spark] Add audio predictors #2466

xyang16 · 2023-03-16T22:00:01Z

Description

Brief description of what this PR is about

If this change is a backward incompatible change, why must this change be made?
Interesting edge cases to note here

codecov-commenter · 2023-03-16T22:25:17Z

Codecov Report

Patch coverage: 74.51% and project coverage change: +1.65 🎉

Comparison is base (bb5073f) 72.08% compared to head (3357349) 73.74%.


Additional details and impacted files

@@             Coverage Diff              @@
##             master    #2466      +/-   ##
============================================
+ Coverage     72.08%   73.74%   +1.65%     
- Complexity     5126     6887    +1761     
============================================
  Files           473      680     +207     
  Lines         21970    30066    +8096     
  Branches       2351     3107     +756     
============================================
+ Hits          15838    22172    +6334     
- Misses         4925     6390    +1465     
- Partials       1207     1504     +297

Impacted Files	Coverage Δ
api/src/main/java/ai/djl/modality/cv/Image.java	`69.23% <ø> (-4.11%)`	⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java	`76.00% <ø> (ø)`
...rc/main/java/ai/djl/modality/cv/output/Joints.java	`71.42% <ø> (ø)`
.../main/java/ai/djl/modality/cv/output/Landmark.java	`100.00% <ø> (ø)`
...main/java/ai/djl/modality/cv/output/Rectangle.java	`72.41% <0.00%> (ø)`
...i/djl/modality/cv/translator/BigGANTranslator.java	`21.42% <0.00%> (-5.24%)`	⬇️
.../modality/cv/translator/ImageFeatureExtractor.java	`0.00% <0.00%> (ø)`
.../ai/djl/modality/cv/translator/YoloTranslator.java	`27.77% <0.00%> (+18.95%)`	⬆️
...ain/java/ai/djl/modality/cv/util/NDImageUtils.java	`67.10% <0.00%> (+7.89%)`	⬆️
api/src/main/java/ai/djl/modality/nlp/Decoder.java	`63.63% <ø> (ø)`
... and 228 more

... and 347 files with indirect coverage changes


lanking520 · 2023-03-20T18:00:25Z

This will not work if the user passes in a large audio chunk. I suggest adding logic in DJL to split the audio into 5-10 second pieces. Here is a concrete example to refer to:

https://github.com/gradient-ai/Whisper-AutoCaption
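A minimal sketch of the splitting step suggested above, assuming the audio has already been decoded into mono PCM samples in a `float[]`. The class name `AudioChunker` and its `split` method are hypothetical and are not part of DJL or of this PR; they only illustrate cutting a long recording into fixed-length pieces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a DJL class): splits decoded audio into fixed-length chunks. */
public final class AudioChunker {

    private AudioChunker() {}

    /**
     * Splits mono PCM samples into consecutive chunks of at most chunkSeconds each.
     * The last chunk may be shorter than the others.
     */
    public static List<float[]> split(float[] samples, int sampleRate, int chunkSeconds) {
        if (sampleRate <= 0 || chunkSeconds <= 0) {
            throw new IllegalArgumentException("sampleRate and chunkSeconds must be positive");
        }
        // Number of samples per chunk; multiplyExact guards against int overflow.
        int chunkSize = Math.multiplyExact(sampleRate, chunkSeconds);
        List<float[]> chunks = new ArrayList<>();
        for (int start = 0; start < samples.length; start += chunkSize) {
            int length = Math.min(chunkSize, samples.length - start);
            chunks.add(Arrays.copyOfRange(samples, start, start + length));
        }
        return chunks;
    }
}
```

For example, 25 seconds of audio sampled at 16 kHz with `chunkSeconds = 10` yields three chunks of 10, 10, and 5 seconds. Cutting at fixed offsets can land in the middle of a word or sentence, so a production version would probably prefer to cut at silence.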

xyang16 · 2023-03-28T18:21:36Z

> This will not work if the user passes in a large audio chunk. I suggest adding logic in DJL to split the audio into 5-10 second pieces. Here is a concrete example to refer to:
>
> https://github.com/gradient-ai/Whisper-AutoCaption

Added code to split the audio into chunks. Right now I join the outputs of all the chunks, so you can see multiple <|startoftranscript|> and <|endoftext|> tokens:

<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> Well , she was very short . She was about five foot tall . So she always had this rather bou ff ant hairstyle , and you see , to give her a few extra inches , and very , very high heels , which she wore even first thing on a Sunday morning . And a terrifying mean , I think . I say all these things about her because <|endoftext|><|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> As a youngest child by some years after my older siblings , I was always kind of an observer of this , and a slightly am used one of her interactions with the rest of the family , where there was a certain amount of sparks flying and thunder rolling . <|endoftext|><|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> and I understood one another well enough for that not to be the case without a <|endoftext|>

I think it would be good to remove context tokens such as <|startoftranscript|> to make the output clean.

I saw that we can specify skip_special_tokens=True to remove the context tokens.
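As an illustration of the cleanup discussed above, here is a minimal sketch that strips the context tokens from the joined output after decoding. It is a regex-based substitute for the tokenizer-level `skip_special_tokens=True` option, not what the PR implements, and it assumes every special token has the `<|...|>` form seen in the sample output. The class name `TranscriptCleaner` is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a DJL class): removes Whisper-style context tokens from decoded text. */
public final class TranscriptCleaner {

    // Matches tokens such as <|startoftranscript|>, <|en|> or <|endoftext|>.
    private static final Pattern SPECIAL_TOKEN = Pattern.compile("<\\|[^|>]*\\|>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private TranscriptCleaner() {}

    /** Replaces each special token with a space, then collapses runs of whitespace. */
    public static String clean(String decoded) {
        String withoutTokens = SPECIAL_TOKEN.matcher(decoded).replaceAll(" ");
        return WHITESPACE.matcher(withoutTokens).replaceAll(" ").trim();
    }
}
```

Applied to the joined output of several chunks, this leaves only the transcribed text, with the chunk boundaries reduced to single spaces. It does not repair the subword spacing visible in the sample (for example "bou ff ant"), which comes from how the tokens are joined during decoding.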

[audio] Move WhisperTranslator to audio extension

6343a36

xyang16 requested review from zachgk, frankfliu and a team as code owners March 16, 2023 22:00

xyang16 requested a review from lanking520 March 16, 2023 22:01

xyang16 force-pushed the whisper branch from 9d2ae0c to fd69513 Compare March 17, 2023 00:04

xyang16 force-pushed the whisper branch 2 times, most recently from ba29c2a to f83c447 Compare March 28, 2023 18:14

xyang16 force-pushed the whisper branch 2 times, most recently from 55610b9 to bbfa35b Compare March 28, 2023 18:39

[spark] Add audio predictors

3357349

xyang16 force-pushed the whisper branch from bbfa35b to 3357349 Compare March 28, 2023 19:19

lanking520 approved these changes Apr 3, 2023


xyang16 merged commit 866be61 into deepjavalibrary:master Apr 3, 2023
