[spark] Add audio predictors #2466

Merged
merged 2 commits into from
Apr 3, 2023

Conversation

xyang16
Contributor

@xyang16 xyang16 commented Mar 16, 2023

Description

Brief description of what this PR is about

  • If this change is a backward incompatible change, why must this change be made?
  • Interesting edge cases to note here

@xyang16 xyang16 requested review from zachgk, frankfliu and a team as code owners March 16, 2023 22:00
@xyang16 xyang16 requested a review from lanking520 March 16, 2023 22:01
@codecov-commenter

codecov-commenter commented Mar 16, 2023

Codecov Report

Patch coverage: 74.51% and project coverage change: +1.65 🎉

Comparison is base (bb5073f) 72.08% compared to head (3357349) 73.74%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2466      +/-   ##
============================================
+ Coverage     72.08%   73.74%   +1.65%     
- Complexity     5126     6887    +1761     
============================================
  Files           473      680     +207     
  Lines         21970    30066    +8096     
  Branches       2351     3107     +756     
============================================
+ Hits          15838    22172    +6334     
- Misses         4925     6390    +1465     
- Partials       1207     1504     +297     
Impacted Files Coverage Δ
api/src/main/java/ai/djl/modality/cv/Image.java 69.23% <ø> (-4.11%) ⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java 76.00% <ø> (ø)
...rc/main/java/ai/djl/modality/cv/output/Joints.java 71.42% <ø> (ø)
.../main/java/ai/djl/modality/cv/output/Landmark.java 100.00% <ø> (ø)
...main/java/ai/djl/modality/cv/output/Rectangle.java 72.41% <0.00%> (ø)
...i/djl/modality/cv/translator/BigGANTranslator.java 21.42% <0.00%> (-5.24%) ⬇️
.../modality/cv/translator/ImageFeatureExtractor.java 0.00% <0.00%> (ø)
.../ai/djl/modality/cv/translator/YoloTranslator.java 27.77% <0.00%> (+18.95%) ⬆️
...ain/java/ai/djl/modality/cv/util/NDImageUtils.java 67.10% <0.00%> (+7.89%) ⬆️
api/src/main/java/ai/djl/modality/nlp/Decoder.java 63.63% <ø> (ø)
... and 228 more

... and 347 files with indirect coverage changes


@lanking520
Contributor

This will not work if the user passes in large audio chunks. I would suggest building logic in DJL to split the audio into 5-10 second pieces. A concrete example to refer to:

https://github.com/gradient-ai/Whisper-AutoCaption
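
The splitting suggested above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `AudioChunker` and `split` are hypothetical names, the input is assumed to be mono PCM samples already decoded into a `float[]`, and the 16 kHz sample rate in `main` is an assumption chosen for the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of DJL or of this PR.
class AudioChunker {

    /** Splits mono PCM samples into consecutive chunks of at most chunkSeconds each. */
    static List<float[]> split(float[] samples, int sampleRate, int chunkSeconds) {
        if (sampleRate <= 0 || chunkSeconds <= 0) {
            throw new IllegalArgumentException("sampleRate and chunkSeconds must be positive");
        }
        int chunkSize = sampleRate * chunkSeconds;
        List<float[]> chunks = new ArrayList<>();
        for (int start = 0; start < samples.length; start += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int end = Math.min(start + chunkSize, samples.length);
            chunks.add(Arrays.copyOfRange(samples, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed demo input: 25 seconds of silence at 16 kHz.
        float[] audio = new float[16000 * 25];
        List<float[]> chunks = split(audio, 16000, 10);
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).length + " samples");
    }
}
```

Fixed-length cuts can land in the middle of a word, so a fuller implementation might instead cut at silences or overlap adjacent chunks; that choice is left open here.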

@xyang16 xyang16 force-pushed the whisper branch 2 times, most recently from ba29c2a to f83c447 Compare March 28, 2023 18:14
@xyang16
Contributor Author

xyang16 commented Mar 28, 2023

This will not work if the user passes in large audio chunks. I would suggest building logic in DJL to split the audio into 5-10 second pieces. A concrete example to refer to:

https://github.com/gradient-ai/Whisper-AutoCaption

Added code to split audio chunks. Right now I join the output of all the chunks. You can see multiple <|startoftranscript|> and <|endoftext|>:

<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> Well , she was very short . She was about five foot tall . So she always had this rather bou ff ant hairstyle , and you see , to give her a few extra inches , and very , very high heels , which she wore even first thing on a Sunday morning . And a terrifying mean , I think . I say all these things about her because <|endoftext|><|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> As a youngest child by some years after my older siblings , I was always kind of an observer of this , and a slightly am used one of her interactions with the rest of the family , where there was a certain amount of sparks flying and thunder rolling . <|endoftext|><|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> and I understood one another well enough for that not to be the case without a <|endoftext|>

I think it would be good to remove context tokens such as <|startoftranscript|> to make the output clean.

I saw we can specify skip_special_tokens=True to remove context tokens.
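
If the tokens are not dropped at decode time, the joined output could also be cleaned afterwards. The sketch below is an assumption-laden stand-in, not the `skip_special_tokens` option itself: `TranscriptCleaner` and `stripSpecialTokens` are hypothetical names, and the regular expression assumes every context token has the `<|...|>` shape seen in the output above.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of DJL or of this PR.
class TranscriptCleaner {

    // Assumption: context tokens always look like <|name|>, e.g. <|startoftranscript|> or <|en|>.
    private static final Pattern SPECIAL_TOKEN = Pattern.compile("<\\|[^|]*\\|>");

    /** Removes <|...|> markers and collapses the whitespace they leave behind. */
    static String stripSpecialTokens(String text) {
        String withoutTokens = SPECIAL_TOKEN.matcher(text).replaceAll(" ");
        return withoutTokens.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String joined = "<|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> Hello there . "
                + "<|endoftext|><|startoftranscript|> <|en|> <|transcribe|> <|notimestamps|> "
                + "Second chunk . <|endoftext|>";
        System.out.println(stripSpecialTokens(joined)); // prints "Hello there . Second chunk ."
    }
}
```

Replacing each token with a space before collapsing whitespace keeps the text of adjacent chunks from running together where `<|endoftext|>` and `<|startoftranscript|>` touch.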

@xyang16 xyang16 force-pushed the whisper branch 2 times, most recently from 55610b9 to bbfa35b Compare March 28, 2023 18:39
@xyang16 xyang16 merged commit 866be61 into deepjavalibrary:master Apr 3, 2023