[spark] Add audio predictors #2466
Conversation
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #2466 +/- ##
============================================
+ Coverage 72.08% 73.74% +1.65%
- Complexity 5126 6887 +1761
============================================
Files 473 680 +207
Lines 21970 30066 +8096
Branches 2351 3107 +756
============================================
+ Hits 15838 22172 +6334
- Misses 4925 6390 +1465
- Partials 1207 1504 +297
... and 347 files with indirect coverage changes. ☔ View full report in Codecov by Sentry.
This will not work if the user passes in a large audio chunk. I would suggest building logic in DJL to split the audio into 5-10 second pieces. Here is a concrete example to refer to:
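A minimal sketch of the suggested chunking, assuming the waveform is already decoded into a NumPy float array with a known sample rate. The function and parameter names here are illustrative only, not part of the DJL API:

```python
import numpy as np

def split_audio(samples: np.ndarray, sample_rate: int, chunk_seconds: float = 10.0):
    """Split a 1-D waveform into consecutive chunks of at most `chunk_seconds`."""
    chunk_size = int(chunk_seconds * sample_rate)
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]

# Example: a 25 s clip at 16 kHz becomes three chunks of 10 s, 10 s and 5 s.
audio = np.zeros(25 * 16000, dtype=np.float32)
chunks = split_audio(audio, sample_rate=16000, chunk_seconds=10.0)
print([len(c) / 16000 for c in chunks])  # [10.0, 10.0, 5.0]
```

Each chunk can then be transcribed independently and the resulting texts concatenated.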
force-pushed from ba29c2a to f83c447
I added code to split the audio into chunks. Right now I join the output of all the chunks, so you can see multiple <|startoftranscript|> and <|endoftext|> tokens in the result:
I think it would be good to remove context tokens like <|startoftranscript|> to make the output clean. I saw we can specify skip_special_tokens=True to remove them.
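For reference, a minimal sketch of how skip_special_tokens=True behaves in the Hugging Face transformers Python API; the `chunk` variable and the model name are assumptions for illustration, and the DJL side would need the equivalent option on its decoder:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# `chunk` is one 5-10 second piece of 16 kHz audio, e.g. from the splitting sketch above.
inputs = processor(chunk, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)

# With skip_special_tokens=False the decoded text keeps <|startoftranscript|>,
# <|endoftext|> and other context tokens; setting it to True strips them.
clean_text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(clean_text)
```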
force-pushed from 55610b9 to bbfa35b
Description
Brief description of what this PR is about