Does the model support the variable audio length? #2
Comments
Hi, yes, the model supports variable audio lengths, as this was a requirement for the HEAR challenge. For the timestamp embeddings, we submitted a "base" model with a 160 ms window and a "2 Level" model with a larger 800 ms window. More precisely, we concatenated the embeddings as implemented here. I'm not sure which preprocessing method would be best, but I'd guess that silence trimming won't affect performance to a large extent. I hope this helps!
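For readers who want a concrete picture of "concatenating" a short-window and a long-window timestamp embedding, here is a minimal sketch. It is not the repository's implementation (that is the code linked above). The 160 ms and 800 ms window sizes come from the comment; the 50 ms hop, the centring of windows on each timestamp, the zero padding at the edges, and the names `window_around`, `two_scale_timestamp_embeddings`, and `toy_embed` are all assumptions made for illustration. `embed` stands in for whatever callable turns a waveform window into a vector.

```python
from typing import Callable, List, Sequence, Tuple


def window_around(samples: Sequence[float], center: int, width: int) -> List[float]:
    """Return `width` samples centred on index `center`, zero-padded at the edges."""
    start = center - width // 2
    return [
        samples[i] if 0 <= i < len(samples) else 0.0
        for i in range(start, start + width)
    ]


def two_scale_timestamp_embeddings(
    samples: Sequence[float],
    sample_rate: int,
    embed: Callable[[List[float]], Sequence[float]],
    hop_ms: float = 50.0,    # assumption: hop between timestamps
    short_ms: float = 160.0,  # window of the "base" model in the thread
    long_ms: float = 800.0,   # larger window of the "2 Level" model
) -> Tuple[List[float], List[List[float]]]:
    """Embed a short and a long window at every timestamp and concatenate them."""
    hop = int(sample_rate * hop_ms / 1000)
    short_w = int(sample_rate * short_ms / 1000)
    long_w = int(sample_rate * long_ms / 1000)

    timestamps_ms: List[float] = []
    embeddings: List[List[float]] = []
    for center in range(0, len(samples), hop):
        short_emb = embed(window_around(samples, center, short_w))
        long_emb = embed(window_around(samples, center, long_w))
        timestamps_ms.append(1000.0 * center / sample_rate)
        # Concatenation: the long-window vector is appended to the short-window one.
        embeddings.append(list(short_emb) + list(long_emb))
    return timestamps_ms, embeddings


def toy_embed(window: List[float]) -> List[float]:
    """Hypothetical stand-in for the model: [mean, peak absolute amplitude]."""
    return [sum(window) / len(window), max(abs(x) for x in window)]
```

Because every timestamp gets its own pair of windows, the clip can have any length; only the number of timestamps changes.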
Thanks for your reply, I learned a lot. I would now like to retrain the model on my own data and fine-tune the pretrained model for a 2-class task. How can I do that? Could you give some examples? Thanks a lot!
@kkoutini likely related to this: Is it correct that the default
yes, unfortunately the |
@kkoutini thanks for the pointer. I'm still not sure why this is also raised for inputs with fewer than 998/1000 frames. Is this due to pos-enc interpolation? Is there any use in changing parameters like |
This warning is always shown when the input size doesn't equal the size provided when training the model.
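If the warning is a concern, one possible workaround (an assumption on my part, not something the maintainers recommend in this thread) is to feed the model only inputs whose frame count equals the trained size, by cutting long spectrograms into chunks and padding the last one. The sketch below works on a plain list of frames; `split_to_trained_length`, `trained_frames`, and `pad_value` are hypothetical names, and the right frame count depends on the checkpoint you load.

```python
from typing import List, Sequence


def split_to_trained_length(
    frames: Sequence[Sequence[float]],
    trained_frames: int,
    pad_value: float = 0.0,  # assumption: pad with a constant value
) -> List[List[List[float]]]:
    """Split a (time, bins) frame sequence into chunks of exactly `trained_frames`.

    The final chunk is padded with `pad_value` frames so that every chunk
    has the same number of frames as the size the model was trained on.
    """
    if trained_frames <= 0:
        raise ValueError("trained_frames must be positive")

    n_bins = len(frames[0]) if frames else 0
    chunks: List[List[List[float]]] = []
    for start in range(0, len(frames), trained_frames):
        chunk = [list(frame) for frame in frames[start:start + trained_frames]]
        missing = trained_frames - len(chunk)
        chunk.extend([pad_value] * n_bins for _ in range(missing))
        chunks.append(chunk)
    return chunks
```

Per-chunk embeddings or predictions would then need to be aggregated (for example averaged) by the caller. Whether padding helps or hurts accuracy compared with passing the variable-length input directly is not something this thread settles.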
Unfortunately the |
@kkoutini thanks! I guess this can be closed then.
Hi,
Thanks for sharing this great work!
I have a lot of audio files of different lengths, so I would like to know whether the model supports variable audio length. A related question: some audio events need a longer clip to produce a good embedding and a good classification result. I could remove the silence, but I don't know how to keep the spectrogram smooth. If removing the silence introduces a step in the waveform, I think the spectrogram is polluted and the embedding may suffer. How should I process these audio files?
Thanks.
Looking forward to your reply.