I have been attempting to reproduce the experimental results from your paper "Efficient Test-Time Adaptation of Vision-Language Models" as outlined in the repository. However, after running the experiments using the provided code and data, the results I obtained do not align with those reported in the paper.
Steps to Reproduce:
1. Clone the repository and set up the environment as per the instructions in the README.
2. Run the following commands to reproduce the results:
3. Remove the `update_cache()` and `compute_cache_logits()` calls from the function `run_test_tda()`, so that the model operates as zero-shot CLIP, then re-run the same commands.
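
With both cache calls removed, the prediction path should reduce to plain zero-shot CLIP scoring: cosine similarity between the image feature and each class text feature, followed by an argmax. The following is a minimal sketch of that baseline in pure Python, not the repository's code. The function names are hypothetical, and the scale of 100 is an assumption based on the logit scale commonly used with CLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_logits(image_feature, class_text_features, scale=100.0):
    """Scaled cosine similarity between one image and every class prompt.

    `scale` is an assumed value, not taken from the repository.
    """
    img = l2_normalize(image_feature)
    logits = []
    for text_feature in class_text_features:
        txt = l2_normalize(text_feature)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return logits


def top1_accuracy(image_features, labels, class_text_features):
    """Top-1 accuracy in percent, with no cache and no adaptation."""
    correct = 0
    for feature, label in zip(image_features, labels):
        logits = zero_shot_logits(feature, class_text_features)
        prediction = max(range(len(logits)), key=logits.__getitem__)
        correct += int(prediction == label)
    return 100.0 * correct / len(labels)
```

If the modified `run_test_tda()` computes anything beyond this (for example, if cached features still contribute to the final logits), the run is not a pure zero-shot baseline, and that alone could explain part of the gap.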
Results:
As shown in the image, "CLIPRN50/vit" refers to zero-shot CLIP. The zero-shot CLIP accuracy in my experiment is higher than the figure reported in the paper, and it is close to the performance of "TDARN50/vit".
Could you please provide guidance on how to resolve this discrepancy?