Image Generation Performance Evaluation #58

Open

mmderakhshani opened this issue Feb 5, 2025 · 12 comments
@mmderakhshani commented Feb 5, 2025

Hi @Sierkinhane,

Thank you for providing this amazing GitHub repository. Please let me know which checkpoint and configuration you used to compute the results in Tables 2 and 3 (the FID and GenEval evaluations). I would also like to know which split of MS-COCO 30K you used; a link would be appreciated.

I am trying to replicate your numbers, and I would appreciate access to your evaluation script.

Best,
Mohammad

@mmderakhshani (Author)

I am getting an FID of 26.207129500696738.

I computed it on the following dataset (https://huggingface.co/datasets/stasstaf/MS-COCO-validation) using the standard GitHub implementation (https://github.com/mseitzer/pytorch-fid). Could you please tell me how I can reproduce your numbers?
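
For reference, here is roughly how I compute it (a minimal sketch using pytorch-fid's Python API; the directory paths are placeholders for the real validation images and my generated images):

```python
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

# Placeholder paths: one folder of real MS-COCO validation images,
# one folder of images generated from the same 30K captions.
real_dir = "data/coco30k_real"
fake_dir = "data/coco30k_generated"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
fid = calculate_fid_given_paths(
    [real_dir, fake_dir],
    batch_size=50,  # pytorch-fid's default
    device=device,
    dims=2048,      # InceptionV3 pool3 features, the standard setting
)
print(f"FID: {fid}")
```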

@Sierkinhane (Collaborator)

Hi, we used this split: https://github.com/boomb0om/text2image-benchmark. Following PixArt, we fine-tuned our model on a COCO-like dataset (e.g., OpenImages) before evaluating MS-COCO FID. The other evaluations were conducted with the checkpoints released on GitHub.

@mmderakhshani (Author)

Is the fine-tuning dataset/script available? I want to provide a solid comparison with your work, which is why I am asking.

@mmderakhshani (Author)

Alternatively, would it be possible for you to share the checkpoint after fine-tuning? I really need it, since I have to run inference with this model. Thanks.

@Sierkinhane (Collaborator)

Hi, I can share it with you via email (sierkinhane@gmail.com).

@dhevarghese

Hi,

Following up on the evaluations, I’m working on reproducing the results from Table 3 (GenEval). Here are the results I obtained:

Results:

Summary
=======
Total images: 553
Total prompts: 553
% correct images: 56.42%
% correct prompts: 56.42%
Task breakdown
==============
two_object       = 66.67% (66 / 99)
color_attr       = 34.00% (34 / 100)
colors           = 87.23% (82 / 94)
counting         = 47.50% (38 / 80)
position         = 13.00% (13 / 100)
single_object    = 98.75% (79 / 80)
Overall score (avg. over tasks): 0.57858
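
(For clarity, the overall score is the unweighted mean of the six per-task accuracies, not the per-prompt accuracy; a quick sanity check:)

```python
# Per-task accuracies copied from the breakdown above.
task_acc = {
    "two_object": 66 / 99,
    "color_attr": 34 / 100,
    "colors": 82 / 94,
    "counting": 38 / 80,
    "position": 13 / 100,
    "single_object": 79 / 80,
}
overall = sum(task_acc.values()) / len(task_acc)
print(f"{overall:.5f}")  # 0.57858, matching the overall score above
```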

For image generation, I used the following parameters:

  • Guidance scale: 5
  • Generation timesteps: 50
  • Seed: 42 (set via the seeding sketch below)
  • Checkpoint: pretrained Show-o from the config (showlab/show-o)
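
For determinism across runs, I seed every RNG source once before generation. This helper is my own, not from the Show-o repo:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed Python, NumPy, and all torch (CPU + CUDA) generators so
    # repeated runs sample the same tokens and yield identical images.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
```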

Could you confirm whether these settings align with those used in the original evaluation? If there are any additional details or adjustments I should consider, I'd appreciate your guidance.

Thanks!

@Sierkinhane (Collaborator)

Hi, we got a GenEval score of around 0.53 when using fewer inference steps (<=25); more inference steps yield better performance.

@dhevarghese

For the sake of reproducibility, I'd like to match the score reported in the paper (0.68). Would it be possible to provide the hyperparameters used to obtain it? It is much higher than what I get in my test runs, even with more inference steps (0.57 at most).

@Sierkinhane (Collaborator)

Hi, you should use this checkpoint: https://huggingface.co/showlab/show-o-512x512
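
For anyone following along, the checkpoint can be pulled with huggingface_hub (a sketch; the returned path is wherever your local HF cache lives):

```python
from huggingface_hub import snapshot_download

# Download the 512x512 checkpoint referenced above into the local
# Hugging Face cache and print its directory.
local_dir = snapshot_download(repo_id="showlab/show-o-512x512")
print(local_dir)
```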

@dhevarghese

Thank you! I tried using this checkpoint, but something seems to be off. Below are the GenEval results I obtained:

Summary
=======
Total images: 553
Total prompts: 553
% correct images: 5.42%
% correct prompts: 5.42%

Task breakdown
==============
two_object       = 1.01% (1 / 99)
color_attr       = 0.00% (0 / 100)
colors           = 9.57% (9 / 94)
counting         = 0.00% (0 / 80)
position         = 0.00% (0 / 100)
single_object    = 25.00% (20 / 80)

Overall score (avg. over tasks): 0.05931

Would you be able to check if this checkpoint reproduces the reported score of 0.68? Are there any specific settings I might be missing? Any advice would be greatly appreciated.

@Sierkinhane (Collaborator)

The score is very strange. Can you check whether the images were correctly generated? Also, you must use this config: https://github.com/showlab/Show-o/blob/main/configs/showo_demo_512x512.yaml
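
A quick way to confirm the intended config is actually being loaded (a sketch assuming omegaconf, which the repo's `config=...` CLI style suggests):

```python
from omegaconf import OmegaConf

# Load the 512x512 demo config and print it, to verify that the
# resolution-dependent settings differ from the 256x256 config.
cfg = OmegaConf.load("configs/showo_demo_512x512.yaml")
print(OmegaConf.to_yaml(cfg))
```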

@dhevarghese

Thank you so much for your guidance! It turns out the issue was with the config. I was able to reproduce the reported score using the provided configuration file. I appreciate your help!
