
[Community Pipeline] UnCLIP Text Interpolation Pipeline #2257

Merged: 11 commits, Feb 13, 2023

Conversation

Abhinay1997
Contributor

No description provided.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Feb 6, 2023

The documentation is not available anymore as the PR was closed or merged.

@Abhinay1997 Abhinay1997 changed the title UnCLIP Text Interpolation Pipeline [Community Pipeline] UnCLIP Text Interpolation Pipeline Feb 6, 2023
Contributor

@patrickvonplaten patrickvonplaten left a comment

This looks super cool, thanks for adding it @Abhinay1997

Would be interesting to try it out on some of the examples of DALLE-2, e.g.

"a photo of an adult lion → a photo of lion cub"
"a photo of a landscape in winter → a photo of a landscape in fall"
"a photo of a victorian house → a photo of a modern house"

Also think this is a cool pipeline to build a Space with :-)

@williamberman can you also have a look here?

@Abhinay1997
Contributor Author

Thanks @patrickvonplaten. 😄

Need your input on the attention mask to be used for the interpolated text embeddings because the results are not great when the difference in prompt length is large.

@patrickvonplaten
Contributor

> Thanks @patrickvonplaten. 😄
>
> Need your input on the attention mask to be used for the interpolated text embeddings because the results are not great when the difference in prompt length is large.

cc @williamberman maybe?

@Abhinay1997
Contributor Author

@patrickvonplaten, see our discussion here: #1869. @williamberman suggested we use the larger of the two attention masks for now.
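For context, "use the larger of the two" can be read as combining the two prompts' padding masks so every token position used by the longer prompt stays visible. A minimal sketch of that idea (hypothetical helper name, not the PR's code; assumes 1/0 padding masks already padded to the same tokenizer length):

```python
import torch


def merged_attention_mask(mask_a, mask_b):
    # Both masks have shape (seq_len,), with 1 for real tokens and 0 for
    # padding, already padded to the same length by the tokenizer.
    # The elementwise max keeps a position visible if either prompt uses it,
    # so the longer prompt's mask effectively wins.
    return torch.max(mask_a, mask_b)
```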

Comment on lines 638 to 667
for interp_val in np.linspace(0, 1, steps):
    # Use the original start/end embeddings at 0 and 1, since the slerp
    # results at the endpoints are subjectively worse than the originals.
    if interp_val == 0:
        text_embeds = start_text_embeds
        last_hidden_state = start_last_hidden_state
    elif interp_val == 1:
        text_embeds = end_text_embeds
        last_hidden_state = end_last_hidden_state
    else:
        text_embeds = UnCLIPTextInterpolationPipeline.slerp(interp_val, start_text_embeds, end_text_embeds)
        last_hidden_state = UnCLIPTextInterpolationPipeline.slerp(
            interp_val, start_last_hidden_state, end_last_hidden_state
        )

    text_model_output.text_embeds = text_embeds.unsqueeze(0).to(device)
    text_model_output.last_hidden_state = last_hidden_state.unsqueeze(0).to(device)

    res = self._generate(
        text_model_output=text_model_output,
        text_attention_mask=attention_mask,
        generator=generator,
        prior_num_inference_steps=prior_num_inference_steps,
        decoder_num_inference_steps=decoder_num_inference_steps,
        super_res_num_inference_steps=super_res_num_inference_steps,
        prior_guidance_scale=prior_guidance_scale,
        decoder_guidance_scale=decoder_guidance_scale,
        output_type=output_type,
        return_dict=return_dict,
    )

Contributor

I think ideally we should batch the embeddings instead of effectively running the pipeline in a loop

Contributor Author

Sure. Will batch the pipeline run.

Comment on lines 652 to 653
text_model_output.text_embeds = text_embeds.unsqueeze(0).to(device)
text_model_output.last_hidden_state = last_hidden_state.unsqueeze(0).to(device)
Contributor

ideally we use the interpolated results directly instead of mutating text_model_output

Contributor Author

Got it. Will make the change.

Contributor

> @williamberman, made changes based on your feedback. Could you review them when you can?
>
> P.S. Ran the code through black and isort multiple times but it's still failing the code quality test.

We recently updated the versions of our linters etc. Could you try making sure they're up to date and running `make style` locally before pushing?

        return ImagePipelineOutput(images=image)

    @staticmethod
    def slerp(val, low, high):
Contributor

nice! slerp doesn't have to be a static or regular method on the class. Let's just move it to a regular function at the top of the file :)

Contributor Author

Ohh. Yeah that makes sense.
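For reference, a module-level slerp could look like the sketch below. This is a hypothetical standalone version, not the PR's exact code; the near-parallel fallback threshold is an assumption:

```python
import torch


def slerp(val, low, high):
    """Spherical linear interpolation between two tensors.

    Hypothetical sketch: falls back to plain linear interpolation when the
    inputs are nearly parallel, to avoid dividing by sin(omega) ~= 0.
    """
    low_norm = low / torch.norm(low, dim=-1, keepdim=True)
    high_norm = high / torch.norm(high, dim=-1, keepdim=True)
    dot = (low_norm * high_norm).sum(-1)
    if dot.mean() > 0.9995:
        # Nearly parallel: lerp is numerically safer here.
        return (1.0 - val) * low + val * high
    omega = torch.acos(dot.clamp(-1.0, 1.0))
    so = torch.sin(omega)
    return (
        (torch.sin((1.0 - val) * omega) / so).unsqueeze(-1) * low
        + (torch.sin(val * omega) / so).unsqueeze(-1) * high
    )
```

At `val=0` and `val=1` this returns the endpoints exactly, and for unit vectors every intermediate result stays on the unit sphere, which is the point of using slerp over lerp for normalized text embeddings.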


    @torch.no_grad()
    # Copied from diffusers.pipelines.unclip.pipeline_unclip.UnCLIPPipeline.__call__
    def _generate(
Contributor

We try to keep the `__call__` function pretty self-contained, so let's move `_generate` back inside `__call__`. This should work well with the other comment on batching the interpolated text embeddings :)

Contributor Author

Do you mean like this ?

def __call__(.....):
     def _generate(.....):

Contributor

almost! could we just remove the _generate function and have all of the logic directly in the __call__ method?

Contributor Author

Sure !

@williamberman
Contributor

Great start @Abhinay1997 !

@Abhinay1997
Contributor Author

Abhinay1997 commented Feb 8, 2023

@williamberman, made changes based on your feedback. Could you review them when you can?

P.S. Ran the code through black and isort multiple times but it's still failing the code quality test.

Comment on lines 641 to 650
for interp_val in torch.linspace(0, 1, steps):
    text_embeds = slerp(interp_val, text_model_output.text_embeds[0], text_model_output.text_embeds[1])
    last_hidden_state = slerp(
        interp_val, text_model_output.last_hidden_state[0], text_model_output.last_hidden_state[1]
    )
    batch_text_embeds.append(text_embeds.unsqueeze(0))
    batch_last_hidden_state.append(last_hidden_state.unsqueeze(0))

batch_text_embeds = torch.cat(batch_text_embeds)
batch_last_hidden_state = torch.cat(batch_last_hidden_state)
Contributor

nice!

@williamberman
Contributor

Love the progress @Abhinay1997 ! Could you also add some example code for running the pipeline along with the outputs it gives :) ?

Comment on lines +956 to +991
### UnCLIP Text Interpolation Pipeline

This Diffusion Pipeline takes two prompts and interpolates between them using spherical linear interpolation (slerp). The prompts are converted to text embeddings by the pipeline's text_encoder, and the interpolation is performed on the resulting embeddings over the number of steps specified (default: 5).

```python
import torch
from diffusers import DiffusionPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = DiffusionPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha",
    torch_dtype=torch.float16,
    custom_pipeline="unclip_text_interpolation",
)
pipe.to(device)

start_prompt = "A photograph of an adult lion"
end_prompt = "A photograph of a lion cub"
# For best results, keep the prompts close in length to each other.
# Of course, feel free to try out differing lengths.
generator = torch.Generator(device=device).manual_seed(42)

output = pipe(start_prompt, end_prompt, steps=6, generator=generator, enable_sequential_cpu_offload=False)

for i, image in enumerate(output.images):
    image.save("result%s.jpg" % i)
```

The resulting images, in order:

![result_0](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_0.png)
![result_1](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_1.png)
![result_2](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_2.png)
![result_3](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_3.png)
![result_4](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_4.png)
![result_5](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPTextInterpolationSamples/resolve/main/lion_to_cub_5.png)
Contributor Author

@williamberman Code example for the pipeline.

@Abhinay1997
Contributor Author

@williamberman

  1. Refactored the `__call__` function as suggested :)
  2. I updated to the same linter versions as the GitHub Action; `make style` works on my local machine but the PR check still fails.

@williamberman
Contributor

Awesome, looks basically good to go @Abhinay1997! I needed to merge in master to get the updated linter versions :)

Refactor to linter formatting

Co-authored-by: Will Berman <wlbberman@gmail.com>
@Abhinay1997
Contributor Author

Thanks for the help, Will! Hope we're good for the merge now.

@williamberman
Contributor

Awesome, this is great @Abhinay1997! Would you be interested in making a spaces to showcase the pipeline? https://huggingface.co/spaces

@williamberman williamberman merged commit a688c7b into huggingface:main Feb 13, 2023
@Abhinay1997
Contributor Author

Sure @williamberman ! I was thinking of doing it once the PR is merged :)

@Abhinay1997 Abhinay1997 deleted the unclip_text branch June 14, 2023 03:56
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…2257)

* UnCLIP Text Interpolation Pipeline

* Formatter fixes

* Changes based on feedback

* Formatting fix

* Formatting fix

* isort formatting fix(?)

* Remove duplicate code

* Formatting fix

* Refactor __call__ and change example in readme.

* Update examples/community/unclip_text_interpolation.py

Refactor to linter formatting

Co-authored-by: Will Berman <wlbberman@gmail.com>

---------

Co-authored-by: Will Berman <wlbberman@gmail.com>