# [Community Pipeline] UnCLIP Text Interpolation Pipeline #2257

## Conversation
This looks super cool, thanks for adding it @Abhinay1997!

Would be interesting to try it out on some of the DALL-E 2 examples, e.g.:

- "a photo of an adult lion → a photo of a lion cub"
- "a photo of a landscape in winter → a photo of a landscape in fall"
- "a photo of a victorian house → a photo of a modern house"

Also think this is a cool pipeline to build a Space with :-)

@williamberman can you also have a look here?
Thanks @patrickvonplaten. 😄 Need your input on the attention mask to be used for the interpolated text embeddings, because the results are not great when the difference in prompt length is large.

cc @williamberman maybe?

@patrickvonplaten, see our discussion here: #1869. @williamberman suggested we use the larger of the two attention masks for now.
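For context, a minimal sketch of the "use the larger of the two" idea; the tokenizer checkpoint and variable names below are illustrative assumptions, not the pipeline's actual code.

```python
from transformers import CLIPTokenizer

# Hypothetical setup for illustration only; the real pipeline uses its own
# tokenizer and the prompts passed by the caller.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
start_prompt = "A photograph of an adult lion"
end_prompt = "A photograph of a lion cub"

start_mask = tokenizer(start_prompt, padding="max_length", return_tensors="pt").attention_mask
end_mask = tokenizer(end_prompt, padding="max_length", return_tensors="pt").attention_mask

# Use the larger of the two masks for every interpolated embedding, so the
# attention span always covers the longer prompt's tokens.
attention_mask = start_mask if start_mask.sum() >= end_mask.sum() else end_mask
```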
```python
for interp_val in np.linspace(0, 1, steps):
    # Use the start and end prompts directly for the 0 and 1 values, as the
    # original embeddings are subjectively better than the slerp results at
    # those endpoints.
    if interp_val == 0:
        text_embeds = start_text_embeds
        last_hidden_state = start_last_hidden_state
    elif interp_val == 1:
        text_embeds = end_text_embeds
        last_hidden_state = end_last_hidden_state
    else:
        text_embeds = UnCLIPTextInterpolationPipeline.slerp(interp_val, start_text_embeds, end_text_embeds)
        last_hidden_state = UnCLIPTextInterpolationPipeline.slerp(
            interp_val, start_last_hidden_state, end_last_hidden_state
        )

    text_model_output.text_embeds = text_embeds.unsqueeze(0).to(device)
    text_model_output.last_hidden_state = last_hidden_state.unsqueeze(0).to(device)

    res = self._generate(
        text_model_output=text_model_output,
        text_attention_mask=attention_mask,
        generator=generator,
        prior_num_inference_steps=prior_num_inference_steps,
        decoder_num_inference_steps=decoder_num_inference_steps,
        super_res_num_inference_steps=super_res_num_inference_steps,
        prior_guidance_scale=prior_guidance_scale,
        decoder_guidance_scale=decoder_guidance_scale,
        output_type=output_type,
        return_dict=return_dict,
    )
```
I think ideally we should batch the embeddings instead of effectively running the pipeline in a loop
Sure. Will batch the pipeline run.
```python
text_model_output.text_embeds = text_embeds.unsqueeze(0).to(device)
text_model_output.last_hidden_state = last_hidden_state.unsqueeze(0).to(device)
```
Ideally we use the interpolated results directly instead of mutating `text_model_output`.
Got it. Will make the change.
@williamberman, made changes based on your feedback. Could you review them when you can?

P.S. Ran the code through black and isort multiple times, but it's still failing the code quality test.

We recently updated the versions of our linters etc. Could you try making sure they're up to date and running `make style` locally before pushing?
```python
    return ImagePipelineOutput(images=image)

@staticmethod
def slerp(val, low, high):
```
Nice! `slerp` doesn't have to be a static or regular method on the class. Let's just move it to a regular function at the top of the file :)
Ohh. Yeah that makes sense.
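For reference, a minimal sketch of what the module-level function could look like (the near-parallel guard is an extra safety check added here, not necessarily part of the merged code):

```python
import torch


def slerp(val, low, high):
    """Spherical linear interpolation between two tensors.

    `val` is the interpolation factor in [0, 1]; `low` and `high` are the
    endpoint tensors (e.g. text embeddings), treated as flattened vectors.
    """
    low_norm = low / torch.norm(low)
    high_norm = high / torch.norm(high)
    dot = (low_norm * high_norm).sum().clamp(-1.0, 1.0)
    omega = torch.acos(dot)
    so = torch.sin(omega)
    if so.abs() < 1e-6:
        # Vectors are nearly parallel: plain lerp avoids dividing by ~0.
        return (1.0 - val) * low + val * high
    return (torch.sin((1.0 - val) * omega) / so) * low + (torch.sin(val * omega) / so) * high
```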
```python
@torch.no_grad()
# Copied from diffusers.pipelines.unclip.pipeline_unclip.UnCLIPPipeline.__call__
def _generate(
```
We try to keep the `__call__` function pretty self-contained, so let's move `_generate` back to inside `__call__`. This should work well with the other comment on batching the interpolated text embeddings :)
Do you mean like this?

```python
def __call__(.....):
    def _generate(.....):
```
Almost! Could we just remove the `_generate` function and have all of the logic directly in the `__call__` method?
Sure!
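A rough skeleton of the agreed-upon shape, sketched here for illustration (the signature is abbreviated, and the step comments summarize rather than reproduce the merged code):

```python
import torch


class UnCLIPTextInterpolationPipeline:
    # ...model components and helpers elided...

    @torch.no_grad()
    def __call__(self, start_prompt, end_prompt, steps=5, generator=None):
        # 1. Encode both prompts once with the text encoder.
        # 2. Slerp between the two embeddings at each interpolation step and
        #    stack the results into a single batch.
        # 3. Run the prior, decoder, and super-resolution stages on that
        #    batch -- the logic that previously lived in _generate, inlined.
        ...
```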
Great start @Abhinay1997!
```python
# (List accumulators; initialized just before the loop in the full pipeline.)
batch_text_embeds = []
batch_last_hidden_state = []

for interp_val in torch.linspace(0, 1, steps):
    text_embeds = slerp(interp_val, text_model_output.text_embeds[0], text_model_output.text_embeds[1])
    last_hidden_state = slerp(
        interp_val, text_model_output.last_hidden_state[0], text_model_output.last_hidden_state[1]
    )
    batch_text_embeds.append(text_embeds.unsqueeze(0))
    batch_last_hidden_state.append(last_hidden_state.unsqueeze(0))

batch_text_embeds = torch.cat(batch_text_embeds)
batch_last_hidden_state = torch.cat(batch_last_hidden_state)
```
Nice!
Love the progress @Abhinay1997! Could you also add some example code for running the pipeline, along with the outputs it gives? :)
### UnCLIP Text Interpolation Pipeline

This diffusion pipeline takes two prompts and interpolates between them using spherical linear interpolation (slerp). The input prompts are converted to text embeddings by the pipeline's text_encoder, and the interpolation is applied to the resulting text embeddings over the specified number of steps (default: 5).

```python
import torch
from diffusers import DiffusionPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = DiffusionPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha",
    torch_dtype=torch.float16,
    custom_pipeline="unclip_text_interpolation",
)
pipe.to(device)

start_prompt = "A photograph of an adult lion"
end_prompt = "A photograph of a lion cub"
# For best results, keep the prompts close in length to each other.
# Of course, feel free to try out differing lengths.
generator = torch.Generator(device=device).manual_seed(42)

output = pipe(start_prompt, end_prompt, steps=6, generator=generator, enable_sequential_cpu_offload=False)

for i, image in enumerate(output.images):
    image.save("result%s.jpg" % i)
```

The resulting images, in order:

![result]
![result]
![result]
![result]
![result]
![result]
@williamberman Code example for the pipeline.
Awesome, looks basically good to go @Abhinay1997! I needed to merge in master to get the updated linter versions :)
Refactor to linter formatting

Co-authored-by: Will Berman <wlbberman@gmail.com>
Thanks for the help, Will! Hope we are good for the merge now.
Awesome, this is great @Abhinay1997! Would you be interested in making a Space to showcase the pipeline? https://huggingface.co/spaces
Sure @williamberman! I was thinking of doing it once the PR is merged :)
[Community Pipeline] UnCLIP Text Interpolation Pipeline (#2257)

* UnCLIP Text Interpolation Pipeline
* Formatter fixes
* Changes based on feedback
* Formatting fix
* Formatting fix
* isort formatting fix(?)
* Remove duplicate code
* Formatting fix
* Refactor __call__ and change example in readme.
* Update examples/community/unclip_text_interpolation.py (refactor to linter formatting)

Co-authored-by: Will Berman <wlbberman@gmail.com>