Implement the Llama 3.2 vision models #796

Merged (36 commits) on Sep 29, 2024


@EricLBuehler (Owner) commented Sep 26, 2024

🚨🚨🚨Model is working and ready for imminent release!🚨🚨🚨

Last few steps:

  • Forward pass runs
  • Correct values confirmed from the inputs processor
  • Correct values confirmed from the vision model
  • Correct values confirmed from the text model
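A minimal sketch of the kind of check behind "correct values confirmed", assuming the reference activations have been dumped to flat `f32` slices. `max_abs_diff` and `all_close` are hypothetical helpers written for illustration, not functions from this PR or from candle; the tolerance rule mirrors NumPy's `allclose`.

```rust
/// Hypothetical helper: largest absolute element-wise difference between the
/// port's output and a reference dump. Errors if the lengths differ.
/// Note: `f32::max` ignores NaN, so use `all_close` to catch NaNs.
fn max_abs_diff(actual: &[f32], expected: &[f32]) -> Result<f32, String> {
    if actual.len() != expected.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            actual.len(),
            expected.len()
        ));
    }
    Ok(actual
        .iter()
        .zip(expected)
        .map(|(a, e)| (a - e).abs())
        .fold(0.0_f32, f32::max))
}

/// Hypothetical helper: true when every element satisfies
/// |a - e| <= atol + rtol * |e| (the `allclose` criterion).
/// Any NaN makes the comparison false, so NaNs are reported as mismatches.
fn all_close(actual: &[f32], expected: &[f32], rtol: f32, atol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}
```

In practice one would run the same input through both implementations, compare layer by layer, and report `max_abs_diff` for the first layer that fails `all_close`.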

Implementation status:

  • MLlamaModel
    • MLlamaVisionModel
      • MLlamaPrecomputedPositionEmbedding
      • MLlamaPrecomputedAspectRatioEmbedding
      • MLlamaVisionEncoder (tanh-gated attention and feed-forward)
        • MLlamaVisionAttention
        • MLlamaMlp
    • MllamaForCausalLM
      • MllamaCrossAttentionDecoderLayer (tanh-gated attention and feed-forward)
        • MllamaTextRMSNorm
        • MllamaTextMLP
        • MllamaTextCrossAttention
      • MllamaSelfAttentionDecoderLayer
        • MllamaTextRMSNorm
        • MllamaTextMLP
        • MllamaTextSelfAttention
      • MllamaRotaryEmbedding
    • _prepare_cross_attention_mask (text-to-vision cross-attention mask)
  • ImageProcessor
    • ImagePreProcessor (process images and inputs)
      • resize
      • pad
      • rescale
      • normalize
      • split_to_tiles
      • pack_images
      • convert_aspect_ratios_to_ids
      • build_aspect_ratio_mask
      • InputsProcessor::process_inputs (from sequences)
      • convert_sparse_cross_attention_mask_to_dense
      • get_cross_attention_token_mask
    • Process (apply chat template)
      • process (from messages)
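The "tanh-gated" layers above add each sublayer's output to the residual stream scaled by the tanh of a learned scalar gate. A simplified sketch on plain vectors, with layer norms omitted; `gated_residual` and `gated_layer` are illustrative names, not the API in this PR.

```rust
/// out = x + tanh(gate) * sublayer(x)
/// `gate` is a learned scalar; tanh keeps the effective scale in (-1, 1).
fn gated_residual<F>(x: &[f32], gate: f32, sublayer: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    let g = gate.tanh();
    let y = sublayer(x);
    assert_eq!(x.len(), y.len(), "sublayer must preserve the hidden size");
    x.iter().zip(&y).map(|(xi, yi)| xi + g * yi).collect()
}

/// One layer in this gated style: an attention sublayer followed by an MLP
/// sublayer, each with its own scalar gate (norms omitted for brevity).
fn gated_layer<A, M>(x: &[f32], attn_gate: f32, mlp_gate: f32, attn: A, mlp: M) -> Vec<f32>
where
    A: Fn(&[f32]) -> Vec<f32>,
    M: Fn(&[f32]) -> Vec<f32>,
{
    let h = gated_residual(x, attn_gate, attn);
    gated_residual(&h, mlp_gate, mlp)
}
```

Because tanh(0) = 0, a layer whose gates are zero passes its input through unchanged, so newly inserted cross-attention layers need not disturb the text model if their gates start at zero.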
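MllamaTextRMSNorm is expected to follow the usual Llama RMSNorm definition, y = x / sqrt(mean(x²) + eps) · w. A single-vector sketch, independent of the tensor code in this PR (`rms_norm` is an illustrative name):

```rust
/// RMSNorm over one hidden vector:
/// y_i = x_i / sqrt(mean(x^2) + eps) * weight_i
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "weight must match the hidden size");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

Unlike LayerNorm there is no mean subtraction and no bias; with unit weights the output has a root-mean-square of 1.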
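The rescale and normalize steps amount to a per-channel affine map on every pixel. A sketch over a flattened channel-first `u8` buffer, assuming the usual 1/255 rescale factor; the mean and std values are passed in rather than hard-coded, since they should be read from the model's preprocessor config. `rescale_and_normalize` is an illustrative name.

```rust
/// Rescale u8 pixels to [0, 1] (assumed factor 1/255), then normalize each
/// channel with (x - mean[c]) / std[c].
/// `pixels` is a flattened channel-first (C, H, W) buffer.
fn rescale_and_normalize(pixels: &[u8], channels: usize, mean: &[f32], std: &[f32]) -> Vec<f32> {
    assert_eq!(mean.len(), channels);
    assert_eq!(std.len(), channels);
    assert_eq!(pixels.len() % channels, 0, "buffer must hold whole channels");
    let plane = pixels.len() / channels; // H * W values per channel
    pixels
        .iter()
        .enumerate()
        .map(|(i, &p)| {
            let c = i / plane; // channel index for this pixel
            (p as f32 / 255.0 - mean[c]) / std[c]
        })
        .collect()
}
```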
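split_to_tiles cuts the resized and padded image into a grid of equally sized tiles. A sketch on a flat (C, H, W) buffer, assuming H and W are already exact multiples of the tile grid, which the pad step is meant to guarantee; the row-major tile order is also an assumption. `split_to_tiles` here is an illustrative free function, not the PR's implementation.

```rust
/// Split a flattened channel-first image (C, H, W) into `tiles_h x tiles_w`
/// tiles. Tiles are returned in row-major grid order, each laid out as
/// (C, tile_h, tile_w).
fn split_to_tiles(img: &[f32], c: usize, h: usize, w: usize, tiles_h: usize, tiles_w: usize) -> Vec<Vec<f32>> {
    assert_eq!(img.len(), c * h * w);
    assert!(
        h % tiles_h == 0 && w % tiles_w == 0,
        "image must be padded to a multiple of the tile grid"
    );
    let (th, tw) = (h / tiles_h, w / tiles_w);
    let mut tiles = Vec::with_capacity(tiles_h * tiles_w);
    for ty in 0..tiles_h {
        for tx in 0..tiles_w {
            let mut tile = Vec::with_capacity(c * th * tw);
            for ch in 0..c {
                for y in 0..th {
                    // Start of this tile's row `y` inside channel `ch`.
                    let row = ch * h * w + (ty * th + y) * w + tx * tw;
                    tile.extend_from_slice(&img[row..row + tw]);
                }
            }
            tiles.push(tile);
        }
    }
    tiles
}
```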
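convert_aspect_ratios_to_ids and build_aspect_ratio_mask encode each image's tile grid. The sketch below assumes the scheme matches the Hugging Face reference processor: enumerate every grid whose tile count fits in `max_tiles` (in nested-loop order), use the 1-based index as the id so that 0 stays free for padded image slots, and mark the first w·h tiles as valid. The tuple order and enumeration order are assumptions worth checking against the reference; all function names are illustrative.

```rust
/// Every (a, b) tile grid with a * b <= max_tiles, in nested-loop order.
fn supported_aspect_ratios(max_tiles: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for a in 1..=max_tiles {
        for b in 1..=max_tiles {
            if a * b <= max_tiles {
                out.push((a, b));
            }
        }
    }
    out
}

/// 1-based id of a tile grid; 0 is left for padding. None if unsupported.
fn aspect_ratio_id(ratio: (usize, usize), max_tiles: usize) -> Option<usize> {
    supported_aspect_ratios(max_tiles)
        .iter()
        .position(|&r| r == ratio)
        .map(|i| i + 1)
}

/// Per-image tile mask of length `max_tiles`: 1 for real tiles, 0 for padding.
fn aspect_ratio_mask(ratio: (usize, usize), max_tiles: usize) -> Vec<u8> {
    let n = ratio.0 * ratio.1;
    (0..max_tiles).map(|i| u8::from(i < n)).collect()
}
```

With `max_tiles = 4` this yields eight supported grids, so ids run from 1 to 8.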
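get_cross_attention_token_mask decides which text positions may attend to which image. Assuming it follows the Hugging Face reference logic, each `<|image|>` token covers the text up to the next image token, the last one covers the rest of the sequence, and runs of adjacent image tokens all extend to the end of the span that follows them. A sketch returning half-open `[start, end)` ranges; the reference marks "until the end" with -1 in the single-image case, which here is simply the sequence length. `cross_attention_token_spans` is an illustrative name.

```rust
/// For each image token, the half-open range [start, end) of text positions
/// allowed to attend to that image.
fn cross_attention_token_spans(input_ids: &[u32], image_token_id: u32) -> Vec<(usize, usize)> {
    // Positions of every image token in the sequence.
    let locs: Vec<usize> = input_ids
        .iter()
        .enumerate()
        .filter(|&(_, &t)| t == image_token_id)
        .map(|(i, _)| i)
        .collect();
    // Each image covers text up to the next image token (or the end).
    let mut spans: Vec<(usize, usize)> = locs
        .iter()
        .enumerate()
        .map(|(k, &start)| (start, locs.get(k + 1).copied().unwrap_or(input_ids.len())))
        .collect();
    // Walk backwards so a run of adjacent image tokens inherits the end of
    // the span that follows it.
    let mut last_end = input_ids.len();
    for span in spans.iter_mut().rev() {
        if span.0 + 1 == span.1 {
            span.1 = last_end;
        }
        last_end = span.1;
    }
    spans
}
```

Densifying these spans, as convert_sparse_cross_attention_mask_to_dense does, then roughly amounts to setting rows `start..end` to 1 for each image's valid tiles.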

@EricLBuehler added the "new feature" (New feature or request) and "models" (Additions to model or architectures) labels on Sep 26, 2024

github-actions bot commented Sep 26, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 50         2165         1841           64          260
 TOML                   20          621          556            2           63
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               34         2425            0         1850          575
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 8          478          425           22           31
 |- TOML                 2           75           63            0           12
 (Total)                           3183          680         1872          631
-------------------------------------------------------------------------------
 Rust                  253        71654        64733         1353         5568
 |- Markdown           121         1174           25         1081           68
 (Total)                          72828        64758         2434         5636
===============================================================================
 Total                 380        77502        67675         3271         6556
===============================================================================
  

@EricLBuehler (Owner, Author) commented Sep 27, 2024

RUST_BACKTRACE=1 cargo run --features cuda -- --port 1234 vision-plain -m meta-llama/Llama-3.2-11B-Vision-Instruct -a mllama

@EricLBuehler (Owner, Author) commented:

Run:

cargo run --features cuda --release -- -i vision-plain -m meta-llama/Llama-3.2-11B-Vision-Instruct -a mllama

Then, at the interactive prompt:

> \image https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg <|image|>Where was this photo most likely taken?
The photo appears to be taken in the White Mountains of New Hampshire, USA. The mountain in the background is likely Mount Washington, which is the highest peak in the Northeastern United States and is known for its iconic summit and challenging weather conditions.
>

@EricLBuehler (Owner, Author) commented:

> Hello!
How can I assist you today?
> What is the date?
The current date is September 29, 2024.
> \image https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg <|image|>Where was this photo most likely taken?
The photo appears to be of a mountainous landscape with a snow-covered peak in the background. Based on the scenery, it's likely that the photo was taken in a region with mountains, possibly in North America or Europe. However, without more information or context, it's difficult to pinpoint the exact location.

That being said, the mountainous landscape and snow-covered peak remind me of the Northeastern United States, particularly the Appalachian Mountains or the White Mountains in New Hampshire. The White Mountains are home to Mount Washington, the highest peak in the Northeast, which is known for its rugged terrain and snowy winters.

If I had to take a guess, I would say that the photo was likely taken in New Hampshire or another mountainous region in the Northeastern United States.
> \image https://upload.wikimedia.org/wikipedia/commons/f/fd/Pink_flower.jpg <|image|>What type of flower is this?
The flower in the photo appears to be a type of wildflower or a daisy-like flower. However, based on the shape and color of the petals, it's difficult to determine the exact type of flower without more information or a closer look.

That being said, the flower's white petals and yellow center remind me of a daisy or a sunflower. However, the petals seem to be slightly more delicate and have a more rounded shape than a typical daisy or sunflower.

If I had to take a guess, I would say that the flower is likely a type of wildflower, such as a buttercup or a dandelion. However, without more information or a closer look, it's difficult to determine the exact type of flower.
> 

@EricLBuehler merged commit f33ac29 into master on Sep 29, 2024
12 checks passed
@EricLBuehler deleted the mllama branch on September 29, 2024, 15:16