Add extra_repr to Linear classes for debugging purpose #6954

Xia-Weiwen · 2025-01-16T03:19:46Z

Summary
This PR adds extra_repr method to some Linear classes so that additional info is printed when printing such modules. It is useful for debugging.
Affected modules:

LinearLayer
LinearAllreduce
LmHeadLinearAllreduce

The extra_repr method gives the following info:

in_features
out_features
bias (true or false)
dtype

Example
Print llama-2-7b model on rank 0 after init_inference with world size = 2.
Previously we only got class names of these modules:

InferenceEngine(
  (module): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32000, 4096)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): LlamaSdpaAttention(
            (q_proj): LinearLayer()
            (k_proj): LinearLayer()
            (v_proj): LinearLayer()
            (o_proj): LinearAllreduce()
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): LinearLayer()
            (up_proj): LinearLayer()
            (down_proj): LinearAllreduce()
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        )
      )
      (norm): LlamaRMSNorm((4096,), eps=1e-05)
      (rotary_emb): LlamaRotaryEmbedding()
    )
    (lm_head): LmHeadLinearAllreduce()
  )
)

Now we get more useful info:

InferenceEngine(
  (module): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32000, 4096)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): LlamaSdpaAttention(
            (q_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16)
            (k_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16)
            (v_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16)
            (o_proj): LinearAllreduce(in_features=2048, out_features=4096, bias=False, dtype=torch.bfloat16)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16)
            (up_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16)
            (down_proj): LinearAllreduce(in_features=5504, out_features=4096, bias=False, dtype=torch.bfloat16)
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        )
      )
      (norm): LlamaRMSNorm((4096,), eps=1e-05)
      (rotary_emb): LlamaRotaryEmbedding()
    )
    (lm_head): LmHeadLinearAllreduce(in_features=2048, out_features=32000, bias=False, dtype=torch.bfloat16)
  )
)

Xia-Weiwen · 2025-01-16T03:20:47Z

@microsoft-github-policy-service agree company="Intel"

Xia-Weiwen · 2025-01-16T03:21:46Z

@Yejing-Lai Please also help take a look. Thanks.

Xia-Weiwen · 2025-01-16T05:17:44Z

Hi @delock Could you please take a look and approve the CI? Thanks.

Xia-Weiwen · 2025-01-17T01:44:22Z

@loadams Thanks for reviewing and merging it!

) **Summary** This PR adds `extra_repr` method to some Linear classes so that additional info is printed when printing such modules. It is useful for debugging. Affected modules: - LinearLayer - LinearAllreduce - LmHeadLinearAllreduce The `extra_repr` method gives the following info: - in_features - out_features - bias (true or false) - dtype **Example** Print llama-2-7b model on rank 0 after `init_inference` with world size = 2. Previously we only got class names of these modules: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer() (k_proj): LinearLayer() (v_proj): LinearLayer() (o_proj): LinearAllreduce() (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer() (up_proj): LinearLayer() (down_proj): LinearAllreduce() (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce() ) ) ``` Now we get more useful info: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (k_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (v_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (o_proj): LinearAllreduce(in_features=2048, out_features=4096, bias=False, dtype=torch.bfloat16) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (up_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (down_proj): LinearAllreduce(in_features=5504, out_features=4096, bias=False, dtype=torch.bfloat16) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce(in_features=2048, out_features=32000, bias=False, dtype=torch.bfloat16) ) ) ``` Signed-off-by: siqi <siqi@tecorigin.com>

) **Summary** This PR adds `extra_repr` method to some Linear classes so that additional info is printed when printing such modules. It is useful for debugging. Affected modules: - LinearLayer - LinearAllreduce - LmHeadLinearAllreduce The `extra_repr` method gives the following info: - in_features - out_features - bias (true or false) - dtype **Example** Print llama-2-7b model on rank 0 after `init_inference` with world size = 2. Previously we only got class names of these modules: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer() (k_proj): LinearLayer() (v_proj): LinearLayer() (o_proj): LinearAllreduce() (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer() (up_proj): LinearLayer() (down_proj): LinearAllreduce() (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce() ) ) ``` Now we get more useful info: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (k_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (v_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (o_proj): LinearAllreduce(in_features=2048, out_features=4096, bias=False, dtype=torch.bfloat16) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (up_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (down_proj): LinearAllreduce(in_features=5504, out_features=4096, bias=False, dtype=torch.bfloat16) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce(in_features=2048, out_features=32000, bias=False, dtype=torch.bfloat16) ) ) ```

) **Summary** This PR adds `extra_repr` method to some Linear classes so that additional info is printed when printing such modules. It is useful for debugging. Affected modules: - LinearLayer - LinearAllreduce - LmHeadLinearAllreduce The `extra_repr` method gives the following info: - in_features - out_features - bias (true or false) - dtype **Example** Print llama-2-7b model on rank 0 after `init_inference` with world size = 2. Previously we only got class names of these modules: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer() (k_proj): LinearLayer() (v_proj): LinearLayer() (o_proj): LinearAllreduce() (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer() (up_proj): LinearLayer() (down_proj): LinearAllreduce() (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce() ) ) ``` Now we get more useful info: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (k_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (v_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (o_proj): LinearAllreduce(in_features=2048, out_features=4096, bias=False, dtype=torch.bfloat16) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (up_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (down_proj): LinearAllreduce(in_features=5504, out_features=4096, bias=False, dtype=torch.bfloat16) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce(in_features=2048, out_features=32000, bias=False, dtype=torch.bfloat16) ) ) ``` Signed-off-by: gyou2021 <ganmei.you@intel.com>

) **Summary** This PR adds `extra_repr` method to some Linear classes so that additional info is printed when printing such modules. It is useful for debugging. Affected modules: - LinearLayer - LinearAllreduce - LmHeadLinearAllreduce The `extra_repr` method gives the following info: - in_features - out_features - bias (true or false) - dtype **Example** Print llama-2-7b model on rank 0 after `init_inference` with world size = 2. Previously we only got class names of these modules: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer() (k_proj): LinearLayer() (v_proj): LinearLayer() (o_proj): LinearAllreduce() (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer() (up_proj): LinearLayer() (down_proj): LinearAllreduce() (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce() ) ) ``` Now we get more useful info: ``` InferenceEngine( (module): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(32000, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (k_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (v_proj): LinearLayer(in_features=4096, out_features=2048, bias=False, dtype=torch.bfloat16) (o_proj): LinearAllreduce(in_features=2048, out_features=4096, bias=False, dtype=torch.bfloat16) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (up_proj): LinearLayer(in_features=4096, out_features=5504, bias=False, dtype=torch.bfloat16) (down_proj): LinearAllreduce(in_features=5504, out_features=4096, bias=False, dtype=torch.bfloat16) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05) (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05) ) ) (norm): LlamaRMSNorm((4096,), eps=1e-05) (rotary_emb): LlamaRotaryEmbedding() ) (lm_head): LmHeadLinearAllreduce(in_features=2048, out_features=32000, bias=False, dtype=torch.bfloat16) ) ) ``` Signed-off-by: yisheng <yi.sheng@intel.com>

Add extra_repr to Linear classes for debugging purpose

99ad0d2

Xia-Weiwen requested review from hwchen2017 and loadams as code owners January 16, 2025 03:19

sangshanrupesh mentioned this pull request Jan 16, 2025

**Summary** mozilla-mobile/firefox-ios#24169

Closed

loadams approved these changes Jan 16, 2025

View reviewed changes

loadams enabled auto-merge January 16, 2025 16:40

loadams added this pull request to the merge queue Jan 16, 2025

Merged via the queue into deepspeedai:master with commit 018ece5 Jan 16, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add extra_repr to Linear classes for debugging purpose #6954

Add extra_repr to Linear classes for debugging purpose #6954

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 17, 2025

Add extra_repr to Linear classes for debugging purpose #6954

Add extra_repr to Linear classes for debugging purpose #6954

Conversation

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 16, 2025

Xia-Weiwen commented Jan 17, 2025