Replies: 5 comments
-
Lecture summary (Transformer Design Variants)
-
Things I newly learned
-
Grouped-Query Attention (GQA)
Grouped-Query Attention is a technique that improves memory usage and compute efficiency by giving the attention layer fewer Key/Value heads than Query heads, rather than the equal head counts used in standard Multi-Head Attention.
1. Basic idea: in standard Multi-Head Attention, each head i projects the input x into its own query, key, and value, Q_i = x W_i^Q, K_i = x W_i^K, V_i = x W_i^V, so there are as many K/V heads as Q heads; GQA instead shares one K/V head across a whole group of Q heads.
2. Parameter re-mapping (see the sketch after this list)
3. No change to the inference graph (order of operations)
4. Why this works even without retraining
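Below is a minimal sketch (not from the original comment) of items 2-4: mean-pooling the Key/Value projection weights of an existing MHA checkpoint into shared GQA heads, and a forward pass in which each shared K/V head is simply broadcast to its group of query heads, so the attention computation graph itself is unchanged. The function names, tensor shapes, and the choice of PyTorch are assumptions made for illustration.

```python
# Hypothetical sketch of GQA: parameter re-mapping from MHA plus the forward pass.
import torch

def mha_to_gqa_kv_proj(w_kv: torch.Tensor, n_q_heads: int, n_kv_heads: int, head_dim: int) -> torch.Tensor:
    """Parameter re-mapping: mean-pool the K (or V) projection weights of each
    group of query heads into one shared K/V head, reusing the MHA checkpoint
    without retraining. w_kv: (d_model, n_q_heads * head_dim)."""
    d_model = w_kv.shape[0]
    group = n_q_heads // n_kv_heads
    w = w_kv.view(d_model, n_kv_heads, group, head_dim)
    return w.mean(dim=2).reshape(d_model, n_kv_heads * head_dim)

def gqa_attention(q, k, v, n_q_heads: int, n_kv_heads: int):
    """q: (B, T, n_q_heads, head_dim); k, v: (B, T, n_kv_heads, head_dim).
    The graph is the usual scaled dot-product attention; the only difference
    is that each K/V head is repeated across its group of query heads."""
    B, T, _, hd = q.shape
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=2)   # (B, T, n_q_heads, hd)
    v = v.repeat_interleave(group, dim=2)
    q, k, v = q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)  # (B, H, T, hd)
    scores = q @ k.transpose(-2, -1) / hd ** 0.5
    attn = scores.softmax(dim=-1)
    out = attn @ v                          # (B, H, T, hd)
    return out.transpose(1, 2).reshape(B, T, n_q_heads * hd)
```

With n_kv_heads == n_q_heads this reduces to standard MHA, and with n_kv_heads == 1 it reduces to Multi-Query Attention; the KV cache shrinks by the factor n_q_heads / n_kv_heads.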
-
Polar coordinate system: https://ko.wikipedia.org/wiki/%EA%B7%B9%EC%A2%8C%ED%91%9C%EA%B3%84
-
T5, BERT, GPT
KV cache optimization
and so on.