Could the Transformer implementation expose attention_weights as an output? #44056

Closed
holyseven opened this issue Jul 4, 2022 · 3 comments

@holyseven

Feature Description

Description

Advanced developers need access to the attention weights of intermediate Transformer layers (shape = [batch_size, num_heads, query_length, key_length]) for model design and analysis experiments.
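To make the requested shape concrete, here is a framework-free sketch in plain Python (not Paddle code). It computes scaled dot-product attention weights for random inputs and checks the [batch_size, num_heads, query_length, key_length] layout. The helper names (`attention_weights`, `random_tensor`, `shape_of`) and the toy sizes are assumptions made up for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k):
    """q: [batch, heads, q_len, head_dim], k: [batch, heads, k_len, head_dim]
    (nested lists). Returns softmax(q @ k^T / sqrt(head_dim)) with shape
    [batch, heads, q_len, k_len]."""
    head_dim = len(q[0][0][0])
    scale = 1.0 / math.sqrt(head_dim)
    return [
        [
            [
                softmax([scale * sum(a * b for a, b in zip(q_vec, k_vec))
                         for k_vec in k_head])
                for q_vec in q_head
            ]
            for q_head, k_head in zip(q_batch, k_batch)
        ]
        for q_batch, k_batch in zip(q, k)
    ]


def random_tensor(*shape):
    """Nested list of the given shape filled with uniform random floats."""
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(*shape[1:]) for _ in range(shape[0])]


def shape_of(t):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


random.seed(0)
batch_size, num_heads, query_length, key_length, head_dim = 2, 4, 3, 5, 8
q = random_tensor(batch_size, num_heads, query_length, head_dim)
k = random_tensor(batch_size, num_heads, key_length, head_dim)
weights = attention_weights(q, k)
print(shape_of(weights))  # [2, 4, 3, 5]
```

Each innermost row is a probability distribution over the key positions for one query position, which is why these weights (and their gradients) are useful for analysis and distillation.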

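Use cases

  • The SPO task in EHealth needs intermediate-layer outputs from Electra to compute its loss.
  • When using Transformer-based models, intermediate-layer attention and outputs are needed for analysis and distillation.
  • When interpreting Transformer models, the attention weights and their gradients are important intermediate results.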

Current Paddle implementation

Currently, the forward() method of MultiHeadAttention calls functions from paddle.nn.functional.

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/layer/transformer.py#L420

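Because functions from paddle.nn.functional are used (I am not sure whether this reduces memory consumption), hooks cannot capture the intermediate values either.

MultiHeadAttention does have a need_weights parameter that can be changed to output the weights. However, for models that are already built (especially the Ernie models), changing this parameter requires adjusting the inputs and outputs of every layer in the model, which is tedious.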
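The problem with flipping need_weights on an existing model can be illustrated with a toy sketch. The classes below (`ToyAttention`, `ToyEncoderLayer`) are hypothetical stand-ins written in plain Python, not Paddle's classes; they only mimic the pattern where the switch changes the return value from a single output to an (output, weights) pair.

```python
class ToyAttention:
    """Hypothetical stand-in for an attention layer with a need_weights switch."""

    def __init__(self, need_weights=False):
        self.need_weights = need_weights

    def forward(self, x):
        weights = [1.0 / len(x)] * len(x)  # placeholder: uniform weights
        pooled = sum(w * v for w, v in zip(weights, x))
        out = [pooled] * len(x)
        # The return type depends on the flag, as with a need_weights option.
        return (out, weights) if self.need_weights else out


class ToyEncoderLayer:
    """Wrapper written against the single-output signature."""

    def __init__(self, attn):
        self.attn = attn

    def forward(self, x):
        attn_out = self.attn.forward(x)
        # Residual connection; assumes attn_out is a list of floats.
        return [a + b for a, b in zip(x, attn_out)]


x = [1.0, 2.0, 3.0]
plain = ToyEncoderLayer(ToyAttention(need_weights=False)).forward(x)

try:
    ToyEncoderLayer(ToyAttention(need_weights=True)).forward(x)
    broke = False
except TypeError:
    # The wrapper received a tuple and tried to add a float to a list.
    broke = True

print(broke)  # True
```

In this toy, every wrapper that calls the attention layer would have to unpack the tuple and forward the weights upward, which is the per-layer adjustment described above.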

This request has also been raised in the PaddleNLP issues, but it seems more closely related to paddle.nn.layer.transformer.

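Alternatives

A simple approach is to use Layer modules instead of the paddle.nn.functional functions. The intermediate results can then be retrieved afterwards through hooks, and the change stays fully compatible with existing code and models.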
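The difference between the two styles can be sketched without any framework. The `ToyLayer` base class below is a hypothetical, minimal imitation of a layer with forward post-hooks (the method name mirrors Paddle's `register_forward_post_hook`, but none of this is Paddle code). A hook on the functional-style layer only sees the final output, while a hook on a softmax sub-layer captures the attention weights without changing any return signature.

```python
import math


class ToyLayer:
    """Hypothetical minimal layer that runs forward post-hooks after forward()."""

    def __init__(self):
        self._post_hooks = []

    def register_forward_post_hook(self, hook):
        self._post_hooks.append(hook)

    def __call__(self, *inputs):
        output = self.forward(*inputs)
        for hook in self._post_hooks:
            hook(self, inputs, output)
        return output


def softmax_fn(scores):
    """Plain function: its result is invisible to layer hooks."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxLayer(ToyLayer):
    def forward(self, scores):
        return softmax_fn(scores)


class FunctionalAttention(ToyLayer):
    """Weights live only in a local variable inside forward()."""

    def forward(self, scores, values):
        weights = softmax_fn(scores)
        return sum(w * v for w, v in zip(weights, values))


class ModularAttention(ToyLayer):
    """Same math, but the softmax is a sub-layer that hooks can observe."""

    def __init__(self):
        super().__init__()
        self.softmax = SoftmaxLayer()

    def forward(self, scores, values):
        weights = self.softmax(scores)
        return sum(w * v for w, v in zip(weights, values))


scores, values = [0.0, 1.0, 2.0], [10.0, 20.0, 30.0]

# Hook on the functional variant: only the final output is observable.
functional_seen = []
functional = FunctionalAttention()
functional.register_forward_post_hook(lambda layer, inp, out: functional_seen.append(out))
out_functional = functional(scores, values)

# Hook on the softmax sub-layer: the attention weights are captured.
captured = []
modular = ModularAttention()
modular.softmax.register_forward_post_hook(lambda layer, inp, out: captured.append(out))
out_modular = modular(scores, values)
```

Both variants return the same output to their callers, so existing models built on top would not need their inputs or outputs adjusted; only the hook target differs.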

@paddle-bot-old

paddle-bot-old bot commented Jul 4, 2022

Hi! We've received your issue and will arrange for technicians to answer it as soon as possible, so please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also look for answers in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

@guoshengCS
Contributor

guoshengCS commented Jul 6, 2022

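Hello, we have already started working on this at the PaddleNLP level, and it is a problem we are prioritizing: PaddlePaddle/PaddleNLP#2665. Because that work introduces special data structures such as ModelOutput, which differ somewhat from the conventions of other layers in the framework, we are addressing it in PaddleNLP first.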

@paddle-bot paddle-bot bot closed this as completed Jul 11, 2023
@paddle-bot

paddle-bot bot commented Jul 11, 2023

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
