-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【Hackathon 6th No.10】Add isposinf / isneginf / isreal / isin API to Paddle #834
Conversation
PyTorch 中的 torch.isposinf API文档 (https://pytorch.org/docs/stable/generated/torch.isposinf.html#torch-isposinf) | ||
PyTorch 中的 torch.isneginf API文档 (https://pytorch.org/docs/stable/generated/torch.isneginf.html#torch-isneginf) | ||
PyTorch 中的 torch.isreal API文档 (https://pytorch.org/docs/stable/generated/torch.isreal.html#torch-isreal) | ||
PyTorch 中的 torch.isin API文档 (https://pytorch.org/docs/stable/generated/torch.isin.html#torch-isin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已修改
|
||
## API实现方案 | ||
1. paddle.isposinf | ||
利用 paddle.isinf 与 paddle.signbit 组合实现 **(目前 paddle.signbit 中调用了 Tensor.numpy() 只能用于动态图,若需 paddle.isposinf 也能用于静态图,需要升级 paddle.signbit)** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要升级 paddle.signbit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好的
利用Tensor数据类型判断和 paddle.imag 实现 | ||
|
||
4. paddle.isin | ||
参考 pytorch 在 _decompose 中的设计:当test_elements元素个数较少时直接进行暴力搜索,较多时则采取基于排序的算法(利用 flatten,concat,index_put_,searchsorted等API组合实现)。暂时去掉 assume_unique 参数,因为**当前 paddle 的 argsort kernel 使用的是 std::sort 的不稳定排序,与 pytorch 和 numpy 的结果就会存在差异。若后期需要加 assume_unique 参数并用 argsort 实现 isin,则需要先实现 stable 的 argsort。** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
则需要先实现 stable 的 argsort
这一块工作量有多大?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我可以先试试看,我看 argsort 的 kernel 里面是用的 std::sort ,可以给 kernel 加个参数 stable
默认false,然后等于 true 的时候把 std::sort 换成 std::stable_sort 。这样的话就需要把用到 argsort kernel 相关的 python api 都要稍微改一下,包括 paddle.sort 和 paddle.argsort 。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这样的话就需要把用到 argsort kernel 相关的 python api 都要稍微改一下,包括 paddle.sort 和 paddle.argsort 。
好的,等你试完后再评估下,应该不影响原有paddle.sort和paddle.argsort的行为
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NKNaN 在API易用性升级工作中,也有一个sort/argsort支持stable稳定排序的工作,先做这个API升级的工作吧,然后再做is_in API新增,你可以在 先假定已经满足了你计划实现的 sort/argsort支持稳定排序这个前提下,来编写这篇设计文档。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NKNaN 可以先把其他三个API的PR弄好,merge。等【sort/argsort支持stable稳定排序的工作】完成后,再开始isin的工作
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 正确性验证:可以与 NumPy 的结果对齐; | ||
- 不同 shape; | ||
- 前向计算; | ||
- 计算dtype类型:验证 `float64`,`int32`等; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要写明每个API都支持哪些数据类型,paddle.isreal
下同
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好的
- 正确性验证:可以与 NumPy 的结果对齐; | ||
- 不同 shape; | ||
- 前向计算; | ||
- 计算dtype类型:验证 `float32`,`float64`,`int32`,`int64`(paddle.signbit 在 CPU 上的 kernel 没有注册 `float16`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paddle.isposinf,paddle.isneginf 这里的数据类型支持的不够全,是因为什么原因呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个需要用 isinf 和 signbit 组合实现。isinf 目前支持的是 float16、float32、float64、int32、int64,然后 signbit 目前调用了 paddle.sign,写测试案例的时候发现 sign kernel 在 CPU 上没有注册 float16 的数据类型。
如果要升级 signbit 的话可能是得再写个 signbit 的 kernel 了,可以绕过 paddle.sign。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
写测试案例的时候发现 sign kernel 在 CPU 上没有注册 float16 的数据类型。
这个没问题,不需要再写个 signbit 的 kernel 了绕过 paddle.sign。sign的数据类型是上次黑客松的时候评估过的。
isinf 目前支持的是 float16、float32、float64、int32、int64
isinf 可以扩展数据类型么。paddle.signbit支持 int8,int16,int32,int64,float16,float32 或 float64。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinf 可以扩展数据类型么。paddle.signbit支持 int8,int16,int32,int64,float16,float32 或 float64。
可以的,isinf kernel 注册的时候加一下 uint8,int8,int16 就行
这个没问题,不需要再写个 signbit 的 kernel 了绕过 paddle.sign。sign的数据类型是上次黑客松的时候评估过的。
好的,那 signbit 对静态图的兼容就用 paddle.static.nn.py_func 可以吗,因为现在最核心的部分用的是 np.copysign ,要把 Tensor 转换成 numpy 来处理
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以的,可以提个PR上来看下
利用Tensor数据类型判断和 paddle.imag 实现 | ||
|
||
4. paddle.isin | ||
参考 pytorch 在 _decompose 中的设计:当test_elements元素个数较少时直接进行暴力搜索,较多时则采取基于排序的算法(利用 flatten,concat,index_put_,searchsorted等API组合实现)。暂时去掉 assume_unique 参数,因为**当前 paddle 的 argsort kernel 使用的是 std::sort 的不稳定排序,与 pytorch 和 numpy 的结果就会存在差异。若后期需要加 assume_unique 参数并用 argsort 实现 isin,则需要先实现 stable 的 argsort。** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这样的话就需要把用到 argsort kernel 相关的 python api 都要稍微改一下,包括 paddle.sort 和 paddle.argsort 。
好的,等你试完后再评估下,应该不影响原有paddle.sort和paddle.argsort的行为
|
||
## API实现方案 | ||
1. paddle.isposinf | ||
利用 paddle.isinf 与 paddle.signbit 组合实现。**(目前 paddle.signbit 中调用了 Tensor.numpy() 只能用于动态图,需先升级 paddle.signbit 使其也能用于静态图)** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以单独提PR来升级paddle.signbit
使其也能用于静态图
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好的
argsort的kernel目前应该是cpu下是用的不稳定排序算法,gpu下是用的基数排序,应该是稳定的,这个还需要改吗,如果确定要改的话,修改的地方就是在cpu下加一个稳定排序的算法,然后在gpu下加一个不稳定排序的算法,但是这样改可能会影响现有的paddle.sort和paddle.argsort,因为现有的这个两排序都不区分稳不稳定,在cpu下是不稳定的,在gpu下是稳定的 |
(gpu的kernel排序也不是完全稳定的,根据输入的形状有一种情况会调用thrust::sort_by_key,也是不稳定的,其他情况都是调用的基数排序) |
@NKNaN 要不你把前三个RFC单独提一个PR,先合入?这样review PR的时候,保证已经有RFC在了。 |
好的 |
新增 paddle.isposinf,paddle.isneginf,paddle.isreal,paddle.isin API的设计文档