Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Typing] Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 #63901

Merged
merged 39 commits into from
May 31, 2024

Conversation

megemini
Copy link
Contributor

@megemini megemini commented Apr 26, 2024

PR Category

User Experience

PR Types

Others

Description

关联 PR : #63597

任务:1-4

在 CI 中引入 mypy 对 docstring 中示例代码的检查。与 xdoctest 的检查项共用一个 CI: PR-CI-Static-Check。如果后续有必要的话可以单独开一个 CI 。

涉及文件:

  • paddle/scripts/paddle_build.sh 添加 mypy 检查入口
  • python/unittest_py/requirements.txt 添加 mypy 的依赖,这里用最新的 1.10 版本
  • tools/mypy.ini 配置文件,后面应该会不断更新
  • tools/type_hints.py 使用 mypy 的 api 进行类型检查
  • tools/test_type_hints.py 针对 type_hinst.py 的单测
  • tools/sampcd_processor_utils.py 中的 get_api_md5 判断 api diff 时,考虑 signature 的变动
  • tools/test_sampcd_processor.py 修改单测,包括 get_api_md5 与 适应 0D 后的 xdoctest 检查

用于 type hint 检查的文件:

  • python/paddle/py.typed
  • python/setup.py.in
  • setup.py

另外,修改一个文件用于验证:

  • python/paddle/tensor/math.py

Copy link

paddle-bot bot commented Apr 26, 2024

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@megemini
Copy link
Contributor Author

megemini commented May 5, 2024

Update 20240505

将 mypy 的检查引入 PR-CI-Static-Check,通过修改 api 的 annotation 可以触发 mypy 检查 ~

这里需要

  • python/paddle/py.typed
  • python/setup.py.in
  • setup.py`

配合检查 ~

math.py 中修改两个 api

  • scale,只修改了 return 的 annotation,ci 中检测到 api 变动,示例代码由于不涉及此 api 的输出检查,因此,此 api 的示例代码通过 mypy 检查
  • stanh,为了验证,修改了输入参数,原本是 float ,改为 int ,ci 检查出此 api 变动,并且 mypy 的示例代码检查报错

另外,tools/mypy.ini 还只是最基础的配置,后面应该需要不断更新 ~

@SigureMo 请评审 ~

Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其余暂时没什么大问题,等 CI 跑完我看下效果~

另外有一个小问题,我们现在是复用了示例代码的增量检测逻辑对吧(即 API change)

那么有一个问题就是未来升级 mypy 的话,如果 mypy 出现了一些变动导致现有示例类型检查挂掉的话,是很难全部检查到的,有没有办法一次性全检查到呢?因为我们之前引入过各种各样的 linter,升级时出现的一些新的 error 基本是不可避免的

一些个人想法:

在 CI 标题里加 [typing_checking=all](后缀形式),则检查全部示例文档

这个 PR 我们只讨论方案,具体实现不在本 PR~

另外有一个问题就是,现在这样的话,对于其他不知情的同学来说,在一个示例代码里改了一个 typo CI 是会通过的吗?是否会导致现存的示例代码就通不过呢?

@@ -3525,6 +3525,26 @@ function exec_samplecode_test() {
fi
}

function exec_type_hints() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

唔,这里检查的地方用 type check 吧,比如 exec_type_checking

Comment on lines 339 to 341
patArgSpec = re.compile(
r'^(paddle[^,]+)\s+\(ArgSpec.*document\W*([0-9a-z]{32})'
r'^(paddle[^,]+)\s+\((ArgSpec.*),.*document\W*([0-9a-z]{32})'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pat_arg_spec

或者使用 PATTERN_ARG_SPEC 放在文件开始作为常量吧

@@ -2046,7 +2050,7 @@ def test_timeout(self):

def test_bad_statements(self):
docstrings_to_test = {
'bad_fluid': """
'good_fluid': """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fluid 检查和单测是不是可以退场了?如果使用了 fluid 单测执行时就会挂掉

可以之后单独 PR 退场一下~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同样,这个文件可以改成 type_checker.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

下个 PR 可以考虑将这些(含之前的 sample code)移到 test 目录,建一个 test/tools 子目录,不然这些单测没有 CI 的保护很容易失效的

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的!之前就想过这个问题,tools 里面的单测得统一管理!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

添加任务 #63597 任务 4-3

@megemini megemini requested a review from gouzil as a code owner May 10, 2024 05:52
TITLE_CHECK_ALL=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "type checking all" || true`

if [[ ${TITLE_CHECK_ALL} ]]; then
python type_checking.py --full-test; type_checking_error=$?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

关于这块的mypy_cache有个小问题,如果我根据 shell 脚本来跑,这个路径将会是在tools/.mypy_cache, 而不是${PADDLE_ROOT}/.mypy_cache。是故意这么设计的嘛emmm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个有什么影响吗? pyproject.toml 可以配置

[tool.mypy]
python_version = "3.8"
cache_dir = ".mypy_cache"

另外,刚好有个问题请教一下!

我这里 paddle_build.sh 中的 run_type_checking 参考的 check_run_sot_ci

    # use "git commit -m 'message, test=sot'" to force ci to run
    COMMIT_RUN_CI=$(git log -1 --pretty=format:"%s" | grep -w "test=sot" || true)
    # check pr title
    TITLE_RUN_CI=$(curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "sot" || true)
    if [[ ${COMMIT_RUN_CI} || ${TITLE_RUN_CI} ]]; then
        set -x
        return
    fi

但是,$(git log -1 --pretty=format:"%s" | grep -w "test=type_checking" || true) 没有效果,所以去掉了,只使用了 title 的判断条件 ~

我本地 $(git log -1 --pretty=format:"%s" | grep -w "test=type_checking" || true) 没问题 ~

这是咋回事儿?有啥办法?

Copy link
Member

@gouzil gouzil May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个有什么影响吗? pyproject.toml 可以配置

[tool.mypy]
python_version = "3.8"
cache_dir = ".mypy_cache"

影响就是,这样就会生成和维护两份缓存了

mypy python/paddle/tensor/math.py # cache 位置: ${PADDLE_ROOT}/.mypy_cache

cd tools/
python type_checking.py --full-test # cache 位置: ${PADDLE_ROOT}/tools/.mypy_cache

另外,刚好有个问题请教一下!

可以把git log -1 改成 git log -10或者更大,因为 ci 在运行之前会 git pull upstream develop 所以导致没有命中我们想要的 commit (sot那个我自己来修吧, 感谢大佬发现的问题)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

影响就是,这样就会生成和维护两份缓存了

😅 我草率了 ... ... 当时没想这么多 ... ...

那我在 type_checking.py 里面改为绝对路径吧 ~

感谢!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gouzil 已修改 ~

@megemini
Copy link
Contributor Author

Update 20240510

  • mypy 的配置放到 pyproject.toml

  • mypy 的 cache dir 使用绝对路径

  • 去掉 --debug 参数,减少日志输出(不过,由于 CI 日志的特殊性,仍然会输出两次 error 日志)

  • 根据 PR 的 title 判断是否进行 type checking 或者 type checking all
    参考 @gouzil 的实现方式,但是没判断 git log (有点不稳定),只判断 title ~

    • title 中有 type checking ,则检查 diff 的 api
    • title 中有 type checking all,则检查全部 api
    • title 中没有以上两者,则不检查

    另外,paddle_build.sh 中的 run_type_checking 以及 if [[ ${run_tc} -eq 0 ]]; then 在标注任务结束后就可以去掉了 ~

          run_type_checking
          run_tc=$?
    
          if [[ ${run_tc} -eq 0 ]]; then
              { type_checking_info=$(exec_type_checking 2>&1 1>&3 3>/dev/null); } 3>&1
              type_checking_code=$?
          fi
    
          summary_check_example_code_problems $[${example_code_gpu} + ${example_code}] "${example_info_gpu}\n${example_info}"
    
          if [[ ${run_tc} -eq 0 ]]; then            
              summary_type_checking_problems $type_checking_code "$type_checking_info"
          fi
  • # type: ignore 可以用在 codeblock 最开始,或者中间行(单独一行),以规避 codeblock 整体的检查
    mypy 本身支持 # type: ignore 用在文件开头从而避免文件被检查,但是 doctest.DocTestParser().get_examples 会把单行注释给删除掉,所以我这里单独做了处理,并且支持在 codeblock 中间插入单行 # type: ignore,此行之后的也会规避检查

                      >>> # type: ignore
                      >>> # doctest: -REQUIRES(env:GPU)
                      >>> blabla
                      >>> print(1-1)
                      0
  • 其他修改,如修改文件名等

另外,目前这个 PR-CI-Static-Check 是做的全量检查,日志可以保存一下以供参考 ~

@SigureMo 请评审 ~

Copy link

paddle-ci-bot bot commented May 19, 2024

Sorry to inform you that e7be07d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Copy link

paddle-ci-bot bot commented May 28, 2024

Sorry to inform you that 618c3b9's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -4536,7 +4634,20 @@ function main() {
api_example)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 api_example 应该是很久以前的 CI 流水线入口,我们现在检查示例代码在 Static-Check 流水线,对应上面 build_and_check_cpubuild_and_check_gpu,这个入口没用了,直接删掉就好


if [[ ${run_tc} -eq 0 ]]; then
summary_type_checking_problems $type_checking_code "$type_checking_info"
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

示例代码和类型检查是否可以分别抽一个函数呢?CI 入口一般只做分发,不适合写太多逻辑~

summary_check_problems $[${example_code_gpu} + ${example_code}] "${example_info_gpu}\n${example_info}"

run_type_checking
run_tc=$?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run_tc=$?
type_checking_status=$?

感觉这样语义会更明确些

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

啊,不对,我才发现是上面的 run_type_checking 的语义不对,上面的 run_type_checking 乍一看是已经开始跑类型检查了,但实际上只是检查了标题,是有点奇怪的,可以改成 check_need_type_checking 其结果为 need_type_checking

config_file=(
args.config_file
if args.config_file
else (base_path + '/../pyproject.toml')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议使用 os.path.join,或者使用 pathlib,避免直接字符串拼接

set +x

# check pr title
TITLE_CHECK=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "type checking" || true`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TITLE_CHECK=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "type checking" || true`
TITLE_CHECK=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -iE "type checking|typing" || true`

加一个 typing 吧,我比较喜欢用 [Typing] 前缀

@megemini
Copy link
Contributor Author

megemini commented May 29, 2024

Update 20240529

  • paddle_build.sh 中将 exec_samplecode_testexec_type_checking 统一放到新增的 exec_samplecode_checking
  • 检查标题统一改为 typingtyping all
  • 修改 type_checking.py 中的路径表示

另外:

示例代码和类型检查是否可以分别抽一个函数呢?CI 入口一般只做分发,不适合写太多逻辑~
啊,不对,我才发现是上面的 run_type_checking 的语义不对,上面的 run_type_checking 乍一看是已经开始跑类型检查了,但实际上只是检查了标题,是有点奇怪的,可以改成 check_need_type_checking 其结果为 need_type_checking

这两个问题我一起回复一下 :这里是临时的逻辑,后面会去掉 need_type_checking,也就是默认进行检查,所以这里的逻辑复杂了点 ~

@megemini megemini changed the title 【Type Checking All】Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 【Typing All】Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 May 29, 2024
@SigureMo
Copy link
Member

image

加速效果这么明显的?2 小时直接减少到 10 分钟跑全量?

@megemini megemini changed the title 【Typing All】Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 【Typing】Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 May 31, 2024
@SigureMo SigureMo changed the title 【Typing】Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 May 31, 2024
@@ -322,7 +327,7 @@ def scale(x, scale=1.0, bias=0.0, bias_after_scale=True, act=None, name=None):
return helper.append_activation(out)


def stanh(x, scale_a=0.67, scale_b=1.7159, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

这里正常报错,没问题

我删掉了标题里的 typing,再触发一次看看是否可以按照预期没有检查

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

类型检查跳过了,符合预期~

这里可以恢复下了~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

话说这里能做到只修改类型提示的情况下,由我来 review 么?不然不利于后续 review 推进

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已经确认,可以在下个 PR 这么搞下

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SigureMo SigureMo changed the title Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 [Typing] Paddle 的 CI 中引入 mypy 对于 API 中 docstring 的示例代码的类型检查 May 31, 2024
Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾

Copy link
Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SigureMo SigureMo merged commit 949d885 into PaddlePaddle:develop May 31, 2024
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants