Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestSecurePairingSession fails on linux_x64_gcc_mbedtls, but only on some machines #3348

Closed
mspang opened this issue Oct 20, 2020 · 5 comments · Fixed by #3357 or #3405
Closed

TestSecurePairingSession fails on linux_x64_gcc_mbedtls, but only on some machines #3348

mspang opened this issue Oct 20, 2020 · 5 comments · Fixed by #3357 or #3405
Assignees
Labels
bug Something isn't working

Comments

@mspang
Copy link
Contributor

mspang commented Oct 20, 2020

Looks like it fails post main in static dtor:

yuif:27585:~/connectedhomeip% gdb out/debug/linux_x64_gcc_mbedtls/tests/TestSecurePairingSession
GNU gdb (Debian 9.2-1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from out/debug/linux_x64_gcc_mbedtls/tests/TestSecurePairingSession...
(gdb) r
Starting program: /home/spang/connectedhomeip/out/debug/linux_x64_gcc_mbedtls/tests/TestSecurePairingSession 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
'#0:','Test-CHIP-SecurePairing'
'#3:','WaitInit ','PASSED'
'#3:','Start    ','PASSED'
'#3:','Handshake','PASSED'
'#3:','Serialize','PASSED'
'#6:','0','4'
'#7:','0','29'

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7bf9537 in __GI_abort () at abort.c:79
#2  0x0000555555597097 in chip::ReferenceCounted<chip::SecurePairingSessionDelegate, chip::DeleteDeletor<chip::SecurePairingSessionDelegate> >::Release (this=0x7fffffffd928) at ../../src/lib/core/ReferenceCounted.h:67
#3  0x0000555555595a15 in chip::SecurePairingSession::~SecurePairingSession (this=0x5555556180a0 <gTestPairingSession1>, __in_chrg=<optimized out>) at ../../src/transport/SecurePairingSession.cpp:54
#4  0x00007ffff7c125a7 in __run_exit_handlers (status=0, listp=0x7ffff7d92718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#5  0x00007ffff7c1274a in __GI_exit (status=<optimized out>) at exit.c:139
#6  0x00007ffff7bfacd1 in __libc_start_main (main=0x55555555b535 <main()>, argc=1, argv=0x7fffffffdc48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdc38) at ../csu/libc-start.c:342
#7  0x000055555555b47a in _start ()

@pan-apple

@issue-label-bot issue-label-bot bot added the bug Something isn't working label Oct 20, 2020
@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.96. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@pan-apple
Copy link
Contributor

@mspang does it consistently fail on some machines? Do we have docker container where I can repro and test the fix?

@mspang
Copy link
Contributor Author

mspang commented Oct 20, 2020

It fails 100% of the time on one of my local linux machines, and 0% on another linux machine and my macbook.

Doesn't seem to fail on GitHub actions either.

@pan-apple
Copy link
Contributor

I'll assign it to myself. I'll dig a bit more later today and tomorrow.
@mspang , if I give you a patch, would you be able to try it on your local machine where it consistently fails?

@pan-apple pan-apple self-assigned this Oct 20, 2020
mspang added a commit to mspang/connectedhomeip that referenced this issue Oct 21, 2020
This needs a re-think; there are several problems currently:

- refcounting of objects with static storage duration
- refcounting of objects with automatic storage duration (stack objects)
- refcounting of subobjects
  - including multiple base class subobjects (diamond inheritance)

The majority of current uses are in contexts where lifetime could not be
extended by reference counting and in general, the initial reference is
not dropped. The remaining case looks like a case of unique ownership.

If delegate interfaces really need to be refcounted, we'll need to ban
multiple inheritance of them (without resorting to virtual inheritance).

fixes project-chip#3348
@mspang mspang self-assigned this Oct 21, 2020
mspang added a commit to mspang/connectedhomeip that referenced this issue Oct 21, 2020
TestSecurePairingDelegate must live as long as its SecurePairingSession,
since the latter holds a pointer to the former.

Move SecurePairingSession onto the heap and move
TestSecurePairingDelegate to the main test function so its lifetime can
be ended after deleting SecurePairingSession.

fixes project-chip#3348
@mspang
Copy link
Contributor Author

mspang commented Oct 21, 2020

It's a use-after-free - the statically allocated SecurePairingSession tries to release a reference to a TestSecurePairingDelegate which was already de-allocated (it's on the stack, and we've already returned from main).

Accessing the refcount through the dangling pointer happens to read zero on the system that triggers the abort.

Fixed by #3357 (#3356 also makes the symptom go away, although we still have a dangling ptr - we just don't access it).

mspang added a commit to mspang/connectedhomeip that referenced this issue Oct 21, 2020
TestSecurePairingDelegate must live as long as its SecurePairingSession,
since the latter holds a pointer to the former.

Move SecurePairingSession onto the heap and move
TestSecurePairingDelegate to the main test function so its lifetime can
be ended after deleting SecurePairingSession.

fixes project-chip#3348
mspang added a commit that referenced this issue Oct 22, 2020
TestSecurePairingDelegate must live as long as its SecurePairingSession,
since the latter holds a pointer to the former.

Move SecurePairingSession onto the heap and move
TestSecurePairingDelegate to the main test function so its lifetime can
be ended after deleting SecurePairingSession.

fixes #3348
mspang added a commit that referenced this issue Oct 22, 2020
This needs a re-think; there are several problems currently:

- refcounting of objects with static storage duration
- refcounting of objects with automatic storage duration (stack objects)
- refcounting of subobjects
  - including multiple base class subobjects (diamond inheritance)

The majority of current uses are in contexts where lifetime could not be
extended by reference counting and in general, the initial reference is
not dropped. The remaining case looks like a case of unique ownership.

If delegate interfaces really need to be refcounted, we'll need to ban
multiple inheritance of them (without resorting to virtual inheritance).

fixes #3348
mspang added a commit to mspang/connectedhomeip that referenced this issue Oct 22, 2020
This needs a re-think; there are several problems currently:

- refcounting of objects with static storage duration
- refcounting of objects with automatic storage duration (stack objects)
- refcounting of subobjects
  - including multiple base class subobjects (diamond inheritance)

The majority of current uses are in contexts where lifetime could not be
extended by reference counting and in general, the initial reference is
not dropped. The remaining case looks like a case of unique ownership.

If delegate interfaces really need to be refcounted, we'll need to ban
multiple inheritance of them (without resorting to virtual inheritance).

Reland with some #include needed after project-chip#3397.

fixes project-chip#3348
@mspang mspang reopened this Oct 22, 2020
@mspang mspang closed this as completed Oct 23, 2020
kpschoedel pushed a commit to kpschoedel/connectedhomeip that referenced this issue Oct 23, 2020
TestSecurePairingDelegate must live as long as its SecurePairingSession,
since the latter holds a pointer to the former.

Move SecurePairingSession onto the heap and move
TestSecurePairingDelegate to the main test function so its lifetime can
be ended after deleting SecurePairingSession.

fixes project-chip#3348
BroderickCarlin pushed a commit that referenced this issue Oct 26, 2020
* Remove refcounting (reland of #3356)

This needs a re-think; there are several problems currently:

- refcounting of objects with static storage duration
- refcounting of objects with automatic storage duration (stack objects)
- refcounting of subobjects
  - including multiple base class subobjects (diamond inheritance)

The majority of current uses are in contexts where lifetime could not be
extended by reference counting and in general, the initial reference is
not dropped. The remaining case looks like a case of unique ownership.

If delegate interfaces really need to be refcounted, we'll need to ban
multiple inheritance of them (without resorting to virtual inheritance).

Reland with some #include needed after #3397.

fixes #3348

* Add missing #include <CHIPMem.h>

* Add more missing includes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants