Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

z2m crashing when (re?)joining device #21118

Closed
adynis opened this issue Jan 31, 2024 · 11 comments
Closed

z2m crashing when (re?)joining device #21118

adynis opened this issue Jan 31, 2024 · 11 comments
Labels
ezsp Issues related to deprecated ezsp driver problem Something isn't working

Comments

@adynis
Copy link

adynis commented Jan 31, 2024

What happened?

Recently I started to have lots of issues with my zigbee network (missed commands and big delays which affects also other automations). Trying to "put my finger" on something more precise, I focused on one particular case: After I've seen that my aqara switch (WS-EUK04) was not reacting anymore, I tried to re-join it to zigbee2mqtt, and after many trials and issues I understood that actually this makes my zigbee2mqtt to restart !!
So I tried to replicate it and capture somehow the logs.
Here are the details:

  • I'm using HomeAssistant on a HomeAssistantYellow device, zigbee2mqtt addon 1.35.1-1, enabled "zigbee_herdsman_debug" .
  • I enabled "Permit Join (all)"
  • Then on the Aqara wall-switch I kept it's button pressed to force it to re-join the network
  • As a result I've seen tons of red notification messages in zigbee2mqtt frontend saying that device it's leaving the network
  • Then a notification that device was identified, and trying configuration
  • and then ... some many seconds with nothing (actually here it restarted) ,
  • and then : Failed to configure (Sometimes I can see before also a notification: Zigbee2mqtt has started )
  • Shortly, from the log, I see as last line before stopping: "Error: {"address":28623,"clusterId":32773,"sequence":13} after 10000ms"
  • later edit : 2h later i observe that after all my previous tests with one switch.... other 2 devices (although appearing as connected) they were not communicating anymore... i had to rejoin from multiple trials...

If I try multiple times, finally I can make it to properly configure it ... but ... this is something strange which maybe/hopefully explains also my other issues.

I tried to get the logs:

  • Since it was restarting, I was not sure how to get them from HomeAssistant because I was loosing the old log before restarting
  • I used the command: "docker logs --follow CONTAINER_ID 2>&1 | tee log.txt"
  • I attach here this log, hopefully it helps

Please help since the WAF is starting to decrease 😭

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.35.1-1

Adapter firmware version

6.10.3.0 build 297

Adapter

EZSP v8

Setup

HomeAssistant on HomeAssistantYellow

Debug log

log_with_crash.txt

@adynis adynis added the problem Something isn't working label Jan 31, 2024
@Koenkk Koenkk added the ezsp Issues related to deprecated ezsp driver label Jan 31, 2024
@adynis
Copy link
Author

adynis commented Jan 31, 2024

Just to explain myself better: WAF == "Wife Acceptance Factor" 🙃

@Koenkk
Copy link
Owner

Koenkk commented Feb 1, 2024

Check with 1.35.2, it contains some EZSP related fixes

@adynis
Copy link
Author

adynis commented Feb 2, 2024

Thanks @Koenkk , I updated to 1.35.2 and I strongly belive that some improvement exists !!
The good part was that I was able to immediately re-join the aqara (from 1st trial after removing it) which was extremely improbable to happen last days. I need to test more, but ... right now after ~15mins, it looks better, I can't find any abnormal delay in sending actions. (e.g.: press the button and 1 out of 4 lights turns on after some seconds ... ) .

The "improvable" part is that ... I was still able to trigger a restart when I tried to re-join an aqara relay. Fortunatelly I logged this, so ... here is the last part of the log:
log_with_adapter_disconnected.txt

One "teaser" from the log:

2024-02-02T03:39:54.862Z zigbee-herdsman:adapter:ezsp:uart Port closed. Error? null
2024-02-02T03:39:54.863Z zigbee-herdsman:adapter:ezsp:ezsp onSerialClose()
2024-02-02T03:39:54.863Z zigbee-herdsman:adapter:ezsp:driv onEzspClose()
2024-02-02T03:39:54.863Z zigbee-herdsman:adapter:ezsp:debg onDriverClose()
2024-02-02T03:39:54.863Z zigbee-herdsman:controller:log Adapter disconnected'
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:driv Stopping driver
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:ezsp Closing Ezsp
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:uart Closing UART
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:ezsp onSerialClose()
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:driv onEzspClose()
2024-02-02T03:39:54.864Z zigbee-herdsman:adapter:ezsp:debg onDriverClose()
Zigbee2MQTT:error 2024-02-02 05:39:54: Adapter disconnected, stopping
Zigbee2MQTT:debug 2024-02-02 05:39:54: Saving state to file /config/zigbee2mqtt/state.json

What I did:
I enabled "Permit Join" , then I just press ~5 seconds on the aqara relay (SSM-U01) to for it exit the network and re-join, and then ... I've seen on the mobila somehing with red and shorter words ... and then the counting down of "Permit Join" stopped. Then I went to my pc, enabled the loging, and re-did the same, and fortunatelly I catched the log. The 3rd time it worked correctly without errors.

@defibrill
Copy link

I have the similar situation -- when i try to pair new device, it defines as unsupported, then i force delete device and when try to re-pair, Z2M crashes.
It started with the addition of 30 or so devices, but using dances with tambourins I was able to pair 20 more devices ))

BTW, how many devices in your network?

@adynis
Copy link
Author

adynis commented Feb 3, 2024

I have 72 zigbee devices now 😊 And ... from last days/weeks I fully resonate with the "dancing with tambourins" you mentioned :))) I had to do the same ... :-D

Related to this problem: If it helps to do additional tests, let me know !

PS. After 1 day I become quite convinced that 1.35.2 really helped my zigbee network. I had absolutly no strange behaviour anymore while "playing" with the lights in my house. Only joining/re-joining remains "on the table".

@Koenkk
Copy link
Owner

Koenkk commented Feb 3, 2024

Credits go to @Nerivec 😄

@adynis
Copy link
Author

adynis commented Feb 25, 2024

Today I added a new zigbee device. After that I observed accidentally that ... another one (a realay) was not reacting anymore, altough it appears as online, ok, but ... I can't communicate with it (and it's not easy accessible to re-join it). So I had an idea: Let's remove and re-join my wall switch (which is easily accessible) maybe it triggers any spirit somewhere . As a result: 9 logs with z2m crashes ... and the relay still doesn't work ... :(

What I was doing:
Mainly pressing 6 seconds on the wall switch to force it to re-join network ... (+ allow join, and a couple of times I "force removed it" + some other minor details which are not relevant, I suppose).
In all logs the "issue" it's exactly at the end of the file ...

On a quick look the errors are not really identical, I think, ... but here's a "teaser" :

2024-02-25T04:15:42.349Z zigbee-herdsman:adapter:ezsp:uart --> Error: Error: {"sequence":7} after 2000ms
2024-02-25T04:15:42.349Z zigbee-herdsman:adapter:ezsp:uart -!- break waiting (7)
2024-02-25T04:15:42.349Z zigbee-herdsman:adapter:ezsp:uart Can't send DATA frame (6,7,0): 4e00010500
2024-02-25T04:15:42.455Z zigbee-herdsman:adapter:ezsp:ezsp Time to watchdog ... 0
2024-02-25T04:15:42.455Z zigbee-herdsman:adapter:ezsp:ezsp ==> nop: null
2024-02-25T04:15:42.850Z zigbee-herdsman:adapter:ezsp:uart ->> DATA (6,7,1): 4e00010500
2024-02-25T04:15:42.850Z zigbee-herdsman:adapter:ezsp:uart --> [6f0c21a9512ac5717e]
2024-02-25T04:15:42.851Z zigbee-herdsman:adapter:ezsp:uart -?- rewaiting (7)
2024-02-25T04:15:44.424Z zigbee-herdsman:adapter:ezsp:uart <-- [260fb1a97d312a15b658964d24ab5593]
2024-02-25T04:15:44.426Z zigbee-herdsman:adapter:ezsp:uart <-- [499c6aef6566229874f1de2183fc7d5e]
2024-02-25T04:15:44.426Z zigbee-herdsman:adapter:ezsp:uart <-- [7d3a0901cdde6f8ffd193b7e]
2024-02-25T04:15:44.427Z zigbee-herdsman:adapter:ezsp:uart --> parsed 260fb1a9112a15b658964d24ab5593499c6aef6566229874f1de2183fc7e1a0901cdde6f8ffd193b7e
2024-02-25T04:15:44.427Z zigbee-herdsman:adapter:ezsp:uart <-- DATA (2,6,0): 260fb1a9112a15b658964d24ab5593499c6aef6566229874f1de2183fc7e1a0901cdde6f8ffd193b7e
2024-02-25T04:15:44.427Z zigbee-herdsman:adapter:ezsp:uart --> ACK  (3)
2024-02-25T04:15:44.427Z zigbee-herdsman:adapter:ezsp:uart --> [83401b7e]
2024-02-25T04:15:44.428Z zigbee-herdsman:adapter:ezsp:uart <-- ACK (6): 260fb1a9112a15b658964d24ab5593499c6aef6566229874f1de2183fc7e1a0901cdde6f8ffd193b7e
2024-02-25T04:15:44.428Z zigbee-herdsman:adapter:ezsp:uart Unexpected packet sequence 6 | 7
2024-02-25T04:15:44.852Z zigbee-herdsman:adapter:ezsp:uart --> Error: Error: {"sequence":7} after 2000ms
2024-02-25T04:15:44.853Z zigbee-herdsman:adapter:ezsp:uart -!- break rewaiting (7)
2024-02-25T04:15:44.853Z zigbee-herdsman:adapter:ezsp:uart Can't resend DATA frame (6,7,1): 4e00010500
2024-02-25T04:15:44.853Z zigbee-herdsman:adapter:ezsp:ezsp onSerialReset()
2024-02-25T04:15:44.853Z zigbee-herdsman:adapter:ezsp:driv onEzspReset()
2024-02-25T04:15:44.854Z zigbee-herdsman:adapter:ezsp:driv Reset connection.
2024-02-25T04:15:44.854Z zigbee-herdsman:adapter:ezsp:driv Stopping driver
2024-02-25T04:15:44.855Z zigbee-herdsman:adapter:ezsp:ezsp Closing Ezsp
2024-02-25T04:15:44.855Z zigbee-herdsman:adapter:ezsp:uart Closing UART

log12.txt
log13.txt
log14.txt
log15.txt
log16.txt
log17.txt
log18.txt
log19.txt
log20.txt

@maciekdnd
Copy link

maciekdnd commented Mar 2, 2024

Same here. I have Dongle-E 7.3.2, 40+ devices, Aqara H1, Aqara D1, 6 Moes thermostats. From time to time I have to rejoin switch or thermostat that is visible, online but not responding. When I try to reconnect, Z2M crashes. After several trials whole and many network crashes somehow I can repair device and whole network stabilize. Also, when I have permit join enabled, I have to mess with enabling and disabling timed join permit to adapt device again.

@defibrill
Copy link

I tried to rebuild the network on ZHA.
Everything is the same there, the situation is similar. The only thing that helped me to get stability of the network, was to reduce the number of routers to 30.
The remaining 20+ had to be transferred to another coordinator.
This problem only occurs with EFR32MG21, the other CC2652P coordinator is working fine.

@Nerivec
Copy link
Collaborator

Nerivec commented Mar 5, 2024

@defibrill Your issues sound only partially related to this joining problem. My first guess would be, you have one or more routers that don't play well with the EFR coordinator, it takes only one to break an entire part of the network unfortunately... There are people with 100+ devices on EFR without much trouble on either ezsp or ember drivers (highest I know of is 180!). If you are doing some testing and don't mind re-pairing a couple of times (if you don't want to stay on it after that), you can try ember, and check the logs every time you join a new device (start with the ones you know to be older). It will tell you during the interview if the device has an older stack revision; anything pre-21 is definitely expected to cause trouble.
Look for lines that start with [ZDO] Node descriptor for

PS: We'll make this more accessible in the near future.

@adynis
Copy link
Author

adynis commented Mar 7, 2024

I have ~1 week on ember. I think I had no crash anymore (or If there was ... it was ... neglecteble). In last days, because of other issue, I had to play again with join/re-join/etc and it just allowd me to test without issues on Z2M restarting. For testing reasons today, I moved back on ezsp in settings. Then tried to rejoin and ... IMMEDIATELY crashes !? one after the other. this made me convinced that ember is a good solution for resolving crashes (so this ticket could have as a workaround: "move to ember" :-p )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ezsp Issues related to deprecated ezsp driver problem Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants