-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write really does need a timeout #5
Comments
From my experience, the symptoms you identify usually point to a power supply issue. The RPi specifications list 50ma max current draw from the 3.3v pins, so if you are using a module (ie: LNA + PA) that draws more, it is likely you just don't have enough current for the radio module to operate, so it resets. As far as the infinite looping, I look at it this way:
In an extremely mission critical or high availability situation, I can see how having a timeout period would allow error checking and reporting of hardware failures, but again most users should never encounter the issue, and with an Arduino/RPi, I see it potentially causing more harm than good. |
On 2014-05-15 16:34, TMRh20 wrote:
I don't have an external amplifier on this module so I doubt the current
Well, I'm apparently seeing one :)
This is not an accurate statement. As long as the hardware can be made
I just encountered it. Error reporting is important, you can't have a Thanks |
You keep saying you have encountered a hardware failure. I would argue that the cause has not been shown to be anything more than user error in wiring or configuration of hardware, which should not be addressed by software.
I guess you have never seen the error message " "Windows has been shut down to prevent damage to your computer." The point is that it depends on the type of error and potential outcomes. My statement was meant to explain that the current functionality is designed to protect your hardware. With the timeout added, the radio could constantly be reset and keep overloading the circuits for hours on end, potentially causing additional hardware issues or actual failures.
If you disconnect a module, you have to reconnect it and restart the program. That is all, regardless of whether there is a timeout or not, the fix is exactly the same.
Definition of reliability:
Reliability engineering, the ability of a system or component to perform its required functions under stated conditions for a specified period of time. The stated conditions of operation do not include hardware issues. Fix the hardware. |
On 2014-05-15 17:54, TMRh20 wrote:
I encountered a situation where your driver gets into an infinite loop,
I have, and it was my job to investigate those cases and make it so the
I don't buy that justification at all. With a timeout, you can detect
That is not true. With a timeout, you could query the radio after the
I don't question that the hardware needs to be fixed, still, I'd very You're doing the right thing by not clearing the interrupt flags in the |
Your arguments have so many flaws, contradictions and holes in them, this is going nowhere, so I'll keep it simple. Demonstrate how these issues can be detected, resolved, and the user notified without any human intervention, and I will include a timeout. |
I've argued enough about it. I've seen my share of stupid arguments in open source but you're probably arguing one of the least defendable changes I've ever seen as a driver developer. I can't work with you when you're of bad faith. Anyhow, failure detection is trivial:
Resolving the problem in the "reset" case is just about re-running radio.begin(). It will magically recover (until the next issue) - unless the lib prevents it for no good reason. |
Do you really think calling my arguments "stupid" and calling all the work I and fellow contributors have done "pitiful" is the proper way to handle the situation? You could simply wait for others to weigh in. I'm sure that if I am that far off base, many other users will share your opinion. |
Woah... some hard words here! Calm it down. Sven, couldn't you create a Pull Request with your suggestions? |
Woah -- With such a trivial change, I'm sure you could implement it (without issues, bugs, and remaining backwards compatible) sven. |
I hear two philosophies of error handling at odds here: TMRh20: "My statement was meant to explain that the current functionality is designed to protect your hardware. With the timeout added, the radio could constantly be reset and keep overloading the circuits for hours on end, potentially causing additional hardware issues or actual failures." Sven337: "With a timeout, you could query the radio after the timeout occurs and notice that it's not there (reading all 1s) or has resetted, and you can then handle the error situation without human Both views may make sense in some use cases. Can we reach a compromise which allows the user to chose the approach that makes the most sense to their situation? How about an OPTIONAL timeout which attempts to put the radio in a safe state (eg: power down) and sets a disable flag that inhibits restarting until manually cleared. That should protect the hardware from accidental repeated resets, one of TMRh20's concerns. The library caller can however clear the disable flag and restart the radio if that makes sense for their use case, that is they have the option to consciously over-ride the default protective approach TMRh20 advocates. Perhaps even conditionally compile this option. I'd rather that than have the code base forked just to add timeouts for some users. This repository has the makings of one nRF24L01+ library for most purposes. ATMega Arduinos, ATtiny, Due, Teensy and RPi support, fast and flexible - and still improving. Up to you, TMRh20, of course. |
Since we are talking philosophy here, mine is that this is open-source and I am no dictator, I try to be more of quality control and developement. That being said, as always, you do make a pretty convincing argument. |
Per issue #5 by sven337 - Add optional failure handling/timeout - Un-comment #define FAILURE_HANDLING in RF24_config.h to enable - Add radio.errorDetected variable - indicates if an error/timeout was detected example: if(radio.failureDetected){ delay(1000); radio.begin(); radio.failureDetected = 0; radio.openWritingPipe(addresses[1]); radio.openReadingPipe(1,addresses[0]); report_failure(); //blink leds, send a message, etc. } Additional: - removed unused wide_band boolean - remove rx buffer flushes for RPi also
Ok, I've finally made some changes to the code that I hope will suit all users and potential scenarios. I'm still not sure this is the best solution, so feel free to make suggestions. Functionality and reasoning are as follows:
5, Any other previously configured settings should be re-configured Notes: This issue will be left open for a while in case further changes are needed. |
Hello,
a library should not do infinite loops:
//Wait until complete or failed
// If this hangs, it ain't coming back, no sense in timing out
while( ! ( get_status() & ( _BV(TX_DS) | _BV(MAX_RT) ))) { }
Whether the radio is coming back or not is irrelevant - the program should not infinite loop if the radio is broken!
Case in point, my radio has an issue: it will reset (as if it got an electrical reset) at random, for reasons that are unclear to me.
Once it has resetted, the next write call will never succeed, and with your lib write goes in an infinite loop.
The correct thing to do in my opinion would be to add a timeout to write().
More details about my radio issues follow, in case you're interested or have an idea. I have a Arduino sketch that sends a stream of radio packets.
On the receiver side this is the configuration:
================ RASPBERRY PI SPI Configuration ================
CSN Pin = Custom GPIO8
CE Pin = Custom GPIO18
Clock Speed = 8 Mhz
==================== NRF24L01 Configuration ====================
STATUS = 0x46 RX_DR=1 TX_DS=0 MAX_RT=0 RX_P_NO=3 TX_FULL=0
RX_ADDR_P0-1 = 0xe7e7e7e7e7 0xf0f0f0f0f0
RX_ADDR_P2-5 = 0xf1 0xf2 0xc5 0xc6
TX_ADDR = 0xe7e7e7e7e7
RX_PW_P0-6 = 0x00 0x04 0x04 0x04 0x00 0x00
EN_AA = 0x3f
EN_RXADDR = 0x0f
RF_CH = 0x5f
RF_SETUP = 0x27
CONFIG = 0x0f
DYNPD/FEATURE = 0x00 0x00
Retries = 15 * 4000 us
Data Rate = 250KBPS
Model = nRF24L01+
CRC Length = 16 bits
PA Power = PA_MAX
It receives the stream just fine. However, if the stream is interrupted (either because the sender stopped sending, or because I switch the radio to PTX mode), then the radio may reset to its factory defaults. (2Mbit/s, default addresses, default channel, and so on).
After this happens, any write() attempt will fail, and TX_DS and MAX_RT don't even get set (not sure why), so the write() goes into an infinite loop with your lib.
Thanks
The text was updated successfully, but these errors were encountered: