-
-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nucleotide-count: add case-agnostic test #959
nucleotide-count: add case-agnostic test #959
Conversation
de6356a
to
8a9a71c
Compare
@@ -50,6 +50,17 @@ | |||
} | |||
}, | |||
{ | |||
"description": "is case-agnostic against a strand of mixed-case nucleotides", | |||
"property": "nucleotideCounts", | |||
"strand": "aGcTtTtCaTtCtGaCtGcAaCgGgCaAtAtGtCtCtGtGtGgAtTaAaAaAaGaGtGtCtGaTaGcAgC", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would AaCcGgTt
also be valid for this test?
Is there a reason why this needs to be so long and mixed up?
If I'm debugging my program I'd like to be able to manually verify the expected inputs and outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Insti
The input was taken directly from the original case above ("strand with multiple nucleotides").
I wanted to keep it consistent:
-> AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
-> aGcTtTtCaTtCtGaCtGcAaCgGgCaAtAtGtCtCtGtGtGgAtTaAaAaAaGaGtGtCtGaTaGcAgC
Any other suggested input is welcomed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also object to the case above, but wasn't asking you to change that one.
I've never seen those in lowercase when talking about nucleotides, therefore I'm against that change |
I agree with @NobbZ unless a mixed case example can be sited I would expect there would be a screening process ahead of the solution to screen for bad input. |
Well, only halfway… If it is a valid input, we need to test it, if it is invalid input then we need to discuss if it is worth testing as an additional invalid case or not compared to those invalids that currently exist… Since I do not have domain knowledge about biology or biochemistry, I have to assume, that those written representations of DNA that I've seen so far were correct and canonical. Among those I've never seen any lower case letter. Therefore I also assume given inputs with a lowercase character as invalid. If my assumption is right, we need to decide wether we should test invalids or not, but if we do, we should explicitely for lowercase nucleotides because of
|
This is the best example I could quickly find that shows mixed case. |
I found this:
When searching for it, there are multiple sources that suggest it is totally fine to use mixed cases to increase readability of some sort. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description lists the valid DNA symbols as 'ACGT'. Since this is not a case comparison exercise I agree with the others that mixed case input need not be tested here.
@Insti Sorry, am I mistaken? @NobbZ suggested we should check if mix-case is a valid use-case for this domain and act accordingly. And @rpottsoh found a valid case, so it seems he might agree with this assessment. Further research shows that mixed case is indeed a common use-case for DNA-strands. I am not sure what to make out of this now (besides of choosing a better input). |
@Vankog what is this exercise about? |
To me the main aspect is to count the number of occurrences of certain nucleotides. How these nucleotides are represented is part of the domain specific language. With that in mind it turned out the nucleotides in question are usually presented inside sequences composed of A, C, G, T in upper- as well as lower-case and even mixed-case. This, in combination with the robustness principle mentioned in the first post, is why I want to make sure the user's solutions do not fail silently on this (implicit) domain specific requirement nor everyday laziness. |
I agree with this.
And this.
This is where we diverge. The description specifies the characters A, C, G, T. These are only ones we need to use in our test inputs, and testing for anything else is an unnecessary distraction. |
That is why I suggest to adjust the description as well, because it does not match with particular domain specifics. In the same vain I see it suitable to also fix the case-problem in either way. |
In all papers I've read and in text books, uppercase is used, but as it is pointed out above, there is no rule against using lower case. But I honestly do not think it is of great importance whether it's lower or uppercase from a bio scientific point of view. |
@Vankog if you feel that the description could be improved, feel free to create a separate PR that addresses the issue. I do not think there is any problem with the description. It clearly specifies the 4 symbols to be used. |
Im on vacation for the next two weeks. I'll take care of his then. |
I'll close this PR for now, we can re-open it again later if the PR to the description makes it relevant again in the future. Thanks for your contributions ❤️ Have a good holiday. |
Splitting #895 into multiple PRs.
This adds a test case that ensures that lower- and uppercase inputs do not matter and don't interfere with each other. You can use both interchangeably.
Reasoning:
https://en.wikipedia.org/wiki/Robustness_principle