nucleotide-count: add case-agnostic test #959

Vankog · 2017-10-17T12:40:20Z

Splitting #895 into multiple PRs.

This adds a test case that ensures that lower- and uppercase inputs do not matter and don't interfere with each other. You can use both interchangeably.

Reasoning:

"Postel's law" (a.k.a. "robustness principle")
https://en.wikipedia.org/wiki/Robustness_principle
Also, in everyday life, we got used to ignore cases. I see no reasoning why lower-case inputs should be in any way incorrect for this exercise.
The test is located near the end of the spec to cover some generic special cases as part of the finishing touches of the exercise.

exercism#959

Insti · 2017-10-17T14:58:17Z

exercises/nucleotide-count/canonical-data.json

@@ -50,6 +50,17 @@
          }
        },
        {
+          "description": "is case-agnostic against a strand of mixed-case nucleotides",
+          "property": "nucleotideCounts",
+          "strand": "aGcTtTtCaTtCtGaCtGcAaCgGgCaAtAtGtCtCtGtGtGgAtTaAaAaAaGaGtGtCtGaTaGcAgC",


Would AaCcGgTt also be valid for this test?

Is there a reason why this needs to be so long and mixed up?
If I'm debugging my program I'd like to be able to manually verify the expected inputs and outputs.

@Insti
The input was taken directly from the original case above ("strand with multiple nucleotides").
I wanted to keep it consistent:
-> AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
-> aGcTtTtCaTtCtGaCtGcAaCgGgCaAtAtGtCtCtGtGtGgAtTaAaAaAaGaGtGtCtGaTaGcAgC
Any other suggested input is welcomed as well.

I also object to the case above, but wasn't asking you to change that one.

NobbZ · 2017-10-17T15:33:03Z

I've never seen those in lowercase when talking about nucleotides, therefore I'm against that change

rpottsoh · 2017-10-17T16:15:44Z

I agree with @NobbZ unless a mixed case example can be sited I would expect there would be a screening process ahead of the solution to screen for bad input.

Vankog · 2017-10-17T17:18:00Z

@NobbZ @rpottsoh
Well, this brings us back to our discussion here: #902
About where, what and how much we should check for in tests.

NobbZ · 2017-10-17T18:15:29Z

Well, only halfway… If it is a valid input, we need to test it, if it is invalid input then we need to discuss if it is worth testing as an additional invalid case or not compared to those invalids that currently exist…

Since I do not have domain knowledge about biology or biochemistry, I have to assume, that those written representations of DNA that I've seen so far were correct and canonical. Among those I've never seen any lower case letter. Therefore I also assume given inputs with a lowercase character as invalid.

If my assumption is right, we need to decide wether we should test invalids or not, but if we do, we should explicitely for lowercase nucleotides because of

[…] in everyday life, we got used to ignore cases.

rpottsoh · 2017-10-17T18:32:07Z

This is the best example I could quickly find that shows mixed case.

Vankog · 2017-10-17T18:52:18Z

I found this:
http://www.sbcs.qmul.ac.uk/iubmb/misc/naseq.html

As the present recommendations present unique alphabetic symbols for each nucleotide combination, the use of upper- and lower-case letters as equivalent does not lead to confusion. However, such use may cause confusion between r (ribo-) and R (purine), and care must be taken in those rare cases where the various symbols are used in combination. In general, it should be emphasised (i) that upper-case symbols are advocated. and (ii) that the present recommendations are not intended to prejudice any possible future use of contrasting upper- and lower-case letters for specific purposes.

There is, however, a problem of discriminating between the upper-case letters G and C on poorly copied sequences. Nevertheless, the use of alternative symbols for G (such as a barred-G, ) is not recommended. Discrimination between the lower-case letters is much clearer.

When searching for it, there are multiple sources that suggest it is totally fine to use mixed cases to increase readability of some sort.
Quite interesting, I didn't knew this.

Insti

The description lists the valid DNA symbols as 'ACGT'. Since this is not a case comparison exercise I agree with the others that mixed case input need not be tested here.

Vankog · 2017-10-21T13:11:34Z

@Insti Sorry, am I mistaken? @NobbZ suggested we should check if mix-case is a valid use-case for this domain and act accordingly. And @rpottsoh found a valid case, so it seems he might agree with this assessment. Further research shows that mixed case is indeed a common use-case for DNA-strands.

I am not sure what to make out of this now (besides of choosing a better input).
We could also simply extend the description (in either way) to make it clearer.

Insti · 2017-10-21T17:52:30Z

@Vankog what is this exercise about?
What is the main aspect of this exercise that is important?

Vankog · 2017-10-21T19:20:40Z

To me the main aspect is to count the number of occurrences of certain nucleotides.

How these nucleotides are represented is part of the domain specific language. With that in mind it turned out the nucleotides in question are usually presented inside sequences composed of A, C, G, T in upper- as well as lower-case and even mixed-case.

This, in combination with the robustness principle mentioned in the first post, is why I want to make sure the user's solutions do not fail silently on this (implicit) domain specific requirement nor everyday laziness.
Hence the test. Further, I think we should also (or: "at least") update the description to match what we will agree on in the end.

Insti · 2017-10-21T20:01:52Z

the main aspect is to count the number of occurrences of certain nucleotides.

I agree with this.

the nucleotides in question are usually presented inside sequences composed of A, C, G, T

And this.

as well as lower-case and even mixed-case.

This is where we diverge. The description specifies the characters A, C, G, T. These are only ones we need to use in our test inputs, and testing for anything else is an unnecessary distraction.

Vankog · 2017-10-21T21:10:18Z

That is why I suggest to adjust the description as well, because it does not match with particular domain specifics.
I mean, for example, @Dysp also pointed out in #902 (comment), that the description does not really fit the domain specifics and created #913 to improve the description.

In the same vain I see it suitable to also fix the case-problem in either way.

Dysp · 2017-10-21T23:12:45Z

In all papers I've read and in text books, uppercase is used, but as it is pointed out above, there is no rule against using lower case. But I honestly do not think it is of great importance whether it's lower or uppercase from a bio scientific point of view.

Insti · 2017-10-22T08:29:07Z

@Vankog if you feel that the description could be improved, feel free to create a separate PR that addresses the issue.

I do not think there is any problem with the description. It clearly specifies the 4 symbols to be used.

Vankog · 2017-11-02T12:27:06Z

Im on vacation for the next two weeks. I'll take care of his then.

Insti · 2017-11-02T14:57:14Z

I'll close this PR for now, we can re-open it again later if the PR to the description makes it relevant again in the future.

Thanks for your contributions ❤️

Have a good holiday.

Vankog mentioned this pull request Oct 17, 2017

nucleotide-count: refactoring tests (discussion) #895

Closed

nucleotide-count: add case-agnostic test

8a9a71c

exercism#959

Vankog force-pushed the nucleotide-count_add_case_agnostic_test branch from de6356a to 8a9a71c Compare October 17, 2017 12:44

Insti reviewed Oct 17, 2017

View reviewed changes

Insti suggested changes Oct 18, 2017

View reviewed changes

Insti closed this Nov 2, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nucleotide-count: add case-agnostic test #959

nucleotide-count: add case-agnostic test #959

Vankog commented Oct 17, 2017 •

edited

Loading

Insti Oct 17, 2017

Vankog Oct 17, 2017 •

edited

Loading

Insti Oct 18, 2017

NobbZ commented Oct 17, 2017

rpottsoh commented Oct 17, 2017

Vankog commented Oct 17, 2017 •

edited

Loading

NobbZ commented Oct 17, 2017

rpottsoh commented Oct 17, 2017

Vankog commented Oct 17, 2017 •

edited

Loading

Insti left a comment

Vankog commented Oct 21, 2017 •

edited

Loading

Insti commented Oct 21, 2017

Vankog commented Oct 21, 2017 •

edited

Loading

Insti commented Oct 21, 2017

Vankog commented Oct 21, 2017 •

edited

Loading

Dysp commented Oct 21, 2017

Insti commented Oct 22, 2017

Vankog commented Nov 2, 2017

Insti commented Nov 2, 2017

nucleotide-count: add case-agnostic test #959

nucleotide-count: add case-agnostic test #959

Conversation

Vankog commented Oct 17, 2017 • edited Loading

Insti Oct 17, 2017

Choose a reason for hiding this comment

Vankog Oct 17, 2017 • edited Loading

Choose a reason for hiding this comment

Insti Oct 18, 2017

Choose a reason for hiding this comment

NobbZ commented Oct 17, 2017

rpottsoh commented Oct 17, 2017

Vankog commented Oct 17, 2017 • edited Loading

NobbZ commented Oct 17, 2017

rpottsoh commented Oct 17, 2017

Vankog commented Oct 17, 2017 • edited Loading

Insti left a comment

Choose a reason for hiding this comment

Vankog commented Oct 21, 2017 • edited Loading

Insti commented Oct 21, 2017

Vankog commented Oct 21, 2017 • edited Loading

Insti commented Oct 21, 2017

Vankog commented Oct 21, 2017 • edited Loading

Dysp commented Oct 21, 2017

Insti commented Oct 22, 2017

Vankog commented Nov 2, 2017

Insti commented Nov 2, 2017

Vankog commented Oct 17, 2017 •

edited

Loading

Vankog Oct 17, 2017 •

edited

Loading

Vankog commented Oct 17, 2017 •

edited

Loading

Vankog commented Oct 17, 2017 •

edited

Loading

Vankog commented Oct 21, 2017 •

edited

Loading

Vankog commented Oct 21, 2017 •

edited

Loading

Vankog commented Oct 21, 2017 •

edited

Loading