rewriting the Ruleset Style Guide #7707
Comments
@pgerber Thanks very much for stepping forward to do this. Our documentation could really use some improvement. This has turned into an info-dump; I hope it's not too messy.
Where to put documentation
The style guide should be merged with
Sort order
When I review a list of domains, it almost always involves me manually loading domains in a browser in the order they are listed. That's what the sort order optimizes for. Grouping by top-level domain, reading right to left, groups subdomains together. This is helpful for testing because subdomains are usually similar to one another. Grouping
Wildcards
Wildcards are discouraged but not forbidden. They are handy sometimes, for example:
The problem with wildcards is with inexperienced contributors who use a wildcard to cover only a few simple domains because they don't know better and the documentation is confusing.
Non-working hosts
In a perfect world, we would list non-working hosts in tags so they could be visible to automated tools. But in the actual world, yeah we should just list them manually in a comment. PR #6868 has some history on this. My own preference is what you see here: #5390 (comment). Here's an edited short version for easy reference:
<!--
Invalid certificate:
8marta.glavbukh.ru
forum2.glavbukh.ru (incomplete certificate chain)
Redirect to HTTP:
8marta2013.glavbukh.ru
den.glavbukh.ru
Refused:
e.glavbukh.ru
www.e.glavbukh.ru
Time out:
psd.glavbukh.ru
str.glavbukh.ru
-->
Again, I'm optimizing for me clicking down a list. I want each category grouped together because I might test each group differently. For example, I can test
The more testing I do, the more I feel the footnote approach is just wrong. Superscript footnotes are worse on the eyes and, thus, more wrong. The other thing to optimize for is, will this confuse a future reviewer? Will the reviewer be able to tell unambiguously when the problem is fixed? Something like
I like to organize the error categories alphabetically -- I've recently, sometimes, started kicking back
Pointers between rulesets
This sort of comment:
Ruleset names
For simple sites, for the
This can get tricky for domains in languages I don't know. For example I don't speak Hungarian, and this file
If a ruleset covers multiple domains, then the ruleset
Really complicated rulesets or groups of rulesets like
Filenames should vaguely resemble the
Testing
There is a set of tests, which can be run independently and which run as part of Travis, that ensure things basically work. This is documented in
There is no prepared tool for testing, for example, mixed content blocking. It should be possible to make such a tool, but no one has done that yet. The latest discussion for creating rulesets is in issue #7691 but there have been other discussions before.
|
Pinging @fuglede @gloomy-ghost @Hainish @J0WI for their input. |
I think #7717 should have been made a part of this discussion, but I guess it's too late. |
@jeremyn Don't forget China is not the only country that practices internet censorship. In my opinion, internet censorship is out of scope for HTTPS Everywhere. It's unpredictable, it's difficult to fix issues related to it, and it's very unlikely that censors would change their methods of blocking at HTTPS Everywhere's request. Users in such countries should look for a way to bypass internet censorship if sites they visit are broken because of it. |
@jeremyn I live in a country that practices internet censorship and HTTPS Everywhere prevented me from accessing multiple websites, for example Archive.org. I guess that's the price for security. I'm not going to turn HTTPS Everywhere off because of that. |
I strongly agree that we should not modify our policy for inclusion based on whether a country censors a site, even if it is a country with a large set of potential or current users. This may hurt the adoption of the tool from within censorship regimes, but modifying our rules based on the vicissitudes of censors contributes to a technical ecosystem which legitimates and normalizes censorship. The tension between accessibility for users inside censorship regimes and the desire to make the internet more secure is something that many sites have had to deal with recently. The English-language Wikipedia has had to deal with this tension very directly. It is my opinion that they made the right choice of forcing encryption for all users, even if this increases the chances that they will be censored, as they have been in the past. It puts the onus on the censorship regime to be enacting the censorious behavior, rather than the sites exhibiting self-censorship. They chose not to contribute to that technical infrastructure of censorship, and I think they were right in doing so, whether or not it hurts their adoption from within these regimes. Also, as @smw-koops points out, censorship can be spurious and hard to model for, and incorporating this into an ever-growing list of concerns for ongoing and new rulesets adds additional process for maintainers. |
@smw-koops I'm fine with mentioning downgrading in the documentation based on whatever the consensus from #7717 is. Regarding censorship, I don't mean that HTTPS Everywhere should accommodate censors. What I mean is that there are pull requests that say that certain sites are only available in the GFW, or that they behave differently if they're inside the GFW. For example, a site may use a CDN with different HTTPS behavior if the request comes from inside the GFW than if it comes from outside. Pull request #7586 is a messy example. I don't want to sign off on a pull request that I can't fully test. Right now I think we only have one reviewer who can reliably test these inside-the-GFW pull requests, @gloomy-ghost, so if @gloomy-ghost is the one who submits the pull request, they're stuck because no one else can review it. So, the documentation should mention this problem and give advice to help people submitting these pull requests. For example, in #7586 I asked @gloomy-ghost to remove as much of the inside-the-GFW stuff as possible, which let me test enough of it that I felt comfortable merging it. (There was one domain in a comment that I couldn't check.) |
Minor point: I like
<target host="example.com" />
    <test url="http://example.com/myurl/" />
<target host="www.example.com" />
If everyone agrees, we should mention this in the documentation either explicitly or by example. |
I'm fine with this. This irks my typical indentation-following-xml-scope convention, but that's not in the xml standard and I understand the utility of this indentation formatting. |
Another thing that we might want to document is the discussion in #7243 (comment), answering the question of why we will or won't disable rulesets that have problems until those problems are fixed. |
My note in #5025 (comment) about when to add comments for nonworking domains might be interesting for the documentation. |
This should be resolved with #8193 - we should break out any separate concerns into separate issues. |
I've come to realize that the Ruleset Style Guide is out-of-date and lacking some crucial information.
I believe it would be best to completely rewrite it. I don't mind writing it myself. However, before actually doing so, there are a few questions that I'd like to have answered. I also added a preliminary outline for the new guide. Feedback is welcome!
Open Questions
Sort order
I've been made aware of a de facto standard for sorting hosts in rulesets. I'm not sure we want the ordering this way, but we should formally agree on a standard. Also, the style guide should mention and use that order.
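For illustration, the right-to-left grouping described in the comments can be sketched as a sort on reversed host labels. This is only a minimal sketch under stated assumptions, not the agreed standard: the function name `sort_hosts` is hypothetical, and the sketch deliberately does not special-case `www`, since that is still an open question here.

```python
def sort_hosts(hosts):
    """Sort hosts by their labels read right to left (top-level domain first).

    Assumption: plain label-by-label comparison with no special case for
    "www"; the de facto ordering may differ.
    """
    return sorted(hosts, key=lambda host: host.lower().split(".")[::-1])


hosts = ["www.example.com", "example.org", "a.example.com", "example.com"]
print(sort_hosts(hosts))
# ['example.com', 'a.example.com', 'www.example.com', 'example.org']
```

Sorting on the reversed label list keeps every subdomain of `example.com` next to its parent, which is what makes clicking down the list convenient during review.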
Apparently, the preferred ordering (#7643):
I'd prefer not treating www as a special case. It still confuses me and is harder to sort when generating a ruleset. Is there a reason for the www being out-of-order?
Remove wildcards from the examples
I believe they have been deprecated; we should remove them from the main examples and just add a separate example clearly indicating that wildcards should be avoided. What are valid use cases for wildcards? Is there a good example?
Listing non-working hosts in comments
There appear to be different ways of adding hosts that are not covered by HTTPSEverywhere, as comments (e.g. hosts that don't have a valid certificate). Later, we might want to add the hosts to the XML to allow processing the information more easily. For now, it's probably best to agree on one style and add examples to the guide.
Is there a list of the categories that are used for non-functional hosts?
These categories appear to be used commonly:
I also use unexpected http status code, #7706.
Did I miss anything? I personally prefer my style over the two others listed below. Using ¹ or letters (m, r) only works if there is a short list of hosts. In a long list, I always need to scroll up and down to read the key. Is there a style that's superior or do we just pick one?
I use this style:
others use:
or:
… and many more variations
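One of the styles quoted in this thread groups non-working hosts under category headings inside a single XML comment, with the categories in alphabetical order. A minimal sketch of generating such a comment follows. The function name `format_nonworking`, the tab indentation, and the exact layout are assumptions for illustration, not an agreed format.

```python
def format_nonworking(problems):
    """Render {category: [hosts]} as one XML comment, categories A to Z.

    Assumptions: hosts are indented with a tab under their category and
    sorted as plain strings; neither is an agreed convention.
    """
    lines = ["<!--"]
    for category in sorted(problems, key=str.lower):
        lines.append(f"{category}:")
        lines.extend(f"\t{host}" for host in sorted(problems[category]))
    lines.append("-->")
    return "\n".join(lines)


print(format_nonworking({
    "Time out": ["psd.glavbukh.ru"],
    "Refused": ["www.e.glavbukh.ru", "e.glavbukh.ru"],
}))
```

Keeping each category in one block means a reviewer can test a whole group the same way without scrolling back to a footnote key.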
Naming of Rulesets
There is a name used as the file name and one within the file itself. The easiest approach would probably be to use Example.com.xml as the file name and Example.com as the ruleset name. How do we deal with companies that have multiple domains? Do we put all domains in one file or do we use one file per domain? The former is probably better if you want to make edits manually, the latter if you want to create and update rules automatically.
Tools
People appear to use a whole bunch of scripts and other aids. I, for instance, wrote my own script for generating rulesets, #7706. Is there a list of tools and methods for creating, updating and verifying rulesets? What is a good way to verify HTTPSEverywhere doesn't break a site?
Structure
The current structure, if you even want to call it that, is a bit of a mess. I'd like to see it a bit more structured.
I was thinking about something like:
Let me know what I missed or what you'd like to have changed.