Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow options to be enclosed in '' "" or || #551

Closed
wants to merge 3 commits into from

Conversation

uli42
Copy link
Member

@uli42 uli42 commented Nov 16, 2017

Note: Nesting and escaping is not supported currently!

This is not perfect, e.g. the following string will be processed
wrong:

key=<v,a,l>,u,e1>,key2=value2

However, it will help if some value needs to contain a comma (which will be required by one of my keyboard patches to pass full RMLVO keyboard definition via the keyboard option.

@uli42 uli42 requested a review from Ionic November 16, 2017 22:32
Copy link
Member

@sunweaver sunweaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me sometime to emulate the string parsing in my mind... Looks good to me.

@Ionic: do you want to take a second look?

@Ionic
Copy link
Member

Ionic commented Nov 17, 2017

Yes, I also think we should remove any strings that are nestable as parseable delimiters to avoid ambiguity. I'll look into this later.

Copy link
Member

@Ionic Ionic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought even more about this, I still maintain that we shouldn't allow delimiters for which the start and end characters are different (like {}, [] and <>), but actually decide on one fixed delimiter.

I don't like | because it's non-standard and weird in this context.

From the remaining characters ' and ", " is probably most natural.

I'd go with supporting " only.

char *realvalue = NULL;
char *next = fullstring + strlen(string);

if ((delimiter = index(string, '=')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strchr instead of index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can do that. The original used rindex (which is wrong IMHO). I can see posix has deprecated both of them.

{
*delimiter = 0;
value = delimiter + 1;
for (int i = 0; i < strlen(startchars); i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use sizeof(startchars) - 1 here to avoid the overhead of strlen(), but not really important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

else
{
/* option without value */
value = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless, value will already be NULL here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will fix that.

char *valstart = fullstring + (value - string) + 1;
if ((next = strchr(valstart, endchars[i])))
{
realvalue = strndup(valstart, next - valstart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hope that strndup(s, 0) is legit... on the platforms I tested, it does what a human would expect, but no idea if that's universally true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If s is longer than n, only n bytes are copied, and a terminating null byte ('\0') is added."

It's not clearly stated what happens for n=0 but it is also not stated that 0 is illegal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was my point, too.

delimiter = rindex(string, ':');

if (delimiter)
if ((delimiter = rindex(string, ':')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to strrchr() if touching this anyway.

Copy link
Member Author

@uli42 uli42 Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some more uses of (r)index() in Binder.c, should also be fixed, but separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are a lot of other cases that use deprecated functionality, but we can update what we need to touch anyway "for free".

current = NULL;
}
}
if (current == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, current can also become NULL inside the if clause. Using an else would skip the check in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, right.

@@ -1545,45 +1545,138 @@ static void nxagentParseOptions(char *name, char *value)
free(argv[0]);
}

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/*
 * Please use
 * the standard
 * multi-line
 * comment format
 */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

char *valstart = fullstring + (value - string) + 1;
if ((next = strchr(valstart, endchars[i])))
{
realvalue = strndup(valstart, next - valstart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An option string such as foo="b,ar"baz will be parsed wrongly, with only b,ar passed as the value. It's especially problematic that the remainder baz will be handled as a key in the next iteration.

I guess that more work is necessary here.

And it can get even more complex, think of a string like foo="b,ar"b"az" as well. We actually need to scan for "quotes" multiple times until we either hit another comma or a NULL character.

Copy link
Member Author

@uli42 uli42 Nov 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Quoting characters are only allowed at the beginning and at the end of the string. Everything else is considered unsupported and will/can produce unexpected results.

However, what about extending that a bit by only excepting the end character it it is directly followed by ',' or '\0'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That at least prohibits unexpected/wrong further parsing and looks correct. I guess that's good enough under the constraint you mentioned.

@uli42
Copy link
Member Author

uli42 commented Nov 18, 2017 via email

@Ionic
Copy link
Member

Ionic commented Nov 19, 2017

I'd exclude delimiters for which start and end characters do not match ({}, (), [] and <>), because they are inherently nestable. Explaining to users that this is not supported would be possible, but not all users check the full documentation and it's counter-intuitive. At least in most languages I know such enclosing schemes are used if stuff needs to be nestable (think programming languages, JSON, regular expressions etc.) - which doesn't work for delimiters that have the same start and end characters.

This leaves '', "" and ||.

My point against || is that it's non-standard. Sometimes horizontal bars are used to delimit things, but that's mostly for separating columns in data, not for grouping parts.

Leaves '' and "".

Of course we could allow both of them, but these are also slightly ambiguous. In most cases where quoting is allowed via quote characters, double and single quotes behave differently. In such cases, having two options available makes sense, since, for instance, one option (mostly single quotes) treats the quotes text literally, while the other option (mostly double quotes) allows substitutions within the literal text if specific markers are used.

In our case, we don't really have such a connotation in mind - or a need for it, or functionality available.

Using "" as the only quoting character combination is common, makes our and user's lives easier and is absolutely sufficient.

@uli42
Copy link
Member Author

uli42 commented Nov 19, 2017

The whole purpose of all this is allowing value containing commas. Depending on the character set in the value you must choose a quoting character that is not contained. So having more to choose makes it more probable that the user (t.i. the programmer of the client application that sets up the option file) finds a suitable one.

Sticking to " will make it impossible to quote anything that contains " and , in the value.

Further, I don't see anyone using nested quotes, simply because there's no use case IMHO. Can you come up with a parameter example that would require nested quoting? As long as we do not offer escaping here I'd say there is none.

The character set I chose for quoting includes the ones I consider as being the best regarding readability. So I included the | character as you can clearly see where it begins and where it ends.

I came across all this when implementing rmlvo keyboard definition for the options string because in that case commas can occur. So there really is a need for that. Instead of enclosing the whole string in quotes or brackets I also thought about using an escaping character instead which would really only be required for escaping commas. However, I consider this being less readable and more complicated to handle from the user's standpoint (I have an nx client script written in bash; it is easier/clearer to add quoting than to escape stuff in bash strings).

One clean approach would be a new format of the options file: instead of having one long line with comma separated values the option file should consist of several lines, one for each option. Quoting/escaping would not be required as long as we restrict values to not contain any line break characters. But: I don't have an idea how to handle values with commas in the DISPLAY or in the case where you pass a string instead of a filename via -options. Maybe we should make a big step here and restrict those cases to the most common/required ones, the rest has to be passed via options file.

Update: improved explanation

@uli42
Copy link
Member Author

uli42 commented Nov 19, 2017

pushed updated version: replaced index() calls, use sizeof(), abort if quoting characters appear in the middle of the value

@Ionic
Copy link
Member

Ionic commented Nov 21, 2017

Further, I don't see anyone using nested quotes, simply because there's no use case IMHO. Can you come up with a parameter example that would require nested quoting? As long as we do not offer escaping here I'd say there is none.

Right, and that's why I recommend not supporting {}, (), [] or <>. They look nestable, but we don't want to support it and we don't have any use case either.

Sticking to " will make it impossible to quote anything that contains " and , in the value.

Well... then let's keep '', "" and ||. Especially the last one should be uncommon enough to not accidentally be part of a quoted string.

@sunweaver
Copy link
Member

sunweaver commented Nov 21, 2017 via email

@uli42 uli42 force-pushed the pr/quote_options branch 3 times, most recently from f842054 to a5ebee4 Compare November 21, 2017 23:02
@uli42
Copy link
Member Author

uli42 commented Nov 21, 2017

new version with only ", ' and | and manpage.

@Ionic Ionic changed the title Allow options to be enclosed in '' "" || {} () [] or <> Allow options to be enclosed in '' "" or || Nov 22, 2017
@Ionic
Copy link
Member

Ionic commented Nov 22, 2017

Reminds me a bit of sed... Possibly, more chars make sense: @, /, whatnot-supported-by-sed...

Yes, sed is very similar in this respect. It also expects one specific character as the delimiter, but is agnostic as to what character is used (i.e., any character will work).

We cannot directly mimic sed here, since

  • we don't allow or support escapes in the options string
  • it's possible that the value starts and ends with the same character incidentally OR is followed by an option that ends in that same character, which opens a world of pain

Say, the string foo=fo,of, would be parsed as the kv pair {foo,{o,o}}. Not exactly what the user intended...

Copy link
Member

@Ionic Ionic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few doc fixups.

session startup and at runtime (i.e. when resuming a suspended session)
by so-called nx/nx options.
Other than the command line options, \fBnxagent\fR can be configured
at session startup and at runtime (i.e. when resuming a suspended
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave it that way. We are not uniform in this respect, there are occurances of both forms in the manpage and in the sourcecode.

https://jakubmarian.com/comma-after-i-e-and-e-g/ (I prefer British here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, your choice (and likely makes sense since the statement is enclosed in parentheses.)

Other than the command line options, \fBnxagent\fR can be configured
at session startup and at runtime (i.e. when resuming a suspended
session) by so-called nx/nx options. These options are always passed
as a (sometime looong) options string consisting of option=value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

as a (sometime looong) options string consisting of option=value
pairs, separated by commas. \fBnxagent\fR will extract that
options string from the DISPLAY variable (see below) or an options
file (provided via the \-\fIoptions\fR command line option). To
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or an options string passed on the command line?

Copy link
Member Author

@uli42 uli42 Nov 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left that out on purpose:

  1. the text is already confusing, I did not want to add more confusion.
  2. the manpage does not include any hint that this is possible at all (which should probably be changed; I have opened an issue for that: nxagent manpage does not mention options string for -options #565)
  3. an options string passed via cmdline is mainly usable for non-persistent sessions and testing. So it is nothing for the majority of users

Further, I'am wondering if we should drop even "options via DISPLAY" out of the equation, see #564.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no... the rules for this stuff are quite complicated and currently not documented - mainly the overrides.

I guess we can leave that out of this PR and only mention the most commonly used scheme, yes.

options string from the DISPLAY variable (see below) or an options
file (provided via the \-\fIoptions\fR command line option). To
overcome problems in case a value contains the comma character a
simple quoting is supported. The quoting character must immediately
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple quoting scheme?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that!

file (provided via the \-\fIoptions\fR command line option). To
overcome problems in case a value contains the comma character a
simple quoting is supported. The quoting character must immediately
follow the = character and again directly precede the next
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave out again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. I have just pushed a new version.

@uli42 uli42 force-pushed the pr/quote_options branch 2 times, most recently from 2b7f5b6 to 8a7942d Compare November 23, 2017 14:23
@Ionic
Copy link
Member

Ionic commented Nov 26, 2017

Do you want to rebase the second commit into the first one prior to merging?

@uli42
Copy link
Member Author

uli42 commented Dec 5, 2017

Please do not merge now - need to adapt nxcomp's option handling in the same way first!

@sunweaver
Copy link
Member

sunweaver commented Dec 5, 2017 via email

Note: Nesting and escaping is not supported currently! Quoting is is
only allowed around the complete value part of an option. Quoting must
start directly after the '=' and end directly before ',' or EOS.

This is not perfect, e.g. the following string will be processed erroneously:

key="v,a,l",u,e1",key2=value2

However, it will help if some value needs to contain a comma.
@uli42
Copy link
Member Author

uli42 commented Aug 22, 2018

check: are the changes to nxcomp included already?

@sunweaver
Copy link
Member

sunweaver commented Aug 22, 2018 via email

@uli42
Copy link
Member Author

uli42 commented Aug 22, 2018

Looking at the code in nxcomp/src/Loop.cpp I found that nxcomp supports typical URL encoding techniques in the option values (see function URLDecodeInPlace) . So instead of quoting we could replace ambiguous chars by their hex representation e.g. %20 for space. That functionality seems to be missing in nxagent code, though...

@uli42 uli42 mentioned this pull request Aug 23, 2018
@uli42
Copy link
Member Author

uli42 commented Aug 23, 2018

I have implemented URL decoding in #725. That solves the underlying problem in a very elegant way.

@sunweaver
Copy link
Member

Superceded by #725, thus closing...

@sunweaver sunweaver closed this Aug 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants