Case expressions instead of if-case statement #2181

Closed
munificent opened this issue Mar 30, 2022 · 5 comments
Labels
patterns Issues related to pattern matching.

Comments

@munificent
Member

munificent commented Mar 30, 2022

This is a strawman for a more general case expression form to replace the narrowly targeted if-case statement in the patterns proposal.

Background

The main use for patterns in control flow is switch statements and expressions. However, those can be fairly verbose. Following Swift, the proposal also defines an if-case statement form:

if (case [int x, int y] = json) {
  print('Was coordinate array $x,$y');
} else {
  throw FormatException('Invalid JSON.');
}

In an issue, @lrhn suggested that instead of if-case statements, we follow C# and Java and allow a pattern after is (or instanceof in the case of Java):

if (this.field is int value) {   // for "field promotion"
  // use value
}

if (something is [Object error, StackTrace stack]) {
  Error.throwWithStackTrace(error, stack);
}

Patterns would be allowed in any is expression, not just as a direct expression in an if condition, allowing useful chaining like:

if (something is int a && other is int b) return a + b;

Unfortunately, allowing any matcher pattern after is would lead to some problems. Lasse and I spent some time discussing it and we came up with another idea, described here. If we decide we like this better, I'll roll it into the main patterns proposal and remove if-case.

Proposal

We define a new infix case operator. The left-hand side is an expression, and the right-hand side is a matcher pattern. The above examples look like:

if (json case [int x, int y]) {
  print('Was coordinate array $x,$y');
} else {
  throw FormatException('Invalid JSON.');
}

if (this.field case int value) {   // for "field promotion"
  // use value
}

if (something case [Object error, StackTrace stack]) {
  Error.throwWithStackTrace(error, stack);
}

if (something case int a && other case int b) return a + b;

This is not restricted to use in if conditions. It's a general-purpose expression that can appear anywhere expressions are allowed:

var isEmpty = rectangle case (width: 0, height: 0);

assert(json case {'id': int _}, 'Should have numeric "id" field.');

It has the same precedence as is and is!:

relationalExpression ::= bitwiseOrExpression
  (typeTest | typeCast | caseTest | relationalOperator bitwiseOrExpression)?
  | 'super' relationalOperator bitwiseOrExpression

caseTest ::= 'case' matcher

(You can think of the existing is and is! expressions as syntactic sugar for a subset of what case expressions can match.)

TODO: Should we allow guard clauses?

Control flow and scoping

The two key challenges with allowing a refutable pattern to appear in any expression context are:

  1. What happens when the pattern is refuted and fails to match? Is there control flow? If so, where to?
  2. What is the scope of variables bound by the pattern?

These two questions are intertwined: if the pattern fails to match, we need to ensure that no code where the variables it binds are in scope can be executed.

Having a restricted if-case statement form instead of an expression form answers both of those. Since the pattern can only appear directly inside an if condition, the control flow behavior and scoping extent are fairly obvious. It's less obvious how an expression should behave.

The insight is that there are some places in the grammar where a Boolean expression is expected in order to perform control flow. We call these refutable positions. The behavior of a case expression can vary depending on whether it appears in a refutable position or not.

Refutable position case expressions

When a case expression is in a refutable position, then match failure causes it to jump over some specified code. Any variables the case expression binds are only in scope in that region.

An expression is in a refutable position if it is:

  • The condition expression of an if statement or element.
  • The condition expression of a while statement.
  • The condition expression of a conditional (?:) expression.
  • The left operand of an && expression.
  • The right operand of an && expression E, if E is in a refutable position.
  • The operand of a grouping (...) expression E, if E is in a refutable position.

Put together, these rules cover a series of && operands appearing directly inside a condition expression, ignoring parentheses, which have no effect. As in:

if (a && (b && c) && d) ...
(a && b) && c ? ... : ...
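To make the rules above concrete, here is a small Dart 3 sketch that walks a condition and reports which operands sit in a refutable position. The `Expr`, `Operand`, `And`, and `Paren` classes and the `refutableOperands` function are hypothetical, invented only for illustration. They model just the && and grouping rules; the caller says whether the whole condition is refutable (true for an if, while, or ?: condition). The sketch follows the bullet list as written, so the left operand of any && counts as refutable.

```dart
// Hypothetical mini-AST: only the node kinds the refutable-position rules mention.
sealed class Expr {}

/// Any other operand (for example a case test), identified by [name].
class Operand extends Expr {
  final String name;
  Operand(this.name);
}

/// `left && right`.
class And extends Expr {
  final Expr left, right;
  And(this.left, this.right);
}

/// A parenthesized `( inner )` grouping expression.
class Paren extends Expr {
  final Expr inner;
  Paren(this.inner);
}

/// Names of the operands of [e] that are in a refutable position, given
/// whether [e] itself is in one (true for an if/while/?: condition).
List<String> refutableOperands(Expr e, {required bool refutable}) =>
    switch (e) {
      Operand(:final name) => refutable ? [name] : <String>[],
      // The left operand of && is refutable; the right operand only if
      // the enclosing && expression is.
      And(:final left, :final right) => [
          ...refutableOperands(left, refutable: true),
          ...refutableOperands(right, refutable: refutable),
        ],
      // Parentheses pass their parent's status straight through.
      Paren(:final inner) => refutableOperands(inner, refutable: refutable),
    };
```

For the first condition above, `a && (b && c) && d`, built as `And(And(Operand('a'), Paren(And(Operand('b'), Operand('c')))), Operand('d'))` and passed with `refutable: true`, the sketch reports all four operands, matching the claim that parentheses have no effect.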

When a case expression appears in a refutable position, variables bound by its pattern are in scope in any subsequent && operands as well as the region of code executed when the surrounding condition is true. For if statements, that's the then statement. For if elements, the then element, etc.

If the pattern matches, then the case expression evaluates to true and execution proceeds. Otherwise, it evaluates to false, any remaining operands in the && chain short-circuit, and the condition is false.

Non-refutable position case expressions

When a case expression is not in a refutable position, it is a compile-time error if the matcher binds any variables. This sidesteps any questions around scope. The result of the case expression is true if the pattern matches and false otherwise.

This lets users use case expressions anywhere it's useful to ask questions about the structure of some object, while avoiding variable bindings in arbitrary expressions, which would lead to confusing scoping.
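Shipped Dart 3 has no general infix case operator, but the behavior described for a non-binding case test (true on a match, false otherwise, no variables bound) can be approximated with a switch expression. A minimal sketch, assuming the rectangle from the earlier `isEmpty` example is a record with `width` and `height` fields; `isEmptyRect` is a hypothetical name:

```dart
// Approximates the proposed `rectangle case (width: 0, height: 0)`.
// The record shape ({int width, int height}) is an assumption.
bool isEmptyRect(({int width, int height}) rectangle) => switch (rectangle) {
      (width: 0, height: 0) => true, // the pattern matched
      _ => false, // the pattern was refuted
    };
```

`isEmptyRect((width: 0, height: 0))` returns true and `isEmptyRect((width: 3, height: 0))` returns false. The trailing `_` case spells out the refuted path, which is the boilerplate the proposed infix form would remove.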

Opinion

When I first started writing this up, I was hoping we could piggyback on the type promotion rules and basically say you can have variables in a case expression in all of the places where type promotion can show that a variable has some type. Then the scope of those variables is the scope where the promoted variable has its promoted type.

We'd get some conceptual unification and hopefully it would be easier for users to understand the scope since it follows rules they are already somewhat familiar with.

On reading the current flow analysis spec, I came to the conclusion that the flow analysis rules are much too subtle to hang variable scoping off of. Mirroring those would imply allowing code like:

test(Object o) {
  print(o case String s || (throw '!'));
  print(s.length); // <-- "s" in scope here.
}

test(Object o) {
  if (o case String s) {
    print('Got a string.');
  } else {
    return;
  }
  print(s.length); // <-- "s" in scope here.
}

Those look horrifically wrong to me even if they are technically sound.

Instead, I proposed the much simpler "refutable position" rule above, which I think covers the cases we care about and has reasonable scoping rules. The result basically takes the current proposed if-case statement and:

  1. Tweaks the syntax to be "expression case pattern" instead of "case pattern = expression". I like this and would suggest doing that even if we keep if-case as a dedicated statement.

  2. Extends it to be allowed in conditional expressions and while conditions. If-case elements are already planned, and this seems like a reasonable extension.

  3. Supports chains of &&. This is reasonable, but it does look kind of strange, especially if we allow variable bindings in preceding operands. It becomes a very odd special-case rule where a chain of && directly inside a condition expression has some special powers.

    On further thought, I don't find it particularly strongly motivated either. Instead of:

    if (something case int a && other case int b) return a + b;

    If the right operand doesn't depend on the left, then you can always write:

    if ((something, other) case (int a, int b)) return a + b;

    I think that's likely more idiomatic.

  4. Adds a more or less unrelated infix case expression that can be used anywhere but can't bind variables. Kind of neat but not super valuable. Most examples I came up with felt kind of contrived and not much better than the expression you would write today instead.
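The record-based rewrite suggested in item 3 happens to be valid Dart 3 syntax, so it can be tried directly. A minimal sketch; the function name `sumIfBothInts` and its nullable return type are assumptions made for illustration:

```dart
// Matches two independent values at once by bundling them into a record,
// instead of chaining `something case int a && other case int b`.
int? sumIfBothInts(Object? something, Object? other) {
  if ((something, other) case (int a, int b)) return a + b;
  return null; // At least one of the two values was not an int.
}
```

`sumIfBothInts(1, 2)` returns 3, and `sumIfBothInts(1, 'x')` returns null because the record pattern is refuted as a whole.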

Overall, this didn't come together as well as I was hoping, but it has some promise, or at least pieces of it do.

munificent added the patterns label (Issues related to pattern matching.) Mar 30, 2022
@Wdestroier

if (data case String message) { ... }

This syntax looks a little awkward. A suggestion of mine is to introduce the be keyword, which is a little shorter than case and more similar to is.

if (data be String message) { ... }

@munificent
Member Author

It has a certain Shakespearean charm:

if (music be TheFoodOfLove) playOn();

But I think overall it would look even stranger to most users. Also, pragmatically, case is already a reserved word which makes it much easier to use in new syntax without risking ambiguity or breaking existing code. And it sends a clearer signal that the thing after it is a pattern, since that's what users already know from seeing case inside a switch statement.

munificent added a commit that referenced this issue Apr 5, 2022
Instead of:

  if (case [int x, int y] = json) return Point(x, y);

The proposed syntax is now:

  if (json case [int x, int y]) return Point(x, y);

This does *not* define an infix case *expression*. But it is more
forward-compatible with that syntax if we decide to do that later.

See #2181.
@munificent
Member Author

I have a PR out that changes the syntax of if-case statements to use the "infix-like" syntax described here. So instead of:

if (case [int x, int y] = json) return Point(x, y);

The proposed syntax is now:

if (json case [int x, int y]) return Point(x, y);

This change does not define an infix case expression like this issue talks about. But it is more forward-compatible with that syntax if we decide to do that later.

I made this change for three main reasons, in increasing order of importance:

  1. I subjectively think it looks nicer. It puts the value before the pattern, which is consistent with the order that it appears in a switch statement.

  2. It makes it clearer that the pattern is a matcher pattern and not a binder pattern. With a pattern variable declaration, you have a binder pattern before a =. In a switch, you have a pattern after case. The previous if-case syntax case pattern = value looks like both of those so it's not clear which kind of pattern you're looking at. The infix case syntax makes that clearer. The pattern after case is always a matcher and there's no = to confuse.

  3. It's important to emphasize this distinction because matcher and binder patterns behave differently with regard to type inference.

    var <double>[x] = [1];
    print('x = $x');
    
    switch ([1]) {
      case <double>[var y]: print('y = $y');
    }

    This should print:

    x = 1.0
    y = 1
    

    In a pattern variable declaration, we use downwards inference from the pattern's type to infer the list type on the initializer, which in turn infers double for the element and does an int-to-double conversion on the 1 literal.

    In a switch case, the pattern has no effect on inference in the value expression. The point of a matching pattern in a switch case is to ask a question about the value, so it would be weird if the case influenced the value that it was querying. Also, in the presence of multiple cases, it's not clear how they would all interact to influence the value expression's inferred type.

    So then how does this behave:

    if (case <double>[x] = [1]) print('x = $x');

    The answer is that it prints x = 1, which is what we want. But it's not clear from the syntax. It looks a lot like a pattern variable declaration so a user might reasonably expect downwards inference from the pattern to the value on the right.

    I believe the infix syntax is more likely to lead them to correctly expect no inference:

    if ([1] case <double>[x]) print('x = $x');

munificent added a commit that referenced this issue Apr 6, 2022
…#2191)

Instead of:

  if (case [int x, int y] = json) return Point(x, y);

The proposed syntax is now:

  if (json case [int x, int y]) return Point(x, y);

This does *not* define an infix case *expression*. But it is more
forward-compatible with that syntax if we decide to do that later.

See #2181.
@munificent
Member Author

The current proposed syntax unifies binders and matchers into a single grammar. It also uses the same type inference process for pattern variable declarations and patterns in if. That means there's less need for a case-like syntax in pattern-if statements. Instead, it uses if (var <pattern> = <expr>).

There's no longer something that looks like an infix case expression and no current plans to add that, so I'm going to close this issue. If we revisit this syntax, of course, we can re-open it.

munificent reopened this Sep 9, 2022
@munificent
Member Author

After lots of agonizing and back and forth, we've gone back to a syntax where variable patterns in cases are slightly different from those in pattern variable declaration statements. Because of that, the previous if (var <pattern> = <expr>) syntax doesn't make sense, so we've gone back to the if (<expr> case <pattern>) syntax proposed in #2181 (comment).

It does not define case as a general infix operator. It can only appear in if conditions. I think that's the right answer, at least for now, because it avoids having an undelimited pattern appearing anywhere an expression can appear. I strongly suspect the latter would lead to ambiguity somewhere.
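For reference, here is the coordinate example from the top of the issue written in the if (<expr> case <pattern>) form described in the comment above. It is a sketch assuming a Dart 3 SDK; `describeCoordinate` is a hypothetical wrapper, and the original `print` is turned into a return value so the result is easy to check:

```dart
// The `case` form appears only as the if condition, so x and y are bound
// only inside the then-branch.
String describeCoordinate(Object? json) {
  if (json case [int x, int y]) {
    return 'Was coordinate array $x,$y';
  } else {
    // Not a two-element list of ints: the pattern was refuted.
    throw FormatException('Invalid JSON.');
  }
}
```

`describeCoordinate([1, 2])` returns `'Was coordinate array 1,2'`, while `describeCoordinate([1])` throws a `FormatException` because the list pattern requires exactly two elements.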

Projects
None yet
Development

No branches or pull requests

2 participants