Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow patterns after is. #2137

Closed
lrhn opened this issue Mar 2, 2022 · 19 comments
Closed

Allow patterns after is. #2137

lrhn opened this issue Mar 2, 2022 · 19 comments
Labels
patterns Issues related to pattern matching.

Comments

@lrhn
Copy link
Member

lrhn commented Mar 2, 2022

The current pattern matching proposal special cases pattern matching in if statements.
It uses a syntax which looks like assignment.

Consider instead allowing a pattern after any is check.

if (this.field is int value) {   // for "field promotion"
  // use value
}

or

if (something is [Object error, StackTrace stack]) {  // Real code I'd write today if I could.
  Error.throwWithStackTrace(error, stack);
}

Further, it allows chaining pattern matches as:

if (something is int a && other is int b) return a + b;

As precedence, C# already does this.

By allowing any matcher pattern after is, we also allow patterns to be used in conditional expression (?/:) and an if-element (in a collection literal), not just in an if statement. That does mean that we get local variables introduced inside those constructs, probably limited to the rest of that expression.
It makes patterns a more "first class" language feature by allowing it orthogonally anywhere a test is allowed, not just in specific statement contexts.

@lrhn lrhn added the patterns Issues related to pattern matching. label Mar 2, 2022
@lrhn
Copy link
Member Author

lrhn commented Mar 2, 2022

(Allowing bindings in any expression does mean that we have effectively introduced a let functionality into the language using an irrefutable pattern, (expression is var x) ? body : throw "ureachable", so we might as well introduce let x = expression in body as syntax directly.)

@mraleph
Copy link
Member

mraleph commented Mar 2, 2022

Java allows this as well: https://openjdk.java.net/jeps/394

Though I must admit I find this syntax a bit hard to read at times. I would be much more in favour of something that goes right to left because it reads as "pattern match to value", but I can't really come up with any reasonable syntax which does not look, ahm, strange.

if (int value = this.field) {

} 

Though semantics becomes a bit confusing here, because it does not act like a normal assignment (which would throw if types don't match or refuse to compile).

One thing we could consider here is that promotion to int? to int should preferably not require repeating the type, e.g. maybe

// given this.field is int?
if (this.field is var value) {   
  //
}

though it looks rather cryptic...

@leafpetersen
Copy link
Member

I kind of like this syntax when limited specifically to condition guards. I continue to be strongly opposed to having non syntactically delimited variable binding scopes for the reasons that I partially sketched out here . So allowing this in arbitrary is expressions is something I am still strongly opposed to.

@munificent
Copy link
Member

munificent commented Mar 4, 2022

From first blush, I like the way this looks. Familiarity with C# and Java is a plus.

There are a couple of ambiguity problems, though. :(

Identifiers as types or constants

if (value is int) { ... }

Today, this is a type check. If we treat the RHS of is as a pattern, then int becomes a constant pattern and the condition checks that the value is equal to the value of the type literal int. Probably not what you want.

For backwards compatibility, we could say that if the RHS is a (possibly qualified) identifier, then it continues to behave like a type check. But that means that the RHS isn't compositionally consistent:

if (1 is int) print('ok');
if ((1, 2) is (int, int)) print('oops');

Here, the first is matches but the second doesn't because, again, the bare identifiers inside the tuple pattern are treated as constant patterns.

We could say that an identifier in a matcher pattern is treated as a type check pattern (i.e. int behaves like int _) if the identifier resolves to a type. Otherwise, it's a constant pattern. But I'm really hesistant to require identifier resolution to know what a pattern means. It works out OK in practice since type names are usually capitalized and thus obvious, but I don't like the semantics being so contextual.

Or we could say that bare identifiers aren't treated as constants in matcher patterns and require some other syntax. But that breaks compatibility with existing switch cases.

Null-check patterns versus nullable types

Another ambiguity is:

if (value is int?) { ... }

Today, this is a type check against a nullable type. If we interpret int? as a matcher pattern, it means:

  1. Is value null? If so, match fail.
  2. Else, match if value is equal to the constant (i.e. type literal) int.

(Granted, this pattern is already confusing. We might want to add a notion of precedence to the pattern grammar so that the operand to a null-check pattern can't be a constant or literal pattern since that's unlikely to be useful.)

Where it can be used and variable scope

I agree with Leaf that I don't want to allow is to bind variables when used in arbitrary expression positions. It's just too confusing:

function(value is int x, x);

receiver.method(value is int x).chained(x);

Are these allowed? What fraction of Dart users would ever be able to intuit whatever answer we choose? I think the the safest/sanest approach is to not allow code like this. It just gets too weird.

That raises the questions of what the actual boundaries are restrictions are. I could see some combination of:

  • Only allow patterns after is when it appears directly inside an if condition.
  • Likewise but for conditional expressions.
  • Allow patterns after any is expression, but only patterns that bind variables in if conditions and/or conditional expression conditions.
  • Similar to above but extend it to allow is inside logical (&& and ||) operators in conditions.

is versus other syntax

Given the above ambiguity problems, I'm not feeling it's really worth trying to make this happen.

I think it's probably easiest to just keep is the way it is. It's the expression you use when you want to simply check whether a value has some static type. It gives you optimally terse syntax for that since you can't use constants in there and thus identifiers can be presumed to be types.

If you want to ask more structural questions about a value, then use an alternative that explicitly takes a pattern on the right. That syntax is a little more verbose if you're just checking a type, but in return lets you destructure and ask more interesting questions.

Right now, the syntax in the proposal for the latter is:

if (case pattern = value) ...

I'm not super attached to it. Another alternative that's only two characters longer than is is:

if (value case pattern) ...

That maintains a consistent property that matcher patterns always appear following case.

&& chaining in if-case

We could easily extend the if-case statement to allow chains of &&:

if (value1 case pattern1 && value2 case pattern2 && value3 case pattern3) ...

I suspect it's not worth the extra complexity when you can usually just do:

if ((value1, value2, value3) case (pattern1, pattern2, pattern3)) ...

This doesn't always work, since the later values in the tuple can't access variables declared by the current pattern. I'm not sure if that comes up enough to be worth allowing &&. I'm generally already worried about the large syntax surface area of the proposal, so I'd be strongly inclined to leave this out unless we know we want it.

case expressions

There may be times where you want to do some interesting destructuring on an object but only as a predicate and where you don't want to do any control flow, as in a shorter way to write:

var json = ...
var isUserUpdate = json is Map<String, Object?> && json['model'] == 'user' && json['verb'] == 'update';

For this, we could conceivably support case expressions:

var json = ...
var isUserUpdate = json case <String, Object?>{'model': 'user', 'verb': 'update'};

This avoids the ambiguity problems because now you're using case so the RHS is definitely known to be a pattern. I'm not sure if this is useful enough to carry its weight. And, in particular, it means that a pattern can appear in an expression without an explicit closing delimiter. I'm not sure if that will cause ambiguity problems, like:

var wat = {value case (a, b) ? 'yes' : 'no'};

Here, is the ? a null-check pattern or a conditional expression? Does this produce a set or map?

Overall, I think it's safest to try to ensure patterns only appear in clearly delimited syntactic positions, which the current proposal does.

Using patterns after is is a really good suggestion. I'm not sure it works out, but it's definitely good to explore.

@lrhn
Copy link
Member Author

lrhn commented Mar 6, 2022

I agree with Leaf that I don't want to allow is to bind variables when used in arbitrary expression positions. It's just too confusing

If both Java and C# are doing it, have they solved the confusion problem? Or is it not really a problem the way people actually use the feature? (I'm sure you can do something obscure, but if you won't get it through code-review, it's not a real problem. Working well in idiomatic code, and being confusing if misused, that describes most powerful language features.)

It seems that C# introduces the variable at the test (it's not a variable syntactically before) and it then treats it as a normal block-scoped variable which is uninitialized if the test fails.

		Object o = "A";
		if (o is String a && o is String b) {
			Console.WriteLine(a.Length + b.Length);
		}
		a = "b"; // Allowed
		Console.WriteLine(a);

		Boolean z = o is String y && y.Length == 0;
		//Console.WriteLine(y); // unassigned variable y

		String w = o is String v ? v : "not string";
   	        Console.WriteLine(o is String u);

Java seems to only have the variable directly downstream of the test, and special case boolean-based control flow to propagate the variable.

        Object o = "A";
        if (o instanceof String a && o instanceof String b) {
           System.out.println(a + b); // Works
        }
        Boolean x = (o instanceof String b) && b.length() == 0; // Works
        // b not in scope here at all.
        String v = o instanceof String z ? z : "not string"; // Works.
        System.out.println(o instanceof String q); // Works

In both cases, the variable introduced by the test is alive in the code dominated by a successful test, but no further than the current block.
In Java, the variable is inaccessible outside of the dominated code.
In C#, the variable is accessible, but possibly uninitialized, in the rest of the block.
It's the same definite-assignment analysis in both cases, the consequences of being unassigned are just different, and I'm willing to bet that most C# authors won't even know that the variable exists outside of the test.

Have we ever heard of anyone finding either Java or C# confusing at this point? Or are users just automatically using the test in the most obvious way, which works pretty well, and not pushing its edge cases?

About chaining tests, consider a situation like;

if (e1 case String b && b.length > 0 && e2[b.length] case otherPattern) ...

I expect such situations to happen, where you want to use a pattern match as part of a longer boolean expression, interleaved with other tests.
I also would like it to work consistently with the conditional expression, ?/: and element if.

Heck, I want o is Future && (throw ArgumentError("No future!")) to work, even though I don't want anyone to ever write it, because it would be consistent. Instead of making pattern match a special thing you can only do in special places, I want it as a general boolean expression with side effects, just like a promoting is test is now.
We can define pattern mathcing to work in every situation - C# and Java both did.
Orthogonality is great, it reduces the special cases!

@munificent
Copy link
Member

Quick note: I found an excellent discussion from the C# folks about the approach they took for scoping variables declared in is patterns: dotnet/roslyn#12939

@leafpetersen
Copy link
Member

Quick note: I found an excellent discussion from the C# folks about the approach they took for scoping variables declared in is patterns: dotnet/roslyn#12939

That's a good discussion of the scoping choices, but doesn't cover the discussion on making is patterns bind variables unfortunately.

@leafpetersen
Copy link
Member

Are there any real use cases for this feature outside of conditionals? What if instead of making this an expression, we just defined an extended boolean expression grammar that can only be used conditionals (or possibly only in if case` conditionals) and includes these expressions.

That is, suppose we define conditional patterns to be something like : <boolean expression> | <expression> is <pattern> | <conditional pattern> && <conditional pattern> | <conditional pattern> || <conditional pattern> | !<conditional pattern>, and then say that the only place that conditional pattern appears in the grammar is in the condition of an if statement or element?

What are the convincing use cases for including this into the expression grammar? And if there are use cases, is it enough to then provide an inclusion into expression grammar? For example, add case(<conditional pattern>) as an expression of type boolean?

@lrhn
Copy link
Member Author

lrhn commented Mar 18, 2022

If there are no use-cases outside of conditionals, then it might still be easier (implementation-wise) to allow the feature everywhere, than it is to introduce new syntactic categories that can only be used in a few cases. And that may or may not be complete enough for what users want to do.
For example, we might want to include these as conditional patterns too:

  • <conditional pattern> ? <conditional pattern> : <conditional pattern> (Probably, if we include && and ||)
  • '(' <conditional pattern> ')' (parentheses matter! Syntactically at least.)
  • throw <expression> (or is any expression of type Never accepted as a "boolean expression"? Any expression which has a type which is a subtype of bool?)
  • <boolean? expression> ?? <conditional pattern>

If there are no other use-cases than conditionals, then people just won't use it in other ways, but if we make a special grammar for conditions, people will still need to learn that.

I do want to support the conditional expression, for things like

  var result = field is Foo f ? f.getResult() : nonFooResult();

Conditions are not just if statements, it's ?/:, if-elements, and (do-)while conditions.
I see no good reason to allow

if (field is Foo f && f.isOK) {
  ...
}

but not

while (field is Foo f && f.isOK) {
  ...
}

(The if and while conditions are very similar, and I'd worry if it was no longer possible to change if to while or vice-versa.)

Now, for something probably a bridge too far: @stereotype441, you were working on linking information to boolean variables.

void main() {
  Object o = [];
  var b = o is List;
  if (b) {
    print(o.length); // This works!
  }
}

I'm not sure it's a good idea to allow:

  var b = o is List l;
  // l not in scope
  if (b) { 
     // l in scope as List here
  }
  // l not in scope

On the other hand, it would be perfectly in-line with treating the scope of the variable with the scope where the same is test would have promoted a variable - it's just "promoting" from undefined to defined.

@leafpetersen
Copy link
Member

If there are no use-cases outside of conditionals, then it might still be easier (implementation-wise) to allow the feature everywhere, than it is to introduce new syntactic categories that can only be used in a few cases. And that may or may not be complete enough for what users want to do.

I'm not worried about the implementation cost, I'm worried about the user experience. I continue to strongly feel that introducing scopes in the middle of arbitrary expressions where the nesting depth is unrelated to the scope is a very poor idea, whether C# decided to do it or not. I'm perfectly comfortable with introducing scopes in the headers of if, while, for, or what have you - the scope is tied to a clear syntactic hierarchy (though I do feel a little hinky about putting the variables in scope in the continuation, what C# in that discussion calls the "blocker" pattern). And I'm fine with introducing variables within an expression with clear delimited scope (after all, this just what function literals do). So case(<conditional pattern>) where variables introduced in conditional pattern do not escape the scope of the case expression are fine by me. But <conditional pattern> where variables bleed out from arbitrarily deep syntactic nesting inside of expressions out to the enclosing block is something that I continue to be extremely skeptical about.

@cedvdb
Copy link

cedvdb commented Mar 19, 2022

I find this syntax really nice. I assume it cannot be made final though ? Reassignability could be confusing here:

void fn(dynamic a) {
   if (a is int a) {
     a++;
   }
   // no bleeding scope
   print(a);
 }
 
 fn(3); // prints 3 

On the other hand having it final would not be consistent with the rest.

Have we ever heard of anyone finding either Java or C# confusing at this point?

What is the use case / appeal for a "bleeding scope" ? This is unintuitive, inconsistent with the rest of the language and unconventional with how scopes usually work in other languages beside the c# example and maybe js var hoisting. If it bled I'd expect other parenthesis to bleed as well, which would be really unconventional. It also prevents the compiler from catching errors

  for (int i = 0; i < 5; i++) {  }
  print(i);  // no  this is a mistake caught by the compiler

Maybe I misunderstood the bleeding part though as it seems out of place

@munificent
Copy link
Member

munificent commented Mar 23, 2022

What if instead of making this an expression, we just defined an extended boolean expression grammar that can only be used conditionals (or possibly only in if case` conditionals) and includes these expressions.

That is, suppose we define conditional patterns to be something like : <boolean expression> | <expression> is <pattern> | <conditional pattern> && <conditional pattern> | <conditional pattern> || <conditional pattern> | !<conditional pattern>, and then say that the only place that conditional pattern appears in the grammar is in the condition of an if statement or element?

I think I'm with Lasse that it's probably more trouble than it's worth to create conditions as a separate syntactic category. I could see us maybe wanting to do that it we wanted to use the same syntax to mean different things in each category. (Trivia: I believe BCPL uses = for assignment in statement position but equality in an expression.) But I don't think we want to overload syntax like this.

I haven't had a chance to write this up yet, but when Lasse and I were last talking about this, one idea was to add a <expr> case <pattern> expression form that can be used anywhere an expression can appear. But the pattern is only allowed to bind variables if it appears inside an if condition, conditional expression condition, etc. When used elsewhere, it's simply an expression that evaluates to a Boolean. That would avoid all of the weird scoping issues while still providing a general-purpose expression that can be more widely used. For example, it would let you do things like:

// show a validation error if the JSON isn't the right format:
return TextFormValidator(isValid: json case ['user', String _]);

@munificent
Copy link
Member

I wrote up a strawman for the idea in my last comment here: #2181

@munificent
Copy link
Member

munificent commented Aug 25, 2022

I don't think I ever responded to this important question:

If both Java and C# are doing it, have they solved the confusion problem?

Java and C# don't have type literals and Dart does. That is, I think, the key reason why extending is to allow arbitrary patterns is much harder in Dart. This is valid Dart code today:

var x = 1;

if (x is int) print('type test');
//       ^^^

switch (x) {
  case int: print('equivalent to type literal');
//     ^^^
    break;
}

So we already have two places in the language that want to generalize to allow patterns, but where existing valid syntax means something completely different. Java and C# avoid that because there are no type literals.

I don't see how we can generalize both is expressions and switch cases to allow arbitrary patterns while having a single coherent pattern syntax that covers both. I've tried a bunch of different ideas, but every way I've tried to disambiguate and not break existing is expressions looks strictly worse to me than simply not allowing patterns after is.

The only remaining idea I can think of is to eliminate type literals from the language (#2393). If we manage to do that (unlikely), then maybe we could revisit this.

@lrhn
Copy link
Member Author

lrhn commented Aug 26, 2022

There are places where a type is just expected. If you write a single identifier, it's a type. If you write more, the rest may be a variable name.

That's currently just new-style function types: void Function(int) vs void Function(int x).

We have a general problem with deciding what a plain identifier means in different pattern contexts, because we currently do different things in different places that should all be accepting patterns.

  • Function parameter: (x) => x, an identifier introduces a new variable.
  • Assignment x = 2, an identifier refers to an existing assignable variable or setter.
  • Case case c:, an identifier refers to an existing constant variable.
  • Type check is T, an identifier refers to a variable.

I suggested elsewhere that we could let patterns work differently in different contexts (with the option of forcing a context with a prefix), so:

  • A function parameter is in a declaration context, where a raw identifier introduces a new variable. Prefix is var or final, so var x = e; and var (x, y) = pair; and (x) => x introduces new variables.
  • An assignment like x = 2 or (x, y) = (y, x) tries to assign to existing variables/setters. There is no prefix, only available in expressions with no other pattern context, so basically just in possibly-parallel assignments.
  • A case is in a constant context, (possibly qualified) variables must refer to constant variables, prefix is const. So case a: is the same as case const a:, and const e can be used anywhere else to force a constant context in the traditional sense as well. The const pattern context doesn't require a constant, you can do case (var x, c):, it just defaults ambiguous identifiers to constants. (Likely doens't default T(x:1) to an object creation instead of an extractor pattern, we'll need to add the const explicitly for that).
  • A check like is T is in a type context where single stand-alone identifier (or qualified identifier) refers to a type. No prefix, but a suffix of T _ or T() will force the treatment of T as a type.

With this, if (x is int) will be equivalent to if (x is int _), which checks for being an int and does not bind the promoted value to anything. Or you can do if (x is int i) { ... print(i.toRadixString(16)); ...}, where i is bound in the code directly guarded by the check when the check occurs in a branch-test.

This can, possibly, allow us to generalize all these positions to accept patterns without changing the behavior of currently used syntax.
Or maybe it's too complicated.

munificent added a commit to dart-lang/dart_style that referenced this issue Aug 26, 2022
dart-lang/language#2137 (comment)

In particular:

- Instead of if-var statements, use `is <pattern>` as the condition.
- Require "var" before variables in cases.
@munificent
Copy link
Member

I went ahead and tried to apply the contextual syntax you describe here to my test refactoring of dart_style to use patterns. It looks like this. My impression:

  • For the most part, code just gets longer compared to the previous version which treats identifiers as binders in cases. That is quite common and now all of them need var. I sort of like that it's a little more explicit, but the verbosity is a bummer. I don't like that it makes patterns in cases look/behave differently from parameter lists and patterns in variable declarations. Even during this very small refactoring, I sometimes forgot what context I was in and added a var when it wasn't needed or forgot to add one where it was.

  • Doing if (<expr> is <pattern>) instead of if (var <pattern> = <value>) looks basically fine to me, but I could go either way.

  • I turned this:

    if (output == Output.show || output == Output.json) ...

    into:

    if (output is Output.show | Output.json) ...

    This is kind of neat, and could be handy if the expression being checked isn't a local variable and you don't want to evaluate it multiple times. An even stranger one was turning:

    if (char != 0x20 && char != 0x09 && char != 0x0a && char != 0x0d)

    into:

    if (char is! 0x20 | 0x09 | 0x0a | 0x0d)

    The code golfer in me finds this very cute and clever. But, honestly, it really feels like we'd be turning Dart into Perl where there's multiple ways to write any given expression. I really don't want to go down that road.

    The current proposal also allows something similar but you have to torture the syntax a bit to get to it:

    if (var (!= 0x20 & != 0x09 & != 0x0a & != 0x0d) = char)

    Heaven help anyone who tries to write that.

@lrhn
Copy link
Member Author

lrhn commented Aug 27, 2022

if (char is! 0x20 | 0x09 | 0x0a | 0x0d)

I like code golf as much as the next geek, but this is a severe parsing ambiguity since | is an infix operator. It's even defined on bool (just not to accept an int). The pattern may need to be delimited a little.

Putting the pattern first does achieve this by delimiting it by ( .... =, but only at the cost of not allowing && and || combinations of pattern checks, which is one of the things this issue tries to fix.

Having more than one way to do things is what patterns are for. You can write if (list.length == 2) { var v1 = list[0]; if (v1 is int) { var v2 = list[1]; if (v2 is int) { SOMETHING }}}, but if ([int v1, int v2] = list) { SOMETHING } is shorter.
That's OK, it's shorter and more readable. If a pattern check is shorter, but not more readable, you should not use the pattern.

For example, if (var == x = y) isn't very readable. :)

@munificent
Copy link
Member

For example, if (var == x = y) isn't very readable. :)

For what it's worth, the patterns proposal explicitly disallows that (since it's weird and unreadable) by only allowing a subset of patterns to appear as the outermost pattern in a pattern declaration context.

@munificent
Copy link
Member

I'm going to go ahead and close this because, which I wish we could make it work, I think type literals and some other syntactic hard/soft constraints make it too much of an uphill battle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patterns Issues related to pattern matching.
Projects
None yet
Development

No branches or pull requests

5 participants