-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntax for lists and arrays #2124
Comments
How are you thinking about using Arrays? Are they just the underlying datatype? Or they would have some function in the PRQL language itself? Are you thinking a map / dict is also a tuple? i.e. "(JSON-like)"? For me the central tradeoff is:
|
Primarily, we need them to be the building block of what a "relation" is. In the future, we will probably want to support actual arrays as a column data type.
No, a tuple has known number of entries at the compile time. Map / dict would be another primitive data structure, which would need its own syntax. I don't see a use-case for maps now, but I do think that we should support them eventually. Which means it would be wise to think about their syntax now.
Exactly. I'm leaning toward violating the "Almost every language uses parentheses", but I recognize that it's far from ideal. |
As of #2585, we have:
The main issue I see here is that these are very different from almost every other language, for both tuples & arrays! And there's an open question of what we can do when we want to represent a struct. One hard-backward-incompatible change would be to do:
I'm worried that it's very backward incompatible. (Though maybe this is the one upside of not having many users yet...). But apart from that, it has a few advantages:
It would also serve our current ambiguity around whether lists contain assignments: group {title, country} (
aggregate {
average gross_salary,
sum_gross_cost = sum gross_cost, # this was in a list but contains an assignment?
}
) The commented item above is instead an item in a tuple, which can contain assignments. What do others think? One alternative is to retain backward compatibility, but use |
An important point that you make is that tuples are semantically equivalent to records and structs. I'm +2 on that. I'd treat maps / dicts distinct from tuples, as their names are not known at compile time. With that out of the way, the syntax. I've been advocating for the I'm not that concerned with backward compatibility that much - the long-term benefits outweigh the transition cost now. I also believe that @snth once noted that he would also prefer |
{[a = 5, b = false], [a = 6, b = true]}
filter b == true to [{a = 5, b = false}, {a = 6, b = true}]
filter b == true looks good! The change to a syntax closer to the commonly used json would be worthwhile, I think. |
I discussed on the dev call with @snth — he's keen too... |
Would this be a straight swap? Is there anything that would remain (I think anything but a straight swap would be more difficult, though it's still helpful to think through examples and confirm the theory corresponds to the vision) |
It should be a straight swap. For arrays of join conditions too. (this is close to another thing I'd like to change: join should not take an array of conditions if we changed filter not to do so. I'd be clearer what's happening if we had |
Do we also rename "list" to "tuple"? I mean in documentation and code. Having |
In #2612 @max-sixty mentioned that it would be possible to change
See a few comments there about pros and cons. I'll post a few more examples here:
|
Just today, I'm hitting into this error frequently: .query::<String, _>(
"SELECT ...",
&(state.user_id),
);
... and it's not obvious that I'm missing a tuple so this becomes a tuple: .query::<String, _>(
"SELECT ...",
- &(state.user_id),
+ &(state.user_id,),
); Thus I'm -1 for To lessen the naming problem, we could call it |
As I commented on #2612 , I'm also -1 on |
I agree with this (see also #2612 (comment)). What about having a keyword to delineate the optional join predicate instead of the parentheses? One could go with on as in SQL, e.g. Alternatively use I'm not particularly attached to this proposal but thought I would just throw it out there and type up some examples to see what it would look like and to see what others think. |
While we're making sweeping changes, what about using Another line of argument is that I've sometimes been a bit confused for a few seconds as to whether Also, the |
Let's not start making sweeping changes only because we have just done some :D We have not reached 1.0 yet, so we can make them anytime - if they are justified. I've opened two issues about your last two comments. Let's keep this issue about syntax of tuples and arrays. |
@max-sixty @eitsupi what are your opinions on:
|
If no one else had commented, I'd be +0.1. Others make good points, and I think it's dangerous, so I'm -0.5. We can also add it later, but would be very expensive to remove, given we'd be removing only half of the usages.
Nothing is perfect:
WDYT? |
I dislike name |
I really like the As to whether it should be |
For me one sign of good notation is when it brings out relationships or patterns that might not have been apparent before. I had that experience during the last dev call when @max-sixty was taking me through the Given the use of Now what about the following query? from invoices
group {customer_id} (
aggregate {
my_total = sum total
}
) Seeing The query above translates to the following SQL SELECT
customer_id,
SUM(total) AS my_total
FROM
invoices
GROUP BY
customer_id which does allow defining an alias for I think in PRQL you currently would need to have another This is something I had never noticed before until I saw it written with the |
Actually, this query:
... produces: SELECT
customer_id AS c_id,
SUM(total) AS my_total
FROM
invoices
GROUP BY
customer_id ... on the current version of the compiler already! This really is validation of good language design. Ok, I think we can say that these proposals:
... are rejected. |
Very impressive! I didn't even know it could do that... |
When you make language constructs independent, the orthogonality kicks in :D |
Yes, that's amazing! I thought I tried it in the playground last night and it didn't work but I just tried again and it also works with the old syntax in 0.8.1. Just shows that because of the list association with |
This is continuation of discussion in #1964 (comment)
As of the latest update to RFC about type system (see linked PR), PRQL will have two container types:
tuples, which have a known number of entries that can have different types,
arrays, which have an unknown number of entries all of the same type.
Currently PRQL has "lists" with same semantics as the ones described for tuples. To limit confusion in this thread, let's use terms "tuple", "array" and "list" as I described it here and be clear when proposing changes to the names.
With terminology out of the way, let's talk about the problem: syntax.
As I see it, there are two options:
{}
for arrays). Try to preserve current syntax. Use[]
for tuples,{}
for arrays,{}
for tuples and[]
for arrays.The text was updated successfully, but these errors were encountered: