docs: RFC for type system #1964

aljazerzen · 2023-02-26T13:44:47Z

No description provided.

for more information, see https://pre-commit.ci

max-sixty

Overall great! And thanks for writing it up!

Very much agree with the approach of starting simple — e.g. just int rather than int64 — and seeing where we get to. At least in SQL, that's going to give us almost all the benefits, like an error on trying to add an int & a string.

One thing we could consider — though maybe you've considered it and rejected it, is to model the type system on something like DuckDB. This is what @ghalimi suggested. To the extent that the industry coalesces on that standard, it'd be good. To the extent it's just another standard, it's less good, but could still be a helpful template.

That said, I think the approaches aren't that different — both would start with some basic types and attempt to map in other type systems. And there isn't an easy way of using DuckDB's types in code (it's not in rust), so we're just using the spec.

My guess is that a core challenge here will be adding in types to the library gradually — e.g. can we get an error on 5 + "foo" with reasonable work? There's so much we could do here — and likely we'll get lots of utility from the first 20% of work.

book/src/internals/type-system.md

ghalimi · 2023-02-26T22:59:15Z

Overall great! And thanks for writing it up!

Very much agree with the approach of starting simple — e.g. just int rather than int64 — and seeing where we get to. At least in SQL, that's going to give us almost all the benefits, like an error on trying to add an int & a string.

One thing we could consider — though maybe you've considered it and rejected it, is to model the type system on something like DuckDB. This is what @ghalimi suggested. To the extent that the industry coalesces on that standard, it'd be good. To the extent it's just another standard, it's less good, but could still be a helpful template.

That said, I think the approaches aren't that different — both would start with some basic types and attempt to map in other type systems. And there isn't an easy way of using DuckDB's types in code (it's not in rust), so we're just using the spec.

My guess is that a core challenge here will be adding in types to the library gradually — e.g. can we get an error on 5 + "foo" with reasonable work? There's so much we could do here — and likely we'll get lots of utility from the first 20% of work.

DuckDB or Iceberg. They're highly compatible with each other, but we like DuckDB better because it offers more control regarding encoding sizes. Here is a mapping table between the two: https://github.com/sutoiku/puffin/blob/main/TYPES.md

adriangb · 2023-02-26T23:31:30Z

I like the comparisons with Postgres, SQLite and DuckDB. I think it would also be interesting to compare to Apache Arrow

ghalimi · 2023-02-26T23:54:00Z

I like the comparisons with Postgres, SQLite and DuckDB. I think it would also be interesting to compare to Apache Arrow

Apache Arrow is a great candidate for that as well, and is really close to DuckDB. You can't go wrong with Arrow.

ghalimi · 2023-02-27T00:41:10Z

I like the comparisons with Postgres, SQLite and DuckDB. I think it would also be interesting to compare to Apache Arrow

Here is a mapping table between DuckDB, Arrow, and Iceberg types:

https://github.com/sutoiku/puffin/blob/main/TYPES.md

Apache Arrow is probably more "universal", but DuckDB's types look a bit cleaner to me. It's purely cosmetic though.

max-sixty · 2023-02-27T08:16:15Z

To what extent should we have a single type system for:

"PRQL types", such as relation / column / function, and
"Data types", such as int / string?

My impulse is that they should be common, at least at the "front-end". If so, possibly we should add those to the doc. (Am happy to do it if you'd like me to)

aljazerzen · 2023-02-27T11:58:00Z

@max-sixty

To what extent should we have a single type system for:

I've added a section on physical layout that talks about this (if I understood correctly what you meant).

In summary, we don't have to define "data types" right now.

aljazerzen · 2023-02-27T12:01:29Z

Regarding DuckDB and Apache Iceberg: we will surely use them as a base for defining primitives, but also steer away from their handling of null and a few other quirks.

ghalimi · 2023-02-27T12:28:22Z

Regarding DuckDB and Apache Iceberg: we will surely use them as a base for defining primitives, but also steer away from their handling of null and a few other quirks.

Nulls are painful...

RandomFractals · 2023-02-27T14:17:08Z

Why not just use standard Arrow data types?

I believe those are more common and with more databases and systems like Data Fusion and Polars providing native arrow and parquet data support, plus Arrow Flight and ADBC, PRQL target system submodules could just convert arrow data types to the target system defined/supported data types.

See this api for example or arrow data types defined for other languages on their site:

https://arrow.apache.org/docs/python/api/datatypes.html

aljazerzen · 2023-02-27T16:51:56Z

@RandomFractals as I've written, this is not talking about physical layout of types, but about logical type system in PRQL. For physical layout we will use a standard, probably Apache Arrow, but this is not needed right now.

I am strongly against just adopting type system that is vaguely defined for Apache Arrow Python client or the type system of DuckDB, because it does not cohere principles described in the proposal. Again, for example, handling of null values.

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

vanillajonathan · 2023-02-27T23:30:31Z

I really like the data types int32 and int64. I've been thinking about database type systems and those two data types are exactly what I had in mind.

But what about int8 and int16?
We would need something to map to tinyint and smallint.

Note: Interestingly enough Rust have a i128 and a u128 data type.

`datetime` instead of `timestamp`?

As for timestamp maybe this should be named datetime?
It is datetime in Microsoft SQL Server (documentation) which also offers the date and timedata types, so if we were to offer a date and time data type (which I think we should) then datetime would be much more suitable, coherent and predictable than timestamp.

A boolean data type?

Maybe there could be a boolean data type?
But keep in mind that MySQL have a built-in boolean data type, while Microsoft SQL Server does not and instead uses the bit data type.

A string data type?

Most programming languages have a string datatype so that is what developers, especially front-end developers and NoSQL developers are accustomed to, but SQL uses varchar (and char). Oh, and Microsoft SQL Server also have nvarchar and nchar for Unicode.

Vendor-specific data types?

Some databases have non-standard vendor-specific data types, such as jsonbin PostgreSQL. One thing that bothers me about this is that just by looking at a database schema it is difficult to discern whether a data type is a standard data type or a non-standard data type, and to which DBMS it belongs to.

CSS deals with this with vendor prefixes, example Chrome might have their vendor-specific CSS property called -webkit-user-drag and Firefox might have a vendor-specific CSS property named -moz-radial-gradient, then these are denoted as such by the -webkit- and -moz- prefix respectively.

Media types addresses this with "vendor trees", example application/vnd.microsoft.portable-executable.

aljazerzen · 2023-02-28T09:28:17Z

book/src/internals/type-system.md

+type int
+type float
+type bool
+type text


@vanillajonathan
A string data type?

Most programming languages have a string datatype so that is what developers, especially front-end developers and NoSQL developers are accustomed to, but SQL uses varchar (and char). Oh, and Microsoft SQL Server also have nvarchar and nchar for Unicode.

I've used "text" here, because we found term "string" too programmer-like, which may be confusing to non-tech-savy people (see #286 (comment)). Apart from being consistent in the language, I don't have a strong preference here.

Whatever we choose, this type does not have a specific physical layout, by which I mean that I could be implemented by VARCHAR in one DBMS and by NVARCHAR in another.

FYI this was me that suggested from_text rather than from_string.

I'm actually +1 on using string here, since this part is much more engineer-y, and the "Excel poweruser archetype user" won't be interacting with these.

aljazerzen · 2023-02-28T09:29:53Z

book/src/internals/type-system.md

+# built-in types
+type int
+type float
+type bool


@vanillajonathan
A boolean data type?

Maybe there could be a boolean data type?
But keep in mind that MySQL have a built-in boolean data type, while Microsoft SQL Server does not and instead uses the bit data type.

There is one proposed. And we really need it, because it's the type of all comparison expressions.

(minor point — I was the person who thought from_string was too programmer-y. I'm less against it here, since types will predominantly be handled by the programmer-y folks rather than those more used to Excel)

max-sixty · 2023-02-28T09:33:12Z

I've added a section on physical layout that talks about this (if I understood correctly what you meant).

Yes I think so. My main point was that we already have a type system for "PRQL types" such as functions and relations — attempting to compile select (f a) will fail if f isn't a function`.

I'm guessing that will be part of this type system, even though our function type will never map to a DB type, because it's evaluated before it gets to the DB...

aljazerzen · 2023-02-28T09:35:02Z

book/src/internals/type-system.md

+A similar example is a "string array type" in PTS that could be represented by
+an `text[]` (if DBMS supports arrays) or `json` or it's variant `jsonb` in
+Postgres. Again, the representation would affect operators: in Postgres arrays
+can be access with `my_array[1]` and json uses `my_json_array -> 1`. This
+example may not be applicable, if we decide that we want a separate JSON type in
+PST.


@vanillajonathan
Vendor-specific data types?

Some databases have non-standard vendor-specific data types, such as jsonbin PostgreSQL. One thing that bothers me about this is that just by looking at a database schema it is difficult to discern whether a data type is a standard data type or a non-standard data type, and to which DBMS it belongs to.

CSS deals with this with vendor prefixes, example Chrome might have their vendor-specific CSS property called -webkit-user-drag and Firefox might have a vendor-specific CSS property named -moz-radial-gradient, then these are denoted as such by the -webkit- and -moz- prefix respectively.

Media types addresses this with "vendor trees", example application/vnd.microsoft.portable-executable.

What I meant here, is that we may be able to support vendor-specific data types jsonb in Postgres by defining them as normal structs and lists in PRQL and mark them as having a "Postgres JSON" representation.

This would tell the compiler to use postgres's operators (i.e. ->) for implementing standard PRQL operations (in this case field indirection - accessing a field of a struct).

aljazerzen · 2023-02-28T09:43:54Z

book/src/internals/type-system.md

+
+type invoices = {[
+    invoice_id = int64,
+    issued_at = timestamp,


@vanillajonathan
datetime instead of timestamp?

As for timestamp maybe this should be named datetime?
It is datetime in Microsoft SQL Server (documentation) which also offers the date and timedata types, so if we were to offer a date and time data type (which I think we should) then datetime would be much more suitable, coherent and predictable than timestamp.

I'm torn on this question.

datetime is more user-friendly, but also a bit misleading in that it refers to a date and a time including a timezone, when in fact it is refering to a moment, which is timezone independent.

My feeling here is that we should find a good (modern) API that deals with time and build on top of their findings.

A solid API that hasn't disappointed me yet is chrono crate

Also Java 8 java.time package

Fully agreed on using a good modern API for datetime logic. A major source of logic bugs in data transformations comes from datetime handling.

Chrono is definitely a great reference and personally I think there should definitely be at least two native datetime types for sql usage: datetime (with TZ) and naive_datetime (without TZ).

Filed here: #2125

book/src/internals/type-system.md

aljazerzen · 2023-02-28T09:56:58Z

I'm guessing that will be part of this type system, even though our function type will never map to a DB type, because it's evaluated before it gets to the DB...

Yes, this is meant to replace or extend the type system we have now. It was built without much design work, so I lacks the big picture view.

And yes, I completely forgot about the types of functions :D

book/src/internals/type-system.md

eitsupi · 2023-03-01T09:40:21Z

book/src/internals/type-system.md

+type my_int = int
+
+# sum type (union or enum)
+type number = int | float


It's a naive question, but I think it's hard to understand if pipe and OR use the same symbol |.
Couldn't we use some other symbol here?
(I don't know of any other symbol that would be better suited for OR except ||, so it might be better to leave it as is.)

Yes, I agree - using same syntax as pipe it not ideal.

I think having || is better.

Wait, I just realized that we already use or for logical disjunction. Can we use it here?

Let's say we have two sets:

let a = {1, 2, 3} let b = bool # does this make sense? let union_of_a_and_b = a or b # or is the better? let union_of_a_and_b = a + b

P.S. Python uses | (binary or) for unioning sets.

I agree that it is better to unify either "&& and ||" or "and and or".
I prefer the former, but given that it is easier to distinguish from |, maybe the latter is better.

vanillajonathan · 2023-03-01T14:06:38Z

Very much agree with the approach of starting simple — e.g. just int rather than int64.

I disagree.

What exactly really is int?

It is ambiguous.

Is it an alias to int32?
Is it an alias to int64?
Is it an union type that means either int32 or int64?

I believe introducing int would be a mistake. Ambiguity is bad.

aljazerzen · 2023-03-01T14:59:39Z

What exactly really is int?

It is an integer number. Without specified size, but with many arithmetic operations defined. That is enough for now, as the size does not affect any of our translations. If needed, we can define it as a union of all intXX types in the future.

I wouldn't say this is ambiguous - type definition and all of its operations are well defined. But it may be too "dynamic"...

eitsupi · 2023-03-04T14:18:38Z

Perhaps a large number of types like Apache Arrow data types will be needed when implementing something like cast function?

snth

Added suggestions

book/src/internals/type-system.md

snth · 2023-03-05T13:55:16Z

As hinted at before, I have some suggestions for the syntax of the type system.

The principles guiding my suggestions are:

Try to reuse whatever syntax we have already.
Where we need to introduce new syntax, try to keep this consistent with other popular conventions out there.

We currently use angle brackets <> for type annotations. I have mostly seen this used in discussions in Github issues and in the PRQL std lib. To me it seems like <> acts as a "type constructor" operator (I'm not sure if that's completely correct in the most technical sense of the word) in the sense that it takes an expression and turns it into a type. I think we should try to keep that and ensure that whatever syntax we decide on stays consistent with that. What I mean by that is the following:

While for function defintions we write

func interp lower:0 higher x -> (x - lower) / (higher - lower)

this is equivalent to

let interp = lower:0 higher x -> (x - lower) / (higher - lower)

but the first form is preferred for readability.

Similarly for types we could(/should) have that
```
type number = int | float
```
is equivalent to
```
let number = <int | float>
```
where again the first is preferred for readability but they are equivalent.

Other syntax that we have already firmly established is (square) brackets [] for lists. I think we should carry this over into the type system rather than introducing {} for this. Moreover, in terms of point 2. above, {} is quite commonly used for struct types in other languages such as Javascript, JSON, Python, ... .

Moreover, we have already established : as syntax for named parameters, as for example in the interp function definition above. So then if we view {} as a "struct constructor", then it would make sense that it takes named parameters with : and purely positional ones (for tuples) simply separated by ,. This has the nice consequence that the resulting syntax is very similar, if not identical to, JSON.

Therefore I propose rewriting the Syntax section as follows:

Syntax

# built-in types
type int
type float
type bool
type text
type char
type null

# users-defined types
type my_int = int

# sum type (union or enum)
type number = int || float
type scalar = number || bool || str || char

# by default types are not nullable
type my_nullable_int = int || null

# user-defined enum
type open
type pending
type closed
type status = open || pending || closed

# product type
type my_struct = {id: int_nullable, name: str}
type my_timestamp_with_timezone = {int, text}

# list type
type list = [T]

# relations are list of structs
type my_relation = [{
	id: int,
	title: text,
	age: int
}]

type invoices = [{
    invoice_id: int64,
    issued_at: timestamp,
    labels: [text]

    #[repr(json)]
    items = [{
        article_id: int64,
        count: int16 where x -> x >= 1,
    }],
    paid_by_user_id: int64 || null,
    status: status,
}]

WDYT?

EDIT: I didn't remember about the #[repr(json)] bit and will have to reread that and comment on that another time.

aljazerzen · 2023-03-05T19:24:19Z

book/src/internals/type-system.md

+type status = open | pending | closed
+
+# product type
+type my_struct = [id = int_nullable, name = str]


@snth
Other syntax that we have already firmly established is (square) brackets [] for lists. I think we should carry this over into the type system rather than introducing {} for this. Moreover, in terms of point 2. above, {} is quite commonly used for struct types in other languages such as Javascript, JSON, Python, ... .

Moreover, we have already established : as syntax for named parameters, as for example in the interp function definition above. So then if we view {} as a "struct constructor", then it would make sense that it takes named parameters with : and purely positional ones (for tuples) simply separated by ,. This has the nice consequence that the resulting syntax is very similar, if not identical to, JSON.

There is a clash between what we now call "lists" and what list is in this PR.

Our select and derive take something that has a fixed number elements, they can be of different types and they can be named. This definition collides with what this PR proposes under the name "struct". So effectively, this PR renames "lists" to "structs".

E.g.:

select [1.4, false, "foo"] # this is now named "struct"

It also proposes a new construct named "list" which acts as "a repeated type" - number of repetitions is unknown at compile time, all elements are of the same type and they cannot be named. Think of a column in a relation.

Now, there are two options for the syntax:

(what this PR currently proposes) try to preserve current syntax: use [] for structs and {} for lists,

(what you described) mimic JSON and basically all other languages: use {} for structs and [] for lists.

This definition collides with what this PR proposes under the name "struct". So effectively, this PR renames "lists" to "structs".

E.g.:

select [1.4, false, "foo"] # this is now named "struct"

I hadn't thought about it like that, but I think that's quite accurate.

IIUC most DBs name their List type as Array, so we could have

I'm keen to keep the PRQL construct of select [a, b, c] be called something simple, like List (and not Struct), since it's a very simple type that everyone would encounter very early.

Whereas the "repeated type" is more advanced and OK to call something technical.

We'd then have lists and arrays. And lists would represent relation rows.

What about tuples and arrays? Is the name "tuple" approachable for non-programmer people? Would it be confusing that [] is used for tuples?

I think that tuple is very close to relational algebra and would be a good fit here.

I think that tuple is very close to relational algebra and would be a good fit here.

It's a bit foreign but I agree it's very close to relational algebra. Definitely fine in the typing docs; I'm less keen on having the opening example explaining PRQL be "Then we pass a tuple of [1.4, false, "foo"] to select"...

It's also unfortunate that our [] syntax is not standard tuple syntax. Though this could give us an opportunity to make our parentheses syntax even more complicated! (no)

And lists would represent relation rows.

I didn't have this in my mind — if anything the List would be closer to columns, or values. (Or maybe that's what you mean? Values in one row?)

This naming gets confusing fast :D Let me clarify:

lists (currently expressed with []) or tuples (proposed) correspond to rows.

arrays (proposed) correspond to columns.

(edit: possibly we should push the syntax changes off to a different thread, unless it's required for typing decisions, sorry for sidetracking us)

from my_table select {num = 1.4, truthy = false, bar = "foo", double_col = col * 2} group {col_2} (derive a = sum double_col) sort {num, truthy}

I agree with the internal elegance of this. And we can say "select {a} is equivalent to select {a=a}.

But sort {num, truthy} is still more alien than sort [num, truthy], so I think we should pause before making this change. For most people who aren't programmers, the squiggly brackets are barely used!

Would it be possible to further clarify what we mean by "correspond to {rows,columns}"?

Right now, this is the way I'd model our relations:

relation = array of tuples tuple = a list of named types

From this I hope it's obvious what I mean when I say that a relation row is expressed as a tuple.

For most people who aren't programmers, the squiggly brackets are barely used!

I'd say the same thing for square brackets.

I agree with @max-sixty that we should pause for a while before a change like this (if we went ahead) because it's quite fundamental. I'm actually pro it and would also favour bringing things around to be more consistent with JSON type syntax but IMHO we should take care to try and think through as many of the repercussions as we can because we don't want to make changes like this to often.

So, in terms of current release numbers I would propose something like the following (numbers are just indicative and could be incremented appropriately depending on when the work gets done):

0.6.0: release imminent, pretty much in "feature freeze" mode

0.7.0: more QOL improvements, (settle syntax deliberations)

0.8.0: last release with old syntax, advertise upcoming syntax change

0.9.0 or 0.10.0: release new syntax

My main point is that given that it might be a fair amount of work to rewrite existing queries, people should be given adequate warning and they should still be given the opportunity to benefit from the next round or two of QOL improvements before having to rewrite their queries for the new syntax.

Maybe I'm too conservative but that's how I would go about it.

I agree that we should probably move this to the Github Discussions section since we seem to have stumbled onto a bigger topic than we anticipated.

tuple = a list of named types

OK, makes sense.

(Let's move the syntax conversation elsewhere, I'm muddying the discussion...)

Let's continue the discussion here: #2124

aljazerzen · 2023-03-05T19:42:35Z

book/src/internals/type-system.md

+type null
+
+# users-defined types
+type my_int = int


@snth
We currently use angle brackets <> for type annotations. I have mostly seen this used in discussions in Github issues and in the PRQL std lib. To me it seems like <> acts as a "type constructor" operator (I'm not sure if that's completely correct in the most technical sense of the word) in the sense that it takes an expression and turns it into a type. I think we should try to keep that and ensure that whatever syntax we decide on stays consistent with that. What I mean by that is the following:

While for function defintions we write
func interp lower:0 higher x -> (x - lower) / (higher - lower)
this is equivalent to
let interp = lower:0 higher x -> (x - lower) / (higher - lower)
but the first form is preferred for readability.

Similarly for types we could(/should) have that
type number = int | float
is equivalent to
let number = <int | float>
where again the first is preferred for readability but they are equivalent.

This is a great point: I haven't really defined how types interact with values and what < > means.

I think we should define a type of a variable to be "a set of all possible values of that variable". And if types are just sets, any type expression are just set expressions (e.g. int || null is a union of two sets).

param <X> would then mean "param can accept any value from set X".

I'd say that the syntax for your last example should then be:

type number = int || float let number = int || float

I think we should define a type of a variable to be "a set of all possible values of that variable".

Nice, I like this!

And if types are just sets, any type expression are just set expressions (e.g. int || null is a union of two sets).

Minor nit: do we distinguish between singleton sets and the value they contain or are values automatically upgraded to singleton sets? For example technically it should probably be

nullable_int = int || {null}

but I think it might make sense to treat values the same as singleton sets containing them as long as that doesn't cause any problems elsewhere.

do we distinguish between singleton sets and the value they contain

Technically we should distinguish, but "automatic coercion to singletons" you propose would work.

The thing is that apart from this, we don't need syntax for sets so we don't need to use {} and it remains available for other applications.

vanillajonathan · 2023-03-05T20:34:09Z

@snth For types declared over multiple lines, I think it would be nice have to use squrae brackets, so that curly braces would be enough, just like in the majority of other languages.

vanillajonathan · 2023-03-07T12:54:44Z

I think there should be a uuid type. Example 123e4567-e89b-12d3-a456-426614174000.

RandomFractals · 2023-03-07T17:49:09Z

Frankly, if we have this many comments in PR, maybe we should start/use design/feature ticket or discussions next time to get consensus and iron out the details. PRs this long should never happen mainly b/c not many github users actually read PR comments, but they do read issue comments and discussions. And if you want to involve your dev/users community in these discussions, maybe move them out to discussions next time. :)

max-sixty · 2023-03-07T21:36:38Z

if we have this many comments in PR, maybe we should start/use design/feature ticket or discussions next time to get consensus and iron out the details.

At the risk of getting meta / having another thread of discussion here — I do think anchoring in a specific proposal is very helpful; rather than a discussion on "what does anyone want to see in our type system".

I find this a very difficult problem. There are tradeoffs between gathering more ideas vs. having a focused discussion; between collecting a wider range of viewpoints vs. having a more informed discussion.

A couple of things I've found helpful:

Anchor in an actual proposal
Move other threads elsewhere, to keep the discussion focused
Produce a summary, so folks don't need to read the whole thing to catch up on the conclusions. Can do midway through to outline the current state of the the discussion and then restart.

We get (1) & (3) by working on this doc and then merging it, and then having another discussion if we need one. I'm fine to have that discussion in GH Discussion if folks prefer...

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> Co-authored-by: Tobias Brandt <Tobias.Brandt@gmail.com>

book/src/internals/type-system.md

eitsupi

Perhaps missed replacements?

book/src/internals/type-system.md

Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>

aljazerzen · 2023-03-12T11:47:44Z

I've updated the RFC with discussed changes. Mainly these are under sections Theory, Type annotations, Type definitions and Container types. I still have to expand definitions to include functions.

max-sixty · 2023-03-14T01:41:29Z

book/src/internals/type-system.md

+> For any undefined terms used in this section, refer to set theory and
+> mathematical definitions in general.
+
+A "type of a variable" is a "set of all possible values of that variable". This


No strong opinion, but isn't the word type more familiar than the word set? I get the theory here. For the sake of our docs, I'd favor going back to type when these become docs....

going back to type

Changing to "or of two types represents a union of those two types"?

My focus here iz the discussion below where we start to refer to types exclusively as sets. I support making the link between the concepts, I just think when we start documenting this, we should use the term "type" rather than "set".

I see. I'm using term "set", because it's more suitable here in theory section and because it will be implemented this way - type declarations will be just variable declarations of sets.

Let's hold on with this until we convert this to documentation.

max-sixty · 2023-03-14T01:42:47Z

book/src/internals/type-system.md

+But similar to how both `func` and `let` can be used to define functions (when
+we introduce lambda function syntax), let's also define syntactic sugar for type
+definitions:
+
+```
+# these two are equivalent
+let my_type <set> = set_expr
+type my_type = set_expr
+```


FWIW I'm increasingly in favor of this approach — i.e. functions can be let add = a b -> a + b

OTOH having two ways of defining these is not ideal. So we could only do the first, and lint towards including the type annotation for set/types & functions...

max-sixty · 2023-03-14T01:43:29Z

book/src/internals/type-system.md

+**Tuple** is the only product type in PTS. It contains n ordered fields, where n
+is known at compile-time. Each field has a type itself and an optional name.
+Fields are not necessarily of the same type.
+
+In other languages, similar constructs are named struct, tuple, named tuple or
+(data)class.
+
+**Array** is a container type that contains n ordered fields, where n is not
+known at compile-time. All fields are of the same type and cannot be named.
+
+**Relation** is an array of tuples.


Very clear, thanks!

max-sixty · 2023-03-14T01:51:49Z

book/src/internals/type-system.md

+- Binary operation `or` of two sets represents a union of those two sets:
+
+  ```
+  let number = int or float


Separate issue filed at #2170

for more information, see https://pre-commit.ci

aljazerzen · 2023-03-14T21:04:05Z

I'm merging this now, as it is generally finished. That does not mean that all details described here are set in stone, but rather that the outline of the type system is ready for further discussion in separate threads.

Thank you all for comments!

aljazerzen mentioned this pull request Feb 26, 2023

Type system #1965

Open

docs: RFC for type system

a5b360b

aljazerzen force-pushed the rfc-types branch from 1ee2190 to a5b360b Compare February 26, 2023 13:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

c99b306

for more information, see https://pre-commit.ci

max-sixty approved these changes Feb 26, 2023

View reviewed changes

aljazerzen added 2 commits February 27, 2023 12:47

physical layout

2fddd11

an example

0a37ed2

Update book/src/internals/type-system.md

37324dc

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

aljazerzen commented Feb 28, 2023

View reviewed changes

book/src/internals/type-system.md Outdated Show resolved Hide resolved

eitsupi reviewed Feb 28, 2023

View reviewed changes

book/src/internals/type-system.md Outdated Show resolved Hide resolved

eitsupi reviewed Mar 1, 2023

View reviewed changes

snth reviewed Mar 5, 2023

View reviewed changes

aljazerzen commented Mar 5, 2023

View reviewed changes

aljazerzen and others added 3 commits March 11, 2023 11:48

Typos and minor fixes

31b3bbe

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> Co-authored-by: Tobias Brandt <Tobias.Brandt@gmail.com>

Theory section

394336a

cleanup, rewording

e14d6f2

aljazerzen force-pushed the rfc-types branch from a414c15 to e14d6f2 Compare March 12, 2023 11:20

eitsupi reviewed Mar 12, 2023

View reviewed changes

book/src/internals/type-system.md Outdated Show resolved Hide resolved

eitsupi reviewed Mar 12, 2023

View reviewed changes

book/src/internals/type-system.md Outdated Show resolved Hide resolved

book/src/internals/type-system.md Outdated Show resolved Hide resolved

aljazerzen mentioned this pull request Mar 12, 2023

Syntax for lists and arrays #2124

Closed

Apply suggestions from code review

0ea6cc4

Co-authored-by: eitsupi <50911393+eitsupi@users.noreply.github.com>

aljazerzen mentioned this pull request Mar 12, 2023

Temporal types #2125

Open

max-sixty reviewed Mar 14, 2023

View reviewed changes

max-sixty approved these changes Mar 14, 2023

View reviewed changes

eitsupi mentioned this pull request Mar 14, 2023

Infix operators — and & or vs && & || #2170

Closed

aljazerzen and others added 3 commits March 14, 2023 21:36

Merge remote-tracking branch 'origin/main' into rfc-types

5eadc77

Functions

358befd

[pre-commit.ci] auto fixes from pre-commit.com hooks

ac318b9

for more information, see https://pre-commit.ci

aljazerzen merged commit cb13428 into main Mar 14, 2023

aljazerzen deleted the rfc-types branch March 14, 2023 21:04

Porkepix mentioned this pull request Apr 2, 2023

prql-compiler 0.7.0 Homebrew/homebrew-core#127326

Merged

snth mentioned this pull request May 14, 2024

Arrow type system as base for the type system #4472

Open

docs: RFC for type system #1964

docs: RFC for type system #1964

Conversation

aljazerzen commented Feb 26, 2023

max-sixty left a comment

Choose a reason for hiding this comment

ghalimi commented Feb 26, 2023

adriangb commented Feb 26, 2023

ghalimi commented Feb 26, 2023

ghalimi commented Feb 27, 2023

max-sixty commented Feb 27, 2023

aljazerzen commented Feb 27, 2023

aljazerzen commented Feb 27, 2023

ghalimi commented Feb 27, 2023

RandomFractals commented Feb 27, 2023 • edited Loading

aljazerzen commented Feb 27, 2023

vanillajonathan commented Feb 27, 2023

datetime instead of timestamp?

A boolean data type?

A string data type?

Vendor-specific data types?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

max-sixty commented Feb 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aljazerzen commented Feb 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vanillajonathan commented Mar 1, 2023

aljazerzen commented Mar 1, 2023

eitsupi commented Mar 4, 2023

snth left a comment

Choose a reason for hiding this comment

snth commented Mar 5, 2023 • edited Loading

Syntax

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aljazerzen Mar 7, 2023 • edited Loading

Choose a reason for hiding this comment

max-sixty Mar 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vanillajonathan commented Mar 5, 2023

vanillajonathan commented Mar 7, 2023

RandomFractals commented Mar 7, 2023 • edited Loading

max-sixty commented Mar 7, 2023 • edited Loading

eitsupi left a comment

Choose a reason for hiding this comment

aljazerzen commented Mar 12, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

max-sixty Mar 14, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aljazerzen commented Mar 14, 2023

RandomFractals commented Feb 27, 2023 •

edited

Loading

`datetime` instead of `timestamp`?

snth commented Mar 5, 2023 •

edited

Loading

aljazerzen Mar 7, 2023 •

edited

Loading

max-sixty Mar 7, 2023 •

edited

Loading

RandomFractals commented Mar 7, 2023 •

edited

Loading

max-sixty commented Mar 7, 2023 •

edited

Loading

max-sixty Mar 14, 2023 •

edited

Loading