Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PageRequest without Sort is allowed #1010

Closed
mipo256 opened this issue Mar 11, 2025 · 24 comments
Closed

PageRequest without Sort is allowed #1010

mipo256 opened this issue Mar 11, 2025 · 24 comments

Comments

@mipo256
Copy link

mipo256 commented Mar 11, 2025

Currently, the spec defines PageRequest as an object, that encapsulates the pagination info. And we have the separate Sort object that presents the sorting capabilities.

The spec allows to have queries like this:

@Find
List<Book> booksByPublicationYear(
    @By("publicationYear") Year year,
    PageRequest pageable
);

When using Hibernate as an implementation, for instance, this would translate into this SQL statement:

select
    b1_0.id,
    b1_0.created_at,
    b1_0.name,
    b1_0.publication_year 
from
    Book b1_0 
where
    b1_0.publication_year=? 
offset
    ? rows 
fetch
    first ? rows only

The problem is that the pagination itself often does not make much sense without sorting. The query above would yield unpredictable results, according to ANSI SQL. Even assuming other implementation of Jakarta Data that do not work with RDBMS, they work over storages that often do not have strong guarantees of order of values/documents/records in the result stream if the client did not ask it explicitely.

That is why in Spring Framework, for instance, the PageRequest would always have Sorting component in it.

My problem is that currently the spec allows for methods with PageReqeust, but without Sort parameter. That invites problems into peoples

@gavinking
Copy link
Contributor

Sure, you're supposed to either write something like:

@Find 
@OrderBy(_Book.PUBLICATION_YEAR)
List<Book> booksByPublicationYear(
        @By(_Book.PUBLICATION_YEAR) Year year,
        PageRequest pageable);

or, as you say, include a parameter of type Sort or Order.

A @Find method returning List and accepting a PageRequest and no kind of sorting criteria is indeed typically a bad idea, but I'm not completely convinced the spec should necessarily ban it.

Note that the sec does already require that a repository method returning Page or CursoredPage declares sorting/ordering. It's only the case of List which is debatable.

@njr-11
Copy link
Contributor

njr-11 commented Mar 11, 2025

I was surprised to see we allow List, Stream, and array as valid return types for repository methods that accept PageRequest, but it's there,

* <p>A query method of a repository may have a parameter of type
* {@code PageRequest} if its return type indicates that it may return
* multiple entities, that is, if its return type is an array type,
* {@link List}, {@code Stream}, {@link Page}, or {@link CursoredPage}.

I suppose it makes sense for someone who doesn't want to deal with the Page API and would rather interact with it as List.

In any case, I see the following requirement on the user to ensure an order:

* <p>When using pagination, always ensure that the ordering is consistent
* across invocations. One way to achieve this is to include the unique
* identifier in the sort criteria.</p>

The intent of this was likely to state that it's okay to order on the Id, but alternatively it's also okay not to if you have some other way, like maybe for bank accounts a combination of routingNumber, accountNumber that together define an order. I suppose this could also allow for a default ordering that is provided by a database or even a Jakarta Data provider chosen default.

If others feel it is important, I'd also be open to allowing no order and standardizing a default to use when no order is specified, such as by Id ascending, which I suppose would benefit users who don't care about an order so they wouldn't need to bother supplying any.

I should also point out, I don't think some of the arguments made in this issue are correct. It says,

That is why in Spring Framework, for instance, the PageRequest would always have Sorting component in it.

That's actually a proctected method. If I scroll down to the public API method right after, you can see Spring providing an easy way to obtain an unsorted PageRequest with only the page number and size,

https://github.com/spring-projects/spring-data-commons/blob/cda948ed6107856fa05c9709d0a05c706d836181/src/main/java/org/springframework/data/domain/PageRequest.java#L53-L62

@mipo256
Copy link
Author

mipo256 commented Mar 11, 2025

Thank you for your response, @gavinking.

I'm on your team, and I'm not exactly sure that the spec should forbid it entirely.

However, the core of the problem, I think, is that the PageRequest,' as it's name states, is supposed to provide for a page. The Page can be, of course, representation as List, or as Page`, where later provides more information.

The problem is that the so-called PageRequest does not provide the consistent page by itself - it needs the Ssort info.

What I think is the problem is that PageRequest is not sufficient by itself for providing the page, the name is misleading

I hope my concern is clear for you, Gavin. Let me know what you think about it

@gavinking
Copy link
Contributor

gavinking commented Mar 11, 2025

So if the question is: should a PageRequest itself carry sorting criteria, well, that's an interesting one. Earlier designs of Jakarta Data did indeed allow that, but in the end we decided it didn't make sense; it just made things more complicated. So we moved away from that design.

The problem is that the so-called PageRequest does not provide the consistent page by itself - it needs the Sort info.

Well, no, that's not necessarily true. If you have a repository method with a JDQL order by clause in a @Query annotation, or if you have @Find @OrderBy, as in the example above, then you don't need to provide any Sort objects. It's only in the case of dynamic sorting criteria that they're necessary.

But on the other hand almost no PageRequest is ever completely self-contained in the sense you're trying to describe, not even in the previous design, nor in Spring Data, or whatever. Because a repository query method also:

  • almost always has parameters, e.g. @By(_Book.PUBLICATION_YEAR) Year year in the example above, and
  • might even have Restrictions.

And so to do pagination you also need carry those things around, not just the dynamic ordering criteria (if any).

I don't see how it would be consistent at all to say that Sort objects belong to the PageRequest, but that year doesn't, and Restrictions don't. The PageRequest says which page of query results you're looking for. It doesn't contain all parameters of the repository method. And I think this is good.

@mipo256
Copy link
Author

mipo256 commented Mar 11, 2025

I disagree @gavinking.
The publicationYear is a filtering codition. PageRequest, as far as I see it, contains only the information required to fetch the page. The way you fitler out the data is completely different question. It does not have anything to do with PageRequest and that is why it is not included.

The consistency tha I'm talkina about is that the spec currently provide a way to fetch data by using solely PageRequest. The @OrderBy or JPQL that you've mentioned of course solve the problem, but the user is not required to do that. We provide a gun to a user to shot himself in a foot. That is my concern.

@njr-11
Copy link
Contributor

njr-11 commented Mar 11, 2025

The publicationYear is a filtering codition.

That's true, but the filtering is every bit as important as the ordering, and even more the entity type that you want to retrieve pages of (this is provided by the repository method return type, not the PageRequest).

We provide a gun to a user to shot himself in a foot. That is my concern.

That's a very strong statement and I'm trying to figure out what it really means. I think you are saying that a user can forget to include ordering with PageRequest and unknowingly end up relying on the database, provider, or have no ordering at all. But switching to PageRequest with optional Sort like Spring has doesn't solve that problem, and I would say actually makes it worse. You can forget to invoke .withSort on a Spring PageRequest and get the same problem. The reason I say it's worse is that the repository method signature does nothing at all to prevent it. At least in Jakarta Data, the problem is isolated to the repository method definition, which will either include a Sort or Order method parameter that forces the user of the method to realize it needs ordering, or the method has the @OrderBy annotation or the OrderBy keyword hard coded onto it in which case the repository method user doesn't need to do anything at all for ordering. For the problem to exist in Jakarta Data, the repository method writer would need to forget to allow for ordering. Some Jakarta Data providers might reject this, and others might default the ordering, and others might defer to the database for ordering. The Data spec doesn't define a behavior. I'm willing to have it define something though, and then I don't think the shoot in the foot argument would apply.

@mipo256
Copy link
Author

mipo256 commented Mar 12, 2025

Highly appretiate your response, @njr-11

That's true, but the filtering is every bit as important as the ordering

Of course it is. I'm not saying it is not important. I just do not understand why the question was including this in the PageRequest in the first place.

That's a very strong statement and I'm trying to figure out what it really means

Well, yeah, I agree with you here.

At least in Jakarta Data, the problem is isolated to the repository method definition, which will either include a Sort or Order method parameter that forces the user of the method to realize it needs ordering, or the method has the @orderby annotation or the OrderBy keyword hard coded onto it in which case the repository method user doesn't need to do anything at all for ordering.

That is not necesserely true. What you're saying is true only in case of prefefined methods in repositories, or when we return the Page. I just provided an example in the issue. I can easily a query, and Jakarta Data (Hibernate as implementation in this case) and provide no @OrderBy, no ORDER BY in JDQL, and no Sort.

I agree with you, that signature that accepts PageRequest and Sort for instance, forces user to think about sorting. The problem is when the user is defining signature himself. Nobody prevents user from doing query in the issue.

But switching to PageRequest with optional Sort like Spring has doesn't solve that problem

While I agree with you, I'm not saying that Spring Data solves this problem in ideal way. What I said is that in Spring Data the Sort is always included, directly or indirectly. If the user did not specify the Sort component of Spring Data's PageRequest - the sorting would be Sort.unsorted().

I just want to clarify, I do not think it is a great solution from Spring Data. I'm just saying that Sorting is in the object.

@otaviojava
Copy link
Contributor

The consistency tha I'm talkina about is that the spec currently provide a way to fetch data by using solely PageRequest. The @OrderBy or JPQL that you've mentioned of course solve the problem, but the user is not required to do that. We provide a gun to a user to shot himself in a foot. That is my concern.

That is true; it does not make the API safe enough for Java developers. Having a default way to solve it won't be bad, such as Spring does with Sort.unsorted() or allows the providers to throw an exception either build or on runtime.

@gavinking
Copy link
Contributor

I can easily a query, and Jakarta Data (Hibernate as implementation in this case) and provide no @OrderBy, no ORDER BY in JDQL, and no Sort.

But Nathan's point is that you can do that in Spring Data too.

Just because a PageRequest can carry a Sort, that doesn't mean that the client is required (via the type system) to provide a Sort.

And the issue is that we simply can't prevent the creation of a PageRequest with no Sort, because for at least some repository methods, a PageRequest with no Sort is perfectly 100% legit!

@mipo256
Copy link
Author

mipo256 commented Mar 12, 2025

I perfectly got you @gavinking, I'm not saying Spring Data's solution is perfect, I stated it explicitly.

What I think we can and should do - at least supply a warning, that the described setup with PageRequest solely (at least in Hibernate) is almost definitely not what the user wants.

@gavinking
Copy link
Contributor

gavinking commented Mar 12, 2025

So I think the semantically strong thing here that @mipo256 is reaching for, and which I think that the Spring Data people were grasping for but missed, is the idea of some sort of true multi-page result set or cursor-type object.

This is something I privately toyed with when we were working on Data 1.0, but ultimately there were questions I just couldn't answer about it. The programming model would be approximately this:

Pages<Book> pages = library.booksByPublicationYear(Year.now(), _Book.title.asc(), PageRequest.ofSize(50));
List<Book> firstPage = pages.page(0);
...

List<Book> secondPage = pages.page(1);
...

Where a Page object memoizes all the arguments passes to the repository method, including restrictions, order, and all parameters. And so you don't have to go back to the repository for each subsequent page.

My issue with this construct, and the reason I never talked about it out loud until now is: how is the application program supposed to store an instance of Page between requests?

I suppose that @mipo256 has some way to store a Spring PageRequest (with attached sorting criteria) between requests. I'm curious to know how he does that.

@mipo256
Copy link
Author

mipo256 commented Mar 13, 2025

@gavinking I'll respond shortly, a bit busy.

@mipo256
Copy link
Author

mipo256 commented Mar 13, 2025

So I think the semantically strong thing here that @mipo256 is reaching for, and which I think that the Spring Data people were grasping for but missed, is the idea of some sort of true multi-page result set or cursor-type object.

Well, maybe you can put it this way. In general, the question of who would remember the position of the cursor between requests (which is what you're asking, if I understood you correctly) is quite broad, and it highly depends on the context, see examples below.

So, for instance, if we have the following method

@Find
List<Book> booksByPublicationYear(
    @By("publicationYear") Year year,
    PageRequest pageable
);

Assume we have a front-end client. It can provide the page number as the users scrolls down, and page size in order for us to create the PageRequest. The client (the front-end in the browser in our case) will remember the position and will provide subsequent requests for other pages.

It is just a single example. Another example is to use a batch processing within an app. For instance, I need to process all the books via some criteria and do something with them. The dataset is quite large, I do not want to load everything onto memory at once, so I load it with chunks - via PageRequest. I store the page number and size in the java heap somewhere in the form of some object. It does not matte row exactly I store it.

@gavinking Maybe I did not understand you correctly, I'm sorry.

P.S: Yes, I know the Page can be used, but I'll briefly repeat my point - the spec provides a way of using List here, and people would use it there. The query above works. And this may lead into problems.

@gavinking
Copy link
Contributor

gavinking commented Mar 13, 2025

So let's quickly dispatch with the batch processing scenario, because it's easy: Pages actually works beautifully in this case, and it's a much better way to do pagination. Perhaps we should consider adding Pages just to address this use case.

Unfortunately, Pages doesn't appear to work well in other contexts. Let's consider the case of a web endpoint.

As far as I understand it, a typical web endpoint is always going to receive something like:

/books?year=2000&sort=id&page=1&results=50

and then the Java code which handles this endpoint is going to have to unpack the HTTP parameters into Java objects and forward them to a repository, using something like:

var books = 
        library.booksByPublicationYear(Year.of(year), 
                                       Sort.asc(sort), 
                                       PageRequest.ofPage(page).size(results));

where the repository method is declared as:

@Find
List<Book> booksByPublicationYear(Year publicationYear, 
                                  Sort<Book> sorting,
                                  PageRequest page);

Now, considering this scenario, did it help when we had the Sort object hanging off the PageRequest? Well, no, I don't think it did. In fact, it made things worse because:

  1. we had to turn PageRequest into a parameterized type, which was just a pain, and
  2. even so we lost typesafety because now the client could forget to add the Sort to the PageRequest, whereas when they're independent parameters, a Sort is required by the signature of the repository method.

Similarly, would my Pages object help at all? No, and for very similar reasons. Indeed, it would be worse, since now you would be required to unpack other parameters like the year, and repack them into a Pages object, all in a completely untypesafe way.

So the reason for keeping all these parameters separate is that you actually gain type safety.

@mipo256
Copy link
Author

mipo256 commented Mar 16, 2025

Thank you for your response, Gavnin @gavinking

even so we lost typesafety because now the client could forget to add the Sort to the PageRequest, whereas when they're independent parameters, a Sort is required by the signature of the repository method.

Yes, but that is only because the suer can forget about Sort when creating the PageRequest. This happens because of this instantiation process:

PageRequest.ofPage(page).size(results) // that is a '"builder"

The API allows using PageRequest without Sort.

If we're using this API, then of course forgetting the Sort and having PageRequest to be parametrized Java type is quite a hit on type safety, there is no doubt about that. But IMHO the spec should not allow users to instantiate the PageRequest like this - spec needs to force users to provide all the values that are required for pagination, like this:

// in PageRequest class
public static PageRequest of(int pageNumber, int pageSize, Sort sorting);

// on the callee side
PageRequest.of(0, 10, Sort.by(StaticMetaModel_.FIELD).asc());

Another thing - I do not necessarily think that Sort by itself should make PageRequest parametrized, to be honest. But that is an open question.

@gavinking
Copy link
Contributor

gavinking commented Mar 16, 2025

But IMHO the spec should not allow users to instantiate the PageRequest like this - spec needs to force users to provide all the values that are required for pagination

I've already explained why this is simply a non-starter. The common case here is that the repository method itself specifies a well-defined ordering. Dynamic ordering is the less common case.

The most basic usage of the API is of this form:

@Query("where lower(title) like :title order by title")
List<Book> booksTitled(String title, PageRequest page);
var books = library.booksTitled("%hibernate%", PageRequest.ofSize(25));

There is nothing wrong with the code above. We absolutely do not want to force the caller to have to re-specify the ordering in this case!

So what you're suggesting simply doesn't work. You need to let go of it.

I do not necessarily think that Sort by itself should make PageRequest parametrized, to be honest.

But then we would arrive at a worse loss of type safety than the one you're worried about!

Because then you could pass _Author.name.asc() to a repository method which returns Books. And that's much worse than a query with missing sorting criteria.

  • The worst that can happen when you have a PageRequest with no well-defined order is that you get rows returned in some database-defined order. The databases I know of don't usually return rows in a random order, it's usually the order of the index used, or something else that's approximately stable across requests, though of course that's not guaranteed. So is this is a problem we really need to detect at and prevent at compile time? I dunno, honestly I'm not sure it is.
  • On the other hand, passing a Sort<Author> to a repository method which returns Books is definitely always wrong and an error and absolutely falls into the class of bugs we would like to detect at compile time.

So, if we really did decide that you're right, and that pagination with no order is something catastrophic that we need to prevent at compile time (as indicated, I'm really not convinced of this) then the way we would have to do it would be by saying that every repository method with a parameter of type PageRequest must either:

  1. specify an order by or @OrderBy, or
  2. have a parameter of type Sort or Order.

We could in principle do that, but I'm just not sure we need to.

I mean, in our implementation of Jakarta Data we can easily output a compiler warning in this case, and that doesn't require any change to the spec.

@mipo256
Copy link
Author

mipo256 commented Mar 23, 2025

Hey @gavinking! Sorry for delay.

But then we would arrive at a worse loss of type safety than the one you're worried about!

I agree, I see what you're talking about, I overlooked it.

I've already explained why this is simply a non-starter. The common case here is that the repository method itself specifies a well-defined ordering. Dynamic ordering is the less common case.

I think we're talking about a bit differnt things here. I'm not talkinag about the dynamic or static sorting configuration.

In the meantime, we have discussed this issue internally with @aleksey-stukalov (I think Gavin is familiar with this person), and I encourage him to participate in this conversation on GitHub.

Let me explain what we narrowed:

The current approach like that:

@Find
List<Book> booksByPublicationYear(
    @By("publicationYear") Year year,
    PageRequest pageable
);

may be fine, depending on the Jakarta Data vendor. Also, it does not seem to be clear that the Jakarta Data specifiction itself needs to do something about it, since it is reasonable that someone does not need sorting in request at all and just what a subset of data.

What we agreed is the case, is that the various implementations should decide, wether they should, for instance, emit a warning, or not. Becuase in Hibernate as Jakarta Data implementation, when it works with PostgreSQL for instance, the query like that:

select
    b1_0.id,
    b1_0.created_at,
    b1_0.name,
    b1_0.publication_year 
from
    Book b1_0 
where
    b1_0.publication_year=? 
offset
    ? rows 
fetch
    first ? rows only

would yield unpredictable results, that's just what happens. But the fact is that this is not necessary true for other implementations of Jakarta Data or other RDBMs vendos.

So, finally, it seems appropriate that:

  1. The spec should NOT be changed
  2. And the particular implementations like Hibernate with PostgreSQL maybe should emit a warning in case the usage looks like the one above.

What are your thoughts on this, @gavinking ?

@gavinking
Copy link
Contributor

And the particular implementations like Hibernate with PostgreSQL maybe should emit a warning in case the usage looks like the one above.

Well I actually already did that here:

hibernate/hibernate-orm@00de402

But it doesn't really work as well as I would have liked because IntelliJ kinda hides compiler warnings from the user. So those warnings are only visible if you go hunting for them.

@gavinking
Copy link
Contributor

IntelliJ kinda hides compiler warnings from the user

Note however, that this is something that could be implemented directly in IntelliJ's Jakarta Data support.

@aleksey-stukalov
Copy link

IntelliJ kinda hides compiler warnings from the user

Note however, that this is something that could be implemented directly in IntelliJ's Jakarta Data support.

@gavinking, could you please elaborate a bit more on that? It definitely would be great to show such warnings to the users in the form of inspection and/or the warning itself in the build output. I don't quite understand how we can hide it out in the current implementation. I'll ask the team to investigate that, but it would be great if you could add a bit more details regarding the problem

@aleksey-stukalov
Copy link

As for the initial question, it is my perception, not more, but I would rather say that the spec meets my personal understanding. I see pagination as a "form" of output, which can be implemented in multiple ways, while ordering is still a part of the data request part. In other words, "order by" is a part of the query declaration, and pagination is about presentation logic that is not necessarily a part of the underlying query.

@gavinking
Copy link
Contributor

gavinking commented Mar 23, 2025

could you please elaborate a bit more on that?

Well the thing is that in the case of a compiler error, IntelliJ automatically opens up the Build view, and shows me the error. But for a warning, if the Build view isn't already open and focussed, I won't see the warning. This is usually a reasonable behavior, but it means there's no good way for an annotation processor to provide warnings to the developer.

@aleksey-stukalov
Copy link

could you please elaborate a bit more on that?

Well the thing is that in the case of a compiler error, IntelliJ automatically opens up the Build view, and shows me the error. But for a warning, if the Build view isn't already open and focussed, I won't see the warning. This is usually a reasonable behavior, but it means there's no good way for an annotation processor to provide warnings to the developer.

Ah, yes, I see. Let's discuss this with the UX team anyway. Will see what they offer.

@gavinking
Copy link
Contributor

The spec should NOT be changed

Seems this is the consensus of everyone here, so I'm going to close the issue.

Going forward, this is a problem that can be dealt with by implementations and by tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants