[Discussion] Object Store Composition #7171

tustvold · 2025-02-21T11:57:45Z

Problem

Initially the ObjectStore API was relatively simple, consisting of a few methods to interact with object stores. As such, many systems adopted it as a generic IO abstraction; this is good, and it is what the crate was designed for.

As people wanted additional functionality, such as instrumentation, caching or concurrency limiting, this was implemented by creating ObjectStore implementations that wrap the existing ones. Again this worked well.

However, over time the ObjectStore API has grown, and now has 8 required methods and a further 10 methods with default implementations. This creates a number of challenges for this wrapper based approach for composition.

API Surface

As a wrapper must avoid "despecializing" methods, it must implement all 18 methods. Not only is this burdensome, but it also creates upgrade hazards as new methods are added, potentially in non-breaking versions.

Additional Context

As the logic within these wrappers has grown more complex, there comes the need to pass additional information through to this logic. This motivates requests like #7155

Interface Creep

In many places the ObjectStore interface gets used as the abstraction for components that don't actually require the full breadth of ObjectStore functionality. There is no need, for example, for a parquet reader to depend on more than the ability to fetch ranges of bytes.

This leads to perverse "ObjectStore" implementations, that actually only implement say get functionality. Similarly in contexts like apache/datafusion#14286 it creates complexities around how to shim the full ObjectStore interface, despite the actual operators in question only using a very small subset of this functionality.
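As a rough illustration of the narrower-interface idea, here is a minimal sketch of a component that depends only on "fetch a range of bytes". The `RangeFetch` trait and `has_magic` function are hypothetical names invented for this sketch; they are not part of object_store or parquet, and the real AsyncFileReader is async and looks different.

```rust
use std::ops::Range;

/// Hypothetical narrow trait: the only capability this component needs.
trait RangeFetch {
    /// Returns the bytes in `range`, or `None` if the range is out of bounds.
    fn fetch(&self, range: Range<usize>) -> Option<Vec<u8>>;
}

/// Anything that can serve byte ranges can implement the trait,
/// without pretending to support put, list, delete and so on.
impl RangeFetch for Vec<u8> {
    fn fetch(&self, range: Range<usize>) -> Option<Vec<u8>> {
        self.get(range).map(|s| s.to_vec())
    }
}

/// A consumer written against the narrow trait: checks whether the
/// data starts with the given magic bytes.
fn has_magic<R: RangeFetch>(reader: &R, magic: &[u8]) -> bool {
    reader.fetch(0..magic.len()).as_deref() == Some(magic)
}
```

Because `has_magic` only names `RangeFetch`, a cache, a test fixture, or a real store adapter can all be passed in without shimming a full store interface.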

Request Correlation

As the ObjectStore logic has gotten more sophisticated, incorporating automatic retries, request batching, etc... the relationship between an ObjectStore method call and requests has gotten rather fuzzy. This makes implementing instrumentation, concurrency limiting, tokio task dispatch, etc... at this API boundary increasingly inaccurate/problematic.

Thoughts

I personally think we should encourage a move away from this wrapper based form of composition and instead do the following:

- Encourage use of specialized traits like parquet's AsyncFileReader that reflect what a given component actually needs, and can evolve independently of ObjectStore
- Add additional functionality for injecting logic into the HTTP request path (Decouple ObjectStore from Reqwest / Generic HTTP Client Support #6056), allowing:
  - More accurate instrumentation
  - More accurate concurrency limiting
  - Potential sophistication w.r.t. tokio runtime dispatch

I can't help feeling that right now ObjectStore is stuck between trying to expose the functionality of object stores in a portable and ergonomic fashion, whilst also trying to provide some sort of generic all-purpose IO subsystem abstraction, and I'm not sure these are compatible goals.

Tagging @alamb @crepererum @Xuanwo @waynr @kylebarron

waynr · 2025-02-21T20:20:38Z

Just some questions since I'm still fairly green when it comes to really understanding the intricacies of Rust's type system...

As a wrapper must avoid "despecializing" methods, it must implement all 18 methods

I'm not sure what you mean when you say that wrappers must avoid de-specializing. Does that have something to do with compiler optimizations?

Encourage use of specialized traits like parquet's AsyncFileReader that reflect what a given component actually needs, and can evolve independently of ObjectStore

So this would also involve using something like the ParquetFileReaderFactory, right? And that's the level at which, in the case of a caching implementation that I described in #7135 and #7155, I would need to have session state/config information available to pass to a custom implementation to get properly-parented spans? It looks like this and related interfaces don't currently support accepting opaque contextual data but maybe you're suggesting they are more open to that kind of change than ObjectStore?

tustvold · 2025-02-21T20:50:21Z

de-specializing

The wrapped stores might provide their own implementations of things like delete_stream; a wrapper must therefore call through to these instead of relying on the default implementation.
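To make the "despecializing" hazard concrete, here is a minimal sketch using a simplified, hypothetical `Store` trait (not the real ObjectStore trait, which is async and has different signatures). Each method returns the number of backend requests it would issue, so the effect of a wrapper forgetting to forward a provided method is visible.

```rust
/// Hypothetical, simplified stand-in for a store trait.
trait Store {
    /// Required: delete one key; returns the number of backend requests made.
    fn delete(&self, key: &str) -> usize;

    /// Provided: delete many keys. The default issues one request per key.
    fn delete_many(&self, keys: &[&str]) -> usize {
        keys.iter().map(|k| self.delete(k)).sum()
    }
}

/// A backend with a bulk-delete endpoint specializes `delete_many`.
struct BulkStore;

impl Store for BulkStore {
    fn delete(&self, _key: &str) -> usize {
        1
    }
    fn delete_many(&self, keys: &[&str]) -> usize {
        // One batched request, however many keys there are.
        if keys.is_empty() { 0 } else { 1 }
    }
}

/// Forwards only the required method, so `delete_many` silently
/// falls back to the per-key default: the inner store is despecialized.
struct LeakyWrapper<S>(S);

impl<S: Store> Store for LeakyWrapper<S> {
    fn delete(&self, key: &str) -> usize {
        self.0.delete(key)
    }
}

/// Forwards every method, preserving the inner specialization.
struct FaithfulWrapper<S>(S);

impl<S: Store> Store for FaithfulWrapper<S> {
    fn delete(&self, key: &str) -> usize {
        self.0.delete(key)
    }
    fn delete_many(&self, keys: &[&str]) -> usize {
        self.0.delete_many(keys)
    }
}
```

Both wrappers compile and behave correctly; only the request count differs, which is why this kind of regression is easy to miss.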

It looks like this and related interfaces don't currently support accepting opaque contextual data but maybe you're suggesting they are more open to that kind of change than ObjectStore?

Perhaps, but given things like AsyncFileReader are per-file, it may be that they can just be constructed with the relevant context
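A minimal sketch of what "constructed with the relevant context" could look like, assuming a per-file reader type. `QueryContext` and `FileReader` are hypothetical names for illustration only; the real AsyncFileReader is an async trait with a different shape, and a real implementation would emit spans rather than push strings into a log.

```rust
use std::cell::RefCell;
use std::ops::Range;

/// Hypothetical per-query context that a caller wants attached to all IO.
#[derive(Clone, Debug, PartialEq)]
struct QueryContext {
    query_id: u64,
}

/// Hypothetical per-file reader: the context is supplied once, at
/// construction, so the read method needs no extra context parameter.
struct FileReader<'a> {
    bytes: &'a [u8],
    ctx: QueryContext,
    /// Stand-in for tracing output, kept in memory for the sketch.
    log: RefCell<Vec<String>>,
}

impl<'a> FileReader<'a> {
    fn new(bytes: &'a [u8], ctx: QueryContext) -> Self {
        Self { bytes, ctx, log: RefCell::new(Vec::new()) }
    }

    /// Reads a byte range and records which query asked for it.
    fn read(&self, range: Range<usize>) -> Option<&'a [u8]> {
        self.log.borrow_mut().push(format!(
            "query={} range={}..{}",
            self.ctx.query_id, range.start, range.end
        ));
        self.bytes.get(range)
    }
}
```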

alamb · 2025-02-23T12:26:07Z

However, over time the ObjectStore API has grown, and now has 8 required methods and a further 10 methods with default implementations. This creates a number of challenges for this wrapper based approach for composition.

I basically agree with this statement of challenge, though I am not sure how hard it actually is in practice (having done it myself and seen various different versions of it)

Interface Creep

In many places the ObjectStore interface gets used as the abstraction for components that don't actually require the full breadth of ObjectStore functionality. There is no need, for example, for a parquet reader to depend on more than the ability to fetch ranges of bytes.

I think the parquet reader also needs to be able to get the total file size as well (at least unless negative ranges to fetch the end bytes are supported). Adding support to

This leads to perverse "ObjectStore" implementations, that actually only implement say get functionality. Similarly in contexts like apache/datafusion#14286 it creates complexities around how to shim the full ObjectStore interface, despite the actual operators in question only using a very small subset of this functionality.

I personally think we should encourage a move away from this wrapper based form of composition and instead do the following:

Encourage use of specialized traits like parquet's AsyncFileReader that reflect what a given component actually needs, and can evolve independently of ObjectStore

I would say "perverse" is somewhat subjective. It is certainly complex, but that also needs to be measured against:

- The complexity of other alternatives
- The complexity of the problem being solved
- The complexity of using the API vs implementing the API. For example, the AsyncFileReader in parquet specifically adds non-trivial complexity to using the reader

Add additional functionality for injecting logic into the HTTP request path (Decouple ObjectStore from Reqwest / Generic HTTP Client Support #6056) allowing

More accurate instrumentation

More accurate concurrency limiting

Potential sophistication w.r.t tokio runtime dispatch

This feels like a good idea to me

I can't help feeling that right now ObjectStore is stuck between trying to expose the functionality of object stores in a portable and ergonomic fashion, whilst also trying to provide some sort of generic all-purpose IO subsystem abstraction, and I'm not sure these are compatible goals.

I think OpenDAL https://github.com/apache/opendal is trying to provide a generic all-purpose IO subsystem abstraction

So my personal recommendation is:

- Keep the object store API the same / don't expand it
- Support Decouple ObjectStore from Reqwest / Generic HTTP Client Support #6056
- Add additional documentation / examples for more advanced functionality
- Point people at OpenDAL if they need more advanced features

alamb · 2025-02-23T12:26:56Z

It occurs to me this might be a great time to provide easier integration for using the parquet-rs reader with OpenDAL directly (as in, have an open-dal feature for parquet-rs). I think @Xuanwo has proposed this in the past.

tustvold · 2025-02-23T12:29:06Z

I think OpenDAL https://github.com/apache/opendal is trying to provide a generic all-purpose IO subsystem abstraction

I will let @Xuanwo weigh in here, but I think OpenDAL is in a very similar place to ObjectStore w.r.t this, and has very similar issues. Both are abstractions for data "access", not a general IO subsystem abstraction.

Edit: Ultimately something has to glue together downstream abstractions, e.g. in the context of #7135 providing a way to connect DF's SessionContext through to some IO subsystem. Either DF needs to overload some existing interface, e.g. ObjectStore/OpenDAL, inevitably leading to challenges like #7155, or it needs to define its own mechanism. In the case of parquet and AsyncFileReaderFactory, this interface already exists; we just need to point people at it.

I think the parquet reader also needs to be able to get the total file size as well (at least unless negative ranges to fetch the end bytes are supported). Adding support to

This isn't a requirement - see here

provide easier integration for using parquet-rs reader with OpenDAL directly

I remain pretty lukewarm on this, given parquet-opendal already does this.

tustvold · 2025-02-23T12:56:08Z

Support #6056

Actually having started playing around with this, we end up with the same problem here that we end up needing to pass context through to these layers. So perhaps this is really just moving the problem, and we're going to end up with something like #7155 regardless 🤔

alamb · 2025-02-23T13:38:44Z

Support #6056

Actually having started playing around with this, we end up with the same problem here that we end up needing to pass context through to these layers. So perhaps this is really just moving the problem, and we're going to end up with something like #7155 regardless 🤔

I agree we are going to need some sort of API to pass context through the various API layers 👍

Xuanwo · 2025-02-23T18:14:45Z

Edit: Ultimately something has to glue together downstream abstractions, e.g. in the context of #7135 providing a way to connect DF's SessionContext through to some IO subsystem. Either DF needs to overload some existing interface, e.g. ObjectStore/OpenDAL, inevitably leading to challenges like #7155, or it needs to define its own mechanism. In the case of parquet and AsyncFileReaderFactory, this interface already exists; we just need to point people at it.

Thank you @tustvold for inviting me to join this discussion.

I believe we should build datafusion-storage primarily focused on DataFusion's own needs while maintaining datafusion-storage-object-store and datafusion-storage-opendal separately. The benefit is that users can implement innovative features like datafusion-storage-cudf or datafusion-storage-io_uring without being constrained by the current I/O abstraction of object-store or OpenDAL.

If this becomes a reality, DataFusion can design the abstraction based on its own requirements without having to push everything upstream to object_store. This would allow them to maintain useful features such as context management and add additional requirements to the trait while letting datafusion-storage-object-store and datafusion-storage-opendal handle the extra work.

We can start by aliasing the ObjectStore trait inside datafusion-storage first. I'm happy to initiate a proposal if that sounds like a good idea to you.

crepererum · 2025-02-24T10:26:24Z

Thoughts

Status Quo at Influx

As people wanted additional functionality, such as instrumentation, caching or concurrency limiting, this was implemented by creating ObjectStore implementations that wrap the existing ones. Again this worked well.

(from #7171 (comment) )

I've just looked at our setup and this is what the hierarchy currently looks like (from outer to inner):

metrics
in-memory cache
metrics
disk cache
metrics
chunking of GET requests (that's also where Introduce Extensions concept to object_store::GetOptions and object_store::PutOptions #7155 comes in handy)
metrics
racing of multiple requests
metrics
actual object_store AWS S3 implementation

Not saying that this is what people usually do, but I thought that insight might be helpful.

Trait methods

However, over time the ObjectStore API has grown, and now has 8 required methods and a further 10 methods with default implementations.

(from #7171 (comment) )

I think that by itself was a design mistake (in hindsight). There should be ONE single GET method (probably get_opts) and if that method is too complicated for API users, we should provide them with an extension trait (NOT part of the core trait) that maps simple API methods to the more complete ones.
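A minimal sketch of the "one core method plus extension trait" shape described above. The `Store`, `StoreExt`, `GetOptions` and `MemStore` definitions here are simplified, hypothetical stand-ins written for this sketch; the real object_store types are async and richer.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical options struct; only a byte range for this sketch.
#[derive(Default, Clone)]
struct GetOptions {
    range: Option<Range<usize>>,
}

/// Core trait: exactly one GET method for implementors and wrappers.
trait Store {
    fn get_opts(&self, key: &str, opts: GetOptions) -> Option<Vec<u8>>;
}

/// Convenience methods live outside the core trait. Because of the
/// blanket impl below, wrappers cannot override (or forget) them.
trait StoreExt: Store {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.get_opts(key, GetOptions::default())
    }
    fn get_range(&self, key: &str, range: Range<usize>) -> Option<Vec<u8>> {
        self.get_opts(key, GetOptions { range: Some(range) })
    }
}

impl<T: Store + ?Sized> StoreExt for T {}

/// In-memory store used to exercise the sketch.
struct MemStore(HashMap<String, Vec<u8>>);

impl Store for MemStore {
    fn get_opts(&self, key: &str, opts: GetOptions) -> Option<Vec<u8>> {
        let data = self.0.get(key)?;
        match opts.range {
            Some(r) => data.get(r).map(|s| s.to_vec()),
            None => Some(data.clone()),
        }
    }
}
```

This only covers methods that are plain proxies to the core call; a method that a backend wants to specialize with a different signature would still have to live on the core trait.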

As a wrapper must avoid "despecializing" methods, it must implement all 18 methods. Not only is this burdensome, but it also creates upgrade hazards as new methods are added, potentially in non-breaking versions.

(from #7171 (comment) )

At Influx we use #[deny(clippy::missing_trait_methods)] to catch that, but you're right that adding new methods is therefore a perceived breaking change.
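For readers unfamiliar with that lint, here is a minimal sketch of how it can be applied to a wrapper impl. The `Store`, `SingleKey` and `Logging` types are hypothetical and exist only for this example; the lint itself only fires under `cargo clippy`, not under a plain compiler build.

```rust
/// Hypothetical trait with one required and one provided method.
trait Store {
    fn get(&self, key: &str) -> Option<String>;

    fn exists(&self, key: &str) -> bool {
        self.get(key).is_some()
    }
}

/// Toy backend that knows a single key.
struct SingleKey;

impl Store for SingleKey {
    fn get(&self, key: &str) -> Option<String> {
        (key == "a").then(|| "value".to_string())
    }
}

/// Hypothetical wrapper around any store.
struct Logging<S> {
    inner: S,
}

// With this attribute, clippy reports an error if any provided method
// (here `exists`) is not written out explicitly in the impl. When the
// trait later gains a new provided method, this impl starts failing
// the lint until the method is forwarded.
#[deny(clippy::missing_trait_methods)]
impl<S: Store> Store for Logging<S> {
    fn get(&self, key: &str) -> Option<String> {
        self.inner.get(key)
    }
    fn exists(&self, key: &str) -> bool {
        self.inner.exists(key)
    }
}
```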

Sans-IO / IO Abstraction

Add additional functionality for injecting logic into the HTTP request path (#6056) allowing
More accurate instrumentation

(from #7171 (comment) )

Depends. I would say "more low-level", but now you're in a different pickle because you no longer see the high-level methods or at least you partially have to recover them (e.g. is it an upload/put, a copy, a move). So I don't think the instrumentation on that level is universally better.

I do however agree that splitting the IO layer is the right move.

tustvold · 2025-02-24T10:32:56Z

There should be ONE single GET method (probably get_opts) and if that method is too complicated for API users

FWIW this is an alternative proposal I had written up as part of the issue (before I accidentally closed the tab and lost it). The challenge with this approach is there are methods that have different signatures, e.g. whilst delete_stream has a default implementation in terms of delete, the whole purpose is to allow specialization. The same can be said for get_ranges or list_with_delimiter. Ultimately when I tried this, the extension trait would allow us to remove the basic non-opts variants, but many of the methods are not just simple proxies to a _opts call and need to be specializable.

Depends. I would say "more low-level", but now you're in a different pickle because you no longer see the high-level methods or at least you partially have to recover them (e.g. is it an upload/put, a copy, a move). So I don't think the instrumentation on that level is universally better.

My vague thought here is we might be able to do something with http::Extensions to propagate this context lower.
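A rough sketch of that idea, using a std-only type-keyed map as a stand-in for `http::Extensions` so the example has no dependencies. The `Extensions`, `Operation` and `Request` types and both functions are hypothetical, written only to show a high-level layer tagging a request and a low-level layer reading the tag back; the wire method chosen for the copy is illustrative.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Std-only stand-in for a type-keyed map such as `http::Extensions`.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Extensions {
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

/// Hypothetical high-level intent that is lost once a call becomes HTTP.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    Get,
    Put,
    Copy,
}

/// Hypothetical low-level request seen by HTTP middleware.
struct Request {
    method: &'static str,
    path: String,
    extensions: Extensions,
}

/// High-level layer: builds the request and tags it with its intent.
fn build_copy_request(dest: &str) -> Request {
    let mut extensions = Extensions::default();
    extensions.insert(Operation::Copy);
    Request { method: "PUT", path: dest.to_string(), extensions }
}

/// Low-level middleware: prefers the propagated intent for its metric
/// label and falls back to the raw HTTP method when there is none.
fn metric_label(req: &Request) -> String {
    match req.extensions.get::<Operation>() {
        Some(op) => format!("{:?}", op),
        None => req.method.to_string(),
    }
}
```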

crepererum · 2025-02-24T10:51:44Z

Depends. I would say "more low-level", but now you're in a different pickle because you no longer see the high-level methods or at least you partially have to recover them (e.g. is it an upload/put, a copy, a move). So I don't think the instrumentation on that level is universally better.

My vague thought here is we might be able to do something with http::Extensions to propagate this context lower.

That sounds like a compromise I could live with.

That said, I don't think any of the proposals is really going to help eliminate wrappers. Looking at the hierarchy posted here, sure, we could move the metrics into the IO layer. Chunking is already harder because now you again need to "reverse" the high-level intention from the low-level HTTP call. Racing might work. Caching, not really. And putting everything into the DataFusion-specific wrappers (which might not be your only high-level consumer) just moves the problem or makes it even worse: now the entire code base uses the DataFusion type instead of the object_store type, or you re-create the wrappers for every single high-level interface you have.

tustvold · 2025-02-24T10:59:21Z

Could you expand on what you mean by chunking?

crepererum · 2025-02-24T11:36:18Z

Could you expand on what you mean by chunking?

If you have a large object (like 100MB), then instead of using a single GET request for the whole range, you issue multiple requests in, let's say, 16MB chunks. Our internal testing showed that while the throughput (and hence latency) of a single S3 GET request is somewhat limited, you can reduce the overall latency and increase the throughput using that method.
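A minimal sketch of that chunking scheme, assuming a caller-supplied `fetch` closure that stands in for a single ranged GET. `chunk_ranges` and `fetch_chunked` are hypothetical names for this example, and scoped threads are used here only as a dependency-free substitute for concurrent async requests.

```rust
use std::ops::Range;

/// Splits `range` into consecutive sub-ranges of at most `chunk_size` bytes.
fn chunk_ranges(range: Range<u64>, chunk_size: u64) -> Vec<Range<u64>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut out = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = start.saturating_add(chunk_size).min(range.end);
        out.push(start..end);
        start = end;
    }
    out
}

/// Fetches every chunk concurrently and concatenates the results in order.
/// `fetch` is an assumed callback performing one ranged request.
fn fetch_chunked<F>(range: Range<u64>, chunk_size: u64, fetch: F) -> Vec<u8>
where
    F: Fn(Range<u64>) -> Vec<u8> + Sync,
{
    let chunks = chunk_ranges(range, chunk_size);
    let fetch = &fetch;
    std::thread::scope(|s| {
        // Spawn all requests first so they run in parallel...
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|r| s.spawn(move || fetch(r)))
            .collect();
        // ...then join in the original order to reassemble the object.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("fetch thread panicked"))
            .collect::<Vec<u8>>()
    })
}
```

With the numbers from the comment, a 100MB object in 16MB chunks yields seven requests: six full chunks and a final 4MB one.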

tustvold · 2025-02-24T12:32:45Z

I wonder if that makes more sense being done explicitly on top of the ObjectStore API, instead of as a layer within a wrapper?

There have been some discussions in the past about providing something similar to AWS TransferManager to handle this sort of use-case. (e.g. #6837)

crepererum · 2025-02-24T12:57:13Z

I wonder if that makes more sense being done explicitly on top of the ObjectStore API, instead of as a layer within a wrapper?

Well, the chunking sits between the IO and the caching layer. So sure, you can put a lot of things on top of ObjectStore that are NOT using the ObjectStore API, but as I've said: integrating that into a larger code base that isn't all just DataFusion is going to be a pain.

alamb · 2025-02-24T13:54:01Z

I believe we should build datafusion-storage primarily focused on DataFusion's own needs while maintaining datafusion-storage-object-store and datafusion-storage-opendal separately. The benefit is that users can implement innovative features like datafusion-storage-cudf or datafusion-storage-io_uring without being constrained by the current I/O abstraction of object-store or OpenDAL.

I think this is an excellent idea -- I suggest the next step would be to open a ticket in DataFusion to discuss creating such an API

@Xuanwo is filing a ticket something you are able to do? Otherwise I will try and find time to do so

Xuanwo · 2025-02-24T14:07:00Z

I think this is an excellent idea -- I suggest the next step would be to open a ticket in DataFusion to discuss creating such an API

@Xuanwo is filing a ticket something you are able to do? Otherwise I will try and find time to do so

I'm willing to file an issue at DF.

tustvold · 2025-02-26T11:36:12Z

With #7183 we will have a mechanism to introduce request-oriented middleware, along with more sophisticated logic for doing things like:

Request-oriented metrics
Concurrency limiting
Spawning IO to different thread pools

This will reduce the need for ObjectStore wrappers, as well as address a number of their more glaring deficiencies.

However, it will not address the need to get context (#7155) through the ObjectStore trait either for use within an ObjectStore wrapper, or within the HTTPClient. As such I think we should proceed with #7170, as I don't really see a viable alternative to doing this.

I also am excited by the possibilities of combining #7183 and #7170 to allow doing things like carrying information into S3 access logs as described here.

tustvold added the object-store (Object Store Interface) and question (Further information is requested) labels · Feb 21, 2025

tustvold mentioned this issue · Feb 21, 2025: feat: add Extensions to object store GetOptions #7170 (Merged)

tustvold mentioned this issue · Feb 23, 2025: Decouple ObjectStore from Reqwest #7183 (Merged)

Xuanwo mentioned this issue · Feb 24, 2025: Release object_store 0.12.0 (API breaking) Around Feb 30 2025 #6903 (Open, 7 tasks)

Xuanwo mentioned this issue · Feb 24, 2025: discuss: Introduce datafusion-storage as datafusion's own storage interface apache/datafusion#14854 (Open)
