In Daft, we enforce a minimum of 1 CPU for UDFs even though Ray technically allows for fractional CPU allocation (< 1). There are two main reasons for this:
Function Fusion and Resource Requirements
When a UDF is part of a pipeline where it's fused with other operations, we take the maximum resource request across all fused steps. Since every other operation in Daft requires at least 1 CPU, a UDF request of < 1 CPU would be raised to 1 CPU by that maximum anyway, so honoring it would have no effect.
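As an illustration, here is a simplified sketch (not Daft's actual fusion code) of why a sub-1.0 request disappears: if a fused stage's CPU request is the maximum over its steps, and every built-in step contributes 1.0, any fractional UDF request is subsumed.

```python
# Simplified sketch of max-resource fusion (not Daft's real implementation).
# Built-in operations implicitly request 1.0 CPU, so a fused stage that
# contains any built-in step can never request less than 1.0.
def fused_cpu_request(step_requests):
    """Return the CPU request for a stage formed by fusing several steps."""
    return max(step_requests, default=1.0)

# A UDF asking for 0.25 CPU, fused with two built-in ops (1.0 CPU each):
print(fused_cpu_request([0.25, 1.0, 1.0]))  # the 0.25 is subsumed by the 1.0s
```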
Resource Thrashing Prevention
When Ray workers are allocated fractional CPUs, they can experience "thrashing" - where multiple workers compete for the same CPU resources, leading to frequent context switches and degraded performance. By enforcing a minimum of 1 CPU, we ensure more stable and predictable performance for UDF execution.
So while it might seem limiting to not allow < 1 CPU allocation, this decision helps ensure consistent performance and avoids potential issues with resource allocation and scheduling in the Ray backend.
However, you raise an interesting question - what's the difference between this approach and setting Ray's CPU resources to < 1.0?
The key difference is control and locality. When you manage threading within a UDF:
- The threads share the same process space and memory
- You have explicit control over the threading behavior
- Context switching happens within your process, managed by the OS thread scheduler
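A minimal standard-library sketch of that in-process model: threads started inside one UDF invocation share memory directly, so coordination needs only a lock, with no serialization or inter-process transfer (the worker function here is illustrative, not a Daft API).

```python
import threading

# All threads mutate the same in-process dict; a lock is the only
# coordination needed because they share one address space.
results = {}
lock = threading.Lock()

def worker(i):
    value = i * i  # stand-in for per-item UDF work
    with lock:
        results[i] = value

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # {0: 0, 1: 1, 2: 4, 3: 9}
```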
In contrast, when Ray allocates fractional CPUs:
- Multiple separate Ray worker processes compete for CPU resources
- The scheduling is handled by Ray's distributed scheduler
- Context switching happens at the process level, which is more expensive
Regarding your use case - could you share more details about what you're trying to achieve? Understanding your specific requirements (e.g., throughput goals, memory constraints, data processing patterns) would help us suggest the most appropriate approach. There might be other optimization strategies we could explore. For example, if it’s a common pattern, potentially implementing your logic as a native Daft operation would provide better performance than a UDF.
Is your feature request related to a problem?
I have been attempting to use a rather useful feature of Ray: fractional CPU resource requests.
When this didn't work, I dug through the codebase and found this:
Daft/daft/runners/ray_runner.py, line 446 (commit f9a4b70)
Is there anything Daft can do to overcome this? Could I get some more information on why this limitation exists?
To work around this, I have to set up a thread pool within each UDF and manually tune the partitioning logic to achieve the functionality.
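That workaround can be sketched with the standard library alone (the batch function, its body, and the pool size are hypothetical, not from Daft):

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(values, n_threads=4):
    """Hypothetical UDF body: fan items out over a manually sized thread
    pool so several lightweight tasks share the single CPU allocated to
    the UDF, instead of relying on fractional CPU requests."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda v: v * 2, values))

print(process_batch([1, 2, 3]))  # [2, 4, 6]
```

The cost, as noted above, is that the user must hand-tune the pool size against whatever the Ray scheduler allocates to each task.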
Describe the solution you'd like
True support for fractional CPU requests, as indicated by the type hints here: https://www.getdaft.io/projects/docs/en/stable/api_docs/udf.html
The "float" suggests fractional resource requests are supported equally for CPU and GPU. I believe this is a rather fundamental feature of the ray runner!
Describe alternatives you've considered
Joblib parallelization within each `udf` - this makes the user responsible for tuning resource allocation between the `ray` scheduler and each `task`'s thread pool.

Additional Context
No response
Would you like to implement a fix?
No