
Supporting fractional cpu count on the ray runner #3808

Open
NellyWhads opened this issue Feb 14, 2025 · 2 comments
Labels
enhancement New feature or request p1 Important to tackle soon, but preemptable by p0

Comments

@NellyWhads

Is your feature request related to a problem?

I have been attempting to use a rather useful feature of ray distributed - fractional CPU resource requests.

When this didn't work, I dug through the code base and found this:

```python
def _get_ray_task_options(resource_request: ResourceRequest) -> dict[str, Any]:
```

Is there anything Daft can do to overcome this? Could I get some more information on why this is a limitation?

To work around this, I have to set up a thread pool within each UDF and manually tune the partitioning logic to achieve the functionality.
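The workaround described above can be sketched roughly as follows. This is an illustrative, minimal version (not actual Daft code): `process_row` and the pool size are placeholder assumptions, and in practice this body would live inside a `@daft.udf`-decorated function.

```python
# Hypothetical sketch of the workaround: run a thread pool inside each
# UDF invocation so that a single 1-CPU Ray task fans work out across
# threads, approximating fractional per-row CPU usage.
from concurrent.futures import ThreadPoolExecutor


def process_row(x):
    # Placeholder for the real per-row work done inside the UDF.
    return x * 2


def my_udf_body(batch, threads_per_task=4):
    # The user, not the scheduler, must tune threads_per_task against
    # the partition size and the CPUs Ray actually grants the task.
    with ThreadPoolExecutor(max_workers=threads_per_task) as pool:
        return list(pool.map(process_row, batch))


print(my_udf_body([1, 2, 3]))  # [2, 4, 6]
```

The tuning burden is exactly the complaint: the thread count has to be kept consistent with the partitioning logic by hand.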

Describe the solution you'd like

True support for fractional CPU utility as indicated by the type hints here: https://www.getdaft.io/projects/docs/en/stable/api_docs/udf.html

The `float` type hint suggests fractional resource requests are supported equally for CPUs and GPUs. I believe this is a rather fundamental feature of the Ray runner!

Describe alternatives you've considered

Joblib parallelization within each UDF - this makes the user responsible for tuning resource allocation between the Ray scheduler and each task's thread pool.

Additional Context

No response

Would you like to implement a fix?

No

@NellyWhads NellyWhads added enhancement New feature or request needs triage labels Feb 14, 2025
@jessie-young jessie-young added p1 Important to tackle soon, but preemptable by p0 and removed needs triage labels Feb 14, 2025
@jessie-young
Contributor

Thanks for raising this - we're going to explore a few options and will get back to you!

@jessie-young jessie-young self-assigned this Feb 14, 2025
@jessie-young
Contributor

In Daft, we enforce a minimum of 1 CPU for UDFs even though Ray technically allows for fractional CPU allocation (< 1). There are two main reasons for this:

Function Fusion and Resource Requirements

When a UDF is part of a pipeline where it's fused with other operations, we take the maximum resource requirements across all steps. Since all other operations in Daft require at least 1 CPU, setting a UDF to use < 1 CPU would effectively be ignored as it would be raised to 1 CPU anyway due to the max resource requirement calculation.
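As a sketch of that max-resource calculation (not Daft's actual implementation; the class and function names here are illustrative):

```python
# Illustrative model of fusing resource requests: fused operations take
# the element-wise maximum of all their steps' requirements, so a
# fractional CPU request is raised to the largest request in the pipeline.
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    num_cpus: float = 1.0
    num_gpus: float = 0.0


def fuse_requests(requests):
    """Combine resource requests for fused steps by taking the max of each resource."""
    return ResourceRequest(
        num_cpus=max(r.num_cpus for r in requests),
        num_gpus=max(r.num_gpus for r in requests),
    )


# A UDF asking for 0.5 CPUs, fused with a standard op that needs 1 CPU:
fused = fuse_requests([ResourceRequest(num_cpus=0.5), ResourceRequest(num_cpus=1.0)])
print(fused.num_cpus)  # 1.0 -- the fractional request is effectively ignored
```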

Resource Thrashing Prevention

When Ray workers are allocated fractional CPUs, they can experience "thrashing" - where multiple workers compete for the same CPU resources, leading to frequent context switches and degraded performance. By enforcing a minimum of 1 CPU, we ensure more stable and predictable performance for UDF execution.

So while it might seem limiting to not allow < 1 CPU allocation, this decision helps ensure consistent performance and avoids potential issues with resource allocation and scheduling in the Ray backend.

However, you raise an interesting question - what's the difference between this approach and setting Ray's CPU resources to < 1.0?

The key difference is control and locality. When you manage threading within a UDF:

  • The threads share the same process space and memory
  • You have explicit control over the threading behavior
  • Context switching happens within your process, managed by the OS thread scheduler

In contrast, when Ray allocates fractional CPUs:

  • Multiple separate Ray worker processes compete for CPU resources
  • The scheduling is handled by Ray's distributed scheduler
  • Context switching happens at the process level, which is more expensive
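The locality point above can be illustrated with a minimal example: threads inside one process mutate shared memory directly, with no serialization or inter-process communication, whereas separate Ray worker processes would each hold their own copy of the data.

```python
# Threads in a single process share the same memory: all eight workers
# below increment the same dict in-place, coordinated only by an
# in-process lock. Separate processes would need IPC and serialization.
import threading

counter = {"n": 0}
lock = threading.Lock()


def work():
    with lock:
        counter["n"] += 1  # mutates shared state directly


threads = [threading.Thread(target=work) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["n"])  # 8 -- every thread saw and updated the same dict
```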

Regarding your use case - could you share more details about what you're trying to achieve? Understanding your specific requirements (e.g., throughput goals, memory constraints, data processing patterns) would help us suggest the most appropriate approach. There might be other optimization strategies we could explore. For example, if it’s a common pattern, potentially implementing your logic as a native Daft operation would provide better performance than a UDF.
