Bump ubuntu 22.04 as default SKU for Azure Batch #5804

Merged: 7 commits into master from azure-batch-default-sku, Feb 21, 2025

pditommaso
Member

This PR makes Ubuntu 22.04 the default SKU for Azure Batch node pools. This is required because support for Ubuntu 20.04 is going to be retired in April 2025 (see below):

Support for Ubuntu 20.04 LTS for Azure Batch pools will be retired on 23 April 2025
You're receiving this notice because you're currently using either an Ubuntu 20.04 LTS Marketplace or derived image with Azure Batch pools subject to support end of life.

Azure Batch typically follows standard end of life timelines set by publishers for supported images from the Azure Marketplace. Ubuntu 20.04 LTS is reaching the end of standard support life. Batch pools with Ubuntu 20.04 LTS VM images and the Batch node agent SKU batch.node.ubuntu 20.04 will no longer be supported in Batch after 23 April 2025.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

netlify bot commented Feb 20, 2025

Deploy Preview for nextflow-docs-staging canceled.

Latest commit: fd26a07
Latest deploy log: https://app.netlify.com/sites/nextflow-docs-staging/deploys/67b86636eba93400088368d3

@bentsherman
Member

Docs need to be updated here

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso pditommaso requested a review from a team as a code owner February 20, 2025 13:27
@pditommaso
Member Author

pditommaso commented Feb 20, 2025 via email

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Member Author

@adamrtalbot happy with this?

@adamrtalbot
Collaborator

Just needed to correct the docs: ad7634e

@adamrtalbot
Collaborator

Wait hold on, I get an error:

> nextflow-dev run hello
N E X T F L O W  ~  version 25.01.0-edge
Launching `https://github.com/nextflow-io/hello` [admiring_nobel] DSL2 - revision: afff16a9b4 [master]
ERROR ~ Error executing process > 'sayHello (3)'

Caused by:
  Cannot find a matching VM image with publisher=microsoft-azure-batch; offer=ubuntu-server-container; OS type=linux; verification type=verified

@pditommaso
Member Author

Umm, how are the tests passing?!

@adamrtalbot
Collaborator

adamrtalbot commented Feb 20, 2025

There is no Azure Batch ubuntu-2204 image 🤦 I think we need to use the DSVM replacement.

publisher = "microsoft-dsvm"
offer = "ubuntu-hpc"
sku = "22.04"
agent = "batch.node.ubuntu 22.04"

Microsoft just do a "latest supported" tag :shakefist:
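
To make the failure above concrete: the error message suggests that the configured publisher and offer are looked up in the set of images Azure Batch supports, and that the lookup comes back empty for `microsoft-azure-batch` / `ubuntu-server-container`. The sketch below illustrates that idea in Python. It is not Nextflow's actual implementation; `ImageInfo`, `SUPPORTED_IMAGES` and `find_image` are hypothetical names, and the supported list is a one-entry stand-in built from the values quoted in this comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageInfo:
    """Hypothetical record describing one Batch-supported VM image."""
    publisher: str
    offer: str
    node_agent_sku: str


# Hypothetical stand-in for the image list a Batch account reports.
# The single entry uses the DSVM values quoted in this thread.
SUPPORTED_IMAGES: List[ImageInfo] = [
    ImageInfo("microsoft-dsvm", "ubuntu-hpc", "batch.node.ubuntu 22.04"),
]


def find_image(images: List[ImageInfo], publisher: str, offer: str,
               node_agent_sku: str) -> Optional[ImageInfo]:
    """Return the first image matching all three fields, or None."""
    wanted = (publisher, offer, node_agent_sku)
    for image in images:
        if (image.publisher, image.offer, image.node_agent_sku) == wanted:
            return image
    return None


# The publisher/offer from the error message has no 22.04 entry, so no match.
old_default = find_image(SUPPORTED_IMAGES, "microsoft-azure-batch",
                         "ubuntu-server-container", "batch.node.ubuntu 22.04")

# The DSVM replacement is present in the (assumed) list, so it matches.
new_default = find_image(SUPPORTED_IMAGES, "microsoft-dsvm",
                         "ubuntu-hpc", "batch.node.ubuntu 22.04")
```

Under these assumptions `old_default` is `None`, which corresponds to the "Cannot find a matching VM image" error, while `new_default` resolves to the DSVM entry.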

@adamrtalbot
Collaborator

adamrtalbot commented Feb 20, 2025

A more recent alternative would be this:

publisher = "canonical"
offer = "ubuntu-24_04-lts"
sku = "server"
agent = "batch.node.ubuntu 24.04"

Will give it a try.

@adamrtalbot
Collaborator

Did not work - DSVM it is (which comes with GPU drivers and stuff so might be better anyway).

@pditommaso
Member Author

pditommaso commented Feb 20, 2025

The deprecation notice mentions:

Required action

Stop using Ubuntu 20.04 LTS based VM images for Batch pools before 23 April 2025. Please migrate your Batch pools to a VM image based on Ubuntu 22.04 or later, or the microsoft-dsvm ubuntu-hpc 2204 (or later) image for container and/or workloads requiring Infiniband/GPU support. You can also migrate to any other Batch-supported VM image, if amenable for your workload. Existing Batch pools cannot be updated with a new VM image reference and creating a new Batch pool is required for migration.

@adamrtalbot
Collaborator

feat(Azure-Batch): Switch to using Microsoft DSVM machine 22.04 inste…

Ok so 2e69a40 is correct

@pditommaso
Member Author

Can you revert and test it?

@adamrtalbot
Collaborator

You mean roll back to 192699b? There isn't a 22.04 image, so this doesn't work.

Or do you want me to test the CI?

@pditommaso
Member Author

I've tried:

  • sku: batch.node.ubuntu 22.04
  • offer: ubuntu-hpc
  • publisher: microsoft-dsvm

It tries to create the pool, but then it fails with:

{"Code":"BadRequest","Message":"The selected VM size 'Standard_D4_v3' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm"} (Code: Provider Error Json)

Not sure what to do

@adamrtalbot
Collaborator

adamrtalbot commented Feb 21, 2025

That's a very old VM size which isn't a generation 2 VM, so it doesn't support the virtual machine image. It's been deprecated now so you should switch to something like a Dsv6 machine.

Note: you'll need to make sure you have quota in the CI batch account.

@pditommaso
Member Author

Can you suggest a concrete one?

@adamrtalbot
Collaborator

Let's go with a Standard_D4_v5, cheap and should be widely available.
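
The generation mismatch discussed above boils down to a compatibility check between the image's hypervisor generation and the generations the VM size can boot. A minimal Python sketch of that check follows. The `VM_GENERATIONS` table and `can_boot` helper are hypothetical: the `Standard_D4_v3` entry is inferred from the `BadRequest` error quoted earlier, and the `Standard_D4_v5` entry is an assumption made for illustration, so check the Azure documentation for the sizes you actually use.

```python
from typing import Dict, Set

# Hypothetical lookup table: VM size -> hypervisor generations it can boot.
# Standard_D4_v3 is generation 1 only, per the error quoted in this thread.
# The Standard_D4_v5 entry is an assumption for illustration.
VM_GENERATIONS: Dict[str, Set[int]] = {
    "Standard_D4_v3": {1},
    "Standard_D4_v5": {1, 2},
}

# The image selected in this thread required hypervisor generation 2.
IMAGE_GENERATION = 2


def can_boot(vm_size: str, image_generation: int) -> bool:
    """Return True if the VM size is known to boot the given generation."""
    return image_generation in VM_GENERATIONS.get(vm_size, set())


old_size_ok = can_boot("Standard_D4_v3", IMAGE_GENERATION)
new_size_ok = can_boot("Standard_D4_v5", IMAGE_GENERATION)
```

With this table `old_size_ok` is `False`, mirroring the pool creation failure, and `new_size_ok` is `True`. Unknown sizes are treated as incompatible rather than guessed.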

@pditommaso
Member Author

It hangs with this error message when creating the pool: The specified account has reached VM series core quota for standardDv5Family (Code: AccountVMSeriesCoreQuotaReached)

I see two problems: 1) it should not hang on pool creation errors; 2) we are still unable to find a valid default instance type for Ubuntu.

If you could try to address at least the latter, it would be appreciated.

@adamrtalbot
Collaborator

We need to make sure there is sufficient quota in the Batch account. Which Batch account is it running in?

We can either increase the quota or use a machine with quota available.

Note spot machines have their own generic quota for all machine types, so this might be an option.
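
As a rough illustration of the quota arithmetic behind `AccountVMSeriesCoreQuotaReached`: a pool fits only if the cores it requests do not exceed what is left in the VM family quota. The Python sketch below uses made-up quota numbers and hypothetical helper names (`cores_available`, `fits_quota`); the only assumption about Azure itself is that a Standard_D4_v5 node has 4 vCPUs.

```python
def cores_available(quota: int, in_use: int) -> int:
    """Cores still free in a VM family quota (never negative)."""
    return max(quota - in_use, 0)


def fits_quota(cores_per_node: int, node_count: int,
               quota: int, in_use: int) -> bool:
    """Return True if the requested pool fits in the remaining quota."""
    requested = cores_per_node * node_count
    return requested <= cores_available(quota, in_use)


D4_V5_CORES = 4  # a Standard_D4_v5 node has 4 vCPUs

# Made-up numbers: a family quota of 0 cores rejects even a single node,
# while a quota of 16 cores with 8 in use leaves room for two nodes.
no_quota = fits_quota(D4_V5_CORES, node_count=1, quota=0, in_use=0)
some_quota = fits_quota(D4_V5_CORES, node_count=2, quota=16, in_use=8)
```

Here `no_quota` is `False` and `some_quota` is `True`. The same check would apply against the spot quota if the pool used spot machines instead.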

p.s. there is one alternative which is much, much more complicated but consistent with AWS/Google.

@adamrtalbot
Collaborator

Regarding this:

  1. it should not hang on pool creation errors

Yeah, it sucks. Azure node pools are detached from the run, so I don't know whether we can catch it in Nextflow. Seqera Platform should be able to check that the pool is in an active state, though.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso pditommaso force-pushed the azure-batch-default-sku branch from 2e69a40 to 8cd7f3f Compare February 21, 2025 11:06
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso and others added 2 commits February 21, 2025 12:35
Co-authored-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso pditommaso merged commit e0ba536 into master Feb 21, 2025
21 checks passed
@pditommaso pditommaso deleted the azure-batch-default-sku branch February 21, 2025 12:40