-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resource/aws_autoscaling_group: add instance_refresh block #13791
resource/aws_autoscaling_group: add instance_refresh block #13791
Conversation
I'm in favor of having the wait enabled by default. However, I don't think we need the |
@Vivalldi Consistency with the conventions of the other timeout field settles it. I'm currently looking at the following remaining issues:
With the above plus acceptance tests, it will be a few more days before this is ready. Edit: waiting and timeouts have been dropped |
8e84b3a
to
12119c2
Compare
PR looks good to me. Appreciate all the work you've put in. Out of sheer curiosy, what's the reasoning behind dropping waiting and timeouts? |
@Vivalldi I couldn't find a way of managing failures and timeouts such that the outcome is obvious to the user. Not unsolvable, but it would have been a distraction in this PR. The link between an ASG's state and a particular Instance Refresh is weak, so realistically the provider would need a new resource in order to be able to declaratively start and track the outcome of a single refresh. An update to the ASG would cause ForceNew by way of a |
On waiting for completion: I think that's an extremely valuable feature, given that (at least as far as I can tell) one of the main use-cases for instance_refresh would be managing deployments. Essentially, what we are saying with instance_refresh is "ensure my autoscaling group and all the instances attached to it are configured as specified in the code". This operation is only complete once the instance refresh is done. I don't have a strong opinion on adding this in a separate PR though (might even help creating one) as long as this PR doesn't make it hard to do so. Would be curious @roberth-k what troubles you ran into, to me a behaviour like @Vivalldi mentioned, using |
Two main complications:
Hence if the first (Note: CloudFormation doesn't have this issue, as it would roll back the entire ASG on failure) This can happen more often than it seems, as there's no one size fits all default to the timeout. A large ASG with a long warmup time could take ages to refresh. Read at diff time might notice that there is an ongoing (or failed) refresh, but wouldn't be able to tell whether it was the previous A separate |
12119c2
to
908c553
Compare
Superseded by #14442. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
Hi @roberth-k. Thanks for both of your PRs for Instance Refresh. We've decided to base the Instance Refresh feature implementation on this PR, not #14442. The Instance Refresh operation is triggered as part of the changes made to the resource, so it fits better as a configuration of the resource itself. A Refresh can also happen concurrently with a scaling operation, so we don't want to wait for scaling to complete before triggering the Refresh. |
Community Note
Closes #13785
Release note for CHANGELOG:
Output from acceptance testing:
The following block will be added to
aws_autoscaling_group
:If the block is present, start an ASG Instance Refresh with the given parameters when the ASG is updated. The refresh is started only when a property is changed that would need a rollout to take effect (e.g.
launch_template
). If there is already an active refresh during an update, it is cancelled.The instance refresh is "fire and forget": the resource does not wait for the refresh to complete. This comes partly down to the difficulty of choosing a sensible timeout (highly dependent on ASG size, warmup settings, healthy percentages), and partly down to difficulties related to the representation of a rollout in Terraform state. Since the ASG update and Instance Refresh are separate operations, the resource might initially fail the apply, but then report a misleading success on the second run. Workarounds would require an opinionated solution, such as using a separate resource so that the identity of an Instance Refresh could be tracked individually.