Creating very large database causes Error: error modifying DB Instance (orders-replica-7-p): InvalidParameterCombination: You must specify both the storage size and iops when modifying the storage size or iops on a DB instance that has iops. #12493

Closed
muffy opened this issue Mar 21, 2020 · 11 comments
Labels
service/rds Issues and PRs that pertain to the rds service.

Comments

@muffy
Contributor

muffy commented Mar 21, 2020

We use very large RDS PostgreSQL instances that typically take up to two hours to create in AWS. After the apply/DB create runs for a couple of hours we get the error:

Error: error modifying DB Instance (my-giant-replica): InvalidParameterCombination: You must specify both the storage size and iops when modifying the storage size or iops on a DB instance that has iops.

Looking in CloudTrail we see that there is indeed a modify DB instance request that has allocated storage and not IOPS:

    "requestParameters": {
        "dBInstanceIdentifier": "my-giant-replica",
        "allocatedStorage": 6000,
        "vpcSecurityGroupIds": [
            "sg-1234567890123456"
        ],
        "applyImmediately": true,
        "dBParameterGroupName": "my-custom-pg",
        "preferredBackupWindow": "06:30-07:30",
        "preferredMaintenanceWindow": "fri:09:00-fri:09:30",
        "allowMajorVersionUpgrade": false
    }

Since the DB is being created, it should not require modification. The DB moves from creating to storage-optimization, which will also take several hours, so even if an update were necessary, it would not be possible to update the allocated storage at this time. Also, the AWS docs (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.Modifying.html) say: "You can't modify allocated storage if the DB instance status is storage-optimization or if the allocated storage for the DB instance has been modified in the last six hours." so the provider should not try to make a call to update allocated storage at all. If it does, though, it should always include IOPS even if they have not changed.
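
The rule described above (send no storage change while the instance is in storage-optimization, and otherwise always send IOPS together with allocated storage) can be sketched in Go as follows. This is a minimal illustration, not the provider's actual code: `Instance`, `ModifyRequest`, and `buildStorageModify` are hypothetical names, and the structs are simplified stand-ins rather than AWS SDK types.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified view of the configured instance.
// It is NOT an AWS SDK type.
type Instance struct {
	Status           string // e.g. "available" or "storage-optimization"
	AllocatedStorage int    // configured storage size
	Iops             int    // configured provisioned IOPS; 0 means none
}

// ModifyRequest is a hypothetical stand-in for the storage-related
// parameters of a modify call. A nil field means "not sent".
type ModifyRequest struct {
	AllocatedStorage *int
	Iops             *int
}

// buildStorageModify returns the storage part of a modify request, or nil
// when no storage-related change should be sent at all.
func buildStorageModify(inst Instance, storageChanged, iopsChanged bool) *ModifyRequest {
	if !storageChanged && !iopsChanged {
		return nil
	}
	// Storage cannot be modified while the instance is still optimizing.
	if inst.Status == "storage-optimization" {
		return nil
	}
	storage, iops := inst.AllocatedStorage, inst.Iops
	if iops > 0 {
		// An instance with provisioned IOPS needs both values in the same
		// call, even if only one of them changed.
		return &ModifyRequest{AllocatedStorage: &storage, Iops: &iops}
	}
	if !storageChanged {
		return nil
	}
	return &ModifyRequest{AllocatedStorage: &storage}
}

func main() {
	inst := Instance{Status: "available", AllocatedStorage: 6000, Iops: 40000}
	req := buildStorageModify(inst, true, false)
	fmt.Println(*req.AllocatedStorage, *req.Iops)
}
```

With this shape, a request like the one captured in CloudTrail above (allocated storage without IOPS) could not be produced for an instance that has IOPS configured.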

The plan for the db is as follows:

+ resource "aws_db_instance" "rds-instance" {
      + address                               = (known after apply)
      + allocated_storage                     = 16000
      + allow_major_version_upgrade           = false
      + apply_immediately                     = false
      + arn                                   = (known after apply)
      + auto_minor_version_upgrade            = false
      + availability_zone                     = (known after apply)
      + backup_retention_period               = 0
      + backup_window                         = "04:11-04:41"
      + ca_cert_identifier                    = (known after apply)
      + character_set_name                    = (known after apply)
      + copy_tags_to_snapshot                 = true
      + db_subnet_group_name                  = (known after apply)
      + delete_automated_backups              = true
      + deletion_protection                   = false
      + enabled_cloudwatch_logs_exports       = [
          + "postgresql",
          + "upgrade",
        ]
      + endpoint                              = (known after apply)
      + engine                                = "postgres"
      + engine_version                        = "10.9"
      + final_snapshot_identifier             = "my-giant-replica-final-snapshot"
      + hosted_zone_id                        = (known after apply)
      + iam_database_authentication_enabled   = false
      + id                                    = (known after apply)
      + identifier                            = "my-giant-replica"
      + identifier_prefix                     = (known after apply)
      + instance_class                        = "db.r5.24xlarge"
      + iops                                  = 40000
      + kms_key_id                            = "arn:aws:kms:us-east-1:NNNNNNNNNN:key/AAAAAAAAAAAAAAAA"
      + license_model                         = (known after apply)
      + maintenance_window                    = "fri:10:15-fri:11:15"
      + monitoring_interval                   = 1
      + monitoring_role_arn                   = "arn:aws:iam::NNNNNNNNN:role/rds-monitoring-role"
      + multi_az                              = false
      + name                                  = "giant_db"
      + option_group_name                     = (known after apply)
      + parameter_group_name                  = "my-custom-pg"
      + performance_insights_enabled          = true
      + performance_insights_kms_key_id       = (known after apply)
      + performance_insights_retention_period = 7
      + port                                  = 5432
      + publicly_accessible                   = false
      + replicas                              = (known after apply)
      + replicate_source_db                   = "my-giant-db"
      + resource_id                           = (known after apply)
      + skip_final_snapshot                   = false
      + status                                = (known after apply)
      + storage_encrypted                     = true
      + storage_type                          = "io1"
      + tags                                  = {
          + "Name"           = "my-giant-replica"
          + "terraform"      = "true"
        }
      + timezone                              = (known after apply)
      + username                              = "username"
      + vpc_security_group_ids                = (known after apply)

      + timeouts {
          + create = "2h"
        }
    }
### Community Note

* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request
* Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
* If you are interested in working on this issue or have submitted a pull request, please leave a comment

### Terraform Version

Terraform v0.12.23
provider.aws v2.54.0

### Affected Resource(s)

* aws_db_instance

### Steps to Reproduce

  1. terraform apply

@ghost ghost added the service/rds Issues and PRs that pertain to the rds service. label Mar 21, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Mar 21, 2020
@corentone
Contributor

corentone commented Mar 25, 2020

I was able to reproduce the bug with an acceptance test:
corentone@3837b93

❯ make testacc TEST=./aws TESTARGS='-run=TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./aws -v -count 1 -parallel 20 -run=TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops -timeout 120m
=== RUN   TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops
=== PAUSE TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops
=== CONT  TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops
--- FAIL: TestAccAWSDBInstance_ReplicateSourceDb_AllocatedStorageAndIops (1716.28s)
    testing.go:654: Step 0 error: errors during apply:

        Error: error modifying DB Instance (tf-acc-test-5222287488441308567): InvalidParameterCombination: You must specify both the storage size and iops when modifying the storage size or iops on a DB instance that has iops.
        	status code: 400, request id: 9bf84a7b-0bff-4b19-a33c-d244388a706a

          on /var/folders/4z/zp0cp9hn241dyp6v1jfm_tsm0000gn/T/tf-test090395968/main.tf line 15:
          (source code not available)


FAIL
FAIL	github.com/terraform-providers/terraform-provider-aws/aws	1717.527s
FAIL
make: *** [testacc] Error 1

I will propose a fix in the next couple of days. I'm thinking of making IOPS a variable that is passed in both the creation and modify steps.

As the error describes, the modify call MUST include IOPS for a DB that has IOPS set. DB size doesn't affect the bug.
I'd love for someone more experienced with RDS to critique it (I'll likely tag recent reviewers of the RDS code when I have the fix :) )

@corentone
Contributor

Small update. My first fix was to also pass the IOPS to the modify step during creation, which solved that error.

I've been adding a few more cases, including one that tests a replica whose allocated_storage differs from the source's.

I get the following error:

        Error: error modifying DB Instance (tf-acc-test-5531041571957603201): InvalidParameterCombination: You can't currently modify the storage of this DB instance because the previous storage change is being optimized.

The explanation for this is what @muffy already mentioned about the storage-optimization state.
I see two options for resolving this issue:
1/ Do not allow creation of a replica with an allocated storage that's different from the primary's. While that may work, it would be slightly odd if you do want a replica with different storage, as you'd have to do it in two steps. Also, I'm not sure whether it's possible to know the allocated storage of the primary in that part of the code (or it may require quite a bit of effort).
2/ If the allocated storage is passed, update the wait condition to wait for storage-optimization to finish, i.e. wait only for the available state.

I'm leaning towards 2/ as it would be the most consistent in the code, and there would be no change in behavior for the module. The wait for storage-optimization could be limited to the case where allocated_storage is passed (which would make cases not using it slightly faster), but I'm currently thinking it's best to have it for all cases for consistency.

@muffy
Contributor Author

muffy commented Mar 26, 2020

I agree, it's most consistent to wait for storage-optimization to complete, as long as you also check to see if the storage is in fact being updated to a new value, since waiting until the end of storage optimization can add hours to the completion of the apply.

@corentone
Copy link
Contributor

corentone commented Mar 26, 2020

One more observation.
I tried waiting for storage-optimization to finish, but allocated_storage still has to be the same as the primary's; otherwise you get the error:

        Error: error modifying DB Instance (tf-acc-test-748905720185396347): InvalidParameterCombination: You can't currently modify the storage of this DB instance. Try again after approximately 6 hours.

So option 1/ is back! I'm RDS' detective pikachu at this point :)

@gkop

gkop commented Oct 28, 2020

Just wanted to share what I observed: this bug manifested when passing an explicit allocated_storage to the replica, even when it was the same value as the primary's. Passing explicit IOPS that were the same as the primary's did not cause any issue. Since there was no functional need for us to pass allocated_storage (we like it to be the same as the primary, which is what you get when you don't pass it explicitly), removing it got me past this snag.

@ahodges22

We're still running into this on the v3.29.1 provider.

@bminahan-kc

bminahan-kc commented May 10, 2021

This problem is most annoying when trying to increase the RDS storage size over time. If you have a primary and a replica, and you up the storage size on the primary and do a terraform apply:

  • if you DO NOT specify the storage size explicitly on the replica, the replica will NOT have its storage size updated to reflect the primary implicitly.
  • if you DO specify the storage size on the replica, you will run into this error.

So the only workaround I've been able to come up with, besides completely replacing the replica, is to NOT specify the allocated storage for the replica and to use the AWS CLI to update the replica's storage size. Once I did that, the Terraform state did not seem to complain.

EDIT: this seems like the same issue but more with primary vs. replica relationship. I can open a separate ticket if needed to track this.

YakDriver pushed a commit that referenced this issue Jul 9, 2021
Ignore allocated storage while creating a Read Replica

* Fixes Issue #12493

Tests:

* The acceptance test (allocatedStorageAndIops that would be previously failing), shows the bug by
setting up a Replica DB with both IOPS and Allocated storage.
* Also Added an acceptance test for Iops modification to confirm code is ok because of the returned error
message in the issue. Didn't show failures previously.

Fix:
* Ignore allocated storage when creating a read replica as this value cannot be different from the primary.
* Update doc to reflect param handling difference.
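
The fix described in this commit message, ignoring allocated storage when creating a read replica, can be illustrated with the short Go sketch below. It is an assumption-laden sketch, not the code from the commit: `Config`, `CreateParams`, and `createParams` are hypothetical names, and the structs are simplified stand-ins rather than AWS SDK types.

```go
package main

import "fmt"

// Config is a hypothetical subset of the aws_db_instance arguments.
type Config struct {
	Identifier        string
	ReplicateSourceDB string // empty when the instance is not a read replica
	AllocatedStorage  int
	Iops              int
}

// CreateParams is a hypothetical, simplified create request.
// It is NOT an AWS SDK type. A nil field means "not sent".
type CreateParams struct {
	Identifier       string
	SourceIdentifier string
	AllocatedStorage *int
	Iops             *int
}

// createParams builds the create request. For a read replica the configured
// allocated storage is dropped, because the replica takes its storage size
// from the primary and a differing value cannot be applied at creation.
func createParams(c Config) CreateParams {
	p := CreateParams{Identifier: c.Identifier, SourceIdentifier: c.ReplicateSourceDB}
	if c.Iops > 0 {
		iops := c.Iops
		p.Iops = &iops
	}
	isReplica := c.ReplicateSourceDB != ""
	if !isReplica && c.AllocatedStorage > 0 {
		storage := c.AllocatedStorage
		p.AllocatedStorage = &storage
	}
	return p
}

func main() {
	replica := createParams(Config{
		Identifier:        "my-giant-replica",
		ReplicateSourceDB: "my-giant-db",
		AllocatedStorage:  16000,
		Iops:              40000,
	})
	fmt.Println(replica.AllocatedStorage == nil, *replica.Iops)
}
```

Because the storage value never reaches the request for a replica, there is also nothing left for a follow-up modify call to reconcile at creation time.
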
@corentone
Contributor

@bminahan-kc it sounds like the same issue.
I think your workaround is good (our workaround was to fork provider-aws and apply the fix from the PR).

The fix is merged and you should be able to pick it up in the next release. If that doesn't fix it, feel free to open a new issue.

@justinretzolk
Member

Hey all 👋 It looks like the fix for this was merged in with #12548 and was released with the v3.50.0 release of the provider. Given that's the case, we're going to close this issue for now. If anyone runs into it again with a version greater than v3.50.0, please open a new issue so that we can track the regression.

@github-actions github-actions bot removed the needs-triage Waiting for first response or review from a maintainer. label Oct 12, 2021
@speller
Contributor

speller commented Oct 13, 2021

I have this issue under version 3.61.0.

Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using hashicorp/aws v3.61.0 from the shared cache directory

...

  # module.backend.module.magi-db.aws_db_instance.app-db will be created
  + resource "aws_db_instance" "app-db" {
      + address                               = (known after apply)
      + allocated_storage                     = 10
      + allow_major_version_upgrade           = true
      + apply_immediately                     = true
      + arn                                   = (known after apply)
      + auto_minor_version_upgrade            = true
      + availability_zone                     = (known after apply)
      + backup_retention_period               = (known after apply)
      + backup_window                         = (known after apply)
      + ca_cert_identifier                    = (known after apply)
      + character_set_name                    = (known after apply)
      + copy_tags_to_snapshot                 = false
      + db_subnet_group_name                  = "rev-m2-4720-magi-aws-magi-mysql"
      + delete_automated_backups              = true
      + endpoint                              = (known after apply)
      + engine                                = "mysql"
      + engine_version                        = "5.7"
      + engine_version_actual                 = (known after apply)
      + hosted_zone_id                        = (known after apply)
      + id                                    = (known after apply)
      + identifier                            = (known after apply)
      + identifier_prefix                     = "rev-m2-4720-magi-aws-magi"
      + instance_class                        = "db.t3.micro"
      + kms_key=(sensitive)
      + latest_restorable_time                = (known after apply)
      + license_model                         = (known after apply)
      + maintenance_window                    = (known after apply)
      + monitoring_interval                   = 0
      + monitoring_role_arn                   = (known after apply)
      + multi_az                              = (known after apply)
      + name                                  = (known after apply)
      + nchar_character_set_name              = (known after apply)
      + option_group_name                     = (known after apply)
      + parameter_group_name                  = (known after apply)
      + password=(sensitive)
      + performance_insights_enabled          = false
      + performance_insights_kms_key=(sensitive)
      + performance_insights_retention_period = (known after apply)
      + port                                  = (known after apply)
      + publicly_accessible                   = false
      + replicas                              = (known after apply)
      + resource_id                           = (known after apply)
      + skip_final_snapshot                   = true
      + snapshot_identifier                   = "magi-test-db"
      + status                                = (known after apply)
      + storage_type                          = (known after apply)
      + timezone                              = (known after apply)
      + username                              = "root"
      + vpc_security_group_ids                = [
          + "sg-0498a96750e578d7f",
        ]
    }

...

Error: error modifying DB Instance (rev-m2-4720-magi-aws-magi20211013085558127600000003): InvalidParameterCombination: You must specify both the storage size and iops when modifying the storage size or iops on a DB instance that has iops.
	status code: 400, request id: 1a26673a-4955-40cb-affb-19e5bc5e40fc
  with module.backend.module.magi-db.aws_db_instance.app-db,
  on ../../db/main.tf line 82, in resource "aws_db_instance" "app-db":
  82: resource "aws_db_instance" "app-db" {

@github-actions

github-actions bot commented Jun 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2022

7 participants