Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

downgrade PostgreSQL to v13.x [MARXAN-1664] #1148

Merged
merged 4 commits into from
Jul 7, 2022

Conversation

hotzevzl
Copy link
Member

@hotzevzl hotzevzl commented Jun 23, 2022

DO NOT MERGE: even if/when we are happy with these changes after having magnificently verified everything for functional and performance regressions, we should keep these changes in a parallel universe of a PR stack that includes changes needed for infra/tf, so that we can keep fast-tracking urgent changes into develop in the meanwhile for any fixes that may be needed for instances where demos/training sessions are running.

https://vizzuality.atlassian.net/browse/MARXAN-1664

Part of spike to check if all works as expected (bar any tolerable performance regressions) if we run both apidb and geodb on PostgreSQL v13, in preparation to moving the database instances in k8s setups to the Azure database for PostgreSQL Flexible service, which currently supports PostgreSQL up to version 13.

We have been using PostGIS v3.1 so far, and this PR also bumps up PostGIS to v3.2 (as this is the version currently supported in the PostgreSQL v13 Azure managed service).

In theory, functional regressions are not expected with this downgrade of PostgreSQL, as the key functionality needed for vector tiles is provided by PostGIS v3.x and we're keeping (and actually bumping up) this extension's version.

Some performance regressions are expected, instead, with the PostgreSQL downgrade at least in terms of some lesser parallelization of queries in some cases: the assumption here is however that we may be trading some tolerable performance regressions in some cases with the ability to run the PostgreSQL databases on the Azure managed service, which provide both functional advantages (backups, restores, other managed affordances) as well as the ability to potentially use read replicas in
the future to better balance read loads.

@vercel
Copy link

vercel bot commented Jun 23, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
marxan ✅ Ready (Inspect) Visit Preview Jul 5, 2022 at 10:49AM (UTC)
marxan-storybook ✅ Ready (Inspect) Visit Preview Jul 5, 2022 at 10:49AM (UTC)

@tiagojsag
Copy link
Contributor

LGTM I was waiting on a reply from their side to move forward with the infra work on our end.

@tiagojsag
Copy link
Contributor

btw we should validate that Azure Postgres thingy has the other plugins we need.

@hotzevzl
Copy link
Member Author

hotzevzl commented Jun 27, 2022

btw we should validate that Azure Postgres thingy has the other plugins we need.

marxan-geo-api=# select extname,extversion from pg_extension;
        extname         | extversion
------------------------+------------
 plpgsql                | 1.0
 postgis                | 3.2.1
 postgis_topology       | 3.2.1
 fuzzystrmatch          | 1.1
 postgis_tiger_geocoder | 3.2.1
 pgcrypto               | 1.3
 tablefunc              | 1.0
 postgis_raster         | 3.2.1
(8 rows)
marxan-api=# select extname, extversion from pg_extension;
 extname  | extversion
----------+------------
 plpgsql  | 1.0
 pgcrypto | 1.3
(2 rows)

In Azure Flexible Server for PostgreSQL v13 (https://docs.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions#postgres-13-extensions):

pgpgsql: 1.0
postgis: 3.2.0
postgis_topology: 3.2.0
fuzzystrmatch: 1.1
postgis_tiger_geocoder: 3.2.0
pgcrypto: 1.3
tablefunc: not available
postgis_raster: 3.2.0

As far as I can see, tablefunc is not actually used (none of its functions are mentioned anywhere in the codebase: https://www.postgresql.org/docs/current/tablefunc.html) - so I think I can remove this extension as part of this PR.

@aagm do you see any issues with this?

@hotzevzl
Copy link
Member Author

hotzevzl commented Jul 1, 2022

so @aagm @tiagojsag I don't understand this:

what seems to fail in several tests is an attempt to insert into a calculated column, for which the error in PG13 is (when doing something similar via psql):

ERROR:  cannot insert into column "hash"
DETAIL:  Column "hash" is a generated column.

however, the same operation does likewise (and thankfully) lead to an error in PG14:

ERROR:  cannot insert a non-DEFAULT value into column "hash"
DETAIL:  Column "hash" is a generated column.

And in both cases, given the hint provided by PG14, inserting into a generated column with a DEFAULT value works, which is where I am getting lost, because these tests fail with the only difference being the PostgreSQL server version and nothing else in the code (unless of course TypeORM or the pg driver do things differently between pg13 and pg14).

So, this is more of an update on this task: in case there's something obvious that jumps to your eyes, please let me know, otherwise I'll get into overdrive mode trying to understand what's breaking so miserably.

Lastly, this is not "just" a failure artifact related to tests only: given that all the failing tests are related to project cloning/export/import, I could confirm that these operations succeed (in general 🙄) in instances running on PG14 but not on instances (aka my local env, for now) running on PG13.

@hotzevzl
Copy link
Member Author

hotzevzl commented Jul 1, 2022

nvm, I've caught the hicckup in docker logs and I know how to avoid this - we need to tell TypeOrm to never even bother trying to include the generated column in insert operations. Why using DEFAULT for the column nevertheless works both in pg13 and pg14 via psql but not via typeorm/node-pg is a mystery, bit not a mystery i'm willing to follow through source code at this stage.

@hotzevzl
Copy link
Member Author

hotzevzl commented Jul 1, 2022

I know how to avoid this

so, all the hiccups related to that generated column are fixed now, and we're left with a single regression:

FAIL test/tiles/protected-area.e2e-spec.ts
  ● Getting tile for project
    expect(received).toEqual(expected) // deep equality
    Expected: 17
    Received: 18
      74 | 
      75 |       expect(customFeature.length).toEqual(1);
    > 76 |       expect(features.length).toEqual(17);
         |                               ^
      77 | 
      78 |       /**
      79 |        * TODO maybe:
      at Object.ThenItContainsCustomProtectedArea (test/tiles/protected-area.fixtures.ts:76:31)
      at Object.<anonymous> (test/tiles/protected-area.e2e-spec.ts:17:18)
Test Suites: 1 failed, 1 skipped, 48 passed, 49 of 50 total
Tests:       1 failed, 4 skipped, 2 todo, 166 passed, 173 total
Snapshots:   0 total
Time:        350.678 s
Ran all test suites.
error Command failed with exit code 1.

which seems weird, because this should be essentially ST_AsMVT() producing tiles with a different number of features (18 vs 17) in pg13 vs pg14 - my guess would be some different treatment of something towards the edge of a tile, that is caught in when using postgis 3.1 but not 3.2

I'd be tempted to surrender to this one and update the expectation in the failing test, especially as visual inspection of protected area tiles in an instance with pg13 doesn't show anything that looks off, but I'll leave this for Monday - @aagm please feel free to suggest if we should investigate this regression further.

@hotzevzl
Copy link
Member Author

hotzevzl commented Jul 5, 2022

@aagm @tiagojsag pragmatically trying to close these PG14->PG13 downgrade checks, I am personally happy-ish to go ahead with the down-up-grade by updating the only failing expectation: 7751391

It would be worth understanding why PostGIS 3.2 calculates things differently than PostGIS 3.1 (tests pass, untouched, when using PostgreSQL13+PostGIS3.1 instead of PostgreSQL13+PostGIS3.2 as base image: https://github.com/Vizzuality/marxan-cloud/actions/runs/2611259505) - but 1. I don't think it's a battle worth fighting now, 2. I kinda trust that PostGIS does the right thing in any case, and 3. my focus in this spike was to make sure that a downgrade to PostgreSQL 13 would not cause operational issues or require us to rewrite un-manageable portions of the application's code, so I think we could live with discrepancies in how some things are calculated.

If needed (from a science point of view) we can try to dig further into calculation discrepancies later.

I'd merge this PR at the end of the day today unless either of you thinks we should take the originally failing test as a red flag.

hotzevzl added 4 commits July 5, 2022 12:43
Part of spike to check if all works as expected (bar any tolerable
performance regressions) if we run both apidb and geodb on PostgreSQL
v13, in preparation to moving the database instances in k8s setups to
the Azure database for PostgreSQL Flexible service, which currently
supports PostgreSQL up to version 13.

We have been using PostGIS v3.1 so far, and this PR also bumps up
PostGIS to v3.2 (as this is the version currently supported in the
PostgreSQL v13 Azure managed service).

In theory, functional regressions are not expected with this downgrade
of PostgreSQL, as the key functionality needed for vector tiles is
provided by PostGIS v3.x and we're keeping (and actually bumping up)
this extension's version.

Some performance regressions are expected, instead, with the PostgreSQL
downgrade at least in terms of some lesser parallelization of queries in
some cases: the assumption here is however that we may be trading some
tolerable performance regressions in some cases with the ability to run
the PostgreSQL databases on the Azure managed service, which provide
both functional advantages (backups, restores, other managed
affordances) as well as the ability to potentially use read replicas in
the future to better balance read loads.
`tablefunc` functions are not currently used, and this extension is not
available in the Azure Database for PostgreSQL - Flexible Server
(PostgreSQL v13), so we should be able to safely avoid the use of this
extension altogether.
The expectation was toEqual(17) originally; when switching from
PostgreSQL 14+PostGIS 3.1 to PostgreSQL 13+PostGIS 3.2 (to match the
versions available in Azure Database for PostgreSQL - Flexible Server),
processing of the features leads to a count of 18 features (this is due,
specifically, to the PostGIS 3.1->3.2 part of the down-up-grade).
@hotzevzl hotzevzl force-pushed the spike/api/MARXAN-1664_downgrade-to-postgresql-13 branch from 7751391 to 2bd14f4 Compare July 5, 2022 10:44
@hotzevzl hotzevzl merged commit 3ffa90d into develop Jul 7, 2022
@hotzevzl hotzevzl deleted the spike/api/MARXAN-1664_downgrade-to-postgresql-13 branch July 7, 2022 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants