Iterate on the new verification_jobs table #1826

Closed · Tracked by #1367

marcocastignoli opened this issue Jan 6, 2025 · 17 comments

Iterate on the new table definition, making sure that it contains all the needed information. In particular, focus on the foreign key verification_id in the case of a successful verification, and decide what information to include in the case of a failed verification:

  • jobId
  • endpoint: /v2/verify/etherscan/
  • chainId
  • address
  • status
  • server hardware
  • compilation time
  • errors (compiler failures)
  • createdAt
  • finishedAt (success or fail)
  • when successful: verification-id
  • Do we want to store information when the compilation succeeds but the match fails?
    • onchain bytecodes
    • recompiled bytecodes
manuelwedler (Contributor) commented Jan 30, 2025

I mostly agree with the above fields. Errors need multiple fields due to the API design in Stoplight. I propose using exactly these names for the columns:

Table: verification_jobs

Needed for the API response:

  • id: bigserial, primary key, incremental
  • status: varchar, not null
  • started_at: timestamptz, not null
  • completed_at: timestamptz
  • chain_id: bigint, not null
  • contract_address: bytea, not null
  • verified_contract_id: bigserial, foreign key to verified_contracts.id
  • error_code: varchar
  • error_message: varchar
  • error_id: uuid

These are only needed for stats:

  • verification_endpoint: varchar, not null
  • server_hardware: jsonb, not null
  • compilation_time: interval

Regarding server_hardware, I am not sure if we can get helpful information for it, since we run it in Cloud Run and CPU and memory allocations are just limits. We can try to use the Node.js functions for those (os.cpus(), os.totalmem(), ...). Let's see what they return under this environment. I think it makes sense to store this information for each verification job in the following json format:

{
  "cpu": string,
  "memory": string
}

Anything else that should be added to the server_hardware?

Additional information?

Do we want to store information when the compilation succeeds but the match fails?

  • onchain bytecodes
  • recompiled bytecodes

In my opinion, we should not store more information than necessary, because I don't want to blow up the database. The error_code together with the error_message should be enough to tell the user what went wrong. If we want to debug something, we should be able to find the corresponding logs using the error_id.

kuzdogan (Member) commented:

Small comment on chain_id, I think it should be bigInt instead of numeric, see verifier-alliance/database-specs#23

kuzdogan (Member) commented Jan 31, 2025

Additionally, this is what Claude recommended. I don't know how useful it is, but I see we can go beyond the minimal CPU + memory info:

const os = require('os');
const v8 = require('v8');

const getHardwareInfo = () => {
  const memoryUsage = process.memoryUsage();
  const heapStats = v8.getHeapStatistics();
  
  return {
    cpu: {
      model: os.cpus()[0].model,
      cores: os.cpus().length,
      speed: os.cpus()[0].speed,
      allocated_cores: process.env.CPU_LIMIT || os.cpus().length,
      cpu_quota: process.resourceUsage().maxCPU || null
    },
    memory: {
      total_allocated: process.env.MEMORY_LIMIT || os.totalmem(),
      available: os.freemem(),
      process_usage: {
        heapTotal: memoryUsage.heapTotal,
        heapUsed: memoryUsage.heapUsed,
        external: memoryUsage.external,
        rss: memoryUsage.rss
      }
    },
    environment: {
      type: process.env.K_SERVICE ? 'cloud-run' : 'local',
      node_version: process.version,
      container_id: process.env.HOSTNAME || null
    }
  };
};

Edit: When I asked for only the essential info, Claude gave me this:

const os = require('os');

const getEssentialHardwareInfo = () => {
  return {
    cpu: {
      allocated_cores: process.env.CPU_LIMIT || os.cpus().length,
      model: os.cpus()[0].model
    },
    memory: {
      allocated_mb: Math.floor((process.env.MEMORY_LIMIT || os.totalmem()) / (1024 * 1024))
    },
    environment: {
      type: process.env.K_SERVICE ? 'cloud-run' : 'local'
    }
  };
};

kuzdogan (Member) commented:

Regarding storing the bytecodes, I'd really like to show users a nice UI in the future for how these bytecodes differ for failed jobs. Like this:

[Image: screenshot of a bytecode diff view]

We often find ourselves debugging people's failed verifications, and it would be useful to have this information without reproducing everything locally; even better if we can convey it to the user so they don't have to contact us at all. IMO this is an essential part of building an overall better contract verification experience.

However, this is not a requirement from my side and can be added later by extending the table. We could even add another table like failed_verification_bytecodes that only retains the failed bytecodes of, e.g., the last 2 weeks, so that the database won't blow up. The bytecodes are mostly useful immediately after a failure to see what went wrong, so we can keep the verification_jobs table minimal and store it longer term.
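
As a rough illustration of that retention idea (a sketch only: the failed_verification_bytecodes table name comes from this comment, while its created_at column and the exact window are assumptions, not an agreed design), the periodic cleanup could be a single statement:

-- Sketch: drop ephemeral bytecodes older than two weeks.
-- created_at is a hypothetical column of the hypothetical failed_verification_bytecodes table.
DELETE FROM failed_verification_bytecodes
WHERE created_at < now() - interval '14 days';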

manuelwedler (Contributor) commented:

Small comment on chain_id, I think it should be bigInt instead of numeric, see verifier-alliance/database-specs#23

Updated my proposal regarding this.

Additionally, this is what Claude recommended. I don't know how useful it is, but I see we can go beyond the minimal CPU + memory info:

I would do a mixture of the extended and the minimal information. Not everything in the extended version seems useful to me. How about this?

const os = require('os');

const getHardwareInfo = () => {
  return {
    cpu: {
      allocated_cores: process.env.CPU_LIMIT || os.cpus().length,
      model: os.cpus()[0].model,
      speed: os.cpus()[0].speed
    },
    memory: {
      total_allocated: process.env.MEMORY_LIMIT || os.totalmem()
    },
    environment: {
      type: process.env.K_SERVICE ? 'cloud-run' : 'local',
      node_version: process.version,
      cloud_run_revision: process.env.K_REVISION
    }
  };
};

Where do these environment variables actually come from (CPU_LIMIT and MEMORY_LIMIT)? Not sure if they really exist.

The other question here is, do we really want to store this information for every job? If you exclude the current resource usage, this will be a lot of redundant information. We could store this once per revision in a separate table and just reference it by some identifier.

Regarding storing the bytecodes, I'd really like to show users a nice UI in the future for how these bytecodes differ for failed jobs. Like this:

Alright, makes sense to me. From my point of view, we can add it now. But it should reference entries in the code table. How about adding these to the verification_jobs table?

  • onchain_creation_code_hash: bytea, foreign key to code
  • onchain_runtime_code_hash: bytea, foreign key to code
  • recompiled_creation_code_hash: bytea, foreign key to code
  • recompiled_runtime_code_hash: bytea, foreign key to code

Now, I also wonder if we need to store the error_message in the database. The server could determine the message on-the-fly based on the error_code and the stored bytecodes.

kuzdogan (Member) commented Feb 3, 2025

Yes, we can remove the error_message.

I also agree about the redundancy of the server_hardware. Also, in the server_hardware I see the hardware info but no information about how many resources are actually used, right? Memory and CPU would be the same across all verification jobs?

And regarding storing the bytecodes in the code table, I'm not sure about that. If we do, wouldn't it be difficult when we delete the verification_jobs entry? When we delete verification_jobs that are e.g. 30 days old, we need to delete the code entry too, and we need to make sure we are not deleting a code row that is referenced by a verified contract. Also, categorically, the bytecodes in code are different from those coming from failed jobs: code holds verified contract bytecodes, so they are "useful", while the bytecodes from failed jobs are ephemeral and "not useful", so maybe it makes sense to separate them into different tables. Also keep in mind we'll only save bytecodes for failed jobs, so most verification_jobs rows will not have bytecodes.

manuelwedler (Contributor) commented:

Right, they should only be stored for failed jobs.

I expected the verification_jobs table never to be deleted. It's hard to determine a good time frame for deletion. Also consider this scenario: someone verifies a contract through the Remix plugin and closes the browser immediately after, then opens Remix again only after 2 months. The verification would show as failed in the IDE, because we deleted the information, even though the verification actually succeeded.

I also agree about the redundancy of the server_hardware. Also, in the server_hardware I see the hardware info but no information about how many resources are actually used, right? Memory and CPU would be the same across all verification jobs?

If we don't change the Cloud Run service, I would expect this, yes. In my proposal, I excluded the resource usage for this reason.

manuelwedler (Contributor) commented:

So maybe our ideas for the table were a bit different here. I tried to keep it minimal because I would like to keep the data forever.

What if we separate the table into two: One for the minimal information of the job that should be kept forever, and one with additional information that is used for debugging and statistics (maybe bytecodes and server_hardware) and can be deleted from time to time.

kuzdogan (Member) commented Feb 3, 2025

The verification would show as failed in the IDE, because we deleted the information, even though the verification actually succeeded.

This is rather an edge case and I don't think we need to cover it. It's reasonable that users can't find out why their job failed after 30-60 days. But if their verification went through, they can always check with chainId + address.

What if we separate the table into two: One for the minimal information of the job that should be kept forever, and one with additional information that is used for debugging and statistics (maybe bytecodes and server_hardware) and can be deleted from time to time.

Bytecodes yes, but server hardware was also meant for the longer term. Its goal is to provide context for the compilation; hardware + compilation_time gives relative resource-use information that can be useful when, e.g., someone is looking for resource-intensive contracts.

We can easily dedupe the server hardware with a separate table, so it can stay longer term. For the bytecodes, we can see in the future how much space they take and only then start deleting. We can also explore some compression options, as I don't expect the older bytecodes to be accessed frequently.
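
To illustrate the dedupe idea (a sketch only; this server_hardware table, its columns, and the server_hardware_id reference are hypothetical and were not adopted in the end), it could look roughly like:

-- Hypothetical table holding one row per distinct hardware configuration.
CREATE TABLE server_hardware (
    id bigserial PRIMARY KEY,
    cpu jsonb NOT NULL,
    memory jsonb NOT NULL,
    environment jsonb NOT NULL
);

-- verification_jobs would then reference it instead of repeating the JSON per job:
-- server_hardware_id bigint REFERENCES server_hardware (id)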

marcocastignoli (Member, Author) commented:

Regarding hardware, I think it's best to just use an identifier that we provide in the .env; we only need to store that identifier in the verification_jobs table. E.g. the current identifier could be GCP_CLOUD_RUN_0, and then we know which setup we are referring to.

Regarding deleting the rows, the Remix case @manuelwedler mentioned can be fixed on the client side, just by calling v2/contract/chain/address instead of v2/verificationJob. That said, I like the idea of having two tables: one for successful contracts (minimal, potentially with permanent data) and one for failed contracts (extended, with data removed every N days).

Regarding the verification_jobs bytecodes, I would not store them in code. Let's keep everything in the verification_jobs table. I think it's much safer for when we have to delete them, and storing them in code would also open a potential attack vector: someone could fill up our database with bad code entries.

manuelwedler (Contributor) commented:

I mostly agree with what Marco said. I think we can indeed just store an identifier for the server hardware. The identifier could be process.env.K_REVISION, which gives the Cloud Run revision. That would reduce the data a lot compared to a JSON object. I would also name the column cloud_run_revision in order to be specific.

Here is my new proposal combining the above ideas:

Table verification_jobs

  • id: bigserial, primary key, incremental
  • status: varchar, not null
  • started_at: timestamptz, not null
  • completed_at: timestamptz
  • chain_id: bigint, not null
  • contract_address: bytea, not null
  • verified_contract_id: bigserial, foreign key to verified_contracts.id
  • error_code: varchar
  • error_id: uuid
  • verification_endpoint: varchar, not null
  • cloud_run_revision: varchar (use process.env.K_REVISION)
  • compilation_time: interval

Table verification_jobs_ephemeral

  • id: bigserial, primary key, foreign key to verification_jobs.id
  • onchain_creation_code: bytea
  • onchain_runtime_code: bytea
  • recompiled_creation_code: bytea
  • recompiled_runtime_code: bytea

What do you think about this?
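
For readability, the proposal above rendered as PostgreSQL DDL (a sketch only: constraint details and NOT NULL placement are assumptions, and the two foreign-key columns are written as plain bigint because bigserial is only meaningful for the generated primary keys):

CREATE TABLE verification_jobs (
    id bigserial PRIMARY KEY,
    status varchar NOT NULL,
    started_at timestamptz NOT NULL,
    completed_at timestamptz,
    chain_id bigint NOT NULL,
    contract_address bytea NOT NULL,
    verified_contract_id bigint REFERENCES verified_contracts (id),
    error_code varchar,
    error_id uuid,
    verification_endpoint varchar NOT NULL,
    cloud_run_revision varchar, -- use process.env.K_REVISION
    compilation_time interval
);

CREATE TABLE verification_jobs_ephemeral (
    id bigint PRIMARY KEY REFERENCES verification_jobs (id),
    onchain_creation_code bytea,
    onchain_runtime_code bytea,
    recompiled_creation_code bytea,
    recompiled_runtime_code bytea
);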

manuelwedler (Contributor) commented Feb 4, 2025

Regarding deleting the rows, the Remix case @manuelwedler mentioned can be fixed on the client side, just by calling v2/contract/chain/address instead of v2/verificationJob.

Regarding that, I'm not sure if that would be the right thing to do. It could be that verification failed because a contract is already verified. In that case, it would be weird to show a successful receipt in Remix.

I would say we should just move forward with not deleting the verification_jobs table and see if it really grows too big over time.

marcocastignoli (Member, Author) commented:

Nice!

  1. Let's just keep a generic column name for the hardware: hw instead of cloud_run_revision. Then we store an identifier plus details, e.g. "cloud_run:${process.env.K_REVISION}", so we can reuse it if we change environments.
  2. Regarding "In that case, it would be weird to show a successful receipt in Remix": right.

kuzdogan (Member) commented Feb 4, 2025

  • cloud_run_revision: varchar (use process.env.K_REVISION)

This wouldn't give an outside observer an idea of what the server hardware was, like CPU and memory, which was the whole point. Should we keep that info somewhere, like in another table? We can still keep this for debugging purposes, though.

manuelwedler (Contributor) commented:

This wouldn't give an outside observer an idea of what the server hardware was, like CPU and memory, which was the whole point. Should we keep that info somewhere, like in another table? We can still keep this for debugging purposes, though.

I also agree with this point. You would then need access to GCP to gather this information. The other question is: who will actually have direct access to our database? This table should just be for internal use and not be included in the parquet export.

If we decide to store the hardware info, I think it should be in a separate table. The JSON properties from above could be its columns.

marcocastignoli (Member, Author) commented Feb 4, 2025

like in another table?

I think we can just list all the details of the server somewhere else, e.g. in our docs. I would not create a PostgreSQL table for this.

manuelwedler (Contributor) commented:

Implemented the last proposal in #1902
