[DO NOT MERGE] Temporary chrome deployment #4440
Open

vitorguidi wants to merge 73 commits into `master` from `chrome`
Conversation
We currently log progress while unpacking, but the final log entry is missing. Adding it is useful for monitoring what's going on in CF.
### Motivation

We currently lack metrics for build retrieval and unpacking times. This PR adds them, with granularity by fuzz target and job type.

There are two different implementations of build downloading/unpacking:

- In the `Build` class, from which `RegularBuild`, `SplitTargetBuild`, `FuchsiaBuild` and `SymbolizedBuild` inherit the downloading/unpacking behavior
- In the `CustomBuild` class, which implements its own logic

There are two possible cases for downloading/unpacking: ClusterFuzz either downloads the whole build and unpacks it locally, or unpacks it remotely. Remote unpacking applies to all build types except `CustomBuild`, which never unpacks remotely. For build retrieval over HTTP, we do not track download time. In all other cases, it suffices to keep track of start/finish times for download and unpacking.

Finally, a `_build_type` field is added to the constructor of the `Build` class, from which all others inherit. It tracks the build type (debug or release), and is only mutated by `SymbolizedBuild` when attempting to fetch a debug build.

Part of #4271
For non-engine fuzzers, we do not need the full list of fuzz targets, since those builds contain only the APP_NAME. For that reason, when the build is fully unpacked, fetch the fuzzing targets lazily, only when requested by the user of the class. This change will tremendously speed up the unpacking step for some of our fuzzers; see https://docs.google.com/document/d/1OepfVcuG2XNXLxgZIXVwgE-uOhX9RO5RfH0TSg_wmPU/.

Co-authored-by: jonathanmetzman <31354670+jonathanmetzman@users.noreply.github.com>
### Motivation

We currently lack awareness of how old builds are during fuzz task. This PR implements that, under the assumption that the Last Update Time metadata field in GCS is a good proxy for build age. [Documentation reference](https://cloud.google.com/storage/docs/json_api/v1/objects#resource)

### Approach

Symbolized and custom builds do not matter, so all builds of interest will be fetched from `build_manager.setup_regular_build`. The logic for collecting all bucket paths and the latest revision was refactored, so that `setup_regular_build` can also determine the latest revision for a given build and conditionally emit the proposed metric.

### Testing strategy

Locally ran tasks, following the instructions from #4343 and #4345, and verified that the `_emmit_build_age_metric` function gets invoked and produces sane output. Commands used:

```
fuzz libFuzzer libfuzzer_asan_log4j2
```

```
progression 4992158360403968 libfuzzer_asan_qt
```

```
analyze 4992158360403968 libfuzzer_asan_qt
```

(Disclaimer: for the analyze run, the build revision was overridden mid-flight to force a trunk build, since this testcase was already tied to a crash revision.)

Part of #4271
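The age computation described above reduces to a subtraction against the GCS object's `updated` timestamp. A minimal sketch, with a hypothetical helper name (the real ClusterFuzz function differs):

```python
import datetime


def build_age_hours(gcs_updated, now):
  """Hours elapsed since the GCS object's 'updated' (Last Update Time)
  metadata field -- used here as a proxy for build age.
  Hypothetical helper; illustrative only."""
  return (now - gcs_updated).total_seconds() / 3600.0


# Example: a build last updated 36 hours before "now".
updated = datetime.datetime(2024, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)
now = datetime.datetime(2024, 1, 2, 12, 0, tzinfo=datetime.timezone.utc)
age = build_age_hours(updated, now)  # 36.0
```

Using timezone-aware UTC datetimes on both sides avoids naive/aware comparison errors when parsing the RFC 3339 timestamp GCS returns.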
### Motivation

Small changes to the build retrieval metric, as requested by Chrome folks after initially using the feature:

* Added the time for listing fuzz targets as a step
* Added the total retrieval duration
* Track the metric in minutes, for readability

Part of #4271
### Motivation

The Chrome team has no easy visibility into how many manually uploaded test cases flake or successfully reproduce. This PR implements a counter metric to track that. There are three possible outcomes, each represented by a string label: 'reproduces', 'one_timer' and 'does_not_reproduce'.

Part of #4271
…4381)

### Motivation

Once a testcase is generated (or manually uploaded), follow-up tasks (analyze/progression) are started. This happens by publishing to a Pub/Sub queue, both for the manually uploaded case and for the fuzzer-generated case. If for any reason the messages are not processed, the testcase gets stuck. To get better visibility into these stuck testcases, the UNTRIAGED_TESTCASE_AGE metric is introduced, to pinpoint how old the testcases that have not yet been triaged (more precisely, gone through analyze/regression/impact/progression tasks) are.

### Attention points

Testcase.timestamp mutates in analyze task: https://github.com/google/clusterfuzz/blob/6ed80851ad0f6f624c5b232b0460c405f0a018b5/src/clusterfuzz/_internal/bot/tasks/utasks/analyze_task.py#L589

This makes it unreliable as a source of truth for testcase creation time. To circumvent that, a new `created` field is added to the Testcase entity, from which we can derive the correct creation time. Since this new field will only apply to testcases created after this PR is merged, Testcase.timestamp will be used instead to calculate the testcase age when the new field is missing.

### Testing strategy

Ran the triage cron locally, and verified that the codepath for the metric is hit and produces sane output (reference testcase: 4505741036158976).

Part of #4271
Chrome security shepherds manually upload testcases through appengine, triggering analyze task and, in case of a legitimate crash, the followup progression tasks: * Minimize * Analyze * Impact * Regression * Cleanup cronjob, when updating a bug to inform the user that all above stages were finished This PR adds instrumentation to track the time elapsed between the user upload, and the completion of the above events. * TestcaseUploadMetadata.timestamp was being mutated on the preprocess stage for analyze task. This mutation was removed, so that this entity can be the source of truth for when a testcase was in fact uploaded by the user. * The job name could be retrieved from the JOB_NAME env var within the uworker, however this does not work for the cleanup use case. For this reason, the job name is fetched from datastore instead. * The ```query_testcase_upload_metadata``` method was moved from analyze_task.py to a helpers file, so it could be reused across tasks and on the cleanup cronjob Every task mentioned was executed locally, with a valid uploaded testcase. The codepath for the metric emission was hit and produced the desired output, both for the tasks and the cronjob. Part of #4271
### Motivation

Some cumulative distribution metrics (build age, retrieval, testcase age, testcase triage duration) are misbehaving and capping at 1. This PR intends to aid in debugging that.
This reverts commit 5404212.
### Motivation

Cumulative distribution metrics from the monitoring initiative were incorrectly set up to use the fixed-width bucketer with width=0.05 and max_buckets=20. This caused percentile metrics to cap at 1, which was wrong behavior. This PR attempts to fix that by moving them all to the geometric bucketer, which does not have the aforementioned limit. It also reverts #4429, since it apparently broke triage.py in Chrome.

Part of #4271
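Why the cap sits exactly at 1 follows from the bucket arithmetic: a fixed-width bucketer's highest finite boundary is width × num_buckets, and every sample above it falls into the overflow bucket, so percentiles can never exceed that boundary. A sketch (function names are illustrative, not the ClusterFuzz bucketer API):

```python
def fixed_width_upper_bound(width, num_finite_buckets):
  """Largest finite boundary of a fixed-width bucketer. Samples above
  it land in the overflow bucket, so percentile queries cap here."""
  return width * num_finite_buckets


def geometric_upper_bound(scale, growth_factor, num_finite_buckets):
  """Largest finite boundary of a geometric bucketer: boundaries grow
  exponentially, covering a far wider range with the same bucket count."""
  return scale * growth_factor ** num_finite_buckets


# width=0.05 and max_buckets=20 give a top boundary of 1.0 --
# exactly the observed cap on the misbehaving metrics.
cap = fixed_width_upper_bound(0.05, 20)
# The same 20 buckets, grown geometrically, reach ~1e6.
wide = geometric_upper_bound(1.0, 2.0, 20)
```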
https://crbug.com/380707237 is caused by this discrepancy between the master and chrome branches.
…ng everything (#4486) Now that we are lazily checking for fuzz targets, it makes sense to allow remote unpacking even when unpacking the full archive. Furthermore, remote unpacking performance seems to be much higher than local unpacking on CF bots, so this might improve the overall performance of the build_manager.
### Motivation

This merges #4489, #4458 and #4483 into the chrome temporary deployment branch. The purpose is to have task error rate metrics, and to log which old testcases are polluting the testcase upload metrics, so we can figure out whether a purge is necessary.

Co-authored-by: jonathanmetzman <31354670+jonathanmetzman@users.noreply.github.com>
This merges #4494 into the temporary chrome deployment
#4503) Fixes the following error:

```
AttributeError: 'ProtoType' object has no attribute 'DESCRIPTOR'
Failed to flush metrics: 'ProtoType' object has no attribute 'DESCRIPTOR'
Traceback (most recent call last):
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 118, in _flush_metrics
    metric.monitoring_v3_time_series(series, labels, start_time, end_time,
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 350, in monitoring_v3_time_series
    self._set_value(point.value, value)
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/metrics/monitor.py", line 404, in _set_value
    point.int64_value = value
    ^^^^^^^^^^^^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/third_party/proto/message.py", line 935, in __setattr__
    pb_value = marshal.to_proto(pb_type, value)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/third_party/proto/marshal/marshal.py", line 229, in to_proto
    proto_type.DESCRIPTOR.has_options
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ProtoType' object has no attribute 'DESCRIPTOR'
```
cherry-pick #4411
This adds 9 archives from Fuzzilli to the general test-input archives used by fuzzers on ClusterFuzz. The Fuzzilli-side archives are refreshed every few days. We'll add a freshness metric in a follow-up. This was tested locally with butler. See bug: https://crbug.com/362963886
Cherry-pick of #4526
Introduces regex-based filtering directly in the API, to improve performance and reduce the number of calls to the AB API server. Cherry pick: #4297 Co-authored-by: aditya-wazir <108256495+aditya-wazir@users.noreply.github.com>
The device check is updated to use `find` instead of a literal match, so that sanitized versions of the devices (e.g. cheetah_hwasan) can also be used. Cherry pick: #4256 Co-authored-by: svasudevprasad <151788366+svasudevprasad@users.noreply.github.com>
Merging #4530 into the Chrome branch
This adds all PRs related to the terraform dashboard.
Co-authored-by: Peter Boström <git@pbos.me>
This fixes the following error:

```
File "/usr/local/google/home/metzman/clusterfuzz-311/butler.py", line 421, in <module>
  sys.exit(main())
           ^^^^^^
File "/usr/local/google/home/metzman/clusterfuzz-311/butler.py", line 407, in main
  return command.execute(args)
         ^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/google/home/metzman/clusterfuzz-311/src/local/butler/deploy.py", line 576, in execute
  package.package(
File "/usr/local/google/home/metzman/clusterfuzz-311/src/local/butler/package.py", line 87, in package
  py_unittest.execute(args={})
File "/usr/local/google/home/metzman/clusterfuzz-311/src/local/butler/py_unittest.py", line 299, in execute
  target=args.target,
         ^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'target'
```
Previous logic would overwrite `minimized_keys` if, say, round 5 of libFuzzer minimization failed, even if round 4 succeeded. This resulted in tasks completing minimization successfully, but not storing `minimized_keys` to reflect that in the database. Instead, we should take the result of the last successful minimization round. This both aligns with the logic around crash results and testcase file names, and prevents another side effect of this bug: we were previously deleting blobs on GCS that had been uploaded during successful minimization rounds :/

Note: it seems a similar bug affects the cleanse step, but I did not change the logic there, as it is OSS-Fuzz specific and I was not sure whether or not it could be intentional. See [`minimize_task.py` line 1686](https://github.com/google/clusterfuzz/pull/4626/files#diff-e8255271eeeadc2bda46b215fbc7b0bb160ef1116f6e27ff895990d88caafa5fR1686). There are exceedingly few tests for minimize task, so I did not write a test for this either; the effort required seemed too high.

Bug: https://crbug.com/389589679
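The "last successful round" rule described above can be sketched in a few lines. This is an illustrative reduction, not the actual `minimize_task.py` code; the tuple shape is a hypothetical simplification:

```python
def final_minimized_key(round_results):
  """Return the blob key from the last *successful* minimization round.

  The fix: a later failed round (e.g. round 5 failing after round 4
  succeeded) must not overwrite a key stored by an earlier success.
  round_results: ordered list of (succeeded, blob_key) tuples per round.
  """
  last_key = None
  for succeeded, blob_key in round_results:
    if succeeded:
      # Only successes update the key; failures leave it untouched,
      # unlike the buggy logic that overwrote it unconditionally.
      last_key = blob_key
  return last_key
```

For example, rounds `[(True, 'k1'), (True, 'k4'), (False, None)]` should yield `'k4'`, whereas the buggy unconditional overwrite would have ended with `None`.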
This was requested by Chrome. I can't really think of a strong justification for this paternalistic behavior, and I can think of some arguments against it, mainly that it causes CF to behave in ways the user doesn't expect. Cherry-picked from 01239a0 in master. Fixes: b/382207330 Co-authored-by: jonathanmetzman <31354670+jonathanmetzman@users.noreply.github.com>
Merging #4562 to branch 'chrome' Co-authored-by: Vitor Guidi <vitorguidi@gmail.com>
Chrome check failure stack traces have changed, update the ignore regexes to match them. This generalizes previous regexes to cover the new variants: ``` logging::CheckLogMessage::~CheckLogMessage logging::DCheckLogMessage::~DCheckLogMessage logging::CheckNoreturnError::~CheckNoreturnError logging::NotReachedLogMessage::~NotReachedLogMessage logging::NotReachedNoreturnError::~NotReachedNoreturnError ``` cc @pbos @tsepez for future reference Bug: https://crbug.com/389589679 Co-authored-by: Titouan Rigoudy <titouan@chromium.org>
#4649) …read runs Merge to chrome branch.
Cherry-pick of #4656 into the `chrome` branch.
Cherry-pick of #4661 into the `chrome` branch.
Merging PR #4638 into the chrome branch.
Cherry-pick of #4669 into the `chrome` branch.
centipede: create workdir in prepare instead of the constructor It turns out that the same class can be used for different fuzzing rounds, and the parent directory of the temp dir is being cleared in between each round. For that reason, we need to re-create a workdir in prepare.
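The fix described above can be sketched as follows. Class and method names are illustrative stand-ins, not the real Centipede engine code:

```python
import os
import shutil
import tempfile


class Engine:
  """Sketch: the workdir is (re)created in prepare() rather than in
  __init__, because one instance can serve multiple fuzzing rounds and
  the parent temp directory is wiped between rounds."""

  def __init__(self, parent_dir):
    self.parent_dir = parent_dir
    self.workdir = None

  def prepare(self):
    # Re-create the workdir every round; the previous round's dir may
    # have been deleted along with the parent temp dir.
    self.workdir = os.path.join(self.parent_dir, 'workdir')
    os.makedirs(self.workdir, exist_ok=True)


parent = tempfile.mkdtemp()
engine = Engine(parent)
engine.prepare()         # round 1
shutil.rmtree(parent)    # parent cleared between rounds
os.makedirs(parent)
engine.prepare()         # round 2 still gets a valid workdir
```

Had the workdir been created once in `__init__`, the second round would operate on a path that no longer exists.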
Cherry-pick of #4647 into the `chrome` branch.
We are getting this error during butler deploy:

```
| ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
| twine 5.1.1 requires urllib3>=1.26.0, but you have urllib3 1.24.3 which is incompatible.
```

There might have been a regression in the underlying Python dependencies. We ran:

```
pipenv lock
```

This changed the following deps:

[prod]

- aiohappyeyeballs: 2.4.2 > 2.4.6
- aiosignal: 1.3.1 > 1.3.2
- attrs: 24.2 > 25.1
- cachetools: 5.5.0 > 5.5.1
- deprecated: 1.2.14 > 1.2.18
- frozenlist: 1.4.1 > 1.5.0
- google-cloud-appengine-logging: 1.4.5 > 1.5.0
- googleapis-common-protos: 1.65 > 1.66
- grpc-google-iam-v1: 0.13.1 > 0.14.0
- pbr: 6.1.0 > 6.1.1
- propcache: 0.2.1 (NEW)
- proto-plus: 1.24.0 > 1.26.0
- pyjwt: 2.9.0 > 2.10.1
- setuptools: 75.1.0 > 75.8.0
- six: 1.16.0 > 1.17.0
- yarl: 1.13.1 > 1.18.3

[dev]

- cachecontrol: 0.14.0 > 0.14.1
- cachetools: 5.5.0 > 5.5.1
- click: 8.1.7 > 8.1.8
- google-cloud-firestore: 2.19.0 > 2.20.0
- googleapis-common-protos: 1.65.0 > 1.66.0
- markupsafe: 2.1.5 > 3.0.2
- proto-plus: 1.24.0 > 1.26.0
- pyjwt: 2.9.0 > 2.10.1

### Testing

We successfully deployed from butler with `--release=chrome-tests-syncer` and `--target=zips`; this unblocks deployments.
Fix a bug introduced in #4675, which was deleting the manifest file before uploading it to GCP during deployment.