Add support for configured pipelines like Line's format #79

Merged
intarga merged 11 commits into trunk from yaml-pipelines
Oct 9, 2024

Conversation

Member

@intarga intarga commented Oct 2, 2024

  • deserialize pipeline config files
  • replace DAG with pipeline map in scheduler
  • fix breakage
  • derive num_leading and num_trailing from the pipeline
  • fix references to DAG in documentation
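
The fourth item, deriving num_leading and num_trailing from the pipeline, might look roughly like the sketch below. It assumes each check can report how many neighbouring points it needs, and that the pipeline requirement is the maximum over its steps. The variants, their window sizes, and the `Pipeline::new` constructor are hypothetical and are not taken from the PR.

```rust
/// Hypothetical per-check configuration; the variants and window sizes
/// below are illustrative assumptions, not the project's real checks.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckConf {
    /// Looks only at the current point.
    SpecialValueCheck { special_values: Vec<f32> },
    /// Compares the current point with its predecessor.
    StepCheck { max: f32 },
    /// Compares the current point with both neighbours.
    SpikeCheck { max: f32 },
}

impl CheckConf {
    /// Number of points before the current one that this check needs.
    pub fn num_leading(&self) -> u8 {
        match self {
            CheckConf::SpecialValueCheck { .. } => 0,
            CheckConf::StepCheck { .. } | CheckConf::SpikeCheck { .. } => 1,
        }
    }

    /// Number of points after the current one that this check needs.
    pub fn num_trailing(&self) -> u8 {
        match self {
            CheckConf::SpikeCheck { .. } => 1,
            _ => 0,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Pipeline {
    pub steps: Vec<CheckConf>,
    pub num_leading_required: u8,
    pub num_trailing_required: u8,
}

impl Pipeline {
    /// Build a pipeline, deriving both window sizes as the maximum over all steps.
    /// An empty pipeline needs no neighbouring points.
    pub fn new(steps: Vec<CheckConf>) -> Self {
        let num_leading_required = steps.iter().map(CheckConf::num_leading).max().unwrap_or(0);
        let num_trailing_required = steps.iter().map(CheckConf::num_trailing).max().unwrap_or(0);
        Pipeline {
            steps,
            num_leading_required,
            num_trailing_required,
        }
    }
}
```

Computing the two values in one place means they cannot drift from the configured steps, which fits with the fields being skipped during deserialization.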

@intarga intarga self-assigned this Oct 2, 2024
@intarga intarga added the enhancement New feature or request label Oct 2, 2024
@intarga intarga linked an issue Oct 2, 2024 that may be closed by this pull request
@intarga intarga marked this pull request as ready for review October 7, 2024 14:20
@intarga intarga requested a review from Lun4m October 7, 2024 14:21
Collaborator

Lun4m commented Oct 8, 2024

Compared to the old yaml version there is a lot of repetition here (and I'm also not a big fan of toml's array-of-table syntax 😅). What do you think about instead using:

-  [[steps]]
-  name = "special_value_check"
-  [steps.check.special_value_check]
+  [steps.special_value_check]
   special_values = [-999999, -6999, -99.9, -99.8, 999, 6999, 9999]

And then deserializing with the serde_with crate (not 100% sure I understand how it works)

+   #[serde_with::serde_as]
    #[derive(Debug, Deserialize, PartialEq, Clone)]
    pub struct Pipeline {
        /// Sequence of steps in the pipeline
-       pub steps: Vec<PipelineStep>,
+       #[serde_as(as = "serde_with::EnumMap")]
+       pub steps: Vec<CheckConf>,
        /// Minimum number of leading points required by all the tests in this pipeline
        #[serde(skip)]
        pub num_leading_required: u8,
        /// Minimum number of trailing points required by all the tests in this pipeline
        #[serde(skip)]
        pub num_trailing_required: u8,
    }

But we would also need to convert the CheckConf to string to extract the default check name.
And it might be useful to keep an optional name (or description) field if we want to run the same check with different parameters. I guess this would end up inside each conf struct, but maybe there's a way to have it outside (if it makes any difference)?

Member Author

intarga commented Oct 8, 2024

The name and the enum variant are actually doing different things here. The enum variant decides which check will be run, but the name labels the flags that get returned.

These are not always the same. For example, WP5 have talked about having multiple range checks (one for instrument limits, one to reject based on heuristics, and one that will not flag for end users but will alert HQC to look at suspicious data). These will all use the same enum variant, but will have different names.
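
As a rough illustration of that split, the sketch below runs two range checks that share one enum variant but label their flags with different names. The `Flag` type, the `RangeCheck` fields, and the `run_pipeline` helper are assumptions made for the example, not the project's actual API.

```rust
/// Flag outcomes, simplified for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Flag {
    Pass,
    Fail,
}

/// The enum variant selects which check runs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum CheckConf {
    RangeCheck { min: f32, max: f32 },
}

/// One pipeline step: `name` labels the returned flags, `check` picks the algorithm.
#[derive(Debug, Clone, PartialEq)]
pub struct PipelineStep {
    pub name: String,
    pub check: CheckConf,
}

impl PipelineStep {
    /// Run this step on a single value and return the flag labelled with the step name.
    pub fn run(&self, value: f32) -> (String, Flag) {
        let flag = match &self.check {
            CheckConf::RangeCheck { min, max } => {
                if value >= *min && value <= *max {
                    Flag::Pass
                } else {
                    Flag::Fail
                }
            }
        };
        (self.name.clone(), flag)
    }
}

/// Run every step in order, collecting one labelled flag per step.
pub fn run_pipeline(steps: &[PipelineStep], value: f32) -> Vec<(String, Flag)> {
    steps.iter().map(|step| step.run(value)).collect()
}
```

With one step named `instrument_limits` (range -80 to 60) and another named `heuristic_range` (range -40 to 40), a value of 50 passes the first and fails the second, and the two flags stay distinguishable even though both steps use `RangeCheck`.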

Collaborator

Lun4m commented Oct 8, 2024

Yes, that's why I said we should have an optional name field if we have the same check with different parameters. However what I proposed won't work anyway, because EnumMap will panic if you have duplicate tables 😞 (lol I forgot how toml works)

Member Author

intarga commented Oct 8, 2024

I like the idea of being able to use a default name, though. And that would get rid of unnecessary repetition. I worry it might make the format more confusing, but as long as we document it clearly it should be ok.
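
A default name could be sketched as below, assuming the step keeps an optional `name` and falls back to a snake-case name derived from the enum variant. `default_name`, `effective_name`, and `names_are_unique` are hypothetical helpers for illustration; the uniqueness check addresses the case where two unnamed steps use the same variant and would otherwise produce indistinguishable flags.

```rust
use std::collections::HashSet;

/// Hypothetical subset of check configurations.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckConf {
    SpecialValueCheck { special_values: Vec<f32> },
    RangeCheck { min: f32, max: f32 },
}

impl CheckConf {
    /// Snake-case name of the variant, used as the default step name.
    pub fn default_name(&self) -> &'static str {
        match self {
            CheckConf::SpecialValueCheck { .. } => "special_value_check",
            CheckConf::RangeCheck { .. } => "range_check",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct PipelineStep {
    /// Optional label for the flags; when absent, the variant name is used.
    pub name: Option<String>,
    pub check: CheckConf,
}

impl PipelineStep {
    /// The explicit name if one was configured, otherwise the variant's default.
    pub fn effective_name(&self) -> &str {
        self.name
            .as_deref()
            .unwrap_or_else(|| self.check.default_name())
    }
}

/// Returns true when no two steps would label their flags with the same name.
pub fn names_are_unique(steps: &[PipelineStep]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false on a repeat, which stops `all` early.
    steps.iter().all(|step| seen.insert(step.effective_name()))
}
```

Rejecting a pipeline whose effective names collide at load time would keep the shorter format from silently merging flags from different steps.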

I also don't like the array of tables syntax, but I don't think we have another option if we want to use toml 🙁

Member Author

intarga commented Oct 8, 2024

> Yes, that's why I said we should have an optional name field if we have the same check with different parameters. However what I proposed won't work anyway, because EnumMap will panic if you have duplicate fields 😞 (lol I forgot how toml works)

Oh, that's a shame. I think we could still make it work with a custom deserializer, though. The only issue is whether it's worth taking on the maintenance burden associated with that. I'll let you make the decision on that, and I'll implement it if you say so.

Collaborator

Lun4m commented Oct 8, 2024

That's not fair 🤣 But duplicate fields are not valid in TOML, so I don't think it's worth it.

Collaborator

@Lun4m Lun4m left a comment

Nice job, it's a good foundation we can iterate on with the others. I only have a couple of comments.

@intarga intarga requested a review from Lun4m October 8, 2024 14:04
intarga added a commit that referenced this pull request Oct 8, 2024
Collaborator

Lun4m commented Oct 9, 2024

Apparently a singular name is allowed for an array of tables, so we can use [[step]] and add #[serde(rename = "step")] above the steps field in Pipeline.
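
For reference, a config entry under the singular naming would presumably look like the fragment below. It reuses the special value check from the earlier diff and assumes the rest of the layout (the `name` key and the nested `check` table) is unchanged.

```toml
# One pipeline step; repeat the [[step]] header for each additional step.
[[step]]
name = "special_value_check"
[step.check.special_value_check]
special_values = [-999999, -6999, -99.9, -99.8, 999, 6999, 9999]
```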

Member Author

intarga commented Oct 9, 2024

@Lun4m I think we're good to go on this!

@intarga intarga merged commit ca0831e into trunk Oct 9, 2024
1 check passed
intarga added a commit that referenced this pull request Oct 9, 2024
@intarga intarga deleted the yaml-pipelines branch October 9, 2024 16:20
Labels: enhancement
Projects: None yet
Linked issue that this pull request may close: Parse test pipelines from a config file
2 participants