-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
My fixes #815
My fixes #815
Conversation
@@ -76,7 +76,7 @@ def __init__(self, | |||
# if True (default), use the already initialized values for the first iteration. | |||
self._start_with_defaults: bool = bool( | |||
strtobool(str(self._config.pop('start_with_defaults', True)))) | |||
self._max_iter = int(self._config.pop('max_suggestions', 100)) | |||
self._max_iter = int(self._config.pop('max_suggestions', 5000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we should drop this. It can be changed in configs.
_LOG.info( | ||
"Update the optimizer with: %d configs, %d scores, %d status values", | ||
len(configs or []), len(scores or []), len(status or []) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none at all :) just format really
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yshady the general rule for submitting PRs (open-source or not) is to separate the functionality changes from cosmetic fixes and make sure each PR is as small as possible (but not smaller)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry this is kinda a first for me :) like a kid learning to walk for first time haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me try to clean this up tomorrow after my demo and explain some of the stuff. I also still need to figure out how to use GitHub correctly
if len(configs or []) == 1: | ||
_LOG.info("Only one configuration provided, using defaults.") | ||
self._start_with_defaults = True | ||
elif has_data and self._start_with_defaults: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the meat of the change. Can you please update the PR description to explain this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there was no config it would freak out and say hey you have no trials in the db I cannot suggest a config based on nothing
status: Optional[Sequence[Status]] = None) -> bool: | ||
configs: Sequence[dict], | ||
scores: Sequence[Optional[Dict[str, TunableValue]]], | ||
status: Optional[Sequence[Status]] = None) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo whitespace only changes. They make it harder to understand what's relevant. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry I only use GitHub for my personal code, never work in teams using it so first time experience really. My school has decided that math is more important than GitHub experience
@@ -47,7 +47,7 @@ class DbSchema: | |||
# pylint: disable=too-many-instance-attributes | |||
|
|||
# Common string column sizes. | |||
_ID_LEN = 512 | |||
_ID_LEN = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this change about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some sql stuff me and Eu Jing both have this, I believe kelly too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need that. I believe this is from the times @eujing was trying to work around my bug in MLOS when I was saving callables in the DB instead of actual values :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @DelphianCalamity, @yshady and I have been needing this change whenever we use a fresh database and sqlalchemy is trying to create tables for the first time. @motus I think you did partially fix the original issue we had but, this column size is still too large for MySQL-backends.
Summarizing a thread we have in teams with Kelly, the current offending table is trial_param
.
The DDL is roughly:
CREATE TABLE trial_param (
exp_id VARCHAR(512) NOT NULL,
trial_id INTEGER NOT NULL,
param_id VARCHAR(512) NOT NULL,
param_value VARCHAR(1024),
PRIMARY KEY (exp_id, trial_id, param_id),
FOREIGN KEY (exp_id, trial_id) REFERENCES trial (exp_id, trial_id)
)
The primary key becomes an issue when using standard encodings like utfmb3 (1-3 bytes per char) or utfmb4 (more bytes per char).
The upper bound for (varchar 512 + integer + varchar 512) in these encodings exceeds 3072 bytes (the limit for MySQL keys)
This results in the MySQL error Specified key was too long; max key length is 3072 bytes
when executing DDL on a new database.
But I agree, these changes should be split up into individual PRs. I think Yaseen just wanted to store all his changes in a branch in-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining @eujing
@@ -35,7 +35,7 @@ def __init__(self, *, # pylint: disable=too-many-locals,too-many-arguments | |||
seed: Optional[int] = 0, | |||
run_name: Optional[str] = None, | |||
output_directory: Optional[str] = None, | |||
max_trials: int = 100, | |||
max_trials: int = 5000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert
@@ -7,6 +7,6 @@ | |||
"experiment_id": "MyExperimentName", | |||
"config_id": 1, | |||
"trial_id": 1, | |||
"max_trials": 100 | |||
"max_trials": 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert
@@ -104,6 +104,9 @@ def bulk_register(self, | |||
df_scores = self._adjust_signs_df( | |||
pd.DataFrame([{} if score is None else score for score in scores])) | |||
|
|||
# Convert all score columns to numeric, coercing errors to NaN | |||
df_scores = df_scores.apply(pd.to_numeric, errors='coerce') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this fixed already in #789 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be but I haven't tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bpkroth yes, this is more heavy-handed version of what we already have in master
if target in scores: | ||
if pd.isna(scores[target]): | ||
_LOG.warning(f"'{target}' is NaN in the best observation. Setting it to 0.") | ||
scores[target] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be optional behavior. It might be better to mark the Trial as FAILED instead and ask the user to check their scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also #523
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree setting nan to 0 is not ideal but I desperately wanted long running experiments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is totally fine to report NaNs here - say, for scores that were not the primary optimization target or at all absent in a particular trial (e.g., because of the experiments' merge). Not to mention that 0 may not be the right value to impute, depending on your optimization direction. I am afraid that @yshady is bending MLOS to make the UI work smoothly and I think it should be the other way around :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@motus no the UI is separate from MLOS experiments. All it does is run the commands. Yes these fixes were to just make MLOS run for long periods of time and they are shortcuts and not ideal. @eujing can confirm as well, I am bending MLOS to just work :) not just for the UI but for the sake of collecting data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@motus also optimizer does not like nan values from my experience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to impute the correct value tomorrow
Left some comments. Can you please split this out and give it a more descriptive name/description? |
@@ -104,6 +104,9 @@ def bulk_register(self, | |||
df_scores = self._adjust_signs_df( | |||
pd.DataFrame([{} if score is None else score for score in scores])) | |||
|
|||
# Convert all score columns to numeric, coercing errors to NaN | |||
df_scores = df_scores.apply(pd.to_numeric, errors='coerce') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bpkroth yes, this is more heavy-handed version of what we already have in master
if target in scores: | ||
if pd.isna(scores[target]): | ||
_LOG.warning(f"'{target}' is NaN in the best observation. Setting it to 0.") | ||
scores[target] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is totally fine to report NaNs here - say, for scores that were not the primary optimization target or at all absent in a particular trial (e.g., because of the experiments' merge). Not to mention that 0 may not be the right value to impute, depending on your optimization direction. I am afraid that @yshady is bending MLOS to make the UI work smoothly and I think it should be the other way around :)
@@ -47,7 +47,7 @@ class DbSchema: | |||
# pylint: disable=too-many-instance-attributes | |||
|
|||
# Common string column sizes. | |||
_ID_LEN = 512 | |||
_ID_LEN = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need that. I believe this is from the times @eujing was trying to work around my bug in MLOS when I was saving callables in the DB instead of actual values :)
how i ran experiments