Trb analysis #78
Closed
- Split out UUIDs into all, stage, and non-stage
- Fixed the UUID check to also split the confirmed trips into stage and non-stage
- Found missing IDs and confirmed that they had no data
- Created confirmed and expanded-confirmed trip dataframes separately for stage and non-stage as well
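The split described above can be sketched in plain Python as follows. This is not the notebook's code: it assumes that stage (test) users are identified by a known set of UUIDs and that each confirmed trip carries a `user_id` field. `split_uuids`, `split_trips`, and `uuids_without_trips` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trip record: any mapping with at least a "user_id" key.
Trip = Dict[str, object]


def split_uuids(
    all_uuids: Iterable[str], stage_uuids: Set[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Partition user UUIDs into (all, stage, non_stage), preserving order."""
    all_list = list(all_uuids)
    stage = [u for u in all_list if u in stage_uuids]
    non_stage = [u for u in all_list if u not in stage_uuids]
    return all_list, stage, non_stage


def split_trips(
    trips: List[Trip], stage_uuids: Set[str]
) -> Tuple[List[Trip], List[Trip]]:
    """Apply the same stage / non-stage split to confirmed trips via 'user_id'."""
    stage = [t for t in trips if t["user_id"] in stage_uuids]
    non_stage = [t for t in trips if t["user_id"] not in stage_uuids]
    return stage, non_stage


def uuids_without_trips(uuids: Iterable[str], trips: List[Trip]) -> List[str]:
    """Return UUIDs that never appear in the trips (the 'missing IDs' check)."""
    seen = {t["user_id"] for t in trips}
    return [u for u in uuids if u not in seen]
```

Using the same `stage_uuids` set for both the UUID lists and the trips keeps the stage and non-stage trip dataframes consistent with the UUID split.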
I don't think we can use inferred labels, since they are also the output of an algorithm with its own error.

Also, ignore "Prefer not to say":

```
data = data[~data['available_modes'].isin(['None', 'Prefer not to say'])]
```

Without the change, I get the error:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-ba060be6a019> in <module>
    129
    130 # Add availability variables to data
--> 131 data = add_mode_availability(data, availability_codes, 'available_modes', 'Mode_confirm', 'Replaced_mode')

<ipython-input-28-ba060be6a019> in add_mode_availability(data, availability_codes, availability_col, choice_col, replaced_col)
    115 i+=1
    116 continue
--> 117 options = [availability_codes[x] for x in available.split(';')]
    118 # Chosen mode must be in the available modes list, if mode was chosen it is assumed available
    119 # SWAP THIS LINE TO INCLUDE REPLACED MODE IN THE CHOICE SET (FOR VISUALS AT END)

<ipython-input-28-ba060be6a019> in <listcomp>(.0)
    115 i+=1
    116 continue
--> 117 options = [availability_codes[x] for x in available.split(';')]
    118 # Chosen mode must be in the available modes list, if mode was chosen it is assumed available
    119 # SWAP THIS LINE TO INCLUDE REPLACED MODE IN THE CHOICE SET (FOR VISUALS AT END)

KeyError: 'Prefer not to say'
```
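One caveat with the `isin` filter: it compares whole cell values. If `available_modes` can hold a `;`-separated multi-select answer (as the `available.split(';')` in the traceback suggests), a value such as `'Car;Prefer not to say'` would pass the filter and still raise the same `KeyError`. A minimal token-level check is sketched below in plain Python; `EXCLUDED_ANSWERS`, `parse_available`, `filter_rows`, and the example codes are hypothetical, not taken from the notebook.

```python
from typing import Dict, List, Optional

# Hypothetical set of answers to drop; mirrors the values in the isin() filter.
EXCLUDED_ANSWERS = {"None", "Prefer not to say"}


def parse_available(
    available: str, availability_codes: Dict[str, int]
) -> Optional[List[int]]:
    """Map a ';'-separated availability answer to codes.

    Returns None (so the caller can drop the row) when any token is excluded
    or missing from availability_codes, instead of raising KeyError.
    """
    tokens = [t.strip() for t in available.split(";")]
    if any(t in EXCLUDED_ANSWERS or t not in availability_codes for t in tokens):
        return None
    return [availability_codes[t] for t in tokens]


def filter_rows(
    rows: List[Dict[str, str]],
    availability_codes: Dict[str, int],
    col: str = "available_modes",
) -> List[Dict[str, str]]:
    """Keep only rows whose every availability token can be mapped."""
    return [r for r in rows if parse_available(r[col], availability_codes) is not None]
```

In pandas, the same check could be applied with `Series.apply`, e.g. `data[data['available_modes'].apply(lambda s: parse_available(s, availability_codes) is not None)]`.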
Fix data read by splitting into all, stage and non-stage
Flip no_inferred_label and inferred_label + start investigating infer…
+ Put the switch to flip at the top, and flip based on it during model creation
+ Create a function to pull out the sensed primary mode, but don't use it yet
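A function that pulls out the sensed primary mode could look roughly like the sketch below. It assumes, hypothetically, that each trip's sensed sections are available as `(sensed_mode, distance)` pairs and that the primary mode is the one covering the most total distance; the actual `get_primary_sensed_mode` may use a different rule. The optional `label_map` stands in for a mapping to the label vocabulary that downstream code expects.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def primary_sensed_mode(
    sections: Iterable[Tuple[str, float]],
    label_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the sensed mode covering the most distance in a trip.

    `sections` is a hypothetical list of (sensed_mode, distance) pairs.
    `label_map` optionally renames sensed modes; unmapped modes pass through.
    """
    totals: Dict[str, float] = defaultdict(float)
    for mode, distance in sections:
        totals[mode] += distance  # sum distance per mode across sections
    if not totals:
        return None  # no sensed sections for this trip
    best = max(totals, key=totals.get)
    if label_map is None:
        return best
    return label_map.get(best, best)
```

Summing per mode (rather than taking the single longest section) keeps a trip with several short walking sections from being labeled by one long one by accident; whether that is the right choice depends on how the sections are segmented.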
…input

The three options are:
- `ONLY_LABELED`: only the labeled subset. For the sensitivity analysis, this represents a user who labeled all their trips.
- `ONLY_SENSED`: only the sensed modes (walk, bike, car, bus, train). For the sensitivity analysis, this simulates a user with no labels.
- `BEST_AVAILABLE`: the user label, falling back to the best label-assist label, then to the best sensed label. For the sensitivity analysis, this simulates a user with partial labels.

+ Change the mapping of the primary sensed mode to correspond with label assist, so that the downstream code (e.g. `mode_map` from `mode_confirm` to `Mode_confirm`) works properly. Note that many parts of the code will NOT work for sensed labels, since some kinds of labels (e.g. ridehail) are missing.
+ Minor code fixes around refactoring the display of numbers. @zackAemmer, you might want to add more of these as you work through the analyses, as yet another sanity check.

Testing done:
- Ran with all three options
- `ONLY_LABELED` and `BEST_AVAILABLE` run through
- `ONLY_SENSED` takes a long time, so I ran it on the first 100 trips:

```
elif input_dataset == "ONLY_SENSED":
    expanded_ct = expanded_ct.head(100)
    expanded_ct.mode_confirm = expanded_ct.apply(lambda row: get_primary_sensed_mode(row), axis=1)
```

and it fails with

```
KeyError: "['tt_ridehail', 'tt_transit', 'tt_walk', 'tt_s_micro'] not in index"
```

while training the random forest model:

```
X = df_train[feature_list].values
```

Note also that I had a preliminary fix for this; sharing it here in case it helps:

```
if input_dataset == "ONLY_SENSED":
    # Remove features that don't exist
    remove_list = []
    for fn in feature_list:
        if fn not in df_train.columns:
            print("NO DATA FOR FEATURE %s" % fn)
            remove_list.append(fn)
    for rn in remove_list:
        feature_list.remove(rn)
```
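The three input options can be sketched as a small label-selection function. This is an illustration, not the notebook's implementation: the trip keys `user_label`, `label_assist`, and `sensed_mode` are hypothetical, and "best label assist" is simplified to a single precomputed value. `usable_features` is a hypothetical non-mutating variant of the preliminary feature-pruning fix that also reports which features were dropped.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical trip record with optional label fields.
Trip = Dict[str, Optional[str]]


class InputDataset(Enum):
    ONLY_LABELED = "ONLY_LABELED"      # only trips the user labeled
    ONLY_SENSED = "ONLY_SENSED"        # only the sensed mode, ignoring labels
    BEST_AVAILABLE = "BEST_AVAILABLE"  # user label -> label assist -> sensed


def choose_label(trip: Trip, dataset: InputDataset) -> Optional[str]:
    """Pick one mode label for a trip; None means the trip is dropped."""
    user = trip.get("user_label")
    sensed = trip.get("sensed_mode")
    if dataset is InputDataset.ONLY_LABELED:
        return user or None
    if dataset is InputDataset.ONLY_SENSED:
        return sensed or None
    # BEST_AVAILABLE: first non-empty value in priority order.
    for candidate in (user, trip.get("label_assist"), sensed):
        if candidate:
            return candidate
    return None


def labeled_trips(trips: Iterable[Trip], dataset: InputDataset) -> List[Tuple[Trip, str]]:
    """Pair each kept trip with its chosen label."""
    pairs = []
    for trip in trips:
        label = choose_label(trip, dataset)
        if label is not None:
            pairs.append((trip, label))
    return pairs


def usable_features(
    feature_list: List[str], columns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split features into (present, missing) without mutating feature_list."""
    cols = set(columns)
    present = [f for f in feature_list if f in cols]
    missing = [f for f in feature_list if f not in cols]
    return present, missing
```

Returning the pruned list instead of calling `feature_list.remove(...)` avoids silently changing a shared `feature_list` between runs with different `input_dataset` settings.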
Make the runs more configurable
… into trb-analysis
…in notebooks with cell results
Includes the code used to test the replacement-mode model and to generate plots of its accuracy and F1 score, for choosing the replacement-mode features:
e-mission/e-mission-server#890
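For reference, the two metrics can be computed as below when comparing feature sets. This plain-Python sketch assumes accuracy is the fraction of exact matches and F1 is the unweighted (macro) average over every label seen in either the true or predicted labels; `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="macro")` should give the same numbers. `accuracy`, `macro_f1`, and `compare_feature_sets` are hypothetical names, not the PR's code.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1; undefined precision/recall count as 0."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compare_feature_sets(
    results: Dict[str, Tuple[List[str], List[str]]]
) -> Dict[str, Tuple[float, float]]:
    """Map each feature-set name to (accuracy, macro F1) of its predictions."""
    return {
        name: (accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
        for name, (y_true, y_pred) in results.items()
    }
```

Macro F1 weights rare replacement modes as heavily as common ones, so it can drop noticeably even when accuracy looks fine on an imbalanced label distribution.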