Trb analysis #78

zackAemmer · 2022-12-23T01:35:31Z

Includes the code used to test and generate plots of replacement mode model accuracy and F1 for choosing replacement mode features:
e-mission/e-mission-server#890

- split out uuids into all, stage and non stage
- fixed the uuid check to split the confirmed trips also into stage and non stage
- found missing IDs and confirmed that they had no data
- created confirmed and expanded confirmed trips dataframes separately for stage and non stage as well
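The uuid split and missing-ID check listed above could look roughly like the following. This is a minimal sketch in plain Python, not the notebook's code: the `program` and `user_id` fields, the `"stage"` value, and both function names are assumptions made for illustration.

```python
def split_uuids(users):
    """Split user records into all, stage, and non-stage uuid lists.

    Assumes each record is a dict with hypothetical keys 'uuid' and 'program'.
    """
    all_uuids = [u["uuid"] for u in users]
    stage = [u["uuid"] for u in users if u["program"] == "stage"]
    non_stage = [u["uuid"] for u in users if u["program"] != "stage"]
    return all_uuids, stage, non_stage


def find_missing(uuids, trips):
    """Return the uuids that have no trips at all, preserving input order.

    Assumes each trip is a dict with a hypothetical 'user_id' key.
    """
    seen = {t["user_id"] for t in trips}
    return [u for u in uuids if u not in seen]
```

Running `find_missing` once per group (stage and non-stage) is one way to confirm that the IDs absent from the confirmed trips really have no data, rather than having been dropped by the split.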

Don't think we can use inferred labels given that they are also the output of an algorithm with its own error.

+ also ignore "Prefer Not To Say".

```
data = data[~data['available_modes'].isin(['None', 'Prefer not to say'])]
```

Without the change, I get the error:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-ba060be6a019> in <module>
    129
    130 # Add availability variables to data
--> 131 data = add_mode_availability(data, availability_codes, 'available_modes', 'Mode_confirm', 'Replaced_mode')

<ipython-input-28-ba060be6a019> in add_mode_availability(data, availability_codes, availability_col, choice_col, replaced_col)
    115 i+=1
    116 continue
--> 117 options = [availability_codes[x] for x in available.split(';')]
    118 # Chosen mode must be in the available modes list, if mode was chosen it is assumed available
    119 # SWAP THIS LINE TO INCLUDE REPLACED MODE IN THE CHOICE SET (FOR VISUALS AT END)

<ipython-input-28-ba060be6a019> in <listcomp>(.0)
    115 i+=1
    116 continue
--> 117 options = [availability_codes[x] for x in available.split(';')]
    118 # Chosen mode must be in the available modes list, if mode was chosen it is assumed available
    119 # SWAP THIS LINE TO INCLUDE REPLACED MODE IN THE CHOICE SET (FOR VISUALS AT END)

KeyError: 'Prefer not to say'
```
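To illustrate why the filter avoids the `KeyError`, here is a small plain-Python sketch of the same idea applied to a single answer string. The helper name `decode_available_modes`, the `SKIPPED_ANSWERS` set, and the example code table are hypothetical; only the two skipped answers and the `';'` split come from the thread.

```python
# Answers that carry no usable availability information (taken from the filter above).
SKIPPED_ANSWERS = {"None", "Prefer not to say"}


def decode_available_modes(available, availability_codes):
    """Map a ';'-separated answer string to mode codes.

    Returns None for answers that should be dropped, mirroring the isin()
    filter, so the dictionary lookup never sees them.
    """
    if available in SKIPPED_ANSWERS:
        return None
    return [availability_codes[x] for x in available.split(";")]


# Illustrative code table only; the real availability_codes are not shown in the thread.
example_codes = {"Bicycle": "bike", "Walk": "walk"}
print(decode_available_modes("Bicycle;Walk", example_codes))
print(decode_available_modes("Prefer not to say", example_codes))
```

Like the `isin` filter, this matches the whole answer exactly, so an answer that mixes a skipped value with real modes would still raise a `KeyError`.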


+ put the switch to flip at the top and flip based on it in the model creation
+ create a function to pull out the sensed primary mode, but don't use it yet
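The thread does not say how the sensed primary mode is chosen. One plausible reading, sketched below purely as an assumption, is the sensed mode that covers the most distance within a trip. The function name `primary_mode_by_distance` and the `sensed_mode` and `distance` keys are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict


def primary_mode_by_distance(sections):
    """Return the sensed mode with the largest total distance, or None if empty.

    Assumes each section is a dict with hypothetical keys 'sensed_mode'
    and 'distance' (any consistent unit).
    """
    totals = defaultdict(float)
    for section in sections:
        totals[section["sensed_mode"]] += section["distance"]
    if not totals:
        return None
    # max() over the dict keys, ranked by accumulated distance
    return max(totals, key=totals.get)
```

Summing per mode before taking the maximum matters for trips where a short walk appears on both ends of a longer ride.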

@zackAemmer

…input

The three options are:
- ONLY_LABELED: only the labeled subset. For the sensitivity analysis, this would be a user who labeled all their trips
- ONLY_SENSED: only the sensed modes (walk, bike, car, bus, train). For the sensitivity analysis, this would simulate a user with no labels
- BEST_AVAILABLE: user label, falling back to best label assist, falling back to best sensed label. For the sensitivity analysis, this would simulate a user with partial labels

+ change the mapping of the primary sensed mode to correspond with label assist so that the downstream code (e.g. mode_map from mode_confirm to Mode_confirm) works properly

Note that many parts of the code will NOT work for sensed labels since some kinds of labels are missing (e.g. ridehail)

+ minor code fixes around refactoring the display of numbers
+ @zackAemmer, you might want to add more of these as you work through the analyses as yet another sanity check

Testing done:
- Ran with all three options
- `ONLY_LABELED` and `BEST_AVAILABLE` run through
- `ONLY_SENSED` takes a long time, so I ran it on the first 100 trips

```
elif input_dataset == "ONLY_SENSED":
    expanded_ct = expanded_ct.head(100)
    expanded_ct.mode_confirm = expanded_ct.apply(lambda row: get_primary_sensed_mode(row), axis=1)
```

and it fails with

```
KeyError: "['tt_ridehail', 'tt_transit', 'tt_walk', 'tt_s_micro'] not in index"
```

while training the random forest model

```
X = df_train[feature_list].values
```

Note also that I had a preliminary fix for this; sharing it here in case it helps

```
if input_dataset == "ONLY_SENSED":
    # Remove features that don't exist
    remove_list = []
    for fn in feature_list:
        if fn not in df_train.columns:
            print("NO DATA FOR FEATURE %s" % fn)
            remove_list.append(fn)
    for rn in remove_list:
        feature_list.remove(rn)
```
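The fallback order behind the three options can be sketched for a single trip as follows. This is an illustrative plain-Python version, not the notebook's implementation: the function name `select_mode` and the `user_label`, `label_assist`, and `sensed_mode` keys are assumptions, while the three option names are the ones described above.

```python
def select_mode(trip, input_dataset):
    """Pick the mode label for one trip according to the input_dataset switch.

    Assumes `trip` is a dict with optional hypothetical keys 'user_label',
    'label_assist', and 'sensed_mode'; a missing key means no label.
    """
    user = trip.get("user_label")
    assist = trip.get("label_assist")
    sensed = trip.get("sensed_mode")

    if input_dataset == "ONLY_LABELED":
        return user
    if input_dataset == "ONLY_SENSED":
        return sensed
    if input_dataset == "BEST_AVAILABLE":
        # user label, then best label assist, then best sensed label
        for candidate in (user, assist, sensed):
            if candidate is not None:
                return candidate
        return None
    raise ValueError(f"unknown input_dataset: {input_dataset!r}")
```

Raising on an unknown option, instead of silently returning `None`, makes a typo in the switch at the top of the notebook fail fast.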

Make the runs more configurable


zackAemmer and others added 30 commits July 26, 2022 07:37

Add analysis notebook for review

7da6ca8

Lots of changes accumulated

816732d

First draft

b9f7266

Remove outputs to make code diffs more visible and easier to review

405123b

Merge pull request #1 from shankari/zack_trb_analysis

79d03b2

Fix data read by splitting into all, stage and non-stage

Flip no_inferred_label and inferred_label + start investigating inferred_label fill

3b09c0d

Merge pull request #2 from shankari/zack_trb_analysis

8b9ef1b

Flip no_inferred_label and inferred_label + start investigating infer…

Put the parameters for the run upfront

a052ed8

+ put the switch to flip at the top and flip based on it in the model creation
+ create a function to pull out the sensed primary mode, but don't use it yet

Merge pull request #3 from shankari/zack_trb_analysis

1ff35ca

Make the runs more configurable

Add f1 score, split car modes, k-fold validation, test on replaced, cleanup

d4e3e02

Merge notebooks

19a5a96

Clear cells

378db23

More updates

24a0fbe

clear cells

2136d8b

Update analysis

acef135

Clear cells

136b4db

New emissions calculations

10f64cc

More viz updates

4b57503

Accumulated changes and MXL model

3ba7c93

Cleanup model results, mxl replaced, ...

8c4ff7b

Clear and save

7ff8fb9

Simplify modeling, pct samples, other changes

1eca8dd

Move models to own file, clean up everything

cdbfb10

Place all models in separate file

f5a5bb5

Restructure everything to use SP/RP data and availability indicators

a978fcc

Fixing av, getting MXL to converge

5c07656

RF model accuracy improvements, GBDT, diff features

e90ac70

zackAemmer added 5 commits September 9, 2022 10:14

Merge branch 'main' of https://github.com/e-mission/em-public-dashboard into trb-analysis

6794b3f

Backing up changes

72f9c28

Small updates for paper revisions

f6cd071

Merge upstream updates

8d5e94c

Remove labels from training features, add sensed mode confirm, check in notebooks with cell results

a2f7be4

shankari mentioned this pull request Apr 20, 2023

Converting pie charts into stacked bar charts #83

Closed

shankari mentioned this pull request Jul 11, 2023

Redesign the public dashboard to be more compact and to support error bars #86

Closed

zackAemmer closed this by deleting the head repository Mar 4, 2024
