content/model-improvement/assessment-model-improvement.qmd (+3 -1)
@@ -104,7 +104,9 @@ Follow the [Evaluate your model with resampling section](https://www.tidymodels.
Compute the RMSE for both models again. Of course nothing changes for the null model. Compare the new RMSE estimates obtained through CV with those obtained earlier. What did and didn't change?
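To make the comparison concrete, here is a minimal sketch of how the CV-based RMSE estimates could be computed with tidymodels. The object names (`train_data` for the analysis data, `lm_wf` and `null_wf` for the linear-model and null-model workflows) are placeholders for whatever you defined earlier, not names taken from the original exercise code.

```r
library(tidymodels)

# create 10-fold CV resamples from the analysis data (placeholder: train_data)
set.seed(123)
folds <- vfold_cv(train_data, v = 10)

# fit both workflows (placeholders: lm_wf, null_wf) to the resamples,
# computing RMSE on the held-out fold each time
lm_cv_res   <- fit_resamples(lm_wf, resamples = folds, metrics = metric_set(rmse))
null_cv_res <- fit_resamples(null_wf, resamples = folds, metrics = metric_set(rmse))

# RMSE averaged over the 10 assessment sets, for each model
collect_metrics(lm_cv_res)
collect_metrics(null_cv_res)
```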
-Run the code again that creates the CV folds and does the fitting. This time, choose a different value for the random seed. The RMSE values for the CV fits will change. That's just due to the randomness in the data splitting. If we had more data, we would expect to get less variability. The overall pattern between changes in the RMSE values for the fits to the training data without CV, and what we see with CV, should still be the same.
+Also look at the standard error for the RMSE. Since you are now resampling, you not only get a single RMSE estimate but one for each fold, so you can look at the variation in RMSE. This gives you a good indication of how robust your model performance is.
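As a sketch, again using the placeholder object `lm_cv_res` from above: `collect_metrics()` reports the fold-averaged RMSE together with its standard error, and setting `summarize = FALSE` returns the individual per-fold estimates so you can inspect their spread directly.

```r
# averaged RMSE, with n (number of folds) and std_err columns
collect_metrics(lm_cv_res)

# one RMSE value per fold, to look at the fold-to-fold variation
collect_metrics(lm_cv_res, summarize = FALSE)
```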
+
+Finally, run the code again that creates the CV folds and does the fitting. This time, choose a different value for the random seed. The RMSE values for the CV fits will change. That's just due to the randomness in the data splitting. If we had more data, we would expect to see less variability. The overall pattern in how the RMSE values compare between the fits to the training data without CV and the fits with CV should still be the same.
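A sketch of that re-run, with the same placeholder names as above; the seed value 456 is arbitrary, and any value different from the one used before will do.

```r
# new seed -> different fold assignments -> slightly different RMSE estimates
set.seed(456)
folds_new <- vfold_cv(train_data, v = 10)

lm_cv_res_new <- fit_resamples(lm_wf, resamples = folds_new, metrics = metric_set(rmse))
collect_metrics(lm_cv_res_new)
```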
::: note
If you want to get more robust RMSE estimates with CV, you can set `repeats` to some value. That creates more samples by repeating the whole CV procedure several times, which in theory should give more stable results. You might encounter some warning messages. This is likely because, occasionally and by chance, the data are split in a way that some information (e.g., a certain value for `SEX` in our data) is missing from one of the folds. That can cause issues.
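For illustration, a sketch of repeated CV with the same placeholder names as before; 5 repeats is an arbitrary choice.

```r
# 10-fold CV repeated 5 times gives 50 resamples instead of 10
set.seed(123)
folds_rep <- vfold_cv(train_data, v = 10, repeats = 5)

lm_cv_res_rep <- fit_resamples(lm_wf, resamples = folds_rep, metrics = metric_set(rmse))
collect_metrics(lm_cv_res_rep)
```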