power_t instability
Ariel Faigon, Apr 26, 2014
I'm seeing a case where, with a non-default and somewhat high --power_t, vw starts to learn "in reverse".
At some point it makes a classification mistake while being extremely confident that it is right (the raw prediction hits ±50 with the logistic loss function). Once this is hit, the loss jumps so much that vw keeps diverging: the progressive loss grows instead of shrinking.
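To see why a single saturated mistake is so costly, here is a minimal sketch (my own illustration, not vw internals) of the per-example logistic loss when the raw prediction is pinned at -50 but the label is +1, which is exactly what the trace below shows once the instability starts:

```python
import math

def logistic_loss(label, prediction):
    """Per-example logistic loss log(1 + exp(-y * p)); note it gives
    0.693147 at p = 0, matching the first example in the trace below."""
    return math.log1p(math.exp(-label * prediction))

# A confident, correct prediction costs almost nothing...
print(logistic_loss(+1, +10))   # ~4.5e-05

# ...but a prediction stuck at -50 with a +1 label costs ~50, which matches
# the "since last" loss jumping to the 25-50 range once instability starts.
print(logistic_loss(+1, -50))   # ~50.0
```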
$ vw -k -c -b 20 --power_t 0.872996603174264 --ngram 2 --loss_function logistic --holdout_off --passes 2 spam-n-ham.vw-train -P 1.1
Generating 2-grams for all namespaces.
Num weight bits = 20
learning rate = 0.5
initial_t = 0
power_t = 0.872997
decay_learning_rate = 1
creating cache_file = spam-n-ham.vw-train.cache
Reading datafile = spam-n-ham.vw-train
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.693147 0.693147 1 1.0 1.0000 0.0000 4029
1.478370 2.263593 2 2.0 -1.0000 2.1538 13563
1.157826 0.516737 3 3.0 1.0000 0.3908 3485
1.083802 0.861730 4 4.0 -1.0000 0.3128 903
0.991067 0.620129 5 5.0 1.0000 0.1518 937
0.959003 0.798682 6 6.0 -1.0000 0.2010 923
0.895257 0.512780 7 7.0 1.0000 0.4006 3841
0.907203 0.990828 8 8.0 -1.0000 0.5268 5349
0.856887 0.454358 9 9.0 1.0000 0.5531 1515
0.835689 0.644904 10 10.0 -1.0000 -0.0989 6679
0.798063 0.421809 11 11.0 1.0000 0.6449 2241
0.752182 0.499831 13 13.0 1.0000 1.7736 3543
0.711641 0.448130 15 15.0 1.0000 0.4189 2077
0.690732 0.533913 17 17.0 1.0000 2.7187 4031
0.658030 0.380066 19 19.0 1.0000 0.3127 2717
0.641397 0.483382 21 21.0 1.0000 0.6680 2195
0.629552 0.546635 24 24.0 -1.0000 0.0480 6565
0.594956 0.318187 27 27.0 1.0000 2.4530 2317
0.580820 0.453600 30 30.0 -1.0000 -1.0427 4593
0.549089 0.231775 33 33.0 1.0000 1.0560 1353
0.511646 0.202739 37 37.0 1.0000 2.4959 3687
0.484056 0.228854 41 41.0 1.0000 1.7326 1267
0.460778 0.269898 46 46.0 -1.0000 -5.1305 7603
0.436878 0.216994 51 51.0 1.0000 1.1120 1485
0.410844 0.189556 57 57.0 1.0000 1.3529 2205
0.390448 0.196692 63 63.0 1.0000 3.9829 2495
0.367381 0.159770 70 70.0 -1.0000 -2.2816 6433
0.356811 0.251116 77 77.0 1.0000 1.4673 5197
2.676170 25.000000 85 85.0 1.0000 -50.0000 3751 <<<--- instability starts
4.547601 22.222222 94 94.0 -1.0000 -50.0000 6543
6.033408 20.000000 104 104.0 -1.0000 -50.0000 21639
8.064995 27.272727 115 115.0 1.0000 -50.0000 3551
9.665153 25.000000 127 127.0 1.0000 -50.0000 2345
10.910532 23.076923 140 140.0 -1.0000 -50.0000 5973
12.191393 25.000000 154 154.0 -1.0000 -50.0000 5763
13.280581 23.764015 170 170.0 -1.0000 14.5336 923
16.480336 48.477887 187 187.0 1.0000 -50.0000 7115
19.054213 44.386589 206 206.0 -1.0000 23.9025 1307
21.114752 41.327650 227 227.0 1.0000 -50.0000 3059
22.372604 34.787055 250 250.0 -1.0000 -50.0000 4653
22.702367 26.000000 275 275.0 1.0000 -50.0000 2419
22.914689 25.000000 303 303.0 1.0000 -50.0000 3911
23.116340 25.087308 334 334.0 -1.0000 12.4452 923
24.974465 43.227818 368 368.0 -1.0000 39.9895 4587
26.660979 43.434958 405 405.0 1.0000 -50.0000 1903
26.340127 23.170732 446 446.0 -1.0000 -50.0000 8045
26.064555 23.333333 491 491.0 1.0000 -50.0000 2313
26.937839 35.513480 541 541.0 1.0000 -50.0000 2071
27.260462 30.433904 596 596.0 -1.0000 -50.0000 21639
26.901274 23.333333 656 656.0 -1.0000 -50.0000 8259
27.532508 33.806591 722 722.0 -1.0000 26.8339 3633
27.460316 26.746313 795 795.0 1.0000 -50.0000 5791
27.594075 28.923305 875 875.0 1.0000 -50.0000 2155
27.589488 27.543881 963 963.0 1.0000 -50.0000 3747
27.813375 30.036083 1060 1060.0 -1.0000 18.0254 2593
27.514201 24.522464 1166 1166.0 -1.0000 17.7912 8401
27.532259 27.712218 1283 1283.0 1.0000 -50.0000 3809
27.578906 28.042852 1412 1412.0 -1.0000 -50.0000 2907
27.555299 27.320559 1554 1554.0 -1.0000 -50.0000 28841
27.445783 26.354827 1710 1710.0 -1.0000 11.9809 6433
27.409691 27.048780 1881 1881.0 1.0000 -50.0000 9907
27.282661 26.018403 2070 2070.0 -1.0000 -50.0000 7203
26.456300 18.192692 2277 2277.0 1.0000 50.0000 2403
26.197215 23.609770 2505 2505.0 1.0000 -50.0000 2813
26.111145 25.252166 2756 2756.0 -1.0000 12.5928 6805
26.001592 24.907648 3032 3032.0 -1.0000 4.2863 903
26.051664 26.551067 3336 3336.0 -1.0000 -50.0000 3883
26.028613 25.798377 3670 3670.0 -1.0000 -50.0000 9359
26.067086 26.451813 4037 4037.0 1.0000 -50.0000 7519
finished run
number of examples per pass = 2208
passes used = 2
weighted example sum = 4416
weighted label sum = 0
average loss = 26.1141
best constant = 0
total feature number = 23641592
Another notable fact is that if I change the --power_t
value very slightly, the instability point is never hit and I get good convergence:
$ vw -k -c -b 20 --power_t 0.8715 --ngram 2 --loss_function logistic --holdout_off --passes 2 spam-n-ham.vw-train -P 1.1
Generating 2-grams for all namespaces.
Num weight bits = 20
learning rate = 0.5
initial_t = 0
power_t = 0.8715
decay_learning_rate = 1
creating cache_file = spam-n-ham.vw-train.cache
Reading datafile = spam-n-ham.vw-train
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.693147 0.693147 1 1.0 1.0000 0.0000 4029
1.465068 2.236988 2 2.0 -1.0000 2.1241 13563
1.149436 0.518174 3 3.0 1.0000 0.3872 3485
1.076945 0.859473 4 4.0 -1.0000 0.3089 903
0.985696 0.620698 5 5.0 1.0000 0.1506 937
0.954293 0.797281 6 6.0 -1.0000 0.1984 923
0.891358 0.513746 7 7.0 1.0000 0.3982 3841
0.903368 0.987437 8 8.0 -1.0000 0.5214 5349
0.853655 0.455956 9 9.0 1.0000 0.5487 1515
0.833098 0.648081 10 10.0 -1.0000 -0.0923 6679
0.795891 0.423818 11 11.0 1.0000 0.6391 2241
0.750066 0.498029 13 13.0 1.0000 1.7524 3543
0.710186 0.450968 15 15.0 1.0000 0.4185 2077
0.689487 0.534242 17 17.0 1.0000 2.6862 4031
0.657152 0.382305 19 19.0 1.0000 0.3113 2717
0.640812 0.485584 21 21.0 1.0000 0.6618 2195
0.629353 0.549137 24 24.0 -1.0000 0.0550 6565
0.595146 0.321493 27 27.0 1.0000 2.4214 2317
0.581268 0.456363 30 30.0 -1.0000 -1.0259 4593
0.549650 0.233468 33 33.0 1.0000 1.0434 1353
0.512534 0.206326 37 37.0 1.0000 2.4673 3687
0.485204 0.232410 41 41.0 1.0000 1.7151 1267
0.461981 0.271553 46 46.0 -1.0000 -5.0603 7603
0.438200 0.219412 51 51.0 1.0000 1.1034 1485
0.412284 0.191995 57 57.0 1.0000 1.3411 2205
0.392078 0.200127 63 63.0 1.0000 3.9390 2495
0.369108 0.162374 70 70.0 -1.0000 -2.2563 6433
0.358593 0.253443 77 77.0 1.0000 1.4557 5197
0.344772 0.211750 85 85.0 1.0000 3.2863 3751
0.335832 0.251396 94 94.0 -1.0000 -2.1917 6543
0.317840 0.148709 104 104.0 -1.0000 -14.6892 21639
0.339893 0.548397 115 115.0 1.0000 3.7903 3551
0.318588 0.114416 127 127.0 1.0000 4.9485 2345
0.362086 0.787023 140 140.0 -1.0000 0.5251 5973
0.350545 0.235145 154 154.0 -1.0000 -6.3234 5763
0.353494 0.381877 170 170.0 -1.0000 -2.0318 923
0.369466 0.529179 187 187.0 1.0000 -2.6782 7115
0.454023 1.286240 206 206.0 -1.0000 -1.6473 1307
0.435457 0.253341 227 227.0 1.0000 4.9855 3059
0.406619 0.122000 250 250.0 -1.0000 -4.3437 4653
0.374862 0.057286 275 275.0 1.0000 2.0914 2419
0.358930 0.202454 303 303.0 1.0000 1.1091 3911
0.368161 0.458396 334 334.0 -1.0000 -3.3644 923
0.351187 0.184435 368 368.0 -1.0000 -4.2832 4587
0.362503 0.475058 405 405.0 1.0000 -5.9344 1903
0.348703 0.212385 446 446.0 -1.0000 -7.3214 8045
0.390176 0.801221 491 491.0 1.0000 2.5436 2313
0.360049 0.064201 541 541.0 1.0000 3.4172 2071
0.332399 0.060420 596 596.0 -1.0000 -26.2559 21639
0.306510 0.049343 656 656.0 -1.0000 -10.7042 8259
0.296942 0.201843 722 722.0 -1.0000 -4.1233 3633
0.277599 0.086287 795 795.0 1.0000 10.7313 5791
0.288209 0.393656 875 875.0 1.0000 6.4724 2155
0.263451 0.017268 963 963.0 1.0000 7.5301 3747
0.248831 0.103694 1060 1060.0 -1.0000 -6.6147 2593
0.240073 0.152493 1166 1166.0 -1.0000 -6.3333 8401
0.231670 0.147924 1283 1283.0 1.0000 11.1468 3809
0.213634 0.034255 1412 1412.0 -1.0000 -9.0230 2907
0.195168 0.011546 1554 1554.0 -1.0000 -45.4656 28841
0.180541 0.034836 1710 1710.0 -1.0000 -10.3924 6433
0.170098 0.065668 1881 1881.0 1.0000 14.5629 9907
0.155685 0.012245 2070 2070.0 -1.0000 -15.8180 7203
0.153130 0.127575 2277 2277.0 1.0000 11.8049 2403
0.159413 0.222162 2505 2505.0 1.0000 9.1660 2813
0.203143 0.639568 2756 2756.0 -1.0000 -10.2517 6805
0.224682 0.439761 3032 3032.0 -1.0000 -6.3854 903
0.239497 0.387253 3336 3336.0 -1.0000 -14.2190 3883
0.265876 0.529349 3670 3670.0 -1.0000 -20.4291 9359
0.308112 0.730472 4037 4037.0 1.0000 10.8850 7519
finished run
number of examples per pass = 2208
passes used = 2
weighted example sum = 4416
weighted label sum = 0
average loss = 0.348728
best constant = 0
total feature number = 23641588
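For reference, here is a small sketch of how close the two step-size schedules actually are, using the simplified decay eta_t proportional to 1 / t^power_t as an approximation (vw's real update also involves initial_t, decay_learning_rate and the adaptive/normalized scaling, so this is only illustrative). The per-example step sizes differ by only about 1% even after thousands of examples, so the divergence above looks like a knife-edge effect rather than a qualitatively different setting:

```python
# Compare the simplified decay 1 / t^power_t for the two power_t values used
# in the runs above (an approximation of vw's schedule, not its exact update).
bad, good = 0.872996603174264, 0.8715

for t in (10, 100, 1000, 4416):            # 4416 = weighted example sum above
    eta_bad = t ** -bad
    eta_good = t ** -good
    print(f"t={t:5d}  ratio good/bad = {eta_good / eta_bad:.4f}")

# The ratio is t**(bad - good) ~= t**0.0015, i.e. only ~1.3% at t = 4416.
```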
Is this a bug that can be fixed, or an inevitable case of numeric instability?