
power_t instability


I'm seeing a case where, with a non-default and somewhat high --power_t, vw starts to learn "in reverse".

It first makes a classification mistake while being very confident that it is right (a prediction of magnitude 50 with the logistic loss function). Once this point is hit, the loss jumps so far that vw keeps diverging: the progressive loss grows instead of shrinking.
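To see why a single saturated mistake is so costly, here is a minimal sketch (mine, purely for illustration) of the logistic loss vw reports, ln(1 + exp(-y*p)), assuming the ±50 values in the log below are the clamped raw predictions:

```python
import math

def logistic_loss(label, prediction):
    # Logistic loss for labels in {-1, +1}: ln(1 + exp(-y * p))
    return math.log1p(math.exp(-label * prediction))

# A well-behaved example early in the run: label +1, prediction ~0.4
print(logistic_loss(1.0, 0.4))    # ~0.51, in the range of the early "average loss" column

# A saturated mistake: label +1, prediction pinned at -50
print(logistic_loss(1.0, -50.0))  # ~50.0 -- one such example dwarfs everything seen so far
```

A single example with loss ~50 against an average of ~0.35 is enough to yank the "since last" column into the 20s and 40s, which is exactly what the log shows from example 85 onward.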

$ vw -k -c -b 20 --power_t 0.872996603174264 --ngram 2 --loss_function logistic --holdout_off --passes 2 spam-n-ham.vw-train -P 1.1
Generating 2-grams for all namespaces.
Num weight bits = 20
learning rate = 0.5
initial_t = 0
power_t = 0.872997
decay_learning_rate = 1
creating cache_file = spam-n-ham.vw-train.cache
Reading datafile = spam-n-ham.vw-train
num sources = 1
average    since         example     example  current  current  current
loss       last          counter      weight    label  predict features
0.693147   0.693147            1         1.0   1.0000   0.0000     4029
1.478370   2.263593            2         2.0  -1.0000   2.1538    13563
1.157826   0.516737            3         3.0   1.0000   0.3908     3485
1.083802   0.861730            4         4.0  -1.0000   0.3128      903
0.991067   0.620129            5         5.0   1.0000   0.1518      937
0.959003   0.798682            6         6.0  -1.0000   0.2010      923
0.895257   0.512780            7         7.0   1.0000   0.4006     3841
0.907203   0.990828            8         8.0  -1.0000   0.5268     5349
0.856887   0.454358            9         9.0   1.0000   0.5531     1515
0.835689   0.644904           10        10.0  -1.0000  -0.0989     6679
0.798063   0.421809           11        11.0   1.0000   0.6449     2241
0.752182   0.499831           13        13.0   1.0000   1.7736     3543
0.711641   0.448130           15        15.0   1.0000   0.4189     2077
0.690732   0.533913           17        17.0   1.0000   2.7187     4031
0.658030   0.380066           19        19.0   1.0000   0.3127     2717
0.641397   0.483382           21        21.0   1.0000   0.6680     2195
0.629552   0.546635           24        24.0  -1.0000   0.0480     6565
0.594956   0.318187           27        27.0   1.0000   2.4530     2317
0.580820   0.453600           30        30.0  -1.0000  -1.0427     4593
0.549089   0.231775           33        33.0   1.0000   1.0560     1353
0.511646   0.202739           37        37.0   1.0000   2.4959     3687
0.484056   0.228854           41        41.0   1.0000   1.7326     1267
0.460778   0.269898           46        46.0  -1.0000  -5.1305     7603
0.436878   0.216994           51        51.0   1.0000   1.1120     1485
0.410844   0.189556           57        57.0   1.0000   1.3529     2205
0.390448   0.196692           63        63.0   1.0000   3.9829     2495
0.367381   0.159770           70        70.0  -1.0000  -2.2816     6433
0.356811   0.251116           77        77.0   1.0000   1.4673     5197
2.676170   25.000000          85        85.0   1.0000 -50.0000     3751   <<<--- instability starts
4.547601   22.222222          94        94.0  -1.0000 -50.0000     6543
6.033408   20.000000         104       104.0  -1.0000 -50.0000    21639
8.064995   27.272727         115       115.0   1.0000 -50.0000     3551
9.665153   25.000000         127       127.0   1.0000 -50.0000     2345
10.910532  23.076923         140       140.0  -1.0000 -50.0000     5973
12.191393  25.000000         154       154.0  -1.0000 -50.0000     5763
13.280581  23.764015         170       170.0  -1.0000  14.5336      923
16.480336  48.477887         187       187.0   1.0000 -50.0000     7115
19.054213  44.386589         206       206.0  -1.0000  23.9025     1307
21.114752  41.327650         227       227.0   1.0000 -50.0000     3059
22.372604  34.787055         250       250.0  -1.0000 -50.0000     4653
22.702367  26.000000         275       275.0   1.0000 -50.0000     2419
22.914689  25.000000         303       303.0   1.0000 -50.0000     3911
23.116340  25.087308         334       334.0  -1.0000  12.4452      923
24.974465  43.227818         368       368.0  -1.0000  39.9895     4587
26.660979  43.434958         405       405.0   1.0000 -50.0000     1903
26.340127  23.170732         446       446.0  -1.0000 -50.0000     8045
26.064555  23.333333         491       491.0   1.0000 -50.0000     2313
26.937839  35.513480         541       541.0   1.0000 -50.0000     2071
27.260462  30.433904         596       596.0  -1.0000 -50.0000    21639
26.901274  23.333333         656       656.0  -1.0000 -50.0000     8259
27.532508  33.806591         722       722.0  -1.0000  26.8339     3633
27.460316  26.746313         795       795.0   1.0000 -50.0000     5791
27.594075  28.923305         875       875.0   1.0000 -50.0000     2155
27.589488  27.543881         963       963.0   1.0000 -50.0000     3747
27.813375  30.036083        1060      1060.0  -1.0000  18.0254     2593
27.514201  24.522464        1166      1166.0  -1.0000  17.7912     8401
27.532259  27.712218        1283      1283.0   1.0000 -50.0000     3809
27.578906  28.042852        1412      1412.0  -1.0000 -50.0000     2907
27.555299  27.320559        1554      1554.0  -1.0000 -50.0000    28841
27.445783  26.354827        1710      1710.0  -1.0000  11.9809     6433
27.409691  27.048780        1881      1881.0   1.0000 -50.0000     9907
27.282661  26.018403        2070      2070.0  -1.0000 -50.0000     7203
26.456300  18.192692        2277      2277.0   1.0000  50.0000     2403
26.197215  23.609770        2505      2505.0   1.0000 -50.0000     2813
26.111145  25.252166        2756      2756.0  -1.0000  12.5928     6805
26.001592  24.907648        3032      3032.0  -1.0000   4.2863      903
26.051664  26.551067        3336      3336.0  -1.0000 -50.0000     3883
26.028613  25.798377        3670      3670.0  -1.0000 -50.0000     9359
26.067086  26.451813        4037      4037.0   1.0000 -50.0000     7519

finished run
number of examples per pass = 2208
passes used = 2
weighted example sum = 4416
weighted label sum = 0
average loss = 26.1141
best constant = 0
total feature number = 23641592

Another notable fact is that if I change the --power_t value very slightly, the instability point is never hit and I get good convergence:

$ vw -k -c -b 20 --power_t 0.8715 --ngram 2 --loss_function logistic --holdout_off --passes 2 spam-n-ham.vw-train -P 1.1
Generating 2-grams for all namespaces.
Num weight bits = 20
learning rate = 0.5
initial_t = 0
power_t = 0.8715
decay_learning_rate = 1
creating cache_file = spam-n-ham.vw-train.cache
Reading datafile = spam-n-ham.vw-train
num sources = 1
average    since         example     example  current  current  current
loss       last          counter      weight    label  predict features
0.693147   0.693147            1         1.0   1.0000   0.0000     4029
1.465068   2.236988            2         2.0  -1.0000   2.1241    13563
1.149436   0.518174            3         3.0   1.0000   0.3872     3485
1.076945   0.859473            4         4.0  -1.0000   0.3089      903
0.985696   0.620698            5         5.0   1.0000   0.1506      937
0.954293   0.797281            6         6.0  -1.0000   0.1984      923
0.891358   0.513746            7         7.0   1.0000   0.3982     3841
0.903368   0.987437            8         8.0  -1.0000   0.5214     5349
0.853655   0.455956            9         9.0   1.0000   0.5487     1515
0.833098   0.648081           10        10.0  -1.0000  -0.0923     6679
0.795891   0.423818           11        11.0   1.0000   0.6391     2241
0.750066   0.498029           13        13.0   1.0000   1.7524     3543
0.710186   0.450968           15        15.0   1.0000   0.4185     2077
0.689487   0.534242           17        17.0   1.0000   2.6862     4031
0.657152   0.382305           19        19.0   1.0000   0.3113     2717
0.640812   0.485584           21        21.0   1.0000   0.6618     2195
0.629353   0.549137           24        24.0  -1.0000   0.0550     6565
0.595146   0.321493           27        27.0   1.0000   2.4214     2317
0.581268   0.456363           30        30.0  -1.0000  -1.0259     4593
0.549650   0.233468           33        33.0   1.0000   1.0434     1353
0.512534   0.206326           37        37.0   1.0000   2.4673     3687
0.485204   0.232410           41        41.0   1.0000   1.7151     1267
0.461981   0.271553           46        46.0  -1.0000  -5.0603     7603
0.438200   0.219412           51        51.0   1.0000   1.1034     1485
0.412284   0.191995           57        57.0   1.0000   1.3411     2205
0.392078   0.200127           63        63.0   1.0000   3.9390     2495
0.369108   0.162374           70        70.0  -1.0000  -2.2563     6433
0.358593   0.253443           77        77.0   1.0000   1.4557     5197
0.344772   0.211750           85        85.0   1.0000   3.2863     3751
0.335832   0.251396           94        94.0  -1.0000  -2.1917     6543
0.317840   0.148709          104       104.0  -1.0000 -14.6892    21639
0.339893   0.548397          115       115.0   1.0000   3.7903     3551
0.318588   0.114416          127       127.0   1.0000   4.9485     2345
0.362086   0.787023          140       140.0  -1.0000   0.5251     5973
0.350545   0.235145          154       154.0  -1.0000  -6.3234     5763
0.353494   0.381877          170       170.0  -1.0000  -2.0318      923
0.369466   0.529179          187       187.0   1.0000  -2.6782     7115
0.454023   1.286240          206       206.0  -1.0000  -1.6473     1307
0.435457   0.253341          227       227.0   1.0000   4.9855     3059
0.406619   0.122000          250       250.0  -1.0000  -4.3437     4653
0.374862   0.057286          275       275.0   1.0000   2.0914     2419
0.358930   0.202454          303       303.0   1.0000   1.1091     3911
0.368161   0.458396          334       334.0  -1.0000  -3.3644      923
0.351187   0.184435          368       368.0  -1.0000  -4.2832     4587
0.362503   0.475058          405       405.0   1.0000  -5.9344     1903
0.348703   0.212385          446       446.0  -1.0000  -7.3214     8045
0.390176   0.801221          491       491.0   1.0000   2.5436     2313
0.360049   0.064201          541       541.0   1.0000   3.4172     2071
0.332399   0.060420          596       596.0  -1.0000 -26.2559    21639
0.306510   0.049343          656       656.0  -1.0000 -10.7042     8259
0.296942   0.201843          722       722.0  -1.0000  -4.1233     3633
0.277599   0.086287          795       795.0   1.0000  10.7313     5791
0.288209   0.393656          875       875.0   1.0000   6.4724     2155
0.263451   0.017268          963       963.0   1.0000   7.5301     3747
0.248831   0.103694         1060      1060.0  -1.0000  -6.6147     2593
0.240073   0.152493         1166      1166.0  -1.0000  -6.3333     8401
0.231670   0.147924         1283      1283.0   1.0000  11.1468     3809
0.213634   0.034255         1412      1412.0  -1.0000  -9.0230     2907
0.195168   0.011546         1554      1554.0  -1.0000 -45.4656    28841
0.180541   0.034836         1710      1710.0  -1.0000 -10.3924     6433
0.170098   0.065668         1881      1881.0   1.0000  14.5629     9907
0.155685   0.012245         2070      2070.0  -1.0000 -15.8180     7203
0.153130   0.127575         2277      2277.0   1.0000  11.8049     2403
0.159413   0.222162         2505      2505.0   1.0000   9.1660     2813
0.203143   0.639568         2756      2756.0  -1.0000 -10.2517     6805
0.224682   0.439761         3032      3032.0  -1.0000  -6.3854      903
0.239497   0.387253         3336      3336.0  -1.0000 -14.2190     3883
0.265876   0.529349         3670      3670.0  -1.0000 -20.4291     9359
0.308112   0.730472         4037      4037.0   1.0000  10.8850     7519

finished run
number of examples per pass = 2208
passes used = 2
weighted example sum = 4416
weighted label sum = 0
average loss = 0.348728
best constant = 0
total feature number = 23641588
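For comparison, here is a rough sketch (my own, not vw's actual update code) of how close the two step-size schedules are, assuming the per-example learning rate decays roughly like eta / t^power_t with initial_t = 0; the real schedule also involves decay_learning_rate and per-feature adaptive terms:

```python
# Approximate step-size decay under the two power_t values from the runs above.
# This is only an approximation (eta_t ~ eta / t^power_t), NOT vw's exact update;
# it just gives a feel for how little the two schedules differ.
eta = 0.5
unstable_p = 0.872996603174264   # the run that diverges
stable_p   = 0.8715              # the run that converges

for t in (1, 10, 85, 1000, 4416):
    rate_unstable = eta / t ** unstable_p
    rate_stable   = eta / t ** stable_p
    print(f"t={t:5d}  eta_t({unstable_p:.4f})={rate_unstable:.6f}  "
          f"eta_t({stable_p:.4f})={rate_stable:.6f}  "
          f"ratio={rate_unstable / rate_stable:.4f}")
```

Under this approximation the two step sizes never differ by more than about 1% over the whole run, which is what makes the sharp threshold between convergence and divergence so surprising.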

Is this a bug that can be fixed, or an inevitable case of numeric instability?

Convergence vs. divergence with a slightly different --power_t.