quantize to F32/F16/Q8_0 can result in a Q6_K output tensor #5818

cebtenzzre · 2024-03-01T16:08:25Z

Running quantize with a target dtype of F32, F16, or Q8_0 can produce a Q6_K output tensor unless --pure is passed (ref #5631 (comment)). This is surprising: I would expect converting to F32 and then quantizing to F16 to produce results similar to converting directly to F16.

I suggest that the k-quant mixture logic should never attempt to decrease the quality of the output tensor, only increase it.
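To illustrate the suggested rule, here is a minimal sketch, not the actual llama.cpp code. It assumes bits per weight is an acceptable proxy for tensor quality; the table `BITS_PER_WEIGHT` and the helper `clamp_mixture_choice` are hypothetical names made up for this example.

```python
# Approximate storage cost in bits per weight for a few ggml types.
# Using this as a stand-in for "quality" is an assumption of this sketch.
BITS_PER_WEIGHT = {
    "F32": 32.0,
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K": 4.5,
}


def clamp_mixture_choice(requested: str, proposed: str) -> str:
    """Hypothetical helper: accept the mixture's proposal only if it is
    not a downgrade relative to the type the user asked for."""
    if BITS_PER_WEIGHT[proposed] < BITS_PER_WEIGHT[requested]:
        # The mixture logic wanted a lower-precision type; keep the request.
        return requested
    # Same or higher precision: the mixture logic may upgrade freely.
    return proposed


if __name__ == "__main__":
    # Requesting F16 should never yield a Q6_K output tensor.
    print(clamp_mixture_choice("F16", "Q6_K"))
    # Requesting Q4_K may still be upgraded to Q6_K for the output tensor.
    print(clamp_mixture_choice("Q4_K", "Q6_K"))
```

With a rule like this, the mixture logic can still raise the precision of the output tensor for low-bit targets, but a request for F32, F16, or Q8_0 would be left alone.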

cebtenzzre · 2024-03-11T17:48:42Z

Fixed by ee35600

cebtenzzre added the "bug" label Mar 1, 2024

ggerganov added the "good first issue" label Mar 2, 2024

cebtenzzre closed this as completed Mar 11, 2024
