CUDA: Generate error message for unsupported quantizations like iq4_nl #1473

Open
themanyone opened this issue May 20, 2024 · 0 comments

themanyone commented May 20, 2024

The Problem

llama.cpp crashes with an assertion failure instead of reporting that its CUDA backend does not support iq4_nl quantization.

./llama-cli -ngl 1 -m ~/.local/share/models/Phi-3-mini-4k-instruct-IQ4_NL.gguf

(model available from https://huggingface.co/lmstudio-community/Phi-3-mini-4k-instruct-GGUF/tree/main)

Expected results

"Example: IQ4_NL does not support -ngl. Please run without the -ngl flag."

Current Behavior

...
Aborted (core dumped)

The cause

Near the end of ggml-cuda/dmmv.cu (line 665), the switch over quantization types falls through to the default case and aborts on this assertion:

        case GGML_TYPE_Q6_K:
            dequantize_mul_mat_vec_q6_K_cuda(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_F16:
            convert_mul_mat_vec_f16_cuda(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        default:
------->       GGML_ASSERT(false);
            break;
    }

$ lscpu
cpu.txt

$ lspci
...

01:00.0 VGA compatible controller: NVIDIA Corporation GM204GLM [Quadro M3000M] (rev a1) (prog-if 00 [VGA controller])
        DeviceName: 0
        Subsystem: Hewlett-Packard Company Device 1630
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 44
        Region 0: Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at 4000 [size=128]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

$ uname -a
Linux fedora 6.8.9-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 2 18:44:19 UTC 2024 x86_64 GNU/Linux

@themanyone themanyone reopened this Jun 16, 2024
@themanyone themanyone changed the title CUDA crash in llama_decode_internal, when using -ngl with Phi-3 CUDA: Generate error message for unsupported quantizations like iq4_nl Jun 16, 2024