CUDA: Generate error message for unsupported quantizations like iq4_nl #1473

Open
themanyone opened this issue May 20, 2024 · 0 comments

themanyone commented May 20, 2024

The Problem

llama.cpp crashes with an assertion failure instead of reporting that its CUDA backend does not support iq4_nl quantization.

./llama-cli -ngl 1 -m ~/.local/share/models/Phi-3-mini-4k-instruct-IQ4_NL.gguf

(model available from https://huggingface.co/lmstudio-community/Phi-3-mini-4k-instruct-GGUF/tree/main)

Expected results

"Example: IQ4_NL does not support -ngl. Please run without the -ngl flag."

Current Behavior

...
Aborted (core dumped)

The cause

Near the end of ggml-cuda/dmmv.cu (line 665), the switch over quantization types falls through to the default case and aborts on this assertion:

        case GGML_TYPE_Q6_K:
            dequantize_mul_mat_vec_q6_K_cuda(src0_dd_i, src1_ddf_i, dst_dd_i, ne00, row_diff, stream);
            break;
        case GGML_TYPE_F16:
            convert_mul_mat_vec_f16_cuda(src0_dd_i, src1_dfloat, dst_dd_i, ne00, row_diff, stream);
            break;
        default:
------->       GGML_ASSERT(false);
            break;
    }

$ lscpu
cpu.txt

$ lspci
...

01:00.0 VGA compatible controller: NVIDIA Corporation GM204GLM [Quadro M3000M] (rev a1) (prog-if 00 [VGA controller])
        DeviceName: 0
        Subsystem: Hewlett-Packard Company Device 1630
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 44
        Region 0: Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at 4000 [size=128]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

$ uname -a
Linux fedora 6.8.9-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 2 18:44:19 UTC 2024 x86_64 GNU/Linux

@themanyone themanyone reopened this Jun 16, 2024
@themanyone themanyone changed the title CUDA crash in llama_decode_internal, when using -ngl with Phi-3 CUDA: Generate error message for unsupported quantizations like iq4_nl Jun 16, 2024