LlamaCppGenerator randomness not working as expected #1269

Open
erlebach opened this issue Dec 27, 2024 · 2 comments
erlebach commented Dec 27, 2024

Consider the code below, which runs a Llama-3.1 model with a non-zero temperature. When I execute the code multiple times, I always get the same response, even though llama.cpp uses a non-deterministic seed by default. Is this expected behavior? What approach should I use to get a different result on every run? Setting seed=-1 in the generation_kwargs dictionary solves the problem. It is not clear why this is necessary, though, because seed=-1 is already the default in llama.cpp (#define LLAMA_DEFAULT_SEED 0xFFFFFFFF in spm-headers/llama.h in the https://github.com/ggerganov/llama.cpp.git repository). This suggests that there is an error somewhere.

# Workaround: add an explicit seed entry to generation_kwargs
generation_kwargs={
    "max_tokens": 128,
    "temperature": 0.7,
    "top_k": 40,
    "top_p": 0.9,
    "seed": -1,
},
import random
import time

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

# Seed Python's random module based on the current time
random.seed(int(time.time()))

generator = LlamaCppGenerator(
    model="data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_ctx=512,
    n_batch=128,
    model_kwargs={
        "n_gpu_layers": -1,  # offload all layers to the GPU
        "verbose": False,
    },
    generation_kwargs={
        "max_tokens": 128,
        "temperature": 1.7,
        "top_k": 40,
        "top_p": 0.9,
    },
)
generator.warm_up()

simplified_schema = '{"content": "Your single sentence answer here"}'
system =  "You are a helpful assistant. Respond to questions with a single sentence " \
          f"using clean JSON only following the JSON schema{simplified_schema}. " \
          " Never use markdown formatting or code block indicators."
user_query = "What is artificial intelligence?"

prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>" \
         f"{system}<|eot_id|>" \
         f"<|start_header_id|>user<|end_header_id|> {user_query}" \
         f"<|start_header_id|>assistant<|end_header_id|>"
print(f"{prompt=}")

result = generator.run(prompt)
print("result= ", result["replies"][0])
julian-risch (Member) commented:
@erlebach Thanks for reaching out about this issue. Have you seen this related issue? abetlen/llama-cpp-python#1809

@julian-risch julian-risch transferred this issue from deepset-ai/haystack Jan 2, 2025
d-kleine (Contributor) commented Jan 21, 2025

I am just another user, but the parameters you have set might also "compete" with each other when applied together (there is no execution order among the generation settings). You might tune either temperature or top_p, but not both (see here). I would also suggest using only top_p rather than top_k, not both simultaneously (see here for why).
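
To make that concrete, a small illustrative sketch of the suggestion (the values are examples only, not maintainer recommendations): adjust a single sampling knob and leave the others at their defaults.

# Illustrative generation_kwargs following the suggestion above: adjust only top_p,
# leave temperature and top_k at their defaults, and keep a seed for run-to-run variety.
generation_kwargs = {
    "max_tokens": 128,
    "top_p": 0.9,
    "seed": -1,
}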
