Commit 0f25369

chore(docs): auto generated reports
1 parent 126a4a0 commit 0f25369

5 files changed, +52 -1 lines changed


docs/index.md

+2 -1
@@ -20,6 +20,7 @@ What will technology look like in 2050?
 
 | Id | Language Model | Embedding Model | # samples |
 |:---:|:---|:---|---:|
+| [20250305T170658](reports/20250305T170658.md) | o3-mini | jinaai/jina-embeddings-v2-base-en | 32 |
 | [20250304T080143](reports/20250304T080143.md) | unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | jinaai/jina-embeddings-v2-base-en | 1408 |
 | [20250303T144808](reports/20250303T144808.md) | unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit | jina-embeddings-v3 | 1408 |
 | [20250303T193518](reports/20250303T193518.md) | unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit | jinaai/jina-embeddings-v2-base-en | 352 |
@@ -61,5 +62,5 @@ LLM Thermometer estimates temperature values of Large Language Models through se
 ---
 
 <div align="center">
-<sub>Generated by <a href="https://github.com/S1M0N38/llm-thermometer">LLM Thermometer</a> v0.5.2</sub>
+<sub>Generated by <a href="https://github.com/S1M0N38/llm-thermometer">LLM Thermometer</a> v0.6.0</sub>
 </div>

docs/reports/20250305T170658.md

+41
@@ -0,0 +1,41 @@
## Report

This report is based on only 16 samples at each of two temperatures, 0 and 1 (32 samples in total), generated via the GitHub Copilot endpoint used by the VS Code Copilot Chat extension. I suspect that the temperature value provided in the request body is ignored during actual generation.

Despite the small sample size and the use of only two temperature values, the plots below show anomalies when compared with plots from other experiments (where the temperature parameter meaningfully influences token sampling):

1. There is almost no difference between the curves in the *Cumulative Probability* plot. Contrary to expectations, the blue curve (temperature = 0) rises faster than the red curve (temperature = 1), which is the exact opposite of the plots from other experiments.

2. The *Similarity (mean)* plot shows almost no variation. If the temperature parameter genuinely influenced generation, we would expect the red dot (temperature = 1) to sit further left (lower mean similarity) and the blue dot (temperature = 0) further right (higher mean similarity).

**Conclusion**: The temperature parameter appears to be ignored.

---
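The Count of 120 per temperature in the Statistical Summary is consistent with scoring every unordered pair of the 16 samples (16 × 15 / 2 = 120). The following is a minimal sketch of such a computation, assuming cosine similarity between embedding vectors. The helper names and the toy vectors are hypothetical and are not taken from LLM Thermometer.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Similarity of every unordered pair: n * (n - 1) / 2 values for n embeddings."""
    return [cosine_similarity(u, v) for u, v in combinations(embeddings, 2)]


# Assumption: toy 2-D vectors stand in for real sentence embeddings.
# 16 samples per temperature yield 16 * 15 / 2 = 120 pairs.
toy_embeddings = [[1.0, 0.01 * i] for i in range(16)]
similarities = pairwise_similarities(toy_embeddings)
print(len(similarities), round(mean(similarities), 4))
```

If temperature affected sampling, the mean of these pairwise values would be expected to drop as temperature rises, because more varied responses embed further apart.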
```
What will technology look like in 2050?
```
- **Id:** `20250305T170658`
- **Language Model:** `o3-mini`
- **Embedding Model:** `jinaai/jina-embeddings-v2-base-en`

---
![violinplot](../assets/20250305T170658/violinplot.png)
![ecdfplot](../assets/20250305T170658/ecdfplot.png)
![scatterplot](../assets/20250305T170658/scatterplot.png)

---
**Statistical Summary**

| temperature | Mean | Median | Std Dev | Min | 25% | 75% | Max | Count |
|--------------:|-------:|---------:|----------:|-------:|-------:|-------:|-------:|--------:|
| 1e-05 | 0.9789 | 0.9791 | 0.004 | 0.9662 | 0.9761 | 0.9819 | 0.9872 | 120 |
| 1 | 0.9797 | 0.9791 | 0.0047 | 0.9695 | 0.9766 | 0.9836 | 0.9904 | 120 |

---
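To put the two rows of the Statistical Summary on a common scale, the gap between the means can be expressed in units of the pooled standard deviation. This is only a descriptive sketch: the choice of Cohen's d is an assumption of this example, not something LLM Thermometer reports, and pairwise similarities that share samples are not independent, so no significance test is attempted.

```python
from math import sqrt

# Values copied from the Statistical Summary table.
low = {"mean": 0.9789, "std": 0.004, "count": 120}    # temperature 1e-05
high = {"mean": 0.9797, "std": 0.0047, "count": 120}  # temperature 1


def cohens_d(a, b):
    """Standardized mean difference (b - a) using the pooled standard deviation."""
    pooled_var = (
        (a["count"] - 1) * a["std"] ** 2 + (b["count"] - 1) * b["std"] ** 2
    ) / (a["count"] + b["count"] - 2)
    return (b["mean"] - a["mean"]) / sqrt(pooled_var)


mean_gap = high["mean"] - low["mean"]
effect = cohens_d(low, high)
print(f"mean gap = {mean_gap:.4f}, Cohen's d = {effect:.2f}")
```

The gap is under a fifth of a pooled standard deviation, and its sign is positive: temperature 1 has the slightly higher mean similarity, the opposite of what a working temperature parameter would produce.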
38+
39+
<div align="center">
40+
<sub>Generated by <a href="https://github.com/S1M0N38/llm-thermometer">LLM Thermometer</a> v0.6.0</sub>
41+
</div>
