- chore: making test less flaky (3effa18)
- chore: fix updated torch types (4c46da6)
- chore: fixing linting errors and adding precommit hook (85f6241)
- feat: allow setting the artifacts path (2a4b4dc)
- fix: gracefully handle slashes in model filename for autointerp (5d6464a)
- fix: fix typing and updating mdl for saelens >=5.4.0 (802d1c3)
- fix: load probe class with weights_only = False (f05bf40)
- fix: Update README to include eval output schema update instructions (f0adee2)
- fix: Update json schema jsons (2b2a6d3)
- Merge pull request #60 from chanind/deflaking-test: "chore: making test less flaky" (963f2e8)
- Remove threshold from state dict if we aren't using it (d91a218)
- Merge pull request #59 from chanind/artifacts-path-option: "feat: allow setting the artifacts path" (53901a2)
- Merge pull request #58 from chanind/fixing-types: "chore: fix updated torch types" (849018f)
- Merge pull request #57 from chanind/fix-slash-in-model-name-autointerp: "fix: gracefully handle slashes in model filename for autointerp" (11b2e38)
- adding artifacts_path to unlearning eval (ce1de32)
- By default we don't use a threshold for custom topk SAEs (60579ed)
- Merge pull request #56 from chanind/type-fixes: "fix: fix typing and updating mdl for saelens >=5.4.0" (0888d07)
- Merge pull request #55 from chanind/precommit-check: "chore: fixing linting errors and adding precommit hook" (7ac7ced)
- Fix SAE Bench SAEs repo names (18dc457)
- Prevent potential division by zero (92315dd)
- Add optional pinned dependencies (e74f0cf)
- Calculate featurewise statistics in demo (5204b48)
- Improve documentation on custom SAE usage (f15fe53)
- Merge pull request #53 from adamkarvonen/hide_absorption_stddev: "hide stddev from default display for absorption" (155afbc)
- hide stddev from default display for absorption (d970f05)
- Merge pull request #52 from adamkarvonen/update_scr_tpp: "update scr_tpp_schema to show top 20 by default" (f551e7b)
- update scr_tpp_schema to show top 20 by default (59320e2)
- Merge pull request #51 from adamkarvonen/update_schema_jsons: "fix: Update eval output schema jsons" (7b2021c)
- Add computational requirements (9b621a9)
- Improve graphing notebook, include matryoshka results in graphs (f2d1d98)
- Merge pull request #50 from chanind/lint-and-type-check: "chore: Adding formatting, linting and type checking" (a0fb5e9)
- adding README and Makefile with helpers (7452eca)
- fixing linting and type-checking issues (e663e3a)
- formatting with ruff (14dad45)
- Check that unlearning data exists before running unlearning eval (294b25c)
- Improve export notebook (e2b0b3c)
- Improve graphing utils (661920d)
- Fix spelling (8c0df93)
- Add standard deviation for absorption / autointerp, store results per class for sparse probing / tpp for potential error bars (141aff7)
- Use GPU probing in correct location (ec5efa8)
- fix: use GPU for llm probing (ba0956e)
- fix: pass device into core evals (e6651ea)
- fold W_dec norm when loading SAE Lens SAEs (511d51a)
- Change default sparse probing k values (271a9d4)
- feat: Add a frac alive calculation to core (0399550)
- added absorption fraction metric (#48): "feat: added absorption fraction metric"; small fixes; remove unused FeatureAbsorptionCalculator._filter_prompts function. Co-authored-by: Demian Till <demian.till@cambridgeconsultants.com> (7545ee3)
- Add a script for organizing and uploading results (4689129)
- Calculate featurewise statistics by default (bca84ca)
- feat: add misc core metrics (2c731f6)
- Make sure grad is enabled for absorption tests (bd25ca0)
- feat: EvalOutput and EvalConfig base classes to allow easy JSON schema export (537219a)
- fix: eval_result_unstructured should be optional (38e81b0)
- fix: dump to json file correctly (5f1cf15)
- git commit -m "fix: add missing init.py" (20b20f2)
- Merge pull request #47 from chanind/packaging: "feat: Setting up Python packaging and autodeploy with Semantic Release" (e52a418)
- Merge branch 'main' into packaging (9bc22a4)
- Merge branch 'main' into packaging (bb10234)
- Update SAE Bench demo to use new graphing functions (9bbfdc5)
- switching to poetry and setting up CI (a9af271)
- Add option to pass in arbitrary sae_class (e450661)
- Mention dictionary_learning (c140e71)
- Update graphing notebook to work with filenames (dc6f951)
- deprecate graphing notebook (67118ee)
- migrating to sae_bench base dir (bb8e145)
- Use a smaller batch size for unlearning (3a099d2)
- Reduce memory usage by only caching required activations (f026998)
- Remove debugging check (8ea7162)
- Add sanity checks before major run (0908b18)
- Improve normalization check (16a3c0e)
- Add normalization for batchtopk SAEs (6a031bd)
- Add matryoshka loader (1078899)
- Add pythia 160m (b219497)
- simplify process of evaluating dictionary learning SAEs (c2dca52)
- Add a script to run evals on dictionary learning SAEs (3f4139b)
- Make the layer argument optional (e53675d)
- Add batch_top_k, top_k, gated, and jump_relu implementations (9a7fce8)
- Add a function to test the saes (864b4b3)
- Update demo for new relu sae setup (5d04ce5)
- Ensure loaded SAEs are on correct dtype and device (a5d6d62)
- Create a base SAE class (8fcc9fe)
- Add blog post link (2d47229)
- cleanup README (0e724df)
- Clean up graphing notebook (c08f3f5)
- Graph results for all evals in demo notebook (29ac97b)
- Clean up for release (1c9822c)
- Include baseline pca in every graph (a45afd2)
- Clean up plot legends, support graphing subplots (7ade8b0)
- Merge pull request #45 from adamkarvonen/update_jsonschemas: "update jsonschemas" (879c7ca)
- update jsonschemas (a14d465)
- Use notebook as default demo, mention in README (298796b)
- Minor fixes to demo (05808c7)
- Add missing batch size argument (877f2e7)
- Fixes for changes to eval config formats (e0cb629)
- Add an optional best of k graphing cell (081b59c)
- Ignore any folder containing "eval_results" (12f8d66)
- Add cell to add training tokens to config dictionaries (38173c9)
- Also plot all sae bench checkpoints (93563e0)
- Add eval links (2216f99)
- rename core results to match convention (51e47fd)
- Ignore autointerp with generations when downloading (aa20644)
- Use != instead of > for L0 measurement (83504b7)
- Add utility cell for removing llm generations (67c9b03)
- Add utility cell for splitting up files by release name (3cc51ea)
- Add force rerun option to core, match sae loading to other evals (8676d5d)
- Improve plotting of results (89e5567)
- Consolidate SAE loading and output locations (293b385)
- Plot generator for SAE Bench (c2cb78e)
- Add utility notebook for adding sae configs (8508a01)
- Improve custom SAE usage (e959f65)
- Improve graphing (490cd2a)
- Fix failing tests (ed88f65)
- match core output filename with others (8ca0787)
- Remove del sae flag (feaf1f8)
- Add current status to repo (9c95af7)
- Add sae config to output file (b2fbd6d)
- Add a flag for k sparse probing batch size (6f2e38f)
- Merge pull request #44 from adamkarvonen/absorption-tweaks-2: "improving memory usage of k-sparse probing" (6ae8235)
- Merge pull request #43 from adamkarvonen/fake_branch: "single line update" (7984d50)
- single line update (d9637e1)
- improving memory usage of k-sparse probing (841842a)
- Add documentation to demo notebook (2e170e1)
- adapted graphing to np result filestructure (3629b90)
- Improve reduced memory script (ecb9f46)
- Script for evaluating 1M width SAEs (63a6783)
- Use expandable segments to reduce memory usage (4f3967d)
- Delete SAE at the correct location in the for loop (ff0beda)
- Shell script for running 65k width SAEs on 24 GB GPUs (9b0bd9d)
- Delete sae at end of loop to lower memory usage; primarily required for 1M width SAEs (08f9755)
- Add absorption (b2e89c9)
- Add note on usage (07cbf3c)
- Add shell scripts for running all evals (a832e09)
- add 9b-it unlearning precomputed artifacts (93502c0)
- Add example of running all evals to notebook (473081d)
- Clean up filename (a067c5c)
- Create a demo of using custom SAEs on SAE bench (49d5ecd)
- Move warnings to main function, raise error if not instruct tuned (e798adf)
- perform calculations with a running sum to avoid underflow (d842a1f)
- Do probe attribution calculation in original dtype for memory savings (366dc4c)
- Use api key file instead of command line argument (bb48a6c)
- Add flags to reduce VRAM usage (322334a)
- fix unlearning test (5039e5e)
- add optional flag to reduce peak memory usage (735f988)
- Ignore core model name flag for now (43ef711)
- Don't try to set random seed if it's None (d1d6f72)
- Make eval configs consistent, require model names in all eval arguments (d37e77c)
- Add ability to pass in random seed and sae / llm batch size (d8f026b)
- Describe how values are set within eval configs (365fb40)
- Always ignore the bio forget corpus (3e6d36f)
- Use util function to convert str to dtype (7281627)
- update graphing scripts (ff38240)
- Merge pull request #39 from adamkarvonen/add_9b: "add gemma-2-9b default DTYPE and BATCH_SIZE" (164b6f5)
- also add for 9b-it (b93f3c9)
- add gemma-2-9b (8030c03)
- Update regexes and test data to match new SAE Bench SAEs (6da4692)
- Update outdated reference, don't get api_key if not required (da9a2dc)
- Add ability to pass in flag for computing featurewise statistics, default it to false (f6430af)
- Move str_to_dtype() to general utils (8ab32f9)
- Pass in a string dtype (f49d41c)
- Merge pull request #35 from adamkarvonen/add_pca: "Add pca" (f4fbd0c)
- Delete old sae bench data (55f9b6f)
- Mention disk space, fix repo name (04a2b01)
- mention WMDP access (26816e5)
- Be consistent when setting default dtype (3ed82b3)
- Rename baselines to custom_saes (067bb79)
- Rename shift to SCR (bbbdfdc)
- correctly save and load state dict (35d64c8)
- Just use the global PCA mean (2317de9)
- Increase test tolerance, remove cli validation as other evals aren't using it (395095b)
- Match core eval config to others with dtype usage (9197e3f)
- Also check for b_mag so we don't ignore gated SAEs bias (edd0de2)
- consolidate save locations of artifacts and eval results (c9a18b0)
- revert eval_id change for now (2cfbcc0)
- Save fake encoder bias as a tensor of zeros (4ccbec6)
- Ensure that sae_name is unique (cce38b6)
- Change default results location (6987235)
- Also compare residual stream as a baseline (d528f3b)
- Don't require sae to have a b_enc (c979427)
- Include model name in tokens filename (a255df0)
- Check if file exists (f8c9ab2)
- Fix regex usage demo (1c4117d)
- remove outdated import (9dddb3e)
- Simplify custom SAE usage demonstration (542c659)
- Benchmark autointerp scores on mlp neurons and saes (4ba6cba)
- Simplify code by storing dtype as a string (963c9c2)
- Add option to set dtype for core/main.py (d649826)
- Pass in the optional flag to save activations (d0b8091)
- Don't check for is SAE to enable use with custom SAEs (36cfba8)
- mention new run all script (916df28)
- Script for running evals on all custom SAEs at once (b91c210)
- Rename formatting utils to general utils (dedec93)
- Clean up duplicated functions (28d2f2f)
- Clean up old graphing code (d3e8e87)
- Fix memory leak (65fa76a)
- Make test file names consistent (70e2eaa)
- Remove unused flag (4ed9602)
- Improve GPU PCA training (01e6306)
- Fix namespace error and expected result format error (776e5f4)
- Enable usage of core with custom SAEs (e359d1a)
- Add a function to fit the PCA using GPU and CUML (2632849)
- Switch from nested dict to list of tuples for selected_saes (dbdfe19)
- Make it easier to train pca saes (eb41438)
- Format with ruff (e62a436)
- Test identity SAE implementation (e95e055)
- Add a PCA baseline (645a040)
- Move unlearning tokenization function to general utils file, consolidate tokenization functions (98c4b5c)
- Merge pull request #34 from adamkarvonen/fix-core-eval-precision: "fixing excessively low precision" (e0ddf06)
- fixing excessively low precision (d1eea66)
- Merge pull request #33 from adamkarvonen/add_baselines: "Add baselines" (20c2a40)
- Update README (e5b2ba4)
- Fix regexes (478b41c)
- Rename selection notebook (1b24a46)
- Remove usage of SAE patterns list (80ed74d)
- Make sure batch is on the correct dtype (f8a1158)
- Adapt auto interp to enable use with custom saes (7ea0e59)
- Adapt absorption to match existing format (7e2ac58)
- Enable easy usage of evals with custom SAEs (64d2e23)
- Use sae.encode() for compatibility instead of sae.run_with_cache() (42cc9ce)
- fix device errors (1725899)
- format with ruff (737b788)
- Set autointerp context size in eval config (c4bfa82)
- Add autointerp progress bars (bcc14a9)
- Use baseline SAEs on the sparse probing eval (d3e5e07)
- Merge pull request #32 from adamkarvonen/core_evals_ignore_special: "Added option to exclude special tokens from SAE reconstruction" (186bdb4)
- Added option to exclude special tokens from SAE reconstruction (54a55f7)
- Example jumprelu implementation (0ca103e)
- identity sae baseline (5f65ace)
- Merge pull request #31 from adamkarvonen/activation_consolidation: "Activation consolidation" (3ddcceb)
- Add graphing for pythia and autointerp (1b4cf2a)
- Correctly index into sae_acts (8c938ad)
- Adapt format to Neuronpedia requirements (9baec7c)
- Update README.md (de3ce5c)
- Rename for consistency (632f54d)
- Add end to end autointerp test (360dfe0)
- Remove college biology from datasets as too close to wmdp_bio (dcdbbc5)
- Print a warning if there aren't enough alive latents (23fc5a5)
- Include dataset info in filename (d37fc04)
- Add functions to encode precomputed activations (bd742ee)
- Eliminate usage of activation store (fa58764)
- Adapt autointerp to new format (1de25f7)
- prepend bos token (46e00a4)
- Mask off BOS, EOS, and pad tokens (4e2b0d6)
- Collect the sparsity tensor for SAE autointerp (0dd3a91)
- Format with ruff (2afc772)
- Updated question ids running with one BOS token (df3b9d4)
- Zero out SAE activations on BOS token (0d30360)
- Only use one BOS token at beginning (ad97556)
- Remove redundant with no_grad() (528959f)
- Merge remote-tracking branch 'origin/main' into activation_consolidation (777e9d4)
- Move the get_sparsity function to general utils folder, mask bos, pad, and eos tokens for unlearning (f679f0f)
- Make it easier to use get_llm_activations() with other evals (1ed9a29)
- Merge pull request #8 from callummcdougall/callum/autointerp: "Autointerp eval" (6f81495)
- Merge branch 'main' into callum/autointerp (466a37d)
- Improve graphing notebook for current output format (36fb3ba)
- Apply nbstripout (ca27e41)
- Notebook specifically for graphing and analyzing mdl results (114cefb)
- Merge pull request #30 from adamkarvonen/mdl_fixes: "Mdl fixes" (65c3c98)
- Add example data and add details to README (21f3c83)
- Use torch instead of t for consistency (903324f)
- Move calculations to float32 to avoid dtype errors (099a94f)
- Add descriptions to unlearning hyperparameters and descriptions of shift, tpp, and sparse probing evals (6c141c1)
- Merge pull request #28 from adamkarvonen/update_unlearning: "Update unlearning output format" (2bfd70b)
- descriptions (c1f79b3)
- Update (60247d0)
- update description (03a2402)
- fix unlearning test (7c50173)
- remove artifact (41c7750)
- output format (1201a7c)
- update unlearning test (1eae76b)
- unlearning start (a7be6df)
- Merge pull request #27 from adamkarvonen/core_tests: "Update JSON schema filenames" (b6ed053)
- remove unused (810afe8)
- updated schema file names (e4df309)
- update name of output schema file (af58f0f)
- Merge pull request #26 from adamkarvonen/core_tests: "Update ui_default_display, titles for display" (371b80d)
- Update titles (4885187)
- default display (8f811b8)
- Merge pull request #25 from adamkarvonen/core_tests: "added tests for core eval output" (92bc76a)
- added tests for core eval output (591ce3f)
- Add end to end unlearning test (79435ab)
- clean up activations always defaults to false (5d545d1)
- Further general cleanup of mdl eval (51e9b60)
- Merge pull request #24 from adamkarvonen/core_update: "New Core output format, plus converter" (b01be8a)
- New Core output format, plus converter (192e92b)
- Save sae results per sae (7045ad2)
- Fix variable name bug (d5ec2d2)
- MDL is running (c486454)
- Format with ruff (04b14b7)
- Merge pull request #6 from koayon/mdl-eval: "Implement MDL eval" (829de0c)
- Merge branch 'main' into mdl-eval (ad18568)
- Generate bfloat16 question_ids and commit them to the proper location, remove old ones (d48c68b)
- Add example unlearning output (9866453)
- Merge pull request #23 from adamkarvonen/unlearning_adapt: "Unlearning adapt" (3c48cdb)
- Allow plotting of gemma SAEs (c813b39)
- Adapt unlearning eval to new format (26d1675)
- pass artifact folder in to unlearning functions (87508a6)
- Add a sparsity penalty when training the SHIFT / TPP linear probes (cc73c6f)
- Merge pull request #21 from adamkarvonen/shift_sparse_probing_descriptions: "Shift sparse probing descriptions" (3e9555a)
- Remove unnecessary test keys, add note to README (12f324d)
- Merge pull request #22 from adamkarvonen/fix/handle_gated_in_core: "handle case where gated SAEs don't have b_enc" (586597b)
- Finish rename of the spurious_corr variable (b28aab6)
- handle case where gated SAEs don't have b_enc (355aaf4)
- update doc about how to update json schemas files; add json schema files (95fda67)
- Update from uncategorized to shift_metrics and tpp_metrics (43bf1f4)
- Improve titles and descriptions for sparse probing (177be38)
- Improve descriptions, titles, and variable names in SHIFT and TPP (29e1ecc)
- Merge pull request #20 from adamkarvonen/make_unstructured_optional: "fix: eval_result_unstructured should be optional" (76d72a6)
- Merge pull request #19 from adamkarvonen/core_eval_incremental_saving: "Core eval incremental saving" (7ddd55a)
- added error handling and exponential backoff (92abbbe)
- added code to produce intermediate json output between SAEs (b15a2e2)
- fix device bug, resolve test utils conflicts (9b2e909)
- Merge pull request #18 from adamkarvonen/set_sparse_probing_default_display: "set k = 1, 2, 5 default display = true for sparse probing" (db93af6)
- set k = 1, 2, 5 default display = true for sparse probing (bf9b5ac)
- Merge pull request #17 from adamkarvonen/add_unstructured_eval_output: "Feature: Support unstructured eval output" (3b17927)
- Merge pull request #16 from adamkarvonen/basic-evals: "Added core evals to repo" (c55f48f)
- Support unstructured eval output (adf028a)
- Added core evals to repo (9b1dd45)
- Merge pull request #15 from adamkarvonen/json_schema_absorption: "Use Pydantic for eval configs and outputs for annotations and portability" (e75c8b5)
- update shift/tpp and sparse probing to evaloutput format (2dbb6f8)
- Merge remote-tracking branch 'origin/main' into json_schema_absorption (eb8c660)
- Add pytorch cuda flag due to OOM error message (c8e74f4)
- conform shift_and_tpp to new output format (153c713)
- Merge remote-tracking branch 'origin/main' into json_schema_absorption (b337d5f)
- test pre-commit hook (648046f)
- produce the JSON schema files and add as a pre-commit hook (6683e17)
- Add example regexes for gemma 2 2b (aea66aa)
- Merge pull request #14 from adamkarvonen/shift_sparse_probing_updates: "Shift sparse probing updates" (14b5025)
- Add example usage of gemma-scope and gemma SAEs (f2dcacf)
- Improve arg parsing and probe file name (ec8cd87)
- Mention other use for GPU probe training (929cdc0)
- Add early stopping patience to reduce variance (5285563)
- Add note on random seed being overwritten by argparse (d5a215a)
- Separate save areas for tpp and shift (c860deb)
- Also ignore artifacts and test results (1a81702)
- Add shift and tpp to new format (d7e4b8b)
- Improve assert error message if keys don't match (c514f49)
- force_rerun now reruns even if a results file exists for a given sae (1ff0b16)
- Make shift and tpp tests compatible with old results (ff2f46d)
- Make sparse probing test backwards compatible with old results (782a080)
- fix absorption test (d62b752)
- Create a new graphing notebook for regex based selection (65ef605)
- Improve artifacts and results storage locations, add a utility to select saes using multiple regex patterns (1cdfdb7)
- No longer aggregate over saes in a dict (0e5bccb)
- Rename old graphing file (df72d30)
- fix ctx len bug, handle dead features better (63c2561)
- don't commit artifact file (2c5691a)
- Add openai and tabulate to requirements.txt (f626447)
- Begin shift / tpp adaptation (ab1f062)
- No longer average over multiple saes (aaf06eb)
- Add an optional list of regexes (41b86a4)
- By default remove the bos token (4877424)
- Match new sae bench format (5d484e3)
- Add note on output formats (56c637f)
- Add notes on custom sae usage (30d4f16)
- Add a utility function to plot multiple results at once (a89c86e)
- Ignore images and results folders (e07e65f)
- Merge branch 'main' into mdl-eval (922fb14)
- Update mdl_eval (bdefc02)
- Merge pull request #12 from jbloomAus/demo-format-and-command-changes-absorption: "Demo of Changes to enable easy running of evals at scale (using absorption)" (a9603b8)
- Merge branch 'main' into demo-format-and-command-changes-absorption (9a9d4b1)
- delete old template (0092a1f)
- add re-usable testing utils for the config, cli and output format (03aee86)
- delete old template (8d66a49)
- Merge pull request #13 from adamkarvonen/minor_shift_improvements: "Minor shift improvements" (1b318d5)
- Notebook used to test different datasets (d4d4fb5)
- update strategy for running absorption via CLI (13f90d0)
- Comment out outdated tests (f1e2f9e)
- Add runtime estimates to READMEs (32da7aa)
- Rename to match other readmes (8dc952f)
- Reduce the default number of n values for faster runtime (74fd9a1)
- Lower peak memory usage to fit on a 3090 (1fecf15)
- Skip first 150 chars per Neurons in a Haystack (46d9510)
- Merge pull request #11 from adamkarvonen/add_datasets: "Add additional sparse probing datasets" (0512456)
- Share dataset creation code between tpp and sparse probing evals (30f60b6)
- Update scr and tpp tests (8561e91)
- Add an optional keys-to-compare list, to only compare those values (d60388b)
- Add ag_news and europarl datasets (82de70f)
- Add a single shift scr metric key (b65f969)
- Use new sparse probing dataset names (7b32c83)
- Use more probe epochs, update to use new dataset names (b49db06)
- Add several new sparse probing datasets (b4f5400)
- Add dataset functions for amazon sentiment and github code (aa1a478)
- Use full huggingface dataset names (d2d4001)
- Merge pull request #10 from curt-tigges/main: "Initial RAVEL code" (fc6a59b)
- Merge branch 'main' into main (d132ec3)
- Change default unlearning hyperparameters (d4c1949)
- Do further analysis of unlearning hyperparameters (8f6262c)
- Add multiple subsets of existing datasets (ae36e81)
- Retry loading dataset due to intermittent errors (dc38fd0)
- Use stop at layer for faster inference (e752a9c)
- Merge pull request #9 from adamkarvonen/unlearning_cleanup: "Unlearning cleanup" (209e526)
- fix topk error (e39a9ab)
- add sae encode function (b247f91)
- Get known question ids if they don't exist (f3516f4)
- Remove unused functions (ac3da4d)
- discard unused variable (aea5531)
- Get results for all retain thresholds (08eec18)
- add regex based sae selection strategy (57e9be0)
- Updated notebook (ae40301)
- Save unlearning score in final output (f90b114)
- Add file to get correct answers for a model (0361c07)
- Fix missing filenames (6aad3ca)
- Move hyperparameters to eval config (5d2b9d0)
- restructure results json, store probe results (9ec59a8)
- Move llm and sae to llm_dtype (56e8e43)
- Fix utils import (79321f8)
- Apply ruff formatter (1423c33)
- Make sure we don't commit the forget dataset (63bc153)
- Apply nbstripout (2ff3e72)
- Merge pull request #7 from yeutong/unlearning: "implement unlearning eval" (42de6df)
- Merge branch 'main' into unlearning (b2f6d68)
- add version control utils (b516958)
- first commit (4b23575)
- Add pytorch flag due to CUDA OOM message (c57eef7)
- Move sae to llm dtype (c510d95)
- Add a README and test for absorption (a8f4190)
- Add example main function to absorption eval (2f1c551)
- Move sae to llm dtype (26ed7a0)
- Merge pull request #3 from chanind/absorption: "Feature Absorption Eval" (8d80be6)
- Added initial demo notebook (86dfd95)
- Added initial RAVEL files for dataset generation (96e963f)
- renaming dict keys (ff81c53)
- Merge remote-tracking branch 'upstream/main' (daab3e2)
- add analysis (5eb7dfa)
- success (b67c97b)
- add gemma-2-2b-it (5f502d7)
- revert changes to template.ipynb (aac04e0)
- fix detail (e6e0985)
- fixing batching error in absorption calculator (8a425cd)
- Merge branch 'main' into absorption (822ad11)
- Merge pull request #5 from koayon/rename-utils: "Rename utils to avoid name conflict" (eb6cc7b)
- update notebook imports (6b4ca4a)
- notebook reversion (1391c79)
- indentation (111a68b)
- Utils renaming (cbd6b99)
- rename utils to avoid name conflict (e2a380a)
- Scaffold mdl eval (orange) (a6b3406)
- arrange structure (6392064)
- replace model and sae loading (91ce2fd)
- moved all code (7a714b4)
- Merge pull request #4 from adamkarvonen/shift_eval: "Shift eval" (22a2a72)
- Update README with determinism (aed96e8)
- Fixed shift and tpp end to end tests (2c2947a)
- Merge branch 'main' into absorption (5db6ecb)
- reverting to original sparse probing main.py (8d6779f)
- fixing dtypes (280f51a)
- Add README for shift and tpp (bcf3934)
- Add end to end test for shift and tpp (4ac13ef)
- Move SAE to model dtype, add option to set column1_vals_list (6ae4065)
- adding absorption calculation code (704eb00)
- Initial working SHIFT / TPP evals (fab86d4)
- Add SHIFT paired classes (900a04b)
- Modify probe training for usage with SHIFT / TPP (039fd29)
- Pin dataset name for sparse probing test (26d85ed)
- Correct shape annotation (dc1f3d7)
- adding in k-sparse probing experiment code (0ab6d2c)
- Merge pull request #2 from adamkarvonen/sparse_probing_add_datasets: "Sparse probing add datasets" (19b4c4a)
- Check for columns with missing second group (1895318)
- Run sparse probing eval on multiple datasets and average results (6e158b3)
- Add function to average results from multiple runs (47af366)
- Remove html files (9c42b2f)
- WIP: absorption (6348fb2)
- Update READMEs (e6f1e3b)
- Create end to end test for sparse probing repo (1eb63e9)
- Rename file so it isn't run as a test (127924a)
- Fix main.py imports, set seeds inside function (9ec51f2)
- Deprecate temporary fix for new sae_lens version (13e5da0)
- Don't pin to specific versions (146c1fc)
- Remove old requirements.txt (2ea8bac)
- Merge pull request #1 from adamkarvonen/restructure: "Restructure" (38e2721)
- added restructure (c3952bd)
- restructure (f328f7f)
- created branch (374561a)
- Update to use new eval results folder (69657d8)
- Make selected SAEs input explicit (fdc0afd)
- mention SAE Lens tutorial (0121f32)
- Add pythia example results (e0a14be)
- Fix titles (b7203b0)
- Demonstrate loading of Pythia and Gemma SAEs (d158304)
- Add new example Gemma results (cc1a27c)
- Include checkpoints and not checkpoints if include_checkpoints is True (f5b1a71)
- Temp fix for SAE Bench TopK SAEs (8b7a6ec)
- Use sklearn by default, except for training on all SAE latents (2b1e2b6)
- Test file for experimenting with probe training (4cd9cff)
- Add gemma-scope eval_results data (5f2bf51)
- Use a dict of sae_release: list[sae_names] for the sparse probing eval (70100d8)
- Define llm dtype in activation_collection.py (0f29194)
- training plot scaled by steps (9b49ec4)
- bugfix sae_bench prefix (da9f95f)
- Merge branch 'main' of https://github.com/adamkarvonen/SAE_Bench_Template into main (54a6156)
- add plot over training steps (065825e)
- Also calculate k-sparse probing for LLM activations (5e816b0)
- Move default results into sparse_probing/ from sparse_probing/src/ (2283cc5)
- By default, perform k-sparse probing using sklearn instead of hand-rolled implementation (57c4216)
- Optional L1 penalty (078fb32)
- Separate logic for selecting topk features using mean diff (8700aa7)
- Set dtype based on model (baa004e)
- Add type annotations so config parameters are saved (337c21d)
- add virtualenv instructions (68edafe)
- added interactive 3var plot and renamed files for clarity (5afafb7)
- adapted requirements (aeadbb4)
- Merge branch 'main' of https://github.com/adamkarvonen/SAE_Bench_Template into main (5ee11cf)
- debugging correlation plots (cee970f)
- moved formatting utils to external file (8fcc5da)
- Update README.md (98d5f57)
- Update README.md (2e2e9f8)
- clarified README.md (205b2fb)
- clarify README.md (540123b)
- added explanation to template (9df0fcb)
- Improve README (68927c9)
- Updated pythia and gemma results (57ecbea)
- Improve graphing notebook (66d0b66)
- Apply nbstripout (d955146)
- Walkthrough notebook of dictionary format (5246390)
- Add to .gitignore (c0b1b84)
- Utility notebook to compare multiple run results (27352d3)
- Improve SAE naming, use Gemma by default (c03e540)
- Add READMEs (b1ea1b3)
- Make deterministic, improve sae key naming (6a86bdb)
- Make sure to type check shapes (e67a8bc)
- Archive development notebooks (7ecd360)
- Fix the recording name of saes (08ceced)
- Add missing batch indexing (aeda80e)
- Refactor batch processing to handle all sae_batch_size scenarios efficiently (c15d61d)
- Fix reduced precision warning (36c9a8b)
- correctly save results (f5dfabb)
- example results (8b1ec0c)
- Data on existing SAEs (7761e2b)
- Cleanup (0f631d7)
- Dev notebook (1412347)
- sparse probing eval (fc789e5)
- Create bias in bios dataset (0ff4cf1)
- Apply nbstripout (b993ecc)
- Beginning of plotting notebook (141fbd2)
- initial commit (319300f)