- chore: making test less flaky (3effa18)
- chore: fix updated torch types (4c46da6)
- chore: fixing linting errors and adding precommit hook (85f6241)
- feat: allow setting the artifacts path (2a4b4dc)
- fix: gracefully handle slashes in model filename for autointerp (5d6464a)
- fix: fix typing and updating mdl for saelens >=5.4.0 (802d1c3)
- fix: load probe class with weights_only = False (f05bf40)
- fix: Update README to include eval output schema update instructions (f0adee2)
- fix: Update json schema jsons (2b2a6d3)
- Merge pull request #60 from chanind/deflaking-test: "chore: making test less flaky" (963f2e8)
- Remove threshold from state dict if we aren't using it (d91a218)
- Merge pull request #59 from chanind/artifacts-path-option: "feat: allow setting the artifacts path" (53901a2)
- Merge pull request #58 from chanind/fixing-types: "chore: fix updated torch types" (849018f)
- Merge pull request #57 from chanind/fix-slash-in-model-name-autointerp: "fix: gracefully handle slashes in model filename for autointerp" (11b2e38)
- adding artifacts_path to unlearning eval (ce1de32)
- By default we don't use a threshold for custom topk SAEs (60579ed)
- Merge pull request #56 from chanind/type-fixes: "fix: fix typing and updating mdl for saelens >=5.4.0" (0888d07)
- Merge pull request #55 from chanind/precommit-check: "chore: fixing linting errors and adding precommit hook" (7ac7ced)
- Fix SAE Bench SAEs repo names (18dc457)
- Prevent potential division by zero (92315dd)
- Add optional pinned dependencies (e74f0cf)
- Calculate featurewise statistics in demo (5204b48)
- Improve documentation on custom SAE usage (f15fe53)
- Merge pull request #53 from adamkarvonen/hide_absorption_stddev: "hide stddev from default display for absorption" (155afbc)
- hide stddev from default display for absorption (d970f05)
- Merge pull request #52 from adamkarvonen/update_scr_tpp: "update scr_tpp_schema to show top 20 by default" (f551e7b)
- update scr_tpp_schema to show top 20 by default (59320e2)
- Merge pull request #51 from adamkarvonen/update_schema_jsons: "fix: Update eval output schema jsons" (7b2021c)
- Add computational requirements (9b621a9)
- Improve graphing notebook, include matryoshka results in graphs (f2d1d98)
- Merge pull request #50 from chanind/lint-and-type-check: "chore: Adding formatting, linting and type checking" (a0fb5e9)
- adding README and Makefile with helpers (7452eca)
- fixing linting and type-checking issues (e663e3a)
- formatting with ruff (14dad45)
- Check that unlearning data exists before running unlearning eval (294b25c)
- Improve export notebook (e2b0b3c)
- Improve graphing utils (661920d)
- Fix spelling (8c0df93)
- Add standard deviation for absorption / autointerp, store results per class for sparse probing / tpp for potential error bars (141aff7)
- Use GPU probing in correct location (ec5efa8)
- fix: use GPU for llm probing (ba0956e)
- fix: pass device into core evals (e6651ea)
- fold W_dec norm when loading SAE Lens SAEs (511d51a)
- Change default sparse probing k values (271a9d4)
- feat: Add a frac alive calculation to core (0399550)
- added absorption fraction metric (#48): "feat: added absorption fraction metric"; small fixes; remove unused FeatureAbsorptionCalculator._filter_prompts function. Co-authored-by: Demian Till <demian.till@cambridgeconsultants.com> (7545ee3)
- Add a script for organizing and uploading results (4689129)
- Calculate featurewise statistics by default (bca84ca)
- feat: add misc core metrics (2c731f6)
- Make sure grad is enabled for absorption tests (bd25ca0)
- feat: EvalOutput and EvalConfig base classes to allow easy JSON schema export (537219a)
- fix: eval_result_unstructured should be optional (38e81b0)
- fix: dump to json file correctly (5f1cf15)
- git commit -m "fix: add missing init.py" (20b20f2)
- Merge pull request #47 from chanind/packaging: "feat: Setting up Python packaging and autodeploy with Semantic Release" (e52a418)
- Merge branch 'main' into packaging (9bc22a4)
- Merge branch 'main' into packaging (bb10234)
- Update SAE Bench demo to use new graphing functions (9bbfdc5)
- switching to poetry and setting up CI (a9af271)
- Add option to pass in arbitrary sae_class (e450661)
- Mention dictionary_learning (c140e71)
- Update graphing notebook to work with filenames (dc6f951)
- deprecate graphing notebook (67118ee)
- migrating to sae_bench base dir (bb8e145)
- Use a smaller batch size for unlearning (3a099d2)
- Reduce memory usage by only caching required activations (f026998)
- Remove debugging check (8ea7162)
- Add sanity checks before major run (0908b18)
- Improve normalization check (16a3c0e)
- Add normalization for batchtopk SAEs (6a031bd)
- Add matryoshka loader (1078899)
- Add pythia 160m (b219497)
- simplify process of evaluating dictionary learning SAEs (c2dca52)
- Add a script to run evals on dictionary learning SAEs (3f4139b)
- Make the layer argument optional (e53675d)
- Add batch_top_k, top_k, gated, and jump_relu implementations (9a7fce8)
- Add a function to test the saes (864b4b3)
- Update demo for new relu sae setup (5d04ce5)
- Ensure loaded SAEs are on correct dtype and device (a5d6d62)
- Create a base SAE class (8fcc9fe)
- Add blog post link (2d47229)
- cleanup README (0e724df)
- Clean up graphing notebook (c08f3f5)
- Graph results for all evals in demo notebook (29ac97b)
- Clean up for release (1c9822c)
- Include baseline pca in every graph (a45afd2)
- Clean up plot legends, support graphing subplots (7ade8b0)
- Merge pull request #45 from adamkarvonen/update_jsonschemas: "update jsonschemas" (879c7ca)
- update jsonschemas (a14d465)
- Use notebook as default demo, mention in README (298796b)
- Minor fixes to demo (05808c7)
- Add missing batch size argument (877f2e7)
- Fixes for changes to eval config formats (e0cb629)
- Add an optional best of k graphing cell (081b59c)
- Ignore any folder containing "eval_results" (12f8d66)
- Add cell to add training tokens to config dictionaries (38173c9)
- Also plot all sae bench checkpoints (93563e0)
- Add eval links (2216f99)
- rename core results to match convention (51e47fd)
- Ignore autointerp with generations when downloading (aa20644)
- Use != instead of > for L0 measurement (83504b7)
- Add utility cell for removing llm generations (67c9b03)
- Add utility cell for splitting up files by release name (3cc51ea)
- Add force rerun option to core, match sae loading to other evals (8676d5d)
- Improve plotting of results (89e5567)
- Consolidate SAE loading and output locations (293b385)
- Plot generator for SAE Bench (c2cb78e)
- Add utility notebook for adding sae configs (8508a01)
- Improve custom SAE usage (e959f65)
- Improve graphing (490cd2a)
- Fix failing tests (ed88f65)
- match core output filename with others (8ca0787)
- Remove del sae flag (feaf1f8)
- Add current status to repo (9c95af7)
- Add sae config to output file (b2fbd6d)
- Add a flag for k sparse probing batch size (6f2e38f)
- Merge pull request #44 from adamkarvonen/absorption-tweaks-2: "improving memory usage of k-sparse probing" (6ae8235)
- Merge pull request #43 from adamkarvonen/fake_branch: "single line update" (7984d50)
- single line update (d9637e1)
- improving memory usage of k-sparse probing (841842a)
- Add documentation to demo notebook (2e170e1)
- adapted graphing to np result filestructure (3629b90)
- Improve reduced memory script (ecb9f46)
- Script for evaluating 1M width SAEs (63a6783)
- Use expandable segments to reduce memory usage (4f3967d)
- Delete SAE at the correct location in the for loop (ff0beda)
- Shell script for running 65k width SAEs on 24 GB GPUs (9b0bd9d)
- Delete sae at end of loop to lower memory usage; primarily required for 1M width SAEs (08f9755)
- Add absorption (b2e89c9)
- Add note on usage (07cbf3c)
- Add shell scripts for running all evals (a832e09)
- add 9b-it unlearning precomputed artifacts (93502c0)
- Add example of running all evals to notebook (473081d)
- Clean up filename (a067c5c)
- Create a demo of using custom SAEs on SAE bench (49d5ecd)
- Move warnings to main function, raise error if not instruct tuned (e798adf)
- perform calculations with a running sum to avoid underflow (d842a1f)
- Do probe attribution calculation in original dtype for memory savings (366dc4c)
- Use api key file instead of command line argument (bb48a6c)
- Add flags to reduce VRAM usage (322334a)
- fix unlearning test (5039e5e)
- add optional flag to reduce peak memory usage (735f988)
- Ignore core model name flag for now (43ef711)
- Don't try to set random seed if it's None (d1d6f72)
- Make eval configs consistent, require model names in all eval arguments (d37e77c)
- Add ability to pass in random seed and sae / llm batch size (d8f026b)
- Describe how values are set within eval configs (365fb40)
- Always ignore the bio forget corpus (3e6d36f)
- Use util function to convert str to dtype (7281627)
- update graphing scripts (ff38240)
- Merge pull request #39 from adamkarvonen/add_9b: "add gemma-2-9b default DTYPE and BATCH_SIZE" (164b6f5)
- also add for 9b-it (b93f3c9)
- add gemma-2-9b (8030c03)
- Update regexes and test data to match new SAE Bench SAEs (6da4692)
- Update outdated reference, don't get api_key if not required (da9a2dc)
- Add ability to pass in flag for computing featurewise statistics, default it to false (f6430af)
- Move str_to_dtype() to general utils (8ab32f9)
- Pass in a string dtype (f49d41c)
- Merge pull request #35 from adamkarvonen/add_pca: "Add pca" (f4fbd0c)
- Delete old sae bench data (55f9b6f)
- Mention disk space, fix repo name (04a2b01)
- mention WMDP access (26816e5)
- Be consistent when setting default dtype (3ed82b3)
- Rename baselines to custom_saes (067bb79)
- Rename shift to SCR (bbbdfdc)
- correctly save and load state dict (35d64c8)
- Just use the global PCA mean (2317de9)
- Increase test tolerance, remove cli validation as other evals aren't using it (395095b)
- Match core eval config to others with dtype usage (9197e3f)
- Also check for b_mag so we don't ignore gated SAEs bias (edd0de2)
- consolidate save locations of artifacts and eval results (c9a18b0)
- revert eval_id change for now (2cfbcc0)
- Save fake encoder bias as a tensor of zeros (4ccbec6)
- Ensure that sae_name is unique (cce38b6)
- Change default results location (6987235)
- Also compare residual stream as a baseline (d528f3b)
- Don't require sae to have a b_enc (c979427)
- Include model name in tokens filename (a255df0)
- Check if file exists (f8c9ab2)
- Fix regex usage demo (1c4117d)
- remove outdated import (9dddb3e)
- Simplify custom SAE usage demonstration (542c659)
- Benchmark autointerp scores on mlp neurons and saes (4ba6cba)
- Simplify code by storing dtype as a string (963c9c2)
- Add option to set dtype for core/main.py (d649826)
- Pass in the optional flag to save activations (d0b8091)
- Don't check for is SAE to enable use with custom SAEs (36cfba8)
- mention new run all script (916df28)
- Script for running evals on all custom SAEs at once (b91c210)
- Rename formatting utils to general utils (dedec93)
- Clean up duplicated functions (28d2f2f)
- Clean up old graphing code (d3e8e87)
- Fix memory leak (65fa76a)
- Make test file names consistent (70e2eaa)
- Remove unused flag (4ed9602)
- Improve GPU PCA training (01e6306)
- Fix namespace error and expected result format error (776e5f4)
- Enable usage of core with custom SAEs (e359d1a)
- Add a function to fit the PCA using GPU and CUML (2632849)
- Switch from nested dict to list of tuples for selected_saes (dbdfe19)
- Make it easier to train pca saes (eb41438)
- Format with ruff (e62a436)
- Test identity SAE implementation (e95e055)
- Add a PCA baseline (645a040)
- Move unlearning tokenization function to general utils file, consolidate tokenization functions (98c4b5c)
- Merge pull request #34 from adamkarvonen/fix-core-eval-precision: "fixing excessively low precision" (e0ddf06)
- fixing excessively low precision (d1eea66)
- Merge pull request #33 from adamkarvonen/add_baselines: "Add baselines" (20c2a40)
- Update README (e5b2ba4)
- Fix regexes (478b41c)
- Rename selection notebook (1b24a46)
- Remove usage of SAE patterns list (80ed74d)
- Make sure batch is on the correct dtype (f8a1158)
- Adapt auto interp to enable use with custom saes (7ea0e59)
- Adapt absorption to match existing format (7e2ac58)
- Enable easy usage of evals with custom SAEs (64d2e23)
- Use sae.encode() for compatibility instead of sae.run_with_cache() (42cc9ce)
- fix device errors (1725899)
- format with ruff (737b788)
- Set autointerp context size in eval config (c4bfa82)
- Add autointerp progress bars (bcc14a9)
- Use baseline SAEs on the sparse probing eval (d3e5e07)
- Merge pull request #32 from adamkarvonen/core_evals_ignore_special: "Added option to exclude special tokens from SAE reconstruction" (186bdb4)
- Added option to exclude special tokens from SAE reconstruction (54a55f7)
- Example jumprelu implementation (0ca103e)
- identity sae baseline (5f65ace)
- Merge pull request #31 from adamkarvonen/activation_consolidation: "Activation consolidation" (3ddcceb)
- Add graphing for pythia and autointerp (1b4cf2a)
- Correctly index into sae_acts (8c938ad)
- Adapt format to Neuronpedia requirements (9baec7c)
- Update README.md (de3ce5c)
- Rename for consistency (632f54d)
- Add end to end autointerp test (360dfe0)
- Remove college biology from datasets as too close to wmdp_bio (dcdbbc5)
- Print a warning if there aren't enough alive latents (23fc5a5)
- Include dataset info in filename (d37fc04)
- Add functions to encode precomputed activations (bd742ee)
- Eliminate usage of activation store (fa58764)
- Adapt autointerp to new format (1de25f7)
- prepend bos token (46e00a4)
- Mask off BOS, EOS, and pad tokens (4e2b0d6)
- Collect the sparsity tensor for SAE autointerp (0dd3a91)
- Format with ruff (2afc772)
- Updated question ids running with one BOS token (df3b9d4)
- Zero out SAE activations on BOS token (0d30360)
- Only use one BOS token at beginning (ad97556)
- Remove redundant with no_grad() (528959f)
- Merge remote-tracking branch 'origin/main' into activation_consolidation (777e9d4)
- Move the get_sparsity function to general utils folder, mask bos, pad, and eos tokens for unlearning (f679f0f)
- Make it easier to use get_llm_activations() with other evals (1ed9a29)
- Merge pull request #8 from callummcdougall/callum/autointerp: "Autointerp eval" (6f81495)
- Merge branch 'main' into callum/autointerp (466a37d)
- Improve graphing notebook for current output format (36fb3ba)
- Apply nbstripout (ca27e41)
- Notebook specifically for graphing and analyzing mdl results (114cefb)
- Merge pull request #30 from adamkarvonen/mdl_fixes: "Mdl fixes" (65c3c98)
- Add example data and add details to README (21f3c83)
- Use torch instead of t for consistency (903324f)
- Move calculations to float32 to avoid dtype errors (099a94f)
- Add descriptions to unlearning hyperparameters and descriptions of shift, tpp, and sparse probing evals (6c141c1)
- Merge pull request #28 from adamkarvonen/update_unlearning: "Update unlearning output format" (2bfd70b)
- descriptions (c1f79b3)
- Update (60247d0)
- update description (03a2402)
- fix unlearning test (7c50173)
- remove artifact (41c7750)
- output format (1201a7c)
- update unlearning test (1eae76b)
- unlearning start (a7be6df)
- Merge pull request #27 from adamkarvonen/core_tests: "Update JSON schema filenames" (b6ed053)
- remove unused (810afe8)
- updated schema file names (e4df309)
- update name of output schema file (af58f0f)
- Merge pull request #26 from adamkarvonen/core_tests: "Update ui_default_display, titles for display" (371b80d)
- Update titles (4885187)
- default display (8f811b8)
- Merge pull request #25 from adamkarvonen/core_tests: "added tests for core eval output" (92bc76a)
- added tests for core eval output (591ce3f)
- Add end to end unlearning test (79435ab)
- clean up activations always defaults to false (5d545d1)
- Further general cleanup of mdl eval (51e9b60)
- Merge pull request #24 from adamkarvonen/core_update: "New Core output format, plus converter" (b01be8a)
- New Core output format, plus converter (192e92b)
- Save sae results per sae (7045ad2)
- Fix variable name bug (d5ec2d2)
- MDL is running (c486454)
- Format with ruff (04b14b7)
- Merge pull request #6 from koayon/mdl-eval: "Implement MDL eval" (829de0c)
- Merge branch 'main' into mdl-eval (ad18568)
- Generate bfloat16 question_ids and commit them to the proper location, remove old ones (d48c68b)
- Add example unlearning output (9866453)
- Merge pull request #23 from adamkarvonen/unlearning_adapt: "Unlearning adapt" (3c48cdb)
- Allow plotting of gemma SAEs (c813b39)
- Adapt unlearning eval to new format (26d1675)
- pass artifact folder in to unlearning functions (87508a6)
- Add a sparsity penalty when training the SHIFT / TPP linear probes (cc73c6f)
- Merge pull request #21 from adamkarvonen/shift_sparse_probing_descriptions: "Shift sparse probing descriptions" (3e9555a)
- Remove unnecessary test keys, add note to README (12f324d)
- Merge pull request #22 from adamkarvonen/fix/handle_gated_in_core: "handle case where gated SAEs don't have b_enc" (586597b)
- Finish rename of the spurious_corr variable (b28aab6)
- handle case where gated SAEs don't have b_enc (355aaf4)
- update doc about how to update json schemas files; add json schema files (95fda67)
- Update from uncategorized to shift_metrics and tpp_metrics (43bf1f4)
- Improve titles and descriptions for sparse probing (177be38)
- Improve descriptions, titles, and variable names in SHIFT and TPP (29e1ecc)
- Merge pull request #20 from adamkarvonen/make_unstructured_optional: "fix: eval_result_unstructured should be optional" (76d72a6)
- Merge pull request #19 from adamkarvonen/core_eval_incremental_saving: "Core eval incremental saving" (7ddd55a)
- added error handling and exponential backoff (92abbbe)
- added code to produce intermediate json output between SAEs (b15a2e2)
- fix device bug, resolve test utils conflicts (9b2e909)
- Merge pull request #18 from adamkarvonen/set_sparse_probing_default_display: "set k = 1, 2, 5 default display = true for sparse probing" (db93af6)
- set k = 1, 2, 5 default display = true for sparse probing (bf9b5ac)
- Merge pull request #17 from adamkarvonen/add_unstructured_eval_output: "Feature: Support unstructured eval output" (3b17927)
- Merge pull request #16 from adamkarvonen/basic-evals: "Added core evals to repo" (c55f48f)
- Support unstructured eval output (adf028a)
- Added core evals to repo (9b1dd45)
- Merge pull request #15 from adamkarvonen/json_schema_absorption: "Use Pydantic for eval configs and outputs for annotations and portability" (e75c8b5)
- update shift/tpp and sparse probing to evaloutput format (2dbb6f8)
- Merge remote-tracking branch 'origin/main' into json_schema_absorption (eb8c660)
- Add pytorch cuda flag due to OOM error message (c8e74f4)
- conform shift_and_tpp to new output format (153c713)
- Merge remote-tracking branch 'origin/main' into json_schema_absorption (b337d5f)
- test pre-commit hook (648046f)
- produce the JSON schema files and add as a pre-commit hook (6683e17)
- Add example regexes for gemma 2 2b (aea66aa)
- Merge pull request #14 from adamkarvonen/shift_sparse_probing_updates: "Shift sparse probing updates" (14b5025)
- Add example usage of gemma-scope and gemma SAEs (f2dcacf)
- Improve arg parsing and probe file name (ec8cd87)
- Mention other use for GPU probe training (929cdc0)
- Add early stopping patience to reduce variance (5285563)
- Add note on random seed being overwritten by argparse (d5a215a)
- Separate save areas for tpp and shift (c860deb)
- Also ignore artifacts and test results (1a81702)
- Add shift and tpp to new format (d7e4b8b)
- Improve assert error message if keys don't match (c514f49)
- force_rerun now reruns even if a results file exists for a given sae (1ff0b16)
- Make shift and tpp tests compatible with old results (ff2f46d)
- Make sparse probing test backwards compatible with old results (782a080)
- fix absorption test (d62b752)
- Create a new graphing notebook for regex based selection (65ef605)
- Improve artifacts and results storage locations, add a utility to select saes using multiple regex patterns (1cdfdb7)
- No longer aggregate over saes in a dict (0e5bccb)
- Rename old graphing file (df72d30)
- fix ctx len bug, handle dead features better (63c2561)
- don't commit artifact file (2c5691a)
- Add openai and tabulate to requirements.txt (f626447)
- Begin shift / tpp adaptation (ab1f062)
- No longer average over multiple saes (aaf06eb)
- Add an optional list of regexes (41b86a4)
- By default remove the bos token (4877424)
- Match new sae bench format (5d484e3)
- Add note on output formats (56c637f)
- Add notes on custom sae usage (30d4f16)
- Add a utility function to plot multiple results at once (a89c86e)
- Ignore images and results folders (e07e65f)
- Merge branch 'main' into mdl-eval (922fb14)
- Update mdl_eval (bdefc02)
- Merge pull request #12 from jbloomAus/demo-format-and-command-changes-absorption: "Demo of Changes to enable easy running of evals at scale (using absorption)" (a9603b8)
- Merge branch 'main' into demo-format-and-command-changes-absorption (9a9d4b1)
- delete old template (0092a1f)
- add re-usable testing utils for the config, cli and output format (03aee86)
- delete old template (8d66a49)
- Merge pull request #13 from adamkarvonen/minor_shift_improvements: "Minor shift improvements" (1b318d5)
- Notebook used to test different datasets (d4d4fb5)
- update strategy for running absorption via CLI (13f90d0)
- Comment out outdated tests (f1e2f9e)
- Add runtime estimates to READMEs (32da7aa)
- Rename to match other readmes (8dc952f)
- Reduce the default number of n values for faster runtime (74fd9a1)
- Lower peak memory usage to fit on a 3090 (1fecf15)
- Skip first 150 chars per Neurons in a Haystack (46d9510)
- Merge pull request #11 from adamkarvonen/add_datasets: "Add additional sparse probing datasets" (0512456)
- Share dataset creation code between tpp and sparse probing evals (30f60b6)
- Update scr and tpp tests (8561e91)
- Add an optional keys-to-compare list, to only compare those values (d60388b)
- Add ag_news and europarl datasets (82de70f)
- Add a single shift scr metric key (b65f969)
- Use new sparse probing dataset names (7b32c83)
- Use more probe epochs, update to use new dataset names (b49db06)
- Add several new sparse probing datasets (b4f5400)
- Add dataset functions for amazon sentiment and github code (aa1a478)
- Use full huggingface dataset names (d2d4001)
- Merge pull request #10 from curt-tigges/main: "Initial RAVEL code" (fc6a59b)
- Merge branch 'main' into main (d132ec3)
- Change default unlearning hyperparameters (d4c1949)
- Do further analysis of unlearning hyperparameters (8f6262c)
- Add multiple subsets of existing datasets (ae36e81)
- Retry loading dataset due to intermittent errors (dc38fd0)
- Use stop at layer for faster inference (e752a9c)
- Merge pull request #9 from adamkarvonen/unlearning_cleanup: "Unlearning cleanup" (209e526)
- fix topk error (e39a9ab)
- add sae encode function (b247f91)
- Get known question ids if they don't exist (f3516f4)
- Remove unused functions (ac3da4d)
- discard unused variable (aea5531)
- Get results for all retain thresholds (08eec18)
- add regex based sae selection strategy (57e9be0)
- Updated notebook (ae40301)
- Save unlearning score in final output (f90b114)
- Add file to get correct answers for a model (0361c07)
- Fix missing filenames (6aad3ca)
- Move hyperparameters to eval config (5d2b9d0)
- restructure results json, store probe results (9ec59a8)
- Move llm and sae to llm_dtype (56e8e43)
- Fix utils import (79321f8)
- Apply ruff formatter (1423c33)
- Make sure we don't commit the forget dataset (63bc153)
- Apply nbstripout (2ff3e72)
- Merge pull request #7 from yeutong/unlearning: "implement unlearning eval" (42de6df)
- Merge branch 'main' into unlearning (b2f6d68)
- add version control utils (b516958)
- first commit (4b23575)
- Add pytorch flag due to CUDA OOM message (c57eef7)
- Move sae to llm dtype (c510d95)
- Add a README and test for absorption (a8f4190)
- Add example main function to absorption eval (2f1c551)
- Move sae to llm dtype (26ed7a0)
- Merge pull request #3 from chanind/absorption: "Feature Absorption Eval" (8d80be6)
- Added initial demo notebook (86dfd95)
- Added initial RAVEL files for dataset generation (96e963f)
- renaming dict keys (ff81c53)
- Merge remote-tracking branch 'upstream/main' (daab3e2)
- add analysis (5eb7dfa)
- success (b67c97b)
- add gemma-2-2b-it (5f502d7)
- revert changes to template.ipynb (aac04e0)
- fix detail (e6e0985)
- fixing batching error in absorption calculator (8a425cd)
- Merge branch 'main' into absorption (822ad11)
- Merge pull request #5 from koayon/rename-utils: "Rename utils to avoid name conflict" (eb6cc7b)
- update notebook imports (6b4ca4a)
- notebook reversion (1391c79)
- indentation (111a68b)
- Utils renaming (cbd6b99)
- rename utils to avoid name conflict (e2a380a)
- Scaffold mdl eval (orange) (a6b3406)
- arrange structure (6392064)
- replace model and sae loading (91ce2fd)
- moved all code (7a714b4)
- Merge pull request #4 from adamkarvonen/shift_eval: "Shift eval" (22a2a72)
- Update README with determinism (aed96e8)
- Fixed shift and tpp end to end tests (2c2947a)
- Merge branch 'main' into absorption (5db6ecb)
- reverting to original sparse probing main.py (8d6779f)
- fixing dtypes (280f51a)
- Add README for shift and tpp (bcf3934)
- Add end to end test for shift and tpp (4ac13ef)
- Move SAE to model dtype, add option to set column1_vals_list (6ae4065)
- adding absorption calculation code (704eb00)
- Initial working SHIFT / TPP evals (fab86d4)
- Add SHIFT paired classes (900a04b)
- Modify probe training for usage with SHIFT / TPP (039fd29)
- Pin dataset name for sparse probing test (26d85ed)
- Correct shape annotation (dc1f3d7)
- adding in k-sparse probing experiment code (0ab6d2c)
- Merge pull request #2 from adamkarvonen/sparse_probing_add_datasets: "Sparse probing add datasets" (19b4c4a)
- Check for columns with missing second group (1895318)
- Run sparse probing eval on multiple datasets and average results (6e158b3)
- Add function to average results from multiple runs (47af366)
- Remove html files (9c42b2f)
- WIP: absorption (6348fb2)
- Update READMEs (e6f1e3b)
- Create end to end test for sparse probing repo (1eb63e9)
- Rename file so it isn't run as a test (127924a)
- Fix main.py imports, set seeds inside function (9ec51f2)
- Deprecate temporary fix for new sae_lens version (13e5da0)
- Don't pin to specific versions (146c1fc)
- Remove old requirements.txt (2ea8bac)
- Merge pull request #1 from adamkarvonen/restructure: "Restructure" (38e2721)
- added restructure (c3952bd)
- restructure (f328f7f)
- created branch (374561a)
- Update to use new eval results folder (69657d8)
- Make selected SAEs input explicit (fdc0afd)
- mention SAE Lens tutorial (0121f32)
- Add pythia example results (e0a14be)
- Fix titles (b7203b0)
- Demonstrate loading of Pythia and Gemma SAEs (d158304)
- Add new example Gemma results (cc1a27c)
- Include checkpoints and not checkpoints if include_checkpoints is True (f5b1a71)
- Temp fix for SAE Bench TopK SAEs (8b7a6ec)
- Use sklearn by default, except for training on all SAE latents (2b1e2b6)
- Test file for experimenting with probe training (4cd9cff)
- Add gemma-scope eval_results data (5f2bf51)
- Use a dict of sae_release: list[sae_names] for the sparse probing eval (70100d8)
- Define llm dtype in activation_collection.py (0f29194)
- training plot scaled by steps (9b49ec4)
- bugfix sae_bench prefix (da9f95f)
- Merge branch 'main' of https://github.com/adamkarvonen/SAE_Bench_Template into main (54a6156)
- add plot over training steps (065825e)
- Also calculate k-sparse probing for LLM activations (5e816b0)
- Move default results into sparse_probing/ from sparse_probing/src/ (2283cc5)
- By default, perform k-sparse probing using sklearn instead of hand-rolled implementation (57c4216)
- Optional L1 penalty (078fb32)
- Separate logic for selecting topk features using mean diff (8700aa7)
- Set dtype based on model (baa004e)
- Add type annotations so config parameters are saved (337c21d)
- add virtualenv instructions (68edafe)
- added interactive 3var plot and renamed files for clarity (5afafb7)
- adapted requirements (aeadbb4)
- Merge branch 'main' of https://github.com/adamkarvonen/SAE_Bench_Template into main (5ee11cf)
- debugging correlation plots (cee970f)
- moved formatting utils to external file (8fcc5da)
- Update README.md (98d5f57)
- Update README.md (2e2e9f8)
- clarified README.md (205b2fb)
- clarify README.md (540123b)
- added explanation to template (9df0fcb)
- Improve README (68927c9)
- Updated pythia and gemma results (57ecbea)
- Improve graphing notebook (66d0b66)
- Apply nbstripout (d955146)
- Walkthrough notebook of dictionary format (5246390)
- Add to .gitignore (c0b1b84)
- Utility notebook to compare multiple run results (27352d3)
- Improve SAE naming, use Gemma by default (c03e540)
- Add READMEs (b1ea1b3)
- Make deterministic, improve sae key naming (6a86bdb)
- Make sure to type check shapes (e67a8bc)
- Archive development notebooks (7ecd360)
- Fix the recording name of saes (08ceced)
- Add missing batch indexing (aeda80e)
- Refactor batch processing to handle all sae_batch_size scenarios efficiently (c15d61d)
- Fix reduced precision warning (36c9a8b)
- correctly save results (f5dfabb)
- example results (8b1ec0c)
- Data on existing SAEs (7761e2b)
- Cleanup (0f631d7)
- Dev notebook (1412347)
- sparse probing eval (fc789e5)
- Create bias in bios dataset (0ff4cf1)
- Apply nbstripout (b993ecc)
- Beginning of plotting notebook (141fbd2)
- initial commit (319300f)