v0.4.0

github-actions released this 22 Feb 02:39

v0.4.0 (2025-02-22)

Chore

  • chore: making test less flaky (3effa18)

  • chore: fix updated torch types (4c46da6)

  • chore: fixing linting errors and adding precommit hook (85f6241)

Feature

  • feat: allow setting the artifacts path (2a4b4dc)

Fix

  • fix: gracefully handle slashes in model filename for autointerp (5d6464a)

  • fix: fix typing and updating mdl for saelens >=5.4.0 (802d1c3)

  • fix: load probe class with weights_only = False (f05bf40)

  • fix: Update README to include eval output schema update instructions (f0adee2)

  • fix: Update json schema jsons (2b2a6d3)

Unknown

  • Merge pull request #60 from chanind/deflaking-test
    chore: making test less flaky (963f2e8)

  • Remove threshold from state dict if we aren't using it (d91a218)

  • Merge pull request #59 from chanind/artifacts-path-option
    feat: allow setting the artifacts path (53901a2)

  • Merge pull request #58 from chanind/fixing-types
    chore: fix updated torch types (849018f)

  • Merge pull request #57 from chanind/fix-slash-in-model-name-autointerp
    fix: gracefully handle slashes in model filename for autointerp (11b2e38)

  • adding artifacts_path to unlearning eval (ce1de32)

  • By default we don't use a threshold for custom topk SAEs (60579ed)

  • Merge pull request #56 from chanind/type-fixes
    fix: fix typing and updating mdl for saelens >=5.4.0 (0888d07)

  • Merge pull request #55 from chanind/precommit-check
    chore: fixing linting errors and adding precommit hook (7ac7ced)

  • Fix SAE Bench SAEs repo names (18dc457)

  • Prevent potential division by zero (92315dd)

  • Add optional pinned dependencies (e74f0cf)

  • Calculate featurewise statistics in demo (5204b48)

  • Improve documentation on custom SAE usage (f15fe53)

  • Merge pull request #53 from adamkarvonen/hide_absorption_stddev
    hide stddev from default display for absorption (155afbc)

  • hide stddev from default display for absorption (d970f05)

  • Merge pull request #52 from adamkarvonen/update_scr_tpp
    update scr_tpp_schema to show top 20 by default (f551e7b)

  • update scr_tpp_schema to show top 20 by default (59320e2)

  • Merge pull request #51 from adamkarvonen/update_schema_jsons
    fix: Update eval output schema jsons (7b2021c)

  • Add computational requirements (9b621a9)

  • Improve graphing notebook, include matryoshka results in graphs (f2d1d98)

  • Merge pull request #50 from chanind/lint-and-type-check
    chore: Adding formatting, linting and type checking (a0fb5e9)

  • adding README and Makefile with helpers (7452eca)

  • fixing linting and type-checking issues (e663e3a)

  • formatting with ruff (14dad45)

  • Check that unlearning data exists before running unlearning eval (294b25c)

  • Improve export notebook (e2b0b3c)

  • Improve graphing utils (661920d)

  • Fix spelling (8c0df93)

  • Add standard deviation for absorption / autointerp, store results per class for sparse probing / tpp for potential error bars (141aff7)

  • Use GPU probing in correct location (ec5efa8)