Feature/automatic refinement transfer learning #41

Open: wants to merge 136 commits into develop

Commits
cc0f2de
README for shared bmpc branch!
finnkap May 24, 2024
036f51e
README for shared bmpc branch!
finnkap May 24, 2024
7ba0178
added notebook to train baseline model
juli-p May 24, 2024
bad2115
added dataset preparation script
juli-p May 31, 2024
e1fc23d
dataset processing now working
juli-p May 31, 2024
27b8583
added code for training the baseline model
juli-p May 31, 2024
756c592
cleaned up, added gitignore
juli-p May 31, 2024
cc562a6
processing for two new datasets
L1W1 Jun 1, 2024
e07cf46
renamed datasets, added noptm dataset
juli-p Jun 1, 2024
0a7439d
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
juli-p Jun 1, 2024
ec8fc94
removed redundant configs
juli-p Jun 1, 2024
88b9d27
added cleaned small model
juli-p Jun 1, 2024
3613f0a
updated train configs
juli-p Jun 1, 2024
9b93463
added noptm dummy file
juli-p Jun 1, 2024
dada46c
updated training scripts
juli-p Jun 1, 2024
6e36a47
fixed gpu selection
juli-p Jun 2, 2024
48575f0
cleaned up
juli-p Jun 2, 2024
57be3cc
added decorator to custom loss/metric functions
finnkap Jun 3, 2024
42ee8c6
play around with changing output layer
finnkap Jun 3, 2024
dc7b36e
fixed model saving routine
juli-p Jun 3, 2024
2a2b208
added lr scheduling to baseline model training with agent
juli-p Jun 12, 2024
53743c6
added tutorial folder and tutorial notebook for freezing
L1W1 Jun 12, 2024
304303b
merging
L1W1 Jun 12, 2024
d45037a
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
L1W1 Jun 12, 2024
ba39c71
improved baseline training, removed num_proc from all configs
juli-p Jun 13, 2024
acf1356
added support for different alphabets
juli-p Jun 13, 2024
20d9fe7
Merge branch 'feature/bmpc-julian' into feature/bmpc
juli-p Jun 13, 2024
df703f8
Merge branch 'feature/bmpc-julian' of github.com:wilhelm-lab/dlomix i…
juli-p Jun 13, 2024
484691a
added alphabet
juli-p Jun 13, 2024
6acbcd5
Changing layers tutorial notebook
finnkap Jun 14, 2024
cea8305
changes
juli-p Jun 17, 2024
51d5c99
add create callback tutorial to bmpc branch
SylvieBaier Jun 19, 2024
407a737
added parameters for model compilation to freezing function and crea…
L1W1 Jun 19, 2024
8436927
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
SylvieBaier Jun 19, 2024
d618fa1
create callback tutorial added
SylvieBaier Jun 19, 2024
bb6f4d2
create callbacks in correct folder
SylvieBaier Jun 19, 2024
a35bb49
restructured folders for transfer learning
SylvieBaier Jun 19, 2024
f5d32ce
updated config files for different alphabet datasets
juli-p Jun 19, 2024
db0c4c4
config file for create_callbacks.py added and create_callbacks.py itself
SylvieBaier Jun 19, 2024
e2ed568
Script for changing layers
finnkap Jun 20, 2024
5ab2e79
updated syntax
finnkap Jun 20, 2024
ef50f44
refinement and transfer learning utils file added with functions for …
SylvieBaier Jun 20, 2024
30c20ac
Removed finn_notebooks from shared branch and updated change_layers.py
finnkap Jun 23, 2024
2607de3
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
finnkap Jun 23, 2024
13b1025
updated configuration
juli-p Jun 23, 2024
53c61d6
Merge branch 'feature/bmpc' into feature/bmpc-julian
juli-p Jun 23, 2024
a4367e3
started agent-ready refinement/transfer learning script
juli-p Jun 23, 2024
d950364
changed change_layers.py
finnkap Jun 24, 2024
10ff099
added function to release the model
L1W1 Jun 25, 2024
c109fa4
updated paths
juli-p Jun 25, 2024
314761f
backup of julis scripts
juli-p Jun 25, 2024
a6ff5a4
Merge branch 'feature/bmpc' into feature/bmpc-julian
juli-p Jun 25, 2024
a1c8b54
dlomix preprocessing pipeline fails if unknown token is observed
juli-p Jun 25, 2024
338f710
created valid noptm small dataset, continued implementing rl/tl pipeline
juli-p Jun 25, 2024
37741ac
continued development of agent-ready rl/tl script
juli-p Jun 26, 2024
f1cf131
added function to partially release the embedding layer
finnkap Jun 26, 2024
6a70042
improved rl_tl_training script
juli-p Jun 26, 2024
eaec0ea
transferred changes from feature/bmpc-julian
juli-p Jun 26, 2024
7ad925b
Merge branch 'feature/bmpc' into feature/bmpc-julian
juli-p Jun 26, 2024
5d71dc1
added Finn's changes back
juli-p Jun 26, 2024
401e28e
Merge branch 'feature/bmpc' into feature/bmpc-julian
juli-p Jun 26, 2024
a73703b
fixed issues with rl_tl_training
juli-p Jun 28, 2024
3068054
updates
juli-p Jun 28, 2024
962089b
updated config file for create_callback tutorial and updated create_c…
SylvieBaier Jun 28, 2024
1fe6ad2
Alphabet tokenizer
finnkap Jun 28, 2024
0451b92
fixed alphabet bug, modified rl_tl configs
juli-p Jun 29, 2024
761443c
modified configs
juli-p Jul 3, 2024
51bf38c
removed scripts juli directory
juli-p Jul 3, 2024
ff2702c
Merge branch 'feature/bmpc-julian' into feature/bmpc
juli-p Jul 3, 2024
f520ea0
fixed layer naming issue
L1W1 Jul 5, 2024
084288e
Updated change_layers.py
finnkap Jul 9, 2024
930f74b
Added partial freezing to the regressor layer
finnkap Jul 18, 2024
8022813
simpler regressor access
L1W1 Jul 18, 2024
3d3af5f
integrated partial regressor freezing into training pipeline
juli-p Jul 18, 2024
ad6b861
Oktoberfest interface to process dataset and load model added
finnkap Jul 18, 2024
7fc37ab
Merge branch 'main' into feature/bmpc
finnkap Jul 18, 2024
0576f12
Updated PeptideDataset so a test ratio can be specified
finnkap Jul 18, 2024
2d51db5
fixed
L1W1 Jul 19, 2024
1f0af6b
Merge branch 'feature/bmpc-req-part-freeze' into feature/bmpc
L1W1 Jul 19, 2024
419485e
Added inference_only and ion_types attributes to the two datasets
finnkap Jul 19, 2024
9cff2a2
Added ion_type attribute to oktoberfest_interface.py
finnkap Jul 19, 2024
4a52b64
Make model available on git
finnkap Jul 22, 2024
06e0b77
Baseline model file, to support offline model predictions
finnkap Jul 22, 2024
e7366ef
Load model function now outside of process_dataset in oktoberfest_int…
finnkap Jul 23, 2024
dbdbd28
Changed process_dataset in oktoberfest_interface.py, so that it only …
finnkap Jul 23, 2024
4bc6e66
changes from messy branch
juli-p Jul 23, 2024
eea4388
changes from messy branch
juli-p Jul 23, 2024
2dc041d
Merge branch 'feature/bmpc-julian' into feature/bmpc
juli-p Jul 23, 2024
3f35e41
fix label_column problem on bmpc branch
finnkap Jul 24, 2024
6fd15cf
fixed validation issue on small datasets
juli-p Jul 25, 2024
f2efef1
Added batch_size parameter to oktoberfest_interface.py
finnkap Jul 27, 2024
b31cebd
moved automatic refinement/transfer learning pipeline into the dlomix…
juli-p Jul 27, 2024
136f3fc
fixed import in example notebook
juli-p Jul 27, 2024
0f4ca8c
added documentation for all externally used API endpoints
juli-p Jul 27, 2024
82a9d90
Added functionality to load in already split parquet files in the…
finnkap Jul 28, 2024
9c41727
automatic RL/TL: support for training from scratch, improved Inflecti…
juli-p Jul 28, 2024
8c57ed5
Merge remote-tracking branch 'refs/remotes/origin/feature/bmpc' into …
finnkap Jul 30, 2024
d75bd0d
Fixed imports and added __init__.py to new modules
finnkap Jul 30, 2024
d853340
tried to improve phase 3
juli-p Aug 4, 2024
9a34944
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
juli-p Aug 4, 2024
76e8821
fixed bug with auto RL/TL
juli-p Aug 4, 2024
765fde7
transferred sylvies changes
juli-p Aug 4, 2024
791feef
Changed all Path parameters to strings
finnkap Aug 5, 2024
0e44d21
improved parameters
juli-p Aug 5, 2024
34433a0
merge
juli-p Aug 5, 2024
240b4e3
transferred sylvies changes from feature/bmpc-sylvie
juli-p Aug 5, 2024
7d8bcd1
Merge branch 'feature/bmpc-rltl-logging' into feature/bmpc
juli-p Aug 5, 2024
08e10d8
feat(oktoberfest_interface): add option to keep additional columns in…
JSchlensok Aug 5, 2024
ee8cf89
moved spectral angle analysis call to train method
juli-p Aug 6, 2024
6efd6a7
Merge branch 'feature/bmpc' of github.com:wilhelm-lab/dlomix into fea…
juli-p Aug 6, 2024
2d1190e
transferred sylvies changes.
juli-p Aug 6, 2024
837624f
Merge branch 'feature/bmpc-rltl-logging' into feature/bmpc
juli-p Aug 6, 2024
d43582a
moved report notebook into dlomix package
juli-p Aug 6, 2024
e9237aa
html report generation
juli-p Aug 6, 2024
d1f3c4d
fixed issue with missing test set
juli-p Aug 6, 2024
ec324e6
added overfitting early stopping
juli-p Aug 7, 2024
34f1d55
fixed issue with large datasets
juli-p Aug 8, 2024
bbd1865
Cleaned up all unnecessary scripts
finnkap Aug 8, 2024
7c3b398
Update automatic_rl_tl.py
juli-p Aug 8, 2024
52d3fc7
Update automatic_rl_tl.py
juli-p Aug 8, 2024
bc9b700
improved overfitting early stopping
juli-p Aug 9, 2024
ca14395
sylvies changes: report plots updated
juli-p Aug 9, 2024
66d6946
Merge branch 'feature/bmpc' into feature/automatic-refinement-transfe…
juli-p Aug 9, 2024
ab0e7a7
added missing dependency
juli-p Aug 11, 2024
5b39241
improved logging of validation loss
juli-p Aug 11, 2024
3cc25b7
Merge branch 'feature/bmpc' into feature/automatic-refinement-transfe…
juli-p Aug 11, 2024
64188c1
hopefully fix shape issue
finnkap Aug 13, 2024
c98c5d9
Update dataset.py
finnkap Aug 13, 2024
9fa11a7
revert changes
finnkap Aug 13, 2024
8bbc2e1
feat: logging instead of printing or throwing UserWarnings
JSchlensok Aug 13, 2024
a115143
revert back automatic_rl_tl.py from 9fa11a7
juli-p Aug 16, 2024
54d9cdb
suppressed tf logging during validation in rl/tl pipeline
juli-p Aug 16, 2024
90d57a0
added back logging (Julius)
juli-p Aug 16, 2024
63a9ee2
Merge branch 'feature/bmpc' into feature/automatic-refinement-transfe…
juli-p Aug 16, 2024
0caf05c
Fix shape issue, change_input_layer issue and CAPS column names
finnkap Aug 19, 2024
4fa3330
Merge feature/bmpc src/dlomix into feature/automatic-refinement-trans…
finnkap Aug 26, 2024
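The commit log records an "overfitting early stopping" rule for the automatic RL/TL pipeline (ec324e6, bc9b700) without describing how it decides to stop. As a hypothetical sketch only, one common criterion halts training once validation loss has gone `patience` epochs without beating its best value while training loss kept falling. `OverfittingMonitor`, `patience` and `min_delta` are illustrative names, not part of the dlomix API.

```python
from dataclasses import dataclass, field


@dataclass
class OverfittingMonitor:
    """Hypothetical overfitting detector (illustrative, not the dlomix implementation).

    Signals a stop once validation loss has failed to beat its best value
    (by more than `min_delta`) in `patience` epochs during which training
    loss still decreased.
    """

    patience: int = 3
    min_delta: float = 0.0
    best_val_loss: float = field(default=float("inf"), init=False)
    _last_train_loss: float = field(default=float("inf"), init=False)
    _bad_epochs: int = field(default=0, init=False)

    def update(self, train_loss: float, val_loss: float) -> bool:
        """Record one epoch's losses; return True if training should stop."""
        train_improving = train_loss < self._last_train_loss
        self._last_train_loss = train_loss

        if val_loss < self.best_val_loss - self.min_delta:
            # Validation loss improved: remember it and reset the counter.
            self.best_val_loss = val_loss
            self._bad_epochs = 0
        elif train_improving:
            # Training loss falls while validation loss stalls: count an overfitting epoch.
            self._bad_epochs += 1
        # If both losses stall, the counter is left unchanged.
        return self._bad_epochs >= self.patience


monitor = OverfittingMonitor(patience=2)
for epoch, (train, val) in enumerate([(1.0, 0.9), (0.8, 0.7), (0.6, 0.75), (0.5, 0.8)], 1):
    if monitor.update(train, val):
        print(f"stopping after epoch {epoch}")  # prints "stopping after epoch 4"
        break
```

Requiring training loss to keep improving distinguishes overfitting from a plain plateau, which an ordinary patience-based early stopping on validation loss alone would also catch.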
Binary file added baseline_model/Prosit_baseline_model.keras
Binary file not shown.
106 changes: 106 additions & 0 deletions notebooks/Example_automatic_refinement_transfer_learning.ipynb
@@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# os.environ[\"CUDA_VISIBLE_DEVICES\"] = '-1'\n",
"os.environ['HF_HOME'] = '/cmnfs/proj/bmpc_dlomix/datasets'\n",
"os.environ['HF_DATASETS_CACHE'] = '/cmnfs/proj/bmpc_dlomix/datasets/hf_cache'\n",
"\n",
"num_proc = 16\n",
"os.environ[\"OMP_NUM_THREADS\"] = f\"{num_proc}\"\n",
"os.environ[\"TF_NUM_INTRAOP_THREADS\"] = f\"{num_proc}\"\n",
"os.environ[\"TF_NUM_INTEROP_THREADS\"] = f\"{num_proc}\"\n",
"\n",
"import tensorflow as tf\n",
"tf.config.threading.set_inter_op_parallelism_threads(num_proc)\n",
"tf.config.threading.set_intra_op_parallelism_threads(num_proc)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dlomix.data import load_processed_dataset\n",
"\n",
"dataset = load_processed_dataset('/cmnfs/proj/bmpc_dlomix/datasets/processed/ptm_baseline_small_cleaned_bs1024')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dlomix.models import PrositIntensityPredictor\n",
"from dlomix.losses import masked_spectral_distance, masked_pearson_correlation_distance\n",
"\n",
"model = tf.keras.models.load_model('/cmnfs/proj/bmpc_dlomix/models/baseline_models/noptm_baseline_full_bs1024_unmod_extended/7ef3360f-2349-46c0-a905-01187d4899e2.keras')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dlomix.refinement_transfer_learning.automatic_rl_tl import AutomaticRlTlTraining, AutomaticRlTlTrainingConfig\n",
"\n",
"config = AutomaticRlTlTrainingConfig(\n",
" dataset=dataset,\n",
" baseline_model=model,\n",
" use_wandb=True\n",
")\n",
"\n",
"trainer = AutomaticRlTlTraining(config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_model = trainer.train()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
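The second notebook cell sets the Hugging Face cache location and thread counts before TensorFlow is imported. A minimal sketch of the same setup as a reusable helper follows; `configure_environment` and its parameters are hypothetical, and the `/cmnfs/...` paths in the notebook are cluster-specific, so other users would pass their own cache directory.

```python
import os
from typing import Dict, Optional


def configure_environment(num_proc: int, hf_home: Optional[str] = None,
                          cpu_only: bool = False) -> Dict[str, str]:
    """Hypothetical helper mirroring the notebook's environment cell.

    Intended to run before `import tensorflow`, since thread-count variables
    are typically read when the respective runtime initialises.
    """
    settings = {
        # Same thread budget for OpenMP and both TensorFlow thread pools.
        "OMP_NUM_THREADS": str(num_proc),
        "TF_NUM_INTRAOP_THREADS": str(num_proc),
        "TF_NUM_INTEROP_THREADS": str(num_proc),
    }
    if hf_home is not None:
        # Hugging Face home plus a dedicated datasets cache underneath it.
        settings["HF_HOME"] = hf_home
        settings["HF_DATASETS_CACHE"] = os.path.join(hf_home, "hf_cache")
    if cpu_only:
        # Hide all GPUs, like the commented-out line in the notebook.
        settings["CUDA_VISIBLE_DEVICES"] = "-1"
    os.environ.update(settings)
    return settings
```

After calling it, the notebook's remaining steps still apply: `import tensorflow as tf`, then `tf.config.threading.set_inter_op_parallelism_threads(num_proc)` and `tf.config.threading.set_intra_op_parallelism_threads(num_proc)`.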
6 changes: 5 additions & 1 deletion setup.py
@@ -19,7 +19,7 @@
packages=setuptools.find_packages(where="src"),
package_dir={"": "src"},
include_package_data=True,
- package_data={"": ["data/processing/pickled_feature_dicts/*"]},
+ package_data={"": ["data/processing/pickled_feature_dicts/*", "prosit_baseline_model.txt", "refinement_transfer_learning/user_report.ipynb"]},
install_requires=[
"datasets",
"fpdf",
@@ -45,6 +45,10 @@
"wandb": [
"wandb >= 0.15",
],
"rltl-report": [
"nbconvert",
"ipykernel"
]
},
classifiers=[
"Programming Language :: Python :: 3",
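This diff ships `refinement_transfer_learning/user_report.ipynb` as package data and adds an `rltl-report` extra (`nbconvert`, `ipykernel`) for HTML report generation. The sketch below shows how calling code could check for the extra and locate the bundled notebook with the standard library's `importlib.resources` (Python 3.9+). The helper names are hypothetical, and the sketch assumes a regular, non-zipped install in which the file sits inside the `dlomix` package directory.

```python
import importlib.util
from importlib import resources
from pathlib import Path
from typing import Iterable, List


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the names in `modules` that are not importable in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def bundled_file(package: str, relative_path: str) -> Path:
    """Return the filesystem path of a data file shipped inside `package`.

    Assumes a regular (non-zipped) install, so the resource has a real path.
    """
    resource = resources.files(package).joinpath(relative_path)
    if not resource.is_file():
        raise FileNotFoundError(f"{relative_path!r} is not bundled with {package!r}")
    return Path(str(resource))
```

For example, if `missing_modules(("nbconvert", "ipykernel"))` is non-empty, the user could be pointed to `pip install "dlomix[rltl-report]"` (assuming the distribution is named `dlomix`); otherwise `bundled_file("dlomix", "refinement_transfer_learning/user_report.ipynb")` would yield the report template.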
8 changes: 8 additions & 0 deletions src/dlomix/data/charge_state.py
@@ -42,6 +42,8 @@ def __init__(
sequence_column: str = "modified_sequence",
label_column: str = "most_abundant_charge_by_count",
val_ratio: float = 0.2,
test_ratio: float = 0.2,
advanced_splitting: bool = False,
max_seq_len: Union[int, str] = 30,
dataset_type: str = "tf",
batch_size: int = 256,
@@ -59,6 +61,8 @@ def __init__(
auto_cleanup_cache: bool = True,
num_proc: Optional[int] = None,
batch_processing_size: int = 1000,
inference_only: bool = False,
ion_types: Optional[List[str]] = None,
):
super().__init__(
data_source,
@@ -68,6 +72,8 @@ def __init__(
sequence_column,
label_column,
val_ratio,
test_ratio,
advanced_splitting,
max_seq_len,
dataset_type,
batch_size,
@@ -85,4 +91,6 @@ def __init__(
auto_cleanup_cache,
num_proc,
batch_processing_size,
inference_only,
ion_types,
)
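The charge-state dataset now forwards `test_ratio` (default 0.2) alongside `val_ratio` (default 0.2). How the parent dataset class turns these into splits is not shown in this diff; the hypothetical sketch below assumes both ratios are fractions of the full dataset, under which the defaults give a 60/20/20 train/validation/test split. `split_sizes` is an illustrative name, not a dlomix function.

```python
from typing import Tuple


def split_sizes(n_examples: int, val_ratio: float = 0.2,
                test_ratio: float = 0.2) -> Tuple[int, int, int]:
    """Hypothetical split arithmetic returning (train, val, test) example counts.

    Assumes both ratios are fractions of the full dataset; an implementation
    could instead apply test_ratio to what remains after the validation split.
    """
    if not (0 <= val_ratio < 1 and 0 <= test_ratio < 1):
        raise ValueError("ratios must lie in [0, 1)")
    if val_ratio + test_ratio >= 1:
        raise ValueError("val_ratio + test_ratio must leave a non-empty training set")
    n_val = round(n_examples * val_ratio)
    n_test = round(n_examples * test_ratio)
    # Whatever is not held out for validation or testing is used for training.
    return n_examples - n_val - n_test, n_val, n_test


print(split_sizes(100))  # prints (60, 20, 20)
```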