# LearnALCLengths

This repository contains our implementation of concept length predictors in the ALC description logic.

## Installation

- Clone this repository:
  ```
  git clone https://github.com/dice-group/LearnALCLengths.git
  ```
- Install Anaconda3, then install all required libraries by executing the following commands (Linux):
  1. ```conda create -n clip python==3.11.5 && conda activate clip```
  2. ```pip install -r requirements.txt```
  3. ```git clone https://github.com/dice-group/Ontolearn.git && cd Ontolearn && git checkout 0.5.4 && pip install -e .```
- Download DL-Learner-1.4.0 from [github](https://github.com/SmartDataAnalytics/DL-Learner/releases) and extract it into this repository (cloned above).
- Clone DLFoil and DLFocl ([dlfoil](https://bitbucket.org/grizzo001/dl-foil.git), [dlfocl](https://bitbucket.org/grizzo001/dlfocl.git)) and place the two repositories in `LearnALCLengths/`.
- Install Java (version 8+) and Apache Maven (only necessary for running DL-Learner and DL-Foil/DL-Focl).

## Reproducing the reported results

### Datasets (necessary for running the algorithms)

- Download the [datasets](https://files.dice-research.org/archive/CLIP/), extract the zip file into `LearnALCLengths/`, and rename the extracted folder to `Datasets`.

### CLIP (our method)

*Open a terminal and navigate into /reproduce_results/*

```
cd LearnALCLengths/reproduce_results/
```

- Reproduce CLIP concept learning results on all KBs:
  ```
  sh reproduce_celoe_clp_experiment_all_kbs.sh
  ```
- Reproduce the training of concept length predictors:
  ```
  sh reproduce_training_clp_on_all_kbs.sh
  ```
- To train concept length predictors on a single knowledge base, run ```python reproduce_training_length_predictors_K_kb.py```, where ```K``` is one of carcinogenesis, mutagenesis, semantic_bible or vicodi. Use `-h` to see more training options (for example, ```python reproduce_training_length_predictors_carcinogenesis_kb.py -h```).
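The per-knowledge-base naming scheme described above can be scripted. The following is only a minimal sketch, not part of the repository: the helper names `training_command` and `train_all` are hypothetical, and it assumes the training scripts are reachable from the current working directory under the names given above. For training on all KBs, the provided `reproduce_training_clp_on_all_kbs.sh` remains the reference.

```python
import subprocess
import sys

# Knowledge base names as listed in this README.
KNOWLEDGE_BASES = ["carcinogenesis", "mutagenesis", "semantic_bible", "vicodi"]


def training_command(kb, show_help=False):
    """Build the command line that trains length predictors on one KB.

    Hypothetical helper: it only fills K in the script name pattern
    reproduce_training_length_predictors_K_kb.py.
    """
    if kb not in KNOWLEDGE_BASES:
        raise ValueError(f"unknown knowledge base: {kb!r}")
    cmd = [sys.executable, f"reproduce_training_length_predictors_{kb}_kb.py"]
    if show_help:
        cmd.append("-h")  # list the available training options
    return cmd


def train_all(dry_run=True):
    """Print each training command; run it as well when dry_run is False."""
    for kb in KNOWLEDGE_BASES:
        cmd = training_command(kb)
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: the script exists in the current directory.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train_all(dry_run=True)
```

With `dry_run=True` (the default) nothing is executed, so the commands can be inspected before any training is started.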
### CELOE, ELTL, OCEL from DL-Learner

*Open a terminal and navigate into /other_learning_systems/scripts*

```
cd LearnALCLengths/dllearner/scripts
```

- Reproduce concept learning results on knowledge base K for algorithm Algo:
  ```
  python reproduce_dllearner_experiment.py --learning_systems Algo --knowledge_bases K
  ```
- To reproduce the results for multiple algorithms on multiple knowledge bases, use the schema:
  ```
  python reproduce_dllearner_experiment.py --learning_systems Algo1 Algo2... --knowledge_bases K1 K2...
  ```

Note that ```Algo``` is one of celoe, ocel or eltl, and ```K``` is one of carcinogenesis, mutagenesis, semantic_bible or vicodi (all lowercase).

### DLFoil and DLFocl

*For DLFoil, open a terminal and navigate into /dl-foil/DLFoil2*

```
cd LearnALCLengths/dl-foil/DLFoil2
```

- Run ```mvn clean install```
- Open a different terminal and run the following:
  ```
  python LearnALCLengths/generators/generate_dlfoil_config_all_kbs.py
  ```
- Now execute the following in the first terminal (in LearnALCLengths/dl-foil/DLFoil2):
  ```
  mvn -e exec:java -Dexec.mainClass=it.uniba.di.lacam.ml.DLFoilTest -Dexec.args=K_config.xml >> ../dlfoil_out_K.txt
  ```
  where `K` is one of carcinogenesis, mutagenesis, semantic_bible or vicodi.

Note that DLFoil fails to solve our learning problems as it gets stuck on the refinement of certain partial descriptions.

*We could not run DLFocl.* The authors did not provide sufficient documentation to run their algorithm; the documentation is [here](https://bitbucket.org/grizzo001/dlfocl.git).

### Statistical Test

*Open a terminal and navigate into /reproduce_results/*

```
cd LearnALCLengths/reproduce_results/
```

- Run the Wilcoxon statistical test on concept learning results `All Algos vs CLIP`:
  ```
  sh run_statistical_test_on_all_kbs.sh
  ```

### Use your own data

- Add your data to `Datasets` as a folder containing a file formatted as RDF/XML or OWL/XML; the file should have the same name as the folder.
- Navigate into /generators and run ```python train_data/generate_training_data.py --kb your_folder_name```; use `-h` to see more options. The generated file Data.json under ```your_folder_name/Train_data/``` is used for training concept length predictors; see the example scripts in ```/reproduce_results/train_clp/```.
- Similarly, learning problems can be generated using one of the example files in generators/learning_problems/ (replace the folder names with your folder name).
- Navigate into /Embeddings/Compute-Embeddings/ and run the following to embed your knowledge base:
  ```
  python run_script.py --path_dataset_folder your_folder_name
  ```
- Train concept length predictors by preparing and running your own Python file ```reproduce_training_length_predictors_K_kb.py```, following the examples in ```/reproduce_results/train_clp/```.
- Finally, prepare a script (see examples in ```/reproduce_results/celoe_clp/```) and run CLIP on your data.

## Acknowledgement

We based our implementation on the open source implementation of [ontolearn](https://docs--ontolearn-docs-dice-group.netlify.app/). We would like to thank the Ontolearn team for the readable codebase.