Merge pull request #1208 from microsoft/kdd2020_tutorial_updated
Kdd2020 tutorial updated
Showing 25 changed files with 5,913 additions and 21 deletions.
@@ -0,0 +1,46 @@
# Environment setup
The following setup instructions assume a Linux system; testing was performed on Ubuntu.
We use Conda to install packages and manage the virtual environment. Run `conda list` to check whether Conda is installed on your machine. If it is not, follow the instructions at https://conda.io/projects/conda/en/latest/user-guide/install/linux.html to install either Miniconda or Anaconda (preferred) before proceeding.
1. Clone the repository
   ```bash
   git clone https://github.com/microsoft/recommenders
   ```
1. Navigate to the tutorial folder. The materials for the tutorial are located in `recommenders/examples/07_tutorials/KDD2020-tutorial`.
   ```bash
   cd recommenders/examples/07_tutorials/KDD2020-tutorial
   ```
1. Download the dataset
   1. Download the dataset for the hands-on experiments and unzip it into `data_folder`:
      ```bash
      wget https://recodatasets.blob.core.windows.net/kdd2020/data_folder.zip
      unzip data_folder.zip -d data_folder
      ```
      After unzipping, `data_folder` contains two folders: `raw` and `my_cached`. The `raw` folder holds the original txt files from the COVID MAG dataset; the `my_cached` folder holds processed data files, so if you miss a step during the hands-on tutorial you can catch up by copying the corresponding cached files into the experiment folders.
1. Install the dependencies
   1. The model pre-training uses a tool that converts the original data into embeddings; building the tool requires `g++`. The following installs `g++` on a Linux system.
      ```bash
      sudo apt-get install g++
      ```
   1. The Python scripts run in a conda environment where the dependencies are installed. Create the environment from the `reco_gpu_kdd.yaml` file provided in the tutorial folder with the following commands.
      ```bash
      conda env create -n kdd_tutorial_2020 -f reco_gpu_kdd.yaml
      conda activate kdd_tutorial_2020
      ```
   1. The tutorial is conducted in Jupyter notebooks, so register the newly created conda environment as a kernel with the Jupyter notebook server (a verification sketch follows this list):
      ```bash
      python -m ipykernel install --user --name kdd_tutorial_2020 --display-name "Python (kdd tutorial)"
      ```
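To sanity-check the setup, the following standard commands (not part of the original instructions) confirm that `g++`, the conda environment, and the Jupyter kernel are all in place:
```bash
g++ --version           # the compiler needed to build the embedding tool
conda env list          # kdd_tutorial_2020 should appear in the output
jupyter kernelspec list # the kdd_tutorial_2020 kernel should be registered
```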
# Tutorial notebooks/scripts
After the setup, you should be able to launch the notebooks locally with the command
```bash
jupyter notebook --port=8080
```
The notebooks can then be opened in a browser at `localhost:8080`.
Alternatively, if the Jupyter notebook server runs on a remote machine, launch it there with the following command.
```bash
jupyter notebook --no-browser --ip=10.214.70.89 --port=8080
```
From the local browser, the notebooks can then be opened at `10.214.70.89:8080`.
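If the remote port is not directly reachable from your browser, an SSH tunnel is a common alternative (not part of the original instructions; `your-user` is a placeholder). Start the server remotely with just `jupyter notebook --no-browser --port=8080`, then forward the port from your local machine:
```bash
# Forward local port 8080 to port 8080 on the remote host's loopback interface,
# then browse to localhost:8080 as if the server were running locally.
ssh -N -L 8080:localhost:8080 your-user@10.214.70.89
```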
@@ -0,0 +1,61 @@
data:
    doc_size: 15 # each document is fixed at doc_size words: longer documents are truncated, shorter ones are zero-padded
    his_size: 20 # maximum number of clicks kept per user: the last his_size clicks are kept, shorter histories are zero-padded
    word_size: 194755 # word vocabulary size
    entity_size: 57267 # entity vocabulary size
    data_format: dkn

info:
    metrics:
        - auc
    pairwise_metrics:
        - group_auc
        - mean_mrr
        - ndcg@2;4;6
    show_step: 10000 # print loss every show_step batches

model:
    method: classification
    activation:
        - sigmoid
    attention_activation: relu
    attention_dropout: 0.0
    attention_layer_sizes: 32
    dim: 32 # word embedding dim
    use_entity: true # use entity embedding
    use_context: true # use context embedding

    entity_dim: 32 # entity embedding dim
    entity_embedding_method: TransE
    transform: true # add a transform layer for entity and context embeddings

    dropout:
        - 0.0
    filter_sizes: # window sizes of the KCNN filters
        - 1
        - 2
        - 3
    layer_sizes: # layer sizes of the final prediction layer
        - 300
    # model_type: DKN_without_context
    model_type: dkn
    num_filters: 50 # number of filters for each filter size in the KCNN part
    infer_model_name: epoch_2

train:
    batch_size: 100
    embed_l1: 0.000
    embed_l2: 0.000001
    epochs: 50
    init_method: uniform
    init_value: 0.01
    layer_l1: 0.000
    layer_l2: 0.000001
    learning_rate: 0.00005
    loss: log_loss
    optimizer: adam
    save_model: True
    save_epoch: 1 # save the model every save_epoch epochs
    enable_BN: False
    is_clip_norm: False
    max_grad_norm: 0.5
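The `doc_size` and `his_size` comments above describe fixed-length truncation and zero-padding. A minimal Python sketch of that preprocessing idea (the function name is hypothetical, not part of the repository):
```python
DOC_SIZE = 15  # matches data.doc_size in the config above

def fit_to_length(ids, size=DOC_SIZE):
    """Truncate a list of word ids to `size` entries, or zero-pad it
    to `size` entries if it is shorter, as the config comments describe."""
    clipped = ids[:size]                           # truncate documents that are too long
    return clipped + [0] * (size - len(clipped))   # pad short documents with 0

print(fit_to_length([4, 8, 15, 16, 23, 42]))  # 6 real ids followed by nine 0s
```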
@@ -0,0 +1,22 @@
# model
model:
    model_type: "lightgcn"
    embed_size: 64 # embedding dimension of users and items
    n_layers: 3 # number of layers in the model

# train
train:
    batch_size: 1024
    decay: 0.0001 # l2 regularization for embedding parameters
    epochs: 1000 # number of epochs for training
    learning_rate: 0.001
    eval_epoch: -1 # if not -1, evaluate the model every eval_epoch epochs; -1 disables evaluation during training
    top_k: 20 # number of items to recommend when calculating evaluation metrics

# show info
# metrics options: "recall", "ndcg", "precision", "map"
info:
    save_model: True # whether to save the model
    save_epoch: 1 # if save_model is True, save the model every save_epoch epochs
    metrics: ["recall", "ndcg", "precision", "map"] # metrics for evaluation
    MODEL_DIR: ./tests/resources/deeprec/lightgcn/model/lightgcn_model/ # directory for saved models
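In the recommenders notebooks, config files like this are typically consumed by the repo's helper utilities; as a minimal, repo-independent sketch, the file can also be read directly with PyYAML (the file name shown is hypothetical; substitute the config's actual path):
```python
import yaml  # PyYAML, available in the tutorial's conda environment

# Hypothetical path; use the actual location of this config in the repo.
with open("lightgcn.yaml") as f:
    config = yaml.safe_load(f)

train = config["train"]
print(config["model"]["model_type"])        # -> lightgcn
print(train["batch_size"], train["top_k"])  # -> 1024 20
```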