Merge branch 'staging' into jumin/update_o16n
loomlike committed Sep 20, 2020
2 parents d387202 + d49665d commit ed707e3
Showing 21 changed files with 1,503 additions and 631 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -77,6 +77,7 @@ The table below lists the recommender algorithms currently available in the repo
| LightFM/Hybrid Matrix Factorization | [Python CPU](examples/02_model_hybrid/lightfm_deep_dive.ipynb) | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedback |
| LightGBM/Gradient Boosting Tree<sup>*</sup> | [Python CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
| LightGCN | [Python CPU / Python GPU](examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb) | Collaborative Filtering | Deep learning algorithm that simplifies the design of GCN for predicting implicit feedback |
| GeoIMC | [Python CPU](examples/00_quick_start/geoimc_movielens.ipynb) | Hybrid | Matrix completion algorithm that takes into account user and item features, using Riemannian conjugate gradient optimization and following a geometric approach |
| GRU4Rec | [Python CPU / Python GPU](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) | Collaborative Filtering | Sequential-based algorithm that aims to capture both long- and short-term user preferences using recurrent neural networks |
| Neural Recommendation with Long- and Short-term User Representations (LSTUR)<sup>*</sup> | [Python CPU / Python GPU](examples/00_quick_start/lstur_MIND.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling |
| Neural Recommendation with Attentive Multi-View Learning (NAML)<sup>*</sup> | [Python CPU / Python GPU](examples/00_quick_start/naml_MIND.ipynb) | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning |
@@ -110,7 +111,7 @@ We provide a [benchmark notebook](examples/06_benchmarks/movielens.ipynb) to ill
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [ALS](examples/00_quick_start/als_movielens.ipynb) | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
| [SVD](examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb) | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
- | [SAR](examples/00_quick_start/sar_movielens.ipynb) | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A |
+ | [SAR](examples/00_quick_start/sar_movielens.ipynb) | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 |
| [NCF](examples/02_model_hybrid/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
| [BPR](examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb) | 0.105365 | 0.389948 | 0.349841 | 0.181807 | N/A | N/A | N/A | N/A |
| [FastAI](examples/00_quick_start/fastai_movielens.ipynb) | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
32 changes: 16 additions & 16 deletions SETUP.md
@@ -51,13 +51,13 @@ conda update anaconda # use 'conda install anaconda' if the package is no
We provide a script, [generate_conda_file.py](tools/generate_conda_file.py), to generate a conda-environment yaml file which you can use to create the target environment with Python 3.6 and all the correct dependencies.

**NOTE**: the `xlearn` package has a dependency on `cmake`. If you use the `xlearn`-related notebooks or scripts, make sure `cmake` is installed on the system. The easiest way to install it on Linux is with apt-get: `sudo apt-get install -y build-essential cmake`. Detailed instructions for installing `cmake` from source can be found [here](https://cmake.org/install/).

Assuming the repo is cloned as `Recommenders` in the local system, to install **a default (Python CPU) environment**:

cd Recommenders
python tools/generate_conda_file.py
conda env create -f reco_base.yaml

You can specify the environment name as well with the flag `-n`.

@@ -70,7 +70,7 @@ Assuming that you have a GPU machine, to install the Python GPU environment:

cd Recommenders
python tools/generate_conda_file.py --gpu
conda env create -f reco_gpu.yaml

</details>

@@ -85,7 +85,7 @@ To install the PySpark environment:

> Additionally, if you want to test a particular version of Spark, you may pass the `--pyspark-version` argument:
>
- > python tools/generate_conda_file.py --pyspark-version 2.4.0
+ > python tools/generate_conda_file.py --pyspark-version 2.4.5
Then, we need to set the environment variables `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the conda python executable.

@@ -94,29 +94,29 @@ Click on the following menus to see details:
<summary><strong><em>Set PySpark environment variables on Linux or MacOS</em></strong></summary>

To set these variables every time the environment is activated, we can follow the steps of this [guide](https://conda.io/docs/user-guide/tasks/manage-environments.html#macos-and-linux).

First, get the path where the environment `reco_pyspark` is installed:

RECO_ENV=$(conda env list | grep reco_pyspark | awk '{print $NF}')
mkdir -p $RECO_ENV/etc/conda/activate.d
mkdir -p $RECO_ENV/etc/conda/deactivate.d

You also need to find where Spark is installed and set the `SPARK_HOME` variable; on the DSVM, `SPARK_HOME=/dsvm/tools/spark/current`.

Then, create the file `$RECO_ENV/etc/conda/activate.d/env_vars.sh` and add:

#!/bin/sh
RECO_ENV=$(conda env list | grep reco_pyspark | awk '{print $NF}')
export PYSPARK_PYTHON=$RECO_ENV/bin/python
export PYSPARK_DRIVER_PYTHON=$RECO_ENV/bin/python
export SPARK_HOME_BACKUP=$SPARK_HOME
unset SPARK_HOME
export SPARK_HOME=/dsvm/tools/spark/current

This will export the variables every time we do `conda activate reco_pyspark`. To unset these variables when we deactivate the environment, create the file `$RECO_ENV/etc/conda/deactivate.d/env_vars.sh` and add:

#!/bin/sh
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
export SPARK_HOME=$SPARK_HOME_BACKUP
unset SPARK_HOME_BACKUP


</details>

@@ -128,7 +128,7 @@ First, get the path of the environment `reco_pyspark` is installed:
for /f "delims=" %A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%A"

Then, create the file `%RECO_ENV%\etc\conda\activate.d\env_vars.bat` and add:

@echo off
for /f "delims=" %%A in ('conda env list ^| grep reco_pyspark ^| awk "{print $NF}"') do set "RECO_ENV=%%A"
set PYSPARK_PYTHON=%RECO_ENV%\python.exe
@@ -149,7 +149,7 @@ create the file `%RECO_ENV%\etc\conda\deactivate.d\env_vars.bat` and add:
set SPARK_HOME_BACKUP=
set PYTHONPATH=%PYTHONPATH_BACKUP%
set PYTHONPATH_BACKUP=

</details>

</details>
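
As an optional sanity check, the following minimal sketch (assuming `pyspark` is installed in the `reco_pyspark` environment and the environment is activated) confirms that the variables point at the conda interpreter and that a local Spark session starts:

```python
# Minimal sanity check: run inside the activated reco_pyspark environment.
import os
import sys

# Both should point at the conda environment's Python interpreter.
print("sys.executable:", sys.executable)
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))
print("SPARK_HOME:    ", os.environ.get("SPARK_HOME"))  # e.g. /dsvm/tools/spark/current on the DSVM

# Starting a local session verifies that pyspark picks up the variables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("env_check").getOrCreate()
print("Spark version: ", spark.version)
spark.stop()
```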
@@ -176,7 +176,7 @@ We can register our created conda environment to appear as a kernel in the Jupyt

conda activate my_env_name
python -m ipykernel install --user --name my_env_name --display-name "Python (my_env_name)"

If you are using the DSVM, you can [connect to JupyterHub](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro#jupyterhub-and-jupyterlab) by browsing to `https://your-vm-ip:8000`.
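
To double-check that the kernel was registered, a short sketch using `jupyter_client` (assumed to be available in the active environment) can list the installed kernel specs:

```python
# Optional check: list registered Jupyter kernel specs and confirm my_env_name is present.
from jupyter_client.kernelspec import KernelSpecManager

specs = KernelSpecManager().find_kernel_specs()  # {kernel_name: resource_dir}
for name, path in sorted(specs.items()):
    print(f"{name}: {path}")

assert "my_env_name" in specs, "kernel was not registered for the current user"
```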

### Troubleshooting for the DSVM
@@ -204,7 +204,7 @@ sudo update-alternatives --config java

### Requirements of Azure Databricks

- * Databricks Runtime version 4.3 (Apache Spark 2.3.1, Scala 2.11) or greater
+ * Databricks Runtime version >= 4.3 (Apache Spark 2.3.1, Scala 2.11) and <= 5.5 (Apache Spark 2.4.3, Scala 2.11)
* Python 3

An example of how to create an Azure Databricks workspace and an Apache Spark cluster within the workspace can be found [here](https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal). To utilize deep learning models and GPUs, you may set up a GPU-enabled cluster. For more details about this topic, please see the [Azure Databricks deep learning guide](https://docs.azuredatabricks.net/applications/deep-learning/index.html).
@@ -242,7 +242,7 @@ The installation script has a number of options that can also deal with differen
python tools/databricks_install.py -h
```
Once you have confirmed the Databricks cluster is *RUNNING*, install the modules within this repository with the following commands.

```{shell}
cd Recommenders
@@ -339,7 +339,7 @@ Additionally, you must install the [spark-cosmosdb connector](https://docs.datab

## Install the utilities via PIP

A [setup.py](setup.py) file is provided to simplify the installation of the utilities in this repo from the main directory.

This still requires the conda environment to be installed as described above. Once the necessary dependencies are installed, you can use the following command to install `reco_utils` as a Python package.

2 changes: 1 addition & 1 deletion contrib/sarplus/python/tests/test_pyspark_sar.py
@@ -331,7 +331,7 @@ def test_sar_item_similarity(
.reset_index(drop=True)
)

- if similarity_type is "cooccurrence":
+ if similarity_type == "cooccurrence":
assert (item_similarity_ref == item_similarity).all().all()
else:
assert (
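
For context on the one-line change above: `is` compares object identity while `==` compares values, so an identity check against a string literal only passes when the interpreter happens to reuse the same object (recent Python versions even emit a `SyntaxWarning` for it). A minimal illustration, independent of the test suite:

```python
# `is` checks identity, `==` checks equality; equal strings need not be the same object.
a = "cooccurrence"
b = "".join(["co", "occurrence"])  # equal value, but a distinct object built at runtime

print(a == b)  # True  -- the comparison the test intends
print(a is b)  # False -- identity check; depends on string interning, so it is unreliable
```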