FM

1. Introduction

The Factorization Machine (FM) is a matrix-based machine learning algorithm proposed by Steffen Rendle. It can predict any real-valued vector. Its main advantages include: 1) capable of highly sparse data scenarios; 2) linear computational complexity.

Factorization Model

Factorization Machine Model

where： is the dot product of two k-dimensional vectors:

Model parameters are： whereindicates that the feature i is represented by k factors, and k is the hyperparameter that determines the factorization。

Factorization Machines as Predictors

FM can be used for a series of predictive tasks, such as：

classification： can be used directly as a predictor, and the optimization criterion is to minimize the least square difference。
regression：can use the symbol of to do classification prediction，parameters are estimated by hinge function or logistic regression function at any time.。

2. FM on SONA

FM algorithm model The model of the FM algorithm consists of two parts, wide and embedding, where wide is a typical linear model. The final output is the sum of the two parts of wide and embedding.
FM training process Angel implements the gradient descent method to optimize, iteratively trains the FM model, and the logic on each iteration of the worker and PS is as follows：
- worker：Pull the wide and embedding matrices from the PS to the local for each iteration, calculate the corresponding gradient update value, push to PS
- PS：PS summarizes the gradient update values pushed by all workers, averages them, calculates and updates the new wide and embedding models through the optimizer.
FM prediction result：
- format：rowID,pred,prob,label
- caption：rowID indicates the row ID of the sample, starting from 0; pred: the predicted value of the sample; prob: the probability of the sample relative to the predicted result; label: the category into which the predicted sample is classified, when the predicted result value pred is greater than 0, Label is 1, less than 0 is -1

3. Running and Performance

data format support Libsvm and dummy two data formats, libsvm format is as follows:
```
1 1:1 214:1 233:1 234:1
```
dummy data format:
```
1 1 214 233 234
```

4. Submitting script

Several steps must be done before editing the submitting script and running.

confirm Hadoop and Spark have ready in your environment
unzip sona--bin.zip to local directory (SONA_HOME)
upload sona--bin directory to HDFS (SONA_HDFS_HOME)
Edit $SONA_HOME/bin/spark-on-angel-env.sh, set SPARK_HOME, SONA_HOME, SONA_HDFS_HOME and ANGEL_VERSION

Here's an example of submitting scripts, remember to adjust the parameters and fill in the paths according to your own task.

#test description
actionType=train or predict
jsonFile=path-to-jsons/fm.json
modelPath=path-to-save-model
predictPath=path-to-save-predict-results
input=path-to-data
queue=your-queue

HADOOP_HOME=my-hadoop-home
source ./bin/spark-on-angel-env.sh
export HADOOP_HOME=$HADOOP_HOME

$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.instances=10 \
  --conf spark.ps.cores=2 \
  --conf spark.ps.memory=10g \
  --jars $SONA_SPARK_JARS \
  --files $jsonFile \
  --driver-memory 20g \
  --num-executors 20 \
  --executor-cores 5 \
  --executor-memory 30g \
  --queue $queue \
  --class org.apache.spark.angel.examples.JsonRunnerExamples \
  ./lib/angelml-$SONA_VERSION.jar \
  jsonFile:./fm.json \
  dataFormat:libsvm \
  data:$input \
  modelPath:$modelPath \
  predictPath:$predictPath \
  actionType:$actionType \
  numBatch:500 \
  maxIter:2 \
  lr:4.0 \
  numField:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fm_sona_en.md

fm_sona_en.md

FM

1. Introduction

Factorization Model

Factorization Machines as Predictors

2. FM on SONA

3. Running and Performance

4. Submitting script

Files

fm_sona_en.md

Latest commit

History

fm_sona_en.md

File metadata and controls

FM

1. Introduction

Factorization Model

Factorization Machines as Predictors

2. FM on SONA

3. Running and Performance

4. Submitting script