Time Series #978

Closed
codemzs opened this issue Sep 21, 2018 · 9 comments · Fixed by #977

@codemzs
Member

codemzs commented Sep 21, 2018

Time Series in ML.NET

  • Forecasting

  • Anomaly Detection

    • Spike Detector
      • Detects spikes in an independent and identically distributed (IID) sequence using adaptive kernel density estimation
    • Change Point Detector
      • Detects change points in an independent and identically distributed (IID) sequence using adaptive kernel density estimation and martingales
  • Smoothing transforms

    • Exponential Average Transform
      • Is a weighted average of the values: ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}).
    • Moving Average Transform
      • Applies a moving average on a time series
    • Percentile Threshold Transform
      • Is a sequential transform that decides whether the current value of the time series falls within the top 'percentile' % of the values in the sliding window. The output of the transform is a boolean flag.
    • P Value Transform
      • Calculates the p-value of the current input in the sequence with regard to the values in the sliding window.
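The smoothing transforms listed above can be sketched in plain C#. This is a minimal illustration under stated assumptions, not the ML.NET implementation: the class and method names are hypothetical, the exponential average is seeded with the first value, the moving average is an unweighted trailing mean that uses a shorter window at the start of the series, and the percentile flag counts how many window values (including the current one) are strictly greater than the current value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for illustration; none of these names come from ML.NET.
public static class SmoothingSketch
{
    // ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}).
    // Assumption: the first output is seeded with the first input value.
    public static List<double> ExponentialAverage(IReadOnlyList<double> series, double a)
    {
        if (a < 0 || a > 1) throw new ArgumentOutOfRangeException(nameof(a));
        var result = new List<double>(series.Count);
        double avg = 0;
        for (int i = 0; i < series.Count; i++)
        {
            avg = i == 0 ? series[i] : a * series[i] + (1 - a) * avg;
            result.Add(avg);
        }
        return result;
    }

    // Unweighted trailing mean over the last `window` values.
    // Assumption: before the window fills, the mean covers the values seen so far.
    public static List<double> MovingAverage(IReadOnlyList<double> series, int window)
    {
        if (window < 1) throw new ArgumentOutOfRangeException(nameof(window));
        var result = new List<double>(series.Count);
        double sum = 0;
        for (int i = 0; i < series.Count; i++)
        {
            sum += series[i];
            if (i >= window) sum -= series[i - window];
            result.Add(sum / Math.Min(i + 1, window));
        }
        return result;
    }

    // Flags the current value when fewer than `percentile` % of the values in the
    // sliding window (current value included) are strictly greater than it.
    public static List<bool> PercentileThreshold(IReadOnlyList<double> series, int window, double percentile)
    {
        if (window < 1) throw new ArgumentOutOfRangeException(nameof(window));
        var flags = new List<bool>(series.Count);
        for (int i = 0; i < series.Count; i++)
        {
            int start = Math.Max(0, i - window + 1);
            int count = i - start + 1;
            int greater = 0;
            for (int j = start; j <= i; j++)
                if (series[j] > series[i]) greater++;
            flags.Add(greater < count * percentile / 100.0);
        }
        return flags;
    }
}
```

For example, `SmoothingSketch.ExponentialAverage(new double[] { 2, 4, 6 }, 0.5)` gives 2, 3, 4.5, and `SmoothingSketch.MovingAverage(new double[] { 1, 2, 3, 4 }, 2)` gives 1, 1.5, 2.5, 3.5.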

New Features to come

  • Estimator and PiGSTy APIs for the below components
    • Component                       Priority
      IidChangePointDetector          0
      IidSpikeDetector                0
      SsaChangePointDetector          0
      ExponentialAverageTransform     1
      MovingAverageTransform          1
      PercentileThresholdTransform    1
      PValueTransform                 1
      SlidingWindowTransform          1

      Example:

      var data = new[] { new Data() { Feature = 2 }, new Data() { Feature = 1 } };
      var dataView = ComponentCreation.CreateDataView(Env, data);
      var pipe = new SpikeDetectorEstimator(Env, new[] {
          new SpikeDetectorTransformer.ColumnInfo("Feature", "Anomaly", twnd: 500, swnd: 50)
      });
      var result = pipe.Fit(dataView).Transform(dataView);
      var resultRoles = new RoleMappedData(result);
  • Prediction Engine
    • The prediction engine we have today is stateless. For time series it is important to update the state of the model as we make predictions, for example with SSA models, which have a temporal relationship between data points. This will require a new variant of the prediction engine for use by the time series components listed above. See: Support time series anomaly algorithms #163
    • Currently, to get a stateful prediction engine, users have to write the code below and then create checkpoints by saving the model every so often.
      const int size = 10;
      List<Data> data = new List<Data>(size);
      var dataView = env.CreateStreamingDataView(data);
      List<Data> tempData = new List<Data>();
      for (int i = 0; i < size / 2; i++)
          tempData.Add(new Data(5));

      for (int i = 0; i < size / 2; i++)
          tempData.Add(new Data((float)(5 + i * 1.1)));

      foreach (var d in tempData)
          data.Add(new Data(d.Value));

      var args = new IidChangePointDetector.Arguments()
      {
          Confidence = 80,
          Source = "Value",
          Name = "Change",
          ChangeHistoryLength = size,
          Data = dataView
      };

      // Train
      var detector = TimeSeriesProcessing.IidChangePointDetector(env, args);
      // Anomaly detection
      var output = detector.Model.Apply(env, dataView);
      var enumerator = output.AsEnumerable<Prediction>(env, true).GetEnumerator();
      // Save the updated model
      detector.Model.Save();
    • Instead, we could have an API like the one below that lets the user save the model to disk or to a stream at fixed intervals while the model is being updated.
            using (var env = new LocalEnvironment(seed: 1, conc: 1))
            {
                // Pipeline
                var loader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
                var cachedTrain = new CacheDataView(env, loader, prefetch: null);

                // Train.
                var IidDetector = new IidChangePointDetector(env, new IidChangePointDetector.Arguments
                {
                    Confidence = 80,
                    Source = "Features",
                    Name = "Change",
                    ChangeHistoryLength = size,
                    Data = dataView
                });
                var trainRoles = new RoleMappedData(cachedTrain, feature: "Features");
                var predictor = IidDetector.Train(new Runtime.TrainContext(trainRoles));

                PredictionEngine<Input, Prediction> model;
                using (var file = env.CreateTempFile())
                {
                    // Save model. 
                    var roles = new RoleMappedData(cachedTrain, feature: "Features");
                    using (var ch = env.Start("saving"))
                        TrainUtils.SaveModel(env, ch, file, predictor, roles);

                    // Load model.
                    using (var fs = file.OpenReadStream())
                        model = env.CreatePredictionEngine<Input, Prediction>(fs);
                }

                // Take a few examples from the test data and run predictions on them.
                var testLoader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
                var testData = testLoader.AsEnumerable<Input>(env, false);
                foreach (var input in testData.Take(10))
                {
                    // Anomaly detection + update the model.
                    var prediction = model.Predict(input);
                }

                // Save the model with updated state.
                model.Save();
            }
  • Online Training
    Currently we have to retrain on the entire training dataset to update the model. It would be better if the model were updated incrementally as data comes in. See: Support time series anomaly algorithms #163

  • Evaluator

    • Currently we don’t have an evaluator for time series. Rolling CV is better suited to time-dependent datasets because it always tests on data that is newer than the training data, whereas standard CV leaks future data into the training set. Other names for Rolling CV include { walk-forward / roll-forward / rolling origin / window } CV. Refer to Rolling Cross-validation for Time-series #1026
  • ARIMA model
    It seems the first thing novice time series users look for in a toolkit for a forecasting task is an ARIMA model, because it is the first thing that comes up in search results for forecasting. ARIMA isn’t the most accurate or performant model, but it is the most well-known forecasting model. We should consider bringing a simple implementation of ARIMA into ML.NET. See: Time series and forecasting #929

  • Time Series Featurizer
    The more performant models are the ones that combine the features from a time series transform with non-time-series features and feed the resulting vector into a black-box regression learning algorithm. For example, one could have two features A and B, where A contains data points that have a temporal relationship between them, for example stock price, and B contains a non-temporal feature such as country or zip code. We could feed A into an SSA transform that extracts components such as trend, level and seasonality from an individual feature value, repeat this for all the feature values of A, and then combine the resulting vector with feature B so that it can be fed into a regressor for prediction. The feature extraction step could be SSA, or it could be a deep learning model such as an LSTM. The regressor could be any regression-based learner. See: Time series and forecasting #929
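The stateful prediction engine idea from the Prediction Engine item can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `StatefulEngineSketch` is a hypothetical class, not the ML.NET `PredictionEngine`, and it swaps in a simple "distance from the sliding-window mean" rule in place of the IID or SSA detectors. The point is only the shape of the API: `Predict` both scores the input and updates the state, and `Save`/`Load` checkpoint that state to a stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stateful engine for illustration; not an ML.NET type.
public sealed class StatefulEngineSketch
{
    private readonly Queue<double> _window = new Queue<double>();
    private readonly int _windowSize;
    private readonly double _threshold;

    public StatefulEngineSketch(int windowSize, double threshold)
    {
        if (windowSize < 1) throw new ArgumentOutOfRangeException(nameof(windowSize));
        _windowSize = windowSize;
        _threshold = threshold;
    }

    // Scores the input against the mean of the current window, then folds the
    // input into the window so the next call sees the updated state.
    public bool Predict(double value)
    {
        double sum = 0;
        foreach (double v in _window) sum += v;
        bool isAnomaly = _window.Count > 0
            && Math.Abs(value - sum / _window.Count) > _threshold;

        _window.Enqueue(value);
        if (_window.Count > _windowSize) _window.Dequeue();
        return isAnomaly;
    }

    // Writes the configuration and the current window to the stream (a checkpoint).
    public void Save(Stream stream)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(_windowSize);
            writer.Write(_threshold);
            writer.Write(_window.Count);
            foreach (double v in _window) writer.Write(v);
        }
    }

    // Restores an engine, including its window state, from a checkpoint.
    public static StatefulEngineSketch Load(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int windowSize = reader.ReadInt32();
            double threshold = reader.ReadDouble();
            var engine = new StatefulEngineSketch(windowSize, threshold);
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++) engine._window.Enqueue(reader.ReadDouble());
            return engine;
        }
    }
}
```

A caller would invoke `Predict` for each incoming point and call `Save` every so often; an engine restored with `Load` continues from the saved window and makes the same predictions as the original would.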

... and many more with time.
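The Rolling CV scheme mentioned under Evaluator can be sketched as an index generator. This is a minimal sketch of the expanding-window (rolling origin) variant, with hypothetical names and parameters (`initialTrain`, `horizon`); it is not an ML.NET API. Each fold trains on everything before a cut-off and tests on the next `horizon` points, so the test data is always newer than the training data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not an ML.NET type.
public static class RollingCvSketch
{
    // Returns folds as (TrainEnd, TestEnd) pairs:
    // train on rows [0, TrainEnd), test on rows [TrainEnd, TestEnd).
    public static List<(int TrainEnd, int TestEnd)> RollingOriginSplits(
        int length, int initialTrain, int horizon)
    {
        if (initialTrain < 1) throw new ArgumentOutOfRangeException(nameof(initialTrain));
        if (horizon < 1) throw new ArgumentOutOfRangeException(nameof(horizon));

        var splits = new List<(int TrainEnd, int TestEnd)>();
        // Move the cut-off forward by one horizon per fold; stop when a full
        // test block no longer fits inside the series.
        for (int trainEnd = initialTrain; trainEnd + horizon <= length; trainEnd += horizon)
            splits.Add((trainEnd, trainEnd + horizon));
        return splits;
    }
}
```

With `length = 10`, `initialTrain = 4` and `horizon = 2`, the folds are (4, 6), (6, 8) and (8, 10): the training range grows while each test block lies strictly after it.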

@codemzs codemzs self-assigned this Sep 21, 2018
@codemzs codemzs added this to the 0918 milestone Sep 21, 2018
@shauheen shauheen modified the milestones: 0918, 1018 Sep 25, 2018
@codemzs codemzs reopened this Oct 2, 2018
@Tealons

Tealons commented Oct 23, 2018

I'm a big data science n00b. Any chance that a simple tutorial will be provided? I'm aiming for something like this with my data:

image

But then better :)

@shauheen shauheen modified the milestones: 1018, 1118 Nov 7, 2018
@shauheen
Contributor

@codemzs is there a PR associated with this, or is this issue closed?

@codemzs
Member Author

codemzs commented Nov 28, 2018

@shauheen Yes, the PR has been checked in; you can see it at PR #1727 and the sample at #1762

@shahinaJoomun

shahinaJoomun commented Mar 25, 2019

Hello, can you please elaborate on how you got the p-values and martingale values for the DetectIidChangePoint function? If possible, can you give the algorithm for calculating these values? Thank you

@DevOnBike

Hi,

Why is the MovingAverageTransform class from Microsoft.Transforms.TimeSeries not publicly accessible? I want to use it in my app. What's the strategy behind it?

@codemzs
Member Author

codemzs commented Nov 1, 2019

It does not have an API ready yet.

@DevOnBike

Do you have a roadmap? When will this API be ready?

@unruledboy

hi @codemzs

We have Spike Detector to find a sudden jump, but for a sudden drop, is Change Point Detector meant for this kind of situation?

Thanks.

@pjsgsy

pjsgsy commented Oct 9, 2021

Whatever happened to the smoothing transforms! They would be incredibly useful.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022