

Regression: Price prediction

ML.NET version: v1.40
API type: Dynamic API
Status: Up-to-date
App Type: Console app
Data type: .csv files
Scenario: Price prediction
ML Task: Regression
Algorithms: Sdca Regression

In this introductory sample, you'll see how to use FSharpML on top of ML.NET to predict taxi fares. In the world of machine learning, this type of prediction is known as regression.

Problem

This problem is centered around predicting the fare of a taxi trip in New York City. At first glance, it may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors such as additional passengers, paying with a credit card instead of cash, and so on. Such a prediction can be used in applications for taxi providers to give users and drivers an estimate of ride fares.

To solve this problem, we will build an ML model that takes as inputs:

* vendor ID
* rate code
* passenger count
* trip time
* trip distance
* payment type

and predicts the fare of the ride.

ML task - Regression

The generalized problem of regression is to predict some continuous value for given parameters, for example:

* predict a house price based on the number of rooms, location, year built, etc.
* predict a car's fuel consumption based on fuel type and car parameters.
* predict a time estimate for fixing an issue based on issue attributes.

The common feature of all these examples is that the value we want to predict can take any numeric value in a certain range. In other words, it is represented by an integer or a float/double, not by an enum or boolean type.
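To make the idea concrete, the following F# sketch fits a straight line, fare ≈ intercept + slope * distance, with ordinary least squares and then predicts a fare for a new distance. The distances and fares are invented toy values, not rows from taxi-fare-train.csv, and `fitLine` is a hypothetical helper; the sample itself trains an SDCA regression model, not this closed-form fit.

```fsharp
// Ordinary least squares with one feature: fit y ≈ a + b * x.
// fitLine is a hypothetical helper; the data below are invented toy values.
let fitLine (xs: float[]) (ys: float[]) =
    let meanX = Array.average xs
    let meanY = Array.average ys
    // slope b = Σ (x - meanX)(y - meanY) / Σ (x - meanX)²
    let covXY = Array.map2 (fun x y -> (x - meanX) * (y - meanY)) xs ys |> Array.sum
    let varX = xs |> Array.sumBy (fun x -> (x - meanX) ** 2.0)
    let b = covXY / varX
    // intercept a makes the line pass through (meanX, meanY)
    let a = meanY - b * meanX
    a, b

let distances = [| 1.0; 2.0; 3.0; 4.0 |]
let fares = [| 5.5; 8.5; 11.5; 14.5 |]

let intercept, slope = fitLine distances fares

// The prediction is a continuous number, not a category
let predictedFare = intercept + slope * 3.75
printfn "intercept = %.2f, slope = %.2f, predicted fare = %.2f" intercept slope predictedFare
```

The model's output is a real number (13.75 for a distance of 3.75 with these toy values), which is what distinguishes regression from classification.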

Solution

To solve this problem, first we will build an ML model. Then we will train the model on existing data, evaluate how good it is, and lastly we'll consume the model to predict taxi fares.

1. Build and train the model

FSharpML contains two complementary parts, EstimatorModel and TransformerModel, which together cover the full machine learning workflow. To build an ML model and fit it to the training data we use EstimatorModel. Applying the 'fit' function of EstimatorModel to the training data yields a TransformerModel, which represents the trained model: it can transform other data of the same shape and is used in the second part to evaluate and consume the model.
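This split follows ML.NET's estimator/transformer pattern. As a minimal sketch of the idea in plain F#, assuming hypothetical `Estimator`, `Transformer` and `fit` definitions that are not FSharpML's own types: an estimator learns parameters from training data and returns a transformer, and the transformer applies those frozen parameters to new data. Mean-variance normalization serves as the example. ML.NET's `NormalizeMeanVariance` has options such as `fixZero` that can change the exact formula, so the sketch shows only the textbook standardization (x - mean) / standard deviation.

```fsharp
// Hypothetical types for illustration; these are not FSharpML's definitions.
// A transformer maps data to data; an estimator learns from data and returns a transformer.
type Transformer = float[] -> float[]
type Estimator = float[] -> Transformer

// Textbook standardization: fitting learns mean and standard deviation from the training data,
// and the returned transformer maps each value x to (x - mean) / stdDev.
let meanVarianceEstimator : Estimator =
    fun training ->
        let mean = Array.average training
        let stdDev = sqrt (training |> Array.averageBy (fun x -> (x - mean) ** 2.0))
        fun data -> data |> Array.map (fun x -> (x - mean) / stdDev)

// fit: apply an estimator to training data to obtain a trained transformer
let fit (estimator: Estimator) (trainingData: float[]) : Transformer = estimator trainingData

// Invented trip times in seconds
let trainingTripTimes = [| 600.0; 900.0; 1200.0; 1500.0 |]
let normalizeTripTime = fit meanVarianceEstimator trainingTripTimes

// The parameters learned during fit are reused on new data
let scaledTraining = normalizeTripTime trainingTripTimes
let scaledNew = normalizeTripTime [| 1140.0 |]
printfn "%A %A" scaledTraining scaledNew
```

Applied to the training values, the transformer produces numbers with mean 0 and standard deviation 1; a new value such as 1140 is scaled with the mean and standard deviation learned during `fit`, not recomputed from the new data.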

Building a model includes loading the data (taxi-fare-train.csv with TextLoader), transforming it so that it can be used effectively by an ML algorithm, and choosing that algorithm (StochasticDualCoordinateAscent in this case):

#load "FSharpML.fsx"


open System
open Microsoft.ML
open Microsoft.ML.Data
open FSharpML
open FSharpML.EstimatorModel
open FSharpML.TransformerModel



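// CLIMutable adds a parameterless constructor and property setters,
// which ML.NET's reflection-based schema mapping works with
[<CLIMutable>]
type TaxiTrip = {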
    [<LoadColumn(0)>] VendorId       : string
    [<LoadColumn(1)>] RateCode       : string
    [<LoadColumn(2)>] PassengerCount : float32
    [<LoadColumn(3)>] TripTime       : float32
    [<LoadColumn(4)>] TripDistance   : float32
    [<LoadColumn(5)>] PaymentType    : string
    [<LoadColumn(6)>] FareAmount     : float32
    }


//Create the MLContext to share across components for deterministic results
let mlContext = MLContext(seed = Nullable 1) // Seed set to any number so you
                                             // have a deterministic environment

// STEP 1: Common data loading configuration
let trainingData =
    IO.Path.Combine(__SOURCE_DIRECTORY__, "data", "taxi-fare-train.csv")
    |> DataModel.fromTextFileWith<TaxiTrip> mlContext ',' true
    // Remove extreme values ("outliers"): fares higher than $150 or lower than $1 are likely erroneous
    |> DataModel.appendFilterByColumn "FareAmount" 1. 150.
    |> DataModel.toDataview

let testingData =
    IO.Path.Combine(__SOURCE_DIRECTORY__, "data", "taxi-fare-test.csv")
    |> DataModel.fromTextFileWith<TaxiTrip> mlContext ',' true
    |> DataModel.toDataview


// STEP 2: Common data process configuration with pipeline data transformations
let modelbuilding = 
    EstimatorModel.create mlContext
    // Provide the "Label" column that the trainer reads, taken from FareAmount
    |> EstimatorModel.Transforms.copyColumn "Label" "FareAmount"
    // One-hot encode the categorical (string) columns
    |> EstimatorModel.Transforms.map (fun tfc -> tfc.Categorical.OneHotEncoding( "VendorIdEncoded", "VendorId") )
    |> EstimatorModel.transformBy (fun tfc -> tfc.Categorical.OneHotEncoding( "RateCodeEncoded", "RateCode") )
    |> EstimatorModel.transformBy (fun tfc -> tfc.Categorical.OneHotEncoding( "PaymentTypeEncoded", "PaymentType") )
    // Normalize the numeric columns in place (output and input column names are the same)
    |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance( "PassengerCount", "PassengerCount") )
    |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance( "TripTime", "TripTime") )
    |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance( "TripDistance", "TripDistance") )
    // Combine all feature columns into the single "Features" vector column
    |> EstimatorModel.Transforms.concatenate DefaultColumnNames.Features
                [|"VendorIdEncoded"; "RateCodeEncoded";  "PaymentTypeEncoded";  "PassengerCount";  "TripTime";  "TripDistance"|]
    // Cache the transformed data so that a multi-pass trainer such as SDCA does not recompute it
    |> EstimatorModel.appendCacheCheckpoint
    
    // Set the training algorithm (SDCA Regression algorithm)  
    |> EstimatorModel.appendBy (fun mlc -> 
        mlc.Regression.Trainers.Sdca
            (
                labelColumnName = DefaultColumnNames.Label,
                featureColumnName = DefaultColumnNames.Features
                ) )                                 

// STEP 3: Train the model by fitting the pipeline to the training data
let model =
    modelbuilding
    |> EstimatorModel.fit trainingData
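STEP 1 and STEP 2 filter out extreme fares, one-hot encode the string columns and concatenate everything into one `Features` vector. The sketch below reproduces those three ideas on a hypothetical, simplified `Trip` record with a handful of invented rows, so that the shape of the resulting feature vectors is visible. It is not how FSharpML or ML.NET implement these transforms; in particular, treating the filter bounds as the half-open range [lower, upper) and mapping unseen categories to an all-zero vector are assumptions of this sketch.

```fsharp
// Hypothetical, simplified trip record (not the TaxiTrip type above) with invented rows.
type Trip = { PaymentType: string; TripDistance: float; FareAmount: float }

let trips =
    [ { PaymentType = "CRD"; TripDistance = 3.75; FareAmount = 15.5 }
      { PaymentType = "CSH"; TripDistance = 1.2; FareAmount = 6.0 }
      { PaymentType = "CRD"; TripDistance = 0.1; FareAmount = 0.5 }     // below $1: removed
      { PaymentType = "CSH"; TripDistance = 40.0; FareAmount = 300.0 } ] // above $150: removed

// Range filter, assumed here to keep fares in the half-open range [lower, upper)
let filterByFare lower upper (rows: Trip list) =
    rows |> List.filter (fun t -> t.FareAmount >= lower && t.FareAmount < upper)

// "Fitting" a one-hot encoder learns the category vocabulary (in order of first occurrence);
// the returned function maps a value to a 0/1 vector with one slot per known category.
// Values never seen during fitting map to all zeros in this sketch.
let fitOneHot (values: string list) =
    let vocabulary = values |> List.distinct |> List.toArray
    fun (value: string) -> vocabulary |> Array.map (fun v -> if v = value then 1.0 else 0.0)

let training = filterByFare 1.0 150.0 trips
let encodePayment = fitOneHot (training |> List.map (fun t -> t.PaymentType))

// Concatenate the encoded categorical column and the numeric column into one feature vector
let toFeatures (t: Trip) = Array.append (encodePayment t.PaymentType) [| t.TripDistance |]

let featureRows = training |> List.map toFeatures
featureRows |> List.iter (printfn "%A")
```

Only the first two rows survive the filter, so the payment type vocabulary is CRD, CSH and a card trip of distance 3.75 becomes the vector [1.0; 0.0; 3.75].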

2. Evaluate and consume the model

TransformerModel is used to evaluate the model and to make predictions on independent data.

// STEP 4: Evaluate the model and show accuracy stats
let predictions = 
    model
    |> TransformerModel.transform testingData

let metrics = 
    Evaluation.Regression.evaluate testingData model

printfn "R-squared: %.3f, RMSE: %.3f" metrics.RSquared metrics.RootMeanSquaredError


//let y,yy =    
//    predictions  
//    |> Data.getColumn<float32> mlContext DefaultColumnNames.Score
//    |> Seq.take 100,
//    predictions  
//    |> Data.getColumn<float32> mlContext DefaultColumnNames.Label
//    |> Seq.take 100

//Chart.Point(y, yy)
//|> Chart.Show


// STEP 5: Consume the model and predict a taxifare sample
let taxiTripSample = 
    {
        VendorId = "VTS"
        RateCode = "1"
        PassengerCount = 1.0f
        TripTime = 1140.0f
        TripDistance = 3.75f
        PaymentType = "CRD"
        FareAmount = 0.0f // To predict. Actual/Observed = 15.5
    }

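[<CLIMutable>]
type RegressionResult = {    
    Score : float32
}

// Create a prediction function that maps a single TaxiTrip to a RegressionResult
let predict = 
    TransformerModel.createPredictionEngine<_,TaxiTrip,RegressionResult> model

let result = predict taxiTripSample
printfn "Predicted fare: %.2f (actual: 15.5)" result.Score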
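`Evaluation.Regression.evaluate` returns an ML.NET `RegressionMetrics` object, which exposes values such as `MeanAbsoluteError`, `MeanSquaredError`, `RootMeanSquaredError` and `RSquared`. The following sketch computes the textbook definitions of these four metrics for a few invented label/score pairs to show what each number measures. `regressionMetrics` is a hypothetical helper, and ML.NET's own implementation may differ in details.

```fsharp
// regressionMetrics is a hypothetical helper using textbook definitions.
let regressionMetrics (labels: float[]) (scores: float[]) =
    let n = float labels.Length
    let errors = Array.map2 (-) scores labels
    let mae = (errors |> Array.sumBy abs) / n              // mean absolute error
    let rss = errors |> Array.sumBy (fun e -> e * e)       // residual sum of squares
    let mse = rss / n                                      // mean squared error
    let rmse = sqrt mse                                    // root mean squared error
    let meanLabel = Array.average labels
    let tss = labels |> Array.sumBy (fun y -> (y - meanLabel) ** 2.0) // total sum of squares
    let rSquared = 1.0 - rss / tss                         // coefficient of determination
    mae, mse, rmse, rSquared

// Invented labels (observed fares) and scores (predicted fares)
let labels = [| 10.0; 12.0; 15.0; 7.0 |]
let scores = [| 11.0; 12.0; 13.0; 7.0 |]

let mae, mse, rmse, rSquared = regressionMetrics labels scores
printfn "MAE = %.3f, MSE = %.3f, RMSE = %.3f, R-squared = %.3f" mae mse rmse rSquared
```

RMSE is in the same unit as the label, so it reads as a typical prediction error in fare units, while R-squared compares the model with always predicting the mean label: 1.0 is a perfect fit and 0.0 is no better than the mean.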