v1.40 | Dynamic API | Up-to-date | Console app | .csv files | Price prediction | Regression | Sdca Regression
In this introductory sample, you'll see how to use FSharpML on top of ML.NET to predict taxi fares. In the world of machine learning, this type of prediction is known as regression.
This problem is centered around predicting the fare of a taxi trip in New York City. At first glance, it may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors, such as additional passengers or paying with a credit card instead of cash. This prediction can be used in applications for taxi providers to give users and drivers an estimate of ride fares.
To solve this problem, we will build an ML model that takes as inputs:
* vendor ID
* rate code
* passenger count
* trip time
* trip distance
* payment type
and predicts the fare of the ride.
The generalized problem of regression is to predict some continuous value for given parameters, for example:
* predict a house price based on number of rooms, location, year built, etc.
* predict a car's fuel consumption based on fuel type and car parameters.
* predict a time estimate for fixing an issue based on issue attributes.
The common feature of all these examples is that the value we want to predict can take any numeric value in a certain range. In other words, it is represented by an `integer` or `float`/`double` type, not by an `enum` or `boolean` type.
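
To make the idea of predicting a continuous value concrete, here is a minimal sketch of regression with a single feature, written in plain F# without ML.NET. The trip data and the `fitLine` and `predictFare` helpers are hypothetical and invented for illustration; the sample itself uses several features and the SDCA trainer rather than this closed-form fit.

```fsharp
// Hypothetical (distance in miles, fare in dollars) pairs, invented for illustration.
let trips = [ (1.0, 5.0); (2.0, 7.5); (3.0, 10.0); (4.0, 12.5) ]

// Ordinary least squares for one feature: fare = intercept + slope * distance.
let fitLine (data: (float * float) list) =
    let meanX = data |> List.averageBy fst
    let meanY = data |> List.averageBy snd
    let covariance = data |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let variance = data |> List.sumBy (fun (x, _) -> (x - meanX) * (x - meanX))
    let slope = covariance / variance
    let intercept = meanY - slope * meanX
    slope, intercept

let slope, intercept = fitLine trips

// The prediction is a continuous number, not a category.
let predictFare (distance: float) = intercept + slope * distance

printfn "Predicted fare for 2.5 miles: %.2f" (predictFare 2.5)
```

With this made-up data the fitted line has a slope of 2.5 and an intercept of 2.5, so a 2.5 mile trip is predicted at 8.75.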
To solve this problem, first we will build an ML model. Then we will train the model on existing data, evaluate how good it is, and lastly we'll consume the model to predict taxi fares.
Build and train the model
----------------------------
FSharpML contains two complementary parts, EstimatorModel and TransformerModel, which together cover the full machine learning workflow. To build an ML model and fit it to the training data, we use EstimatorModel.
Applying the 'fit' function of EstimatorModel to the training data produces a TransformerModel. It represents the trained model, which can transform other data of the same shape, and it is used in the second part to evaluate and consume the model.
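
The split between an estimator (not yet trained) and a transformer (trained) can be sketched in plain F#. This is only a conceptual illustration: the `Estimator` and `Transformer` records and the z-score scaler below are hypothetical and are not FSharpML's or ML.NET's definitions.

```fsharp
// Conceptual sketch only: these types are hypothetical, not FSharpML's own.

/// A trained transformer: applies parameters learned during fitting.
type Transformer = { Transform: float list -> float list }

/// An estimator: learns parameters from training data and returns a transformer.
type Estimator = { Fit: float list -> Transformer }

// Z-score scaling: the statistics are learned once, from the training data only.
let fitScaler (training: float list) : Transformer =
    let mean = List.average training
    let variance = training |> List.averageBy (fun x -> (x - mean) * (x - mean))
    let std = sqrt variance
    // The returned transformer captures mean and std in its closure.
    { Transform = List.map (fun x -> (x - mean) / std) }

let scaler : Estimator = { Fit = fitScaler }

// "fit" turns the estimator into a transformer ...
let trained = scaler.Fit [ 2.0; 4.0; 6.0 ]

// ... which can then be applied to other data of the same shape.
let normalized = trained.Transform [ 4.0; 8.0 ]
printfn "%A" normalized
```

The point of the design is that new data is transformed with the parameters learned from the training data, never with statistics recomputed from the new data.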
Building a model includes loading the data (`taxi-fare-train.csv` with `TextLoader`) and transforming it so it can be used effectively by an ML algorithm (`StochasticDualCoordinateAscent` in this case):

    #load "FSharpML.fsx"

    open System
    open System.IO
    open Microsoft.ML
    open Microsoft.ML.Data
    open FSharpML
    open FSharpML.EstimatorModel
    open FSharpML.TransformerModel

    // Input schema: one record field per column of the .csv files
    type TaxiTrip = {
        [<LoadColumn(0)>] VendorId : string
        [<LoadColumn(1)>] RateCode : string
        [<LoadColumn(2)>] PassengerCount : float32
        [<LoadColumn(3)>] TripTime : float32
        [<LoadColumn(4)>] TripDistance : float32
        [<LoadColumn(5)>] PaymentType : string
        [<LoadColumn(6)>] FareAmount : float32
    }

    // Create the MLContext to share across components.
    // The seed can be any number; fixing it makes the results deterministic.
    let mlContext = MLContext(seed = Nullable 1)

    // STEP 1: Common data loading configuration
    let trainingData =
        // Path.Combine inserts the directory separator that plain string concatenation would miss
        Path.Combine(__SOURCE_DIRECTORY__, "data", "taxi-fare-train.csv")
        |> DataModel.fromTextFileWith<TaxiTrip> mlContext ',' true
        // Example of removing outliers: fares above $150 or below $1 are likely erroneous data
        |> DataModel.appendFilterByColumn "FareAmount" 1. 150.
        |> DataModel.toDataview

    let testingData =
        Path.Combine(__SOURCE_DIRECTORY__, "data", "taxi-fare-test.csv")
        |> DataModel.fromTextFileWith<TaxiTrip> mlContext ',' true
        |> DataModel.toDataview

    // STEP 2: Common data process configuration with pipeline data transformations
    let modelbuilding =
        EstimatorModel.create mlContext
        |> EstimatorModel.Transforms.copyColumn "Label" "FareAmount"
        // One-hot encode the categorical (string) columns
        |> EstimatorModel.Transforms.map (fun tfc -> tfc.Categorical.OneHotEncoding("VendorIdEncoded", "VendorId"))
        |> EstimatorModel.transformBy (fun tfc -> tfc.Categorical.OneHotEncoding("RateCodeEncoded", "RateCode"))
        |> EstimatorModel.transformBy (fun tfc -> tfc.Categorical.OneHotEncoding("PaymentTypeEncoded", "PaymentType"))
        // Normalize the numeric columns in place
        |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance("PassengerCount", "PassengerCount"))
        |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance("TripTime", "TripTime"))
        |> EstimatorModel.transformBy (fun tfc -> tfc.NormalizeMeanVariance("TripDistance", "TripDistance"))
        // Combine all inputs into the single feature vector column
        |> EstimatorModel.Transforms.concatenate DefaultColumnNames.Features
            [| "VendorIdEncoded"; "RateCodeEncoded"; "PaymentTypeEncoded"; "PassengerCount"; "TripTime"; "TripDistance" |]
        |> EstimatorModel.appendCacheCheckpoint
        // Set the training algorithm (SDCA Regression algorithm)
        |> EstimatorModel.appendBy (fun mlc ->
            mlc.Regression.Trainers.Sdca(
                labelColumnName = DefaultColumnNames.Label,
                featureColumnName = DefaultColumnNames.Features))

    // STEP 3: Train the model by fitting it to the training data set
    let model =
        modelbuilding
        |> EstimatorModel.fit trainingData
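
The pipeline one-hot encodes the string columns (VendorId, RateCode, PaymentType) because the trainer works on numeric feature vectors. As a rough illustration of what such an encoding produces, here is a plain F# sketch. The `fitOneHot` helper and the payment codes other than "CRD" are hypothetical, and ML.NET's `OneHotEncoding` may order categories and handle unseen values differently.

```fsharp
// Learn the set of categories from training values and return an encoder function.
let fitOneHot (values: string list) =
    let categories = values |> List.distinct |> List.sort
    fun (value: string) ->
        // One slot per known category; unseen values give an all-zero vector in this sketch.
        categories |> List.map (fun c -> if c = value then 1.0f else 0.0f)

// Hypothetical payment type codes seen during training.
let encodePayment = fitOneHot [ "CRD"; "CSH"; "CRD"; "DIS" ]

printfn "%A" (encodePayment "CRD")
printfn "%A" (encodePayment "CSH")
```

Each category becomes its own 0/1 slot, so the model does not read an artificial ordering into the category labels.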
Evaluate and consume the model
---------------------------------
TransformerModel is used to evaluate the model and make predictions on independent data.

    // STEP 4: Evaluate the model and show accuracy stats
    let predictions =
        model
        |> TransformerModel.transform testingData

    let metrics =
        Evaluation.Regression.evaluate testingData model

    // Optional: plot predicted against observed fares for the first 100 rows
    //let y, yy =
    //    predictions
    //    |> Data.getColumn<float32> mlContext DefaultColumnNames.Score
    //    |> Seq.take 100,
    //    predictions
    //    |> Data.getColumn<float32> mlContext DefaultColumnNames.Label
    //    |> Seq.take 100
    //Chart.Point(y, yy)
    //|> Chart.Show

    // STEP 5: Consume the model and predict the fare for a sample taxi trip
    let taxiTripSample =
        {
            VendorId = "VTS"
            RateCode = "1"
            PassengerCount = 1.0f
            TripTime = 1140.0f
            TripDistance = 3.75f
            PaymentType = "CRD"
            FareAmount = 0.0f // To predict. Actual/Observed = 15.5
        }

    // Output schema: the predicted fare is read from the Score column
    [<CLIMutable>]
    type RegresionResult = {
        Score : float32
    }

    // Create a prediction function from the trained model and apply it to the sample
    let predict =
        TransformerModel.createPredictionEngine<_,TaxiTrip,RegresionResult> model

    predict taxiTripSample
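
The `metrics` value above is an ML.NET `RegressionMetrics` object, which exposes properties such as `RSquared` and `RootMeanSquaredError`. To show what these two numbers measure, the following plain F# sketch computes them by hand. The fares and predictions are hypothetical values invented for illustration, and the `rmse` and `rSquared` helpers are not part of FSharpML.

```fsharp
// Hypothetical observed fares and model predictions, invented for illustration.
let actual = [ 10.0; 15.5; 8.0; 22.0 ]
let predicted = [ 11.0; 14.5; 9.0; 21.0 ]

// Root mean squared error: typical size of a prediction error, in fare units.
let rmse (ys: float list) (yhats: float list) =
    List.map2 (fun y yhat -> (y - yhat) * (y - yhat)) ys yhats
    |> List.average
    |> sqrt

// R squared: share of the variance in the observed fares explained by the model.
let rSquared (ys: float list) (yhats: float list) =
    let mean = List.average ys
    let residual = List.map2 (fun y yhat -> (y - yhat) * (y - yhat)) ys yhats |> List.sum
    let total = ys |> List.sumBy (fun y -> (y - mean) * (y - mean))
    1.0 - residual / total

printfn "RMSE = %.3f, R^2 = %.3f" (rmse actual predicted) (rSquared actual predicted)
```

A lower RMSE and an R squared closer to 1 indicate a better fit. Here every prediction is off by exactly one dollar, so the RMSE is 1.0.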