Tutorial: spam detection
FSharpML is a lightweight, functional-friendly wrapper around the powerful ML.NET library. It is designed to let users explore ML.NET in a scriptable manner while maintaining the functional style of F#.
After installing the package via NuGet, we can load the delivered reference script and start using ML.NET in conjunction with FSharpML.
To get a feel for how this library handles ML.NET operations, we rebuild the spam detection tutorial provided by ML.NET. We start by instantiating an MLContext, the heart of the ML.NET API, which is intended to serve as a method catalog. We then use it to set a scope on data stored in a text file. The method name might be misleading: ML.NET readers are lazy, so the reading process only starts when the data is actually processed.
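The lazy behaviour described here can be illustrated without ML.NET at all. The following is a minimal sketch using a plain F# `seq`, which is likewise evaluated on demand. The names `linesRead`, `rawRows` and `lazyLines`, as well as the sample rows, are assumptions made for illustration and only stand in for a text file reader.

```fsharp
// Counts how many rows have actually been pulled from the "file".
let linesRead = ref 0

// Hypothetical stand-in for the rows of a tab-separated text file.
let rawRows =
    [ "ham\tSee you at lunch"
      "spam\tWIN a FREE prize now"
      "ham\tCall me later" ]

// Defining the sequence does not read anything yet.
let lazyLines =
    seq {
        for row in rawRows do
            linesRead.Value <- linesRead.Value + 1
            yield row.Split('\t')
    }

// Nothing has been enumerated so far, so the counter is still 0.
let readBeforeProcessing = linesRead.Value

// Only processing the data triggers the reading.
let firstTwo = lazyLines |> Seq.take 2 |> Seq.toList

// Exactly the two consumed rows have been read.
let readAfterProcessing = linesRead.Value
```

Declaring the data source is cheap in this model, and the cost of reading is paid only by the step that consumes the rows.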
Now that we have told our interactive environment about the data, we can start thinking about the model (an EstimatorChain in ML.NET jargon) we want to build. As the MLContext serves as a catalog, we use it to draw transformations that can be appended to form an estimator chain. At this point FSharpML comes into play, enabling the beloved pipelining style familiar to F# users. We now create an EstimatorChain which converts the text label to a bool, then featurizes the text, and finally adds a linear trainer.
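Pipelining becomes possible as soon as an object-style `Append` method is wrapped in a function that takes the chain as its last argument. The sketch below shows that idea with hypothetical stand-in types (`Step`, `Chain`, and the `Pipeline` module); it is not the real ML.NET or FSharpML API, and the "transformations" are simple string functions.

```fsharp
// Hypothetical stand-in: a step is a named transformation of a string.
type Step = { Name: string; Run: string -> string }

// A chain holds its steps in application order.
type Chain =
    { Steps: Step list }
    // Object-style method, comparable to an Append method on a chain class.
    member this.Append(step: Step) = { Steps = this.Steps @ [ step ] }

module Pipeline =
    let empty = { Steps = [] }
    // Function wrapper with the chain as last argument, so it can be piped.
    let append (step: Step) (chain: Chain) = chain.Append(step)
    // Applies all steps from first to last.
    let run (input: string) (chain: Chain) =
        chain.Steps |> List.fold (fun acc step -> step.Run acc) input

let trim = { Name = "trim"; Run = fun (s: string) -> s.Trim() }
let lower = { Name = "lower"; Run = fun (s: string) -> s.ToLowerInvariant() }

let chain =
    Pipeline.empty
    |> Pipeline.append trim
    |> Pipeline.append lower

// Runs "trim" first and "lower" second.
let result = chain |> Pipeline.run "  FREE Prize  "
```

Putting the chain last in `append` is the design choice that matters: it lets `|>` feed the result of one step into the next.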
This is already pretty F#-friendly, but we thought we could get even closer by relieving ourselves from carrying around our instance of the MLContext explicitly. For this we created the type EstimatorModel, which contains our EstimatorChain and the context. By calling appendBy we only have to provide a lambda expression in which we define which method of our context we want to use.
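This design can be sketched as a record that carries both the collected steps and the context, plus a function that hands the context to a lambda. All names below (`Catalog`, `SketchModel`, the `Sketch` module and its functions) are hypothetical simplifications chosen for illustration; they are not the actual FSharpML definitions.

```fsharp
// Hypothetical stand-in for a context that acts as a method catalog.
type Catalog() =
    member this.Normalize() : (string -> string) =
        fun (s: string) -> s.Trim().ToLowerInvariant()
    member this.Truncate(maxLength: int) : (string -> string) =
        fun (s: string) -> if s.Length > maxLength then s.Substring(0, maxLength) else s

// The model bundles the stages collected so far with the context they came from.
type SketchModel = { Stages: (string -> string) list; Context: Catalog }

module Sketch =
    let create (context: Catalog) = { Stages = []; Context = context }
    // The caller only names the catalog method; the context is supplied here.
    let appendBy (pick: Catalog -> (string -> string)) (model: SketchModel) =
        { model with Stages = model.Stages @ [ pick model.Context ] }
    // Applies all stages from first to last.
    let run (input: string) (model: SketchModel) =
        model.Stages |> List.fold (fun acc stage -> stage acc) input

let model =
    Sketch.create (Catalog())
    |> Sketch.appendBy (fun ctx -> ctx.Normalize())
    |> Sketch.appendBy (fun ctx -> ctx.Truncate(4))

// Normalizes first, then keeps the first four characters.
let output = model |> Sketch.run "  WINNER!!  "
```

The context is mentioned once, at creation, and every later step reaches it through the lambda argument instead of a captured variable.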
Way better. Now we can concentrate on machine learning. So let's start by fitting our EstimatorModel to the data. The return value of this process is a so-called TransformerModel, which contains the trained chain (a TransformerChain) together with the context and can be used to transform unseen data. To be able to evaluate the model afterwards, we split the data we previously put in scope into two fractions: one to train the model and a remainder to evaluate it.
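The idea of a train/test split can be shown on a plain list. This is a minimal, library-free sketch and not the splitting routine used by ML.NET or FSharpML; the function name `trainTestSplit`, the seed and the test fraction are assumptions chosen for illustration.

```fsharp
// Shuffles the rows with a fixed seed and splits them into (train, test).
let trainTestSplit (seed: int) (testFraction: float) (rows: 'a list) =
    let rng = System.Random(seed)
    // Attach one random key per row, then sort by that key to shuffle.
    let shuffled =
        rows
        |> List.map (fun row -> rng.Next(), row)
        |> List.sortBy fst
        |> List.map snd
    let testCount = int (round (float shuffled.Length * testFraction))
    let test, train = shuffled |> List.splitAt testCount
    train, test

let rows = [ 1 .. 10 ]

// With ten rows and a test fraction of 0.2, eight rows go to training and two to testing.
let train, test = trainTestSplit 1 0.2 rows
```

Fixing the seed makes the split reproducible, which helps when metrics from different runs are compared.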
Now we can examine the metrics of our model evaluation. We see an accuracy of 0.99 and might be tempted to use the model in production, so let's first test it with some examples.
As we can see, even though the accuracy on the test data set was very high, the model does not assign the correct label, true, to the second and the fourth message, which look a lot like spam. Let's examine our training data set:
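A small calculation shows why high accuracy and missed spam can go together when one class dominates. The counts below are made up for illustration and are not taken from the tutorial's data set; the type `Counts` and the helper functions are likewise hypothetical.

```fsharp
// Confusion-matrix counts for a binary classifier where "positive" means spam.
type Counts = { TP: int; FP: int; TN: int; FN: int }

// Share of all messages that received the correct label.
let accuracy (c: Counts) =
    float (c.TP + c.TN) / float (c.TP + c.FP + c.TN + c.FN)

// Share of actual spam messages that were labelled as spam.
let recall (c: Counts) =
    float c.TP / float (c.TP + c.FN)

// Illustrative numbers only: 900 ham and 100 spam messages,
// scored by a model that labels every message as ham.
let alwaysHam = { TP = 0; FP = 0; TN = 900; FN = 100 }

let hamOnlyAccuracy = accuracy alwaysHam   // 0.9
let hamOnlyRecall = recall alwaysHam       // 0.0
```

A model that never predicts spam still reaches 90 percent accuracy on these counts, so accuracy alone says little about how well spam is detected.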
The chart clearly shows that the data we learned upon is highly inhomogeneous. We have a lot more ham than spam, which is generally preferable, but our model's labeling threshold is clearly too high. Let's have a look at the precision-recall curves of our model. For this we evaluate the model with different thresholds and plot both curves.
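Evaluating at different thresholds amounts to relabelling the scored examples for each threshold and recomputing precision and recall. The following is a minimal sketch of that sweep on plain F# data. The scores, the thresholds and the function name `precisionRecall` are made up for illustration and are unrelated to the model trained in this tutorial.

```fsharp
// Each example is (score, isSpam); higher scores mean "more likely spam".
// The scores are invented for illustration.
let scored =
    [ 0.95, true
      0.60, true
      0.40, true
      0.30, false
      0.20, true
      0.10, false
      0.05, false ]

// Precision and recall when every score >= threshold is labelled spam.
let precisionRecall (threshold: float) (examples: (float * bool) list) =
    let predictedSpam = examples |> List.filter (fun (score, _) -> score >= threshold)
    let truePositives = predictedSpam |> List.filter snd |> List.length
    let actualSpam = examples |> List.filter snd |> List.length
    // Convention used here: with no predictions (or no spam) the ratio is set to 1.0.
    let precision =
        if List.isEmpty predictedSpam then 1.0
        else float truePositives / float predictedSpam.Length
    let recall =
        if actualSpam = 0 then 1.0
        else float truePositives / float actualSpam
    precision, recall

// One (threshold, precision, recall) point per threshold; these are the points to plot.
let curve =
    [ 0.1; 0.25; 0.5; 0.9 ]
    |> List.map (fun t ->
        let p, r = precisionRecall t scored
        t, p, r)
```

On this toy data a threshold of 0.9 gives a recall of 0.25, while lowering it to 0.25 raises recall to 0.75 at a precision of 0.75. A threshold that is too high misses spam in just this way.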