Binary classification: Spam Detection for Text Messages
| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|---|
| v1.40 | Dynamic API | Up-to-date | Console app | .tsv files | Spam detection | Two-class classification | SDCA (linear learner) |
In this sample, you'll see how to use FSharpML on top of ML.NET to predict whether a text message is spam. In the world of machine learning, this type of prediction is known as binary classification.
Problem
Our goal here is to predict whether a text message is spam (an irrelevant/unwanted message). We will use the SMS Spam Collection Data Set from UCI, which contains close to 6000 messages that have been classified as either "spam" or "ham" (not spam). We will use this dataset to train a model that can take in new messages and predict whether they are spam.
This is an example of binary classification, as we are classifying the text messages into one of two categories.
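To make the data shape concrete, here is a minimal sketch that parses lines in a tab-separated `label<TAB>message` layout and turns the label into a boolean. It assumes that layout for the dataset file; the `parseLine` helper and the two sample lines are invented for illustration and are not part of FSharpML or the dataset.

```fsharp
// Hypothetical helper: split one "label<TAB>message" line into (isSpam, text).
let parseLine (line: string) : (bool * string) option =
    match line.Split([| '\t' |], 2) with
    | [| "spam"; text |] -> Some (true, text)
    | [| "ham"; text |] -> Some (false, text)
    | _ -> None   // malformed line or unknown label

// Invented sample lines in the assumed layout.
let samples =
    [ "ham\tSee you at lunch?"
      "spam\tYou have won a free prize, call now!" ]

samples
|> List.choose parseLine
|> List.iter (fun (isSpam, text) -> printfn "%b: %s" isSpam text)
```

Returning an option keeps malformed lines out of the training data without throwing.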
Solution
To solve this problem, we will first build an estimator that defines the ML pipeline we want to use. Then we will train this estimator on existing data, evaluate how good it is, and finally consume the model to predict whether a few example messages are spam.
Build and train the model
-------------------------
FSharpML contains two complementary parts, EstimatorModel and TransformerModel, which together cover the full machine learning workflow. To build an ML model and fit it to the training data, we use EstimatorModel. Applying the 'fit' function in EstimatorModel to the training data produces a TransformerModel: the trained model, which can transform other data of the same shape. It is used in the second part to evaluate and consume the model.
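The two-stage idea (an untrained estimator that `fit` turns into a trained transformer) can be illustrated with a toy in plain F#. Everything here is a hypothetical stand-in: `ToyEstimator`, `ToyTransformer`, `fit`, and `transform` are not the FSharpML types or functions, and the "learning" is just collecting words seen only in spam.

```fsharp
// Lower-case a message and split it into a set of words.
let tokenize (text: string) : Set<string> =
    text.ToLowerInvariant().Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Set.ofArray

// Toy stand-ins for the two stages (NOT the FSharpML types).
type ToyEstimator = { MinWordLength: int }          // untrained configuration
type ToyTransformer = { SpamOnlyWords: Set<string> } // result of training

// "fit": estimator + labelled data -> trained transformer.
let fit (trainingData: (bool * string) list) (estimator: ToyEstimator) : ToyTransformer =
    let wordsOf isSpam =
        trainingData
        |> List.filter (fun (label, _) -> label = isSpam)
        |> List.map (snd >> tokenize)
        |> Set.unionMany
    let spamOnly =
        Set.difference (wordsOf true) (wordsOf false)
        |> Set.filter (fun w -> w.Length >= estimator.MinWordLength)
    { SpamOnlyWords = spamOnly }

// "transform": apply the trained model to new data of the same shape.
let transform (model: ToyTransformer) (message: string) : bool =
    not (Set.isEmpty (Set.intersect model.SpamOnlyWords (tokenize message)))

let model =
    { MinWordLength = 4 }
    |> fit [ (true, "free prize call now"); (false, "call me later") ]

printfn "%b" (transform model "claim your free prize")
printfn "%b" (transform model "call me")
```

Keeping configuration and trained state in separate types means a model cannot be used for prediction before it has been fitted.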
To build the estimator we will:
- Define how to read the spam dataset that will be downloaded from https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.
- Apply several data transformations:
  - Convert the label ("spam" or "ham") to a boolean ("true" represents spam) so we can use it with a binary classifier.
  - Featurize the text message into a numeric vector so a machine learning trainer can use it.
- Add a trainer (such as `StochasticDualCoordinateAscent`).
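The steps above can be sketched as an F# script that calls the underlying ML.NET API directly. This is only an approximation under stated assumptions, not the FSharpML listing: the `SpamInput` type, the column names "Label" and "Features", and the `loadData` and `train` helpers are hypothetical, and the script assumes the `Microsoft.ML` NuGet package and a tab-separated file without a header row.

```fsharp
#r "nuget: Microsoft.ML"

open System
open System.Collections.Generic
open Microsoft.ML
open Microsoft.ML.Data

// Hypothetical input row: column 0 is the text label, column 1 is the message.
[<CLIMutable>]
type SpamInput =
    { [<LoadColumn(0)>] LabelText : string
      [<LoadColumn(1)>] Message : string }

let mlContext = MLContext(seed = Nullable 1)

// Step 1: how to read the dataset (the path is supplied by the caller).
let loadData (path: string) : IDataView =
    mlContext.Data.LoadFromTextFile<SpamInput>(path, hasHeader = false, separatorChar = '\t')

// Step 2a: lookup used to convert the text label to a boolean (true = spam).
let labelMap = [ KeyValuePair("ham", false); KeyValuePair("spam", true) ]

// Steps 2a, 2b and 3 chained together: map the label, featurize the text,
// then append an SDCA logistic regression trainer.
let estimator =
    EstimatorChain()
        .Append(mlContext.Transforms.Conversion.MapValue("Label", labelMap, "LabelText"))
        .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Message"))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression("Label", "Features"))

// Fitting the estimator to the training data yields the trained model.
let train (path: string) = estimator.Fit(loadData path)
```

Nothing is read from disk until `train` is called with a path to the downloaded file.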
Evaluate and consume the model
------------------------------
TransformerModel is used to evaluate the model and to make predictions on independent data.
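To show what evaluating a binary classifier involves, the sketch below computes the usual metrics by hand from (actual, predicted) pairs. The pairs are invented for illustration, and the code is plain F#; it does not call FSharpML or ML.NET.

```fsharp
// (actual, predicted) pairs; true means spam. Invented values for illustration.
let results =
    [ (true, true); (true, false); (false, false)
      (false, false); (false, true); (true, true) ]

// Count the pairs that satisfy a predicate, as a float for later division.
let count pred = results |> List.filter pred |> List.length |> float

let truePos  = count (fun (actual, predicted) -> actual && predicted)
let falsePos = count (fun (actual, predicted) -> not actual && predicted)
let falseNeg = count (fun (actual, predicted) -> actual && not predicted)
let trueNeg  = count (fun (actual, predicted) -> not actual && not predicted)

let accuracy  = (truePos + trueNeg) / (truePos + trueNeg + falsePos + falseNeg)
let precision = truePos / (truePos + falsePos)   // how many flagged messages were spam
let recall    = truePos / (truePos + falseNeg)   // how much of the spam was flagged
let f1        = 2.0 * precision * recall / (precision + recall)

printfn "Accuracy %.2f, precision %.2f, recall %.2f, F1 %.2f" accuracy precision recall f1
```

For spam filtering, precision is often the number to watch, since a false positive hides a legitimate message.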