Binary classification: Sentiment Analysis for User Reviews
==========================================================

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|---|
| v1.40 | Dynamic API | README.md updated | Console app | .tsv files | Sentiment Analysis | Two-class classification | Linear Classification |

In this introductory sample, you'll see how to use FSharpML on top of ML.NET to predict a sentiment (positive or negative) for customer reviews. In the world of machine learning, this type of prediction is known as binary classification.

Problem
-------
This problem is centered around predicting whether a customer's review has positive or negative sentiment. We will use the wikipedia-detox datasets (one dataset for training and a second one for evaluating the model's accuracy), which were labeled by humans; each comment has been assigned a sentiment label:

- 0: negative
- 1: positive

Using those datasets we will build a model that will analyze a string and predict a sentiment value of 0 or 1.

ML task - Binary classification
-------------------------------
The generalized problem of binary classification is to classify items into one of two classes (classifying items into more than two classes is called multiclass classification). For example:

- predict whether an insurance claim is valid or not.
- predict whether a plane will be delayed or will arrive on time.
- predict whether a face ID (photo) belongs to the owner of a device.

The common feature of all these examples is that the parameter we want to predict can take only one of two values. In other words, this value can be represented by the boolean type.
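
Because the label can take only two values, a natural F# representation is a `bool`. The sketch below assumes a hypothetical `Review` record and a hypothetical `labelOfInt` helper (neither is part of FSharpML or of this sample) that convert the datasets' 0/1 labels into booleans and reject any other value.

```fsharp
// Hypothetical record for one labelled review (not the sample's own type).
// The label is a bool because a binary classifier predicts one of two values.
type Review = { Label: bool; Text: string }

/// Hypothetical helper: converts a dataset label (0 = negative, 1 = positive) into a bool.
let labelOfInt (value: int) : bool =
    match value with
    | 1 -> true
    | 0 -> false
    | other -> invalidArg "value" (sprintf "expected 0 or 1, got %d" other)

let review = { Label = labelOfInt 1; Text = "Works exactly as described." }
printfn "%A" review
```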

Solution
--------
To solve this problem, first we will build an ML model. Then we will train the model on existing data, evaluate how good it is, and lastly we'll consume the model to predict a sentiment for new reviews.

Build and train the model
-------------------------
FSharpML consists of two complementary parts, EstimatorModel and TransformerModel, which together cover the full machine learning workflow. To build an ML model and fit it to the training data, we use EstimatorModel. Applying the `fit` function of EstimatorModel to the training data yields a TransformerModel, which represents the trained model: it can transform other data of the same shape and is used in the second part to evaluate and consume the model.
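
The fit-then-transform split can be illustrated without ML.NET. In this hedged sketch the `Estimator` and `Transformer` records, `tokenize` and `bagOfWords` are hypothetical stand-ins, not FSharpML's EstimatorModel and TransformerModel: fitting a bag-of-words estimator on training texts learns a vocabulary and returns a transformer that maps any text to a word-count vector. It also hints at what featurizing text means, although ML.NET's `FeaturizeText` does more by default (for example, it also extracts character n-grams and normalizes the resulting vector).

```fsharp
// Hypothetical stand-ins for the estimator/transformer split;
// these are NOT FSharpML's EstimatorModel/TransformerModel types.
type Transformer<'In, 'Out> = { Transform: 'In -> 'Out }
type Estimator<'Data, 'In, 'Out> = { Fit: 'Data -> Transformer<'In, 'Out> }

/// Lower-cases a text and splits it on whitespace and basic punctuation.
let tokenize (text: string) : string[] =
    text.ToLowerInvariant().Split([| ' '; '\t'; '.'; ','; '!'; '?' |], System.StringSplitOptions.RemoveEmptyEntries)

/// Hypothetical bag-of-words estimator: Fit learns a sorted vocabulary from the
/// training texts and returns a transformer mapping a text to word counts.
let bagOfWords : Estimator<string list, string, float[]> =
    let fit (trainingTexts: string list) : Transformer<string, float[]> =
        let vocabulary =
            trainingTexts
            |> List.collect (tokenize >> List.ofArray)
            |> List.distinct
            |> List.sort
            |> List.toArray
        let index = vocabulary |> Array.mapi (fun i word -> word, i) |> dict
        let transform (text: string) : float[] =
            let counts = Array.create vocabulary.Length 0.0
            for word in tokenize text do
                // Words that were not seen during Fit are ignored.
                match index.TryGetValue word with
                | true, i -> counts.[i] <- counts.[i] + 1.0
                | _ -> ()
            counts
        { Transform = transform }
    { Fit = fit }

// Fit on training texts, then transform new text of the same shape.
let featurizer = bagOfWords.Fit [ "Great movie"; "Terrible movie"; "great acting" ]
// Vocabulary: acting, great, movie, terrible; "plot" is unknown and ignored.
printfn "%A" (featurizer.Transform "great great plot") // [|0.0; 2.0; 0.0; 0.0|]
```

Keeping the learned state (the vocabulary) inside the returned transformer mirrors the idea that the fitted model, not the estimator, is what gets applied to new data.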
Building a model includes:

- Defining the data's schema, mapped to the datasets to read (`wikipedia-detox-250-line-data.tsv` and `wikipedia-detox-250-line-test.tsv`), with a DataReader.
- Creating an Estimator that transforms the data into numeric vectors so it can be used effectively by an ML algorithm (with `FeaturizeText`).
- Choosing a trainer/learning algorithm (such as `FastTree`) to train the model with.

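
The first step, defining the schema, amounts to telling the reader which column holds the label and which holds the text. The sketch below does the same by hand. The `SentimentRow` record, the `parseRow`/`parseTsv` helpers and the column layout (label in the first column, text in the second, one header line) are assumptions for illustration, not the sample's actual schema or a guaranteed description of the detox files.

```fsharp
// Hypothetical in-memory schema for one row. ASSUMPTION: label in column 0,
// text in column 1, and one header line at the top of the file.
type SentimentRow = { Label: bool; Text: string }

/// Parses one tab-separated line; returns None for malformed rows
/// (wrong column count or a label other than 0/1).
let parseRow (line: string) : SentimentRow option =
    match line.Split('\t') with
    | [| label; text |] ->
        match label.Trim() with
        | "1" -> Some { Label = true; Text = text }
        | "0" -> Some { Label = false; Text = text }
        | _ -> None
    | _ -> None

/// Parses the lines of a TSV file, skipping the header line.
let parseTsv (lines: string seq) : SentimentRow list =
    match List.ofSeq lines with
    | [] -> []
    | _header :: rows -> List.choose parseRow rows

// For a real file: System.IO.File.ReadLines "wikipedia-detox-250-line-data.tsv" |> parseTsv
let rows = parseTsv [ "Label\tText"; "1\tI love it"; "0\tNot good at all" ]
printfn "%d rows parsed" rows.Length
```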

Evaluate and consume the model
------------------------------
TransformerModel is used to evaluate the model and make predictions on independent data.
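
Evaluation compares the model's output on the test dataset with the true labels. To illustrate what two common metrics measure (not how FSharpML or ML.NET computes them), this sketch computes accuracy and the area under the ROC curve from hypothetical `(label, score)` pairs; ML.NET reports these as `Accuracy` and `AreaUnderRocCurve` in its binary classification metrics. The `accuracy` and `auc` functions are illustrative helpers.

```fsharp
/// Accuracy: the fraction of examples whose predicted label (score > threshold)
/// matches the true label. Returns nan for an empty list.
let accuracy (threshold: float) (examples: (bool * float) list) : float =
    let correct =
        examples
        |> List.filter (fun (label, score) -> (score > threshold) = label)
        |> List.length
    float correct / float (List.length examples)

/// Area under the ROC curve: the probability that a randomly chosen positive
/// example scores higher than a randomly chosen negative one (ties count 0.5).
let auc (examples: (bool * float) list) : float =
    let positives = examples |> List.filter fst |> List.map snd
    let negatives = examples |> List.filter (fst >> not) |> List.map snd
    // 1 when the positive outranks the negative, 0.5 on a tie, 0 otherwise.
    let pairScore (p: float) (n: float) =
        if p > n then 1.0 elif p = n then 0.5 else 0.0
    let total = List.sum [ for p in positives do for n in negatives -> pairScore p n ]
    total / float (List.length positives * List.length negatives)

// (true label, model score) pairs
let scored = [ (true, 2.1); (true, 0.4); (false, -1.3); (false, 0.7) ]
printfn "accuracy = %.2f, AUC = %.2f" (accuracy 0.0 scored) (auc scored) // accuracy = 0.75, AUC = 0.75
```

Accuracy depends on the chosen threshold, while AUC looks only at how the scores rank positives against negatives, which is why both are usually reported.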

The sample defines two record types:

- `SentimentIssue`, with fields `Label` and `Text`: the type representing the text to run sentiment analysis on. It uses ML.NET's `LoadColumnAttribute` to map its fields to dataset columns.
- A prediction record with fields `Prediction`, `Probability` and `Score`: the result of a sentiment prediction. It uses ML.NET's `ColumnNameAttribute` to map a field to a named column.
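
The prediction record pairs a raw `Score` with a `Probability` and a boolean `Prediction`. As a hedged illustration, the sketch derives both from a score with a hypothetical `PredictionResult` record and hypothetical `logistic`/`toPrediction` helpers: a score above 0 is read as positive (ML.NET's default threshold for binary classifiers), and a plain logistic function stands in for the calibrator ML.NET fits during training, so its probabilities will differ from those of a real calibrated model.

```fsharp
// Hypothetical record mirroring the Prediction/Probability/Score fields.
type PredictionResult = { Prediction: bool; Probability: float; Score: float }

/// Logistic function; a stand-in for the calibrator ML.NET fits during training.
let logistic (score: float) : float = 1.0 / (1.0 + exp (-score))

/// Builds a prediction from a raw score, reading scores above 0 as positive.
let toPrediction (score: float) : PredictionResult =
    { Prediction = score > 0.0
      Probability = logistic score
      Score = score }

printfn "%A" (toPrediction 1.2)
```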

FSharpML and ML.NET members used in the sample:

- `MLContext(?seed)`: the ML.NET context, optionally seeded with a `Nullable<int>`; it exposes catalogs such as `BinaryClassification`, `Data` and `Model`.
- `EstimatorModel<'a>` (module `FSharpML.EstimatorModel`): a record `{ EstimatorChain: EstimatorChain<'a>; Context: MLContext }`, where `'a :> ITransformer`.
- `TransformerModel<'a>` (module `FSharpML.TransformerModel`): a record `{ TransformerChain: TransformerChain<'a>; Context: MLContext }`, where `'a :> ITransformer`.
- `FeaturizeText(outputColumnName, options, inputColumnNames)`: ML.NET's text featurizer, an extension on `TransformsCatalog.TextTransforms`.
- `FastTree(?labelColumnName, ?featureColumnName, ?exampleWeightColumnName, ?numberOfLeaves, ?numberOfTrees, ?minimumExampleCountPerLeaf, ?learningRate)`: ML.NET's FastTree binary trainer, an extension on `BinaryClassificationCatalog.BinaryClassificationTrainers`.
- `BinaryClassification.evaluateWith(?Label, ?Score, ?Probability, ?PredictedLabel)` (module `FSharpML.TransformerModel.Evaluation`): returns a function `IDataView -> TransformerModel<'a> -> CalibratedBinaryClassificationMetrics`. Its counterpart `evaluateNonCalibratedWith(?Label, ?Score, ?PredictedLabel)` returns `BinaryClassificationMetrics`.