Binary classification: Sentiment Analysis for User Reviews

| ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|---|---|---|---|---|---|---|---|
| v1.40 | Dynamic API | README.md updated | Console app | .tsv files | Sentiment Analysis | Two-class classification | Linear Classification |

In this introductory sample, you'll see how to use FSharpML on top of ML.NET to predict a sentiment (positive or negative) for customer reviews. In the world of machine learning, this type of prediction is known as binary classification.

Problem

The problem is to predict whether a customer's review has positive or negative sentiment. We will use the wikipedia-detox datasets (one dataset for training and a second for evaluating the model's accuracy). The comments were labeled by humans, and each one has been assigned a sentiment label:

  • 0 - negative
  • 1 - positive

Using those datasets we will build a model that will analyze a string and predict a sentiment value of 0 or 1.

ML task - Binary classification

The generalized problem of binary classification is to classify items into one of two classes (classifying items into more than two classes is called multiclass classification). For example:

  • predict if an insurance claim is valid or not.
  • predict if a plane will be delayed or will arrive on time.
  • predict if a face ID (photo) belongs to the owner of a device.

The common feature of all these examples is that the value we want to predict can take only one of two values. In other words, it can be represented by a boolean type.

Solution

To solve this problem, first we will build an ML model. Then we will train the model on existing data, evaluate how good it is, and lastly we'll consume the model to predict a sentiment for new reviews.

1. Build and train the model

FSharpML contains two complementary parts, EstimatorModel and TransformerModel, which together cover the full machine learning workflow. To build an ML model and fit it to the training data, we use EstimatorModel. Applying the 'fit' function in EstimatorModel to the training data yields a TransformerModel, which represents the trained model. It can transform other data of the same shape and is used in the second part to evaluate and consume the model.
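The estimator/transformer split described above can be pictured with a small, library-free sketch. Everything in it (ToyEstimator, ToyTransformer, fitThreshold) is a hypothetical name made up for illustration and is not part of FSharpML or ML.NET; the "learning" step is just a threshold between two class means.

```fsharp
// Toy illustration of the estimator -> fit -> transformer pattern.
// All names are hypothetical and unrelated to FSharpML's real types.

/// A trained model: turns an input value into a prediction.
type ToyTransformer = { Transform : float -> bool }

/// An untrained model: knows how to learn a ToyTransformer from labeled data.
type ToyEstimator = { Fit : (float * bool) list -> ToyTransformer }

/// Learns a threshold halfway between the mean of the negative examples
/// and the mean of the positive examples.
let fitThreshold (trainingData : (float * bool) list) : ToyTransformer =
    let meanOf label =
        trainingData
        |> List.filter (fun (_, l) -> l = label)
        |> List.averageBy fst
    let threshold = (meanOf false + meanOf true) / 2.0
    { Transform = fun x -> x > threshold }

let thresholdEstimator = { Fit = fitThreshold }

let toyTrainingData = [ (1.0, false); (2.0, false); (8.0, true); (9.0, true) ]

// "fit" consumes training data and returns the trained transformer.
let trained = thresholdEstimator.Fit toyTrainingData

// The transformer can then be applied to values it has not seen before.
let toyPredictions = [ 0.5; 7.0 ] |> List.map trained.Transform
printfn "%A" toyPredictions // [false; true]
```

The point of the split is that nothing can be predicted until fit has run, and the types make that explicit: only the value returned by Fit has a Transform function.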

Building a model includes:

  • Define the data's schema, mapped to the datasets to read (wikipedia-detox-250-line-data.tsv and wikipedia-detox-250-line-test.tsv), with a DataReader

  • Create an Estimator and transform the data into numeric vectors so that an ML algorithm can use it effectively (with FeaturizeText)

  • Choose a trainer/learning algorithm (such as FastTree) to train the model with.
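To make "transform the data into numeric vectors" more concrete, here is a minimal bag-of-words sketch in plain F#. It is a deliberately simplified stand-in and not how FeaturizeText works internally; the helper names (tokenize, buildVocabulary, featurize) and the sample sentences are assumptions made for this illustration.

```fsharp
// Toy bag-of-words featurizer. NOT the FeaturizeText implementation;
// all function names here are hypothetical.

/// Lower-cases the text and splits it into words.
let tokenize (text : string) =
    text.ToLowerInvariant().Split([| ' '; '.'; ','; '!'; '?' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.toList

/// Builds a sorted vocabulary from all training texts.
let buildVocabulary (texts : string list) =
    texts |> List.collect tokenize |> List.distinct |> List.sort

/// Maps a text to a vector of word counts, one slot per vocabulary word.
/// Words that are not in the vocabulary are ignored.
let featurize (vocabulary : string list) (text : string) : float32[] =
    let counts = tokenize text |> List.countBy id |> Map.ofList
    vocabulary
    |> List.map (fun word -> counts |> Map.tryFind word |> Option.defaultValue 0 |> float32)
    |> List.toArray

let vocabulary = buildVocabulary [ "You are great"; "You are rude, very rude" ]
let features = featurize vocabulary "Very rude, you are"

printfn "%A" vocabulary
printfn "%A" features
```

Each text becomes a fixed-length array of numbers, which is the shape a trainer needs as its feature column.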

#load "../../FSharpML.fsx"


open System
open Microsoft.ML
open Microsoft.ML.Data
open FSharpML
open FSharpML.EstimatorModel
open FSharpML.TransformerModel



/// Type representing the text to run sentiment analysis on.
[<CLIMutable>] 
type SentimentIssue = 
    { 
        [<LoadColumn(0)>]
        Label : bool

        [<LoadColumn(1)>]
        Text : string 
    }

/// Result of sentiment prediction.
[<CLIMutable>]
type SentimentPrediction = 
    { 
        // ColumnName attribute is used to change the column name from
        // its default value, which is the name of the field.
        [<ColumnName("PredictedLabel")>]
        Prediction : bool

        // No need to specify ColumnName attribute, because the field
        // name "Probability" is the column name we want.
        Probability : float32

        Score : float32 
    }


//Create the MLContext to share across components for deterministic results
let mlContext = MLContext(seed = Nullable 1) // Seed set to any number so you
                                             // have a deterministic environment

// STEP 1: Common data loading configuration
let fullData = 
    System.IO.Path.Combine(__SOURCE_DIRECTORY__, "data", "wikipedia-detox-250-line-all.tsv")
    |> DataModel.fromTextFileWith<SentimentIssue> mlContext '\t' true 

let trainingData, testingData = 
    fullData
    |> DataModel.trainTestSplit 0.2

// STEP 2: Process data, create and train the model
let model = 
    EstimatorModel.create mlContext
    // Process data transformations in pipeline
    |> EstimatorModel.appendBy (fun mlc -> mlc.Transforms.Text.FeaturizeText("Features" , "Text"))
    // Create the model
    |> EstimatorModel.appendBy (fun mlc -> mlc.BinaryClassification.Trainers.FastTree(labelColumnName = "Label", featureColumnName = "Features"))
    // Train the model
    |> EstimatorModel.fit trainingData.Dataview

// STEP 3: Run the prediction on the test data
let predictions =
    model
    |> TransformerModel.transform testingData.Dataview

2. Evaluate and consume the model

TransformerModel is used to evaluate the model and to make predictions on independent data.
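As a rough picture of what evaluating a binary classifier involves, the sketch below computes accuracy, precision, recall, and F1 score from pairs of actual and predicted labels. It is a hand-rolled illustration under simple assumptions, not the code ML.NET or FSharpML runs; the type and function names are hypothetical, the label pairs are invented, and there is no guard against empty input.

```fsharp
// Hand-rolled binary classification metrics, for illustration only.
// Names and sample data are hypothetical; this is not the ML.NET evaluator.

/// Counts of each outcome when comparing predictions with true labels.
type ConfusionCounts = { TP : int; FP : int; TN : int; FN : int }

/// Updates the counts with one (actual label, predicted label) pair.
let addOutcome (c : ConfusionCounts) (actual : bool, predicted : bool) =
    match actual, predicted with
    | true, true -> { c with TP = c.TP + 1 }
    | false, true -> { c with FP = c.FP + 1 }
    | false, false -> { c with TN = c.TN + 1 }
    | true, false -> { c with FN = c.FN + 1 }

let countOutcomes (pairs : (bool * bool) list) =
    pairs |> List.fold addOutcome { TP = 0; FP = 0; TN = 0; FN = 0 }

/// Fraction of all predictions that were correct.
let accuracy (c : ConfusionCounts) =
    float (c.TP + c.TN) / float (c.TP + c.FP + c.TN + c.FN)

/// Fraction of predicted positives that were actually positive.
let precision (c : ConfusionCounts) = float c.TP / float (c.TP + c.FP)

/// Fraction of actual positives that were predicted positive.
let recall (c : ConfusionCounts) = float c.TP / float (c.TP + c.FN)

/// Harmonic mean of precision and recall.
let f1Score (c : ConfusionCounts) =
    let p, r = precision c, recall c
    2.0 * p * r / (p + r)

// Invented (actual, predicted) pairs, not output from the sentiment model.
let counts =
    countOutcomes [ (true, true); (true, false); (false, false); (false, true); (true, true) ]

printfn "Accuracy: %.2f, Precision: %.2f, Recall: %.2f, F1: %.2f"
    (accuracy counts) (precision counts) (recall counts) (f1Score counts)
```

Accuracy alone can mislead when one class is much more frequent than the other, which is why precision, recall, and F1 are usually reported alongside it.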

// STEP4: Evaluate accuracy of the model
let metrics = 
    model
    |> Evaluation.BinaryClassification.evaluateWith(Label=DefaultColumnNames.Label,Score=DefaultColumnNames.Score) testingData.Dataview



let sampleStatement = { Label = false; Text = "This is a very rude movie" }
// STEP5: Create prediction engine function related to the loaded trained model
let predict = 
    TransformerModel.createPredictionEngine<_,SentimentIssue,SentimentPrediction> model

// Score
let prediction = predict sampleStatement
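
Once a prediction is available, its fields can be reported in a readable form. The sketch below is self-contained, so it uses a stand-in record (ToyPrediction) with the same field names as SentimentPrediction and made-up values instead of real model output; the describe helper is a hypothetical name.

```fsharp
// Stand-in for SentimentPrediction so this snippet runs without ML.NET.
// The record name, the helper, and the values below are all hypothetical.
type ToyPrediction = { Prediction : bool; Probability : float32; Score : float32 }

/// Formats the input text together with the predicted label and probability.
let describe (text : string) (p : ToyPrediction) =
    sprintf "Text: %s | Predicted label: %b | Probability: %.2f" text p.Prediction p.Probability

// Made-up values, not actual output of the trained model.
let example = { Prediction = true; Probability = 0.87f; Score = 1.5f }

printfn "%s" (describe "This is a very rude movie" example)
```

With the real model, the same formatting could be applied to the value returned by predict, since that record exposes the same three fields.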