Visualizing and understanding relationships between sets plays an important role in data analysis. A widely used visualization method is the Venn diagram, but Venn diagrams are limited in their capability: two, three, or even four sets can be drawn readably, but the diagrams become hard to draw and read with more sets. To address this issue, Lex et al. introduced UpSet plots in 2014. In this blog post I will demonstrate how to visualize sets with Venn diagrams and UpSet plots. I use a data frame containing information about movies as the source for our sets. The comparisons will be based on the genres of the movies.
#r "nuget: Deedle, 2.3.0"
#r "nuget: Plotly.NET.Interactive, 2.0.0"
#r "nuget: FSharp.Stats, 0.4.1"
#r "nuget: FSharpAux"
#r "nuget: BioFSharp.Vis, 3.0.1"
open Deedle
open Plotly.NET
open Plotly.NET.Interactive
open Plotly.NET.LayoutObjects
open FSharp.Stats
open BioFSharp.Vis
open BioFSharp.Vis.UpSet
open BioFSharp.Vis.Venn
open System
open System.IO
// do fsi.AddPrinter(fun (printer:Deedle.Internal.IFsiFormattable) -> "\n" + (printer.Format()))
let movieFrame =
    let path =
        let parDir = (Directory.GetCurrentDirectory() |> DirectoryInfo).Parent.Parent.FullName
        Path.Combine(parDir, "files", "movies.csv")
    Frame.ReadCsv(path, separators = ";")
    |> Frame.sliceCols ["Name";"Action";"Comedy";"Drama";"AvgRating"]
    |> Frame.filterRows (fun k s ->
        s.GetAs<bool>("Action") ||
        s.GetAs<bool>("Comedy") ||
        s.GetAs<bool>("Drama")
    )
let getSetByGenre (category: string) (frame: Frame<int,string>) =
    frame
    |> Frame.indexRowsUsing(fun s ->
        {|
            Name = s.GetAs<string>("Name");
            Genre = s.GetAs<bool>(category)
        |}
    )
    |> fun f -> f.RowKeys
    |> Seq.toArray
    |> Array.filter (fun x -> x.Genre = true)
    |> Array.map (fun x -> x.Name)
    |> Set.ofArray
let getScoreMap (frame: Frame<int,string>) =
    frame
    |> Frame.indexRowsUsing(fun s ->
        s.GetAs<string>("Name"),
        s.GetAs<float>("AvgRating")
    )
    |> fun f -> f.RowKeys
    |> Map.ofSeq
movieFrame.Format()
index | Name | Action | Comedy | Drama | AvgRating |
---|---|---|---|---|---|
0 | Toy Story (1995) | False | True | False | 4.15 |
2 | Grumpier Old Men (1995) | False | True | False | 3.02 |
3 | Waiting to Exhale (1995) | False | True | True | 2.73 |
4 | Father of the Bride Part II (1995) | False | True | False | 3.01 |
5 | Heat (1995) | True | False | False | 3.88 |
6 | Sabrina (1995) | False | True | False | 3.41 |
8 | Sudden Death (1995) | True | False | False | 2.66 |
9 | GoldenEye (1995) | True | False | False | 3.54 |
10 | American President, The (1995) | False | True | True | 3.79 |
11 | Dracula: Dead and Loving It (1995) | False | True | False | 2.36 |
13 | Nixon (1995) | False | False | True | 3.54 |
14 | Cutthroat Island (1995) | True | False | False | 2.46 |
15 | Casino (1995) | False | False | True | 3.79 |
16 | Sense and Sensibility (1995) | False | False | True | 4.03 |
18 | Ace Ventura: When Nature Calls (1995) | False | True | False | 2.48 |
: | ... | ... | ... | ... | ... |
3851 | Beach Party (1963) | False | True | False | 2.64 |
3852 | Bikini Beach (1964) | False | True | False | 2.59 |
3854 | Pajama Party (1964) | False | True | False | 2.92 |
3855 | Stranger Than Paradise (1984) | False | True | False | 3.85 |
3858 | Abbott and Costello Meet Frankenstein (1948) | False | True | False | 3.44 |
3859 | Bank Dick, The (1940) | False | True | False | 3.99 |
3866 | Phantom of the Opera, The (1943) | False | False | True | 3.72 |
3873 | Bamboozled (2000) | False | True | False | 3.05 |
3874 | Bootmen (2000) | False | True | True | 2.11 |
3876 | Get Carter (2000) | True | False | True | 2.26 |
3878 | Meet the Parents (2000) | False | True | False | 3.64 |
3879 | Requiem for a Dream (2000) | False | False | True | 4.12 |
3880 | Tigerland (2000) | False | False | True | 3.67 |
3881 | Two Family House (2000) | False | False | True | 3.9 |
3882 | Contender, The (2000) | False | False | True | 3.78 |
A Venn diagram uses simple closed shapes to represent sets. Those shapes are often circles or ellipses. Let's start with a simple comparison of two sets using circles as our shape. For that we take the genres action and comedy and determine their intersection:
let actionSet =
    movieFrame
    |> getSetByGenre "Action"
let comedySet =
    movieFrame
    |> getSetByGenre "Comedy"
let intersectionCount =
    Venn.ofSetList [|"Action";"Comedy"|] [|actionSet;comedySet|]
    |> Venn.toVennCount
intersectionCount
key | value |
---|---|
Action | 438 |
Action&Comedy | 65 |
Comedy | 1135 |
union | 1638 |
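The four counts in the table are related in a simple way: `Action` and `Comedy` count only the movies exclusive to that genre, so 438 + 65 + 1135 = 1638. A minimal sketch with plain F# `Set` functions shows how such exclusive counts can be derived. The toy sets and their element names are placeholders rather than data from the movie frame, and the sketch illustrates the idea only, not the implementation in BioFSharp.Vis:

```fsharp
// Toy sets; the element names are placeholders for illustration only.
let toyAction = set ["A"; "B"; "C"]
let toyComedy = set ["C"; "D"]

// Elements found in both sets
let both = Set.intersect toyAction toyComedy
// Elements exclusive to one of the sets
let onlyAction = Set.difference toyAction toyComedy
let onlyComedy = Set.difference toyComedy toyAction
// All elements of both sets
let unionSet = Set.union toyAction toyComedy

// The three disjoint regions add up to the size of the union
printfn "Action: %i, Action&Comedy: %i, Comedy: %i, union: %i"
    onlyAction.Count both.Count onlyComedy.Count unionSet.Count
```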
Now we can start building our Venn diagram with Plotly. First of all we need to create two shapes for the circles at the correct position and put them in a layout.
let axis =
    LinearAxis.init(
        ShowTickLabels = false,
        ShowGrid = false,
        ZeroLine = false
    )
let circleAction =
    Shape.init(
        Opacity = 0.3,
        Xref = "x",
        Yref = "y",
        Fillcolor = Color.fromKeyword Red,
        X0 = 0,
        Y0 = 0,
        X1 = 2,
        Y1 = 2,
        ShapeType = StyleParam.ShapeType.Circle,
        Line = Line.init(Color = Color.fromKeyword Red)
    )
let circleComedy =
    Shape.init(
        Opacity = 0.3,
        Xref = "x",
        Yref = "y",
        Fillcolor = Color.fromKeyword Blue,
        X0 = 1.5,
        Y0 = 0,
        X1 = 3.5,
        Y1 = 2,
        ShapeType = StyleParam.ShapeType.Circle,
        Line = Line.init(Color = Color.fromKeyword Blue)
    )
let layout =
    Layout.init(
        Shapes = [circleAction;circleComedy],
        Margin =
            Margin.init(
                Left = 20,
                Right = 20,
                Bottom = 100
            )
    )
    |> Layout.updateLinearAxisById(StyleParam.SubPlotId.XAxis 1, axis)
    |> Layout.updateLinearAxisById(StyleParam.SubPlotId.YAxis 1, axis)
Next, we need some text to describe our sets and intersection counts. This can be achieved via Chart.Scatter.
let vennChart =
    Trace2D.initScatter(
        Trace2DStyle.Scatter(
            X = [|1.; 2.5; 1.75|],
            Y = [|1.; 1.; 1.|],
            Mode = StyleParam.Mode.Text,
            MultiText = ["Action<br>438";"Comedy<br>1135";"65"],
            TextFont =
                Font.init (
                    Family = StyleParam.FontFamily.Arial,
                    Size = 18.,
                    Color = Color.fromString "black"
                )
        )
    )
    |> GenericChart.ofTraceObject true
    |> Chart.withSize (400.,400.)
We can now complete our Venn diagram by adding our previously created layout to the chart:
vennChart
|> Chart.withLayout layout
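The label coordinates used above are not arbitrary: each set label sits at the center of its circle, and the intersection count sits in the middle of the overlap. The following sketch derives them from the `X0`/`Y0`/`X1`/`Y1` values of the two shapes. The `Box` record and the helpers `center` and `overlapCenter` are hypothetical names introduced for this illustration and are not part of Plotly.NET:

```fsharp
// Hypothetical helper type mirroring the X0/Y0/X1/Y1 bounding box of a circle shape.
type Box = { X0: float; Y0: float; X1: float; Y1: float }

let actionBox = { X0 = 0.0; Y0 = 0.0; X1 = 2.0; Y1 = 2.0 }
let comedyBox = { X0 = 1.5; Y0 = 0.0; X1 = 3.5; Y1 = 2.0 }

// Center of a bounding box, which is also the center of the circle drawn inside it
let center (b: Box) = (b.X0 + b.X1) / 2.0, (b.Y0 + b.Y1) / 2.0

// Midpoint of the horizontal overlap of two circles placed at the same height
let overlapCenter (left: Box) (right: Box) =
    (right.X0 + left.X1) / 2.0, (left.Y0 + left.Y1) / 2.0

let labelPositions =
    [| center actionBox; center comedyBox; overlapCenter actionBox comedyBox |]
// labelPositions = [|(1.0, 1.0); (2.5, 1.0); (1.75, 1.0)|]
```

Computing the positions this way keeps the labels in place when the circles are moved or resized.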
This required a lot of manual formatting. Luckily, BioFSharp.Vis contains chart extensions for Venn diagrams with two and three sets.
let dramaSet =
    movieFrame
    |> getSetByGenre "Drama"
Chart.Venn (
    [|"Action";"Comedy";"Drama"|],
    [|actionSet;comedySet;dramaSet|]
)
Since Venn diagrams with more than three sets are increasingly difficult to model and read, BioFSharp.Vis also includes UpSet plots. UpSet plots consist of three basic parts. The first is a matrix representing the intersections between sets. Each row corresponds to a set and each column to an intersection. Sets that are part of a particular intersection are marked with a filled-in dot and connected by a line. We can try to create the intersection matrix for the three sets used in the previous Venn diagram. We start again by computing the intersections.
let intersections = Venn.ofSetList [|"Action";"Comedy";"Drama"|] [|actionSet;comedySet;dramaSet|]
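Conceptually, every element ends up in exactly one region, identified by the labels of all sets that contain it. The following sketch reproduces that idea with plain F# collections on toy data. `exclusiveRegions` is a hypothetical helper written for this illustration; it makes no claim about how `Venn.ofSetList` is implemented:

```fsharp
// Hypothetical helper: groups every element by the list of set labels that contain it.
let exclusiveRegions (labels: string []) (sets: Set<string> []) =
    sets
    |> Set.unionMany
    |> Seq.groupBy (fun element ->
        // Labels of all sets containing the current element
        Array.zip labels sets
        |> Array.filter (fun (_, s) -> Set.contains element s)
        |> Array.map fst
        |> Array.toList
    )
    |> Seq.map (fun (region, elements) -> region, Set.ofSeq elements)
    |> Map.ofSeq

// Toy data; the element names are placeholders.
let toyRegions =
    exclusiveRegions
        [| "Action"; "Comedy"; "Drama" |]
        [| set ["a"; "b"; "c"]; set ["b"; "c"; "d"]; set ["c"; "e"] |]

toyRegions
|> Map.iter (fun region elements -> printfn "%A -> %A" region elements)
```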
Now we need the sets that are part of each intersection. We also need a row position for each set in the matrix.
let intersectingSets =
    intersections
    |> Map.toArray
    |> Array.map (snd >> (fun v -> v.Label))
    |> Array.filter (List.isEmpty >> not)
intersectingSets
index | value |
---|---|
0 | [ Action ] |
1 | [ Action, Comedy ] |
2 | [ Action, Comedy, Drama ] |
3 | [ Action, Drama ] |
4 | [ Comedy ] |
5 | [ Comedy, Drama ] |
6 | [ Drama ] |
let setPositions =
    [|
        "Action", 0
        "Comedy", 1
        "Drama" , 2
    |]
With this information we can create the first column of the intersection matrix:
let createIntersectionMatrixPart (setPos: (string*int)[]) (iSet: string list) (position: int) =
    // Creates the part of the intersection matrix representing the current intersection.
    // The position on the y-Axis is based on the order the labels and sets are given in.
    // The position on the x-Axis is based on the given position (determined by intersection size).
    UpSetParts.createIntersectionPlotPart
        position
        iSet
        setPos
        25
        (Color.fromKeyword DarkBlue)
        (Color.fromKeyword LightBlue)
let intersectionMatrixPart =
    createIntersectionMatrixPart
        setPositions
        intersectingSets.[0]
        0
intersectionMatrixPart
We can apply this function now to all intersections and add the correct labels to the rows:
let intersectionMatrix =
    intersectingSets
    |> Array.mapi (fun i iS ->
        createIntersectionMatrixPart
            setPositions
            iS
            i
    )
    |> Chart.combine
    // Axis styling
    |> Chart.withYAxis (
        LinearAxis.init(
            ShowGrid=false,
            ShowLine=false,
            ShowTickLabels=true,
            ZeroLine=false,
            TickMode=StyleParam.TickMode.Array,
            TickVals=[0 .. setPositions.Length - 1],
            TickText=(setPositions |> Array.map fst)
        )
    )
    |> Chart.withXAxis (
        LinearAxis.init(
            ShowGrid=false,
            ShowLine=false,
            ShowTickLabels=false,
            ZeroLine=false,
            Domain=StyleParam.Range.MinMax (0.4,1.)
        )
    )
    |> Chart.withLegend false
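Stripped of all styling, the intersection matrix is a grid of booleans: one row per set, one column per intersection, and a filled dot wherever the set takes part in the intersection. A minimal sketch on hard-coded labels, using a hypothetical helper `membershipMatrix` that is not part of BioFSharp.Vis:

```fsharp
// Hypothetical helper: rows = sets, columns = intersections, true = filled dot.
let membershipMatrix (setLabels: string []) (intersectionLabels: string list []) =
    Array2D.init setLabels.Length intersectionLabels.Length (fun row col ->
        intersectionLabels.[col] |> List.contains setLabels.[row]
    )

let toyMatrix =
    membershipMatrix
        [| "Action"; "Comedy"; "Drama" |]
        [| ["Action"]; ["Action"; "Comedy"]; ["Action"; "Comedy"; "Drama"]; ["Comedy"; "Drama"] |]

// Print one text row per set: "x" marks a filled dot, "." an empty one
for row in 0 .. Array2D.length1 toyMatrix - 1 do
    [ for col in 0 .. Array2D.length2 toyMatrix - 1 ->
        if toyMatrix.[row, col] then "x" else "." ]
    |> String.concat " "
    |> printfn "%s"
```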
The next part is a bar chart representing the size of each set. The bar for each set gets placed next to the row representing the set in the matrix.
let setSizeBar =
    // Creates a bar chart with the set sizes
    UpSetParts.createSetSizePlot
        (setPositions |> Array.map fst)
        [|actionSet;comedySet;dramaSet|]
        2.5
        (Color.fromKeyword DarkBlue)
        (0.,0.3)
        (Font.init(StyleParam.FontFamily.Arial, Size=20.))
[
    setSizeBar
    intersectionMatrix
]
|> Chart.Grid (1,2)
|> Chart.withSize (900.,600.)
Lastly, we come to the third basic part: a bar chart representing the size of each intersection, which is placed atop the column representing that intersection.
let intersectionCounts =
    intersections
    |> Map.toArray
    |> Array.map (fun (_,labelSet) ->
        labelSet.Label, labelSet.Set.Count
    )
    |> Array.filter (fun (id,_) -> not id.IsEmpty)
let intersectionSizeBar =
    // Creates a bar chart with the intersection sizes
    UpSetParts.createIntersectionSizePlots
        intersectionCounts
        (float intersectionCounts.Length - 0.5)
        (Color.fromKeyword DarkBlue)
        (0.4, 1.)
        (Font.init(StyleParam.FontFamily.Arial, Size=20.))
[|
    Chart.Invisible()
    intersectionSizeBar
    setSizeBar
    intersectionMatrix
|]
|> Chart.Grid(2,2)
|> Chart.withSize (900.,600.)
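The comment in `createIntersectionMatrixPart` mentions that the x-position of a column can be determined by intersection size. A sketch of how such positions could be derived is shown here. The counts are made-up placeholders, not results from the movie data, and the sorting step is an assumption about one reasonable way to do it:

```fsharp
// Made-up (label, size) pairs; the counts are placeholders.
let toyCounts =
    [|
        ["Action"], 12
        ["Action"; "Comedy"], 3
        ["Comedy"], 30
        ["Comedy"; "Drama"], 7
    |]

// Largest intersection first; the array index becomes the x-position of the column
let positioned =
    toyCounts
    |> Array.sortByDescending snd
    |> Array.mapi (fun position (label, size) -> position, label, size)

positioned
|> Array.iter (fun (position, label, size) ->
    printfn "x = %i: %s (%i)" position (String.concat "&" label) size
)
```

If positions are assigned this way, the same order has to be used for the matrix columns and the intersection size bars so that they stay aligned.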
We now have a basic UpSet plot. There is also a chart extension for UpSet plots in BioFSharp.Vis.
Chart.UpSet(
    [|"Action";"Comedy";"Drama"|],
    [|actionSet;comedySet;dramaSet|]
)
|> Chart.withSize (1400, 800)
|> Chart.withTemplate ChartTemplates.light
The UpSet plot can be augmented by different charts representing features of the intersections. We just need a map connecting set elements to the feature and a charting function with a title:
Chart.UpSet(
    [|"Action";"Comedy";"Drama"|],
    [|actionSet;comedySet;dramaSet|],
    [|(getScoreMap movieFrame)|],
    [|(fun y -> Chart.BoxPlot(Y = y)),"Score"|]
)
|> Chart.withSize (1400., 800.)
|> Chart.withTemplate ChartTemplates.light
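The idea behind the feature charts is a simple lookup: for each intersection, the feature values of its elements are collected from the map and handed to the charting function. A sketch with toy data and a hypothetical helper `featureValues`, which only illustrates the principle and is not the library's code:

```fsharp
// Hypothetical helper: collects the feature values of all elements of one intersection.
// Elements without an entry in the map are skipped.
let featureValues (featureMap: Map<string, float>) (elements: Set<string>) =
    elements
    |> Seq.choose (fun element -> Map.tryFind element featureMap)
    |> Seq.toArray

// Toy data; names and scores are placeholders.
let toyScores = Map.ofList [ "a", 4.0; "b", 2.5; "c", 3.0 ]
let toyIntersection = set ["a"; "c"; "x"]

// Holds the scores of "a" and "c"; "x" has no score and is dropped
let scores = featureValues toyScores toyIntersection
```

An array like `scores` is what a charting function such as `fun y -> Chart.BoxPlot(Y = y)` would receive for one intersection.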
In principle, we can plot several different features with individual charts for our intersections. We are also less limited in the number of sets than with Venn diagrams: although an UpSet plot also becomes more complex as the number of sets grows, it remains far more readable than a Venn diagram. Here is a small example: