The Jaccard similarity index is calculated as: Jaccard similarity = (number of observations in both sets) / (number of observations in either set). The measurement emphasizes similarity between finite sample sets, and is formally defined as the size of the intersection divided by the size of the union of the sample sets.

Let the contingency table of binary data be n11 = a, n10 = b, n01 = c and n00 = d. All these distances are of the form D = sqrt(1 - s), with s a similarity coefficient. 1 = Jaccard index (1901), S3 coefficient of Gower & Legendre: s1 = a / (a + b + c).

The Jaccard coefficient takes a value in [0, 1], with zero indicating that the two shapes are completely dissimilar and one indicating identical shapes.

Text file one: Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1.

I have two binary data frames (values 0 and 1), and I didn't find any method that calculates the Jaccard similarity coefficient between the two data frames. I have only seen methods that do this calculation between the columns of a single data frame.

DF1 <- data.frame(a=c(0,0,1,0), b=c(1,0,1,0), c=c(1,1,1,1))
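The contingency-table formula s1 = a / (a + b + c) and the distance sqrt(1 - s) can be illustrated with a minimal sketch. It is written in Python rather than R, the function name `binary_jaccard` is hypothetical, and treating two all-zero vectors as identical is an assumption, not something the text specifies.

```python
import math

def binary_jaccard(x, y):
    """Return (similarity, distance) for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # n11
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # n10
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # n01
    if a + b + c == 0:
        # Assumption: two all-zero vectors are treated as identical.
        s = 1.0
    else:
        s = a / (a + b + c)
    return s, math.sqrt(1 - s)

# Columns a and b of DF1 from the question.
col_a = [0, 0, 1, 0]
col_b = [1, 0, 1, 0]
similarity, distance = binary_jaccard(col_a, col_b)
print(similarity, round(distance, 4))  # 0.5 0.7071
```

Note that d (the count of positions where both values are 0) never enters the formula: joint absences do not count as agreement. For two data frames of the same shape, one option would be to apply such a function to each pair of matching columns.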

For two shapes S1 and S2, the Jaccard coefficient is J(S1, S2) = |R(S1) ∩ R(S2)| / |R(S1) ∪ R(S2)|, where R(S) is the region enclosed by contour S, and |R| is the area of the region R. For open shapes, the first and last landmarks are connected to enclose the region.
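As a rough illustration of the area-based definition, the regions can be approximated by rasterised masks, so that area becomes a pixel count. This is only a sketch under that assumption; the function name `mask_jaccard` and the two example grids are hypothetical, and an exact computation would intersect the polygons themselves.

```python
def mask_jaccard(mask1, mask2):
    """Jaccard coefficient of two same-sized 0/1 grids (lists of rows)."""
    inter = union = 0
    for row1, row2 in zip(mask1, mask2):
        for p, q in zip(row1, row2):
            inter += bool(p and q)  # pixel lies inside both regions
            union += bool(p or q)   # pixel lies inside at least one region
    # Assumption: two empty regions are treated as identical.
    return inter / union if union else 1.0

# Two hypothetical 4x4 rasterised shapes: a 2x2 square and the same
# square shifted one column to the right.
shape1 = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
shape2 = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(mask_jaccard(shape1, shape2))  # 2 shared pixels out of 6 in the union
```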

The code below leverages this to quickly calculate the Jaccard index without having to store the intermediate matrices in memory. The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. It was developed by Paul Jaccard, who originally gave it the French name coefficient de communauté, and was independently formulated again by T. Tanimoto.

Simplest index, developed to compare regional floras (e.g., Jaccard 1912, The distribution of the flora of the alpine zone, New Phytologist 11:37-50); widely used to assess similarity of quadrats.

This package computes the Jaccard index based on n-grams for strings. It can be used as a metric for the similarity between two strings.

jaccard.R
# Written in 2012 by Joona Lehtomäki
# To the extent possible under law, the author(s) have dedicated all
# copyright and related and neighboring rights to this software to
# the public domain worldwide.
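The n-gram approach to string similarity mentioned above could look roughly like the following Python sketch. It is not the contents of jaccard.R or of the package described; the helper names `char_ngrams` and `ngram_jaccard` and the default of character bigrams (n = 2) are assumptions made for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_jaccard(s1, s2, n=2):
    """Jaccard index of the n-gram sets of two strings."""
    grams1, grams2 = char_ngrams(s1, n), char_ngrams(s2, n)
    union = grams1 | grams2
    if not union:
        # Assumption: strings too short to have any n-gram count as identical.
        return 1.0
    return len(grams1 & grams2) / len(union)

# "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}: 1 shared of 7.
print(ngram_jaccard("night", "nacht"))
```

Because the n-grams are collected into sets, repeated n-grams within a string count only once.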

The function is specifically useful for detecting population stratification in rare variant sequencing data.

Text file two: Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2 Ppargc1b Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1.
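For the two gene lists quoted in this text (text file one and text file two), the set-based definition can be applied directly. The sketch below is in Python and inlines the lists as strings instead of reading them from files, which is an assumption about how the data would be loaded.

```python
file_one = "Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1"
file_two = ("Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2 Ppargc1b "
            "Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1")

genes_one = set(file_one.split())
genes_two = set(file_two.split())

shared = genes_one & genes_two    # genes present in both lists
combined = genes_one | genes_two  # genes present in either list
jaccard = len(shared) / len(combined)

print(sorted(shared))                       # ['Fgl1', 'Nop58', 'Serpina4-ps1']
print(len(shared), len(combined), jaccard)  # 3 24 0.125
```

Ugt2b38 and Ugt2b37 are different identifiers, so they do not count as shared.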

The correct value is 8 / (12 + 23 + 8) = 0.186. A Jaccard index of 0.73, by the same reading, means that the elements the two sets share make up 73% of all elements found in either set.
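The arithmetic above can be checked with a few lines of Python. The sketch assumes that 8 is the number of shared elements and that 12 and 23 are the elements found only in the first and only in the second set; the function name `jaccard_from_counts` is hypothetical.

```python
def jaccard_from_counts(shared, only_first, only_second):
    """Jaccard index from the overlap size and the two exclusive counts."""
    return shared / (shared + only_first + only_second)

print(round(jaccard_from_counts(8, 12, 23), 3))  # 0.186
```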

The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, estimating cluster stability using the Jaccard similarity index, and providing rich visualizations.

In this blog post, I outline how you can calculate the Jaccard similarity between documents stored in two pandas columns.
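One way to picture cluster stability measured with the Jaccard index is sketched below in Python. This is not the scclusteval API: the function names, the dictionary layout and the choice of taking the best match per original cluster are all assumptions made for illustration.

```python
def jaccard(set1, set2):
    """Jaccard index of two sets (two empty sets count as identical)."""
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 1.0

def cluster_stability(original, reclustered):
    """For each original cluster, the best Jaccard match after re-clustering.

    Both arguments map a cluster label to a set of cell identifiers.
    Original clusters are first restricted to the cells that were subsampled.
    """
    sampled_cells = set().union(*reclustered.values())
    return {
        label: max(jaccard(cells & sampled_cells, new_cells)
                   for new_cells in reclustered.values())
        for label, cells in original.items()
    }

# Hypothetical data: cluster A is recovered intact in the subsample,
# cluster B is split into two smaller clusters.
original = {"A": {"c1", "c2", "c3", "c4"}, "B": {"c5", "c6", "c7", "c8"}}
reclustered = {"x": {"c1", "c2", "c3"}, "y": {"c5", "c6"}, "z": {"c7"}}
print(cluster_stability(original, reclustered))
```

Here cluster A scores 1.0 and cluster B scores 2/3, reflecting that B was split. Repeating this over many subsamples would give a distribution of scores per cluster.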

Jaccard P. (1908) Nouvelles recherches sur la distribution florale. Bull. Soc. Vaudoise Sci. Nat. 44: 223-270.

Real R. & Vargas J.M. (1996) The Probabilistic Basis of Jaccard's Index of Similarity. Systematic Biology 45(3): 380-385.

He Z. & Yu W. (2010) Stable feature selection for biomarker discovery. Computational Biology and Chemistry 34: 215-225.