Package com.enriquegrodrigo.spark.crowd.methods

Value Members

  1. object CATD

    Provides functions for transforming an annotation dataset into a standard label dataset using the CATD algorithm.

    This algorithm only works with continuous label datasets of type types.RealAnnotation.

    The algorithm returns a CATD.CATDModel with information about the true label estimation for each example and the annotator weights (a comparison with a majority-voting baseline is sketched after this entry).

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.CATD
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = CATD(annData.as[RealAnnotation])
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Q. Li, Y. Li, J. Gao, L. Su, B. Zhao, M. Demirbas, W. Fan, and J. Han. A confidence-aware approach for truth discovery on long-tail data. PVLDB, 8(4):425–436, 2014.
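
    A quick sanity check for the continuous estimates is to compare them against the simple per-example mean computed by MajorityVoting.transformReal. The sketch below is illustrative only: it assumes the prediction Datasets returned by getMu and transformReal both follow a two-column (example, value) layout, which should be verified against the actual spark-crowd schemas.

      import com.enriquegrodrigo.spark.crowd.methods.{CATD, MajorityVoting}
      import com.enriquegrodrigo.spark.crowd.types._
      import org.apache.spark.sql.functions._
      sc.setCheckpointDir("checkpoint")
      val annData = spark.read.parquet("data/real-ann.parquet").as[RealAnnotation]
      //CATD estimates vs. the per-example mean used by majority voting
      //(column renames assume a two-column (example, value) layout)
      val catdPred = CATD(annData).getMu().toDF("example", "catd")
      val mvPred = MajorityVoting.transformReal(annData).toDF("example", "mv")
      //Mean absolute difference between the two estimators
      catdPred.join(mvPred, "example")
        .select(avg(abs(col("catd") - col("mv"))).alias("meanAbsDiff"))
        .show()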

  2. object CGlad

    Provides functions for transforming an annotation dataset into a standard label dataset using the CGlad algorithm.

    This algorithm only works with types.BinaryAnnotation datasets.

    The algorithm returns a types.CGladModel, with information such as the true label estimation, the annotator precision, and the cluster difficulty.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.CGlad
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = CGlad(annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision
      val annprec = mode.getAnnotatorPrecision()
      //Cluster difficulty
      val clustdif = mode.getClusterDifficulty()
      //Cluster assigned to each example
      val clusters = mode.getClusters()
    Version

    0.2.1

  3. object DawidSkene

    Provides functions for transforming an annotation dataset into a standard label dataset using the DawidSkene algorithm.

    This algorithm only works with types.MulticlassAnnotation datasets, although one can easily apply it to types.BinaryAnnotation data by re-typing it with Spark's Dataset as method (a conversion sketch follows this entry).

    It returns a types.DawidSkeneModel with information about the estimation of the true class, as well as the annotator quality and the log-likelihood obtained by the model.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
      import com.enriquegrodrigo.spark.crowd.types._
      val exampleFile = "data/multi-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile).as[MulticlassAnnotation]
      //Applying the learning algorithm
      val mode = DawidSkene(exampleData)
      //Get MulticlassLabel with the class predictions
      val pred = mode.getMu().as[MulticlassLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Dawid, Alexander Philip, and Allan M. Skene. "Maximum likelihood estimation of observer error-rates using the EM algorithm." Applied statistics (1979): 20-28.
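
    Because a binary problem is a two-class multiclass problem, a types.BinaryAnnotation dataset can be re-typed and passed to DawidSkene. A minimal sketch, assuming BinaryAnnotation and MulticlassAnnotation share the same example, annotator and value fields (worth verifying against your spark-crowd version):

      import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
      import com.enriquegrodrigo.spark.crowd.types._
      //Read binary annotations and re-type them as multiclass
      //(assumes both case classes share the example, annotator and value fields)
      val binData = spark.read.parquet("data/binary-ann.parquet").as[BinaryAnnotation]
      val multiData = binData.toDF.as[MulticlassAnnotation]
      //DawidSkene then treats the two binary values as two classes
      val model = DawidSkene(multiData)
      val pred = model.getMu().as[MulticlassLabel]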

  4. object Glad

    Provides functions for transforming an annotation dataset into a standard label dataset using the Glad algorithm.

    This algorithm only works with types.BinaryAnnotation datasets.

    The algorithm returns a types.GladModel, with information about the true label estimation, the annotator precision, the instance difficulty, and the log-likelihood of the model.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.Glad
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = Glad(annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision
      val annprec = mode.getAnnotatorPrecision()
      //Instance difficulty
      val instdif = mode.getInstanceDifficulty()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Whitehill, Jacob, et al. "Whose vote should count more: Optimal integration of labels from labelers of unknown expertise." Advances in neural information processing systems. 2009.

  5. object IBCC

    Provides functions for transforming an annotation dataset into a standard label dataset using the IBCC algorithm.

    This algorithm only works with multiclass target variables (Datasets of type types.MulticlassAnnotation).

    The algorithm returns an IBCC.IBCCModel with information about the true label estimation, the annotator precision, and the class prior estimation.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.IBCC
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = IBCC(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Class prior estimation
      val classPrior = mode.getClassPrior()
    Version

    0.2.0

    See also

    H.-C. Kim and Z. Ghahramani. Bayesian classifier combination. In AISTATS, pages 619–627, 2012.

  6. object MajorityVoting

    Provides functions for transforming an annotation dataset into a standard label dataset using the majority voting approach.

    This object provides several functions for applying majority-voting-style algorithms to annotation datasets (Spark Datasets of type types.BinaryAnnotation, types.MulticlassAnnotation, or types.RealAnnotation). For the discrete types (types.BinaryAnnotation, types.MulticlassAnnotation) the method selects the most frequent class; for the continuous type, it uses the mean.

    The object also provides methods for estimating class probabilities for the discrete types: for the binary case, the proportion of positive annotations; for the multiclass case, the proportion of annotations for each class (a plain Spark SQL sketch of the binary case follows this entry).

    The next example can be found in the examples folder of the project.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.MajorityVoting
      import com.enriquegrodrigo.spark.crowd.types._
      val exampleFile = "data/binary-ann.parquet"
      val exampleFileMulti = "data/multi-ann.parquet"
      val exampleFileCont = "data/cont-ann.parquet"
      val exampleDataBinary = spark.read.parquet(exampleFile).as[BinaryAnnotation]
      val exampleDataMulti = spark.read.parquet(exampleFileMulti).as[MulticlassAnnotation]
      val exampleDataCont = spark.read.parquet(exampleFileCont).as[RealAnnotation]
      //Applying the learning algorithm
      //Binary class
      val muBinary = MajorityVoting.transformBinary(exampleDataBinary)
      val muBinaryProb = MajorityVoting.transformSoftBinary(exampleDataBinary)
      //Multiclass
      val muMulticlass = MajorityVoting.transformMulticlass(exampleDataMulti)
      val muMulticlassProb = MajorityVoting.transformSoftMulti(exampleDataMulti)
      //Continuous case
      val muCont = MajorityVoting.transformReal(exampleDataCont)
    Version

    0.1.3
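
    For intuition, the soft binary estimate is simply the per-example proportion of positive annotations, which can be reproduced with plain Spark SQL. A minimal sketch, assuming the binary annotation data exposes example and value (0/1) columns:

      import org.apache.spark.sql.functions._
      val ann = spark.read.parquet("data/binary-ann.parquet")
      //Proportion of positive annotations per example, i.e. the quantity
      //estimated by MajorityVoting.transformSoftBinary
      val softMajority = ann.groupBy("example").agg(avg(col("value")).alias("prob"))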

  7. object PM

    Provides functions for transforming an annotation dataset into a standard label dataset using the PM algorithm.

    This algorithm only works with continuous target variables. Thus you need an annotation dataset of type types.RealAnnotation.

    The algorithm returns a PM.PMModel with information about the true label estimation for each example and the annotator weights.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.PM
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = PM(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Q. Li, Y. Li, J. Gao, B. Zhao, W. Fan, and J. Han. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In SIGMOD, pages 1187–1198, 2014.

  8. object PMTI

    Provides functions for transforming an annotation dataset into a standard label dataset using the modified version of the PM algorithm described in the paper Truth Inference in Crowdsourcing: Is the Problem Solved?

    This algorithm only works with continuous target variables. Thus you need an annotation dataset of type types.RealAnnotation.

    The algorithm returns a PMTI.PMModel with information about the true label estimation for each example and the annotator weights.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.PMTI
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = PMTI(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, and Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved? PVLDB 10(5): 541-552, 2017.

  9. object RaykarBinary

    Provides functions for transforming an annotation dataset into a standard label dataset using the RaykarBinary algorithm.

    This algorithm only works with types.BinaryAnnotation datasets. There are versions for the types.MulticlassAnnotation (RaykarMulti) and types.RealAnnotation (RaykarCont).

    It will return a types.RaykarBinaryModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the logistic regression model learned and the log-likelihood of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators. A sketch for turning the soft predictions into hard labels follows this entry.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarBinary
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/binary-data.parquet"
      val annFile = "data/binary-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = RaykarBinary(exampleData, annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
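
    If hard 0/1 predictions are needed, the soft estimates can be thresholded. Continuing from the example above, a minimal sketch that assumes BinarySoftLabel follows a two-column (example, value) layout, which should be checked against the actual schema:

      import org.apache.spark.sql.functions._
      //Threshold the soft predictions at 0.5 to obtain hard labels
      //(column renames assume a two-column (example, value) layout)
      val hard = pred.toDF("example", "prob")
        .withColumn("label", when(col("prob") >= 0.5, 1).otherwise(0))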

  10. object RaykarCont

    Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for continuous target variables.

    This algorithm only works with types.RealAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.MulticlassAnnotation (RaykarMulti).

    It will return a types.RaykarContModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the linear regression model learned and the MAE of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarCont
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/cont-data.parquet"
      val annFile = "data/cont-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[RealAnnotation]
      //Applying the learning algorithm
      val mode = RaykarCont(exampleData, annData)
      //Get RealLabel with the label predictions
      val pred = mode.getMu().as[RealLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.

  11. object RaykarMulti

    Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for multiclass target variables.

    This algorithm only works with types.MulticlassAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.RealAnnotation (RaykarCont).

    It will return a types.RaykarMultiModel with information about the estimation of the ground truth for each example (a probability for each class), the annotator precision estimation of the model, the weights of the one-vs-all logistic regression models learned (one per class), and the log-likelihood of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarMulti
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/multi-data.parquet"
      val annFile = "data/multi-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[MulticlassAnnotation]
      //Applying the learning algorithm
      val mode = RaykarMulti(exampleData, annData)
      //Get MulticlassSoftProb with the class probabilities
      val pred = mode.getMu().as[MulticlassSoftProb]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
