Provides functions for transforming an annotation dataset into a standard label dataset using the CGlad algorithm.
This algorithm only works with types.BinaryAnnotation datasets.
The algorithm returns a types.CGladModel, with information such as the true label estimation for each example, the annotator precision, and the cluster difficulty.
import com.enriquegrodrigo.spark.crowd.methods.CGlad
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/binary-ann.parquet"
val annData = spark.read.parquet(annFile).as[BinaryAnnotation]

//Applying the learning algorithm
val model = CGlad(annData)

//Get BinarySoftLabel with the class predictions
val pred = model.getMu().as[BinarySoftLabel]

//Annotator precision
val annprec = model.getAnnotatorPrecision()

//Cluster difficulty
val clustDif = model.getClusterDifficulty()

//Cluster for each example
val clusters = model.getClusters()
0.2.1
Provides functions for transforming an annotation dataset into a standard label dataset using the DawidSkene algorithm.
This algorithm only works with types.MulticlassAnnotation datasets, although one can easily use it for types.BinaryAnnotation datasets through Spark Dataset methods.
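As a sketch of such a conversion, a types.BinaryAnnotation dataset could be mapped into a types.MulticlassAnnotation one before applying the algorithm. This assumes both case classes share the fields example, annotator, and value; check the types package before relying on these names.

```scala
//Hypothetical conversion sketch: map each binary annotation into a
//multiclass annotation with the same example, annotator, and value fields.
val binData = spark.read.parquet("data/binary-ann.parquet").as[BinaryAnnotation]
val multiData = binData.map(a => MulticlassAnnotation(a.example, a.annotator, a.value))
```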
It returns a types.DawidSkeneModel with information about the estimation of the true class, as well as the annotator quality and the log-likelihood obtained by the model.
import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
import com.enriquegrodrigo.spark.crowd.types._

val exampleFile = "data/multi-ann.parquet"
val exampleData = spark.read.parquet(exampleFile).as[MulticlassAnnotation]

//Applying the learning algorithm
val model = DawidSkene(exampleData)

//Get MulticlassLabel with the class predictions
val pred = model.getMu().as[MulticlassLabel]

//Annotator precision matrices
val annprec = model.getAnnotatorPrecision()

//Log-likelihood of the model
val like = model.getLogLikelihood()
0.1.5
Dawid, Alexander Philip, and Allan M. Skene. "Maximum likelihood estimation of observer error-rates using the EM algorithm." Applied statistics (1979): 20-28.
Provides functions for transforming an annotation dataset into a standard label dataset using the Glad algorithm.
This algorithm only works with types.BinaryAnnotation datasets.
The algorithm returns a types.GladModel, with information about the true label estimation for each example, the annotator precision, the instance difficulty, and the log-likelihood of the model.
import com.enriquegrodrigo.spark.crowd.methods.Glad
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/binary-ann.parquet"
val annData = spark.read.parquet(annFile).as[BinaryAnnotation]

//Applying the learning algorithm
val model = Glad(annData)

//Get BinarySoftLabel with the class predictions
val pred = model.getMu().as[BinarySoftLabel]

//Annotator precision
val annprec = model.getAnnotatorPrecision()

//Instance difficulty
val instDif = model.getInstanceDifficulty()

//Log-likelihood of the model
val like = model.getLogLikelihood()
0.1.5
Whitehill, Jacob, et al. "Whose vote should count more: Optimal integration of labels from labelers of unknown expertise." Advances in neural information processing systems. 2009.
Provides functions for transforming an annotation dataset into a standard label dataset using the IBCC algorithm.
This algorithm only works with multiclass target variables (datasets of type types.MulticlassAnnotation).
The algorithm returns an IBCC.IBCCModel, with information about the true label estimation for each example, the annotator precision, and the class prior estimation.
import com.enriquegrodrigo.spark.crowd.methods.IBCC
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/binary-ann.parquet"
val annData = spark.read.parquet(annFile)

//Applying the learning algorithm
val model = IBCC(annData)

//Get the class predictions
val pred = model.getMu()

//Annotator precision matrices
val annprec = model.getAnnotatorPrecision()

//Class prior estimation
val classPrior = model.getClassPrior()
0.2.0
H.-C. Kim and Z. Ghahramani. Bayesian classifier combination. In AISTATS, pages 619–627, 2012.
Provides functions for transforming an annotation dataset into a standard label dataset using the majority voting approach
This object provides several functions for applying majority voting style algorithms to annotation datasets (Spark Datasets of type types.BinaryAnnotation, types.MulticlassAnnotation, or types.RealAnnotation). For the discrete types (types.BinaryAnnotation, types.MulticlassAnnotation) the method chooses the most frequent class. For the continuous type, the mean is used.
The object also provides methods for estimating class probabilities for the discrete types, computing, in the binary case, the proportion of the positive class and, in the multiclass case, the proportion of each class.
The next example can be found in the examples folder of the project.
import com.enriquegrodrigo.spark.crowd.methods.MajorityVoting
import com.enriquegrodrigo.spark.crowd.types._

val exampleFile = "data/binary-ann.parquet"
val exampleFileMulti = "data/multi-ann.parquet"
val exampleFileCont = "data/cont-ann.parquet"

val exampleDataBinary = spark.read.parquet(exampleFile).as[BinaryAnnotation]
val exampleDataMulti = spark.read.parquet(exampleFileMulti).as[MulticlassAnnotation]
val exampleDataCont = spark.read.parquet(exampleFileCont).as[RealAnnotation]

//Applying the learning algorithm

//Binary class
val muBinary = MajorityVoting.transformBinary(exampleDataBinary)
val muBinaryProb = MajorityVoting.transformSoftBinary(exampleDataBinary)

//Multiclass
val muMulticlass = MajorityVoting.transformMulticlass(exampleDataMulti)
val muMulticlassProb = MajorityVoting.transformSoftMulti(exampleDataMulti)

//Continuous case
val muCont = MajorityVoting.transformReal(exampleDataCont)
0.1.3
Provides functions for transforming an annotation dataset into a standard label dataset using the PM algorithm.
This algorithm only works with continuous target variables, so you need an annotation dataset of type types.RealAnnotation.
The algorithm returns a PM.PMModel, with information about the true label estimation for each example and the annotator weights.
import com.enriquegrodrigo.spark.crowd.methods.PM
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/real-ann.parquet"
val annData = spark.read.parquet(annFile)

//Applying the learning algorithm
val model = PM(annData)

//Get the label estimations
val pred = model.getMu()

//Annotator weights
val annweights = model.getAnnotatorWeights()
0.2.0
Q. Li, Y. Li, J. Gao, B. Zhao, W. Fan, and J. Han. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In SIGMOD, pages 1187–1198, 2014.
Provides functions for transforming an annotation dataset into a standard label dataset using the modified version of the PM algorithm in the paper Truth Inference in Crowdsourcing: Is the problem solved?.
This algorithm only works with continuous target variables, so you need an annotation dataset of type types.RealAnnotation.
The algorithm returns a PMTI.PMModel, with information about the true label estimation for each example and the annotator weights.
import com.enriquegrodrigo.spark.crowd.methods.PMTI
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/real-ann.parquet"
val annData = spark.read.parquet(annFile)

//Applying the learning algorithm
val model = PMTI(annData)

//Get the label estimations
val pred = model.getMu()

//Annotator weights
val annweights = model.getAnnotatorWeights()
0.2.0
Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, and Reynold Cheng. "Truth Inference in Crowdsourcing: Is the Problem Solved?" PVLDB 10(5): 541-552, 2017.
Provides functions for transforming an annotation dataset into a standard label dataset using the RaykarBinary algorithm
This algorithm only works with types.BinaryAnnotation datasets. There are versions for the types.MulticlassAnnotation (RaykarMulti) and types.RealAnnotation (RaykarCont).
It will return a types.RaykarBinaryModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the logistic regression model learned and the log-likelihood of the model.
The next example can be found in the examples folders. In it, the user may also find an example of how to add prior confidence on the annotators.
import com.enriquegrodrigo.spark.crowd.methods.RaykarBinary
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val exampleFile = "data/binary-data.parquet"
val annFile = "data/binary-ann.parquet"

val exampleData = spark.read.parquet(exampleFile)
val annData = spark.read.parquet(annFile).as[BinaryAnnotation]

//Applying the learning algorithm
val model = RaykarBinary(exampleData, annData)

//Get BinarySoftLabel with the class predictions
val pred = model.getMu().as[BinarySoftLabel]

//Annotator precision matrices
val annprec = model.getAnnotatorPrecision()

//Log-likelihood of the model
val like = model.getLogLikelihood()
0.1.5
Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for continuous target variables.
This algorithm only works with types.RealAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.MulticlassAnnotation (RaykarMulti).
It will return a types.RaykarContModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the linear regression model learned and the MAE of the model.
The next example can be found in the examples folders. In it, the user may also find an example of how to add prior confidence on the annotators.
import com.enriquegrodrigo.spark.crowd.methods.RaykarCont
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val exampleFile = "data/cont-data.parquet"
val annFile = "data/cont-ann.parquet"

val exampleData = spark.read.parquet(exampleFile)
val annData = spark.read.parquet(annFile).as[RealAnnotation]

//Applying the learning algorithm
val model = RaykarCont(exampleData, annData)

//Get RealLabel with the label estimations
val pred = model.getMu().as[RealLabel]

//Annotator precision estimates
val annprec = model.getAnnotatorPrecision()

//Log-likelihood of the model
val like = model.getLogLikelihood()
0.1.5
Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for multiclass
This algorithm only works with types.MulticlassAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.RealAnnotation (RaykarCont).
It will return a types.RaykarMultiModel with information about the estimation of the ground truth for each example (a probability for each class), the annotator precision estimation of the model, the weights of the (one-vs-all) logistic regression models learned, and the log-likelihood of the model.
The next example can be found in the examples folders. In it, the user may also find an example of how to add prior confidence on the annotators.
import com.enriquegrodrigo.spark.crowd.methods.RaykarMulti
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val exampleFile = "data/multi-data.parquet"
val annFile = "data/multi-ann.parquet"

val exampleData = spark.read.parquet(exampleFile)
val annData = spark.read.parquet(annFile).as[MulticlassAnnotation]

//Applying the learning algorithm
val model = RaykarMulti(exampleData, annData)

//Get MulticlassSoftProb with the class predictions
val pred = model.getMu().as[MulticlassSoftProb]

//Annotator precision matrices
val annprec = model.getAnnotatorPrecision()

//Log-likelihood of the model
val like = model.getLogLikelihood()
0.1.5
Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
Provides functions for transforming an annotation dataset into a standard label dataset using the CATD algorithm.
This algorithm only works with continuous label datasets of type types.RealAnnotation.
The algorithm returns a CATD.CATDModel, with information about the true label estimation for each example and the annotator weights.
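By analogy with the PM and PMTI examples above, usage might look as follows. This is a sketch, not a verified example: the accessor names getMu and getAnnotatorWeights are assumed to match the other truth-discovery models in this package; check the CATD.CATDModel API before relying on them.

```scala
import com.enriquegrodrigo.spark.crowd.methods.CATD
import com.enriquegrodrigo.spark.crowd.types._

sc.setCheckpointDir("checkpoint")

val annFile = "data/real-ann.parquet"
val annData = spark.read.parquet(annFile)

//Applying the learning algorithm
val model = CATD(annData)

//Get the label estimations (assumed accessor)
val pred = model.getMu()

//Annotator weights (assumed accessor)
val annweights = model.getAnnotatorWeights()
```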
0.2.0
Q. Li, Y. Li, J. Gao, L. Su, B. Zhao, M. Demirbas, W. Fan, and J. Han. A confidence-aware approach for truth discovery on long-tail data. PVLDB, 8(4):425–436, 2014.