Package com.enriquegrodrigo.spark.crowd.methods

Value Members

  1. object CATD

    Provides functions for transforming an annotation dataset into a standard label dataset using the CATD algorithm.

    This algorithm only works with continuous label datasets of type types.RealAnnotation.

    The algorithm returns a CATD.CATDModel with information about the true label estimation for each example and the annotator weights (a comparison with a majority-voting baseline is sketched after this entry).

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.CATD
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = CATD(annData.as[RealAnnotation])
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Q. Li, Y. Li, J. Gao, L. Su, B. Zhao, M. Demirbas, W. Fan, and J. Han. A confidence-aware approach for truth discovery on long-tail data. PVLDB, 8(4):425–436, 2014.
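
    A quick sanity check for the continuous estimates is to compare them against the simple per-example mean computed by MajorityVoting.transformReal. The sketch below is illustrative only: it assumes the prediction Datasets returned by getMu and transformReal both follow a two-column (example, value) layout, which should be verified against the actual spark-crowd schemas.

      import com.enriquegrodrigo.spark.crowd.methods.{CATD, MajorityVoting}
      import com.enriquegrodrigo.spark.crowd.types._
      import org.apache.spark.sql.functions._
      sc.setCheckpointDir("checkpoint")
      val annData = spark.read.parquet("data/real-ann.parquet").as[RealAnnotation]
      //CATD estimates vs. the per-example mean used by majority voting
      //(column renames assume a two-column (example, value) layout)
      val catdPred = CATD(annData).getMu().toDF("example", "catd")
      val mvPred = MajorityVoting.transformReal(annData).toDF("example", "mv")
      //Mean absolute difference between the two estimators
      catdPred.join(mvPred, "example")
        .select(avg(abs(col("catd") - col("mv"))).alias("meanAbsDiff"))
        .show()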

  2. object CGlad

    Provides functions for transforming an annotation dataset into a standard label dataset using the CGlad algorithm.

    This algorithm only works with types.BinaryAnnotation datasets.

    The algorithm returns a types.CGladModel, with information such as the true label estimation, the annotator precision, and the cluster difficulty.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.CGlad
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = CGlad(annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision
      val annprec = mode.getAnnotatorPrecision()
      //Cluster difficulty
      val clustdif = mode.getClusterDifficulty()
      //Cluster assigned to each example
      val clusters = mode.getClusters()
    Version

    0.2.1

  3. object DawidSkene

    Provides functions for transforming an annotation dataset into a standard label dataset using the DawidSkene algorithm.

    This algorithm only works with types.MulticlassAnnotation datasets, although one can easily apply it to types.BinaryAnnotation data by re-typing it with Spark's Dataset as method (a conversion sketch follows this entry).

    It returns a types.DawidSkeneModel with information about the estimation of the true class, as well as the annotator quality and the log-likelihood obtained by the model.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
      import com.enriquegrodrigo.spark.crowd.types._
      val exampleFile = "data/multi-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile).as[MulticlassAnnotation]
      //Applying the learning algorithm
      val mode = DawidSkene(exampleData)
      //Get MulticlassLabel with the class predictions
      val pred = mode.getMu().as[MulticlassLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Dawid, Alexander Philip, and Allan M. Skene. "Maximum likelihood estimation of observer error-rates using the EM algorithm." Applied statistics (1979): 20-28.
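
    Because a binary problem is a two-class multiclass problem, a types.BinaryAnnotation dataset can be re-typed and passed to DawidSkene. A minimal sketch, assuming BinaryAnnotation and MulticlassAnnotation share the same example, annotator and value fields (worth verifying against your spark-crowd version):

      import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
      import com.enriquegrodrigo.spark.crowd.types._
      //Read binary annotations and re-type them as multiclass
      //(assumes both case classes share the example, annotator and value fields)
      val binData = spark.read.parquet("data/binary-ann.parquet").as[BinaryAnnotation]
      val multiData = binData.toDF.as[MulticlassAnnotation]
      //DawidSkene then treats the two binary values as two classes
      val model = DawidSkene(multiData)
      val pred = model.getMu().as[MulticlassLabel]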

  4. object Glad

    Provides functions for transforming an annotation dataset into a standard label dataset using the Glad algorithm.

    This algorithm only works with types.BinaryAnnotation datasets.

    The algorithm returns a types.GladModel, with information about the true label estimation, the annotator precision, the instance difficulty, and the log-likelihood of the model.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.Glad
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = Glad(annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision
      val annprec = mode.getAnnotatorPrecision()
      //Instance difficulty
      val instdif = mode.getInstanceDifficulty()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Whitehill, Jacob, et al. "Whose vote should count more: Optimal integration of labels from labelers of unknown expertise." Advances in neural information processing systems. 2009.

  5. object IBCC

    Provides functions for transforming an annotation dataset into a standard label dataset using the IBCC algorithm.

    This algorithm only works with multiclass target variables (Datasets of type types.MulticlassAnnotation).

    The algorithm returns an IBCC.IBCCModel with information about the true label estimation, the annotator precision, and the class prior estimation.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.IBCC
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/binary-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = IBCC(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Class prior estimation
      val classPrior = mode.getClassPrior()
    Version

    0.2.0

    See also

    H.-C. Kim and Z. Ghahramani. Bayesian classifier combination. In AISTATS, pages 619–627, 2012.

  6. object MajorityVoting

    Provides functions for transforming an annotation dataset into a standard label dataset using the majority voting approach.

    This object provides several functions for applying majority-voting-style algorithms to annotation datasets (Spark Datasets of type types.BinaryAnnotation, types.MulticlassAnnotation, or types.RealAnnotation). For the discrete types (types.BinaryAnnotation, types.MulticlassAnnotation) the method selects the most frequent class; for the continuous type, it uses the mean.

    The object also provides methods for estimating class probabilities for the discrete types: for the binary case, the proportion of positive annotations; for the multiclass case, the proportion of annotations for each class (a plain Spark SQL sketch of the binary case follows this entry).

    The next example can be found in the examples folder of the project.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.MajorityVoting
      import com.enriquegrodrigo.spark.crowd.types._
      val exampleFile = "data/binary-ann.parquet"
      val exampleFileMulti = "data/multi-ann.parquet"
      val exampleFileCont = "data/cont-ann.parquet"
      val exampleDataBinary = spark.read.parquet(exampleFile).as[BinaryAnnotation]
      val exampleDataMulti = spark.read.parquet(exampleFileMulti).as[MulticlassAnnotation]
      val exampleDataCont = spark.read.parquet(exampleFileCont).as[RealAnnotation]
      //Applying the learning algorithm
      //Binary class
      val muBinary = MajorityVoting.transformBinary(exampleDataBinary)
      val muBinaryProb = MajorityVoting.transformSoftBinary(exampleDataBinary)
      //Multiclass
      val muMulticlass = MajorityVoting.transformMulticlass(exampleDataMulti)
      val muMulticlassProb = MajorityVoting.transformSoftMulti(exampleDataMulti)
      //Continuous case
      val muCont = MajorityVoting.transformReal(exampleDataCont)
    Version

    0.1.3
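
    For intuition, the soft binary estimate is simply the per-example proportion of positive annotations, which can be reproduced with plain Spark SQL. A minimal sketch, assuming the binary annotation data exposes example and value (0/1) columns:

      import org.apache.spark.sql.functions._
      val ann = spark.read.parquet("data/binary-ann.parquet")
      //Proportion of positive annotations per example, i.e. the quantity
      //estimated by MajorityVoting.transformSoftBinary
      val softMajority = ann.groupBy("example").agg(avg(col("value")).alias("prob"))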

  7. object PM

    Provides functions for transforming an annotation dataset into a standard label dataset using the PM algorithm.

    This algorithm only works with continuous target variables. Thus you need an annotation dataset of type types.RealAnnotation.

    The algorithm returns a PM.PMModel with information about the true label estimation for each example and the annotator weights.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.PM
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = PM(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Q. Li, Y. Li, J. Gao, B. Zhao, W. Fan, and J. Han. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In SIGMOD, pages 1187–1198, 2014.

  8. object PMTI

    Provides functions for transforming an annotation dataset into a standard label dataset using the modified version of the PM algorithm described in the paper Truth Inference in Crowdsourcing: Is the Problem Solved?

    This algorithm only works with continuous target variables. Thus you need an annotation dataset of type types.RealAnnotation.

    The algorithm returns a PMTI.PMModel with information about the true label estimation for each example and the annotator weights.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.PMTI
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val annFile = "data/real-ann.parquet"
      val annData = spark.read.parquet(annFile)
      //Applying the learning algorithm
      val mode = PMTI(annData)
      //Get the estimated true labels
      val pred = mode.getMu()
      //Annotator weights
      val annweights = mode.getAnnotatorWeights()
    Version

    0.2.0

    See also

    Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, and Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved? PVLDB 10(5): 541-552, 2017.

  9. object RaykarBinary

    Provides functions for transforming an annotation dataset into a standard label dataset using the RaykarBinary algorithm.

    This algorithm only works with types.BinaryAnnotation datasets. There are versions for the types.MulticlassAnnotation (RaykarMulti) and types.RealAnnotation (RaykarCont).

    It will return a types.RaykarBinaryModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the logistic regression model learned and the log-likelihood of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators. A sketch for turning the soft predictions into hard labels follows this entry.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarBinary
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/binary-data.parquet"
      val annFile = "data/binary-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[BinaryAnnotation]
      //Applying the learning algorithm
      val mode = RaykarBinary(exampleData, annData)
      //Get BinarySoftLabel with the class predictions
      val pred = mode.getMu().as[BinarySoftLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
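
    If hard 0/1 predictions are needed, the soft estimates can be thresholded. Continuing from the example above, a minimal sketch that assumes BinarySoftLabel follows a two-column (example, value) layout, which should be checked against the actual schema:

      import org.apache.spark.sql.functions._
      //Threshold the soft predictions at 0.5 to obtain hard labels
      //(column renames assume a two-column (example, value) layout)
      val hard = pred.toDF("example", "prob")
        .withColumn("label", when(col("prob") >= 0.5, 1).otherwise(0))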

  10. object RaykarCont

    Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for continuous target variables.

    This algorithm only works with types.RealAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.MulticlassAnnotation (RaykarMulti).

    It will return a types.RaykarContModel with information about the estimation of the ground truth for each example, the annotator precision estimation of the model, the weights of the linear regression model learned and the MAE of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarCont
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/cont-data.parquet"
      val annFile = "data/cont-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[RealAnnotation]
      //Applying the learning algorithm
      val mode = RaykarCont(exampleData, annData)
      //Get RealLabel with the label predictions
      val pred = mode.getMu().as[RealLabel]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.

  11. object RaykarMulti

    Provides functions for transforming an annotation dataset into a standard label dataset using the Raykar algorithm for multiclass target variables.

    This algorithm only works with types.MulticlassAnnotation datasets. There are versions for the types.BinaryAnnotation (RaykarBinary) and types.RealAnnotation (RaykarCont).

    It will return a types.RaykarMultiModel with information about the estimation of the ground truth for each example (a probability for each class), the annotator precision estimation of the model, the weights of the one-vs-all logistic regression models learned (one per class), and the log-likelihood of the model.

    The next example can be found in the examples folder of the project. There, the user may also find an example of how to add prior confidence on the annotators.

    Example:
    1. import com.enriquegrodrigo.spark.crowd.methods.RaykarMulti
      import com.enriquegrodrigo.spark.crowd.types._
      sc.setCheckpointDir("checkpoint")
      val exampleFile = "data/multi-data.parquet"
      val annFile = "data/multi-ann.parquet"
      val exampleData = spark.read.parquet(exampleFile)
      val annData = spark.read.parquet(annFile).as[MulticlassAnnotation]
      //Applying the learning algorithm
      val mode = RaykarMulti(exampleData, annData)
      //Get MulticlassSoftProb with the class probabilities
      val pred = mode.getMu().as[MulticlassSoftProb]
      //Annotator precision matrices
      val annprec = mode.getAnnotatorPrecision()
      //Model log-likelihood
      val like = mode.getLogLikelihood()
    Version

    0.1.5

    See also

    Raykar, Vikas C., et al. "Learning from crowds." Journal of Machine Learning Research 11.Apr (2010): 1297-1322.
