spark-crowdΒΆ

Learning from crowdsourced Big Data


Learning from crowdsourced data imposes new challenges in the area of machine learning. This package, spark-crowd, helps practitioners when dealing with this kind of data at scale, using Apache Spark.

_images/illustration.png

The main features of spark-crowd are the following:

  • It implements well-known methods for learning from crowdsourced labeled data.
  • It is suitable for working with both large and small datasets.
  • It uses Apache Spark, which allows the code to run in different environments, from a computer to a multi-node cluster.
  • It is suitable both for research and production environments.
  • It provides an easy to use API, allowing the practitioner to start using the library in minutes.
  • It is licensed under the MIT license.

See the Quick Start to get started using spark-crowd