.. _quickstart:

Quick Start
=============

You can easily start using ``spark-crowd`` through our `docker`_ image or through `spark-packages`_. See :ref:`installation` for all the installation alternatives (such as how to add the package as a dependency in your project).

Start with our docker image
---------------------------

The quickest way to try out the package is through the `provided docker image`_, which includes the latest version of the package, as you do not need to install any software other than docker itself:

.. code-block:: shell

  docker pull enriquegrodrigo/spark-crowd

With this image, you can run the examples provided along with the `package`_. For example, to run ``DawidSkeneExample.scala``:

.. code-block:: shell

  docker run --rm -it -v $(pwd)/:/home/work/project enriquegrodrigo/spark-crowd DawidSkeneExample.scala

You can also open a Spark shell with the library preloaded:

.. code-block:: shell

  docker run --rm -it -v $(pwd)/:/home/work/project enriquegrodrigo/spark-crowd

This way, you can test your code directly. You will not benefit from the distributed execution of Apache Spark, but you can still use the algorithms on medium-sized datasets, since docker can use several cores of your machine.

Start with `spark-packages`
----------------------------------------

If you have an installation of `Apache Spark <https://spark.apache.org/>`_, you can open a `spark-shell` with our package preloaded using:

.. code-block:: shell

  spark-shell --packages com.enriquegrodrigo:spark-crowd_2.11:0.2.1

Likewise, you can submit an application that uses `spark-crowd` to your cluster using:

.. code-block:: shell

  spark-submit --packages com.enriquegrodrigo:spark-crowd_2.11:0.2.1 application.scala

You do not need a cluster of computers to use this option: Apache Spark can also be installed locally, so you can execute the code on your own machine. For more information on how to install Apache Spark, please refer to its `homepage <https://spark.apache.org/>`_.

Basic usage
----------------

Once you have chosen a procedure to run the package, import the method that you want to use as well as the types for your data, as you can see below:

.. code-block:: scala

  import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
  import com.enriquegrodrigo.spark.crowd.types.MulticlassAnnotation
  import com.enriquegrodrigo.spark.crowd.types.MulticlassLabel

  val exampleFile = "examples/data/multi-ann.parquet"
  val exampleData = spark.read.parquet(exampleFile).as[MulticlassAnnotation]

  //Applying the learning algorithm
  val model = DawidSkene(exampleData)

  //Get MulticlassLabel with the class predictions
  val pred = model.getMu().as[MulticlassLabel]

  //Annotator precision matrices
  val annprec = model.getAnnotatorPrecision()

You can find a description of the code below:

#. First, the method and the types are imported: in this case, ``DawidSkene`` together with ``MulticlassAnnotation`` and ``MulticlassLabel``. The annotation type is needed because the package API only accepts typed datasets of annotations.
#. Then the data file (provided with the package) is loaded. It contains annotations for different examples. As you can see, the example uses the method ``as`` to convert the Spark DataFrame into a typed Spark Dataset (with type ``MulticlassAnnotation``).
#. To execute the learning algorithm and obtain a result, you apply the method name directly to the data. This returns a ``DawidSkeneModel``, which includes several methods for obtaining results from the algorithm.
#. The method ``getMu`` returns the ground truth estimates made by the model.
#. Finally, ``getAnnotatorPrecision`` returns the annotator quality estimated by the model.
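As a quick check, you can then inspect both results directly from the shell. The following sketch continues from the example above (it reuses ``pred`` and ``annprec``); note that the column name ``value`` used for grouping is an assumption about the schema of ``MulticlassLabel``, not something this guide specifies.

.. code-block:: scala

  //Continuing from the Basic usage example above

  //Peek at the first ground truth estimates
  pred.show(5)

  //Class distribution of the estimates
  //NOTE: the column name "value" is an assumed field of MulticlassLabel
  pred.groupBy("value").count().show()

  //Annotator precision matrices, assuming getAnnotatorPrecision
  //returns a Spark Dataset/DataFrame as in the example above
  annprec.show()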
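If you prefer to run these steps as a standalone application (for instance, submitted with ``spark-submit`` as shown earlier), a minimal sketch could look like the following. The object name ``DawidSkeneApp`` and the ``SparkSession`` setup are illustrative assumptions; only the ``spark-crowd`` calls come from the example above.

.. code-block:: scala

  import org.apache.spark.sql.SparkSession

  import com.enriquegrodrigo.spark.crowd.methods.DawidSkene
  import com.enriquegrodrigo.spark.crowd.types.{MulticlassAnnotation, MulticlassLabel}

  //Hypothetical application name, not part of the package
  object DawidSkeneApp {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("spark-crowd quickstart")
        .getOrCreate()
      //Needed for the .as[...] conversions below
      import spark.implicits._

      //Same steps as in the Basic usage example
      val annotations = spark.read
        .parquet("examples/data/multi-ann.parquet")
        .as[MulticlassAnnotation]

      val model = DawidSkene(annotations)
      val pred = model.getMu().as[MulticlassLabel]
      val annprec = model.getAnnotatorPrecision()

      pred.show(5)

      spark.stop()
    }
  }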
You can consult the models implemented in this package in :ref:`methods`, where you can find a link to the original article for each algorithm.