Class BeamWordCount


  • public final class BeamWordCount
    extends Object
    An example that counts words in Shakespeare and includes Beam best practices. THIS EXAMPLE IS TAKEN FROM THE APACHE BEAM REPOSITORY.

    For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/

    Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to text files

    New Concepts:

       1. Executing a Pipeline both locally and using the selected runner
       2. Using ParDo with static DoFns defined out-of-line
       3. Building a composite transform
       4. Defining your own pipeline options
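
    As a rough illustration of concepts 2 and 3, the following plain-Java sketch (no Beam dependency) mirrors the shape of the pipeline: a static, out-of-line function that turns one line into words, and a second function that composes extraction with counting. The class name WordCountSketch, the method names, and the tokenizing pattern are assumptions made for illustration; they are not taken from this class, and a real pipeline would express the same steps as a DoFn and a composite PTransform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the pipeline's logic; not part of BeamWordCount.
final class WordCountSketch {

    // Plays the role of a static DoFn defined out-of-line:
    // one input line produces zero or more words.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        // Assumed tokenizer: split on runs of non-letter characters.
        for (String token : line.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Plays the role of a composite transform:
    // word extraction followed by a per-word count.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : extractWords(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        countWords(lines).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

    Keeping the per-line step as a separate static method is what makes it reusable and testable on its own, which is the same motivation for defining a DoFn out-of-line.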
     

    Concept #1: You can execute this pipeline either locally or by selecting another runner. These choices are now command-line options rather than hard-coded values, as they were in the MinimalWordCount example.

    To change the runner, specify:

    
     --runner=YOUR_SELECTED_RUNNER
     

    To execute this pipeline, specify a local output file (if using the DirectRunner) or output prefix on a supported distributed file system.

    
     --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
     

    The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override the default and choose your own input with --inputFile.
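
    The behaviour of the three flags described above can be sketched in plain Java as follows. This is only an illustration of the --name=value convention, a default for inputFile, and the need to supply output; it is not how Beam parses options (Beam provides its own options machinery for that). The class name OptionsSketch, the parse method, and the placeholder default path are hypothetical, and treating --output as mandatory is an assumption based on the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the flags; not part of BeamWordCount or Beam.
final class OptionsSketch {

    // Placeholder only: the actual default input location is not stated here.
    static final String DEFAULT_INPUT_FILE = "<default King Lear data set>";

    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        // inputFile has a default that --inputFile=... may override.
        options.put("inputFile", DEFAULT_INPUT_FILE);
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require the form --name=value with a non-empty name.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        // Assumption: an output file or prefix must always be given.
        if (!options.containsKey("output")) {
            throw new IllegalArgumentException("Missing required option --output");
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options =
            parse(new String[] {"--runner=YOUR_SELECTED_RUNNER", "--output=YOUR_LOCAL_FILE"});
        System.out.println("runner    = " + options.get("runner"));
        System.out.println("output    = " + options.get("output"));
        System.out.println("inputFile = " + options.get("inputFile"));
    }
}
```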