Class BeamWordCount

  • public final class BeamWordCount
    extends Object
    An example that counts words in Shakespeare and includes Beam best practices. This example is taken from the Apache Beam repository.

    For a detailed walkthrough of this example, see

    Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to text files
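
    As a rough illustration of the counting step, here is a minimal plain-Java sketch. It uses only the standard library and is not Beam code. It shows what counting a collection of words and formatting the result as text lines amounts to. countPerElement plays the role of Beam's Count.perElement(), and formatAsText produces one "word: count" line per entry. The CountSketch class and both method names are hypothetical, and a TreeMap is used only to make the output order deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of counting a PCollection and writing it as text lines.
class CountSketch {
    // Counts how many times each distinct word occurs (the role Count.perElement() plays in Beam).
    static Map<String, Long> countPerElement(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Formats each (word, count) pair as a "word: count" line.
    static List<String> formatAsText(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "king", "the", "fool");
        formatAsText(countPerElement(words)).forEach(System.out::println);
    }
}
```

    In Beam the same counts are computed over a possibly distributed PCollection; the in-memory map here is only a stand-in for that.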

    New Concepts:

       1. Executing a Pipeline both locally and using the selected runner
       2. Using ParDo with static DoFns defined out-of-line
       3. Building a composite transform
       4. Defining your own pipeline options
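
    Concepts 2 and 3 can be pictured with a plain-Java sketch that uses only the standard library. ElementFn is a hypothetical stand-in for a DoFn, and parDo imitates ParDo.of(...). ExtractWordsFn is a static nested class defined apart from the code that applies it, which is the "out-of-line" style. CountWords bundles word extraction and counting behind one function, much as a composite PTransform does in its expand() method. All of these names, and the letter-based tokenizer regex, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.Function;

class CompositeSketch {
    // Hypothetical stand-in for a DoFn: handles one element and emits zero or more outputs.
    interface ElementFn<I, O> {
        void processElement(I element, Consumer<O> receiver);
    }

    // Defined out-of-line as a static nested class.
    // Splits a line on runs of non-letter characters and drops empty tokens.
    static class ExtractWordsFn implements ElementFn<String, String> {
        @Override
        public void processElement(String line, Consumer<String> receiver) {
            for (String word : line.split("[^\\p{L}]+")) {
                if (!word.isEmpty()) {
                    receiver.accept(word);
                }
            }
        }
    }

    // Applies an ElementFn to every element and collects all outputs (imitates ParDo.of(fn)).
    static <I, O> List<O> parDo(List<I> input, ElementFn<I, O> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.processElement(element, output::add);
        }
        return output;
    }

    // Composite step: extraction followed by per-word counting, exposed as one function.
    static class CountWords implements Function<List<String>, Map<String, Long>> {
        @Override
        public Map<String, Long> apply(List<String> lines) {
            List<String> words = parDo(lines, new ExtractWordsFn());
            Map<String, Long> counts = new TreeMap<>();
            for (String word : words) {
                counts.merge(word, 1L, Long::sum);
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Nothing will come of nothing.", "Speak again.");
        System.out.println(new CountWords().apply(lines));
    }
}
```

    Giving the function its own named static class, rather than writing it inline, keeps it reusable and testable on its own. In Beam, a static (non-inner) DoFn also avoids capturing the enclosing instance when the function is serialized.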

    Concept #1: you can execute this pipeline either locally or by selecting another runner. These choices are now made through command-line options instead of being hard-coded, as they were in the MinimalWordCount example.

    To change the runner, specify:

      --runner=YOUR_SELECTED_RUNNER

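    The point of Concept #1 is that the runner comes from the command line rather than from the code. In Beam itself this parsing is done by PipelineOptionsFactory.fromArgs(args).withValidation().create(). The standard-library sketch below only imitates the idea. The RunnerSelectionSketch class, its selectRunner helper and the DirectRunner fallback are assumptions for illustration.

```java
// Hypothetical sketch: choose the runner from a --runner=... flag instead of hard-coding it.
class RunnerSelectionSketch {
    // Assumed fallback when no --runner flag is given (standing in for local execution).
    static final String DEFAULT_RUNNER = "DirectRunner";

    // Returns the value of the last --runner=... argument, or the fallback if there is none.
    static String selectRunner(String[] args) {
        String runner = DEFAULT_RUNNER;
        for (String arg : args) {
            if (arg.startsWith("--runner=")) {
                runner = arg.substring("--runner=".length());
            }
        }
        return runner;
    }

    public static void main(String[] args) {
        System.out.println("Running with " + selectRunner(args));
    }
}
```

    Switching runners then means changing an argument, not recompiling the pipeline.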

    To execute this pipeline, specify a local output file (if using the DirectRunner) or an output prefix on a supported distributed file system:

      --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]

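    The option takes a prefix rather than a single file name because a distributed write is typically split into several shard files. The sketch below assumes Beam's default shard naming template, -SSSSS-of-NNNNN (a zero-padded shard index and shard count), appended to the prefix. The ShardNameSketch class is hypothetical, and the actual number of shards is normally chosen by the runner unless it is set explicitly.

```java
// Hypothetical sketch of how an output prefix expands into sharded file names.
class ShardNameSketch {
    // Appends "-SSSSS-of-NNNNN": zero-padded shard index and total shard count.
    static String shardName(String prefix, int shardIndex, int numShards) {
        return String.format("%s-%05d-of-%05d", prefix, shardIndex, numShards);
    }

    public static void main(String[] args) {
        int numShards = 3;
        for (int i = 0; i < numShards; i++) {
            System.out.println(shardName("counts", i, numShards));
        }
    }
}
```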

    The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --inputFile.
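
    Concept #4, defining your own pipeline options, is how flags such as --inputFile and --output become typed getters. In Beam you declare an interface that extends PipelineOptions and annotate its getters with annotations such as @Default.String. PipelineOptionsFactory then generates the implementation at runtime. The standard-library sketch below imitates that pattern with java.lang.reflect.Proxy. The DefaultString annotation, the fromArgs helper and the default file name are all hypothetical placeholders, not Beam's API or the real data set location.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class OptionsSketch {
    // Hypothetical stand-in for @Default.String: the value returned when no flag is given.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DefaultString {
        String value();
    }

    // User-defined options: one getter/setter pair per command-line flag.
    interface WordCountOptions {
        @DefaultString("default-input.txt") // placeholder, not the real data set path
        String getInputFile();
        void setInputFile(String value);

        String getOutput();
        void setOutput(String value);
    }

    // Maps "getInputFile" or "setInputFile" to the property name "inputFile".
    private static String propertyName(String methodName) {
        return Character.toLowerCase(methodName.charAt(3)) + methodName.substring(4);
    }

    // Builds an implementation of the options interface from --name=value flags.
    static <T> T fromArgs(String[] args, Class<T> iface) {
        Map<String, String> values = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                values.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, methodArgs) -> {
                    String name = method.getName();
                    if (method.getDeclaringClass() == Object.class) {
                        // Minimal handling of equals, hashCode and toString.
                        switch (name) {
                            case "equals": return self == methodArgs[0];
                            case "hashCode": return System.identityHashCode(self);
                            default: return iface.getSimpleName() + values;
                        }
                    }
                    if (name.startsWith("set")) {
                        values.put(propertyName(name), (String) methodArgs[0]);
                        return null;
                    }
                    if (name.startsWith("get")) {
                        String value = values.get(propertyName(name));
                        if (value != null) {
                            return value;
                        }
                        // Fall back to the annotated default, or null if there is none.
                        DefaultString fallback = method.getAnnotation(DefaultString.class);
                        return fallback == null ? null : fallback.value();
                    }
                    throw new UnsupportedOperationException(name);
                });
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        WordCountOptions options = fromArgs(args, WordCountOptions.class);
        System.out.println("input=" + options.getInputFile() + ", output=" + options.getOutput());
    }
}
```

    Run with --inputFile=hamlet.txt --output=counts, this prints input=hamlet.txt, output=counts. With no flags, the input falls back to the annotated default and the output is null, because this sketch performs no validation.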