Class BeamWordCount


  • public final class BeamWordCount
    extends java.lang.Object
    An example that counts words in Shakespeare and includes Beam best practices. This example is taken from the Apache Beam repository.

    For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/

    Basic concepts, also in the MinimalWordCount example: reading text files; counting a PCollection; writing to text files.

    New Concepts:

       1. Executing a Pipeline both locally and using the selected runner
       2. Using ParDo with static DoFns defined out-of-line
       3. Building a composite transform
       4. Defining your own pipeline options
     

    Concept #1: You can execute this pipeline either locally or by selecting another runner. The runner and the input and output locations are now command-line options rather than hard-coded values, as they were in the MinimalWordCount example.

    To change the runner, specify:

    
     --runner=YOUR_SELECTED_RUNNER
     

    To execute this pipeline, specify a local output file (if using the DirectRunner) or output prefix on a supported distributed file system.

    
     --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
     

    The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --inputFile.
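    The flags described above can be illustrated with a minimal, dependency-free sketch. It is not how Beam parses options (the example defines its own pipeline options for that); WordCountArgsSketch, its parse method, and the DEFAULT_INPUT_PLACEHOLDER value are hypothetical names introduced here. The sketch assumes every flag has the form --name=value and treats --output as required, as the text above suggests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Beam: collects --name=value flags into a map.
final class WordCountArgsSketch {

  // Placeholder only; the real default input location is not stated on this page.
  static final String DEFAULT_INPUT_FILE = "DEFAULT_INPUT_PLACEHOLDER";

  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    // The input file has a default, so it is pre-populated and may be overridden.
    options.put("inputFile", DEFAULT_INPUT_FILE);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // Require the "--" prefix, a non-empty name, and an '=' separator.
      if (!arg.startsWith("--") || eq < 3) {
        throw new IllegalArgumentException("Expected --name=value, got: " + arg);
      }
      options.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    // Assumption: an output location must always be supplied.
    if (!options.containsKey("output")) {
      throw new IllegalArgumentException("Missing required option --output");
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options = parse(args);
    options.forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

    Passing --runner=YOUR_SELECTED_RUNNER and --output=YOUR_OUTPUT_PREFIX to this sketch yields a map with three entries, because inputFile falls back to the placeholder default unless --inputFile is also given.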

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String TOKENIZER_PATTERN  
    • Method Summary

      Modifier and Type Method Description
      static void main(java.lang.String[] args)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • TOKENIZER_PATTERN

        public static final java.lang.String TOKENIZER_PATTERN
        See Also:
        Constant Field Values
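        This page does not show the value of the constant, so the following is only a sketch of how a tokenizer pattern of this kind might be used to split lines into words and count them without Beam. The pattern value "[^\\p{L}]+" (one or more non-letter characters) is an assumption based on the upstream Beam WordCount example, and TokenizerSketch and countWords are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of BeamWordCount.
final class TokenizerSketch {

  // Assumed value: splits on runs of non-letter characters.
  static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

  // Splits each line on the pattern and counts the non-empty tokens.
  static Map<String, Long> countWords(Iterable<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : line.split(TOKENIZER_PATTERN, -1)) {
        // Leading or trailing separators produce empty tokens; skip them.
        if (!word.isEmpty()) {
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Long> counts =
        countWords(Arrays.asList("Blow, winds, and crack your cheeks!", "Rage! Blow!"));
    counts.forEach((word, count) -> System.out.println(word + ": " + count));
  }
}
```

        In the pipeline itself, the splitting would happen inside a DoFn and the counting would be done by a Beam transform; the loop above only mirrors that logic on an in-memory list.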
    • Method Detail

      • main

        public static void main(java.lang.String[] args)