public final class BeamWordCount extends Object
An example that counts words in Shakespeare and includes Beam best practices. THIS EXAMPLE IS TAKEN FROM THE APACHE BEAM REPOSITORY.
For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/
Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to text files
New concepts:
1. Executing a Pipeline both locally and using the selected runner
2. Using ParDo with static DoFns defined out-of-line
3. Building a composite transform
4. Defining your own pipeline options
Concept #1: You can execute this pipeline either locally or by selecting another runner. The runner and the file locations are now command-line options rather than hard-coded values, as they were in the MinimalWordCount example.
To change the runner, specify:
To execute this pipeline, specify a local output file (if using the
DirectRunner) or output prefix on a supported distributed file system.
--output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with
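The option handling described above (a runner and an input that fall back to defaults, and an output location that must be supplied) can be sketched in plain Java. This is only an illustration of the idea and does not use Beam's own options mechanism. The class name OptionsSketch, the option key inputFile, and both default values are assumptions made for this sketch; only the --output flag appears in the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java illustration only; it does not use Beam's options classes. */
final class OptionsSketch {

    // Hypothetical defaults, chosen only for this illustration.
    static final String DEFAULT_RUNNER = "DirectRunner";
    static final String DEFAULT_INPUT = "kinglear.txt";

    /** Collects --name=value arguments into a map, starting from the defaults. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("runner", DEFAULT_RUNNER);
        options.put("inputFile", DEFAULT_INPUT); // assumed option name
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require the form --name=value with a non-empty name.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        // The output location has no default, so it must be given explicitly.
        if (!options.containsKey("output")) {
            throw new IllegalArgumentException("Missing required option --output");
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(new String[] {"--output=/tmp/counts"});
        System.out.println(options);
    }
}
```

Keeping the defaults in one place and failing fast on a missing output mirrors the behaviour described above: nothing about the runner or the file locations is hard-coded in the pipeline itself.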
Nested Class Summary
Nested Classes
Class    Description
BeamWordCount.CountWords    A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts.
BeamWordCount.FormatAsTextFn    A SimpleFunction that converts a Word and Count into a printable string.
BeamWordCount.WordCountOptions    Options supported by
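
To make the roles of CountWords and FormatAsTextFn concrete, the following is a minimal plain-Java sketch of the same two steps on an in-memory list of lines: tally the words, then turn each word and count into a printable string. It is not the Beam implementation and uses no Beam types. The class and method names, the tokenization rule (split on runs of non-letter characters), the case-sensitive counting, and the "word: count" output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of counting words and formatting the counts. */
final class CountWordsSketch {

    /** Splits each line on runs of non-letters and tallies the non-empty tokens. */
    static Map<String, Long> countWords(List<String> lines) {
        // TreeMap keeps the words sorted so the output order is deterministic.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("[^\\p{L}]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    /** Formats one word and its count as a printable string (assumed format). */
    static String formatAsText(String word, long count) {
        return word + ": " + count;
    }

    /** Composes the two steps: lines of text in, formatted word counts out. */
    static List<String> formattedCounts(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : countWords(lines).entrySet()) {
            out.add(formatAsText(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : formattedCounts(Arrays.asList("to be or not to be"))) {
            System.out.println(row);
        }
    }
}
```

The composition in formattedCounts plays the part that a composite transform plays in the pipeline: callers see one step from lines to formatted counts, while the counting and formatting pieces stay separately testable.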