Class BeamWordCount
java.lang.Object
    org.apache.nemo.examples.beam.BeamWordCount

public final class BeamWordCount extends java.lang.Object
An example that counts words in Shakespeare and includes Beam best practices. THIS EXAMPLE IS TAKEN FROM THE APACHE BEAM REPOSITORY. For a detailed walkthrough of this example, see https://beam.apache.org/get-started/wordcount-example/
Basic concepts, also in the MinimalWordCount example: reading text files; counting a PCollection; writing to text files.
New Concepts:
1. Executing a Pipeline both locally and using the selected runner
2. Using ParDo with static DoFns defined out-of-line
3. Building a composite transform
4. Defining your own pipeline options
Concept #1: You can execute this pipeline either locally or by selecting another runner. These settings are now command-line options rather than hard-coded, as they were in the MinimalWordCount example.
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
To execute this pipeline, specify a local output file (if using the DirectRunner) or an output prefix on a supported distributed file system:
--output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --inputFile.
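The three options described above all take the form --name=value. As a rough illustration of that shape only, the plain-Java sketch below splits such arguments into a name-to-value map. It is not how BeamWordCount reads its options (this page does not show that, and Beam pipelines normally delegate parsing to Beam's own options machinery); the class name OptionArgsSketch, the parse helper, and the file paths are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionArgsSketch {

  /** Splits arguments of the form --name=value into an insertion-ordered map. */
  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (!arg.startsWith("--") || eq < 0) {
        throw new IllegalArgumentException("Expected --name=value, got: " + arg);
      }
      // Drop the leading "--" from the name; keep everything after the first '=' as the value.
      options.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    return options;
  }

  public static void main(String[] args) {
    // Hypothetical values: substitute your own runner and paths.
    String[] example = {
      "--runner=DirectRunner",
      "--inputFile=/tmp/my-input.txt",
      "--output=/tmp/wordcount-output"
    };
    parse(example).forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

Passing the same three strings to the class's main(java.lang.String[] args) method is the intended way to select a runner, an input file, and an output location.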
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BeamWordCount.CountWords | A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts.
static class | BeamWordCount.FormatAsTextFn | A SimpleFunction that converts a Word and Count into a printable string.
static interface | BeamWordCount.WordCountOptions | Options supported by WordCount.
-
Field Summary
Fields
Modifier and Type | Field | Description
static java.lang.String | TOKENIZER_PATTERN |
-
Method Summary
Modifier and Type | Method | Description
static void | main(java.lang.String[] args) |
-
Field Detail
-
TOKENIZER_PATTERN
public static final java.lang.String TOKENIZER_PATTERN
See Also: Constant Field Values
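The value of TOKENIZER_PATTERN is not shown on this page. The sketch below assumes it matches the upstream Apache Beam WordCount example, where the pattern is "[^\\p{L}]+" (any run of non-letter characters). It uses plain Java collections rather than Beam transforms to illustrate what a tokenize-then-count step and a word/count formatter might do. The class TokenizeSketch, its helper methods, and the "word: count" output format are assumptions for illustration, not the actual CountWords or FormatAsTextFn implementations.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenizeSketch {

  // Assumption: same value as the upstream Apache Beam WordCount example.
  static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

  /** Splits each line on non-letter runs and counts the non-empty tokens. */
  static Map<String, Long> countWords(Iterable<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      // A limit of -1 keeps trailing empty strings, which are filtered out below.
      for (String word : line.split(TOKENIZER_PATTERN, -1)) {
        if (!word.isEmpty()) {
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  /** Formats a word and its count as one printable line (assumed format). */
  static String format(String word, long count) {
    return word + ": " + count;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("Blow, winds, and crack your cheeks!", "Rage! Blow!");
    countWords(lines).forEach((word, count) -> System.out.println(format(word, count)));
  }
}
```

Counting here is case-sensitive, so "Blow" and "blow" would be tallied separately; whether the real transform normalizes case is not stated on this page.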