public final class DataSkewHashPartitioner extends Object implements Partitioner<Integer>
A Partitioner that hashes the output data of a source task at a granularity suitable for detecting data skew. It hashes data more finely than
HashPartitioner. Elements are hashed by their key and then reduced with a modulo operation. So that the output data of a task can be split or recombined after it is stored, the hash range is multiplied by a multiplier known to both the source and destination tasks; this avoids an extra deserialize, rehash, and serialize pass. For more information, please check
|Constructor and Description|
|Modifier and Type||Method and Description|
Divides the output data from a task into multiple blocks.
public DataSkewHashPartitioner(int hashRangeMultiplier, int dstParallelism, KeyExtractor keyExtractor)
hashRangeMultiplier - the hash range multiplier.
dstParallelism - the number of destination tasks.
keyExtractor - the key extractor that extracts keys from elements.
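The way the three constructor parameters combine can be illustrated with a minimal, self-contained sketch. It is not the source of DataSkewHashPartitioner: the class name SkewHashSketch, the use of java.util.function.Function in place of KeyExtractor, and the choice of hashRangeMultiplier * dstParallelism as the exact hash range are all assumptions made for illustration. The actual class may derive its range differently.

```java
import java.util.function.Function;

/** Hypothetical sketch of fine-grained key hashing; not the actual class. */
final class SkewHashSketch<T> {
  private final int hashRange;
  private final Function<T, Object> keyExtractor; // stand-in for KeyExtractor

  SkewHashSketch(int hashRangeMultiplier, int dstParallelism,
                 Function<T, Object> keyExtractor) {
    // Assumption: the fine-grained range is the number of destination
    // tasks scaled by the multiplier that both sides agree on.
    this.hashRange = Math.multiplyExact(hashRangeMultiplier, dstParallelism);
    this.keyExtractor = keyExtractor;
  }

  /** Returns a block index in [0, hashRange) for the element's key. */
  int partition(T element) {
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(keyExtractor.apply(element).hashCode(), hashRange);
  }

  int hashRange() {
    return hashRange;
  }
}
```

With a multiplier of 10 and 4 destination tasks, this sketch spreads elements over 40 blocks instead of 4, so an unusually large block points to a hot key range without the stored data having to be rehashed.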
Copyright © 2019 The Apache Software Foundation. All rights reserved.