Class SkewAnnotatingPass

  • All Implemented Interfaces:
    Function<IRDAG,​IRDAG>

    public final class SkewAnnotatingPass
    extends AnnotatingPass
    For each shuffle edge, set the number of partitions to (dstParallelism * HASH_RANGE_MULTIPLIER). With this finer-grained partitioning, we can dynamically assign partitions to destination tasks based on data sizes.
    • Field Detail

      • HASH_RANGE_MULTIPLIER

        public static final int HASH_RANGE_MULTIPLIER
        Hash range multiplier. If we need to split or recombine an output data from a task after it is stored, we multiply the hash range with this factor in advance to prevent the extra deserialize - rehash - serialize process. In these cases, the hash range will be (hash range multiplier X destination task parallelism).
        See Also:
        Constant Field Values
    • Constructor Detail

      • SkewAnnotatingPass

        public SkewAnnotatingPass()
        Default constructor.