Class SamplingSkewReshapingPass

  • All Implemented Interfaces:
    java.util.function.Function<IRDAG,IRDAG>

    public final class SamplingSkewReshapingPass
    extends ReshapingPass
    Optimizes the PartitionSet property of shuffle edges to handle data skew using SamplingVertex.

    This pass effectively partitions the IRDAG at its non-oneToOne edges, clones each sub-DAG partition using SamplingVertex objects that process sampled data, and executes each cloned partition before the corresponding original partition.
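
    The partitioning step can be pictured with a toy model. The sketch below is not Nemo code: PartitionSketch, Edge, and partition are hypothetical names, vertices are plain integers, and a union-find groups vertices that are linked by one-to-one edges, so every non-one-to-one edge (a shuffle, for example) ends up crossing a partition boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {
    /** Hypothetical toy edge: src -> dst, flagged as one-to-one or not. */
    static final class Edge {
        final int src;
        final int dst;
        final boolean oneToOne;

        Edge(int src, int dst, boolean oneToOne) {
            this.src = src;
            this.dst = dst;
            this.oneToOne = oneToOne;
        }
    }

    /** Union-find lookup with path halving. */
    static int find(int[] parent, int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    /** Groups vertices 0..numVertices-1 so that only one-to-one edges stay inside a group. */
    static List<List<Integer>> partition(int numVertices, List<Edge> edges) {
        int[] parent = new int[numVertices];
        for (int v = 0; v < numVertices; v++) {
            parent[v] = v;
        }
        for (Edge e : edges) {
            if (e.oneToOne) {
                // Merge the two endpoints of a one-to-one edge into one partition.
                parent[find(parent, e.src)] = find(parent, e.dst);
            }
        }
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int v = 0; v < numVertices; v++) {
            groups.computeIfAbsent(find(parent, v), k -> new ArrayList<>()).add(v);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // 0 -(one-to-one)-> 1 -(shuffle)-> 2 -(one-to-one)-> 3 -(shuffle)-> 4
        List<Edge> edges = Arrays.asList(
            new Edge(0, 1, true),
            new Edge(1, 2, false),
            new Edge(2, 3, true),
            new Edge(3, 4, false));
        System.out.println(partition(5, edges)); // prints [[0, 1], [2, 3], [4]]
    }
}
```

    In this toy input the two shuffle edges separate the vertices into three partitions, matching the P1, P2, P3 shape used in the description.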

    Suppose the IRDAG is partitioned into three sub-DAG partitions with shuffle dependencies as follows: P1 - P2 - P3

    This pass then produces something like: P1' - P1 - P2' - P2 - P3, where Px' consists of SamplingVertex objects that clone the execution of Px. (P3 is not cloned because it is a sink partition, and none of the outgoing edges of its vertices needs to be optimized.)
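
    The resulting order for the P1 - P2 - P3 example can be reproduced with a small sketch. It models partitions as plain strings and is only an illustration: SamplingPlanSketch and planWithSampling are hypothetical names, not part of the Nemo API, and the sketch assumes a simple linear chain in which the last partition is the only sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SamplingPlanSketch {
    /**
     * Hypothetical helper: given partitions in shuffle order, returns the
     * execution order in which every non-sink partition Px is preceded by
     * its sampling clone Px'.
     */
    static List<String> planWithSampling(List<String> partitions) {
        List<String> plan = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            String partition = partitions.get(i);
            boolean isSink = (i == partitions.size() - 1);
            if (!isSink) {
                // Only partitions with outgoing shuffle edges get a sampling clone.
                plan.add(partition + "'");
            }
            plan.add(partition);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> plan = planWithSampling(Arrays.asList("P1", "P2", "P3"));
        System.out.println(String.join(" - ", plan)); // prints P1' - P1 - P2' - P2 - P3
    }
}
```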

    For each Px', this pass also inserts a TriggerVertex so that the data statistics of Px' can be used to dynamically optimize the execution behavior of Px.

    • Constructor Detail

      • SamplingSkewReshapingPass

        public SamplingSkewReshapingPass()
        Default constructor.