    XForm for index joins with groupby and project on the inner side (#10711) · 24671b5b
    Committed by Hans Zeller
    * Add a CPatternNode operator
    
    This operator can be used in patterns of xforms and it can match one of
    multiple regular operators.
    
    Which operators it matches depends on its match type. Right now, there is
    only one match type, EmtMatchInnerOrLeftOuterJoin, that matches a
    logical inner join or a logical left outer join.
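
    As a rough illustration of the idea, the following is a minimal sketch
    of a pattern-only node with a match type. The names CPatternNodeSketch,
    EOperatorId and MatchesOperator are hypothetical stand-ins, not the
    actual ORCA classes; only EmtMatchInnerOrLeftOuterJoin is taken from
    this commit.

```cpp
#include <cassert>

// Hypothetical, simplified operator ids; the real optimizer has many more.
enum class EOperatorId {
    LogicalInnerJoin,
    LogicalLeftOuterJoin,
    LogicalGbAgg,
    LogicalGet
};

// Sketch of a pattern-only operator that can match several regular operators.
class CPatternNodeSketch {
public:
    // This commit introduces exactly one match type.
    enum EMatchType { EmtMatchInnerOrLeftOuterJoin };

    explicit CPatternNodeSketch(EMatchType type) : m_match(type) {}

    // Returns true if an operator with the given id satisfies this pattern node.
    bool MatchesOperator(EOperatorId id) const {
        switch (m_match) {
        case EmtMatchInnerOrLeftOuterJoin:
            return id == EOperatorId::LogicalInnerJoin ||
                   id == EOperatorId::LogicalLeftOuterJoin;
        }
        return false;
    }

private:
    EMatchType m_match;
};
```

    With this shape, a single xform pattern can cover both join types
    instead of needing one xform per join operator.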
    
    * Add 2 new xforms for index apply
    
    The new xforms use the new CPatternNode for patterns of the form
    
    CPatternNode(Leaf, Tree, Tree)
    
    The CPatternNode matches both an inner and a left outer join.
    The Tree for the right child can contain arbitrary operators.
    To avoid an explosion of the search space, we add two conditions
    to the xform: First, the xform is applied only once, not for all
    bindings of the Tree node. Second, the xform is applied only if
    the right child has a "join depth" of 1, excluding any children
    that are complex and wouldn't satisfy the conditions of this xform
    anyway.
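
    The second condition can be sketched as a simple guard on the inner
    side. This is an illustration only: ExprSketch, Make, JoinDepth and
    InnerSideQualifies are hypothetical names, and the join depth definition
    used here (a get counts as 1, every other operator sums the depths of
    its children) is an assumption, not a statement about the real
    derivation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified logical expression tree.
struct ExprSketch {
    enum Kind { Get, Select, Project, GbAgg, Join } kind;
    std::vector<std::shared_ptr<ExprSketch>> children;
};

// Convenience constructor for building small trees.
std::shared_ptr<ExprSketch> Make(
    ExprSketch::Kind kind,
    std::vector<std::shared_ptr<ExprSketch>> children = {}) {
    auto expr = std::make_shared<ExprSketch>();
    expr->kind = kind;
    expr->children = std::move(children);
    return expr;
}

// Assumed definition: a get has join depth 1; any other operator
// sums the join depths of its children (a join of two gets yields 2).
unsigned JoinDepth(const ExprSketch& expr) {
    if (expr.kind == ExprSketch::Get) {
        return 1;
    }
    unsigned depth = 0;
    for (const auto& child : expr.children) {
        depth += JoinDepth(*child);
    }
    return depth;
}

// Guard: consider the xform only if the inner side contains no joins.
bool InnerSideQualifies(const ExprSketch& inner) {
    return JoinDepth(inner) == 1;
}
```

    Under these assumptions, a groupby over a project over a get
    qualifies, while an inner side that itself contains a join is
    rejected before any bindings are explored.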
    
    * Remove 16 obsolete xforms and replace them with 2 new ones
    
    To remove xforms, we have to add a mechanism to skip unused xform ids,
    to preserve the ids of the remaining xforms that are used in trace flags.
    Our array of xforms now allows for holes.
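
    One way to picture an id-indexed array with holes is sketched below.
    XformSketch and XformFactorySketch are hypothetical names, not the real
    factory; the point is only that a removed xform leaves an empty slot, so
    the ids of the remaining xforms do not shift.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an xform; only the name matters here.
struct XformSketch {
    std::string name;
};

// Id-indexed xform table in which unused ids stay as empty slots.
class XformFactorySketch {
public:
    explicit XformFactorySketch(std::size_t num_ids) : m_xforms(num_ids) {}

    // Registers an xform under a fixed id; throws if the id is out of range.
    void Add(std::size_t id, std::string name) {
        m_xforms.at(id) =
            std::unique_ptr<XformSketch>(new XformSketch{std::move(name)});
    }

    // Returns nullptr for a hole or an out-of-range id.
    const XformSketch* Lookup(std::size_t id) const {
        return id < m_xforms.size() ? m_xforms[id].get() : nullptr;
    }

    // Counts registered xforms, skipping holes.
    std::size_t CountRegistered() const {
        std::size_t count = 0;
        for (const auto& xform : m_xforms) {
            if (xform) {
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<std::unique_ptr<XformSketch>> m_xforms;
};
```

    Any code that iterates over all ids then has to check for empty
    slots, which is why the unit test programs also needed updating.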
    
    * Changes to unit test programs
    
    Updated the test programs to tolerate holes in the xform array.
    Used a new xform instead of one of the removed ones.
    
    * MDP changes
    
    * ICG changes
    
    * Fixes for ICG failures in join and aggregates tests
    
    The new xform allows additional plans that are chosen in the explain
    output. It also surfaced a bug where we can't eliminate a groupby that
    sits on top of a CLogicalIndexGet, because the index get doesn't derive
    a key set.
    
    * Support for project nodes in index nested loop joins
    
    When generating required distribution specs for its children,
    CPhysicalInnerIndexNLJoin will start with its inner child and send it
    an ANY required distribution spec. It will then force the outer child
    to match the inner's distribution spec (or require a broadcast on the outer).
    
    Now, assume we have CPhysicalComputeScalar as the inner child. This
    node, in CPhysicalComputeScalar::PdsRequired will currently require
    its child to be REPLICATED (opt request 1) or SINGLETON (opt request
    0), if the expression has any outer references. This won't work, since
    the underlying table has neither of these distribution schemes and
    since we don't want any motions between the index join and the
    index get.
    
    This commit changes the behavior of a CPhysicalComputeScalar. If it
    senses that it is part of an index nested loop join, it will just
    propagate the required distribution spec from the parent.
    How does it sense that? By the required ANY distribution spec that
    allows outer references. This request is generated in only two places:
    CPhysicalInnerIndexNLJoin::PdsRequired and
    CPhysicalLeftOuterIndexNLJoin::PdsRequired, so it is only used in the
    context of an index NLJ. This mirrors the behavior of CPhysicalFilter,
    the other node allowed between an index NLJ and the get.
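
    The decision described above can be sketched as follows. DistKind,
    DistSpecSketch and ComputeScalarRequiredDist are hypothetical,
    simplified names and not the actual PdsRequired signature. The final
    pass-through for expressions without outer references is an
    assumption; this message does not describe that case.

```cpp
#include <cassert>

// Hypothetical, simplified distribution kinds.
enum class DistKind { Any, Singleton, Replicated };

struct DistSpecSketch {
    DistKind kind;
    bool allow_outer_refs;  // only meaningful when kind == Any
};

// Sketch of the distribution a compute scalar requires from its child.
DistSpecSketch ComputeScalarRequiredDist(const DistSpecSketch& from_parent,
                                         bool has_outer_refs,
                                         unsigned opt_request) {
    // ANY that allows outer references is only requested by index NLJs,
    // so propagate the parent's request unchanged.
    if (from_parent.kind == DistKind::Any && from_parent.allow_outer_refs) {
        return from_parent;
    }
    // Otherwise keep the earlier behavior for outer references:
    // SINGLETON for opt request 0, REPLICATED for opt request 1.
    if (has_outer_refs) {
        return opt_request == 0
                   ? DistSpecSketch{DistKind::Singleton, false}
                   : DistSpecSketch{DistKind::Replicated, false};
    }
    // Assumption: without outer references, pass the request through.
    return from_parent;
}
```

    Checking the request rather than walking up the plan keeps the node
    unaware of its parent's type, at the cost of relying on the fact that
    only the two index NLJ operators generate this particular request.
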
    Co-authored-by: David Kimura <dkimura@vmware.com>
CXformJoin2IndexApply.cpp 27.4 KB