Unverified · Commit 24671b5b · Author: Hans Zeller · Committer: GitHub

XForm for index joins with groupby and project on the inner side (#10711)

* Add a CPatternNode operator

This operator can be used in patterns of xforms and it can match one of
multiple regular operators.

Which operators it matches depends on its match type. Right now, there is
only one match type, EmtMatchInnerOrLeftOuterJoin, that matches a
logical inner join or a logical left outer join.
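
A minimal sketch of the idea, not the ORCA implementation: the type names `OperatorId` and `PatternNode` and the method `MatchesOperator` are hypothetical stand-ins chosen for illustration. Only the match type name mirrors EmtMatchInnerOrLeftOuterJoin from the text above.

```cpp
#include <cassert>

// Hypothetical operator ids; these are NOT the real ORCA enum values.
enum class OperatorId {
    LogicalInnerJoin,
    LogicalLeftOuterJoin,
    LogicalGet,
    LogicalGbAgg
};

// Simplified pattern node: which operators it matches depends on its match type.
class PatternNode {
public:
    // Mirrors the single match type described above (EmtMatchInnerOrLeftOuterJoin).
    enum class MatchType { InnerOrLeftOuterJoin };

    explicit PatternNode(MatchType type) : type_(type) {}

    // Returns true if this pattern node accepts the given regular operator.
    bool MatchesOperator(OperatorId id) const {
        switch (type_) {
            case MatchType::InnerOrLeftOuterJoin:
                return id == OperatorId::LogicalInnerJoin ||
                       id == OperatorId::LogicalLeftOuterJoin;
        }
        assert(false && "unhandled match type");
        return false;
    }

private:
    MatchType type_;
};
```

Adding a further match type in this sketch would mean one new enumerator and one new case, without touching the xforms that already use the node.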

* Add 2 new xforms for index apply

The new xforms use the new CPatternNode for patterns of the form

CPatternNode(Leaf, Tree, Tree)

The CPatternNode matches both an inner and a left outer join.
The Tree for the right child can contain arbitrary operators.
To avoid an explosion of the search space, we add two conditions
to the xform: First, the xform is applied only once, not for all
bindings of the Tree node. Second, the xform is applied only if
the right child has a "join depth" of 1, excluding any children
that are complex and wouldn't satisfy the conditions of this xform
anyway.
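
The two conditions can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual xform code: `Expr`, `XformState`, `JoinDepth` and `ShouldApply` are hypothetical names, and the join depth definition used here (a leaf counts 1, a join sums its children, any other operator takes the maximum of its children) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified expression tree (relational children only).
struct Expr {
    bool is_join;
    std::vector<Expr> children;
};

// Assumed definition of "join depth" for this sketch.
int JoinDepth(const Expr& e) {
    if (e.children.empty()) return 1;  // leaf, e.g. a get
    if (e.is_join) {
        int sum = 0;
        for (const Expr& c : e.children) sum += JoinDepth(c);
        return sum;
    }
    int depth = 1;  // non-join operator such as a groupby or project
    for (const Expr& c : e.children) depth = std::max(depth, JoinDepth(c));
    return depth;
}

// Per-expression bookkeeping so the xform fires once, not once per binding.
struct XformState {
    bool applied = false;
};

// Decide whether to apply the xform to a join expression.
bool ShouldApply(const Expr& join, XformState& state) {
    assert(join.is_join && join.children.size() >= 2);
    if (state.applied) return false;                        // condition 1: only once
    if (JoinDepth(join.children[1]) != 1) return false;     // condition 2: simple right child
    state.applied = true;
    return true;
}
```

In this sketch a groupby or project stacked on a single get keeps a join depth of 1, while a join on the right side raises it and is rejected.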

* Remove 16 obsolete xforms and replace them with 2 new ones

To remove xforms, we have to add a mechanism to skip unused xform ids,
to preserve the ids of the remaining xforms that are used in trace flags.
Our array of xforms now allows for holes.
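
One way to picture an array with holes is sketched below. The names `Xform`, `XformRegistry`, `Add`, `Lookup` and `CountRegistered` are hypothetical and do not correspond to the real ORCA classes. The point is only that ids stay stable while some slots remain empty.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical xform record.
struct Xform {
    std::string name;
};

// Registry indexed by xform id; slots of removed xforms stay null ("holes").
class XformRegistry {
public:
    explicit XformRegistry(std::size_t num_ids) : slots_(num_ids) {}

    // Register an xform under a fixed id; ids of removed xforms are never reused.
    void Add(std::size_t id, std::string name) {
        assert(id < slots_.size() && !slots_[id]);
        slots_[id].reset(new Xform{std::move(name)});
    }

    // Returns nullptr for a hole or an out-of-range id, so callers must check.
    const Xform* Lookup(std::size_t id) const {
        return id < slots_.size() ? slots_[id].get() : nullptr;
    }

    // Counts only the slots that actually hold an xform.
    std::size_t CountRegistered() const {
        std::size_t n = 0;
        for (const auto& slot : slots_) {
            if (slot) ++n;
        }
        return n;
    }

private:
    std::vector<std::unique_ptr<Xform>> slots_;
};
```

Any loop over all ids, such as the ones in the unit test programs, has to skip null slots instead of assuming every id is populated.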

* Changes to unit test programs

Updated the test programs to tolerate holes in the xform array.
Used a new xform instead of one of the removed ones.

* MDP changes

* ICG changes

* Fixes for ICG failures in join and aggregates tests

The new xform allows additional plans that are chosen in the explain
output. It also surfaced a bug where we can't eliminate a groupby that
sits on top of a CLogicalIndexGet, because the index get doesn't derive
a key set.

* Support for project nodes in index nested loop joins

When generating required distribution specs for its children,
CPhysicalInnerIndexNLJoin will start with its inner child and send it
an ANY required distribution spec. It will then force the outer child
to match the inner's distribution spec (or require a broadcast on the outer).

Now, assume we have CPhysicalComputeScalar as the inner child. This
node, in CPhysicalComputeScalar::PdsRequired will currently require
its child to be REPLICATED (opt request 1) or SINGLETON (opt request
0), if the expression has any outer references. This won't work, since
the underlying table has neither of these distribution schemes and
since we don't want any motions between the index join and the
index get.

This commit changes the behavior of a CPhysicalComputeScalar. If it
senses that it is part of an index nested loop join, it will just
propagate the required distribution spec from the parent.
How does it sense that? By the required ANY distribution spec that
allows outer references. This request is generated in only two places:
CPhysicalInnerIndexNLJoin::PdsRequired and
CPhysicalLeftOuterIndexNLJoin::PdsRequired, so it is only used in the
context of an index NLJ. This behavior is similar to that of the CPhysicalFilter
node, the other node allowed between an index NLJ and the get.
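
The decision described above can be sketched as a small function. This is a simplified model, not the code in CPhysicalComputeScalar::PdsRequired: `DistKind`, `DistSpec` and `ComputeScalarRequired` are hypothetical names, and the final pass-through for expressions without outer references is an assumption made to keep the example complete.

```cpp
#include <cassert>

// Hypothetical, reduced set of distribution kinds.
enum class DistKind { Any, Singleton, Replicated };

// Hypothetical required-distribution spec.
struct DistSpec {
    DistKind kind;
    bool allow_outer_refs;  // set only by the index NLJ operators, per the text above
};

// Distribution a compute scalar requires from its child.
DistSpec ComputeScalarRequired(const DistSpec& from_parent,
                               bool has_outer_refs,
                               unsigned opt_request) {
    assert(opt_request <= 1);

    // New behavior: an ANY request that allows outer references signals an
    // index NLJ context, so propagate the parent's request unchanged.
    if (from_parent.kind == DistKind::Any && from_parent.allow_outer_refs) {
        return from_parent;
    }

    // Existing behavior with outer references:
    // SINGLETON for opt request 0, REPLICATED for opt request 1.
    if (has_outer_refs) {
        return DistSpec{opt_request == 0 ? DistKind::Singleton : DistKind::Replicated,
                        false};
    }

    // Assumption for this sketch: without outer references, pass the request through.
    return from_parent;
}
```

Because the first check comes before the outer-reference check, no SINGLETON or REPLICATED requirement, and therefore no motion, is forced between the index join and the index get.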

Co-authored-by: David Kimura <dkimura@vmware.com>
Parent 5e75e9e4
......@@ -2836,10 +2836,7 @@
<dxl:DefaultValue/>
</dxl:Column>
</dxl:Columns>
<dxl:IndexInfoList>
<dxl:IndexInfo Mdid="0.17049.1.0" IsPartial="false"/>
<dxl:IndexInfo Mdid="0.17959.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
<dxl:IndexInfoList/>
<dxl:Triggers/>
<dxl:CheckConstraints/>
</dxl:Relation>
......
......@@ -1563,10 +1563,10 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:LogicalGet>
</dxl:LogicalSelect>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="32">
<dxl:HashJoin JoinType="Inner">
<dxl:Plan Id="0" SpaceSize="50">
<dxl:NestedLoopJoin JoinType="Inner" IndexNestedLoopJoin="true" OuterRefAsParam="false">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="868.370421" Rows="337.000000" Width="4"/>
<dxl:Cost StartupCost="0" TotalCost="75.123853" Rows="337.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="5" Alias="pg_toast_relidx">
......@@ -1574,43 +1574,12 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:JoinFilter/>
<dxl:HashCondList>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.607.1.0">
<dxl:Ident ColId="31" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Ident ColId="48" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
</dxl:Comparison>
</dxl:HashCondList>
<dxl:TableScan>
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="431.038182" Rows="337.000000" Width="12"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="5" Alias="relfilenode">
<dxl:Ident ColId="5" ColName="relfilenode" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
<dxl:ProjElem ColId="31" Alias="oid">
<dxl:Ident ColId="31" ColName="oid" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:TableDescriptor Mdid="0.1259.1.1" TableName="pg_class">
<dxl:Columns>
<dxl:Column ColId="5" Attno="6" ColName="relfilenode" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="30" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0"/>
<dxl:Column ColId="31" Attno="-2" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="32" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="33" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="34" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="35" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="36" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="37" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:TableScan>
<dxl:JoinFilter>
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
</dxl:JoinFilter>
<dxl:Assert ErrorCode="P0003">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.258775" Rows="1.000000" Width="4"/>
<dxl:Cost StartupCost="0" TotalCost="12.464046" Rows="1.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="48" Alias="reltoastidxid">
......@@ -1627,7 +1596,7 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:AssertConstraintList>
<dxl:Window PartitionColumns="">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.258771" Rows="337.000000" Width="12"/>
<dxl:Cost StartupCost="0" TotalCost="12.464042" Rows="337.000000" Width="12"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="115" Alias="row_number">
......@@ -1638,9 +1607,9 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:HashJoin JoinType="Inner">
<dxl:NestedLoopJoin JoinType="Inner" IndexNestedLoopJoin="true" OuterRefAsParam="false">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.258771" Rows="337.000000" Width="4"/>
<dxl:Cost StartupCost="0" TotalCost="12.464042" Rows="337.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="48" Alias="reltoastidxid">
......@@ -1648,40 +1617,9 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:JoinFilter/>
<dxl:HashCondList>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.607.1.0">
<dxl:Ident ColId="69" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Ident ColId="85" ColName="reltoastrelid" TypeMdid="0.26.1.0"/>
</dxl:Comparison>
</dxl:HashCondList>
<dxl:TableScan>
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="431.038182" Rows="337.000000" Width="12"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="48" Alias="reltoastidxid">
<dxl:Ident ColId="48" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
<dxl:ProjElem ColId="69" Alias="oid">
<dxl:Ident ColId="69" ColName="oid" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:TableDescriptor Mdid="0.1259.1.1" TableName="pg_class">
<dxl:Columns>
<dxl:Column ColId="48" Attno="11" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="68" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0"/>
<dxl:Column ColId="69" Attno="-2" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="70" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="71" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="72" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="73" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="74" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="75" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:TableScan>
<dxl:JoinFilter>
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
</dxl:JoinFilter>
<dxl:Assert ErrorCode="P0003">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="6.147125" Rows="1.000000" Width="4"/>
......@@ -1725,7 +1663,7 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
<dxl:IndexCondList>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.93.1.0">
<dxl:Ident ColId="76" ColName="relname" TypeMdid="0.19.1.0"/>
<dxl:ConstValue TypeMdid="0.19.1.0" Value="ZnN0c19hbHRlcl9zZXRfc3RvcmFnZV90YWJsZQAAAAAAAAAAAAAAAAAAAAAA&#10;AAAAAAAAAAAAAAAAAAAAAAAAAA=="/>
<dxl:ConstValue TypeMdid="0.19.1.0" Value="ZnN0c19hbHRlcl9zZXRfc3RvcmFnZV90YWJsZQAAAAAAAAAAAAAAAAAAAAAA&#xA;AAAAAAAAAAAAAAAAAAAAAAAAAA=="/>
</dxl:Comparison>
</dxl:IndexCondList>
<dxl:IndexDescriptor Mdid="0.2663.1.0" IndexName="pg_class_relname_nsp_index"/>
......@@ -1751,7 +1689,38 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:WindowKeyList>
</dxl:Window>
</dxl:Assert>
</dxl:HashJoin>
<dxl:IndexScan IndexScanDirection="Forward">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="6.309690" Rows="337.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="48" Alias="reltoastidxid">
<dxl:Ident ColId="48" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:IndexCondList>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.607.1.0">
<dxl:Ident ColId="69" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Ident ColId="85" ColName="reltoastrelid" TypeMdid="0.26.1.0"/>
</dxl:Comparison>
</dxl:IndexCondList>
<dxl:IndexDescriptor Mdid="0.2662.1.0" IndexName="pg_class_oid_index"/>
<dxl:TableDescriptor Mdid="0.1259.1.1" TableName="pg_class">
<dxl:Columns>
<dxl:Column ColId="48" Attno="11" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="68" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0"/>
<dxl:Column ColId="69" Attno="-2" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="70" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="71" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="72" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="73" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="74" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="75" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:IndexScan>
</dxl:NestedLoopJoin>
<dxl:WindowKeyList>
<dxl:WindowKey>
<dxl:SortingColumnList/>
......@@ -1759,7 +1728,38 @@ where oid= (select reltoastrelid from pg_class where relname='fsts_alter_set_sto
</dxl:WindowKeyList>
</dxl:Window>
</dxl:Assert>
</dxl:HashJoin>
<dxl:IndexScan IndexScanDirection="Forward">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="6.309690" Rows="337.000000" Width="4"/>
</dxl:Properties>
<dxl:ProjList>
<dxl:ProjElem ColId="5" Alias="relfilenode">
<dxl:Ident ColId="5" ColName="relfilenode" TypeMdid="0.26.1.0"/>
</dxl:ProjElem>
</dxl:ProjList>
<dxl:Filter/>
<dxl:IndexCondList>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.607.1.0">
<dxl:Ident ColId="31" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Ident ColId="48" ColName="reltoastidxid" TypeMdid="0.26.1.0"/>
</dxl:Comparison>
</dxl:IndexCondList>
<dxl:IndexDescriptor Mdid="0.2662.1.0" IndexName="pg_class_oid_index"/>
<dxl:TableDescriptor Mdid="0.1259.1.1" TableName="pg_class">
<dxl:Columns>
<dxl:Column ColId="5" Attno="6" ColName="relfilenode" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="30" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0"/>
<dxl:Column ColId="31" Attno="-2" ColName="oid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="32" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="33" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="34" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0"/>
<dxl:Column ColId="35" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0"/>
<dxl:Column ColId="36" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0"/>
<dxl:Column ColId="37" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:IndexScan>
</dxl:NestedLoopJoin>
</dxl:Plan>
</dxl:Thread>
</dxl:DXLMessage>
......@@ -317,7 +317,7 @@
</dxl:Comparison>
</dxl:LogicalJoin>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="8">
<dxl:Plan Id="0" SpaceSize="7">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.000401" Rows="1.000000" Width="8"/>
......
......@@ -298,7 +298,7 @@ Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..433.00 rows=1 width=24)
</dxl:And>
</dxl:LogicalJoin>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="4">
<dxl:Plan Id="0" SpaceSize="3">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.000490" Rows="1.000000" Width="24"/>
......
......@@ -289,7 +289,7 @@
</dxl:Comparison>
</dxl:LogicalJoin>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="8">
<dxl:Plan Id="0" SpaceSize="4">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="437.000345" Rows="1.000000" Width="16"/>
......
......@@ -985,7 +985,7 @@ OR
</dxl:And>
</dxl:LogicalJoin>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="246228">
<dxl:Plan Id="0" SpaceSize="298878">
<dxl:Result>
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="14963.550781" Rows="521.833333" Width="246"/>
......
......@@ -182,9 +182,7 @@ select * from foo, (select NULL a from bar) other where foo.a is not distinct fr
<dxl:DefaultValue/>
</dxl:Column>
</dxl:Columns>
<dxl:IndexInfoList>
<dxl:IndexInfo Mdid="0.38326.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
<dxl:IndexInfoList/>
<dxl:Triggers/>
<dxl:CheckConstraints/>
</dxl:Relation>
......
......@@ -122,9 +122,7 @@ Optimizer status: PQO version 2.67.0
<dxl:DefaultValue/>
</dxl:Column>
</dxl:Columns>
<dxl:IndexInfoList>
<dxl:IndexInfo Mdid="0.82836.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
<dxl:IndexInfoList/>
<dxl:Triggers/>
<dxl:CheckConstraints/>
</dxl:Relation>
......@@ -162,9 +160,7 @@ Optimizer status: PQO version 2.67.0
<dxl:DefaultValue/>
</dxl:Column>
</dxl:Columns>
<dxl:IndexInfoList>
<dxl:IndexInfo Mdid="0.82822.1.0" IsPartial="false"/>
</dxl:IndexInfoList>
<dxl:IndexInfoList/>
<dxl:Triggers/>
<dxl:CheckConstraints/>
</dxl:Relation>
......
......@@ -1277,7 +1277,7 @@ WHERE
</dxl:And>
</dxl:LogicalJoin>
</dxl:Query>
<dxl:Plan Id="0" SpaceSize="120488">
<dxl:Plan Id="0" SpaceSize="126560">
<dxl:GatherMotion InputSegments="0,1,2" OutputSegments="-1">
<dxl:Properties>
<dxl:Cost StartupCost="0" TotalCost="1306.824501" Rows="199.680000" Width="148"/>
......
This diff is collapsed.