    Only request stats of columns needed for cardinality estimation [#150424379] · 5b659321
    Omer Arap committed
    GPORCA should not spend time extracting column statistics that are not
    needed for cardinality estimation. This commit eliminates the overhead
    of requesting and generating statistics for columns that are not used
    in cardinality estimation.
    
    E.g:
    `CREATE TABLE foo (a int, b int, c int);`
    
    For table `foo`, the query below only needs stats for column `a`, which
    is the distribution column, and column `c`, which is the column used in
    the WHERE clause.
    `select * from foo where c=2;`
    
    However, prior to this commit, the column statistics for column `b` were
    also calculated and passed on for cardinality estimation. The only
    information the optimizer needs about column `b` is its `width`. For
    this single value, all statistics for that column were transferred.
    
    This commit and its counterpart commit in GPORCA ensure that the column
    width information is passed and extracted in the `dxl:Relation`
    metadata.
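
The selection rule described above can be sketched as follows. This is a minimal illustration only, not the actual GPORCA or translator code: `Column`, `StatsRequest` and `PlanStatsRequest` are hypothetical names, and the sketch assumes that only distribution columns and columns referenced in predicates need full statistics, while every other column contributes just its width.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a table column; not a GPORCA type.
struct Column {
    std::string name;
    int width;  // column width in bytes
};

// Hypothetical result: which columns need full stats and which need width only.
struct StatsRequest {
    std::set<std::string> full_stats;  // columns used in cardinality estimation
    std::vector<Column> width_only;    // columns for which the width suffices
};

// Split the columns of a table into "full statistics" and "width only",
// assuming distribution and predicate columns are the only ones that
// take part in cardinality estimation.
StatsRequest PlanStatsRequest(const std::vector<Column>& columns,
                              const std::set<std::string>& distribution_cols,
                              const std::set<std::string>& predicate_cols) {
    StatsRequest req;
    for (const Column& col : columns) {
        assert(col.width >= 0);  // a negative width would be a caller bug
        if (distribution_cols.count(col.name) > 0 ||
            predicate_cols.count(col.name) > 0) {
            req.full_stats.insert(col.name);
        } else {
            req.width_only.push_back(col);
        }
    }
    return req;
}
```

For `select * from foo where c=2;` with `a` as the distribution column, this sketch would request full statistics for `a` and `c` and carry only the width for `b`.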
    
    Preliminary results for short-running queries show up to a 65x
    performance improvement.
    Signed-off-by: Jemish Patel <jpatel@pivotal.io>
CTranslatorUtils.h 13.2 KB