提交 44621b6f 编写于 作者: C Chris Hajas 提交者: Chris Hajas

Remove Orca assertions when merging buckets

These assertions started getting tripped in the previous commit when
adding tests, but aren't related to the Epsilon change. Rather, we're
calculating the frequency of a singleton bucket using two different
methods which causes this assertion to break down. The first method
(calculating the upper_third) assumes the singleton has 1 NDV and that there is an even distribution
across the NDVs. The second (in GetOverlapPercentage) calculates a
"resolution" that is based on Epsilon and assumes the bucket contains
some small Epsilon frequency. It results in the overlap percentage being
too high, instead it too should likely be based on the NDV.

In practice, this won't have much impact unless the NDV is very small.
Additionally, the conditional logic is based on the bounds, not
frequency. However, it would be good to align in the future so our
statistics calculations are simpler to understand and predictable.

For now, we'll remove the assertions and add a TODO. Once we align the
methods, we should add these assertions back.
上级 45e49e17
......@@ -186,6 +186,14 @@ CBucket::GetOverlapPercentage(const CPoint *point, BOOL include_point) const
CDouble res = 1 / distance_upper;
res = res * distance_middle;
// TODO: When calculating the overlap percentage for doubles, we're
// using a different method than when calculating the frequency. This
// causes the frequency of singleton Double buckets to be inconsistent
// -- the frequency of the split bucket exceeds the frequency of the
// original bucket. Instead, we should have a consistent method of
// calculating singleton frequency, either through NDV or assuming a
// small epsilon (using the NDV in GetOverlapPercentage is probably
// safer)
return CDouble(std::min(res.Get(), DOUBLE(1.0)));
}
......@@ -1195,9 +1203,6 @@ CBucket::SplitAndMergeBuckets(
true /*include_lower*/);
this_overlap =
this->GetOverlapPercentage(minUpper, false /*include_point*/);
GPOS_ASSERT(this_overlap * this->GetFrequency() +
upper_third->GetFrequency() <=
this->GetFrequency() + CStatistics::Epsilon);
}
else
{
......@@ -1206,9 +1211,6 @@ CBucket::SplitAndMergeBuckets(
mp, minUpper, true /*include_lower*/);
bucket_other_overlap = bucket_other->GetOverlapPercentage(
minUpper, false /*include_point*/);
GPOS_ASSERT(bucket_other_overlap * bucket_other->GetFrequency() +
upper_third->GetFrequency() <=
bucket_other->GetFrequency() + CStatistics::Epsilon);
}
}
else
......@@ -1222,9 +1224,6 @@ CBucket::SplitAndMergeBuckets(
true /*include_lower*/);
this_overlap =
this->GetOverlapPercentage(minUpper, false /*include_point*/);
GPOS_ASSERT(this_overlap * this->GetFrequency() +
upper_third->GetFrequency() <=
this->GetFrequency() + CStatistics::Epsilon);
}
else if (bucket_other->IsUpperClosed() && !this->IsUpperClosed())
{
......@@ -1232,9 +1231,6 @@ CBucket::SplitAndMergeBuckets(
mp, minUpper, true /*include_lower*/);
bucket_other_overlap = bucket_other->GetOverlapPercentage(
minUpper, false /*include_point*/);
GPOS_ASSERT(bucket_other_overlap * bucket_other->GetFrequency() +
upper_third->GetFrequency() <=
bucket_other->GetFrequency() + CStatistics::Epsilon);
}
// the buckets are completely identical
// [1,5) & [1,5) OR (1,5] & (1,5] OR [1,5] & [1,5]
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册