- 04 4月, 2014 15 次提交
-
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
This will be a non-op most of the times since the object will be unshared / decoded, however it is more technically correct to start this way since the object may be decoded even in the read-only code path.
-
由 antirez 提交于
-
由 antirez 提交于
This commit shapes the main ideas for the implementation but doesn't fix all the command implementations, nor handles loading of LZF compressed objects in a way able to perserve the compression.
-
由 antirez 提交于
We also avoid to re-create an object that is already in EMBSTR encoding.
-
由 antirez 提交于
Using a seed of zero has the side effect of having the empty string hashing to what is a very special case in the context of HyperLogLog: a very long run of zeroes. This did not influenced the correctness of the result with 16k registers because of the harmonic mean, but still it is inconvenient that a so obvious value maps to a so special hash. The seed 0xadc83b19 is used instead, which is the first 64 bits of the SHA1 of the empty string. Reference: issue #1657.
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
We need to guarantee that the last bit is 1, otherwise an element may hash to just zeroes with probability 1/(2^64) and trigger an infinite loop. See issue #1657.
-
- 03 4月, 2014 5 次提交
- 02 4月, 2014 5 次提交
-
-
由 antirez 提交于
The function to generate graphs is also more flexible as now includes step and max value. The step of the samples generation function is no longer limited to min step of 1000.
-
由 antirez 提交于
This will allow future changes like compressed representations. Currently the magic is not checked for performance reasons but this may change in the future, for example if we add new types encoded in strings that may have the same size of HyperLogLogs.
-
由 Salvatore Sanfilippo 提交于
Change HLL* to PF* in a few spots
-
由 Raymond Myers 提交于
-
由 Raymond Myers 提交于
-
- 01 4月, 2014 4 次提交
- 31 3月, 2014 11 次提交
-
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
Better results can be achieved by compensating for the bias of the raw approximation just after 2.5m (when LINEARCOUNTING is no longer used) by using a polynomial that approximates the bias at a given cardinality. The curve used was found using this web page: http://www.xuru.org/rt/PR.asp That performs polynomial regression given a set of values.
-
由 antirez 提交于
Merge N HLL data structures by selecting the max value for every M[i] register among the set of HLLs.
-
由 antirez 提交于
When we update the cached value, we need to propagate the command and signal the key as modified for WATCH.
-
由 antirez 提交于
The following form is given: HLLADD myhll No element is provided in the above case so if 'myhll' var does not exist the result is to just create an empty HLL structure, and no update will be performed on the registers. In this case, the DB should still be set dirty and the command propagated.
-
由 antirez 提交于
-
由 antirez 提交于
Those are generated to trace graphs using gnuplot.
-
由 antirez 提交于
The HyperLogLog original paper suggests using LINEARCOUNTING for cardinalities < 2.5m, however for P=14 the median / max error curves show that a value of '3' is the best pick for m = 16384.
-
由 antirez 提交于
-
由 antirez 提交于
The more we add elements to an HyperLogLog counter, the smaller is the probability that we actually update some register. From this observation it is easy to see how it is possible to use caching of a previously computed cardinality and reuse it to serve HLLCOUNT queries as long as no register was updated in the data structure. This commit does exactly this by using just additional 8 bytes for the data structure to store a 64 bit unsigned integer value cached cardinality. When the most significant bit of the 64 bit integer is set, it means that the value computed is no longer usable since at least a single register was modified and we need to recompute it at the next call of HLLCOUNT. The value is always stored in little endian format regardless of the actual CPU endianess.
-