How to Approximate a Set without Knowing it's Size in Advance
By Udi Wieder
Appears in collection : 2016 - T1 - WS2 - Fundamental inequalities and lower bounds theme
The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most ϵ. That is, the membership algorithm must be correct on each x∈S, and may err with probability at most ϵ on each x∉S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. We show a super-linear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(nlog(1/ϵ)) bits of space are necessary and sufficient. However, when the size is not known in advance, we prove that at least nlog(1/ϵ)+Ω(nloglogn) bits of space must be used. In particular, the average number of bits per element must depend on the size of the set. If time allows we will show that this space lower bound is tight, and can be matched by an efficient data structure. The data structure supports membership queries in constant time in the worst case with high probability, and supports insertions in expected amortized constant time.