Space/Time Trade-offs in Hash Coding with Allowable Errors
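In this paper trade-offs among certain computational factors
in hash coding are analyzed.  The paradigm problem considered is
that of testing a series of messages one-by-one for membership in
a given set of messages.  Two new hash-coding methods are examined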
and compared with a particular conventional hash-coding method.
The computational factors considered are the size of the hash area
(space), the time required to identify a message as a nonmember of the 
given set (reject time), and an allowable error frequency.  The new methods 
are intended to reduce the amount of space required to contain the hash-coded 
information from that associated with conventional methods.  The reduction in 
space is accomplished by exploiting the possibility that a small fraction of 
errors of commission may be tolerable in some applications, in particular, 
applications in which a large amount of data is involved and a core-resident
hash area is consequently not feasible using conventional methods.  In such 
applications, it is envisaged that overall performance
could be improved by using a smaller core-resident hash area in
conjunction with the new methods and, when necessary, by using some
secondary and perhaps time-consuming test to "catch" the small
fraction of errors associated with the new methods.  An example is discussed
which illustrates possible areas of application for the new
methods.  Analysis of the paradigm problem demonstrates that allowing
a small number of test messages to be falsely identified as
members of the given set will permit a much smaller hash
area to be used without increasing reject time.
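The arrangement described above, a compact hash area that may commit
errors of commission backed by a slower exact test, can be sketched
roughly as follows.  This is a minimal illustration and not the paper's
own formulation: the abstract does not spell out either new method, so
the bit-array scheme, the class name ApproximateSet, the helper
is_member, the exact_lookup callback, and the use of salted SHA-256 to
derive bit addresses are all assumptions made for the example.

```python
import hashlib


class ApproximateSet:
    """Bit-array hash area (hypothetical name, assumed design).

    It may report a nonmember as present (error of commission),
    but it never reports a stored message as absent.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, message):
        # Assumption: derive each bit address by salting SHA-256
        # with the hash index; any good hash family would do.
        data = message.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, message):
        # Set every bit addressed by the message.
        for pos in self._positions(message):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, message):
        # A single zero bit proves the message was never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(message)
        )


def is_member(message, hash_area, exact_lookup):
    """Two-stage test: fast reject, then a slower exact check.

    exact_lookup is a hypothetical callback standing in for the
    secondary, perhaps time-consuming, test.
    """
    if not hash_area.might_contain(message):
        return False  # rejected without touching secondary storage
    return exact_lookup(message)  # catches the errors of commission


if __name__ == "__main__":
    members = {"alpha", "beta", "gamma"}
    area = ApproximateSet(num_bits=1024, num_hashes=4)
    for m in members:
        area.add(m)
    print(is_member("alpha", area, members.__contains__))
    print(is_member("delta", area, members.__contains__))
```

In this sketch most nonmembers are rejected by the small in-memory
hash area alone, and the secondary test runs only for true members and
for the small fraction of falsely accepted messages.  The sizes chosen
(1024 bits, 4 hash functions) are arbitrary illustration values.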
CACM July, 1970
Bloom, B. H.
