PHP Hash Functions Manual

PHP array, associative array, object properties, function table, symbol table, and so on are used for the container to do HashTable.

PHP used the Hash is the most common DJBX33A (Daniel J. Bernstein, Times 33 with Addition), this feature has been widely used with a number of software projects, Apache, Perl, etc., and Berkeley DB. For the purposes of this string is know the best hash function, the function of reason is very fast and very good classification (small conflicts, distribution).

Core functions are:


hash(i) = hash(i-1) * 33 + str[i]

In zend_hash.h, we can find this function:


static inline ulong zend_inline_hash_func(char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    /* variant with the hash unrolled eight times */
    for (; nKeyLength >= 8; nKeyLength -= 8) {
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
    }
    switch (nKeyLength) {
        case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 1: hash = ((hash << 5) + hash) + *arKey++; break;
        case 0: break;
EMPTY_SWITCH_DEFAULT_CASE()
    }
    return hash;
}

Compared to Apache and Perl in the direct use of the Times 33 function:


hashing function used in Perl 5.005:
  //Return the hashed value of a string: $hash = perlhash("key")
  //(Defined by the PERL_HASH macro in hv.h)
  sub perlhash
  {
      $hash = 0;
      foreach (split, shift) {
          $hash = $hash*33 + ord($_);
      }
      return $hash;
  }

The hash function in PHP, we can find a lot of different.

First, PHP does not use * 33, is used:


hash << 5 + hash

This will of course faster.

Idea is to use the unrolled, PHP encouraged about 8 characters in the index, he unrolled the use of 8 as a unit to improve efficiency.
There are also inline, register variables …

The last is, hash value of the initial set up has become a 5381, compared to the times in the Apache and Perl in the Hash function (using the initial hash to 0), why the 5381 election it? Specific reason I do not know, But I found a number of properties 5381:

Magic Constant 5381:
1. odd number
2. prime number
3. deficient number
4. 001/010/100/000/101 b

Read these, I have reason to believe that the initial value selected to provide a better classification.

Why is the Times not Times 33 other figures in the PHP Hash functions of some Notes Description:


DJBX33A (Daniel J. Bernstein, Times 33 with Addition)

  This is Daniel J. Bernstein's popular `times 33' hash function as
  posted by him years ago on comp.lang.c. It basically uses a function
  like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
  known hash functions for strings. Because it is both computed very
  fast and distributes very well.

  The magic of number 33, i.e. why it works better than many other
  constants, prime or not, has never been adequately explained by
  anyone. So I try an explanation: if one experimentally tests all
  multipliers between 1 and 256 (as RSE did now) one detects that even
  numbers are not useable at all. The remaining 128 odd numbers
  (except for the number 1) work more or less all equally well. They
  all distribute in an acceptable way and this way fill a hash table
  with an average percent of approx. 86%.

  If one compares the Chi^2 values of the variants, the number 33 not
  even has the best value. But the number 33 and a few other equally
  good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
  advantage to the remaining numbers in the large set of possible
  multipliers: their multiply operation can be replaced by a faster
  operation based on just one shift plus either a single addition
  or subtraction operation. And because a hash function has to both
  distribute good _and_ has to be very fast to compute, those few
  numbers should be preferred and seems to be the reason why Daniel J.
  Bernstein also preferred it.

                   -- Ralf S. Engelschall <rse@engelschall.com>
Digg Google Bookmarks reddit Mixx StumbleUpon Technorati Yahoo! Buzz DesignFloat Delicious BlinkList Furl

No Responses to “PHP Hash Functions Manual”

Leave a Reply

Name:
Email:
Website:
Comment:
XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>