PHP array, associative array, object properties, function table, symbol table, and so on are used for the container to do HashTable.
PHP used the Hash is the most common DJBX33A (Daniel J. Bernstein, Times 33 with Addition), this feature has been widely used with a number of software projects, Apache, Perl, etc., and Berkeley DB. For the purposes of this string is know the best hash function, the function of reason is very fast and very good classification (small conflicts, distribution).
Core functions are:
hash(i) = hash(i-1) * 33 + str[i]
In zend_hash.h, we can find this function:
static inline ulong zend_inline_hash_func(char *arKey, uint nKeyLength) { register ulong hash = 5381; /* variant with the hash unrolled eight times */ for (; nKeyLength >= 8; nKeyLength -= 8) { hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; hash = ((hash << 5) + hash) + *arKey++; } switch (nKeyLength) { case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */ case 1: hash = ((hash << 5) + hash) + *arKey++; break; case 0: break; EMPTY_SWITCH_DEFAULT_CASE() } return hash; }
Compared to Apache and Perl in the direct use of the Times 33 function:
hashing function used in Perl 5.005: //Return the hashed value of a string: $hash = perlhash("key") //(Defined by the PERL_HASH macro in hv.h) sub perlhash { $hash = 0; foreach (split, shift) { $hash = $hash*33 + ord($_); } return $hash; }
The hash function in PHP, we can find a lot of different.
First, PHP does not use * 33, is used:
hash << 5 + hash
This will of course faster.
Idea is to use the unrolled, PHP encouraged about 8 characters in the index, he unrolled the use of 8 as a unit to improve efficiency.
There are also inline, register variables …
The last is, hash value of the initial set up has become a 5381, compared to the times in the Apache and Perl in the Hash function (using the initial hash to 0), why the 5381 election it? Specific reason I do not know, But I found a number of properties 5381:
Magic Constant 5381:
1. odd number
2. prime number
3. deficient number
4. 001/010/100/000/101 b
Read these, I have reason to believe that the initial value selected to provide a better classification.
Why is the Times not Times 33 other figures in the PHP Hash functions of some Notes Description:
DJBX33A (Daniel J. Bernstein, Times 33 with Addition) This is Daniel J. Bernstein's popular `times 33' hash function as posted by him years ago on comp.lang.c. It basically uses a function like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best known hash functions for strings. Because it is both computed very fast and distributes very well. The magic of number 33, i.e. why it works better than many other constants, prime or not, has never been adequately explained by anyone. So I try an explanation: if one experimentally tests all multipliers between 1 and 256 (as RSE did now) one detects that even numbers are not useable at all. The remaining 128 odd numbers (except for the number 1) work more or less all equally well. They all distribute in an acceptable way and this way fill a hash table with an average percent of approx. 86%. If one compares the Chi^2 values of the variants, the number 33 not even has the best value. But the number 33 and a few other equally good numbers like 17, 31, 63, 127 and 129 have nevertheless a great advantage to the remaining numbers in the large set of possible multipliers: their multiply operation can be replaced by a faster operation based on just one shift plus either a single addition or subtraction operation. And because a hash function has to both distribute good _and_ has to be very fast to compute, those few numbers should be preferred and seems to be the reason why Daniel J. Bernstein also preferred it. -- Ralf S. Engelschall <rse@engelschall.com>

No Responses to “PHP Hash Functions Manual”