Binary heap implementation in PHP
Short explanation what is binary heap and how to implement it in pure PHP from scratch. I will show you how to compare it with native solution (SPL). Performance results are surprising.
The main motivation to create this implementation was a need to create a different structure (kd-tree) for my machine learning library php-ml. This is also an interesting research, so I decided that it is worth sharing.
Heap
Heap is a special data structure that is based on a tree and finds use in many algorithms. A typical data structure, which can be built using a heap, is a priority queue. The heap also used for one of the most popular and efficient sorting algorithms - heap sort.
The heap is defined in such a way, that the root of the heap is smaller or larger than its children's nodes.
If the parent node has a higher value than the children's nodes, we have dealing with max-heap
, and if the parent
node has a smaller value than children's nodes, we're talking about a min-heap
.
Example of min-heap
:
The important thing is that we can always easily find the maximum or minimum value belonging to the tree (its peak).
There are many types of heaps, such as binary heap, Fibonacci heap, B-heap or Brodal heap.
Ok, so far so good, let's go deeper.
Binary heap
The binary heap is a full binary tree, which all internal levels are completely filled and the last level can be filled in completely or partially. As we are dealing with a binary structure, most of the operations can be done in a logarithmic time.
Most popular way to implement a binary heap is to use an array. If we assume that the root is under index 1, its elements
children will be located under indexes 2 and 3. This order can be generally written down as follows:
index i
indicates parent, index 2*i
- left child, and 2*i + 1
- right child.
Implementation
At the beginning let's assume that elements of the heap can be any elements mixed[]
.
Such elements can't be compared using >
or <
operator, so we need some kind of callable $scoreFunction
.
final class BinaryHeap
{
/**
* @var mixed[]
*/
private $nodes = [];
/**
* @var callable
*/
private $scoreFunction;
public function __construct(callable $scoreFunction)
{
$this->scoreFunction = $scoreFunction;
}
}
Score function allow us to create heap of any kind: from simple (min/max) to more complicated structure (array, objects). Some examples:
// min-heap
function ($x) {
return $x;
}
// max-heap
function ($x) {
return -$x;
}
// max-heap with array data
function ($x) {
return -$x[1];
}
// min-heap with object data
function ($x) {
return $x->someValue();
}
Then we add some basic methods which they explain themselves.
public function peek()
{
return $this->nodes[0];
}
public function size(): int
{
return count($this->nodes);
}
public function isEmpty(): bool
{
return $this->nodes === [];
}
public function nodes(): array
{
return $this->nodes;
}
As you can see, peek
is really simple without any complexity (O(1)
).
Now a more difficult part.
When an element needs to be added to the heap, it is placed at the end of the array and must "bubble" up by repeatedly exchanging it from the parent until we find a parent that is smaller than the new node.
public function push($node): void
{
$this->nodes[] = $node;
$this->bubbleUp(count($this->nodes) - 1);
}
public function pop()
{
$peek = $this->nodes[0];
$last = array_pop($this->nodes);
if (!$this->isEmpty()) {
$this->nodes[0] = $last;
$this->sinkDown(0);
}
return $peek;
}
private function bubbleUp(int $index): void
{
$node = $this->nodes[$index];
$score = ($this->scoreFunction)($node);
while ($index > 0) {
$parentIndex = (int) floor(($index + 1) / 2) - 1;
$parent = $this->nodes[$parentIndex];
if ($score >= ($this->scoreFunction)($parent)) {
break;
}
$this->nodes[$parentIndex] = $node;
$this->nodes[$index] = $parent;
$index = $parentIndex;
}
}
private function sinkDown(int $index): void
{
$size = $this->size();
$node = $this->nodes[$index];
$score = ($this->scoreFunction)($node);
while (true) {
$child2Index = ($index + 1) * 2;
$child1Index = $child2Index - 1;
$swap = null;
if ($child1Index < $size && ($child1Score = ($this->scoreFunction)($this->nodes[$child1Index])) < $score) {
$swap = $child1Index;
}
if ($child2Index < $size && ($this->scoreFunction)($this->nodes[$child2Index]) < ($swap === null ? $score : $child1Score)) {
$swap = $child2Index;
}
if ($swap === null) {
break;
}
$this->nodes[$index] = $this->nodes[$swap];
$this->nodes[$swap] = $node;
$index = $swap;
}
}
Finally, it's time to test our fresh new BinaryHeap
class
public function testBinaryHeap(): void
{
$heap = new BinaryHeap(function ($x) {
return $x;
});
foreach([10, 5, 3, 2, 1, 7] as $int) {
$heap->push($int);
}
self::assertEquals(1, $heap->pop());
self::assertEquals(2, $heap->pop());
}
You can find full working code in php-ai/php-data-structures.
Performance 🚀
After happy coding we need to check how our solution is compared to other approaches. Consider how you can draw from the collection the smallest possible value differently.
sort
You can sort the array and extract the first element from it (works only for primitives) .
sort($numbers);
echo $numbers[0];
foreach
Classic combination of foreach
and if
:
$min = PHP_INT_MAX;
foreach ($numbers as $number) {
if($number < $min) {
$min = $number;
}
}
echo $min;
min
Built-in PHP
solution:
echo min($numbers);
SplMinHeap
Built-in SplMinHeap
class (which is an implementation of min-heap
):
$heap = new \SplMinHeap();
$heap->insert(...);
echo $heap->top();
SplHeap
Similar to SplMinHeap
, but SplHeap
allows to implement custom score function (more generic approach):
final class GenericSplHeap extends \SplHeap
{
protected function compare($a, $b)
{
if($a === $b) {
return 0;
}
return $a > $b ? 1 : -1;
}
}
$heap = new GenericSplHeap();
$heap->insert(...);
echo $heap->top();
Benchmarks
A simple script will not provide good quality performance tests. It also doesn't ensure their good stability.
Fortunately, there is a very good designed project in PHP
land:
phpbench/phpbench (thanks to @dantleech).
For me Retry Threshold is the most important feature:
PHPBench is able to dramatically improve the stability of your benchmarks by retrying the iteration set until all the deviations in time between iterations fit within a given margin of error.
This means that phpbench
will keep repeating tests until their average time stabilizes.
This gives us confidence that no other processes (running in the background) will affect our tests.
We will run script with --retry-threshold=2
, what will mean, that each iteration can have a maximum of 2% deviation over time.
Let's say clearly what we will tests: we will draw 10,000 random numbers from the range. Then we will measure the time of adding the next random number to our collection and extract the minimum from it. You can check full benchmark script in BinaryHeapBench.
Lets run benchmarks and see what's happened. Here are results for 10 000
random numbers:
composer bench-time benchmarks/Heap/BinaryHeapBench.php
+---------------------+----------+----------+--------+---------+
| subject | mode | mean | rstdev | diff |
+---------------------+----------+----------+--------+---------+
| benchSplMinHeap | 0.759μs | 0.760μs | 0.28% | 1.00x |
| benchGenericSplHeap | 0.874μs | 0.873μs | 0.30% | 1.15x |
| benchBinaryHeap | 1.177μs | 1.182μs | 0.92% | 1.56x |
| benchNativeMin | 10.884μs | 10.823μs | 0.88% | 14.24x |
| benchNativeForeach | 12.502μs | 12.538μs | 0.98% | 16.50x |
| benchNativeSort | 96.412μs | 96.769μs | 0.92% | 127.33x |
+---------------------+----------+----------+--------+---------+
Column marked as mean
means average execution time spend on given approach.
The picture is worth a thousand words so simple chart illustrate it better.
Surprised? 🤯
As you can see, our implementation may not be the fastest, but it maintains an order of magnitude the same as the native solution. To be sure of the stability of the solution, we will now do a series of tests. We will increase the number of items in array to search: from 10x to 100000x 😱. Have no fear, Mr. Kondas is here 😂.
The chart has been cropped (since sort
have dramatic times above 10000 elements), you can see original chart here.
You can find data collected in benchmarks in sources list under this post.
Summary
As you can see BinaryHeap
has very good performance while maintaining flexibility.
We have conducted a series of tests with good stability (--retry-threshold
). You can use scoreFunction
to
implement any heap: from min to max and a whole bunch of others. You can install it using composer: php-ai/php-data-structures
.
Happy heaping 😉