Last post, I described the basic idea of the binary heap data
structure. But I didn't talk about how to implement it - and there's
a very cool way of implementing it - optimally space efficient,
very fast, and with respectable cache locality for moderate-sized heaps.
The idea is to use a compact, array-based implementation of a binary tree. If we put the nodes of the tree into an array in the order in which they would be visited by a breadth-first traversal, then we get a
representation in which the children of the node at position N in the tree are located at positions 2N+1 and 2N+2, and the
parent of the node at position N is located at either (N-1)/2 (if N is odd) or (N-2)/2 (if N is even). For example, look at
the example heap below; this is the same example heap from yesterday's post, and beneath it is the array representation of the heap. I've added a couple of children arrows, to let you see how it's laid out.
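To make the index arithmetic concrete, here's a small sketch in Python. The array contents and helper names here are mine (a hypothetical max-heap, not the exact one from the post), purely to illustrate the formulas above:

```python
# Breadth-first array layout of a hypothetical max-heap (example values).
heap = [11, 10, 9, 8, 7, 5, 3, 1, 2, 6, 4]

def children_of(n):
    # Children of the node at position n.
    return 2 * n + 1, 2 * n + 2

def parent_of(n):
    # Parent of the node at position n, written out per the text:
    # (n-1)/2 for odd positions, (n-2)/2 for even positions.
    return (n - 1) // 2 if n % 2 == 1 else (n - 2) // 2

print(children_of(1))  # (3, 4)
print(parent_of(4))    # 1
```

Note that every element of `heap` is at least as large as both of its children, which is exactly the heap property from yesterday's post.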
Remember how yesterday I alluded to heapsort? Using this representation, it's amazingly easy to implement as an in-place sort - that is, a sort that
can sort a list in the original set of memory locations that it occupies. Heapsort needs only enough extra storage to store one integer (the current size of the heap), and two temporary copies of a single value from the
list (for implementing swaps).
The basic idea is: you've got a heap of N elements, and you know that the
largest value is the one in position 0. So you remove it, move the last
element into position 0, and do the bubbling to reinstate the heap property.
Now you've got a temporary copy of what used to be the largest value in the
heap, and you've got a heap of size N-1, occupying positions 0 through N-2.
So you can put the largest value - the one you just removed - into
position N-1, because that position is no longer part of the heap. Each time
you remove the largest value, the heap shrinks by one, and you get exactly
the position into which you should insert the removed value. When the heap
size reaches 0, your list is sorted.
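That loop can be sketched in a few lines of Python. This is a minimal standalone version, not the post's Heap class; the function names and the sample list are mine:

```python
def sift_down(a, pos, size):
    # Bubble a[pos] down until the max-heap property holds
    # within the first `size` entries of a.
    while True:
        left, right = 2 * pos + 1, 2 * pos + 2
        largest = pos
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == pos:
            return
        a[pos], a[largest] = a[largest], a[pos]
        pos = largest

def heapsort(a):
    n = len(a)
    # Heapify: bubble down every internal node, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeatedly move the largest element (a[0]) just past the
    # end of the shrinking heap region.
    for size in range(n - 1, 0, -1):
        a[0], a[size] = a[size], a[0]
        sift_down(a, 0, size)

nums = [5, 1, 9, 3, 7]
heapsort(nums)
print(nums)  # [1, 3, 5, 7, 9]
```

The only extra storage is the loop indices and the implicit temporaries in the tuple swap, exactly as described above.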
Implementing a heap using array-based storage is also really easy.
Below is a simple implementation of an array-based heap in Python. (This is
not an optimal implementation; it's written to be easy to read. There are a bunch of places where its performance could be improved, or where
it could be written more compactly.)
```python
class Heap(object):
    def __init__(self):
        self.list = []

    def Insert(self, val):
        # to insert, append the new value to the end,
        # and then bubble it up to its correct position.
        l = len(self.list)
        self.list.append(val)
        self.BubbleUp(l)

    def BubbleUp(self, pos):
        if pos == 0:
            return
        parent = self._GetParent(pos)
        if self.list[pos] > self.list[parent]:
            self._SwapEntries(pos, parent)
            self.BubbleUp(parent)

    def _GetParent(self, pos):
        if pos == 1 or pos == 2:
            return 0
        # if pos is an odd number, then it's a left child;
        # if it's even, then it's a right child.
        if pos % 2 == 0:
            return (pos - 2) // 2
        else:
            return (pos - 1) // 2

    def RemoveLargest(self):
        # to remove the largest, store it, insert the
        # last element into its position, and then bubble it
        # down.
        result = self.list[0]
        last = self.list.pop()
        if self.list:
            self.list[0] = last
            self.BubbleDown(0)
        return result

    def BubbleDown(self, pos):
        children = self._GetChildrenOf(pos)
        # Three cases: leaf, one child, two children
        if children == []:
            return
        elif len(children) == 1:
            if self.list[children[0]] > self.list[pos]:
                self._SwapEntries(children[0], pos)
                self.BubbleDown(children[0])
        else:  # len(children) == 2
            left = self.list[children[0]]
            right = self.list[children[1]]
            root = self.list[pos]
            if left > root or right > root:
                if left > right:
                    self._SwapEntries(children[0], pos)
                    self.BubbleDown(children[0])
                else:
                    self._SwapEntries(children[1], pos)
                    self.BubbleDown(children[1])

    def _SwapEntries(self, p1, p2):
        tmp = self.list[p1]
        self.list[p1] = self.list[p2]
        self.list[p2] = tmp

    def _GetChildrenOf(self, pos):
        l = len(self.list)
        if l <= (2 * pos + 1):
            return []
        elif l == (2 * pos + 2):
            return [2 * pos + 1]
        else:
            return [2 * pos + 1, 2 * pos + 2]

    def _PrettyPrint(self, position, indent):
        children = self._GetChildrenOf(position)
        print("  " * indent + str(self.list[position]))
        for c in children:
            self._PrettyPrint(c, indent + 1)

    def PrettyPrint(self):
        self._PrettyPrint(0, 0)


def HeapUseExample():
    h = Heap()
    for i in range(20):
        h.Insert(i)
    print("===============================")
    h.PrettyPrint()
    print("===============================")
    h.RemoveLargest()
    print("===============================")
    h.PrettyPrint()
```
Don't you mean breadth first traversal? Depth first would be [11,10,9,8,2,5,3,1,7,6,4].
Thanks for the catch. I was thinking breadth-first, but typing depth-first. Fixed now.
No problem. Great post. I've been subscribed to this blog for a couple of months now, and I'm loving it all. Keep it up.
... two temporary copies of a single value from the list (for implementing swaps).
Are these really necessary? How about in-place swapping via XOR swap?
I was under the impression that the XOR swap is a trick of binary computers and does not generalize to other architectures. In addition, advocating the XOR swap makes many assumptions about the CPU architecture and the implementation of the code (which is a function of the compiler).
The temporary value in a swap() doesn't actually consume memory space; in practice it will always be implemented as a temporary register.
The "XOR" trick is very type-dependent. It relies on the fact that you can use XOR on the type of values in the heap.
For example, in the Python code used in this post, there's no requirement that the values be integers. They could be floats, doubles, strings, tuples, etc. I can't XOR a float in Python, so the XOR trick wouldn't work for that - much less for things like strings.
Similarly, for the "register" approach, that assumes that you're working with values that can meaningfully be put into a register. If I were using C++, and building a heap that contained strings, with certain implementations of strings, I would need to copy the string - not just the pointer to the string. So it wouldn't be sufficient to just keep a copy of a pointer that could fit in a register - I would need to have a real copy of the string.
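To see the type-dependence concretely, here's a quick Python sketch (the values are arbitrary): the XOR swap works on ints, but fails outright on floats.

```python
# XOR swap: works only for types that support ^ (here, ints).
a, b = 6, 9
a ^= b
b ^= a
a ^= b
print(a, b)  # 9 6

# The same trick on floats raises a TypeError in Python.
x, y = 1.5, 2.5
try:
    x ^= y
except TypeError as e:
    print("no ^ for floats:", e)
```

(And the XOR swap has a well-known pitfall the temp-variable swap doesn't: if both operands refer to the same location, it zeroes the value instead of leaving it alone.)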
1: XOR swap is pointless. Computers are fast, and it's really not an optimization. Everything else has more overhead. It was a neat trick in 1985, but like function inlining for speed, hasn't applied since the late 80s.
2: General architectures? Do you actually have a non-binary computer manufactured after 1955 that any normal language will run on?
As the Wikipedia article points out, XOR swap is actually slower than a temporary variable swap on modern processors with modern compilers. If XOR swap were the fastest approach, the compiler would probably do it for you.
This is a perfect example of the dangers of premature/blind optimization. You will usually get better performance by trusting the compiler to identify and use the best optimizations available. With modern CPUs, a good compiler can usually do better than you ever could--unless you write optimizing compilers for a living :-).
The time to start looking at optimizations is after you've profiled your application for the slow spots. You'll save yourself lots of ugly, buggy code that seemed like it should be faster but really just prevented the compiler from doing a good job.
Function inlining is THE most powerful optimization. A few years ago, I compared the contributions from different types of optimization in a complex application (a gene sequencer), and inlining was responsible for about 80% of the speedup compared to a non-optimized version.
We just don't do inlining manually now.
Nice. More topics like this, please. Also, for a completely another topic, I would like to see your take on the Chinese Remainder Theorem some time.
I would just like to point out that if you use integer division you can just:
Nice article though
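Presumably the commenter means that with floor division the two parent formulas collapse into one. A sketch (my own code, not the commenter's):

```python
def parent_of(pos):
    # With floor (integer) division, (pos-1)//2 covers both cases:
    # - odd pos (a left child): pos-1 is even, so it divides exactly
    # - even pos (a right child): (pos-1)//2 == (pos-2)//2, since
    #   floor division discards the remainder
    return (pos - 1) // 2

print([parent_of(p) for p in range(1, 7)])  # [0, 0, 1, 1, 2, 2]
```

This removes the odd/even branch from `_GetParent` entirely, including the special case for positions 1 and 2.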
It's also worth mentioning that Python ships with a heap implementation in the standard library in the heapq module, the implementation for which has some informative documentation in heapq.py. Additionally heapq functions have fast C implementations where available.
Re: the XOR question.
MarkCC was talking about the space requirements of the algorithm *in principle*, before any discussion of particular concrete hardware and software. Since all data are fundamentally natural numbers with binary representations, the real point of my comment is that no algorithm requires swap space... in principle.
Connecting a comment to the previous post, turning this into a D heap would require little more than calculating the parent index with [D: maximum number of children per element, D>=2] ((pos+D - 2) / D) and the first child with (pos*D - (D-2)) and change the bubbleDown method to find the maximum among up to D number of children.
Scratch my last comment, as it assumes a starting index of 1 and integer division.
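For the 0-indexed layout used in the post, the corresponding D-ary formulas would look like the sketch below (my own code, with floor division; D=4 is an arbitrary example branching factor):

```python
D = 4  # example branching factor; any D >= 2 works

def parent_of(pos, d=D):
    # 0-indexed D-ary heap: parent of position pos.
    return (pos - 1) // d

def children_of(pos, d=D):
    # 0-indexed D-ary heap: the (up to) d children of position pos.
    first = d * pos + 1
    return list(range(first, first + d))

print(children_of(0))                           # [1, 2, 3, 4]
print([parent_of(c) for c in children_of(0)])   # [0, 0, 0, 0]
```

As the comment says, the other change is that BubbleDown must find the maximum among up to D children instead of two.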
Thanks for these heap synopses.
I had not gotten around to really examining heaps, which are used in Max-Plus Algebra and Petri Nets.
"Finite automata with weights in the max-plus semiring are considered. ... Modeling and analysis of timed Petri nets using heaps of pieces ..." in
Heapsort actually has very poor sequential locality. Note that in the array based implementation you must constantly append the sorted list to the end of your array. If you're sorting large datasets on disk or over the network, this will be really slow.
Mergesort is a much better option for sorting on disk; it has much better sequential locality and still has a worst case of O(n log n), though you'll need O(n) extra storage. Perhaps another post is in order? :)
Still waiting on the "what's it good for" post (and connecting the two Heaps), as in my undergrad days I never needed one. They taught it in the optional advanced data structures class (along with graphs), which I skipped to take Intro to AI (with Prolog) instead.
Joe Shelby: Consider the priority queue, which can be implemented using a heap. And a priority queue in turn can make implementing something like A* or Dijkstra's algorithm a lot easier.
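For instance, a heap-backed priority queue is the core of a standard Dijkstra implementation. A sketch using Python's heapq module (the graph and names here are mine, just for illustration):

```python
import heapq

def dijkstra(graph, start):
    # graph: dict mapping node -> list of (neighbor, weight) pairs
    dist = {start: 0}
    pq = [(0, start)]  # the heap always yields the closest unsettled node
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale entry; a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

The heap is what makes "give me the closest unvisited node" cheap at every step.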
ah - yeah, if the "key" to the heap was the priority value, that would do it. I keep forgetting those things exist 'cause for my MOM system (and the Agent framework it was a simulation implementation for), I needed the opposite: I needed to be sure that every message was delivered in exactly the order it arrived in.
Aside from representational storage and/or visual display, I've yet to actually need to do much with graph theory at all in the 15 years I've been a developer. Just the nature of the projects I've been on.
One of the things that interests me is how much variation there can be in the work of people who supposedly do the same jobs.
Back when I worked for IBM, one of the things that I found incredibly strange was that they treated all programmers as if they were interchangeable. But you can take an amazingly skilled and experienced database developer, and if you move them to GUI development, they can be worse than a totally inexperienced new hire.
For me, graph theory is one of the bits of math that I use most. I don't think that I've ever worked on a project that wasn't graph heavy. I spent most of yesterday working on an algorithm that does a search over an edge-labeled graph.
As a result, I sometimes forget that not everyone thinks in such graph-heavy terms.
At the time, we were doing heavy Semantic Web stuff for DARPA (the company still does, but I personally have moved on), and while there was lots of Graph Theory internally, most of that was implemented by the two PhD's who liked that stuff and studied under our academic consultant at UMD College Park.
My job was in the infrastructure and user interface level, so I merely trusted them to get the data right and I just made it look good. :)
Heaps have a number of uses. Here's one (I got it from the page I put in the URL field - not my Web site):
Let's say you have a large set of numbers, and you want only the top 10. One way is to sort the whole list and just read off the top 10. But a faster way using heaps is to create a heap of 10 elements. Start reading in the list of numbers, maintaining the heap structure at every point (not growing it, though - throw away the smallest element to make room for a bigger one).
In a sense, both methods have n log(n) complexity. However, if the number m of elements needed (10 in our case) is much smaller than n, then using heaps to do this can be much faster. The Web site does some tests and gives you an idea of the speedups.
(Small note: He shows how to do it in Python using the heapq module. I didn't get anywhere near the speedup he claimed, so I emailed him. He gets my result on some machines, and the really fast result on others. Perhaps, as a commenter noted above, some implementations are in C, others (like mine) are in pure Python).
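The bounded-heap approach from that comment can be sketched with the standard heapq module like this (function name and data are mine; heapq maintains a min-heap, so the root is the smallest of the m values kept so far, i.e. the one to throw away):

```python
import heapq
import random

def top_m(numbers, m):
    heap = []
    for x in numbers:
        if len(heap) < m:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # heap[0] is the smallest of the m kept so far;
            # replace it when a bigger value comes along.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

data = list(range(1000))
random.shuffle(data)
print(top_m(data, 10))  # the ten largest: 999 down to 990
```

(heapq.nlargest(m, numbers) does essentially the same thing in one call.)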