A while ago, I wrote a
of posts that claimed to talk about finger trees. Unfortunately, I really
botched it. I'd read a bunch of data structure papers, and managed to get
myself thoroughly scrambled. What I wrote about was distantly related
to finger trees, and it was useful to help understand how fingertrees work -
but it was not, in any way, shape, or form, actually a description of
fingertrees. Since then, I've been meaning to write a proper post explaining
finger trees - but with the work on my book, and with chaos at work, I just
haven't had the time. This time, in order to do my damnedest to make sure that
I don't screw it up again, I'm basically go to describe finger trees over a
couple of posts by walking through the best finger-tree paper that I could
find. The paper is href="http://www.soi.city.ac.uk/~ross/papers/FingerTree.html">"Finger Trees: a
simple general-purpose data structure", by Ralf Hinze and Ross Patterson.
This might by the paper that introduced the structure, but I'm not
The point of finger trees is pretty simple. It's very similar to the point
of zippers. Programming in functional languages is terrific. As I've described
before, there are a lot of advantages to writing functional code. But there are also
a lot of places where a naive implementation of an algorithm using a functional
data structure is dreadfully inefficient. Functional code may be prettier, more maintainable,
and more reusable - but imperative code is frequently much more efficient. When you're
doing an operation that, conceptually, modifies a piece of a complex data structure,
then functional code can really suck. Finger trees give you a way around that - for many common updatabale data structures, you can build finger-tree versions that are very close to or fully as good as imperative, updating structures.
For example, imagine some numerical code that operates on an array. If you
use a naive array, and you update one element of the array, then in a
functional program, you need to copy the entire array to create a new
array with the modified element. Of course, in a real program, you probably
wouldn't do anything quite that naive - you can usually find some way of
representing things to reduce the cost (or you can use a compiler that can do
copy-to-update transformations when it's possible). But you are, at least
conceptually, stuck making copies of something in order to do a
functional update. So what you want to do is to minimize that cost.
The simplest example of that is something like a gap buffer. To modify something at the
gap, you only need to copy a couple of pointers, and you can re-use the vast majority
of your original structure. Similarly, in a zipper, you can modify something at the focal
point of the zipper very quickly, and you can move the focus reasonably quickly. What
a finger tree does is generalize the concept of zipper, so that you end up with
a structure that's sort of like a tree with multiple zippers. It's almost like taking
a zipper - which splits a tree into (left, path-to-current, right) - and then taking
the left and right subtrees, and turning them into zippers; and taking their left and right
subtrees, and turning them into zippers. You end up with a structure where you've
got multiple pointers into the depths of the tree - and you can modify the tree just by modifying
the zipper-like structure closest to the value you want to update.
To explain a finger tree in detail is pretty complicated. The structure itself isn't
that bad - but to understand why it has the performance characteristics that
it does is rather tricky. It relies on the fact that certain operations are done lazily -
and that the lazy evaluation guarantees that no more than a certain amount of time will
need to be spent on any structure change.
But we'll come back to that later. Let's start by looking at a simple structure,
which we'll gradually turn into a finger tree. To start off with, we want to
represent a sequence. We can do that with a 2/3 tree like the following Haskell code.
data Tree t = Zero t | Succ (Tree (Node t)) data Node s = Node2 s s | Node3 s s s
Just this initial code is a bit tricky, so I'll explain it in a bit of detail.
It's creating a 2-3 tree that has a semi-numerical structure to it. Each level
of the tree is basically a distinct type. In this tree, a node is an internal
non-leaf node of the tree. Only leafs contain values; internal nodes are just for
The trick to understanding this is in the definition of
Tree a. What this does is effectively create
a distinct type for each tree level. A leaf of a tree of ts is a value of
type t, wrapped by the
Zero constructor. So a leaf node of a tree of
integers could be
Zero 23. A level one node of a
is a value of type
Tree (Node t) wrapped by the
Succ constructor -- so,
for example, it could be
Succ( Zero (Node2 23 34))
. A level two
node is a value of type
Tree (Node (Tree (Node tt)) - like
Succ((Zero(Node2 (Succ (Zero (Node2 23 32))) (Succ (Zero (Node2 43 45)))))).
Just this much is interesting: the type itself is expressing one of the
structural constraints of the tree! 2-3 trees are always balanced - the length of
the path from root to leaf is always the same for all nodes. This type declaration
means that it's a type error to have a level 1 tree be a child of a level
And we haven't even gotten to the finger tree yet.
To get to a finger tree, we need to know what a finger is. A
finger is a structure that provides a reference to a node deep in a tree. In a
zipper, the zipper-path is a finger: it's a trail down the tree that gives you
fast access to a particular node of the tree, and which you can then use for
fast updates or traversals of the nodes around the finger. A finger tree is a
tree structure of fingers superimposed on a conventional balanced tree. In
general, the internal nodes of the tree are annotated with monoids (a la my
botched original article) in order to help you find the part of the tree that
Before we get to anything like a type declaration, let's look at a
picture. Here's a 2-3 tree of integers.
To make it a finger tree, what you do, in theory, is reach down to the
leftmost and and rightmost internal nodes of the tree - in our case, the
parents of (1,2) and (8, 9, 10). Then you pick up the tree by those two
nodes, and let the rest dangle down. Like this:
If you look at this, you've got a finger for the node (1,2), and a finger
for the node (8,9,10). In between them, you've got a bunch of stuff dangling.
But if you look at the dangling stuff: it's a finger tree. It's got a finger
for its leftmost internal node, and a finger for its rightmost internal node,
and in between, it's got some stuff dangling. That stuff is, in this
case, an empty finger tree.
That's basically the structure of a finger tree: you've got fingers for
the head and tail parts of the tree; and you've got a finger tree in between
When you make the tree a bit bigger, you can see some more of the basic
properties of the tree. Take this tree:
And dangle it:
The finger on the outermost tree contains only leaf
nodes. The finger on the next inner tree contains level one nodes. And the
next one contains level 2 nodes. And so on. You've got fingers for the first
and last nodes at each depth level of the tree. the tree.
We'll see a bit more about that as we look at how to build some structures
with a finger tree. We'll start by looking at a generic finger tree. It's
roughly what you would expect from the description above; a finger tree is
either a single node; a empty node (like the root in the dangled tree
above); or a left finger, a right finger, and a finger tree in between. It's built
on the 2-3 tree, so we'll reuse the
Node type from that.
data FingerTree a = Empty | Single a | Deep (Digit a) (FingerTree (Node a)) (Digit a) data Digit a = [ a ]
Digit type is new. I'd prefer to call it a finger, but
Hinze and Paterson use
Digit is basically a buffer
that contains the members of the node of the 2/3 tree that was replaced by the digit.
The reason that we make that switch - from a node using a constructor with fixed numbers
of children is because we're going to cheat. You'll see the details later.
An important thing to notice here: this uses the same node type trick as
the two-three tree: each time you shift to a deeper level of finger tree,
the type of the arguments to
FingerTree enforce the
idea that the fingers on each level point to higher-order nodes. At the top of
the finger tree, we've got a digit of
as; at the next level, we've got digits
Node as; climbing down the central spine, the third level
has digits of
Node2 as; in general, at level N+1,
the digits contain
We've already got enough to start seeing a bit of what makes the tree interesting. Suppose
we want to find the value at position x. Even though the number of nodes per branch is
variable, we still know that the value as position x will be either in, or very close to,
the nodes at level lg(x).
To actually write the implementation, we need a bunch of general reduction operations. In
Haskell, the whole idea of reduction is really deeply engraved in the language - so there's
a type-class for defining reduction. So what we do is just provide the instance
definition of reductions that we'll need for the tree.
We'll start with nodes:
instance Reduce Node where reducer (-<) (Node2 a b) z = a -< (b -< z) reducer (-<) (Node3 a b c) z = a -< (b -< (c -< z)) reducer (>-) (Node2 b a) = (z >- b) >- a reducer (>-) (Node3 c b a) = ((z >- c) >- b) >- a
Reduction on trees is kind of strange until you wrap your head around it.
It uses a technique called lifting. Basically, reduction on
a tree is based on reduction on a node - the operation of reducing the tree is,
in fact, doing a reduce on the tree using reduce on a node as the reduction operation:
instance Reduce FingerTree where reducer (-<) Empty zero = zero reducer (-<) (Single x) zero = x -< zero reducer (-<) Deep left mid right zero = left -<' (mid -<'' (right -<' zero)) where (-<') = reducer (-<) (-<'') = reducer (reducer (-<)) reducel (>-) zero Empty = zero reducel (>-) zero (Single x) = zero >- x reducel (>-) zero (Deep left mid right) = ((zero >-' left) >-'' mid) >-' right where (>-') = reducel (>-) (>-'') = reducel (reducel (>-))
It's a bit subtle - but it does really make sense. A finger tree is a
collection of nodes on the left, a tree of nodes in the middle, and a
collection of nodes on the right. So to reduce a finger tree, you need to
reduce the left-hand nodes to a list of values, and then reduce those values -
in other words, do a reduction of the list of nodes using the reduction of
nodes as the operator. And so on. Trace it through, and it makes a lot of
sense. (But I would probably never have come up with it myself.)
Now, let's get a look at an example of something you can do with a finger
tree that shows it off. We're going to build a double-ended queue. Building a
single-ended queue is easy; it's no less efficient in a functional language
than in an imperative one. But a double-ended queue is a problem. Traditional
implementations would end up with roughly O(lg n) time. With a finger tree, we
can get amortized time of O(1) - the same performance as imperative. That's
So how does it work? Well, for starters, we need to munge the structure a
bit. Theoretically, our finger-tree is based on a 2-3 tree. But if we always
perfectly maintained the two-three tree structure, we wouldn't be able to get
the performance we want. That's why our definition of
Digit is a
list of values: sometimes we'll temporarily have either one or four in a
digit. We'll always eventually restore the proper 2-3 tree, but sometimes
we'll defer it for a while.
Let's consider the push operations. In a deque, there are two different push operations;
one to push at the front, and one at the back. In the H&P paper, they define these
using some pretty symbols; but it's much easier for me to type text than symbols in
HTML, so I'm going to name them push_front and push_back:
push_front :: a -> FingerTree a -> FingerTree a push_front a Empty = Single a push_front a (Single b) = Deep [a] Empty [b] push_front a (Deep [b, c, d, e] mid right) = Deep [a, b] (push_front (Node3 c d e) mid) right push_front (Deep left mid right) a = Deep ([a] ++ left) mid right
So far, this is pretty simple:
- Push onto an empty queue-tree, and you get a tree with
a single value.
- Push onto a tree with one value, and you get a finger tree where the left finger is the
first value, the right finger is the second value, and in between them is an empty tree.
- Push into a non-empty finger tree with a four-value left digit, and you fix the
two-three structure; you take one value from the four in the left digit, and join it
with the new entry; take the remaining three, and turn them into a node; then
insert the new node into the finger tree in the center.
To do a push at the end, you just take the mirror image of the code above:
push_back :: FingerTree a -> a -> FingerTree a push_back Empty a = Single a push_back (Single b) a = Deep [b] Empty [a] push_back (Deep left mid [e, d, c, b]) a = Deep left (push_back mid (Node3 e d c)) [b,a] push_back left mid right = Deep left mid (right ++ [a])
This is really beautiful. We're working with this remarkably clever
structure - and yet an insert operation isn't any more complicated than the insert
on a simpler structure.
So we can push things into trees. We need to be able to create deques from
lists of values; and we need to be able to remove values from the trees. To build those
operations, we need a bit of infrastructure. In particular, we need lifted versions of
the push operations.
push_front' :: (Reduce f) => f a -> FingerTree a -> FingerTree a push_front' = reducer push_front push_back' :: (Reduce f) => FingerTree a -> f a -> FingerTree a push_back' = reducel push_back
With these lifted operators, we can create a finger tree from any reducable collection.
Basically, you can create a finger tree based on a reduction operator by just lifting that operator
toTree :: (Reduce f) => f a -> FingerTree a
toTree s = push_front' s Empty
Now we're getting to the point where things start to get tricky. This
isn't the way that I would have written it - but they provide a way of
defining a view of the fingertree as a list. That lets
you take the fingertree, and call a function which can lazily
convert your tree into a traditional cons-list:
data LeftView s a = LeftNil | LeftCons a (s a) leftView :: FingerTree a -> LeftView FingerTree a leftView Empty = LeftNil leftView (Single x) = LeftCons x Empty leftView (Deep left mid right) = LeftCons (head left) (leftDeep (tail prefix) mid right)
Now, you'll notice that in the last, recursive case of
leftView, we call
leftDeep". Normally, in code like this, that would be the "
constructor of the fingertree. But sometimes, there will be only one element in the left
digit; so when we use
leftView, it will invoke the constructor with the tail of
the left digit, which is empty. That's not valid in the
of the finger tree. But instead of making the cases of
leftView more complicated,
we provide a pseudo-constructor:/p>
leftDeep :: [a] -> FingerTree (Node a) -> Digit a -> FingerTree a leftDeep  mid right = case leftView mid of LeftNil -> toTree right LeftCons a mid' = Deep (toList a) mid' right leftDeep left mid right = Deep left mid right
You can create a
rightView by mirroring
leftDeep exactly the way we mirrored
These view functions are the heart of how we do pops from the deque. They do all of the work:
- To check if the deque is empty, compare its
- To retrieve the front element, take its
leftViewand extract the first parameter
- To get the deque with the first element removed, take its
leftView, and extract
the second parameter of its
(And, of course, mirror all of the above for the right-end of the deque.)
To make things simpler, you can provide functions for all of this:
left_peek :: FingerTree a -> Maybe a left_peek tree = case (leftView tree) of LeftNil -> Nothing LeftCons h t -> Just h left_pop :: FingerTree a -> FingerTree a left_pop tree = case (leftView tree) of LeftNil -> LeftNil LeftCons h t -> t
In terms of performance, there is a bit of subtlety to this. First and most importantly, you
need to remember laziness. When you call
leftView, nothing is
actually calculated until you extract and use the values. So while when you look at the code,
you see the full construction of the remainder deque whenever you peek at the head or tail
of the deque, it doesn't happen until you use it. And even then, it doesn't do the
whole thing: it does just enough to give you the values that you need.
To get the O(1) bound, you need to do an amortized analysis. I'm not going to go into depth
about it - check the paper if you're interested. But most of the time, there's a non-empty list in the
finger. In those cases, it's obviously O(1). Less frequently, you need to do a bit of
shuffling between the finger and the middle tree. Even less frequently - less frequently by
a factor of O(lg N), while doing the shuffling on the first inner tree, you'll need to do
some shuffling on the next inner tree. And so on. Each level of shuffling becomes less likely
by a factor of O(lg N). That O(lg n) factor works out as an implication of the structure of the tree:
the fingers on each inner tree get bigger, making it less likely that you'll need to push further
into the tree.
Anyway - this is getting long and complicated, so I'm going to end this post here. Next, we'll
look at more structures that you can build using finger trees as a foundation. Just from this much, the
finger tree is incredibly cool - but it gets better.
One comment: I much preferred your other posts which were Literate Haskell programs. It's nice to know that I can copy/paste an entire blog post and execute it.
//I haven't had time to go all the way through the actual content of the post.
data Digit a = [ a ]
Pretty sure you meant to say type Digit a = [a].
To my mind, any discussion of finger trees that doesn't at least mention the fact that splay trees accomplish some of the same things automatically (without the explicit fingers and without all the work) is at least somewhat deficient.
Specifically, in a finger tree with one finger, accessing an item takes time O(log distance to finger). In a splay tree, you get the same time bound, as Cole showed ("On the Dynamic Finger Conjecture for Splay Trees", SIAM J. Comput. 1999 and 2000).
The dynamic optimality conjecture for splay trees would imply that splay trees could match the O(log k) + O(log distance to nearest finger) time of a finger tree with k fingers, automatically, without need for k to be specified as part of the data structure description, but I'm not aware of a proof of that more general result.
D. Eppstein said:
"To my mind, any discussion of finger trees that doesn't at least mention the fact that splay trees accomplish some of the same things automatically (without the explicit fingers and without all the work) is at least somewhat deficient."
I don't see how that applies in a persistent setting. Even if mutation is allowed, I think persistence makes the amortized bounds assisted by splaying useless, since the worst-case behavior can happen repeatedly.
On a tangent from D. Eppstein's post, as far as simplicity goes, the FTs above are simpler than a much more complicated structure of Kaplan & Tarjan that achieves the same bounds in a purely functional setting, but worst-case, rather than amortized. In fact, K&T's version can append in lg(lg(min(n,m))) (actually, I think it can do even better: lg(min(lg n, |lg m - lg n|, lg m))), while the version presented above takes lg(min(n,m)).
An perhaps simpler purely functional fingertree than even the one presented above is treaps, which can achieve the same bounds, but randomized rather than amortized. See Functional Set Operations with Treaps
Your description of the 2/3 tree type is a bit off; an example of a level two tree is
Succ (Succ (Zero (Node2 (Node2 23 32) (Node2 34 45))))
There are no occurrences of the "Tree" type (and its constructors Succ and Zero) once you start hitting nodes; rather, the internal bit is of type
(Node (Node (... (Node Int) ...)))
where the number of occurrences of Node is exactly equal to the number of Succs used. This is cool because the tree balance condition is encoded into the type system; it's impossible to construct an unbalanced 2/3 tree of this type.
Pity we imperative-language coders! A little java would be nice.
If you're an imperative-language programmer, there's no need for anything like a fingertree. In an imperative language, you can create a deque out of a doubly-linked list. The point of a fingertree is to try to get the same efficiency out of a lazy functional language like Haskell as you could using an imperative approach in a language like Java.
Oh - and what happens when the data structure contains only one element? You point at a leaf node, I suppose.
Mark C. Chu-Carroll wrote:
"If you're an imperative-language programmer, there's no need for anything like a fingertree. In an imperative language, you can create a deque out of a doubly-linked list."
That isn't quite true. A finger tree does much more than provide the usual deque operations (push/pop/insert/extract). Those four operations can be supported in constant amortized by simpler implementations like Okasaki's BankersDeque.
Additionally, imperative doubly-linked lists support some constant-time operations that finger trees do not support (yet, at least) in constant time, like append. OTOH, there are purely functional data structures that do offer constant-time push/pop/insert/extract and append, even worst-case.
As you can see, finger trees neither subsume nor are subsumed by deques.
On their own, finger trees are still useful for imperative programmers. In fact, the invention of finger trees was in an imperative setting. Gerth Brodal explains some applications of finger trees in a book chapter he wrote on the subject. Applications like "fast sort" are as useful to imperative programmers as they are to functional programmers.
Ok, maybe I'm just confused by the similarity of names between two different things. Finger search trees are a special kind of binary search tree that lets you maintain fingers into the tree and search quickly for stuff near the fingers â and the fast searching performance of finger search trees can be simulated by splay trees. They don't really have anything to do with persistence any more than any other data structure can be made persistent. But If I understand your post, finger trees are sort of like that without the searching part, so you can only move between consecutive elements, but they're functional so they work in a persistent setting?
You make it sound as though the issue is that the functional code is usually or often "less efficient" than the imperative code, but this is not the case at all. The issue here is the distinction between persistent and ephemeral data structures. Functional languages make it particularly easy to implement persistent data structures, and slowly but surely people at large are beginning to realize that this is the common case. But it is not at all fair or reasonable to compare a persistent data structure implemented functionally to an ephemeral data structure implemented ephemerally. They have different semantics, and hence are incomparable.
I claim that if you implement a persistent data structure such as, say, a dictionary using imperative methods you will, after spending who knows how long getting it wrong, have something no more efficient, sometimes even less efficient, than the functional implementation. Now if you insist on an ephemeral data structure, then that can be hard to do functionally, but who would want to _force_ the ephemeral semantics? At best we settle for an ephemeral data structure when that last iota of efficiency matters. But really it rarely does, for two reasons. One is latency hiding. It is totally unimportant if one operation of a data structure is, say, slower by a log factor in the persistent, as opposed to the ephemeral case, because _end to end_ (ie, in an application) there may be other log factors that will mask the difference anyway. Another reason is parallelism. Good luck making an imperative implementation of an inferior ephemeral data structure parallel. Often the persistent data structure with a functional implementation is readily parallelizable, with zero headaches about correctness (eg, no race conditions). Third, a non-obvious reason why functional implementations of persistent data structures is faster than even an imperative implementation of an ephemeral structure is _cache effects_. False sharing in the imperative case can cause the code to run slower than the persistent, functional case because you don't reuse cache lines.
Bottom line: it is far from obvious that functional is slower than imperative. In fact it's often, but not always, the other way around! Plus you get the added benefit of a richer persistent semantics, rather than an impoverished ephemeral semantics. What's not to like?
D. Eppstein wrote:
"Ok, maybe I'm just confused by the similarity of names between two different things. Finger search trees are a special kind of binary search tree that lets you maintain fingers into the tree and search quickly for stuff near the fingers â and the fast searching performance of finger search trees can be simulated by splay trees."
I mostly agree with this; finger search trees need not be binary. I think it is clearer to think about "finger search tree" as an interface.
"They don't really have anything to do with persistence any more than any other data structure can be made persistent."
"But If I understand your post, finger trees are sort of like that without the searching part, so you can only move between consecutive elements, but they're functional so they work in a persistent setting?"
Nope. The trees presented in the post above, which were invented AFAIK by Hinze & Patterson, can provide the all of operations that are available in finger search trees with the same asymptotic cost, but limited by the following: there do not exist more than a constant number of fingers.
The general finger search tree interface looks something like:
empty: an empty FST. This doubles as a finger.
find(f,x): find value x starting from finger f; return a finger to x's location. f remains a valid finger.
delete(f): delete the value pointed to by finger f, return a finger to the element just larger than the one at f. f is no longer a valid finger.
insert(f,x): insert a value x to the right of finger f, returning a finger pointing at x. f remains a valid finger.
I took inspiration from Brodal's optimal pointer-machine finger search trees for that interface. Brodal offers: O(1) time for empty, delete and insert, and O(lg d(f,x)) time for find, where d(f,x) is the number of elements in the tree between x and the element pointed to by f.
The Hinze & Patterson trees presented in the blog post at the top of this page offer:
empty: an empty FST
singleton(x): builds a new FST with one element
split(t,x): Builds and returns two new finger trees L and R such that
(1) the concatenation of L and R (in that order) is identical to the tree t
(2) all values in L are less than x and all values in R are greater than or equal to x
join(s,t): Builds a new finger tree containing all the elements of s and t in that order
split takes time O(lg(min(|L|,|R|))) and join takes time O(lg(min(|s|,|t|))).
Now, a single H&P tree has two fingers: one at the left and one at the right. But you can simulate, if you wish, a constant number of fingers. To make a new finger in the middle of the tree t at value x, just split t (at x) into L and R. Now you have the burden of keeping track of two trees. This is why H&P trees only support a constant number of fingers (while maintaining their performance bounds, of course. You could add n fingers if you like, but you'd have to keep track of n elements, which is the very problem you started with!)
Now the traditional FST operations are pretty simple: "find" is discussed in the last paragraph; it's just split. "delete" is just a split into L and R of size 1 and n-1, respectively; the deletion is done by just ignoring L. "insert" is just singleton followed by join.
Oh, and though it's true that MCC didn't mention split and join in his blog post, they are sort of the heart of H&P's (and K&T's, who I mentioned in an earlier comment) finger tree design. Without them, you just have deques. If you're looking for amortized constant-time deques, you can use a data structure like
data Deque a = Deque [a] [a] Int Int
(See Okasaki's thesis)
Of course, that deque has O(n) worst-case cost.
Also, Mark, I am a bit unclear on a bit at the end of your post about how often you must recurse. Where you say:
"Even less frequently - less frequently by a factor of O(lg N), while doing the shuffling on the first inner tree, you'll need to do some shuffling on the next inner tree. And so on. Each level of shuffling becomes less likely by a factor of O(lg N)."
"Thus, at most half of the operations descend one level, at most a quarter two levels, and so on. Consequently, in a sequence of operations, the average cost is constant."
So I think the factor is constant, rather than lg N.
Keep up the good work!
Amusingly, I happen to be giving a talk tomorrow on FingerTrees at the Boston Haskell user group, so this came with impeccable timing. ;)
As an interesting side note, until you have to implement split you can get away with just letting the (nested) digits in your Deep constructors deal with the slop and you only need Node3s, since cons/snoc/view never introduce a Node2.
"there's a type-class for defining reduction"
Where can I find that? My ghc 6.8.2 shows no trace of it: "Not in scope: type constructor or class `Reduce'"
I tried to find a module for it, but Google is no help, either.
> A level two node is a value of type Tree (Node (Tree (Node tt)) - like Succ((Zero(Node2 (Succ (Zero (Node2 23 32))) (Succ (Zero (Node2 43 45)))))).
This is incorrect, a level two node is just a value of Tree t.
Succ (Succ (Zero (Node2 (Node3 1 2 3) (Node3 4 5))))
The reason for the polymorphic recursion is that you wind up with something of the form Succ^n (Zero nodes), where nodes have as many layers of Node2/3 as you have Succ nodes.
Reduce is defined in the paper. It has since been supplanted by Data.Foldable, which the current Data.Sequence and Data.FingerTree data types instantiate.
In case you're interested the slides from my finger tree talk are now online:
They may serve as a nice way to flesh out the explanation given here.
In the paper the typeclass they use is called Reduce but the standard typeclass is Foldable which does the same thing.
A level two node is a value of type Tree (Node (Tree (Node tt)) - like Succ((Zero(Node2 (Succ (Zero (Node2 23 32))) (Succ (Zero (Node2 43 45)))))).