What is a tree data structure?

Tree is a widely-used powerful data structure that is a viable addition to a programmer’s toolkit. A lot of complex problems can be solved relatively easily if the data is stored in a tree. It is a nonlinear data structure, compared to arrays, linked lists, stacks and queues which are linear data structures.

Tree is a non-linear data structure which stores the information naturally in the form of hierarchy style.

Terminology in Trees

Root

The topmost node in a tree. The tree originates from this node
This is the only node in the tree that has no parent.
There is exactly one root in each tree data structure.

Edge

The line that connects two nodes is known as an edge.
If there are n nodes in a tree, there exist n-1 edges.
Edge is also referred to as ‘ branch ’ in trees.

Child

A node that is a descendant of some node is called child node.
All nodes(except root) are child nodes.

Parent

Parent node is the immediate predecessor of a node.
Any node which has one or more children is a parent node.
A group of nodes with the same parent is called siblings.

Leaf

A node that does not have any child node is a leaf node.
Leaf nodes are also known as external or terminal nodes.
A node with atleast one child node is called internal nodes.

Height

The height of a node is the maximum number of edges between that node and a leaf node.
The height of the tree is the height of the root node.

Level

Level of a node is basically the height difference between that node and the root node
Level of root is 0.

Trees can have any number of children but the simplest and most common type of tree is a binary tree.

Ancestor: Any node which precedes a node i.e itself, its parent, or an ancestor of its parents.

Descendent: Any node which follows a node i.e. itself, its child, or a descendent of its child.

Ordered Tree: A rooted tree in which an ordering is specified for the children of each vertex. For example: Binary Search Tree, Heap etc.

Size of a tree: Number of nodes in the tree.

A node in a binary tree can have a maximum of 2 child nodes .

Binary Tree Properties

Maximum number of nodes in a tree of height h = 2^h-1
Minimum number of nodes in a tree of height h = h (One node in each level)
Maximum number of nodes on a level l = 2^l - 1
In a Binary Tree with N nodes, minimum possible height or minimum number of levels = Log2(N+1)
In Binary tree where every node has 0 or 2 children, number of leaf nodes is always one more than nodes with two children.

Binary Tree Types

There are primarily 5 types of binary trees:-

1. Rooted Binary Tree

Rooted Binary tree is the simplest type of binary tree where each node has either 0, 1 or 2 child nodes.

2. Full Tree

Every node has either 0 or 2 children. No node should have only 1 child node. It is also known as strictly binary tree.

3. Complete Tree

In a complete tree, all internal nodes have 2 children and all leaf nodes are on the same level. In other words, all levels in a complete tree are entirely filled to its maximum capacity. It is also known as perfect binary tree .

4. Almost Complete Tree

In an almost complete tree, all levels are entirely filled except the last level. The last level is filled from left to right.

5. Skewed Tree

A skewed tree is one in which all nodes have only one child except the leaf node. In a left-skewed tree, all child nodes are left child nodes and in a right-skewed tree, all child nodes are right child nodes.

Advantages and applications of trees

As mentioned above, tree is a very powerful data structure and has a lot of advantages and applications. Some of these are:

It is used to represent structural relationships between its data elements.
It is an effective tool to represent hierarchies like folder structure, XML data or organization structure.
Heap is an almost complete binary tree implemented using arrays.
There are also variations of trees like Tries, B+ Trees, Suffix Trees, Syntax Tree, etc. which are designed in a way to provide specific benefits in their applications. Example, B+ trees are used to implement indexing in databases.
Binary Search Tree(BST) is the most common variation of binary tree providing extremely efficient insertion, deletion, searching operations.
Like Linked Lists and unlike Arrays, Trees don’t have an upper limit on number of nodes as nodes are linked using pointers.

Representation of Binary Trees

We’ve all learnt what a binary tree is, what it’s properties are and what are its advantages, but do you know how to represent a binary tree?

Confused? No problem.

You are given a simple binary tree, how do you suppose you’re gonna store it in your program? This is known as the representation of a binary tree.

There are basically two primary ways we use to represent binary trees:

Linked Representation: The parent nodes contain the address of its children.
Sequential Representation: Represent the level-order sequence in an array.

Linked Representation

In linked representation of a binary tree, every node is an object of a class. The class is supposed to represent a node of a tree and each class has mainly three data members: the value of the node, the reference to the left child and the reference to the right child

class Node
{
      int val
      Node left
      Node right
}

left holds the reference to the left child of a node and right holds the reference to the right child of the node. Similarly, the entire tree is constructed.

But what if there is no left child or no right child? Well, if there is no left child, the reference to the left child will hold the value NULL.

Advantages

Difficult to implement, but simple to understand.
Easy to augment the binary tree, like adding additional data or references.
Does not waste memory.

Sequential Representation

In sequential representation, an array is used which stores the value of the nodes of each level from left to right.

The size of this array will be 2^h + 1, where H is the height of the tree. If a node is not present(known as a hole ), it can be represented by a special character such as ‘-’.

As you can see, whether a node is present or not, memory is allocated for a complete binary tree of height h. So if there are a lot of holes present in the tree, the sequential representation might be a bad idea as it takes up a lot of space.

But how do we identify the child or parent node of a specific node in sequential representation?

Let us consider a node at index i .

Parent of a node: At index i/2

Left child of a node: At index i*2

Right child of a node: At index i*2+1

Advantages

Difficult to understand but simple to implement.
Less prone to bugs during implementation.
Easy to compress and portable as it’s structure is not dependant on memory addresses.

These two are the most commonly used representations of a node. You can create some other representation too that suits your tailored needs.

Happy Coding! Enjoy Algorithms!