Trees represent hierarchical data structures. While this article focuses on practical implementations—ranging from building basic structures to using professional libraries and machine learning models—the fundamental theory is covered in the article Trees - Theory. These two articles are complementary and should be read together to bridge general tree concepts with hands-on Python practice.
In This Article You Will Learn:
- Implement basic Binary Trees and Binary Search Trees (BST).
- Compare recursive and iterative tree implementations.
- Manage non-binary hierarchical structures like Organization Charts.
- Use professional libraries like
bigtree and sortedcontainers.
- See an example of Decision Trees for Machine Learning with
scikit-learn.
All files related to this article can be found in my GitHub repository at the following link:
https://github.com/analysislessons/Article_TreesPractice
The repository contains the following resources:
-
Two text files containing the sample data:
org_chart.csv and numbers.txt.
-
One code file:
trees_practice.ipynb, which includes the same descriptions as this article and provides the most convenient way to run the code and practice the tasks from the article.
1. Basic Binary Tree Implementation
A simple structure where each node has at most two children.
class Node:
def __init__(self, value):
self.value = value
self.left = None
self.right = None
root = Node("Root")
root.left = Node("Left")
root.right = Node("Right")
print(root.value)
2. Recursive Binary Search Tree (BST)
Below is a recursive implementation of a Binary Search Tree (BST) that uses the Node class defined above. Although a basic BST is rarely used in professional production environments due to its poor worst-case complexity, studying this implementation helps in understanding the core concepts and challenges of managing dynamic tree structures, including operations like insertion, deletion, searching, and traversal.
class BST:
def __init__(self):
self.root = None
# INSERT
def insert(self, value):
self.root = self._insert(self.root, value)
def _insert(self, node, value):
if node is None:
return Node(value)
if value < node.value:
node.left = self._insert(node.left, value)
elif value > node.value:
node.right = self._insert(node.right, value)
return node
# SEARCH
def search(self, value):
return self._search(self.root, value)
def _search(self, node, value):
if node is None:
return False
if value == node.value:
return True
elif value < node.value:
return self._search(node.left, value)
else:
return self._search(node.right, value)
# DELETE
def delete(self, value):
self.root = self._delete(self.root, value)
def _delete(self, node, value):
if node is None:
return None
if value < node.value:
node.left = self._delete(node.left, value)
elif value > node.value:
node.right = self._delete(node.right, value)
else:
# case 1: no child
if node.left is None and node.right is None:
return None
# case 2: one child
if node.left is None:
return node.right
if node.right is None:
return node.left
# case 3: two children
min_larger_node = self._min_value_node(node.right)
node.value = min_larger_node.value
node.right = self._delete(node.right, min_larger_node.value)
return node
def _min_value_node(self, node):
current = node
while current.left:
current = current.left
return current
# TRAVERSAL (in-order)
def inorder(self):
result = []
self._inorder(self.root, result)
return result
def _inorder(self, node, result):
if node:
self._inorder(node.left, result)
result.append(node.value)
self._inorder(node.right, result)
3. Iterative BST Implementation
In professional environments, trees are typically implemented iteratively rather than recursively to optimize performance and avoid stack overflow issues. This is true for standard library structures like C++ STL's std::map and std::set, Java's TreeMap and TreeSet, and most database indexes. Below is an iterative implementation of the same BST for comparison; however, I recommend skipping this section for now and returning to it once you have finished reading the entire article.
class Node:
def __init__(self, value):
self.value = value
self.left = None
self.right = None
class BSTIterative:
def __init__(self):
self.root = None
# INSERT (iterative)
def insert(self, value):
new_node = Node(value)
if self.root is None:
self.root = new_node
return
current = self.root
while True:
if value < current.value:
if current.left is None:
current.left = new_node
return
current = current.left
elif value > current.value:
if current.right is None:
current.right = new_node
return
current = current.right
else:
return # ignore duplicates
# SEARCH (iterative)
def search(self, value):
current = self.root
while current:
if value == current.value:
return True
elif value < current.value:
current = current.left
else:
current = current.right
return False
# DELETE (iterative)
def delete(self, value):
parent = None
current = self.root
# find node
while current and current.value != value:
parent = current
if value < current.value:
current = current.left
else:
current = current.right
if current is None:
return # not found
# helper to replace child
def replace_child(parent, old, new):
if parent is None:
self.root = new
elif parent.left == old:
parent.left = new
else:
parent.right = new
# case 1: no children
if current.left is None and current.right is None:
replace_child(parent, current, None)
# case 2: one child
elif current.left is None:
replace_child(parent, current, current.right)
elif current.right is None:
replace_child(parent, current, current.left)
# case 3: two children
else:
# find inorder successor
succ_parent = current
succ = current.right
while succ.left:
succ_parent = succ
succ = succ.left
current.value = succ.value
# delete successor
if succ_parent.left == succ:
succ_parent.left = succ.right
else:
succ_parent.right = succ.right
# INORDER TRAVERSAL (iterative)
def inorder(self):
result = []
stack = []
current = self.root
while stack or current:
while current:
stack.append(current)
current = current.left
current = stack.pop()
result.append(current.value)
current = current.right
return result
Example usage of BST tree:
tree = BSTIterative()
tree.insert(10)
tree.insert(5)
tree.insert(15)
tree.insert(12)
print(tree.inorder()) # [5, 10, 12, 15]
print(tree.search(12)) # True
tree.delete(10)
print(tree.inorder()) # [5, 12, 15]
📝 Tasks: BST Implementation
- Multiple Inserts: Create a new BST instance and insert the following values: 50, 30, 70, 60, 20, 80. Print the inorder traversal to verify it's sorted.
- Search Verification: Search for the values 30 (exists) and 90 (does not exist) in your tree and print the results.
- Leaf Deletion: Delete the value 20 (a leaf node) and print the inorder traversal again.
4. Persistent Trees: Loading and Saving Data
In many cases, you might already have data stored in a file that you want to load into a tree for efficient manipulation. In these scenarios, you need functions to import data from a file into the tree and, after making modifications, save it back to the file.
def load_tree_from_file(filename):
tree = BSTIterative()
with open(filename, "r") as f:
numbers = f.read().split()
for num in numbers:
tree.insert(int(num))
return tree
def save_tree_to_file(tree, filename):
values = tree.inorder() # sorted output
with open(filename, "w") as f:
f.write(" ".join(map(str, values)))
📝 Tasks: File Interaction
The numbers.txt file contains a list of numbers. Write code using the classes and functions defined above to: open the file, load the numbers into a tree structure, insert the value 20, delete the value 5, and save the modified tree to a file named "output.txt".
tree = load_tree_from_file("numbers.txt")
print("Inorder:", tree.inorder())
tree.insert(20)
tree.delete(5)
save_tree_to_file(tree, "output.txt")
5. Non-Binary Hierarchies: Organization Charts
This implementation utilizes unique IDs for all operations and stores data in a flat table format (CSV). While we previously discussed binary trees, many real-world hierarchies, such as organizational charts, are non-binary. In these structures, a single parent node can have any number of children—for example, a director may have many more subordinates than a lower-level employee. To implement this, we use a list to store subordinates; this list is initially empty, and the add_employee() function is used to append new employees to a superior's subordinate list.
Since an organization's structure is generally persistent, we typically rebuild this tree from a file when the application starts. A sample org_chart.csv with 20 employees has been created for this exercise; the load_from_csv() and save_to_csv() functions below allow you to interact with it.
import csv
import os
class EmployeeNode:
def __init__(self, emp_id, name, position):
self.emp_id = str(emp_id)
self.name = name
self.position = position
self.subordinates = []
def find_node(self, emp_id):
"""Find an employee by their unique ID."""
if self.emp_id == str(emp_id):
return self
for sub in self.subordinates:
found = sub.find_node(emp_id)
if found:
return found
return None
def add_employee(self, emp_id, name, position, superior_id=None):
"""Adds a new employee under the specified superior's ID."""
new_employee = EmployeeNode(emp_id, name, position)
if superior_id is None:
self.subordinates.append(new_employee)
return True
superior = self.find_node(superior_id)
if superior:
superior.subordinates.append(new_employee)
return True
return False
def remove_employee(self, emp_id):
"""Removes an employee and their subtree from the hierarchy."""
for i, sub in enumerate(self.subordinates):
if sub.emp_id == str(emp_id):
return self.subordinates.pop(i)
removed = sub.remove_employee(emp_id)
if removed:
return removed
return None
def to_table(self, superior_id=""):
"""Flattens the tree into a list of lists (table format)."""
rows = [[self.emp_id, self.name, self.position, superior_id]]
for sub in self.subordinates:
rows.extend(sub.to_table(self.emp_id))
return rows
@staticmethod
def from_table(rows):
"""Reconstructs the tree from table rows (list of lists)."""
if not rows: return None
if rows[0][0].lower() == 'id':
rows = rows[1:]
nodes = {row[0]: EmployeeNode(row[0], row[1], row[2]) for row in rows}
root = None
for row in rows:
emp_id, _, _, superior_id = row
if superior_id and superior_id in nodes:
nodes[superior_id].subordinates.append(nodes[emp_id])
else:
root = nodes[emp_id]
return root
def save_to_csv(self, filename):
data = self.to_table()
with open(filename, 'w', newline='') as f:
writer = csv.writer(f)
writer.writerow(["id", "name", "position", "superior_id"])
writer.writerows(data)
@staticmethod
def load_from_csv(filename):
if not os.path.exists(filename):
return None
with open(filename, 'r') as f:
reader = csv.reader(f)
rows = list(reader)
return EmployeeNode.from_table(rows)
def display(self, level=0):
indent = " " * level
print(f"{indent}- [{self.emp_id}] {self.name} ({self.position})")
for sub in self.subordinates:
sub.display(level + 1)
📝 Tasks: Organization Chart
- Count Team Size: Add a method
get_team_size() to EmployeeNode that returns the total number of people in an employee's hierarchy (themselves + all descendants).
- Find Path to CEO: Implement a function
get_path_to_id(root, target_id) that returns a list of names from the root (CEO) down to the target employee.
- File Interaction: Use
EmployeeNode.load_from_csv("org_chart.csv") to load the organizational chart. Add a new employee under an existing superior, then save the modified hierarchy as my_org_chart.csv.
Solutions:
# Task 1: Add this to the class or use as external function
def get_team_size(node):
return 1 + sum(get_team_size(sub) for sub in node.subordinates)
# Task 2
def get_path_to_id(current_node, target_id, path=None):
if path is None: path = []
path.append(current_node.name)
if current_node.emp_id == str(target_id):
return path
for sub in current_node.subordinates:
result = get_path_to_id(sub, target_id, list(path))
if result: return result
return None
# Verification
root = EmployeeNode("1", "Alice", "CEO")
root.add_employee("2", "Bob", "CTO", "1")
root.add_employee("3", "David", "Dev", "2")
print(f"Total team size: {get_team_size(root)}")
print(f"Path to David: {' -> '.join(get_path_to_id(root, '3'))}")
6. Tree Management with bigtree
bigtree is designed for unstructured trees, where each node can have an unlimited number of children (non-binary). It excels at path-based tree construction and provides seamless integration with Pandas for complex data operations.
from bigtree import dataframe_to_tree_by_relation, print_tree
import pandas as pd
df_org = pd.read_csv("org_chart.csv")
# We map superior_id to name for a name-based tree
id_to_name = dict(zip(df_org['id'], df_org['name']))
df_org['superior_name'] = df_org['superior_id'].map(id_to_name)
root_org = dataframe_to_tree_by_relation(
df_org,
child_col="name",
parent_col="superior_name"
)
print("\nOrg Chart loaded via bigtree (mapped names):")
print_tree(root_org)
from bigtree import dataframe_to_tree
# Path-based construction
data_paths = pd.DataFrame([
{"path": "Project/Source/main.py"},
{"path": "Project/Source/utils.py"},
{"path": "Project/Tests/test_main.py"}
])
root_paths = dataframe_to_tree(data_paths)
print("Path-based tree:")
print_tree(root_paths)
📝 Tasks: bigtree Library
- Custom Data: Create a tree from paths representing a website menu (e.g.,
Home/Products/Software, Home/Products/Hardware, Home/Contact) and print it.
- Find Node Depth: Use
bigtree's built-in attributes (like node.depth) to find how deep the Hardware node is in your tree.
Solutions:
from bigtree import list_to_tree, find_path
# Task 1
menu_paths = ["Home/Products/Software", "Home/Products/Hardware", "Home/Contact"]
menu_root = list_to_tree(menu_paths)
print_tree(menu_root)
# Task 2
hardware_node = find_path(menu_root, "Home/Products/Hardware")
print(f"Depth of Hardware: {hardware_node.depth}")
7. Practical Alternative: sortedcontainers
In Python, instead of trees, we often use sortedcontainers for high-performance, automatically sorted collections.
from sortedcontainers import SortedList
# store (score, name)
leaderboard = SortedList()
leaderboard.add((100, "Alice"))
leaderboard.add((200, "Bob"))
leaderboard.add((150, "Charlie"))
leaderboard.add((80, "Luke"))
leaderboard.add((175, "Emma"))
print(leaderboard) # sorted automatically
# top 2 players
top_2 = leaderboard[-2:]
print(top_2)
# scores between 120 and 200
result = [x for x in leaderboard if 120 <= x[0] <= 200]
print(result)
from sortedcontainers import SortedDict
prices = SortedDict({"TSLA": 180.50, "AAPL": 175.20, "MSFT": 420.10})
for ticker, price in prices.items():
print(f"{ticker}: ${price}")
📝 Tasks: sortedcontainers
- High Scores: Use
SortedList to store scores. Write code that adds 5 scores and then prints only the top 3 highest scores.
- Time-based Log: Use
SortedDict to store log messages where the key is an integer timestamp. Retrieve all logs between timestamp 100 and 200.
Solutions:
from sortedcontainers import SortedList, SortedDict
# Task 1
scores = SortedList([450, 120, 890, 340, 670])
print(f"Top 3: {list(scores)[-3:]}")
# Task 2
logs = SortedDict({50: "Start", 150: "Middle", 250: "End", 120: "Process"})
print("Logs between 100 and 200:")
for ts in logs.irange(100, 200):
print(f"[{ts}] {logs[ts]}")
8. Machine Learning Example: Decision Tree with scikit-learn
Decision trees are a type of supervised learning algorithm used for classification and regression. They use a tree-like model of decisions and their possible consequences.
# Note: Requires 'pip install scikit-learn'
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
# Load the famous Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(X, y)
# Export the 'tree' logic as text
tree_rules = export_text(clf, feature_names=iris['feature_names'])
print("Decision Tree logic for Iris dataset classification:")
print(tree_rules)