DATA ANALYSIS  

Data Structures: Trees in Python

Trees represent hierarchical data structures. While this article focuses on practical implementations—ranging from building basic structures to using professional libraries and machine learning models—the fundamental theory is covered in the article Trees - Theory. These two articles are complementary and should be read together to bridge general tree concepts with hands-on Python practice.

In This Article You Will Learn:

  • Implement basic Binary Trees and Binary Search Trees (BST).
  • Compare recursive and iterative tree implementations.
  • Manage non-binary hierarchical structures like Organization Charts.
  • Use professional libraries like bigtree and sortedcontainers.
  • See an example of Decision Trees for Machine Learning with scikit-learn.

All files related to this article can be found in my GitHub repository at the following link: https://github.com/analysislessons/Article_TreesPractice

The repository contains the following resources:

  • Two text files containing the sample data: org_chart.csv and numbers.txt.
  • One code file: trees_practice.ipynb, which includes the same descriptions as this article and provides the most convenient way to run the code and practice the tasks from the article.

1. Basic Binary Tree Implementation

A simple structure where each node has at most two children.

class Node: def __init__(self, value): self.value = value self.left = None self.right = None root = Node("Root") root.left = Node("Left") root.right = Node("Right") print(root.value)

2. Recursive Binary Search Tree (BST)

Below is a recursive implementation of a Binary Search Tree (BST) that uses the Node class defined above. Although a basic BST is rarely used in professional production environments due to its poor worst-case complexity, studying this implementation helps in understanding the core concepts and challenges of managing dynamic tree structures, including operations like insertion, deletion, searching, and traversal.

class BST: def __init__(self): self.root = None # INSERT def insert(self, value): self.root = self._insert(self.root, value) def _insert(self, node, value): if node is None: return Node(value) if value < node.value: node.left = self._insert(node.left, value) elif value > node.value: node.right = self._insert(node.right, value) return node # SEARCH def search(self, value): return self._search(self.root, value) def _search(self, node, value): if node is None: return False if value == node.value: return True elif value < node.value: return self._search(node.left, value) else: return self._search(node.right, value) # DELETE def delete(self, value): self.root = self._delete(self.root, value) def _delete(self, node, value): if node is None: return None if value < node.value: node.left = self._delete(node.left, value) elif value > node.value: node.right = self._delete(node.right, value) else: # case 1: no child if node.left is None and node.right is None: return None # case 2: one child if node.left is None: return node.right if node.right is None: return node.left # case 3: two children min_larger_node = self._min_value_node(node.right) node.value = min_larger_node.value node.right = self._delete(node.right, min_larger_node.value) return node def _min_value_node(self, node): current = node while current.left: current = current.left return current # TRAVERSAL (in-order) def inorder(self): result = [] self._inorder(self.root, result) return result def _inorder(self, node, result): if node: self._inorder(node.left, result) result.append(node.value) self._inorder(node.right, result)

3. Iterative BST Implementation

In professional environments, trees are typically implemented iteratively rather than recursively to optimize performance and avoid stack overflow issues. This is true for standard library structures like C++ STL's std::map and std::set, Java's TreeMap and TreeSet, and most database indexes. Below is an iterative implementation of the same BST for comparison; however, I recommend skipping this section for now and returning to it once you have finished reading the entire article.

class Node: def __init__(self, value): self.value = value self.left = None self.right = None class BSTIterative: def __init__(self): self.root = None # INSERT (iterative) def insert(self, value): new_node = Node(value) if self.root is None: self.root = new_node return current = self.root while True: if value < current.value: if current.left is None: current.left = new_node return current = current.left elif value > current.value: if current.right is None: current.right = new_node return current = current.right else: return # ignore duplicates # SEARCH (iterative) def search(self, value): current = self.root while current: if value == current.value: return True elif value < current.value: current = current.left else: current = current.right return False # DELETE (iterative) def delete(self, value): parent = None current = self.root # find node while current and current.value != value: parent = current if value < current.value: current = current.left else: current = current.right if current is None: return # not found # helper to replace child def replace_child(parent, old, new): if parent is None: self.root = new elif parent.left == old: parent.left = new else: parent.right = new # case 1: no children if current.left is None and current.right is None: replace_child(parent, current, None) # case 2: one child elif current.left is None: replace_child(parent, current, current.right) elif current.right is None: replace_child(parent, current, current.left) # case 3: two children else: # find inorder successor succ_parent = current succ = current.right while succ.left: succ_parent = succ succ = succ.left current.value = succ.value # delete successor if succ_parent.left == succ: succ_parent.left = succ.right else: succ_parent.right = succ.right # INORDER TRAVERSAL (iterative) def inorder(self): result = [] stack = [] current = self.root while stack or current: while current: stack.append(current) current = current.left current = stack.pop() result.append(current.value) current = current.right return result

Example usage of BST tree:

tree = BSTIterative() tree.insert(10) tree.insert(5) tree.insert(15) tree.insert(12) print(tree.inorder()) # [5, 10, 12, 15] print(tree.search(12)) # True tree.delete(10) print(tree.inorder()) # [5, 12, 15]

📝 Tasks: BST Implementation

  • Multiple Inserts: Create a new BST instance and insert the following values: 50, 30, 70, 60, 20, 80. Print the inorder traversal to verify it's sorted.
  • Search Verification: Search for the values 30 (exists) and 90 (does not exist) in your tree and print the results.
  • Leaf Deletion: Delete the value 20 (a leaf node) and print the inorder traversal again.

4. Persistent Trees: Loading and Saving Data

In many cases, you might already have data stored in a file that you want to load into a tree for efficient manipulation. In these scenarios, you need functions to import data from a file into the tree and, after making modifications, save it back to the file.

def load_tree_from_file(filename): tree = BSTIterative() with open(filename, "r") as f: numbers = f.read().split() for num in numbers: tree.insert(int(num)) return tree def save_tree_to_file(tree, filename): values = tree.inorder() # sorted output with open(filename, "w") as f: f.write(" ".join(map(str, values)))

📝 Tasks: File Interaction

The numbers.txt file contains a list of numbers. Write code using the classes and functions defined above to: open the file, load the numbers into a tree structure, insert the value 20, delete the value 5, and save the modified tree to a file named "output.txt".

tree = load_tree_from_file("numbers.txt") print("Inorder:", tree.inorder()) tree.insert(20) tree.delete(5) save_tree_to_file(tree, "output.txt")

5. Non-Binary Hierarchies: Organization Charts

This implementation utilizes unique IDs for all operations and stores data in a flat table format (CSV). While we previously discussed binary trees, many real-world hierarchies, such as organizational charts, are non-binary. In these structures, a single parent node can have any number of children—for example, a director may have many more subordinates than a lower-level employee. To implement this, we use a list to store subordinates; this list is initially empty, and the add_employee() function is used to append new employees to a superior's subordinate list.

Since an organization's structure is generally persistent, we typically rebuild this tree from a file when the application starts. A sample org_chart.csv with 20 employees has been created for this exercise; the load_from_csv() and save_to_csv() functions below allow you to interact with it.

import csv import os class EmployeeNode: def __init__(self, emp_id, name, position): self.emp_id = str(emp_id) self.name = name self.position = position self.subordinates = [] def find_node(self, emp_id): """Find an employee by their unique ID.""" if self.emp_id == str(emp_id): return self for sub in self.subordinates: found = sub.find_node(emp_id) if found: return found return None def add_employee(self, emp_id, name, position, superior_id=None): """Adds a new employee under the specified superior's ID.""" new_employee = EmployeeNode(emp_id, name, position) if superior_id is None: self.subordinates.append(new_employee) return True superior = self.find_node(superior_id) if superior: superior.subordinates.append(new_employee) return True return False def remove_employee(self, emp_id): """Removes an employee and their subtree from the hierarchy.""" for i, sub in enumerate(self.subordinates): if sub.emp_id == str(emp_id): return self.subordinates.pop(i) removed = sub.remove_employee(emp_id) if removed: return removed return None def to_table(self, superior_id=""): """Flattens the tree into a list of lists (table format).""" rows = [[self.emp_id, self.name, self.position, superior_id]] for sub in self.subordinates: rows.extend(sub.to_table(self.emp_id)) return rows @staticmethod def from_table(rows): """Reconstructs the tree from table rows (list of lists).""" if not rows: return None if rows[0][0].lower() == 'id': rows = rows[1:] nodes = {row[0]: EmployeeNode(row[0], row[1], row[2]) for row in rows} root = None for row in rows: emp_id, _, _, superior_id = row if superior_id and superior_id in nodes: nodes[superior_id].subordinates.append(nodes[emp_id]) else: root = nodes[emp_id] return root def save_to_csv(self, filename): data = self.to_table() with open(filename, 'w', newline='') as f: writer = csv.writer(f) writer.writerow(["id", "name", "position", "superior_id"]) writer.writerows(data) @staticmethod def load_from_csv(filename): if not os.path.exists(filename): return None with open(filename, 'r') as f: reader = csv.reader(f) rows = list(reader) return EmployeeNode.from_table(rows) def display(self, level=0): indent = " " * level print(f"{indent}- [{self.emp_id}] {self.name} ({self.position})") for sub in self.subordinates: sub.display(level + 1)

📝 Tasks: Organization Chart

  • Count Team Size: Add a method get_team_size() to EmployeeNode that returns the total number of people in an employee's hierarchy (themselves + all descendants).
  • Find Path to CEO: Implement a function get_path_to_id(root, target_id) that returns a list of names from the root (CEO) down to the target employee.
  • File Interaction: Use EmployeeNode.load_from_csv("org_chart.csv") to load the organizational chart. Add a new employee under an existing superior, then save the modified hierarchy as my_org_chart.csv.

Solutions:

# Task 1: Add this to the class or use as external function def get_team_size(node): return 1 + sum(get_team_size(sub) for sub in node.subordinates) # Task 2 def get_path_to_id(current_node, target_id, path=None): if path is None: path = [] path.append(current_node.name) if current_node.emp_id == str(target_id): return path for sub in current_node.subordinates: result = get_path_to_id(sub, target_id, list(path)) if result: return result return None # Verification root = EmployeeNode("1", "Alice", "CEO") root.add_employee("2", "Bob", "CTO", "1") root.add_employee("3", "David", "Dev", "2") print(f"Total team size: {get_team_size(root)}") print(f"Path to David: {' -> '.join(get_path_to_id(root, '3'))}")

6. Tree Management with bigtree

bigtree is designed for unstructured trees, where each node can have an unlimited number of children (non-binary). It excels at path-based tree construction and provides seamless integration with Pandas for complex data operations.

from bigtree import dataframe_to_tree_by_relation, print_tree import pandas as pd df_org = pd.read_csv("org_chart.csv") # We map superior_id to name for a name-based tree id_to_name = dict(zip(df_org['id'], df_org['name'])) df_org['superior_name'] = df_org['superior_id'].map(id_to_name) root_org = dataframe_to_tree_by_relation( df_org, child_col="name", parent_col="superior_name" ) print("\nOrg Chart loaded via bigtree (mapped names):") print_tree(root_org)
from bigtree import dataframe_to_tree # Path-based construction data_paths = pd.DataFrame([ {"path": "Project/Source/main.py"}, {"path": "Project/Source/utils.py"}, {"path": "Project/Tests/test_main.py"} ]) root_paths = dataframe_to_tree(data_paths) print("Path-based tree:") print_tree(root_paths)

📝 Tasks: bigtree Library

  • Custom Data: Create a tree from paths representing a website menu (e.g., Home/Products/Software, Home/Products/Hardware, Home/Contact) and print it.
  • Find Node Depth: Use bigtree's built-in attributes (like node.depth) to find how deep the Hardware node is in your tree.

Solutions:

from bigtree import list_to_tree, find_path # Task 1 menu_paths = ["Home/Products/Software", "Home/Products/Hardware", "Home/Contact"] menu_root = list_to_tree(menu_paths) print_tree(menu_root) # Task 2 hardware_node = find_path(menu_root, "Home/Products/Hardware") print(f"Depth of Hardware: {hardware_node.depth}")

7. Practical Alternative: sortedcontainers

In Python, instead of trees, we often use sortedcontainers for high-performance, automatically sorted collections.

from sortedcontainers import SortedList # store (score, name) leaderboard = SortedList() leaderboard.add((100, "Alice")) leaderboard.add((200, "Bob")) leaderboard.add((150, "Charlie")) leaderboard.add((80, "Luke")) leaderboard.add((175, "Emma")) print(leaderboard) # sorted automatically # top 2 players top_2 = leaderboard[-2:] print(top_2) # scores between 120 and 200 result = [x for x in leaderboard if 120 <= x[0] <= 200] print(result)
from sortedcontainers import SortedDict prices = SortedDict({"TSLA": 180.50, "AAPL": 175.20, "MSFT": 420.10}) for ticker, price in prices.items(): print(f"{ticker}: ${price}")

📝 Tasks: sortedcontainers

  • High Scores: Use SortedList to store scores. Write code that adds 5 scores and then prints only the top 3 highest scores.
  • Time-based Log: Use SortedDict to store log messages where the key is an integer timestamp. Retrieve all logs between timestamp 100 and 200.

Solutions:

from sortedcontainers import SortedList, SortedDict # Task 1 scores = SortedList([450, 120, 890, 340, 670]) print(f"Top 3: {list(scores)[-3:]}") # Task 2 logs = SortedDict({50: "Start", 150: "Middle", 250: "End", 120: "Process"}) print("Logs between 100 and 200:") for ts in logs.irange(100, 200): print(f"[{ts}] {logs[ts]}")

8. Machine Learning Example: Decision Tree with scikit-learn

Decision trees are a type of supervised learning algorithm used for classification and regression. They use a tree-like model of decisions and their possible consequences.

# Note: Requires 'pip install scikit-learn' from sklearn.datasets import load_iris from sklearn.tree import DecisionTreeClassifier, export_text # Load the famous Iris dataset iris = load_iris() X, y = iris.data, iris.target clf = DecisionTreeClassifier(max_depth=3) clf.fit(X, y) # Export the 'tree' logic as text tree_rules = export_text(clf, feature_names=iris['feature_names']) print("Decision Tree logic for Iris dataset classification:") print(tree_rules)

Contact details:

+48 790-430-860

analysislessons@gmail.com