Distributed Task Execution and Graph Processing using Apache ZooKeeper
ZooKeeper-based distributed coordination system supporting dynamic task assignment, variable graph sizes, and visual result output.
Overview
Designed and implemented a distributed computing framework that uses Apache ZooKeeper for synchronized task coordination and fault-tolerant execution. The system supports dynamic task scaling, graph-based computation, and visual output generation. Each worker node registers with ZooKeeper through an ephemeral znode and executes tasks created by the master node. Because an ephemeral znode disappears when its worker's session ends, the master can detect failed workers and keep task ownership consistent. Introduced visualization and scalability features for analyzing performance across varying graph sizes and workloads.
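The registration scheme above rests on a few ZooKeeper behaviours: creating a path that already exists fails, ephemeral znodes are deleted when their session ends, and child watches fire once. The sketch below is a toy in-memory model written only to illustrate those semantics. `ToyZooKeeper`, `register_worker` and the integer session ids are hypothetical and are not the ZooKeeper or kazoo API; a real worker would instead call the client's `create` with an ephemeral create mode (for example `CreateMode.EPHEMERAL` in the Java client).

```python
# Toy in-memory model of the ZooKeeper semantics the overview relies on.
# NOT the ZooKeeper client API: all class and method names are hypothetical.
# It mimics three real behaviours: create() on an existing path fails,
# ephemeral znodes vanish when their session ends, and child watches are one-shot.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class NodeExistsError(Exception):
    """Raised when creating a znode whose path already exists."""


@dataclass
class Znode:
    data: bytes = b""
    owner_session: Optional[int] = None  # set only for ephemeral znodes


@dataclass
class ToyZooKeeper:
    nodes: Dict[str, Znode] = field(default_factory=dict)
    child_watches: Dict[str, List[Callable[[str], None]]] = field(default_factory=dict)

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def _fire_child_watches(self, parent: str) -> None:
        # One-shot: watchers are removed before being notified, so a client
        # must call get_children(..., watch=...) again to keep watching.
        for callback in self.child_watches.pop(parent, []):
            callback(parent)

    def create(self, path: str, data: bytes = b"",
               ephemeral_session: Optional[int] = None) -> None:
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = Znode(data=data, owner_session=ephemeral_session)
        self._fire_child_watches(self._parent(path))

    def get_children(self, parent: str,
                     watch: Optional[Callable[[str], None]] = None) -> List[str]:
        if watch is not None:
            self.child_watches.setdefault(parent, []).append(watch)
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def expire_session(self, session_id: int) -> None:
        # ZooKeeper deletes every ephemeral znode owned by an expired session,
        # which is what lets the master notice a crashed worker.
        doomed = [p for p, n in self.nodes.items() if n.owner_session == session_id]
        for path in doomed:
            del self.nodes[path]
            self._fire_child_watches(self._parent(path))


def register_worker(zk: ToyZooKeeper, worker_id: str, session_id: int) -> None:
    # Each worker announces itself as an ephemeral child of /workers.
    zk.create(f"/workers/{worker_id}", ephemeral_session=session_id)
```

In this model the master would call `get_children("/workers", watch=...)`, react when the callback fires, and re-register the watch, which mirrors how one-shot ZooKeeper watches are normally used.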
System Architecture
ZooKeeper-Based Master-Worker Architecture:
- Coordination: a ZooKeeper ensemble provides coordination and fault tolerance.
- Tasks: the master node creates task znodes under the /tasks path.
- Membership: worker nodes register as ephemeral children under /workers.
- Notification: workers set ZooKeeper watches to detect new task assignments.
- Task execution model: workers claim tasks atomically via compare-and-set (CAS) operations, i.e. version-conditioned znode updates.
- Graph processing: each task constructs a graph from its input, performs BFS/DFS traversals, and aggregates the results.
- Visualization: computed results are written to JSON and rendered interactively by a D3.js frontend.
- Scalability: supports 10 to 100+ workers; tested with graphs of 100 to 10,000 nodes.
- Fault tolerance: when a worker fails, its tasks are reassigned; the failure is detected via ZooKeeper watches on the ephemeral /workers children.
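The CAS claim and the failure-driven reassignment can be sketched as read-then-conditional-write. ZooKeeper's `setData` accepts an expected version and fails with a BadVersion error if the znode changed since it was read; the toy `TaskBoard` below imitates that with an in-memory version counter. `TaskBoard`, `try_claim` and `reassign_orphans` are hypothetical names, and releasing orphaned tasks back to "unclaimed" is an assumed reassignment policy, not confirmed project logic.

```python
# Toy model (NOT the ZooKeeper API): each task "znode" stores its owner plus a
# version number, mirroring ZooKeeper's conditional setData(path, data, version).
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


class BadVersionError(Exception):
    """Raised when the expected version no longer matches (another writer won)."""


@dataclass
class TaskNode:
    owner: Optional[str] = None  # None means the task is unclaimed
    version: int = 0


class TaskBoard:
    def __init__(self, task_ids: Iterable[str]):
        self.tasks: Dict[str, TaskNode] = {t: TaskNode() for t in task_ids}

    def read(self, task_id: str) -> TaskNode:
        node = self.tasks[task_id]
        return TaskNode(node.owner, node.version)  # snapshot, like getData + Stat

    def set_owner(self, task_id: str, owner: Optional[str], expected_version: int) -> None:
        node = self.tasks[task_id]
        if node.version != expected_version:
            raise BadVersionError(task_id)
        node.owner = owner
        node.version += 1  # every successful write bumps the version


def try_claim(board: TaskBoard, task_id: str, worker: str) -> bool:
    # The conditional write only succeeds if nobody changed the task since
    # we read it: this is the compare-and-set step.
    snapshot = board.read(task_id)
    if snapshot.owner is not None:
        return False
    try:
        board.set_owner(task_id, worker, snapshot.version)
        return True
    except BadVersionError:
        return False


def reassign_orphans(board: TaskBoard, live_workers: Set[str]) -> List[str]:
    # Intended to run when a watch on /workers fires: release tasks whose
    # owner no longer has an ephemeral znode, so other workers can claim them.
    released = []
    for task_id in sorted(board.tasks):
        snap = board.read(task_id)
        if snap.owner is not None and snap.owner not in live_workers:
            board.set_owner(task_id, None, snap.version)
            released.append(task_id)
    return released
```

If two workers read the same snapshot, only the first `set_owner` succeeds; the second sees a version mismatch and moves on to another task, so no task is executed twice by concurrent claimers.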
Setup & Implementation
Configured a ZooKeeper ensemble and launched master and worker nodes on distributed hosts. The master creates znodes representing tasks; workers watch for new task znodes to receive real-time updates and claim available tasks. Implemented modular graph task logic that accepts input graphs of variable size, so performance can be analyzed as graph size grows. Added graphical output that visualizes task completion and workload distribution across nodes, making it possible to check that coordination and synchronization remain correct under concurrent workloads.
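One graph task and its visual output might look like the sketch below, assuming each task carries an undirected edge list plus a BFS source node. BFS stands in for whichever traversal a task requests. The `{"nodes": [...], "links": [...]}` JSON shape follows the convention commonly used with D3's force layout; the `depth` and `meta` fields, the synthetic `random_graph` generator, and all function names are illustrative assumptions rather than the project's actual schema.

```python
# Hypothetical graph task: input format, function names and JSON field names
# are assumptions chosen to match the common D3 force-layout shape.
import json
import random
from collections import deque


def random_graph(num_nodes, num_edges, seed=0):
    # Synthetic undirected graph of configurable size for scaling experiments.
    max_edges = num_nodes * (num_nodes - 1) // 2
    if num_edges > max_edges:
        raise ValueError("too many edges for a simple undirected graph")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def bfs_levels(edges, source):
    # Breadth-first search; returns hop distance for every reachable node.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def run_graph_task(edges, source, worker_id):
    # Aggregate the traversal into a small result record per task.
    dist = bfs_levels(edges, source)
    return {"worker": worker_id, "source": source, "reached": len(dist),
            "max_depth": max(dist.values()), "distances": dist}


def to_d3_json(edges, result):
    # Unreachable nodes get depth None, serialized as JSON null.
    nodes = sorted({n for e in edges for n in e} | {result["source"]})
    payload = {
        "nodes": [{"id": n, "depth": result["distances"].get(n)} for n in nodes],
        "links": [{"source": u, "target": v} for u, v in edges],
        "meta": {"worker": result["worker"], "reached": result["reached"]},
    }
    return json.dumps(payload)


def workload_summary(results):
    # Tasks completed per worker, e.g. to drive a load-distribution chart.
    counts = {}
    for r in results:
        counts[r["worker"]] = counts.get(r["worker"], 0) + 1
    return dict(sorted(counts.items()))
```

Generating inputs with `random_graph` at several sizes and timing `run_graph_task` is one way to reproduce the kind of size-versus-performance comparison described above.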