Distributed Inverted Indexing System with Hierarchical Dispatch and Mobile Agents

A Java-based distributed indexing framework implementing hierarchical dispatch and mobile agent features for scalable and fault-tolerant data processing.

Overview

Built a distributed inverted indexing system in Java for parallel text analysis and search across multiple hosts. It supports three execution modes (Local, Each, and Remote) covering both centralized and distributed indexing. In hierarchical dispatch, a parent agent (HierarchicalIndexingAgent) distributes indexing tasks to multiple child agents (ChildIndexingAgent) across servers, the children write their partial results to a shared Hazelcast map, and the parent collects them once every child has finished. Mobile agent functionality migrates agents between hosts using Place-based deployment, so an agent can execute remotely and return its results through shared storage. Together these features demonstrate autonomous distributed execution, concurrent processing, and coordinated result merging.
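The dispatch-and-wait pattern described above can be sketched with standard library classes only. This is an illustrative sketch, not the project's code: local threads stand in for remote child agents, a ConcurrentHashMap stands in for the shared Hazelcast map, and the names DispatchSketch, dispatch, and the "child-N" keys are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a parent fans work out to children and waits on a barrier. */
final class DispatchSketch {

    /**
     * Runs one child task per text chunk and returns the shared result map.
     * Assumption: threads replace remote agents, and a ConcurrentHashMap
     * replaces the shared Hazelcast map.
     */
    static Map<String, Integer> dispatch(List<String> chunks, long timeoutSeconds) {
        Map<String, Integer> shared = new ConcurrentHashMap<>();
        if (chunks.isEmpty()) {
            return shared; // nothing to dispatch
        }
        CountDownLatch done = new CountDownLatch(chunks.size());
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(chunks.size(), 4));
        try {
            for (int i = 0; i < chunks.size(); i++) {
                final String key = "child-" + i;
                final String chunk = chunks.get(i);
                pool.submit(() -> {
                    try {
                        // Placeholder for real indexing: count whitespace-separated tokens.
                        String trimmed = chunk.trim();
                        int tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
                        shared.put(key, tokens);
                    } finally {
                        done.countDown(); // signal completion even if the task failed
                    }
                });
            }
            // Barrier: the parent blocks until every child has counted down.
            if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("children did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for children", e);
        } finally {
            pool.shutdown();
        }
        return shared;
    }
}
```

Calling DispatchSketch.dispatch(List.of("a b c", "d e"), 5) returns a map with one entry per child. The latch is counted down in a finally block so that a failing child cannot leave the parent waiting until the timeout.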

System Architecture

Hierarchical agent-based architecture: the parent agent (HierarchicalIndexingAgent) splits large text documents and creates ChildIndexingAgent tasks for the pieces. Tasks are distributed to worker nodes by agent migration over SSH (JSCH), with placement based on load balancing. Each child agent builds a local inverted index in a HashMap<Term, List<DocID>> and stores its result in the shared Hazelcast distributed map for aggregation. A synchronization barrier makes the parent wait until all children have completed, after which it merges the child results with reduce operations. For fault tolerance, dead agents are detected via heartbeat and their tasks are reassigned to healthy nodes. The three execution modes are Local (sequential), Each (one task per worker), and Remote (dynamic distribution).
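To make the per-child indexing and the parent's reduce step concrete, here is a minimal sketch under stated assumptions. Terms and document IDs are plain strings, tokenization is a simple split on non-alphanumeric characters, and the names InvertedIndexSketch, indexDocument, and mergeInto are hypothetical. The project's actual tokenizer and merge logic may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of local inverted indexing and the merge (reduce) step. */
final class InvertedIndexSketch {

    /** Builds a term -> docIDs index for one document; a docID appears once per term. */
    static Map<String, List<String>> indexDocument(String docId, String text) {
        Map<String, List<String>> index = new HashMap<>();
        // Assumed tokenization: lowercase, then split on runs of non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            List<String> postings = index.computeIfAbsent(token, t -> new ArrayList<>());
            if (!postings.contains(docId)) {
                postings.add(docId);
            }
        }
        return index;
    }

    /** Reduce step: folds one partial index into the global one, skipping duplicate docIDs. */
    static void mergeInto(Map<String, List<String>> global,
                          Map<String, List<String>> partial) {
        partial.forEach((term, docs) -> {
            List<String> postings = global.computeIfAbsent(term, t -> new ArrayList<>());
            for (String doc : docs) {
                if (!postings.contains(doc)) {
                    postings.add(doc);
                }
            }
        });
    }
}
```

A parent would call mergeInto once per child result. The duplicate check on posting lists is linear, which is fine for a sketch. A real index would more likely use a set or sorted posting lists.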

Setup & Implementation

Deploy Hazelcast instances across multiple hosts to establish the distributed shared memory. Configure the hostnames and input text datasets in the configuration file, then run the parent HierarchicalIndexingAgent locally to dispatch child agents to the remote servers. Each child performs local indexing and stores its results in the shared Hazelcast map; the parent then collects and merges them into a global inverted index. Alternatively, the mobile agent feature deploys an agent to a remote host via Place for single-node execution and result retrieval, supporting modular scalability and distributed coordination.
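The format of the configuration file is not specified here, so the following is only a sketch of one plausible approach. It assumes a Java properties file with two hypothetical comma-separated keys, hosts and inputs. The class name IndexConfigSketch and both key names are illustrative, not taken from the project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch: reads worker hosts and input files from properties text. */
final class IndexConfigSketch {
    final List<String> hosts;
    final List<String> inputs;

    private IndexConfigSketch(List<String> hosts, List<String> inputs) {
        this.hosts = hosts;
        this.inputs = inputs;
    }

    /** Parses properties-format text; the "hosts" and "inputs" keys are assumed names. */
    static IndexConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IndexConfigSketch(
                splitList(props.getProperty("hosts", "")),
                splitList(props.getProperty("inputs", "")));
    }

    /** Splits a comma-separated value, trimming entries and dropping empty ones. */
    private static List<String> splitList(String csv) {
        List<String> out = new ArrayList<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }
}
```

For example, the text "hosts = node1, node2" followed by "inputs = a.txt, b.txt" on the next line would yield two hosts and two input files. In practice the text would be read from a file on disk before being passed to parse.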

Technologies Used

Java, Hazelcast, RMI, JSCH, Multithreading, ConcurrentHashMap, Distributed Systems
