1.1 Distributed Systems
The basic requirements from a distributed system are that the nodes should be autonomous so that they can work independently; the network should be connected, that is, any node should have a communication link directly or indirectly to any other node; and there should be a coordination mechanism for the nodes to cooperate to achieve common goals.
There are a number of benefits to be gained by utilizing distributed systems. One of the obvious advantages of using a distributed system is resource sharing . Access to a central resource has two disadvantages as this central site becomes a bottleneck for communications and also is a single point of failure. Distributing the resources such as the database and peripherals over a network overcomes these problems.
Resources and computation can be replicated at various sites providing fault tolerance as a replica may be substituted in the case of the dysfunctioning of a node. This type of fault tolerance is an important reason to employ distributed systems. It is also possible for the application to be inherently distributed such as bank transaction systems and airline reservation systems where employment of distributed systems is inevitable.
A distributed system can be modeled as a graph G ( V , E ) conveniently where V is the set of vertices and E is the set of edges of G . The computing nodes of the distributed system are represented by the vertices of the graph, and an edge exists between the nodes if there is a communication link between them. Figure displays a graph that represents a distributed system consisting of nodes numbered 1,,10. The first thing that may be noticed is that the graph is connected, providing a communication path between any pair of nodes. Many nodes are not directly connected to each other; therefore, they have to rely on their neighbor nodes to communicate with the other nodes of the network.
Fig. 1.1
A graph representing a distributed system
We will use graphs to represent distributed systems and show the execution of a distributed algorithm in these graphs frequently. In this chapter, we will first describe platforms and models for distributed computing in Sects.. Finally, we conclude by the organization of the book.
1.2 Distributed Computing Platforms
Due to the recent technological advancements, in the last few decades, we have witnessed diverse distributed system platforms such as the Grid, The Cloud, mobile ad hoc networks, and wireless sensor networks that are described below.
1.2.1 The Grid
The Grid consists of loosely coupled, heterogeneous, and geographically dispersed computing elements that are connected by a network acting together to perform large tasks []. These computationally intensive scientific tasks may include various applications such as seismic analysis, drug discovery, and bioinformatics problems. Grid computing provides effective usage of the unused processing power and results in decreased completion time for a task due to parallelization.
The size of a grid varies from a small network of workstations in a corporation to thousands of nodes across many networks and nations. Grids require general software libraries called the middleware to accomplish coordination among a large number of nodes that comprise them. Resource discovery is the process of finding the location of the required resources such as the database tables in the Grid [] is prototyping a computational grid for infrastructure and an access grid for people.
1.2.2 Cloud Computing
The cloud computing evolved from grid computing with the aim to deliver the computing as a service to the users by extending the object-oriented programming paradigm. Cloud computing provides computation, software applications, data access, data management, and storage for resources without requiring cloud users to know the location and other details of the computing infrastructure []. Grid computing may be included in the cloud or not depending on the type of application and users. Cloud computing and grid computing aim at scalability, and both use load balancing to accomplish scalability. In grid computing, a single task is divided into smaller tasks that are run on a number of processors to effectively use the available computing power, whereas in cloud computing, service offered to users is not restricted to processing power and includes website hosting, database support, etc. Cloud computing, in general, offers more services than the Grid.
1.2.3 Mobile Ad hoc Networks
A wireless ad hoc network is a decentralized network consisting of wireless nodes that do not rely on a predefined infrastructure such as routers or access points. Instead, each node participates in routing by forwarding data to other nodes regarding dynamically changing network topology. A mobile ad hoc network ( MANET ) is a network without any fixed structure formed for a purpose by mobile devices connected by wireless communication links. Each node of a MANET moves independently, forming a dynamic network that changes its topology continuously. Nodes of a MANET must be able to route any messages not destined to them; therefore, each node functions as a router. Examples of MANETs are the disaster relief operations, military networks, and vehicular ad hoc networks.
1.2.4 Wireless Sensor Networks
A wireless sensor network ( WSN ) consists of many small nodes of computing elements, each equipped with sensing and wireless communication capabilities. These networks can obtain data about their environment and transfer this data to a central node using multi-hop communication to be analyzed further. The WSNs have large application spectrum such as habitat monitoring, military surveillance, and target tracking []. WSNs form a large-scale distributed system and require scalable distributed algorithms to solve problems such as data aggregation, topology control, and routing.
1.3 Models
The basic models of a distributed system are the message passing and shared-memory models. In the message passing model, nodes of the distributed system communicate by messages only. Messages are communicated in rounds in synchronous message passing, where messages sent in round k are delivered to all recipients before messages in round k +1 can be transferred. In asynchronous message passing, however, messages are assumed to eventually reach the destinations after unknown delays. Analyzing asynchronous message passing algorithms is more difficult than synchronous ones due to the uncertainties involved.
In shared-memory models, processes communicate by reading and writing to shared memory. Synchronization is an important issue also in shared-memory systems. Distributed shared-memory systems implement shared memory model over the message passing model to use the available shared memory software modules conveniently. Our analysis in this book is confined to message-passing distributed systems without any shared memory in general, except for some self-stabilizing algorithms, where it will be assumed that a process can read the values of the registers of its neighbors.