The repository for the DS project.
Authors:

Requirements:
- Gradle
- Java
- Clone the repository:
  git clone https://github.com/LeonardoDeFaveri/DistributedSystemsProject
- Enter the folder with the terminal:
  cd DistributedSystemsProject
- Run the application with gradle (if gradle is in the environment variables):
  gradle run
- Both replicas and clients hold a list of all the replicas in the system:
- Replicas need to know which are the other replicas
- Clients pick the replica to contact from that list. Each client has a favorite replica and keeps contacting it until it crashes. When that happens, it picks another one
- Initially the coordinator is the first replica created
- Replicas keep track of the last time they heard anything from the coordinator. A HeartbeatMsg, or any other kind of message received from the coordinator, makes them reset this time of last contact
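
The last-contact bookkeeping described above can be sketched in plain Java. This is a minimal sketch, not the project's actor code: the class name CoordinatorMonitor, its method names, and the silence threshold are assumptions made for illustration, and the current time is passed in explicitly so the logic stays deterministic.

```java
import java.time.Duration;

/** Hypothetical helper: remembers when the coordinator was last heard from. */
final class CoordinatorMonitor {
    private final long timeoutMillis;   // assumed silence threshold
    private long lastContactMillis;     // time of last contact with the coordinator

    CoordinatorMonitor(Duration timeout, long nowMillis) {
        this.timeoutMillis = timeout.toMillis();
        this.lastContactMillis = nowMillis;
    }

    /** Call for every message received from the coordinator, heartbeat or not. */
    void onCoordinatorMessage(long nowMillis) {
        lastContactMillis = nowMillis;
    }

    /** Periodic check: true when the coordinator has been silent longer than the threshold. */
    boolean isCoordinatorSuspected(long nowMillis) {
        return nowMillis - lastContactMillis > timeoutMillis;
    }
}
```

A replica would call onCoordinatorMessage from every handler of a coordinator message and run isCoordinatorSuspected from a periodic timer.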
- Start the system with gradle run. Replicas and clients are created
- Press ENTER to send a start signal to all hosts:
  - Clients begin sending requests to replicas
  - Replicas begin expecting heartbeat messages (or any kind of message) from the coordinator
  - The coordinator begins sending heartbeat messages to replicas
- Press ENTER again to send a stop signal to clients, forcing them to stop producing more requests
- Press ENTER a third time to terminate the system
Crash detection works by means of periodic checks of arrival of messages.
- Read requests: ReadMsg

  Each client uniquely identifies a read request by means of an incrementing index that is placed within the request itself. Upon sending the request to a replica, the client registers the sent request in readMsgs (which maps read indexes to replicas) and starts a timeout (READOK_TIMEOUT). When the timeout expires, a ReadOkReceivedMsg is sent by the client to itself.

  A client expects to receive a ReadOkMsg from the contacted replica, and that message should contain the read value and the ID of the served read request. On arrival, the client removes from readMsgs the request with the ID found in the ReadOkMsg. A ReadOkReceivedMsg also contains the ID of the associated request, and its handler checks whether readMsgs still holds a value for that ID. If it does, no ReadOkMsg has been received, so the replica associated with that ID in the map is presumed to have crashed and is removed from the list of active replicas. Otherwise, the request has been served and the replica was fine up to that point.
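
The read-side bookkeeping can be sketched as follows. This is a minimal sketch in plain Java under stated assumptions: the class ReadTracker and its method names are hypothetical, the type parameter R stands in for whatever reference the client uses for a replica, and the entry is removed when the timeout fires (the description above only says the handler checks for it).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side tracker for outstanding read requests. */
final class ReadTracker<R> {
    private final Map<Integer, R> readMsgs = new HashMap<>(); // read index -> contacted replica
    private final Set<R> activeReplicas;
    private int nextReadIndex = 0;

    ReadTracker(Set<R> replicas) {
        this.activeReplicas = new HashSet<>(replicas);
    }

    /** Registers a read sent to the given replica; the caller would also start READOK_TIMEOUT. */
    int onReadSent(R replica) {
        int index = nextReadIndex++;
        readMsgs.put(index, replica);
        return index;
    }

    /** ReadOkMsg handler: the request was served, so forget it. */
    void onReadOk(int index) {
        readMsgs.remove(index);
    }

    /** ReadOkReceivedMsg handler: returns true if the replica was marked as crashed. */
    boolean onReadOkReceived(int index) {
        R replica = readMsgs.remove(index);
        if (replica == null) {
            return false;               // already served, replica was fine
        }
        activeReplicas.remove(replica); // no ReadOkMsg arrived in time
        return true;
    }

    Set<R> activeReplicas() {
        return Collections.unmodifiableSet(activeReplicas);
    }
}
```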
- Update requests: UpdateRequestMsg

  An update request is identified by the ActorRef of the client and another incrementing ID. This pair is handled by the UpdateRequestId class. As for read requests, when a client sends an update to a replica, it registers the request and the destination replica in a map (writeMsgs), using the ID as key, and starts a timeout. When the timeout expires, an UpdateRequestOkReceivedMsg is sent by the client to itself.

  The replica, once the request has been served, responds with an UpdateRequestOkMsg holding the index of the served request. On arrival, the index found is used to remove the associated value from writeMsgs. On arrival of the corresponding UpdateRequestOkReceivedMsg, the same check done for read requests is performed: no value means that the request has been served, while a value still present means that the contacted replica has taken too long to answer and is identified as crashed.

  On the replica side, when an update request is received, its ID (the pair <client, index>) is saved in the set updateRequests. This ID is also put into all associated WriteMsgs and WriteOkMsgs. When the update protocol terminates and a replica receives a WriteOkMsg, it takes the update request ID carried by the message and checks whether it is present in updateRequests. If it is, this replica is the one that received the request from the client, and thus it has to send back an ACK, namely an UpdateRequestOkMsg, to that client. This message holds the local update request index of that client.
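
The replica-side decision of who answers the client can be sketched as below. This is only an illustration under assumptions: UpdateRequestIdSketch is a stand-in for the project's UpdateRequestId class, a String replaces the client's ActorRef, ReplicaUpdateAcks and its methods are hypothetical names, and the ID is removed once the ACK is due (the text above only states that presence is checked).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Stand-in for the pair <client, index>; a String replaces the client's ActorRef. */
final class UpdateRequestIdSketch {
    final String client;
    final int index;

    UpdateRequestIdSketch(String client, int index) {
        this.client = client;
        this.index = index;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof UpdateRequestIdSketch)) return false;
        UpdateRequestIdSketch other = (UpdateRequestIdSketch) o;
        return index == other.index && client.equals(other.client);
    }

    @Override public int hashCode() {
        return Objects.hash(client, index);
    }
}

/** Hypothetical replica-side bookkeeping for update requests received from clients. */
final class ReplicaUpdateAcks {
    private final Set<UpdateRequestIdSketch> updateRequests = new HashSet<>();

    /** Called when this replica receives an update request directly from a client. */
    void onUpdateRequest(UpdateRequestIdSketch id) {
        updateRequests.add(id);
    }

    /**
     * Called on WriteOkMsg. Returns the index to put in the UpdateRequestOkMsg
     * if this replica is the one the client contacted, otherwise empty.
     */
    Optional<Integer> onWriteOk(UpdateRequestIdSketch id) {
        if (updateRequests.remove(id)) {
            return Optional.of(id.index);
        }
        return Optional.empty();
    }
}
```

Giving the ID value semantics (equals and hashCode) is what lets a copy carried inside a WriteOkMsg match the entry stored when the request first arrived.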
- On update requests: UpdateRequestMsg

  When a replica that is not the coordinator receives an update request, it forwards it to the coordinator and expects to receive back a WriteMsg. So, upon forwarding the request, the replica starts a timeout (WRITEMSG_TIMEOUT) and registers the ID of the request in the pendingUpdateRequests set.

  When the coordinator sends a WriteMsg, it embeds in it the updateRequestId of the update request that is being served. Upon reception of this message from the coordinator, a replica removes the updateRequestId from pendingUpdateRequests and adds the WriteId of the message to writeRequests to register the serving of that write request.

  When WRITEMSG_TIMEOUT expires, a WriteMsgReceivedMsg is sent by the replica to itself. If the expected WriteMsg has not been received, that is, if pendingUpdateRequests still contains the updateRequestId, the coordinator is considered to have crashed. Otherwise, it is considered alive.

  On the coordinator side, when an UpdateRequestMsg is received, the association WriteId -> UpdateRequestId (WriteId being a class representing the pair <epoch, write_index>) is saved in the writesToUpdates map and is later used to build the WriteOkMsgs.
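
The replica-side check on forwarded update requests can be sketched as follows. This is a minimal sketch with assumptions called out: PendingWrites and its method names are hypothetical, and plain strings replace both the UpdateRequestId and the WriteId to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical replica-side bookkeeping for update requests forwarded to the coordinator. */
final class PendingWrites {
    // Strings stand in for UpdateRequestId and WriteId values (an assumption for brevity).
    private final Set<String> pendingUpdateRequests = new HashSet<>();
    private final Set<String> writeRequests = new HashSet<>();

    /** Called after forwarding an update request; the caller would also start WRITEMSG_TIMEOUT. */
    void onForwardedToCoordinator(String updateRequestId) {
        pendingUpdateRequests.add(updateRequestId);
    }

    /** WriteMsg handler: the coordinator picked up the request, now wait for the WriteOkMsg. */
    void onWriteMsg(String updateRequestId, String writeId) {
        pendingUpdateRequests.remove(updateRequestId);
        writeRequests.add(writeId);
    }

    /** WriteMsgReceivedMsg handler: true means the coordinator is presumed crashed. */
    boolean onWriteMsgTimeout(String updateRequestId) {
        return pendingUpdateRequests.contains(updateRequestId);
    }
}
```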
- On write message: WriteMsg

  When a replica receives a WriteMsg, it sends an ACK back to the coordinator and expects to receive a WriteOkMsg in response. So, again, a timer (WRITEOK_TIMEOUT) is set and a WriteOkReceivedMsg is sent by the replica to itself at expiration. On receipt of a WriteOkMsg, the associated WriteId is removed from writeRequests, and on receipt of a WriteOkReceivedMsg the same kind of check done for the other ReceivedMsgs is performed. So, if writeRequests still holds the WriteId, the request has not been served yet and the coordinator is marked as crashed.

  On the coordinator side, when enough ACKs are received for a request and the corresponding WriteOkMsg can be sent, the WriteId built from the current values of epoch and currentWriteToAck is used to retrieve from writesToUpdates the UpdateRequestId of the originating request, and that ID is put into the WriteOkMsg.
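
The coordinator-side lookup can be sketched as below. This is a simplified illustration, not the project's code: WriteIdSketch stands in for the WriteId class, CoordinatorAcks and its methods are hypothetical, a String replaces the UpdateRequestId, and the number of ACKs needed is passed in as a parameter because the text above only says "enough". The sketch also counts ACKs per write instead of tracking currentWriteToAck.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Stand-in for the pair <epoch, write_index>. */
final class WriteIdSketch {
    final int epoch;
    final int writeIndex;

    WriteIdSketch(int epoch, int writeIndex) {
        this.epoch = epoch;
        this.writeIndex = writeIndex;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof WriteIdSketch)) return false;
        WriteIdSketch other = (WriteIdSketch) o;
        return epoch == other.epoch && writeIndex == other.writeIndex;
    }

    @Override public int hashCode() {
        return Objects.hash(epoch, writeIndex);
    }
}

/** Hypothetical coordinator-side bookkeeping of ACKs and originating update requests. */
final class CoordinatorAcks {
    private final Map<WriteIdSketch, String> writesToUpdates = new HashMap<>();
    private final Map<WriteIdSketch, Integer> ackCounts = new HashMap<>();
    private final int acksNeeded; // assumed threshold, supplied by the caller

    CoordinatorAcks(int acksNeeded) {
        this.acksNeeded = acksNeeded;
    }

    /** Remembers which update request originated the given write. */
    void onUpdateRequest(WriteIdSketch writeId, String updateRequestId) {
        writesToUpdates.put(writeId, updateRequestId);
    }

    /**
     * Counts one ACK. Exactly when the threshold is reached, returns the update
     * request ID to embed in the WriteOkMsg; otherwise returns empty.
     */
    Optional<String> onAck(WriteIdSketch writeId) {
        int acks = ackCounts.merge(writeId, 1, Integer::sum);
        if (acks != acksNeeded) {
            return Optional.empty();
        }
        return Optional.ofNullable(writesToUpdates.get(writeId));
    }
}
```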
- On ACK message: WriteAckMsg

  Thanks to the assumption on the minimum number of available replicas, no crash check needs to be performed on the coordinator side on receipt of ACK messages.