4.3 Construcción del caso
4.3.5 La anulación retroactiva como defensa
The TxStore class provides a resilient store that supports distributed transactions and ensures data consistency and availability in the presence of non-catastrophic failures. It supports both global synchronous recovery (in the same way applied in PlaceLocalStore) and local asynchronous recovery. Table5.4lists the store’s APIs.
Table 5.4: The TxStore APIs.
Class Function Returns
TxStore[K,V]
make(pg:PlaceGroup, asyncRec:Boolean, cc:String, work:(TxStore[K,V])=>void) TxStore[K,V]
activePlaces() PlaceGroup recover(changes:ChangeDescription) void makeTransaction() Tx[K,V] executeTransaction(closure:(Tx[K,V])=>void) void Tx[K,V] get(key:Key) V
set(key:K, value:V) void
The program creates an instance ofTxStoreby callingTxStore.makeand provid- ing the initialization parameters. The first parameterpgis the initial group of active places, the second parameterasyncRecstates whether asynchronous recovery should be used, the third parameter cc states the chosen concurrency control mechanism (see Section5.2.3.3), and the fourth parameterworkdefines the work to be done by a new joining place (used in asynchronous recovery only). We will show how thework closure is invoked during the recovery in Listing5.3-Line29.
Each active place receives a copy of the activePlaces group at start-up time. The methodactivePlaces()evaluates locally and returns the activePlaces known to the current place at the time of invocation. When asynchronous recovery is used, updates to the active places occur asynchronously and are not reflected at all places at the same time. Thus, calling activePlaces() at different places may return different values depending on what updates each place discovered.
When synchronous recovery is used, place failure events are reported to the application through exceptions. The application may react to these failures by call- ing recover(changes) to initiate a global synchronous recovery in the same way described in Section5.3.1.1.
Transaction Processing
All read and write operations on the store are invoked by transactions. The function makeTransaction creates a new transaction of type Tx. The executeTransaction function executes a closure representing the body of a transaction within a trans- actional finish scope. Although it is acceptable for the application to execute the transaction directly,executeTransactionprovides the advantage of handling trans- action exceptions transparently. It captures errors raised during execution, such as ConflictExceptionsandDeadPlaceExceptions, and consequently restarts the exe- cution until it succeeds or a catastrophic failure occurs.
The Txclass provides the functionsgetandsetfor reading and writing resilient data at the calling place. Expanding the transaction scope to other places can be done using the asyncAtfunction, which identifies the target place using its virtual id. The following code uses the transaction store APIs to perform a bank transaction by moving $100 from account A located at place 1, to account B located at place 2. Using the virtual place identifiers enableexecuteTransactionto seamlessly execute the transaction even if the physical places change during execution.
1 def bank(store:TxStore[String,Long]) { 2 val tx = store.makeTransaction();
3 store.executeTransaction((tx:Tx[String,Long]) => {
4 tx.asyncAt( 1, ()=> { tx.set ("A", tx.get("A") - 100 ) }); 5 tx.asyncAt( 2, ()=> { tx.set ("B", tx.get("B") + 100 ) });
6 });
§5.3 Resilient Application Frameworks 131
Asynchronous Recovery
When a TxStoreis created, an instance of it is registered in the runtime system at every place (active and spare). Normally in RX10, the network layer notifies the runtime layer when a place fails. We modified the notification handler to not only recover the control flow by updating the finish objects, but to also notify the local instance of the transactional store of the failure. If the TxStore is configured to recover asynchronously and the failed place is the current place’s slave (i.e. its right neighbor), the current place immediately starts the recovery protocol concurrently with other place tasks.
While the recovery protocol is executing, read and write actions at places that were not impacted by the failure proceed normally. However, actions targeted to the dead place and/or its two direct neighbors (its master place at the left and its slave place at the right) may temporarily fail until the recovery completes.
Replica Migratability
A challenging aspect of asynchronous recovery is how to handle data migration concurrently with the store’s read/write actions. This challenge is avoided in the PlaceLocalStoreby requiring the program to stop using the store while it is being recovered. Because no write actions are performed on any of the replicas during recovery, all the replicas are in a migratable state and can be safely copied to the spare places, as shown in Listing 5.2. However, in theTxStore, the replicas at the neighboring places of the dead place may not be in a migratable state immediately after the failure.
The master of the dead place can be in the middle of handling read/write trans- actions either as a transaction coordinator or a transaction participant. Some of these transactions may have passed the prepare phase of the2PCprotocol and are waiting for the finalCOMMITorABORTmessage. The master has no choice but to wait for these transactions to terminate. It cannot abort them in order to start the recovery, because the2PCprotocol does not permit a place to abort a transaction after it has voted to commit it. For transactions that did not yet receive thePREPAREmessage, the master has the choice to abort them or allow them to terminate.
On the other hand, the slave of the dead place maintains prepare-logs for the transactions of its master that are prepared to commit. When the master dies, some of the transaction logs may be in transit to the slave place and will eventually be received by it. Unfortunately, the slave cannot independently determine whether to commit or abort these pending transactions. The slave, therefore, has no choice but to wait for these transactions to terminate by receiving a COMMIT orABORTmessage from their coordinators. Because the transaction coordinator is always available (see Section5.2.3.5), the slave will eventually receive these required messages.
The state diagrams in Figure5.10describe how a master or a slave place make a transition from a non-migratable state to a migratable state.
Active Paused Read Only
Dead
Pending Tx No Pending Tx
Dead
a) master states b) slave states
Figure 5.10: Master replica and slave replica state diagrams. A solid line means a non- migratable state, and a dotted line means a migratable state.
A master place can be in one of four states:
1. Active: the default state in which all actions are accepted normally.
2. Paused: a temporary state for preparing the master to be migratable. In this state, the master aims to finalize any transactions prepared to commit and pauses accepting write transactions temporarily. It accepts COMMIT and ABORT messages for prepared transactions and aborts any non-prepared write trans- actions. Read-only transactions are kept active unless they attempt to upgrate from read-only to write; in that case, they will be aborted.
3. Read-only: no prepared write transaction is pending, and only read actions are permitted on the master replica. The read-only state is the only migratable state for a master place.
4. Dead: the master has died. We assume the master can reach a dead state from the active state only, because the other states imply that the slave is also dead. A failure of both a master and its slave is unrecoverable in our system.
A slave place can be in one of three states:
1. No pending transactions: the slave does not hold prepare-logs for any transac- tion. It is the only migratable state for a slave place.
2. Pending transactions: the slave holds pending prepare-logs and is expecting COMMIT/ABORTmessages regarding them from the transaction coordinators. 3. Dead: the slave has died. It is reachable from the two other states.
Note that the waiting time for a transaction to terminate is limited due to the transaction termination guarantee described in Section5.2.3.5.
Replication Recovery Using Finish
In Listing5.3, we outline the pseudocode for the asynchronous recovery procedure. Three places are involved in the recovery of a place: its master (left neighbor), its slave (right neighbor), and the spare place that will replace it.
§5.3 Resilient Application Frameworks 133
The entire recovery procedure is performed using onefinishblock (Lines7–26) that controls three tasks. The first task is sent to the slave place (Lines9–12); it enters a busy loop waiting to reach the migratable state “no pending transactions” (Line10), after that, it sends a copy of its data to the spare place (Line 11). The slave place will not receive new pending requests until after the spare place is fully recovered, therefore, the slave place is automatically paused. The second task is created at the master place (Lines14–19); it starts by pausing the master replica (Line15) to prepare it to reach the read-only state. It waits until the read-only state is reached (Line16), before sending a copy of the master replica to the spare place (Line 17). Finally, it activates the master replica to continue processing the transactions normally (Line18). The third task is sent to the spare place (Lines 21–25); it aims to activate the spare place after ensuring that it received the replicas successfully from the master and slave places. Since it knows their identities, it can break its busy waiting and throw a DeadPlaceExceptionif it detects that either the master or the slave has died.
Listing 5.3: TxStore asynchronous recovery. 1 /*called by the master of the dead place*/
2 def recover(deadPlace:Place) { 3 val left = here;
4 val right = store.activePlaces().next(deadPlace); 5 assert !right.isDead();
6 val spare = store.allocateSpare(); 7 finish {
8 //create the spare’s master replica
9 at (right) async {
10 slave.waitUntilMigratable();
11 slave.copyToMaster(spare);
12 }
13 //create the spare’s slave replica
14 at (left) async { //local task (left = here)
15 master.pause();
16 master.waitUntilMigratable();
17 master.copyToSlave(spare);
18 master.activate();
19 }
20 //activate the spare after receiving its replicas
21 at (spare) async { 22 slave.waitForReplicaFrom(left); 23 master.waitForReplicaFrom(right); 24 master.activate(); 25 } 26 } 27
28 //assign work to the spare place
29 at (spare) @Uncounted async { store.work(store); } 30 }
After the spare place has been initialized, it can participate in the computation by performing a user-defined task. The user provides this task in theworkclosure given to the store at construction time (see the parameters of TxStore.makein Table5.4). The recover function invokes the work closure asynchronously at the spare place (Line29), only after thefinishblock terminates successfully.
The @Uncounted annotation, used in Line 29, makes the spawned async not tracked by any finish. However, the application can programmatically integrate thisasyncwith existing application tasks, for example, by exchanging certain notifi- cation messages. We will demonstrate this mechanism in the context of the parallel workers framework in Section5.3.3.
Handshaking
When a spare place takes the role of a dead place, the other places need to recognize this change. A simple mechanism to achieve this goal is to perform a global handshak- ing operation from the spare to all other places. However, eagerly notifying all places not only slows down the store’s recovery, but may be unnecessary for some places that do not need to communicate with that place. For this reason, we implemented an on-demand handshaking mechanism that works as follows: when a place detects the failure of another place, it queries its master for the physical identity of the dead place given its virtual id and updates its local activePlaces group accordingly. If the master of the dead place is also dead, a catastrophic failure is detected, and the entire application terminates.
Summary
The TxStore provides a simple programming interface for the application to store critical application data that require atomic multi-place updates. The asynchronous recovery mechanism minimizes the impact of failures to only the places in direct communication with the dead place. Transactions targeted to the dead place and only write transactions targeted to the master of the dead place are temporarily aborted. By retrying these transactions, they will eventually succeed after the dead place has been recovered. Spare places can automatically join the computation without global synchronization with other places. In Section5.3.3, we describe the parallel workers application framework that uses theTxStoreas its foundation for data resilience.