Ouyang et al., 1995 - Google Patents
Checkpointing and recovery for distributed shared memory applicationsOuyang et al., 1995
View PDF- Document ID
- 920960395938361119
- Author
- Ouyang J
- Heiser G
- Publication year
- Publication venue
- Proceedings of International Workshop on Object Orientation in Operating Systems
External Links
Snippet
The paper proposes an approach for adding fault tolerance, based on consistent checkpointing, to distributed shared memory applications. Two different mechanisms are presented to efficiently address the issue of message losses due to either site failures or …
- 238000011084 recovery 0 title abstract description 31
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant details of failing over
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1471—Saving, restoring, recovering or retrying involving logging of persistent data for recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99953—Recoverability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Application independent communication protocol aspects or techniques in packet data networks
- H04L69/40—Techniques for recovering from a failure of a protocol instance or entity, e.g. failover routines, service redundancy protocols, protocol state redundancy or protocol service redirection in case of a failure or disaster recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP1116115B1 (en) | Protocol for replicated servers | |
| Zhao et al. | Fault tolerance middleware for cloud computing | |
| Ladin et al. | Lazy replication: Exploiting the semantics of distributed services | |
| Guerraoui et al. | Software-based replication for fault tolerance | |
| Wang | Optimistic message logging for independent checkpointing in message-passing systems | |
| Babaoglu et al. | Understanding non-blocking atomic commitment | |
| Singh et al. | Antecedence graph approach to checkpointing for fault tolerance in mobile agent systems | |
| Cao et al. | Checkpointing with mutable checkpoints | |
| Zhao et al. | Low latency fault tolerance system | |
| Cao et al. | Checkpointing and rollback of wide-area distributed applications using mobile agents | |
| Babaoğlu | Fault-tolerant computing based on Mach | |
| Jalote | Fault tolerant processes | |
| Ouyang et al. | Checkpointing and recovery for distributed shared memory applications | |
| Mishra et al. | The Timewheel group communication system | |
| Chowdhury et al. | A consistent checkpointing-recovery protocol for minimal number of nodes in mobile computing system | |
| Issarny et al. | Efficient treatment of failures in RPC systems | |
| Mena et al. | A new look at atomic broadcast in the asynchronous crash-recovery model | |
| Ropars et al. | Active optimistic message logging for reliable execution of MPI applications | |
| Ye | Providing reliable web services through active replication | |
| Mishra et al. | The timewheel group membership protocol | |
| Jahanshahi et al. | Two new approaches for orphan detection | |
| Mitchell et al. | A non-blocking recovery algorithm for causal message logging | |
| Garg et al. | A review of checkpointing fault tolerance techniques in distributed mobile systems | |
| Kumar et al. | Hierarchical Non-blocking coordinated CHECKPOINTING algorithms for mobile distributed computing | |
| Maheshwari et al. | Supporting fault-tolerance in heterogeneous distributed applications |