
CN116594809A - Distributed coding backup recovery system - Google Patents


Info

Publication number
CN116594809A
Authority
CN
China
Prior art keywords
detection unit
module
arbitration
repair
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310491812.XA
Other languages
Chinese (zh)
Inventor
刘�东
赵彦钧
常清雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Huakun Zhenyu Intelligent Technology Co ltd
Original Assignee
Sichuan Huakun Zhenyu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Huakun Zhenyu Intelligent Technology Co ltd
Priority to CN202310491812.XA
Publication of CN116594809A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W90/00 Enabling technologies or technologies with a potential or indirect contribution to greenhouse gas [GHG] emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a distributed coding backup recovery system and method in the technical field of electric digital data processing. It provides real-time abnormality monitoring for the execution process of a storage server composed of a failure detection module, a distributed arbitration module, a repair module and the like. Through the ordered, coordinated operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit, an abnormality can be located accurately; at the same time, prolonged idle running of multiple detection units is avoided, which reduces the system resources they occupy.

Description

Distributed coding backup recovery system
Technical Field
The invention belongs to the technical field of electric digital data processing, and particularly relates to a distributed coding backup recovery system and method.
Background
In wide area network data storage systems, a wide variety of backup and archiving systems have been implemented at different levels. Most of them primarily consider disk failures and the like, and do not account for the impact of the data transmission link in a wide area network environment. A storage server designed to overcome this shortcoming generally comprises a failure detection module, a distributed arbitration module, a repair module and the like. When the failure detection module finds that a storage server has failed, the repair module downloads, from other working storage servers, copies of the image files that the failed storage server held and stores them on the alternative storage server selected by the distributed arbitration module, so that the alternative storage server fully takes over from the failed one. In other words, the detection result of the failure detection module triggers the distributed arbitration module; once the distributed arbitration module has selected the alternative storage server, it triggers the repair module; and the repair module stores the target image file attachment on the alternative storage server.
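The trigger chain just described (failure detection, then arbitration, then repair) can be sketched as follows. This is only a minimal illustration under stated assumptions: the in-memory dictionaries, the server and image names, and the functions `detect_failed`, `arbitrate`, `repair` and `recover` are hypothetical and do not come from the patent, and the arbitration rule used here (first healthy server, in name order, that does not already hold the image) is an assumption chosen for simplicity.

```python
from typing import Dict, List, Optional, Set

# Hypothetical cluster state: server name -> set of image files stored on it.
Cluster = Dict[str, Set[str]]


def detect_failed(health: Dict[str, bool]) -> List[str]:
    """Failure detection module: report servers whose health flag is False."""
    return sorted(name for name, ok in health.items() if not ok)


def arbitrate(health: Dict[str, bool], cluster: Cluster, image: str) -> Optional[str]:
    """Distributed arbitration module (assumed rule): pick the first healthy
    server, in name order, that does not already hold the image."""
    for name in sorted(cluster):
        if health.get(name, False) and image not in cluster[name]:
            return name
    return None


def repair(cluster: Cluster, health: Dict[str, bool], image: str, target: str) -> bool:
    """Repair module: copy the image from any healthy holder onto the target."""
    has_source = any(
        health.get(name, False) and image in files
        for name, files in cluster.items()
    )
    if not has_source:
        return False  # no working server holds a copy, so nothing to download
    cluster[target].add(image)
    return True


def recover(cluster: Cluster, health: Dict[str, bool]) -> Dict[str, str]:
    """Run detection -> arbitration -> repair; return image -> alternative server."""
    placements: Dict[str, str] = {}
    for failed in detect_failed(health):
        for image in sorted(cluster[failed]):
            target = arbitrate(health, cluster, image)
            if target is not None and repair(cluster, health, image, target):
                placements[image] = target
    return placements
```

Each stage only runs when the previous one produces a result, which mirrors the sequential data transmission between the three modules.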
However, no execution anomaly monitoring scheme has yet been designed for this execution process, and there is no scheme for locating an anomaly when one occurs during it.
Therefore, a distributed coding backup recovery system, a distributed coding backup recovery method and a storage medium are needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a distributed coding backup recovery system, a distributed coding backup recovery method and a storage medium that solve the above technical problems in the prior art by monitoring abnormalities in the execution process of a storage server composed of a failure detection module, a distributed arbitration module, a repair module and the like, and by locating those abnormalities.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
the distributed coding backup recovery system comprises a failure detection module, a distributed arbitration module, a repair module, a repair result verification unit, a detection output detection unit, an arbitration input detection unit, an arbitration output detection unit, a repair input detection unit and an operation control unit, wherein the failure detection module, the distributed arbitration module and the repair module sequentially execute related data transmission;
the operation control unit is respectively connected with the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit;
the repair result verification unit is used for verifying whether the repair module successfully stores the target image file attachment in the alternative storage server;
the detection output detection unit is used for detecting whether the output data of the failure detection module is abnormal or not;
the arbitration input detection unit is used for detecting whether the input data of the distributed arbitration module is abnormal;
the arbitration output detection unit is used for detecting whether the output data of the distributed arbitration module is abnormal;
the repair input detection unit is used for detecting whether the input data of the repair module is abnormal or not;
the operation control unit is used for controlling the operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit.
Further, the operation control unit controls the operation state of the repair result verification unit to be normally on (always running), and controls the operation states of the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit to be normally off;
when the repair result verification unit verifies that the repair module has not successfully stored the target image file attachment in the alternative storage server, the operation control unit controls the repair input detection unit to be started;
and if the repair input detection unit detects that the input data of the repair module is not abnormal, the operation control unit judges that the repair module is faulty.
Further, when the repair input detection unit detects that the input data of the repair module is abnormal, the operation control unit controls the arbitration output detection unit to be started;
and if the arbitration output detection unit detects that the output data of the distributed arbitration module is not abnormal, the operation control unit judges that the data transmission between the distributed arbitration module and the repair module is faulty.
Further, when the arbitration output detection unit detects that the output data of the distributed arbitration module is abnormal, the operation control unit controls the arbitration input detection unit to be started;
and if the arbitration input detection unit detects that the input data of the distributed arbitration module is not abnormal, the operation control unit judges that the distributed arbitration module is faulty.
Further, when the arbitration input detection unit detects that the input data of the distributed arbitration module is abnormal, the operation control unit controls the detection output detection unit to be started;
if the detection output detection unit detects that the output data of the failure detection module is not abnormal, the operation control unit judges that the data transmission between the failure detection module and the distributed arbitration module is faulty; and if the detection output detection unit detects that the output data of the failure detection module is abnormal, the operation control unit judges that the failure detection module is faulty.
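The judgment chain above can be read as a single decision function that walks upstream from the repair module. The sketch below is illustrative only: the `Fault` enum, the `locate_fault` function and its parameter names are hypothetical, and each check is modelled as a zero-argument callable so that a check is evaluated only when the chain actually reaches it.

```python
from enum import Enum
from typing import Callable

Check = Callable[[], bool]


class Fault(Enum):
    NONE = "no fault detected"
    REPAIR_MODULE = "repair module is faulty"
    ARBITRATION_TO_REPAIR_LINK = "transmission between arbitration and repair is faulty"
    ARBITRATION_MODULE = "distributed arbitration module is faulty"
    DETECTION_TO_ARBITRATION_LINK = "transmission between detection and arbitration is faulty"
    FAILURE_DETECTION_MODULE = "failure detection module is faulty"


def locate_fault(
    repair_succeeded: Check,
    repair_input_abnormal: Check,
    arbitration_output_abnormal: Check,
    arbitration_input_abnormal: Check,
    detection_output_abnormal: Check,
) -> Fault:
    """Walk upstream from the repair module; stop at the first normal reading."""
    if repair_succeeded():
        return Fault.NONE
    if not repair_input_abnormal():
        # The repair module received good input but still failed.
        return Fault.REPAIR_MODULE
    if not arbitration_output_abnormal():
        # Good data left arbitration but arrived abnormal at repair.
        return Fault.ARBITRATION_TO_REPAIR_LINK
    if not arbitration_input_abnormal():
        # Arbitration received good input but produced abnormal output.
        return Fault.ARBITRATION_MODULE
    if not detection_output_abnormal():
        # Good data left failure detection but arrived abnormal at arbitration.
        return Fault.DETECTION_TO_ARBITRATION_LINK
    return Fault.FAILURE_DETECTION_MODULE
```

For example, if the repair input is abnormal but the arbitration output is normal, the function returns `Fault.ARBITRATION_TO_REPAIR_LINK` without ever calling the two upstream checks.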
Further, the system also comprises an anomaly feedback unit, which is connected with the operation control unit.
A distributed coding backup recovery method uses the distributed coding backup recovery system described above to carry out distributed coding backup recovery.
A storage medium has a computer program stored thereon which, when executed, performs the distributed coding backup recovery method described above.
Compared with the prior art, the invention has the following beneficial effects:
One beneficial effect of the scheme is real-time abnormality monitoring of the execution process of a storage server composed of a failure detection module, a distributed arbitration module, a repair module and the like. Through the ordered, coordinated operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit, an abnormality can be located accurately, while prolonged idle running of multiple detection units is avoided and the system resources they occupy are reduced.
Drawings
Fig. 1 is a schematic system configuration diagram of the embodiment.
Fig. 2 is a schematic diagram of the system operation principle of the embodiment.
Detailed Description
For the purpose of making the technical solution and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention. It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
As shown in fig. 1, a distributed coding backup recovery system is provided, which includes a failure detection module, a distributed arbitration module, a repair module, a repair result verification unit, a detection output detection unit, an arbitration input detection unit, an arbitration output detection unit, a repair input detection unit, and an operation control unit, wherein the failure detection module, the distributed arbitration module, and the repair module sequentially execute related data transmission;
the operation control unit is respectively connected with the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit;
the repair result verification unit is used for verifying whether the repair module successfully stores the target image file attachment in the alternative storage server;
the detection output detection unit is used for detecting whether the output data of the failure detection module is abnormal or not;
the arbitration input detection unit is used for detecting whether the input data of the distributed arbitration module is abnormal;
the arbitration output detection unit is used for detecting whether the output data of the distributed arbitration module is abnormal;
the repair input detection unit is used for detecting whether the input data of the repair module is abnormal or not;
the operation control unit is used for controlling the operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit.
Further, as shown in fig. 2, the operation control unit controls the operation state of the repair result verification unit to be normally on (always running), and controls the operation states of the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit to be normally off;
when the repair result verification unit verifies that the repair module has not successfully stored the target image file attachment in the alternative storage server, the operation control unit controls the repair input detection unit to be started;
and if the repair input detection unit detects that the input data of the repair module is not abnormal, the operation control unit judges that the repair module is faulty.
Further, when the repair input detection unit detects that the input data of the repair module is abnormal, the operation control unit controls the arbitration output detection unit to be started;
and if the arbitration output detection unit detects that the output data of the distributed arbitration module is not abnormal, the operation control unit judges that the data transmission between the distributed arbitration module and the repair module is faulty.
Further, when the arbitration output detection unit detects that the output data of the distributed arbitration module is abnormal, the operation control unit controls the arbitration input detection unit to be started;
and if the arbitration input detection unit detects that the input data of the distributed arbitration module is not abnormal, the operation control unit judges that the distributed arbitration module is faulty.
Further, when the arbitration input detection unit detects that the input data of the distributed arbitration module is abnormal, the operation control unit controls the detection output detection unit to be started;
if the detection output detection unit detects that the output data of the failure detection module is not abnormal, the operation control unit judges that the data transmission between the failure detection module and the distributed arbitration module is faulty; and if the detection output detection unit detects that the output data of the failure detection module is abnormal, the operation control unit judges that the failure detection module is faulty.
In this scheme, real-time abnormality monitoring is carried out on the execution process of a storage server composed of the failure detection module, the distributed arbitration module, the repair module and the like. Through the ordered, coordinated operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit, an abnormality can be located accurately, while prolonged idle running of multiple detection units is avoided and the system resources they occupy are reduced.
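The resource-saving aspect, where the four detection units stay off until the operation control unit needs them, can be illustrated with a small stateful sketch. The class names `DetectionUnit` and `OperationControlUnit`, the `started` log, and the choice to switch each unit off again right after its check are assumptions made for illustration; the embodiment does not state when a started unit is stopped.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DetectionUnit:
    name: str
    is_abnormal: Callable[[], bool]  # hypothetical probe of the monitored data
    running: bool = False            # normally off

    def check(self) -> bool:
        if not self.running:
            raise RuntimeError(f"{self.name} is not running")
        return self.is_abnormal()


@dataclass
class OperationControlUnit:
    # Units ordered from the repair module back toward the failure detection
    # module, each paired with the verdict to give if that unit reads "normal".
    chain: List[Tuple[DetectionUnit, str]]
    last_verdict: str  # verdict when every unit in the chain reads "abnormal"
    started: List[str] = field(default_factory=list)

    def on_repair_failed(self) -> str:
        """Called when the always-on repair result verification reports failure."""
        for unit, verdict_if_normal in self.chain:
            unit.running = True              # start the unit only when needed
            self.started.append(unit.name)
            try:
                abnormal = unit.check()
            finally:
                unit.running = False         # assumed: stop it after one check
            if not abnormal:
                return verdict_if_normal
        return self.last_verdict
```

In a run where the repair input is abnormal and the arbitration output is normal, only the first two units are ever started; the arbitration input and detection output units remain off for the whole diagnosis.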
Further, the system also comprises an anomaly feedback unit, which is connected with the operation control unit and can provide corresponding anomaly feedback for each fault.
A distributed coding backup recovery method uses the distributed coding backup recovery system described above to carry out distributed coding backup recovery.
A storage medium has a computer program stored thereon which, when executed, performs the distributed coding backup recovery method described above.
The above is a preferred embodiment of the present invention. All changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not go beyond the scope of the technical solution of the present invention.

Claims (8)

1. The distributed coding backup recovery system comprises a failure detection module, a distributed arbitration module and a repair module, wherein the failure detection module, the distributed arbitration module and the repair module sequentially execute related data transmission, and the distributed coding backup recovery system is characterized by further comprising a repair result verification unit, a detection output detection unit, an arbitration input detection unit, an arbitration output detection unit, a repair input detection unit and an operation control unit;
the operation control unit is respectively connected with the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit;
the repair result verification unit is used for verifying whether the repair module successfully stores the target image file attachment in the alternative storage server;
the detection output detection unit is used for detecting whether the output data of the failure detection module is abnormal or not;
the arbitration input detection unit is used for detecting whether the input data of the distributed arbitration module is abnormal;
the arbitration output detection unit is used for detecting whether the output data of the distributed arbitration module is abnormal;
the repair input detection unit is used for detecting whether the input data of the repair module is abnormal or not;
the operation control unit is used for controlling the operation of the repair result verification unit, the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit and the repair input detection unit.
2. The distributed coding backup recovery system according to claim 1, wherein the operation control unit controls the operation state of the repair result verification unit to be normally on, and controls the operation states of the detection output detection unit, the arbitration input detection unit, the arbitration output detection unit, and the repair input detection unit to be normally off;
when the repair result verification unit verifies that the repair module has not successfully stored the target image file attachment in the alternative storage server, the operation control unit controls the repair input detection unit to be started;
and if the repair input detection unit detects that the input data of the repair module is not abnormal, the operation control unit judges that the repair module is faulty.
3. The distributed coding backup recovery system according to claim 2, wherein when the repair input detection unit detects that the input data of the repair module is abnormal, the operation control unit controls the arbitration output detection unit to be turned on;
and if the arbitration output detection unit detects that the output data of the distributed arbitration module is not abnormal, the operation control unit judges that the data transmission between the distributed arbitration module and the repair module is faulty.
4. The distributed coding backup recovery system according to claim 3, wherein when the arbitration output detection unit detects that the output data of the distributed arbitration module is abnormal, the operation control unit controls the arbitration input detection unit to be turned on;
and if the arbitration input detection unit detects that the input data of the distributed arbitration module is not abnormal, the operation control unit judges that the distributed arbitration module is faulty.
5. The distributed coding backup recovery system according to claim 4, wherein when the arbitration input detection unit detects that the input data of the distributed arbitration module is abnormal, the operation control unit controls the detection output detection unit to be turned on;
if the detection output detection unit detects that the output data of the failure detection module is not abnormal, the operation control unit judges that the data transmission between the failure detection module and the distributed arbitration module is faulty; and if the detection output detection unit detects that the output data of the failure detection module is abnormal, the operation control unit judges that the failure detection module is faulty.
6. The distributed coding backup recovery system according to claim 5, further comprising an anomaly feedback unit, wherein the anomaly feedback unit is connected to the operation control unit.
7. A distributed coding backup recovery method, wherein the distributed coding backup recovery system as claimed in any one of claims 1 to 6 is used for distributed coding backup recovery.
8. A storage medium having a computer program stored thereon which, when executed, performs the distributed coding backup recovery method as claimed in claim 7.
CN202310491812.XA 2023-04-28 2023-04-28 Distributed coding backup recovery system Pending CN116594809A (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310491812.XA | CN116594809A (en) | 2023-04-28 | 2023-04-28 | Distributed coding backup recovery system

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310491812.XA | CN116594809A (en) | 2023-04-28 | 2023-04-28 | Distributed coding backup recovery system

Publications (1)

Publication Number | Publication Date
CN116594809A (en) | 2023-08-15

Family

ID=87600009

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202310491812.XA | Pending | CN116594809A (en) | 2023-04-28 | 2023-04-28 | Distributed coding backup recovery system

Country Status (1)

Country Link
CN (1) CN116594809A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination