
CN120186171A - An object storage backup system - Google Patents

An object storage backup system

Info

Publication number
CN120186171A
Authority
CN
China
Prior art keywords
node
target
information
key value
object storage
Prior art date
Legal status
Pending
Application number
CN202510261575.7A
Other languages
Chinese (zh)
Inventor
杨杰
陈勇铨
胡军擎
周华
Current Assignee
Shanghai Information2 Software Inc
Original Assignee
Shanghai Information2 Software Inc
Priority date
Filing date
Publication date
Application filed by Shanghai Information2 Software Inc
Priority to CN202510261575.7A
Publication of CN120186171A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses an object storage backup system. Each service node stores load information representing its operating condition in a distributed key value storage module and sends survival signals to that module through a survival signal sending module. When the distributed key value storage module determines, based on all the received survival signals, that an abnormal node exists, it sends node abnormality information to each healthy node. In response, each healthy node reads the load information of all healthy nodes and, if the target identifier of the least-loaded target healthy node matches its own node identifier, continues executing the first object storage backup task left unfinished by the abnormal node. The technical scheme of the embodiment of the invention can improve the high availability of disaster backup services for object storage.

Description

Object storage backup system
Technical Field
The embodiment of the invention relates to the field of data storage, in particular to an object storage backup system.
Background
With the development of cloud computing, big data, and the like, ever more data needs to be stored. Traditional file systems struggle to carry such mass data, whereas object storage copes with the demand easily.
Object storage is therefore increasingly common, and with the rise of private, public, and hybrid clouds its varieties have multiplied and its usage scenarios have broadened. Accordingly, disaster recovery business from one object store to another keeps growing.
However, if the service node performing the disaster recovery service becomes abnormal during disaster recovery, the service is interrupted, with a series of serious consequences. How to guarantee high availability of disaster backup services for object storage therefore needs to be solved.
Disclosure of Invention
The embodiment of the invention provides an object storage backup system for improving the high availability of disaster backup service of object storage.
According to an aspect of the present invention, there is provided an object storage backup system, which may include:
the system comprises a distributed key value storage module, a plurality of service nodes and a survival signal sending module, wherein,
For each service node, the service node is configured to store load information representing its operating condition in the distributed key value storage module and to send survival signals to the distributed key value storage module through the survival signal sending module;
The distributed key value storage module is configured to send node abnormality information to each healthy node among the plurality of service nodes other than the abnormal node, when it determines based on all the received survival signals that an abnormal node exists among the plurality of service nodes;
And for each healthy node and its node identifier, the healthy node is configured to read the load information of all healthy nodes from the distributed key value storage module in response to the received node abnormality information, and to continue executing the first object storage backup task left unfinished by the abnormal node when it determines, based on all the read load information, that the target identifier of the least-loaded target healthy node is its own node identifier.
The technical scheme of the embodiment of the invention comprises a distributed key value storage module, a plurality of service nodes, and a survival signal sending module. Each service node stores load information representing its operating condition in the distributed key value storage module and sends survival signals to it through the survival signal sending module, so that the module can learn of node failures in time while holding each node's load information. When the distributed key value storage module determines from all received survival signals that an abnormal node exists, it sends node abnormality information to each healthy node among the plurality of service nodes other than the abnormal node, so that the healthy nodes learn of the abnormality in time. Each healthy node responds by reading the load information of all healthy nodes from the distributed key value storage module, and the healthy node whose node identifier matches the target identifier of the least-loaded target healthy node continues executing the first object storage backup task left unfinished by the abnormal node. Because the least-loaded healthy node takes over the unfinished first object storage backup task, the service node executing a backup task can be switched promptly when it becomes abnormal, which improves the high availability of disaster backup services for object storage.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention, nor is it intended to be used to limit the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of an object storage backup system provided in accordance with an embodiment of the present invention;
FIG. 2 is a block diagram of another object store backup system provided in accordance with an embodiment of the present invention;
FIG. 3 is a block diagram of yet another object store backup system provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a connection of a specific example in yet another object storage backup system provided in accordance with an embodiment of the present invention;
FIG. 5 is a binding flow diagram of a specific example in yet another object store backup system provided in accordance with an embodiment of the present invention;
FIG. 6 is a flowchart of the operation of rules for a specific example in yet another object store backup system provided in accordance with an embodiment of the present invention;
Fig. 7 is an abnormal switching diagram of a specific example in yet another object storage backup system according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. The cases of "target", "original", etc. are similar and will not be described in detail herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a block diagram of an object storage backup system according to an embodiment of the present invention. The embodiment is applicable to the case of backing up object stores.
Referring to fig. 1, the object storage backup system of the embodiment of the present invention includes a distributed key value storage module 11, a plurality of service nodes 12, and a survival signal sending module 13, wherein,
For each service node 12, the service node 12 is configured to store load information representing an operation condition of the service node 12 into the distributed key value storage module 11, and send a survival signal to the distributed key value storage module 11 based on the survival signal sending module 13;
A distributed key value storage module 11, configured to send node abnormality information to each healthy node among the plurality of service nodes 12 other than the abnormal node, when it determines based on all the received survival signals that an abnormal node exists among the plurality of service nodes 12;
For each healthy node and its node identifier, the healthy node is configured to read the load information of all healthy nodes from the distributed key value storage module 11 in response to the received node abnormality information, and to continue executing the first object storage backup task left unfinished by the abnormal node if it determines, based on all the read load information, that the target identifier of the least-loaded target healthy node is its own node identifier.
The distributed key value storage module 11 may be understood as a module that stores relevant information, such as load information, of the plurality of service nodes 12 and receives their survival signals. Optionally, the distributed key value storage module 11 may be implemented with ETCD, and so that ETCD and the service nodes 12 can exchange information directly, the two may be bound. Specifically, the ETCD information may be issued to all service nodes 12 together with a designated gateway IP; each service node 12 then generates a regdto.conf file locally that records the control machine associated with it and the ETCD information (cc_id, cc_ip, cluster_uuid, IP, PORT). Optionally, to simplify deployment management, all service nodes 12 may also include gateway functions. After receiving the control machine registration information (dto.synchronization_reg), the service node 12 registers the current node in ETCD according to the ETCD information and updates regdto.conf, completing the binding between the service node 12 and ETCD.
The service node 12 may be understood as a server that backs up an object store: it reads data from one object store and writes it to another to accomplish the backup. Optionally, the service node 12 may be a Data Transfer Object (DTO), through which the object store may be backed up.
The survival signal sending module 13 may be understood as a module that can send survival signals: the service node 12 periodically sends the survival signal through the survival signal sending module 13, and whether the service node 12 is abnormal can then be determined from that signal.
An abnormal node may be understood as a service node 12 that has an abnormal condition in the service nodes 12, and cannot continue the backup task. When the service node 12 fails or goes offline due to the network, the service node 12 cannot send the survival signal to the distributed key value storage module 11 through the survival signal sending module 13, and the service node 12 may be determined as an abnormal node.
A healthy node may be understood as a service node 12 of all service nodes 12 where no anomaly has occurred. Alternatively, the service node 12 capable of sending the survival signal may be determined as a healthy node. The target healthy node may be understood as the least loaded healthy node of all healthy nodes.
The first object storage backup task may be understood as an object storage backup task that an abnormal node has not completed due to occurrence of an abnormality. Node anomaly information may be understood as information that characterizes the occurrence of an anomaly node in service node 12.
For each service node 12, the service node 12 stores load information representing its operating condition, such as the rule number, memory occupancy rate, and CPU occupancy rate, into the distributed key value storage module 11. Optionally, to keep the load information current, it may be stored at a preset time interval, or stored whenever the service node 12 detects that its load information has changed. Meanwhile, so that the service node 12 can be monitored in real time, it may periodically send a survival signal to the distributed key value storage module 11 through the survival signal sending module 13; when the service node 12 stops sending the survival signal, it has failed or gone offline and can no longer perform the first object storage backup task. The distributed key value storage module 11 receives all survival signals, determines any service node 12 from which no survival signal was received to be an abnormal node, and sends node abnormality information to each healthy node among the plurality of service nodes 12 other than the abnormal node. Each healthy node then reads the load information of all healthy nodes from the distributed key value storage module 11 according to the received node abnormality information and determines, from all that load information, the target identifier of the least-loaded target healthy node. If the target identifier is identical to the healthy node's own node identifier, for example if the node names match, that healthy node is the target healthy node and continues executing the first object storage backup task left unfinished by the abnormal node.
The technical scheme of the embodiment of the invention comprises a distributed key value storage module, a plurality of service nodes, and a survival signal sending module. Each service node stores load information representing its operating condition in the distributed key value storage module and sends survival signals to it through the survival signal sending module, so that the module can learn of node failures in time while holding each node's load information. When the distributed key value storage module determines from all received survival signals that an abnormal node exists, it sends node abnormality information to each healthy node among the plurality of service nodes other than the abnormal node, so that the healthy nodes learn of the abnormality in time. Each healthy node responds by reading the load information of all healthy nodes from the distributed key value storage module, and the healthy node whose node identifier matches the target identifier of the least-loaded target healthy node continues executing the first object storage backup task left unfinished by the abnormal node. Because the least-loaded healthy node takes over the unfinished first object storage backup task, the service node executing a backup task can be switched promptly when it becomes abnormal, which improves the high availability of disaster backup services for object storage.
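The lease-and-heartbeat pattern described above maps naturally onto an etcd-style key-value store. The following Go sketch shows one plausible shape of the node side: load information is written under a lease-bound key, and renewing the lease serves as the survival signal. The key layout (/cluster/nodes/<id>), the LoadInfo fields, and the 10-second TTL are illustrative assumptions, not details fixed by the embodiment.

```go
// Hypothetical sketch: a service node stores its load information under a
// lease-bound key and renews the lease as its survival signal. If the node
// crashes or loses the network, the lease expires and etcd deletes the key,
// which is how an abnormal node is detected.
package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// LoadInfo represents the operating condition of a service node; the fields
// (rule number, CPU, memory) follow the examples given in the text.
type LoadInfo struct {
	RuleCount int     `json:"rule_count"`
	CPU       float64 `json:"cpu"`
	Memory    float64 `json:"memory"`
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"etcd-1:2379", "etcd-2:2379", "etcd-3:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// Survival signal: a 10-second lease that the node must keep renewing.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		log.Fatal(err)
	}

	// Store the load information under the lease so it disappears with it.
	load, _ := json.Marshal(LoadInfo{RuleCount: 3, CPU: 0.42, Memory: 0.31})
	if _, err := cli.Put(ctx, "/cluster/nodes/dto-1", string(load),
		clientv3.WithLease(lease.ID)); err != nil {
		log.Fatal(err)
	}

	// The heartbeat keep-alive thread: renew the lease until ctx ends.
	acks, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range acks {
		// Each ack confirms a renewal; the loop exits when the channel
		// closes, i.e. when the survival signal can no longer be sent.
	}
}
```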
Optionally, before executing the first object storage backup task, the abnormal node stores first task rule information of the first object storage backup task into the distributed key value storage module; if the target identifier is the node identifier, the healthy node reads the first task rule information from the distributed key value storage module and, based on it, continues executing the first object storage backup task left unfinished by the abnormal node.
The first task rule information may be understood as rule information obtained by the abnormal node when the abnormal node starts to execute the first object storage backup task, for example, a task type, a backup storage location, backup content, and the like. Optionally, the first task rule information may be issued to the abnormal node by the control machine.
After determining the healthy node which continues to execute the backup task, the healthy node can read the first task rule information from the distributed key value storage module and continue to execute the first object storage backup task which is not completed by the abnormal node based on the first task rule information.
By the technical scheme, the healthy node can acquire the correct first task rule information, and smooth execution of the first object storage backup task is ensured.
On this basis, the system further comprises a shared database. The abnormal node is further configured to send flow information generated while executing the first object storage backup task to the shared database for storage. When the target identifier is the node identifier, the healthy node reads the first task rule information from the distributed key value storage module, reads the flow information from the shared database based on the first task rule information, and continues executing the unfinished first object storage backup task based on both.
The abnormal node may send flow information generated while executing the first object storage backup task, such as the backup progress and execution reports, to the shared database for storage. Even if the first object storage backup task is interrupted by an abnormal situation, the flow information can then be obtained from the shared database to learn the abnormal node's backup progress. After the target healthy node is determined, it reads the first task rule information from the distributed key value storage module, uses it to read the flow information from the shared database, and with both can continue executing the first object storage backup task from the point where it was interrupted.
This scheme prevents the flow information from being lost and, when the first object storage backup task is resumed, allows the backup to continue from the interruption point based on the flow information, which improves backup efficiency and reduces traffic cost.
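For concreteness, a minimal sketch of the resume path follows, assuming the rule information sits in the key-value store and the checkpoint in the shared database. The table task_progress, its columns, and the BackupRule fields are hypothetical; the embodiment only says that flow information such as backup progress is kept in a shared database.

```go
// Hypothetical resume path: rule info from etcd, checkpoint from the shared
// database, then continue the backup from the recorded position.
package backup

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// BackupRule stands in for the first task rule information; fields assumed.
type BackupRule struct {
	SourceBucket string `json:"source_bucket"`
	TargetBucket string `json:"target_bucket"`
}

// copyObjectsFrom would resume copying objects lexically after startKey;
// its body depends on the object-store SDK and is elided here.
func copyObjectsFrom(ctx context.Context, rule BackupRule, startKey string) error {
	return nil
}

func resumeTask(ctx context.Context, etcd *clientv3.Client, db *sql.DB, taskID string) error {
	// 1. Read the first task rule information stored before the task started.
	resp, err := etcd.Get(ctx, "/cluster/rules/"+taskID)
	if err != nil {
		return err
	}
	if len(resp.Kvs) == 0 {
		return fmt.Errorf("no rule information for task %s", taskID)
	}
	var rule BackupRule
	if err := json.Unmarshal(resp.Kvs[0].Value, &rule); err != nil {
		return err
	}

	// 2. Read the flow information (interruption point) from the shared DB.
	var lastKey string
	err = db.QueryRowContext(ctx,
		"SELECT last_object_key FROM task_progress WHERE task_id = ?",
		taskID).Scan(&lastKey)
	if err != nil && err != sql.ErrNoRows {
		return err
	}

	// 3. Continue from the interruption instead of restarting from scratch.
	return copyObjectsFrom(ctx, rule, lastKey)
}
```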
Alternatively, if the target identifier is the node identifier, the healthy node first acquires a distributed lock from the distributed key value storage module; once the lock is acquired, it reads the first task rule information from the distributed key value storage module and, based on it, continues executing the first object storage backup task left unfinished by the abnormal node; after completing that task, it releases the distributed lock.
The distributed lock ensures that only one of the service nodes accesses the distributed key value storage module at a time, avoiding data inconsistency. When the target identifier is the node identifier, the healthy node first acquires the distributed lock from the distributed key value storage module; having acquired it, the node reads the first task rule information and, based on it, continues executing the first object storage backup task the abnormal node did not finish. Once the healthy node has completed that task, it must release the distributed lock so that subsequent service nodes can acquire it.
According to the technical scheme, the smooth acquisition of the first task rule information can be ensured through the distributed lock.
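A minimal sketch of that guard, built on the lock recipe in etcd's concurrency package, might look as follows; the lock key path and the resume callback are assumptions.

```go
// Hypothetical takeover guard built on etcd's distributed lock. Only the
// healthy node that wins the lock reads the rule information and resumes
// the failed node's task; the lock is released once it is done.
package backup

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func takeOverWithLock(ctx context.Context, cli *clientv3.Client, failedNodeID string,
	resume func(ctx context.Context, ruleJSON []byte) error) error {
	// A session ties the lock to a lease, so a crashed holder frees it.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		return err
	}
	defer session.Close()

	mu := concurrency.NewMutex(session, "/locks/takeover/"+failedNodeID)
	if err := mu.Lock(ctx); err != nil {
		return err
	}
	defer mu.Unlock(ctx) // release the distributed lock when finished

	// Read the first task rule information left by the abnormal node.
	resp, err := cli.Get(ctx, "/cluster/rules/"+failedNodeID, clientv3.WithPrefix())
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		if err := resume(ctx, kv.Value); err != nil {
			return err
		}
	}
	return nil
}
```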
The distributed key value storage module comprises a Raft unit, a leader unit, and a plurality of follower units. The service node stores load information representing its operating condition in the leader unit and sends survival signals to the leader unit through the survival signal sending module; the Raft unit copies all load information stored in the leader unit to each follower unit, and, when it detects that the leader unit has failed, elects a target follower unit from the follower units to serve as the new leader unit; the healthy node responds to received node abnormality information by reading the load information of all healthy nodes from the new leader unit and, when the target identifier of the least-loaded target healthy node is determined to be its own node identifier, continues executing the first object storage backup task left unfinished by the abnormal node.
To ensure high availability of the distributed key value storage module, it may comprise a Raft unit, a leader unit, and a plurality of follower units, with the leader and follower units managed by the Raft unit. While the leader unit operates normally, each service node stores its load information in the leader unit and sends survival signals to it through the survival signal sending module, and the Raft unit copies all load information from the leader unit to each follower unit so that the leader and the followers hold the same data. The Raft unit also monitors the leader unit: optionally, the leader unit periodically sends a heartbeat signal to the Raft unit, and when the heartbeat reveals that the leader unit has failed, the Raft unit elects a target follower unit from the follower units as the new leader unit. Finally, each healthy node can respond to received node abnormality information by reading the load information of all healthy nodes from the new leader unit and, if the target identifier of the least-loaded target healthy node is its own node identifier, continue executing the first object storage backup task left unfinished by the abnormal node.
By providing the follower units, this scheme achieves high availability of the distributed key value storage module itself.
FIG. 2 is a block diagram of another object storage backup system according to an embodiment of the present invention. This embodiment is optimized on the basis of the above technical solutions. In this embodiment, the system further comprises a control machine and a gateway module. The control machine issues control instructions for the plurality of service nodes. When the received control instruction is an object storage backup instruction, the gateway module obtains all load information from the distributed key value storage module, determines the minimum-load node among all service nodes based on that information, takes the minimum-load node as the first target node for backing up the target object storage, and sends the backup instruction to it. The first target node backs up the target object storage in response to the received backup instruction. Explanations of terms identical or corresponding to those in the above embodiments are not repeated here.
Specifically, referring to fig. 2, the object storage backup system of the present embodiment includes a distributed key value storage module 21, a plurality of service nodes 22, a survival signal sending module 23, a controller 24, and a gateway module 25, wherein,
For each service node 22, the service node 22 is configured to store load information representing an operation condition of the service node 22 into the distributed key value storage module 21, and send a survival signal to the distributed key value storage module 21 based on the survival signal sending module 23;
a distributed key value storage module 21, configured to send node abnormality information to each healthy node among the plurality of service nodes 22 other than the abnormal node, when it determines based on all the received survival signals that an abnormal node exists among the plurality of service nodes 22;
For each healthy node and its node identifier, the healthy node is configured to read the load information of all healthy nodes from the distributed key value storage module 21 in response to the received node abnormality information, and to continue executing the first object storage backup task left unfinished by the abnormal node if it determines, based on all the read load information, that the target identifier of the least-loaded target healthy node is its own node identifier;
a controller 24 for issuing control instructions for the plurality of service nodes 22;
The gateway module 25 is configured to obtain all load information from the distributed key value storage module 21 when the received control instruction is an object storage backup instruction, determine a minimum load node with a minimum load from all service nodes 22 based on all load information, take the minimum load node as a first target node for backing up a target object storage, and send the backup instruction to the first target node;
And the first target node is used for responding to the received backup instruction and backing up the target object storage.
The control machine 24 may be understood as a terminal that controls a backup task of the object storage, and may issue a control instruction to the service node 22 to control the service node 22 to backup the object storage.
The gateway module 25 may be understood as a module performing functions such as routing and data transmission.
The first target node may be understood as the least loaded service node 22 that gateway module 25 screens from all service nodes 22.
The control machine 24 issues a control instruction to the gateway module 25. When the control instruction is an object storage backup instruction, the gateway module 25 obtains all load information from the distributed key value storage module 21, screens the service nodes 22 by that information, determines the first target node with the minimum load among all service nodes 22, and sends the backup instruction to it; the first target node backs up the target object storage in response. Optionally, after receiving the backup instruction, the first target node may store the backup rule information from the instruction as JSON character strings in the distributed key value storage module 21 and at the same time create a key there holding its own load information, so that when multiple service nodes 22 all perform backup tasks the nodes are used in a balanced manner and their utilization improves.
By the technical scheme, the resource utilization rate and the backup efficiency of the service node can be improved.
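The selection step amounts to a prefix scan over the stored load records followed by a minimum search, roughly as sketched below. The combined load score is a placeholder formula, since the embodiment does not prescribe how rule number, CPU, and memory are weighted.

```go
// Hypothetical least-load selection: list every node's load record under a
// key prefix and return the node whose combined score is smallest.
package gateway

import (
	"context"
	"encoding/json"
	"fmt"
	"math"
	"strings"

	clientv3 "go.etcd.io/etcd/client/v3"
)

type LoadInfo struct {
	RuleCount int     `json:"rule_count"`
	CPU       float64 `json:"cpu"`
	Memory    float64 `json:"memory"`
}

func pickLeastLoaded(ctx context.Context, cli *clientv3.Client) (string, error) {
	resp, err := cli.Get(ctx, "/cluster/nodes/", clientv3.WithPrefix())
	if err != nil {
		return "", err
	}
	bestID, bestScore := "", math.MaxFloat64
	for _, kv := range resp.Kvs {
		var li LoadInfo
		if err := json.Unmarshal(kv.Value, &li); err != nil {
			continue // ignore malformed records
		}
		// Placeholder weighting; the actual metric is not specified.
		score := li.CPU + li.Memory + 0.1*float64(li.RuleCount)
		if score < bestScore {
			bestID = strings.TrimPrefix(string(kv.Key), "/cluster/nodes/")
			bestScore = score
		}
	}
	if bestID == "" {
		return "", fmt.Errorf("no registered service nodes")
	}
	return bestID, nil
}
```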
Optionally, the system further comprises a load balancer, and the gateway module comprises a plurality of gateway units. The load balancer distributes the control instruction to the target gateway unit with the smallest running load among the gateway units. When the received control instruction is an object storage backup instruction, the target gateway unit obtains all load information from the distributed key value storage module, determines the minimum-load node among all service nodes based on that information, takes the minimum-load node as the first target node for backing up the target object storage, and sends the backup instruction to it.
The gateway module may include a plurality of gateway units, and optionally, the plurality of gateway units may share a virtual IP address, so that even if a faulty gateway unit exists, a healthy gateway unit may still be used for service, and the IP address of the gateway will not change.
After the control machine issues the control instruction, in order to balance the running load across the gateway units in the gateway module, an external load balancer may determine the target gateway unit with the smallest running load among them. The control instruction is then received by that target gateway unit, which, when the instruction is an object storage backup instruction, obtains all load information from the distributed key value storage module, determines the minimum-load node among all service nodes, takes it as the first target node for backing up the target object storage, and sends the backup instruction to it.
According to the technical scheme, the high availability of the gateway module can be realized by arranging the gateway units and the load equalizer.
FIG. 3 is a block diagram of yet another object storage backup system according to an embodiment of the present invention. This embodiment is optimized on the basis of the above technical solutions. In this embodiment, the system further comprises a control machine and a gateway module, and each service node additionally registers its node basic information with the distributed key value storage module. The control machine issues control instructions for the plurality of service nodes. When the received control instruction is a node control instruction, the gateway module parses it, obtains the node identifier of the second target node that the control machine needs to control among the plurality of service nodes, queries the distributed key value storage module for the node basic information corresponding to that identifier, and sends the node control instruction to the second target node based on the queried information. The second target node performs the corresponding operation in response to the received node control instruction. Explanations of terms identical or corresponding to those in the above embodiments are not repeated here.
Specifically, referring to fig. 3, the object storage backup system of the present embodiment includes a distributed key value storage module 31, a plurality of service nodes 32, a survival signal sending module 33, a controller 34, and a gateway module 35, wherein,
For each service node 32, the service node 32 is configured to store load information representing an operation condition of the service node 32 into the distributed key value storage module 31, and send a survival signal to the distributed key value storage module 31 based on the survival signal sending module 33;
a distributed key value storage module 31, configured to send node abnormality information to each healthy node among the plurality of service nodes 32 other than the abnormal node, when it determines based on all the received survival signals that an abnormal node exists among the plurality of service nodes 32;
for each healthy node and its node identifier, the healthy node is configured to read the load information of all healthy nodes from the distributed key value storage module 31 in response to the received node abnormality information, and to continue executing the first object storage backup task left unfinished by the abnormal node if it determines, based on all the read load information, that the target identifier of the least-loaded target healthy node is its own node identifier;
For each service node 32 and node basic information of the service node 32, the service node 32 is further configured to register the node basic information to the distributed key value storage module 31;
a controller 34 for issuing control instructions for the plurality of service nodes 32;
The gateway module 35 is configured to parse the node control instruction if the received control instruction is a node control instruction, obtain, from the plurality of service nodes 32, a node identifier of a second target node that the controller 34 needs to control, query, from the distributed key value storage module 31, node basic information corresponding to the node identifier, and send the node control instruction to the second target node based on the queried node basic information;
And the second target node is used for responding to the received node control instruction and executing corresponding operation.
The node basic information may be understood as the configuration information of the service node 32. For the service node 32 to store data such as load information in the distributed key value storage module 31, it must first register its node basic information there. Optionally, after receiving the registration information issued by the controller 34 (e.g., dto.synchronization_reg), the service node 32 registers its node basic information in the distributed key value storage module 31 according to the module's information, updates regdto.conf, and may subscribe to events under a specified directory (e.g., cluster_uuid).
The second target node may be understood as a service node 32 of the plurality of service nodes 32 that the controller 34 is to control. The node control instruction issued by the controller 34 includes a node identifier of the second target node, and the gateway module 35 may determine the second target node through the node identifier and route the node control instruction to the second target node.
The controller 34 issues a node control instruction for the plurality of service nodes 32. The gateway module 35 parses the received instruction, obtains the node identifier of the second target node among the plurality of service nodes 32, queries the distributed key value storage module 31 for the node basic information corresponding to that identifier, and sends the node control instruction to the second target node based on the queried information; the second target node performs the corresponding operation in response.
According to the technical scheme, independent control on a certain service node can be realized through the control machine, and flexibility of object storage and backup is improved.
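In code, the routing step reduces to a point lookup of the registered node basic information by node identifier, followed by forwarding the instruction. The /cluster/nodes/meta/ path, the NodeInfo shape, and the HTTP transport are assumptions consistent with the earlier sketches.

```go
// Hypothetical control-instruction routing on the gateway: resolve the
// second target node's basic information by node ID, then forward the
// instruction to it over HTTP.
package gateway

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// NodeInfo stands in for the registered node basic information.
type NodeInfo struct {
	IP   string `json:"ip"`
	Port int    `json:"port"`
}

func routeNodeControl(ctx context.Context, cli *clientv3.Client,
	nodeID string, instruction []byte) error {
	resp, err := cli.Get(ctx, "/cluster/nodes/meta/"+nodeID)
	if err != nil {
		return err
	}
	if len(resp.Kvs) == 0 {
		return fmt.Errorf("node %s is not registered", nodeID)
	}
	var info NodeInfo
	if err := json.Unmarshal(resp.Kvs[0].Value, &info); err != nil {
		return err
	}
	url := fmt.Sprintf("http://%s:%d/control", info.IP, info.Port)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url,
		bytes.NewReader(instruction))
	if err != nil {
		return err
	}
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("node %s returned %s", nodeID, res.Status)
	}
	return nil
}
```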
Optionally, the node control instruction comprises a delete instruction. The second target node is further configured to store second task rule information of a second object storage backup task into the distributed key value storage module before executing that task; in response to the received delete instruction, the second target node stops executing the second object storage backup task and deletes its node basic information and the second task rule information from the distributed key value storage module.
The second object storage backup task may be understood as an object storage backup task executed by the second target node. Alternatively, the second object store backup task may be the same as or different from the first object store backup task. The second task rule information may be understood as rule information for the second target node to perform the second object storage backup task.
After receiving the delete instruction, the second target node stops executing the second object storage backup task and deletes its node basic information and second task rule information from the distributed key value storage module.
By the technical scheme, the deletion of the service node can be realized, and redundant or abnormal service nodes can be timely processed.
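A short sketch of the delete path on the second target node follows; the cancel function standing in for "stop the task" and the key paths are assumptions.

```go
// Hypothetical delete handling: cancel the running second object storage
// backup task, then remove the node's registration and rule keys.
package node

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func handleDeleteInstruction(ctx context.Context, cli *clientv3.Client,
	nodeID string, stopTask context.CancelFunc) error {
	stopTask() // stop executing the second object storage backup task

	// Remove the node basic information...
	if _, err := cli.Delete(ctx, "/cluster/nodes/meta/"+nodeID); err != nil {
		return err
	}
	// ...and the second task rule information stored under the node.
	_, err := cli.Delete(ctx, "/cluster/rules/"+nodeID, clientv3.WithPrefix())
	return err
}
```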
In another alternative solution, the node control instruction includes a pause instruction or a start instruction, and the second target node is specifically configured to pause execution of the second object storage backup task in response to the pause instruction, or start execution of the second object storage backup task in response to the start instruction.
The node control instruction may further include a pause instruction or a start instruction, where the second target node may pause execution of the second object storage backup task after receiving the pause instruction, and may start execution of the second object storage backup task after receiving the start instruction.
According to the technical scheme, the service node can be stopped or started in the backup process of the service node, and the backup flexibility is further improved.
To aid overall understanding of the above technical solutions, a specific example is described below; its connection diagram is shown in fig. 4. In this example the distributed key value storage module is an ETCD cluster, the service node is a Data Transfer Object (DTO), and the object storage is a Cloud Object Storage (COS). The example proceeds in the following steps:
step one, the controller (i.e., CTRLCENTER) creates an ETCD cluster.
In order to facilitate maintenance management and monitoring and ensure high availability of the ETCD cluster itself, three or more ETCD nodes are typically added.
Step two, ETCD binds DTO
As shown in fig. 5, the configuration information of the ETCD and of the shared database (TDSQL) is issued to all DTOs; each DTO locally generates a regdto.conf file recording the configuration information of the control machine and the ETCD (for example, cc_id, cc_ip, cluster_uuid, IP, PORT). To simplify deployment management, all DTOs also contain gateway functions. After receiving the ETCD configuration information, a DTO registers itself in the ETCD, updates regdto.conf, starts a heartbeat keep-alive thread through the survival signal sending module (i.e., sends survival signals), and initiates a lease that it renews periodically with the ETCD. In addition, the DTO collects its resource usage information (i.e., load information) such as CPU, memory, and rule number, and uploads it to the ETCD.
Step three, the control machine creates backup rules
The gateway IP is selected as the target service pool, a backup policy (i.e., a control instruction) is configured through the control machine, and the control instruction is then issued to the gateway; the gateway registers itself as a service node with the ETCD cluster and subscribes to the ETCD service. The ETCD stores the resource usage information uploaded periodically by the DTOs. As shown in fig. 6, when the gateway determines that the backup policy is newly created (i.e., an object storage backup instruction), it obtains each DTO's resource usage information from the ETCD, takes the least-loaded DTO as the first target node based on that information, and routes the control instruction to it; the first target node executes the object storage backup task.
Step four, rule operation
As shown in fig. 6, in the case that the gateway determines that the backup policy is a rule operation (i.e., a node control instruction), the gateway searches node basic information of the DTO corresponding to the backup policy, and forwards a request (i.e., the node control instruction) to the corresponding DTO based on the node basic information. The rule operations include a pause instruction, a start instruction, and a delete instruction.
Step five, DTO off-line (abnormal)
As shown in fig. 7, after a DTO's network or service becomes abnormal, the keep-alive mechanism lets the ETCD update that DTO's state: once the lease validity period expires, the DTO that failed to renew is automatically deleted, the K-V information of the rules belonging to the offline node (i.e., the abnormal node) is updated, and the other healthy nodes are notified. On receiving the node abnormality information, each healthy node queries the ETCD for the node ID of the currently least-loaded node (i.e., DTO); if that ID is its own, it acquires the ETCD distributed lock, traverses and queries the rule information corresponding to the offline node (i.e., the first task rule information), obtains that rule information, releases the lock (i.e., the distributed lock), and continues executing the first object storage backup task left unfinished by the abnormal node.
The specific example improves the high availability of disaster backup services for object storage.
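Tying step five together, each healthy DTO could run a watch loop like the one below: a DELETE event under the node prefix signals a lease expiry, the survivor checks whether it is currently the least-loaded node, and only then takes the distributed lock and resumes. It reuses the least-load and takeover helpers from the earlier sketches and, like them, is an assumption-laden sketch rather than the patented implementation.

```go
// Hypothetical failover loop on each healthy node: react to lease-expiry
// deletions, elect-by-load, and take over under the distributed lock.
package node

import (
	"context"
	"log"
	"strings"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchForFailures(ctx context.Context, cli *clientv3.Client, selfID string,
	pickLeastLoaded func(context.Context, *clientv3.Client) (string, error),
	takeOver func(context.Context, *clientv3.Client, string) error) {
	wch := cli.Watch(ctx, "/cluster/nodes/", clientv3.WithPrefix())
	for wresp := range wch {
		for _, ev := range wresp.Events {
			if ev.Type != clientv3.EventTypeDelete {
				continue // only expired/removed node keys signal an abnormal node
			}
			failedID := strings.TrimPrefix(string(ev.Kv.Key), "/cluster/nodes/")

			// Am I the least-loaded healthy node right now?
			target, err := pickLeastLoaded(ctx, cli)
			if err != nil || target != selfID {
				continue // a less-loaded healthy node will take over instead
			}
			if err := takeOver(ctx, cli, failedID); err != nil {
				log.Printf("takeover of %s failed: %v", failedID, err)
			}
		}
	}
}
```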
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. The object storage backup system is characterized by comprising a distributed key value storage module, a plurality of service nodes and a survival signal sending module, wherein,
For each service node, the service node is configured to store load information representing an operation condition of the service node into the distributed key value storage module, and send a survival signal to the distributed key value storage module based on the survival signal sending module;
The distributed key value storage module is used for respectively sending node abnormality information to each healthy node except for the abnormal node in the plurality of service nodes under the condition that the abnormal node exists in the plurality of service nodes based on all the received survival signals;
And for each healthy node and node identification of the healthy node, the healthy node is used for responding to the received node abnormality information, reading load information of all the healthy nodes from the distributed key value storage module, and continuously executing a first object storage backup task which is not completed by the abnormal node under the condition that the target identification of the target healthy node with the smallest load in all the healthy nodes is determined to be the node identification based on the read load information.
2. The system of claim 1, wherein,
The abnormal node is configured to store first task rule information of the first object storage backup task into the distributed key value storage module before the first object storage backup task is executed;
And under the condition that the target identifier is the node identifier, the healthy node is specifically configured to read the first task rule information from the distributed key value storage module, and continuously execute the first object storage backup task which is not completed by the abnormal node based on the first task rule information.
3. The system of claim 2, further comprising a shared database,
The abnormal node is further configured to send flow information generated during the process of executing the first object storage backup task to the shared database for storage;
And under the condition that the target identifier is the node identifier, the healthy node is specifically configured to read the first task rule information from the distributed key value storage module, read the traffic information from the shared database based on the first task rule information, and continuously execute the first object storage backup task which is not completed by the abnormal node based on the first task rule information and the traffic information.
4. The system of claim 2, wherein,
The health node is specifically configured to acquire a distributed lock from the distributed key value storage module when the target identifier is the node identifier, and read the first task rule information from the distributed key value storage module when the distributed lock is acquired, and continuously execute a first object storage backup task that is not completed by the abnormal node based on the first task rule information;
The health node is further configured to release the distributed lock when the first object storage backup task is completed.
5. The system of claim 1, further comprising a control machine and a gateway module, wherein,
The control machine is used for issuing control instructions for a plurality of service nodes;
The gateway module is configured to obtain all the load information from the distributed key value storage module when the received control instruction is an object storage backup instruction, determine a minimum load node with a minimum load from all the service nodes based on all the load information, take the minimum load node as a first target node for backing up a target object storage, and send the backup instruction to the first target node;
And the first target node is used for responding to the received backup instruction and backing up the target object storage.
6. The system of claim 5, further comprising a load balancer, wherein the gateway module comprises a plurality of gateway units,
The control machine is used for issuing control instructions aiming at a plurality of service nodes to the load balancer;
The load balancer is used for distributing the control instruction to a target gateway unit with the minimum running load in the gateway units;
the target gateway unit is configured to obtain all the load information from the distributed key value storage module when the received control instruction is an object storage backup instruction, determine a minimum load node with a minimum load from all the service nodes based on all the load information, use the minimum load node as a first target node for backing up the target object storage, and send the backup instruction to the first target node.
7. The system of claim 1, further comprising a control machine and a gateway module, wherein,
For each service node and node basic information of the service node, the service node is further configured to register the node basic information to the distributed key value storage module;
The control machine is used for issuing control instructions for a plurality of service nodes;
The gateway module is used for analyzing the node control instruction under the condition that the received control instruction is the node control instruction, obtaining node identifiers of second target nodes which are required to be controlled by the controller from a plurality of service nodes, inquiring node basic information corresponding to the node identifiers from the distributed key value storage module, and sending the node control instruction to the second target nodes based on the inquired node basic information;
And the second target node is used for responding to the received node control instruction and executing corresponding operation.
8. The system of claim 7, wherein the node control instructions comprise delete instructions;
The second target node is further configured to store second task rule information of a second object storage backup task into the distributed key value storage module before executing the second object storage backup task;
and the second target node is specifically configured to stop executing the second object storage backup task in response to the received deletion instruction, and delete the node basic information and the second task rule information of the second target node in the distributed key value storage module.
9. The system according to claim 7, wherein the node control instructions comprise a suspend instruction or a start instruction, and the second target node is specifically configured to:
Suspending execution of the second object storage backup task in response to the suspend instruction, or
And responding to the starting instruction, and starting to execute the second object storage backup task.
10. The system of claim 1, wherein the distributed key value storage module comprises a Raft unit, a leader unit, and a plurality of follower units, wherein,
The service node is specifically configured to store load information representing an operation condition of the service node to the leader unit, and send a survival signal to the leader unit based on the survival signal sending module;
The Raft unit is configured to copy all the load information stored in the leader unit to each of the follower units respectively;
The Raft unit is further configured to, when a failure of the leader unit is detected, elect a target follower unit from a plurality of follower units, and use the target follower unit as a new leader unit;
The healthy node is specifically configured to read load information of all the healthy nodes from the new leader unit in response to the received node abnormality information, and to continue executing the first object storage backup task left unfinished by the abnormal node when it determines, based on all the read load information, that the target identifier of the least-loaded target healthy node is the node identifier.
CN202510261575.7A 2025-03-06 2025-03-06 An object storage backup system Pending CN120186171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510261575.7A CN120186171A (en) 2025-03-06 2025-03-06 An object storage backup system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510261575.7A CN120186171A (en) 2025-03-06 2025-03-06 An object storage backup system

Publications (1)

Publication Number Publication Date
CN120186171A (en) 2025-06-20

Family

ID=96032877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510261575.7A Pending CN120186171A (en) 2025-03-06 2025-03-06 An object storage backup system

Country Status (1)

Country Link
CN (1) CN120186171A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination