CN111880969A - Storage node recovery method, device, equipment and storage medium - Google Patents

Info

Publication number
CN111880969A
Authority
CN
China
Prior art keywords
storage node
log
playback
sequence number
playback log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010753074.8A
Other languages
Chinese (zh)
Other versions
CN111880969B (en)
Inventor
王家贤
郭琰
韩朱忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN202010753074.8A
Publication of CN111880969A
Application granted
Publication of CN111880969B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a storage node recovery method, device, equipment and storage medium. The method comprises: caching at least one redo log of a storage node as a playback log; when the storage node is determined to have failed, determining at least one archiving task according to the playback log; and, if communication with the storage node is restored, sending each archiving task to the storage node so that the storage node executes the operation corresponding to the archiving task and thereby recovers its data. In embodiments of the invention, the archiving task is determined from the cached playback log when the storage node fails, which reduces the time spent searching for redo logs during data recovery, improves the recovery efficiency of the storage node, and can improve the robustness of a distributed database built on the storage nodes.

Description

Storage node recovery method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of database storage, in particular to a storage node recovery method, a storage node recovery device, storage node recovery equipment and a storage medium.
Background
In a distributed database system based on redo logs, each storage node replays the redo logs generated by the system in order to achieve distributed storage. When a storage node fails while the system is running, one way to bring it back into service is to migrate the data again from a healthy storage node to the failed storage node, but this usually takes a long time. Another way is to recover the storage node from the service log, but this requires parsing the service log, identifying the data that the failed storage node is missing, and repackaging that data into playback logs the storage node can recognize, which also takes a lot of time. There is therefore a need in the art for a method of quickly recovering a failed storage node in a distributed database system.
Disclosure of Invention
The invention provides a storage node recovery method, a storage node recovery device, storage node recovery equipment and a storage medium, which are used for realizing the rapid recovery of a failed storage node and improving the robustness of a distributed database system.
In a first aspect, an embodiment of the present invention provides a storage node recovery method, where the method includes:
caching at least one redo log of the storage node as a playback log;
when the storage node is determined to have failed, determining at least one archiving task according to the playback log;
and if communication with the storage node is restored, sending each archiving task to the storage node so that the storage node executes the operation corresponding to the archiving task to realize data recovery.
In a second aspect, an embodiment of the present invention provides a storage node recovery apparatus, where the apparatus includes:
the log caching module is used for caching at least one redo log of the storage node as a playback log;
the archiving log module is used for determining at least one archiving task according to the playback log when the storage node is determined to have failed;
and the recovery execution module is used for sending each archiving task to the storage node if communication with the storage node is restored, so that the storage node executes the operation corresponding to the archiving task to realize data recovery.
In a third aspect, an embodiment of the present invention provides an apparatus, where the apparatus includes:
one or more processors;
a memory for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the storage node recovery method according to any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a storage node recovery method according to any one of the embodiments of the present invention.
According to the embodiment of the invention, the redo log corresponding to the storage node is cached as a playback log, the archiving task is determined according to the playback log when the storage node fails, and, once communication with the storage node is determined to be restored, the archiving task is sent to the storage node so that the storage node executes the corresponding operation to realize data recovery. This achieves fault recovery of the storage node, reduces the data search time during recovery, and can improve the robustness of the distributed database.
Drawings
Fig. 1 is a flowchart of a storage node recovery method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a storage node recovery method according to a second embodiment of the present invention;
Fig. 3 is an exemplary diagram of a cached playback log according to the second embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a storage node recovery apparatus according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only a part of the structures related to the present invention, not all of the structures, are shown in the drawings, and furthermore, embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Embodiment One
Fig. 1 is a flowchart of a storage node recovery method according to a first embodiment of the present invention. This embodiment is applicable to recovering the data stored in a failed storage node of a distributed database. The method may be executed by a storage node recovery apparatus, which may be implemented in hardware and/or software and may generally be integrated in a log server. Referring to Fig. 1, the method includes the following steps:
Step 110, caching at least one redo log of the storage node as a playback log.
A storage node may be a node that stores data in a distributed database. The distributed database includes at least one storage node, and the data stored in the storage nodes may be the same or different. A redo log may consist of two or more pre-allocated files that record all changes made to the distributed database. A playback log may be a cache file on the log server that stores one or more redo logs. Each playback log may correspond to a storage node, and the redo logs it contains may include the change operations for that storage node.
In the embodiment of the present invention, the redo logs corresponding to each storage node may be received and packaged as playback logs. That is, when a plurality of redo logs is received, they may be packaged into one or more playback logs according to the storage node each redo log corresponds to, so that each playback log corresponds to one storage node. The generated playback logs may be stored in the cache of the log server for use when a storage node recovers from a fault.
Step 120, when the storage node is determined to have failed, determining at least one archiving task according to the playback log.
A fault is a state in which the storage node cannot operate normally, and may include a broken communication link to the storage node, downtime of the storage node, and the like. An archiving task may be an operation that the storage node needs to perform when restoring data, and may consist of redo logs.
Specifically, when a storage node fault is detected, the playback log corresponding to the storage node may be obtained from the cache, the redo logs may be extracted from the playback log, and the redo logs may be stored as an archiving task.
Step 130, if communication with the storage node is restored, sending each archiving task to the storage node so that the storage node executes the operation corresponding to the archiving task to realize data recovery.
In the embodiment of the invention, the storage node may be monitored continuously. When communication with the storage node is determined to be re-established, the archiving task corresponding to the storage node may be obtained and sent to the storage node, and the storage node may then operate according to the redo logs in the archiving task to realize data recovery.
According to the embodiment of the invention, the redo log of the storage node is cached as a playback log, the playback log is extracted to generate the archiving task when the storage node fails, and the archiving task is sent to the storage node when communication with the storage node is restored, so that the storage node executes the operation corresponding to the archiving task to recover its data. This realizes data recovery for the failed storage node, reduces the time the storage node spends searching for recovery data, improves the recovery efficiency of the storage node, and improves the robustness of the distributed database.
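Steps 110 to 130 can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class name `LogServerSketch`, its method names, and the `send` callback are hypothetical and do not come from the patent, and real playback logs would carry sequence numbers and binary payloads rather than plain strings.

```python
from collections import defaultdict


class LogServerSketch:
    """Toy log server: caches playback logs and replays them after a fault."""

    def __init__(self):
        self.cache = defaultdict(list)    # node id -> cached playback logs
        self.archive = defaultdict(list)  # node id -> archiving tasks
        self.faulty = set()               # node ids currently in the fault state

    def cache_redo_log(self, node_id, redo_log):
        # Step 110: keep the redo log as a playback log for this node.
        self.cache[node_id].append(redo_log)

    def on_fault(self, node_id):
        # Step 120: turn the cached playback logs into archiving tasks.
        self.faulty.add(node_id)
        self.archive[node_id].extend(self.cache.pop(node_id, []))

    def on_reconnect(self, node_id, send):
        # Step 130: hand every archiving task to the node, in cached order.
        for task in self.archive.pop(node_id, []):
            send(node_id, task)
        self.faulty.discard(node_id)
```

The `send` callback stands in for whatever transport the log server uses to reach the storage node, so the sketch stays independent of any networking detail.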
Further, on the basis of the foregoing embodiment of the present invention, before caching at least one redo log of a storage node as a playback log, the method further includes:
initializing global control information of the storage nodes; the global control information includes at least one of: playback log buffer space, playback log control items, the number of effective control items, the starting position of the buffer space, the ending position of the buffer space, the size of used buffer space, the size of free buffer space and the sequence number of the processed log.
The global control information may be information that controls how the playback log of a storage node is cached, including how redo logs are stored in the playback log. It may include attribute parameters of the cached playback log, such as the size of the cache space, the position of the cache space, and the number of stored playback logs.
In the embodiment of the present invention, when the log server is started, the global control information of each storage node may be configured in advance, so as to implement allocation of the memory space of each storage node, and when the memories corresponding to the storage nodes are different, the global control information corresponding to the storage nodes may also be different.
Specifically, the global control information may include at least one of the following: playback log cache space, playback log control items, the number of valid control items, the start position of the cache space, the end position of the cache space, the size of the used cache space, the size of the free cache space, and the processed log sequence number.
The playback log cache space may represent the memory available to the playback log. Its parameters may be set through a configuration file, which may be set by the user. The playback log cache space may be a space ring: the memory it corresponds to is logically connected end to end, and expired space is continuously overwritten as playback logs are cached, so that the memory can be reused.
The playback log control item may record the maximum number of redo logs that can be stored. Its value may be constant, and may be set in the log server system or through a configuration file.
The number of valid control items may record the number of redo logs currently cached in the playback log cache space.
The start position of the valid control items may record the start position of the first cached redo log in the playback log cache space.
The start position of the cache space may represent where the playback log cache space begins in memory, recorded as a start offset.
The end position of the cache space may represent where the playback log cache space ends in memory, recorded as an end offset.
The used cache space size may be the amount of memory already occupied in the playback log cache space.
The free cache space size may be the amount of memory not yet occupied in the playback log cache space.
The processed log sequence number may represent a sequence number of a redo log executed by the storage node, and the storage node may feed back the sequence number of the redo log to the log server every time the storage node executes one redo log.
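The fields of the global control information can be pictured as one record per storage node. The sketch below is an assumption-laden illustration: the class name `GlobalControlInfo` and every field name are hypothetical, and the patent does not prescribe types or units (bytes and integer sequence numbers are assumed here).

```python
from dataclasses import dataclass, field


@dataclass
class GlobalControlInfo:
    """Per-node bookkeeping for the playback log cache (illustrative only)."""

    buffer_size: int            # playback log cache space, in bytes (assumed unit)
    max_control_items: int      # capacity of the playback log control items
    valid_item_count: int = 0   # number of valid control items
    valid_item_start: int = 0   # start position of the valid control items
    buffer_start: int = 0       # start offset of the cache space
    buffer_end: int = 0         # end offset of the cache space
    used_size: int = 0          # used cache space size
    free_size: int = field(init=False, default=0)  # derived in __post_init__
    processed_lsn: int = 0      # last sequence number fed back by the node

    def __post_init__(self):
        # Free space is whatever the used space leaves over.
        self.free_size = self.buffer_size - self.used_size
```

Deriving `free_size` from `buffer_size` and `used_size` is a design choice of this sketch; the source lists both sizes as separate items and does not say whether one is computed from the other.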
Embodiment Two
Fig. 2 is a flowchart of a storage node recovery method according to a second embodiment of the present invention. This embodiment elaborates on the foregoing embodiment and reduces space usage by periodically cleaning up the playback log. Referring to Fig. 2, the storage node recovery method according to the second embodiment includes the following steps:
Step 200, initializing global control information of the storage node, wherein the global control information includes at least one of the following: playback log cache space, playback log control items, the number of valid control items, the start position of the cache space, the end position of the cache space, the size of the used cache space, the size of the free cache space, and the processed log sequence number.
Step 210, packaging the obtained redo log into a playback log, and sending the playback log to the storage node.
The redo log is packaged according to a preset allocation rule. The preset allocation rule may include packaging redo logs into different playback logs according to the storage node each redo log corresponds to, or according to the log sequence number of each redo log.
Specifically, the log server may package the received redo logs as playback logs. The log server may maintain a playback log processing thread in which a sending queue is provided for each storage node; each generated playback log may be placed in the corresponding sending queue and transmitted through that queue to the corresponding storage node.
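One of the allocation rules in step 210, grouping by storage node, can be sketched as follows. The tuple layout `(node_id, lsn, payload)` and the function name `package_by_node` are assumptions made for the example; the patent does not define the structure of a redo log or of a sending queue.

```python
from collections import defaultdict, deque


def package_by_node(redo_logs):
    """Group (node_id, lsn, payload) redo logs into per-node sending queues.

    Logs are ordered by sequence number first so that each queue is
    already in the order the storage node should execute them.
    """
    queues = defaultdict(deque)
    for node_id, lsn, payload in sorted(redo_logs, key=lambda r: r[1]):
        queues[node_id].append({"lsn": lsn, "payload": payload})
    return queues
```

A `deque` is used for each queue because a sender would typically consume entries from the front while new playback logs are appended at the back.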
Step 220, receiving the processed log sequence number fed back by the storage node, and caching the playback log according to the processed log sequence number of the storage node.
The processed log sequence number may be a log sequence number of a redo log executed last by the storage node, and the processed log sequence number may reflect a storage state of data in the storage node.
In the embodiment of the invention, the storage node can receive the redo log distributed by the log server and execute the operation in the redo log to realize the change of the data in the storage node, and after the storage node executes the redo log, the log sequence number of the redo log can be fed back to the log server as the processed log sequence number. After receiving the processed log sequence number, the log server may cache the playback log according to the processed log sequence number, for example, only cache the playback log with the log sequence number greater than the processed log sequence number. Furthermore, in order to reduce the cache occupation space of the playback log in the log server, the playback log in the cache space can be periodically cleared according to the processed log sequence number.
Step 230, when the storage node is determined to have failed, setting the state of the storage node to a fault state and acquiring the global control information of the storage node.
Specifically, when the log server detects a storage node failure, it may set a status flag for the storage node, for example Flag = 1. The status flag may be stored in the log server, and the storage node is determined to be in the fault state when this value of its status flag is detected. The global control information associated with the storage node in the log server may also be obtained, for example by looking up the global control information stored in association with the unique identification number of the storage node. The global control information may include information about the cached playback log of the storage node.
Step 240, reading the playback log in the cache according to the start position of the valid control items in the global control information, and storing the read playback log as an archiving task.
The start position of the valid control items marks where the playback logs stored in the cache begin.
Specifically, the start position of the valid control items may be extracted from the global control information, and the stored playback logs may be read from the cache beginning at that position. A corresponding archiving task may be generated for each playback log; it can be understood that a single archiving task may also be generated from a plurality of acquired playback logs. For example, the value of the start position of the valid control items may be assigned to a read pointer. After a playback log is read, the read pointer is advanced by the length of that playback log and reading continues until all playback logs for the storage node have been read from the cache. All of the playback logs read may then be written into one file in a preset format, and that file serves as the archiving task.
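The read-pointer loop of step 240 can be illustrated with a byte buffer. The record layout used here (a 4-byte big-endian length followed by the payload) is purely an assumption standing in for the "preset format", which the patent does not specify; the function names are hypothetical, and wrap-around of the ring buffer is left out to keep the sketch short.

```python
import struct


def read_playback_logs(buffer, start, count):
    """Read `count` length-prefixed records from `buffer`, beginning at `start`."""
    logs, read_ptr = [], start
    for _ in range(count):
        (length,) = struct.unpack_from(">I", buffer, read_ptr)
        read_ptr += 4                       # skip the length prefix
        logs.append(bytes(buffer[read_ptr:read_ptr + length]))
        read_ptr += length                  # advance by the length of the log just read
    return logs


def build_archive_task(logs):
    """Serialize all logs into one blob using the same length-prefixed layout."""
    return b"".join(struct.pack(">I", len(log)) + log for log in logs)
```

Because the archive blob reuses the cache layout, `read_playback_logs` can also be used later to read the archiving task back when the node reconnects.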
Step 250, when communication with the storage node is determined to be restored, setting the state of the storage node to a repair state.
In the embodiment of the present invention, the log server monitors the storage node and determines that communication is restored when the communication connection with the storage node is re-established. It then changes the status flag stored in association with the storage node from the fault state to the repair state. For example, if the status flag of the storage node is Flag, the storage node is in the fault state when Flag equals 1; when communication is restored, Flag may be set to 2, which puts the storage node in the repair state.
Step 260, extracting the processed log sequence number from the global control information of the storage node.
In the embodiment of the present invention, the processed log sequence number may be the sequence number of the redo log that the storage node had processed before it failed, and may reflect the state of the data in the storage node. After the storage node restores communication, the corresponding processed log sequence number may be extracted from the global control information of the storage node, and is used in the subsequent steps to determine which playback logs the storage node still needs to execute.
Step 270, sending the target playback log in the archiving task to the storage node according to the processed log sequence number.
The target playback log may be composed of redo logs that the storage node has not yet executed, and the storage node may recover its data by executing the redo logs in the target playback log.
Specifically, because the archiving task is generated while the storage node is in the fault state, the log sequence number of each playback log in the archiving task may be compared with the processed log sequence number. When the log sequence number of a playback log is greater than the processed log sequence number, the redo logs in that playback log have not yet been executed by the storage node, and the playback log may be sent to the storage node as a target playback log. For example, the first log position whose sequence number is greater than the processed log sequence number of the storage node is located in the archiving task, the playback logs from that position onward are taken as target playback logs, and they are reissued to the storage node whose communication has been restored.
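The selection of target playback logs in step 270 amounts to finding the first archived log whose sequence number exceeds the processed log sequence number. A minimal sketch, assuming the archived logs are available as `(lsn, payload)` pairs already sorted by sequence number (both the pair layout and the function name are hypothetical):

```python
from bisect import bisect_right


def select_target_logs(archived_logs, processed_lsn):
    """Return the logs whose sequence number is strictly greater than processed_lsn.

    Assumes archived_logs is a list of (lsn, payload) pairs sorted by lsn.
    """
    lsns = [lsn for lsn, _ in archived_logs]
    first_unexecuted = bisect_right(lsns, processed_lsn)
    return archived_logs[first_unexecuted:]
```

Binary search is a choice of this sketch; a linear scan from the start of the archiving task would select the same logs.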
Step 280, controlling the storage node to execute the corresponding operations according to the target playback log to realize data recovery.
Specifically, the storage node may receive the target playback logs sent by the log server, execute the operations in them in sequence, and update its data. The storage node determines that data recovery is complete once all target playback logs have been executed. After the data of the storage node is recovered, the state of the storage node may be set to a valid state, and the storage node can again provide storage service normally.
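The node states used in steps 230, 250 and 280 (fault, repair, valid) form a small state machine. In the sketch below the values 1 and 2 follow the Flag examples in the text, while the value 0 for the valid state, the names `NodeState` and `transition`, and the rule that a node may fail again while being repaired are all assumptions.

```python
from enum import IntEnum


class NodeState(IntEnum):
    VALID = 0    # assumed value; the source gives no number for this state
    FAULT = 1    # matches the "Flag = 1" example
    REPAIR = 2   # matches the "Flag = 2" example


# Allowed transitions; REPAIR -> FAULT is an assumption of this sketch.
ALLOWED = {
    NodeState.VALID: {NodeState.FAULT},
    NodeState.FAULT: {NodeState.REPAIR},
    NodeState.REPAIR: {NodeState.VALID, NodeState.FAULT},
}


def transition(current, new):
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```

Rejecting a direct jump from the valid state to the repair state makes it harder to replay an archiving task for a node that was never marked as failed.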
According to the embodiment of the invention, the global control information of the storage node is initialized in advance, the obtained redo logs are packaged into playback logs and sent to the storage node, and the playback logs are cached according to the processed log sequence number fed back by the storage node. When the storage node is determined to have failed, its global control information is obtained, the playback logs in the cache are read according to the start position of the valid control items, and the corresponding archiving task is generated. When communication with the storage node is determined to be restored, the target playback logs in the archiving task are sent to the storage node for execution based on its processed log sequence number, which recovers the data in the storage node. This shortens the time needed to locate the redo logs for the storage node, improves data recovery efficiency, and can enhance the robustness of the distributed database.
Further, on the basis of the above embodiment of the present invention, caching the playback log according to the processed log sequence number of the storage node includes:
clearing from the cache the playback logs whose log sequence number is less than or equal to the processed log sequence number, until the number of valid control items in the global control information of the storage node is smaller than the playback log control item limit and the size of the free cache space is greater than or equal to the space occupied by the playback log; and caching the playback log according to the end position of the cache space in the global control information of the storage node.
The occupied space may be the data size of the playback log, that is, the amount of space the playback log takes up in the cache.
In the embodiment of the invention, the cache space for storing playback logs in the log server is limited. To use it sensibly, playback logs in the cache may be cleaned up when the cache space is full and a new playback log needs to be stored. The log sequence number of a cached playback log may be compared with the processed log sequence number: if it is greater, the playback log has not yet been executed by the storage node and cannot be cleared; if it is less than or equal, the playback log has been executed by the storage node and may be cleared from the cache. Cleaning may stop once the number of valid control items in the global control information is smaller than the playback log control item limit and the size of the free cache space is greater than or equal to the space occupied by the new playback log, that is, once the cleaned cache can hold the playback log. The playback log may then be stored at the cache address corresponding to the end position of the cache space, which completes the caching of the playback log.
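The clean-up condition can be sketched as a loop that evicts already-executed logs from the front of the cache until a new log fits. The function name `make_room`, its parameters, and the `(lsn, size)` pair layout are hypothetical. Unlike the described method, which keeps retrying, this sketch simply reports failure when the oldest cached log has not yet been executed.

```python
from collections import deque


def make_room(cache, processed_lsn, max_items, capacity, new_size):
    """Evict executed logs from the front of `cache` until the new log fits.

    `cache` is a deque of (lsn, size) pairs in sequence-number order.
    Returns True if a log of `new_size` bytes can now be stored.
    """
    used = sum(size for _, size in cache)

    def fits():
        # Both limits must hold: item count and free bytes.
        return len(cache) < max_items and capacity - used >= new_size

    # Only logs the node has already executed (lsn <= processed_lsn) may go.
    while not fits() and cache and cache[0][0] <= processed_lsn:
        _, size = cache.popleft()
        used -= size
    return fits()
```

Eviction stops as soon as the new log fits, so executed logs that do not need to be removed stay in the cache.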
Further, on the basis of the above embodiment of the invention, the method further includes: when the storage node is determined to be in the fault state, generating an archiving task directly from a newly received playback log of the storage node.
Specifically, the log server may receive the playback log of the storage node in the failure state, and at this time, the received new playback log may be directly stored in the archiving task according to the preset format without caching the playback log.
In an exemplary implementation, the storage node recovery method provided by the embodiment of the present invention may be executed in a log server, and the method includes:
Step 1, when the log server is started, initializing the global control information of each storage node, where the global control information may include the following parameters:
Playback log cache space: its size can be configured by the user through a configuration file, and the parameters in the configuration file are read at system startup to perform initialization. The playback log cache space is used cyclically: the whole space is treated as a space ring and is reused by continuously overwriting expired space.
Playback log control items: these may specifically be an array, each element of which may store attribute information of one playback log, for example the data offset of the playback log. The array size may be constant and may represent the maximum number of playback logs that can be stored in the playback log cache space. The array size may be set by the server system or through a configuration file.
Number of valid control items: records the number of playback logs currently cached in the playback log cache space.
Start position of the valid control items: marks the storage position of the first playback log in the playback log cache space. It needs to be marked because the array size of the playback log control items is fixed and the cache space is filled cyclically.
Start position of the cache space: records the start offset of the playback log cache space.
End position of the cache space: records the end offset of the playback log cache space.
Used cache space size: records the amount of space already used in the playback log cache space.
Free cache space size: records the amount of space not yet used in the playback log cache space.
Processed log sequence number: each log that the log server sends to a storage node has a log sequence number; after processing a log, the storage node feeds its log sequence number back to the log server as the processed log sequence number.
Step 2, the log server may parse the redo logs of the front-end database node, package them into playback logs according to the allocation rule, and assign each playback log to the sending queue of the corresponding storage node, where the sending queue may be processed by the playback log processing thread. Meanwhile, a log response thread may continuously receive the processed log sequence numbers fed back by the storage nodes and write each received processed log sequence number into the global control information of the corresponding storage node.
Step 3, the playback log processing thread in the log server may send the playback logs of each storage node in log sequence number order and put each playback log into the cache after it has been sent. Fig. 3 is an exemplary diagram of a cached playback log according to the second embodiment of the present invention. Referring to Fig. 3, putting a playback log into the cache may specifically include the following steps:
and 3.1, trying to clean the refreshed playback log, acquiring a playback log cached in a playback log cache space, comparing whether the log serial number of the playback log is smaller than or equal to the processed log serial number of the storage node, if so, deleting the playback log, and adjusting the number of effective control items, the initial position of the cache space, the size of the used cache space and the size of the free cache space in the global control information. The 3.1 deleting the playback log cached in the log cache space may be performed repeatedly. Specifically, the manner of adjusting the global control information includes subtracting 1 from the number of the effective control items, adding 1 to the starting position of the effective control item, adjusting the length of the log in the playback log backward at the starting position of the cache space, subtracting the length of the log in the playback log from the size of the used cache space, and increasing the length of the log in the playback log by the size of the free cache space. Otherwise, step 3.2 is performed.
Step 3.2, check whether the playback log cache space is sufficient to store a new playback log. First check whether the number of effective control items has reached the upper limit of playback log control items; if it has, return to step 3.1 and continue trying to clean up flushed playback logs. If it has not, check whether the size of the free cache space is greater than or equal to the size of the new playback log; if so, continue with step 3.3, otherwise return to step 3.1 and continue trying to clean up flushed playback logs.
Step 3.3, cache the log: determine the storage position of the playback log according to the starting position of the effective control items and the number of effective control items in the global control information, cache the playback log into the playback log cache space at that storage position, and adjust the starting position of the effective control items and the number of effective control items according to the length of the playback log.
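Steps 3.1 to 3.3 can be pictured with the following minimal sketch, given here only as an illustration under stated assumptions. The class `PlaybackCache` and its field names are hypothetical; the sketch tracks only the lengths of cached logs rather than their bytes, treats both the control items and the cache space as circular, and returns `False` instead of looping back to step 3.1 when there is no room.

```python
class PlaybackCache:
    def __init__(self, capacity, max_items):
        self.items = [None] * max_items   # control item slots: (lsn, length)
        self.item_start = 0               # starting position of effective control items
        self.item_count = 0               # number of effective control items
        self.capacity = capacity
        self.space_start = 0              # starting position of the cache space
        self.space_end = 0                # ending position of the cache space
        self.used = 0                     # size of used cache space
        self.free = capacity              # size of free cache space
        self.processed_lsn = 0            # processed LSN fed back by the node

    def clean(self):
        """Step 3.1: delete cached logs whose LSN <= processed LSN."""
        while self.item_count > 0:
            lsn, length = self.items[self.item_start]
            if lsn > self.processed_lsn:
                break
            self.items[self.item_start] = None
            self.item_start = (self.item_start + 1) % len(self.items)
            self.item_count -= 1
            self.space_start = (self.space_start + length) % self.capacity
            self.used -= length
            self.free += length

    def has_room(self, length):
        """Step 3.2: check the control item limit first, then the free space."""
        return self.item_count < len(self.items) and self.free >= length

    def put(self, lsn, length):
        """Step 3.3: cache one log; False means the caller should retry later."""
        self.clean()
        if not self.has_room(length):
            return False
        slot = (self.item_start + self.item_count) % len(self.items)
        self.items[slot] = (lsn, length)
        self.item_count += 1
        self.space_end = (self.space_end + length) % self.capacity
        self.used += length
        self.free -= length
        return True


cache = PlaybackCache(capacity=100, max_items=2)
first = cache.put(1, 40)
second = cache.put(2, 40)
blocked = cache.put(3, 10)    # both control items are in use
cache.processed_lsn = 1       # the node reports LSN 1 as processed
retried = cache.put(3, 10)    # cleaning LSN 1 frees a control item
```

The third call is refused because the control item limit is reached even though free space remains, which is why step 3.2 checks the item count before the free space.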
Step 4, when a storage node fails, archiving tasks are generated for the storage node according to its playback logs. The state of the storage node may be set to a fault state, and an archiving thread corresponding to the faulty storage node is created; the playback logs in the playback log cache space are read according to the global control information of the faulty storage node. Each playback log may correspond to one archiving thread, and the archiving threads store the playback logs as archiving tasks in an agreed format. When the log server subsequently receives a playback log for the faulty storage node, it does not store the playback log in the cache but directly generates an archiving task.
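The description does not define the agreed format of an archiving task. As a hedged illustration only, the sketch below assumes a simple length-prefixed record (an 8-byte log sequence number and a 4-byte payload length, followed by the payload) and writes records sequentially rather than with one archiving thread per playback log. The names `HEADER`, `write_archive` and `read_archive` are hypothetical.

```python
import io
import struct

# Assumed record header: big-endian 8-byte LSN, 4-byte payload length.
HEADER = struct.Struct(">QI")


def write_archive(stream, playback_logs):
    """Store (lsn, payload) playback logs as archive records."""
    for lsn, payload in playback_logs:
        stream.write(HEADER.pack(lsn, len(payload)))
        stream.write(payload)


def read_archive(stream):
    """Read archive records back into a list of (lsn, payload) tuples."""
    logs = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break                                   # end of archive
        lsn, length = HEADER.unpack(header)
        logs.append((lsn, stream.read(length)))
    return logs


archive = io.BytesIO()
# Logs that were still in the cache when the fault was detected.
write_archive(archive, [(5, b"update t1"), (6, b"insert t2")])
# A log received while the node is in the fault state bypasses the cache.
write_archive(archive, [(7, b"delete t3")])
archive.seek(0)
restored = read_archive(archive)
```

Any format works as long as the writer and the reader in step 5 agree on it; the length prefix is used here only because it makes record boundaries unambiguous.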
Step 5, when it is detected that network communication between the log server and the faulty storage node has been restored, the faulty storage node is considered recovered and the data in the storage node is restored as follows. The state of the faulty storage node is set to a recovery state, and it is checked whether an archiving task exists for the storage node. If so, the archiving task is read according to the agreed format to obtain the playback logs, the target playback logs that have not yet been executed are determined according to the processed log sequence number of the storage node, and the target playback logs are resent to the faulty node. After the storage node has executed all target playback logs in the playback log cache space, fault recovery is complete; generation of archiving tasks is stopped and the storage node is set to a valid state.
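A minimal sketch of the selection logic in step 5 follows, under the assumptions that archived playback logs are available as `(lsn, payload)` tuples, that the node state is kept in a plain dictionary, and that each resent log is acknowledged synchronously. The names `NodeState`, `recover` and `send` are hypothetical.

```python
from enum import Enum


class NodeState(Enum):
    VALID = "valid"
    FAULT = "fault"
    RECOVERY = "recovery"


def recover(node, archived_logs, send):
    """Resend archived logs the node has not executed, then mark it valid."""
    node["state"] = NodeState.RECOVERY
    # Target playback logs: LSN greater than the node's processed LSN.
    targets = sorted(
        (log for log in archived_logs if log[0] > node["processed_lsn"]),
        key=lambda log: log[0],
    )
    for lsn, payload in targets:
        send(lsn, payload)
        node["processed_lsn"] = lsn     # assumes a synchronous acknowledgement
    node["state"] = NodeState.VALID
    return [lsn for lsn, _ in targets]


node = {"state": NodeState.FAULT, "processed_lsn": 5}
sent = []
replayed = recover(
    node,
    [(5, b"a"), (6, b"b"), (7, b"c")],
    send=lambda lsn, payload: sent.append(lsn),
)
```

Filtering by the processed log sequence number means that a log the node executed before the fault is never executed again, even if it is still present in the archive.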
Example three
Fig. 4 is a schematic structural diagram of a storage node recovery apparatus according to a third embodiment of the present invention, where the apparatus shown in fig. 4 may execute a storage node recovery method according to any embodiment of the present invention, and has functional modules and/or beneficial effects corresponding to the execution method. The device can be implemented by software and/or hardware, and specifically comprises: a log caching module 301, an archive log module 302, and a recovery execution module 303.
The log caching module 301 is configured to cache at least one redo log of a storage node as a playback log.
And the archiving log module 302 is configured to determine at least one archiving task according to the playback log when determining that the storage node has a fault.
A recovery executing module 303, configured to send each of the archive tasks to the storage node if the storage node is restored in communication, so that the storage node executes an operation corresponding to the archive task to implement data recovery.
According to this embodiment of the invention, the log caching module caches the redo log corresponding to the storage node as a playback log; the archive log module determines the archiving tasks according to the playback log when the storage node fails; and after the recovery execution module determines that communication with the storage node has been restored, it sends the archiving tasks to the storage node so that the storage node executes the operations corresponding to the archiving tasks to realize data recovery. This achieves fault recovery of the storage node, reduces the time spent searching for data when the storage node is recovered, and improves the robustness of the distributed database.
Further, on the basis of the above embodiment of the invention, the method further includes:
the initialization module is used for initializing the global control information of the storage nodes; the global control information includes at least one of: playback log buffer space, playback log control items, the number of effective control items, the starting position of the buffer space, the ending position of the buffer space, the size of the used buffer space, the size of the free buffer space and the sequence number of the processed log.
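For illustration only, the global control information listed above could be held in a structure such as the following. The field names and the `init_control_info` helper are hypothetical, and the sketch assumes that all eight items are present, although the text requires only at least one of them.

```python
from dataclasses import dataclass


@dataclass
class GlobalControlInfo:
    cache_space: bytearray      # playback log cache space
    control_items: list         # playback log control items (fixed number of slots)
    valid_item_count: int = 0   # number of effective control items
    space_start: int = 0        # starting position of the cache space
    space_end: int = 0          # ending position of the cache space
    used_size: int = 0          # size of the used cache space
    free_size: int = 0          # size of the free cache space
    processed_lsn: int = 0      # processed log sequence number


def init_control_info(cache_bytes, max_items):
    """Initialize the control information of one storage node with an empty cache."""
    return GlobalControlInfo(
        cache_space=bytearray(cache_bytes),
        control_items=[None] * max_items,
        free_size=cache_bytes,
    )


info = init_control_info(cache_bytes=1024, max_items=16)
```

In this sketch the used size and the free size always sum to the size of the cache space, so either value could be derived from the other; both are kept to mirror the list in the text.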
Further, on the basis of the above embodiment of the present invention, the log caching module 301 includes:
the log sending unit is used for packaging the obtained redo log into a playback log and sending the playback log to the storage node;
and the cache execution unit is used for receiving the processed log sequence number fed back by the storage node and caching the playback log of the storage node according to the processed log sequence number.
Further, on the basis of the above embodiment of the present invention, the cache execution unit is specifically configured to:
clearing the playback log with the log sequence number smaller than or equal to the processed log sequence number in the cache until the number of effective control items in the global control information corresponding to the storage node is smaller than the playback log control items and the size of the free cache space is larger than or equal to the occupied space of the playback log; and caching the playback log according to the end position of the cache space in the global control information of the storage node.
Further, on the basis of the above embodiment of the present invention, the archive log module 302 includes:
the information extraction unit is used for setting the state of the storage node as a fault state when the storage node is determined to be in fault, and acquiring global control information of the storage node;
and the archiving processing unit is used for reading the playback log in the cache according to the initial position of the effective control item of the global control information and storing the read playback log as an archiving task.
Further, on the basis of the above embodiment of the present invention, the archive log module 302 further includes:
and the log processing unit is used for determining that the storage node is in a fault state and directly generating an archiving task from the newly received playback log of the storage node.
Further, on the basis of the above embodiment of the present invention, the recovery execution module 303 includes:
the state adjusting unit is used for setting the state of the storage node to a repair state when communication with the storage node is recovered;
the sequence number acquisition unit is used for extracting the processed log sequence number from the global control information of the storage node;
the target log unit is used for sending a target playback log in the archiving task to the storage node according to the processed log sequence number;
and the data recovery unit is used for controlling the storage node to execute the corresponding operation according to the target playback log to realize data recovery.
Example four
Fig. 5 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention. As shown in fig. 5, the apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43. The number of processors 40 in the apparatus may be one or more, and one processor 40 is taken as an example in fig. 5. The processor 40, the memory 41, the input device 42 and the output device 43 in the apparatus may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 5.
The memory 41, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program modules corresponding to the storage node recovery method in the embodiment of the present invention (for example, the log caching module 301, the archive log module 302, and the recovery execution module 303 in the storage node recovery apparatus). The processor 40 executes the various functional applications and data processing of the apparatus by running the software programs, instructions and modules stored in the memory 41, that is, it implements the storage node recovery method described above.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 43 may include a display device such as a display screen.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a storage node recovery method, where the method includes:
caching at least one redo log of the storage node as a playback log;
when the storage node is determined to be in fault, determining at least one archiving task according to the playback log;
and if the storage node is recovered in communication, sending each archiving task to the storage node so that the storage node executes the operation corresponding to the archiving task to realize data recovery.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the operations of the method described above, and may also perform related operations in the storage node recovery method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the storage node recovery apparatus, each unit and each module included in the storage node recovery apparatus are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A storage node recovery method, the method comprising:
caching at least one redo log of the storage node as a playback log;
when the storage node is determined to be in fault, determining at least one archiving task according to the playback log;
and if the storage node is recovered in communication, sending each archiving task to the storage node so that the storage node executes the operation corresponding to the archiving task to realize data recovery.
2. The method of claim 1, further comprising, prior to the caching at least one redo log of a storage node as a playback log:
initializing global control information of the storage nodes;
the global control information includes at least one of: playback log buffer space, playback log control items, the number of effective control items, the starting position of the buffer space, the ending position of the buffer space, the size of the used buffer space, the size of the free buffer space and the sequence number of the processed log.
3. The method of claim 2, wherein caching the at least one redo log of the storage node as a playback log comprises:
packaging the obtained redo log into a playback log, and sending the playback log to a storage node;
and receiving the processed log sequence number fed back by the storage node, and caching the playback log of the storage node according to the processed log sequence number.
4. The method of claim 3, wherein caching the playback log of the storage node according to the processed log sequence number comprises:
clearing the playback log with the log sequence number smaller than or equal to the processed log sequence number in the cache until the number of effective control items in the global control information corresponding to the storage node is smaller than the playback log control items and the size of the free cache space is larger than or equal to the occupied space of the playback log;
and caching the playback log according to the end position of the cache space in the global control information of the storage node.
5. The method of claim 3, wherein determining at least one archival task from the playback log upon determining that the storage node is malfunctioning comprises:
setting the state of the storage node as a fault state when the storage node is determined to be in fault, and acquiring global control information of the storage node;
and reading the playback log in the cache according to the initial position of the effective control item of the global control information, and storing the read playback log as an archiving task.
6. The method of claim 5, further comprising:
and determining that the storage node is in a fault state, and directly generating an archiving task from the newly received playback log of the storage node.
7. The method according to claim 5, wherein if the storage node is restored by communication, sending each of the archive tasks to the storage node so that the storage node performs the operation corresponding to the archive task to realize data restoration comprises:
when the communication of the storage node is determined to be recovered, setting the state of the storage node as a repair state;
extracting a processed log sequence number from the global control information of the storage node;
sending a target playback log in the archiving task to the storage node according to the processed log sequence number;
and controlling the storage node to execute corresponding operation according to the target playback log to realize data recovery.
8. An apparatus for storage node recovery, the apparatus comprising:
the log caching module is used for caching at least one redo log of the storage node as a playback log;
the archiving log module is used for determining at least one archiving task according to the playback log when the storage node is determined to be in fault;
and the recovery execution module is used for sending each archiving task to the storage node if the communication of the storage node is recovered so as to enable the storage node to execute the operation corresponding to the archiving task to realize data recovery.
9. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the storage node recovery method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a storage node restoration method according to any one of claims 1 to 7.
CN202010753074.8A 2020-07-30 2020-07-30 Storage node recovery method, device, equipment and storage medium Active CN111880969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010753074.8A CN111880969B (en) 2020-07-30 2020-07-30 Storage node recovery method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010753074.8A CN111880969B (en) 2020-07-30 2020-07-30 Storage node recovery method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111880969A true CN111880969A (en) 2020-11-03
CN111880969B CN111880969B (en) 2024-06-04

Family

ID=73204531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753074.8A Active CN111880969B (en) 2020-07-30 2020-07-30 Storage node recovery method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111880969B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346913A (en) * 2020-12-01 2021-02-09 上海达梦数据库有限公司 Data recovery method, device, equipment and storage medium
CN115202588A (en) * 2022-09-14 2022-10-18 云和恩墨(北京)信息技术有限公司 Data storage method and device and data recovery method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060117074A1 (en) * 2004-11-30 2006-06-01 Ezzat Ahmed K Method and apparatus for database cluster recovery
US20140279929A1 (en) * 2013-03-15 2014-09-18 Amazon Technologies, Inc. Database system with database engine and separate distributed storage service
CN104536971A (en) * 2014-12-02 2015-04-22 北京锐安科技有限公司 High-availability database
CN107357688A (en) * 2017-07-28 2017-11-17 广东神马搜索科技有限公司 Distributed system and its fault recovery method and device
CN110807064A (en) * 2019-10-28 2020-02-18 北京优炫软件股份有限公司 Data Recovery Device in RAC Distributed Database Cluster System

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346913A (en) * 2020-12-01 2021-02-09 上海达梦数据库有限公司 Data recovery method, device, equipment and storage medium
CN112346913B (en) * 2020-12-01 2024-03-15 上海达梦数据库有限公司 Data recovery method, device, equipment and storage medium
CN115202588A (en) * 2022-09-14 2022-10-18 云和恩墨(北京)信息技术有限公司 Data storage method and device and data recovery method and device
CN115202588B (en) * 2022-09-14 2022-12-27 本原数据(北京)信息技术有限公司 Data storage method and device and data recovery method and device

Also Published As

Publication number Publication date
CN111880969B (en) 2024-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant