KR20240010939A

KR20240010939A - Method and system for backup and recovery of distributed databases

Info

Publication number: KR20240010939A
Application number: KR1020220088330A
Authority: KR
Inventors: 김상철; 이한주; 서동주
Original assignee: 주식회사 카카오
Priority date: 2022-07-18
Filing date: 2022-07-18
Publication date: 2024-01-25

Abstract

A method performed by a backup manager server according to an embodiment may comprise the steps of: based on receiving backup input for one or more sharded clusters, obtaining information on a target sharded cluster from a backup waiting queue sorted according to backup priority; determining whether the target sharded cluster can be backed up based on whether one or more shards belonging to the target sharded cluster can be backed up; and based on the target sharded cluster being determined to be able to be backed up, giving a backup command to the one or more shards belonging to the target sharded cluster to perform an operation log backup prior to a snapshot backup.

Description

{METHOD AND SYSTEM FOR BACKUP AND RECOVERY OF DISTRIBUTED DATABASES}

Hereinafter, technologies for the backup and recovery of distributed databases are disclosed.

An application that relies on a database as its main program may not tolerate data processing stopping when a system failure occurs. However, it may be impossible to rule out system failures while a database is in operation, so it can be important that the database keeps operating even when a failure occurs. Among the methods for keeping a database operating normally, regular database backups can prevent data loss when an unintended data loss event occurs. A backup may be used to back up and/or replicate the latest data and to provide at least part of the service while the primary system is being restored. In other words, remote backup is a method that supports another form of fault tolerance.

Remote backup is important for operating a database system because it protects data and can ensure operational continuity when data is lost or an accident occurs. Remote backup also has the advantage of helping even when operations cannot be performed on the primary system. As an example, there may be a method of backing up locally in order to reduce load; such a backup may be performed locally to a mirror disk. However, because a system error or failure is likely to propagate within the same physical node (e.g., a data center in the same region), local replication may not be suitable. Remote backup can be performed to overcome this drawback: by physically separating the backup data from the system, the backup data can be preserved regardless of the failure. In addition to physical failures, remote backup has the advantage of also being isolated from software failures (e.g., operation errors or bugs).

A method performed by a backup manager server according to an embodiment may include: obtaining, based on receiving a backup input for one or more sharded clusters, information on a target sharded cluster from a backup waiting queue sorted according to backup priority; determining whether the target sharded cluster can be backed up based on whether one or more shards belonging to the target sharded cluster can be backed up; and, based on the target sharded cluster being determined to be able to be backed up, commanding the one or more shards belonging to the target sharded cluster to perform a backup in which an operation log backup precedes a snapshot backup.

Obtaining the information on the target sharded cluster may include generating the backup waiting queue, sorted according to a backup priority determined based on one or a combination of two or more of the average backup duration of the shards belonging to a sharded cluster and the number of shards belonging to the sharded cluster.
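As an illustrative sketch only, a backup waiting queue of this kind could be built with a binary heap. The scoring rule in `backup_priority` (expected total work, largest first) and the field names `avg_backup_seconds` and `shard_count` are assumptions made for the example; the disclosure states only that the priority depends on the average backup duration and/or the number of shards.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueueEntry:
    priority: float                  # smaller value is popped first
    name: str = field(compare=False)


def backup_priority(avg_backup_seconds, shard_count):
    # Hypothetical rule: clusters with more expected work go first,
    # so the score is negated to fit heapq's min-heap ordering.
    return -(avg_backup_seconds * shard_count)


def build_waiting_queue(clusters):
    """Return a heap of QueueEntry objects ordered by backup priority."""
    heap = [
        QueueEntry(backup_priority(c["avg_backup_seconds"], c["shard_count"]), c["name"])
        for c in clusters
    ]
    heapq.heapify(heap)
    return heap


def next_target(heap):
    """Pop the name of the highest-priority sharded cluster."""
    return heapq.heappop(heap).name
```

Any other monotone combination of the two inputs would fit the same structure; only `backup_priority` would change.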

Determining whether the target sharded cluster can be backed up may include selecting, from among one or more secondaries of a shard, a secondary to be used for the backup based on the physical locations of the secondaries.

Determining whether the target sharded cluster can be backed up may include determining whether a shard belonging to the target sharded cluster can be backed up based on the number of available backup servers in the data center mapped to a secondary of that shard.

Determining whether the target sharded cluster can be backed up may include: determining that the target sharded cluster can be backed up based on all of the one or more shards belonging to the target sharded cluster being determined to be able to be backed up; and determining that the target sharded cluster cannot be backed up based on at least one shard belonging to the target sharded cluster being determined not to be able to be backed up.
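The all-or-nothing decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: each shard lists the data centers of its secondaries under the hypothetical key `secondary_data_centers`, one backup server is reserved per shard, and data centers are tried greedily in the listed order. None of these details are specified in the disclosure.

```python
from collections import Counter


def pick_data_center(shard, free_servers):
    """Return the first data center hosting a secondary of the shard
    that still has a free backup server, or None if there is none."""
    for dc in shard["secondary_data_centers"]:
        if free_servers.get(dc, 0) > 0:
            return dc
    return None


def plan_cluster_backup(shards, available_servers):
    """Return {shard name: data center} if every shard can be backed up,
    otherwise None (the cluster as a whole is then not backed up)."""
    free = Counter(available_servers)   # working copy; the input is not mutated
    plan = {}
    for shard in shards:
        dc = pick_data_center(shard, free)
        if dc is None:
            return None                 # one blocked shard blocks the cluster
        free[dc] -= 1                   # assumption: one server per shard
        plan[shard["name"]] = dc
    return plan
```

Because the choice is greedy, this sketch can report a cluster as blocked even when a different assignment of shards to data centers would succeed; a matching algorithm would be needed to avoid that.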

Commanding the backup may include commanding the backup such that the backup start times of the one or more shards fall within a predetermined time range.

Commanding the backup may include commanding a backup of the operation log used for data replication between the primary and a secondary of a shard.

Commanding the backup of the operation log may include commanding a backup of an operation log whose operation times follow the logical clock of the target sharded cluster.

Commanding the backup may include commanding, after a predetermined length of time has elapsed from the start time of the operation log backup, a snapshot backup that uses a snapshot of the secondary of each shard.
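The ordering constraints above (operation log backup first, snapshot backup only after a predetermined delay, and shard start times kept within a predetermined range) can be illustrated by a small scheduler that computes when each command would be issued. The names `Command`, `plan_backup`, `oplog_lead_seconds`, and `max_start_skew`, and the even spacing of start times, are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    at: float     # seconds relative to an arbitrary origin
    shard: str
    kind: str     # "oplog" or "snapshot"


def plan_backup(shards, start, oplog_lead_seconds, max_start_skew=0.0):
    """Schedule an operation log backup for each shard, followed by a
    snapshot backup oplog_lead_seconds later. All operation log backups
    start within max_start_skew seconds of one another."""
    step = max_start_skew / (len(shards) - 1) if len(shards) > 1 else 0.0
    commands = []
    for i, shard in enumerate(shards):
        oplog_at = start + i * step
        commands.append(Command(oplog_at, shard, "oplog"))
        commands.append(Command(oplog_at + oplog_lead_seconds, shard, "snapshot"))
    return sorted(commands, key=lambda c: (c.at, c.shard))
```

Starting the operation log backup first means the captured log already covers the moment the snapshot is taken, which is what allows the log to be replayed on top of the snapshot during recovery.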

The method performed by the backup manager server according to an embodiment may further include, based on the target sharded cluster being determined not to be able to be backed up, changing the backup priority of the target sharded cluster and updating the backup waiting queue according to the changed backup priority.

The method performed by the backup manager server according to an embodiment may further include determining, based on the backup of at least one shard belonging to the target sharded cluster having finished, whether a new target sharded cluster can be backed up.

The method performed by the backup manager server according to an embodiment may further include determining, based on the target sharded cluster being determined to be able to be backed up, whether another sharded cluster indicated by information obtained from the backup waiting queue can be backed up in parallel with the target sharded cluster.

Commanding the backup may include, based on another sharded cluster being determined to be able to be backed up in parallel with the target sharded cluster, commanding the target sharded cluster and the other sharded cluster to be backed up in parallel. The method may further include, based on the same determination, deleting the information on the target sharded cluster and the information on the other sharded cluster from the backup waiting queue.

The method performed by the backup manager server according to an embodiment may further include, based on the other sharded cluster being determined not to be able to be backed up in parallel with the target sharded cluster, deleting the information on the target sharded cluster from the backup waiting queue and keeping the information on the other sharded cluster in the backup waiting queue.
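The queue handling described in the preceding paragraphs can be sketched as a single dispatch step. The queue is modeled as a plain list with the highest-priority cluster first, and `can_back_up` and `can_run_in_parallel` are hypothetical callbacks standing in for the availability checks. Demoting a blocked target to the tail of the queue is an assumption; the disclosure says only that its priority is changed.

```python
def dispatch(queue, can_back_up, can_run_in_parallel):
    """Perform one dispatch step.

    queue: cluster names, highest backup priority first.
    Returns (started, new_queue), where started lists the clusters
    that should be commanded to back up now.
    """
    if not queue:
        return [], []
    target, rest = queue[0], list(queue[1:])
    if not can_back_up(target):
        # Assumption: "changing the priority" is modeled as moving to the tail.
        return [], rest + [target]
    started = [target]                  # the target leaves the queue
    if rest and can_run_in_parallel(target, rest[0]):
        started.append(rest[0])         # the parallel cluster leaves the queue too
        rest = rest[1:]
    return started, rest
```

For example, with a queue `["A", "B", "C"]` where every cluster is available and parallel backup is allowed, one step starts `A` and `B` and leaves `["C"]` waiting.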

The method performed by the backup manager server according to an embodiment may further include: commanding, based on receiving a recovery input for the target sharded cluster, that a snapshot of the target sharded cluster be set as the initial recovery state of the target sharded cluster; and commanding that the set initial recovery state of the target sharded cluster be adjusted based on the operation log.

Commanding that the initial recovery state of the target sharded cluster be adjusted may include commanding that the data of a shard be recovered by using the operation log to adjust the initial recovery state of the shard that was set based on the snapshot.
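The two recovery steps (snapshot as the initial state, then adjustment by the operation log) can be illustrated with a toy key-value model. The entry layout (`ts`, `op`, `key`, `value`), the `snapshot_time` cutoff, and the optional `target_time` are assumptions made for this sketch and are not taken from the disclosure.

```python
def restore(snapshot, snapshot_time, oplog, target_time=None):
    """Rebuild a shard's data from a snapshot plus operation log entries.

    snapshot: dict holding the data captured by the snapshot backup.
    snapshot_time: logical time up to which the snapshot is assumed complete.
    oplog: iterable of dicts with keys "ts", "op" ("set" or "delete"),
           "key", and, for "set", "value".
    target_time: if given, entries with a later logical time are ignored.
    """
    state = dict(snapshot)                      # initial recovery state
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] <= snapshot_time:
            continue                            # already reflected in the snapshot
        if target_time is not None and entry["ts"] > target_time:
            break                               # beyond the requested recovery point
        if entry["op"] == "delete":
            state.pop(entry["key"], None)
        else:
            state[entry["key"]] = entry["value"]
    return state
```

The entries at or before `snapshot_time` exist in the log because the operation log backup starts before the snapshot backup; they are skipped here rather than replayed.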

A method performed by a system for backup and recovery of a distributed database according to an embodiment may include: obtaining, by a backup manager server and based on receiving a backup input for one or more sharded clusters, information on a target sharded cluster from a backup waiting queue sorted according to backup priority; determining, by the backup manager server, whether the target sharded cluster can be backed up based on whether one or more shards belonging to the target sharded cluster can be backed up; commanding, by the backup manager server and based on the target sharded cluster being determined to be able to be backed up, the one or more shards belonging to the target sharded cluster to perform a backup in which an operation log backup precedes a snapshot backup; commanding, by a recovery manager server and based on receiving a recovery input for the target sharded cluster, that a snapshot of the target sharded cluster be set as the initial recovery state of the target sharded cluster; and commanding, by the recovery manager server, that the set initial recovery state of the target sharded cluster be adjusted based on the operation log.

A backup manager server according to an embodiment includes a memory storing computer-executable instructions and a processor that accesses the memory and executes the instructions. The instructions may be configured to: obtain, based on receiving a backup input for one or more sharded clusters, information on a target sharded cluster from a backup waiting queue sorted according to backup priority; determine whether the target sharded cluster can be backed up based on whether one or more shards belonging to the target sharded cluster can be backed up; and, based on the target sharded cluster being determined to be able to be backed up, command the one or more shards belonging to the target sharded cluster to perform a backup in which an operation log backup precedes a snapshot backup.

A system for backup and recovery of a distributed database according to an embodiment includes: a backup manager server including a memory of the backup manager server storing computer-executable instructions and a processor of the backup manager server that accesses that memory and executes the instructions; and a recovery manager server including a memory of the recovery manager server storing computer-executable instructions and a processor of the recovery manager server that accesses that memory and executes the instructions. The instructions stored in the memory of the backup manager server may be configured to: obtain, based on receiving a backup input for one or more sharded clusters, information on a target sharded cluster from a backup waiting queue sorted according to backup priority; determine whether the target sharded cluster can be backed up based on whether one or more shards belonging to the target sharded cluster can be backed up; and, based on the target sharded cluster being determined to be able to be backed up, command the one or more shards belonging to the target sharded cluster to perform a backup in which an operation log backup precedes a snapshot backup. The instructions stored in the memory of the recovery manager server may be configured to: command, based on receiving a recovery input for the target sharded cluster, that a snapshot of the target sharded cluster be set as the initial recovery state of the target sharded cluster; and command that the set initial recovery state of the target sharded cluster be adjusted based on the operation log.

FIG. 1 is a diagram for explaining the operation of a system for backup and recovery according to an embodiment.
FIG. 2 is a diagram for explaining a distributed database according to an embodiment.
FIG. 3 is a diagram for explaining a backup command operation for a sharded cluster according to an embodiment.
FIG. 4 is a diagram for explaining an operation of determining whether a target sharded cluster can be backed up according to an embodiment.
FIG. 5 is a diagram for explaining a backup command operation of a backup manager server according to an embodiment.
FIG. 6 is a diagram for explaining an operation of backing up a plurality of sharded clusters in parallel according to an embodiment.
FIG. 7 is a diagram for explaining a recovery operation of a backup and recovery system according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes changes, equivalents, or substitutes that fall within the technical idea described through the embodiments.

Terms such as "first" and "second" may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is referred to as being "connected" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related technology and, unless explicitly defined in this specification, should not be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

FIG. 1 is a diagram for explaining the operation of a system for backup and recovery according to an embodiment.

The input module 110 may transmit an input to the system 120 for backup and recovery. The input instructs the system 120 for backup and recovery to perform a specific operation (e.g., a backup command operation or a recovery command operation) and may include, for example, a backup input or a recovery input. The input module 110 according to an embodiment may be implemented using Jenkins.

The system 120 for backup and recovery is a system for performing backup and recovery of the distributed database 130, and may include a backup manager server 121, a recovery manager server 122, and a tracker server 123.

The backup manager server 121 may command the distributed database 130 to perform a backup based on receiving a backup input from the input module 110. The backup manager server 121 according to an embodiment may include a memory storing computer-executable instructions and a processor that accesses the memory and executes the instructions. As described later, the distributed database 130 that has been commanded to back up may perform the backup by transmitting backup data to the storage 140. The backup command operation of the backup manager server 121 is described later with reference to FIGS. 3 to 6.

The recovery manager server 122 may command the distributed database 130 to perform a recovery based on receiving a recovery input from the input module 110. The recovery manager server 122 according to an embodiment may include a memory storing computer-executable instructions and a processor that accesses the memory and executes the instructions. As described later, the distributed database 130 that has been commanded to recover may receive backup data from the storage 140 and be restored based on the received backup data. The recovery command operation of the recovery manager server 122 is described later with reference to FIG. 7.

The tracker server 123 may check the state of the distributed database 130 during a backup operation and/or a recovery operation and, when it detects an error, deliver information on the detected error to the backup manager server 121 (or the recovery manager server 122). The tracker server 123 may monitor for errors by checking the state of the backup operation and/or recovery operation of the distributed database 130 at a predetermined sampling period. For example, when the tracker server 123 detects that at least one sharded cluster of the distributed database 130 has failed to back up and/or recover, it may deliver information on the error to the backup manager server 121 (or the recovery manager server 122).

The distributed database 130 may have a database structure in which a single logical database is physically distributed across multiple computers on a network but is logically integrated and shared so that a client perceives it as one database. As described later with reference to FIG. 2, the distributed database 130 may include one or more sharded clusters, and each sharded cluster may include one or more shards.

The storage 140 may store backup data that a backup server has received from the distributed database 130. The storage 140 according to an embodiment may be implemented with a storage medium included in a data center.

FIG. 2 is a diagram for explaining a distributed database according to an embodiment.

The distributed database 200 according to an embodiment (e.g., the distributed database 130 of FIG. 1) may include one or a combination of two or more of a sharded cluster 210 and a single replica set 220.

Each sharded cluster may include one or more shards. For example, as shown in FIG. 2, the first sharded cluster 210 may include three shards 211, 212, and 213.

Although not explicitly shown in FIG. 2, each sharded cluster may further include, along with its one or more shards, a configuration server (config server) and a router server. The config server may store metadata about how the data stored in the sharded cluster is distributed. The router server may deliver commands from a user to each shard and transmit the results of those commands to the user. As an example, a sharded cluster may be implemented with MongoDB.

Each shard may include one primary and one or more secondaries. The primary and each secondary of a shard may be implemented as physically separate servers that serve the same data. For example, as shown in FIG. 2, the shard 211 may include a primary 211a, a first secondary 211b, and a second secondary 211c.

Of the primary and secondaries of a shard, the primary may change data by applying change commands (e.g., insert (create), update, delete) to the data. A secondary may not synchronize its data with the primary synchronously when the primary's data changes. Instead, the secondary's data may be changed asynchronously based on an operation log that records the operations applied on the primary. A checkpoint is an operation that writes data in memory to disk, and may refer to the point in time at which a node synchronizes its in-memory data with its on-disk data. As an example, checkpoints may be set to repeat with a period of 60 seconds. In other words, the operation log is a log for data consistency between the primary and the secondaries of a shard and may be used at every checkpoint. Data consistency may include the property that, within a shard, the data obtainable from the primary can also be obtained from a secondary's data together with the operation log.
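The asynchronous, log-based replication described above can be sketched with a toy in-memory model. The `Primary` and `Secondary` classes and the pull-based `catch_up` method are hypothetical names chosen for the example; they do not represent MongoDB's API or the disclosed implementation.

```python
class Primary:
    """Applies writes immediately and records each one in an operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = []

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    """Lags behind the primary until it replays the operation log."""

    def __init__(self):
        self.data = {}
        self.applied = 0        # number of log entries already replayed

    def catch_up(self, primary):
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)
```

Between calls to `catch_up`, the secondary's data may be stale, yet the primary's data can always be reconstructed from the secondary's data plus the unapplied log entries, which is the consistency property stated above.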

An operation log may have operation times that follow a logical clock. When a plurality of shards belong to a sharded cluster, the logical clock may represent a clock for synchronization among the shards. The logical clock can express the order of precedence among operations that occur in the sharded cluster. For example, a first operation log may include a first operation time, which is the time of a first operation expressed in the logical clock, and a second operation log may include a second operation time, which is the time of a second operation expressed in the logical clock. The ordering of the first and second operation times indicates the order of the first and second operations, but the difference between the first and second operation times may be independent of the difference between the times at which the first and second operations actually occurred. In other words, when the first operation time is smaller than the second operation time, it can only be determined that the first operation is applied before the second operation; how much earlier the first operation is applied may be unrelated to the first and second operation times themselves and to the difference between them.

A shard may adjust its clock by comparing the operation time included in an operation log with the shard's time (e.g., a time expressed by the shard's physical clock). For example, a shard may receive an operation command. The operation command may be delivered to the corresponding shard by the config server. The operation command may include an operation time expressed in the logical clock. The physical time of the shard that receives the operation command may be smaller than the operation time of the operation command. In that case, after the primary processes the operation command, the shard may change its physical time to a time greater than the operation time (for example, to the operation time of the operation command plus a predetermined value). In other words, the shard may adjust its time to match the order between the shard's time and the operation time of the operation command. By adjusting its time, the shard may be synchronized through the logical clock of the shard cluster. After applying the operation command on the primary, the shard may record an operation log that includes an operation time following the logical clock. Therefore, each time a shard of the shard cluster receives an operation command, the shard's time may be synchronized with the logical clock of the shard cluster. As a result, a shard of the shard cluster that has received at least one operation command may be in a state synchronized to the logical clock of the shard cluster.
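The clock adjustment described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Shard` class, the integer tick representation of time, and the `CLOCK_INCREMENT` constant (standing in for the "predetermined value") are assumptions introduced here.

```python
from dataclasses import dataclass, field

CLOCK_INCREMENT = 1  # assumed "predetermined value" added to the operation time


@dataclass
class Shard:
    clock: int = 0                              # shard time as an integer tick (assumption)
    oplog: list = field(default_factory=list)   # recorded (operation_time, operation) pairs

    def apply(self, operation: str, operation_time: int) -> None:
        """Apply an operation command, adjust the clock, and record the operation log."""
        # Applying the operation to the primary's data is omitted in this sketch.
        if self.clock < operation_time:
            # Move the shard's time past the operation time of the command.
            self.clock = operation_time + CLOCK_INCREMENT
        # Record the operation together with its logical-clock operation time.
        self.oplog.append((operation_time, operation))
```

The sketch leaves the shard's time unchanged when it is already at or beyond the operation time, since the description only specifies the case in which the shard's time is smaller.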

이하, 도 3 내지 도 7에서 하나 이상의 샤드 클러스터를 포함하는 분산 데이터베이스의 백업 및 복구를 위한 시스템의 동작 방법을 설명한다.Hereinafter, a method of operating a system for backing up and restoring a distributed database including one or more shard clusters will be described in FIGS. 3 to 7.

도 3은 일 실시예에 따른 샤드 클러스터에 대한 백업 명령 동작을 설명하기 위한 도면이다.Figure 3 is a diagram for explaining a backup command operation for a shard cluster according to an embodiment.

In step 310, the backup manager server may create a backup waiting queue sorted according to backup priority. The backup waiting queue may hold information on at least part of the distributed database, sorted according to backup priority. For example, the backup waiting queue may hold information on one or more shard clusters. The information on a shard cluster may include one or a combination of two or more of the identification (ID) of each shard cluster, the ID of each shard belonging to each shard cluster, or the geographical location of the secondary of each shard. The backup manager server may determine the backup priority of each shard cluster and may create the backup waiting queue based on the determined backup priority.

According to one embodiment, the backup manager server may determine the backup priority based on the average backup time length of the one or more shards belonging to a shard cluster. The backup manager server may calculate the average backup time length of the shards belonging to each shard cluster based on stored past backup time lengths (e.g., the backup time length of each shard in the immediately preceding backup operation). The backup manager server may create the backup waiting queue by sorting the one or more shard clusters (for example, in ascending or descending order) based on the average backup time length of their shards.
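As a rough sketch of this ordering, assuming the stored past backup time lengths are available as a mapping from a cluster ID to a list of per-shard durations in seconds (the `history` structure and the function names are hypothetical):

```python
from statistics import mean


def average_backup_seconds(history):
    """Average the stored per-shard backup durations of each shard cluster."""
    return {cluster_id: mean(durations) for cluster_id, durations in history.items()}


def order_by_average(history, descending=True):
    """Return cluster IDs sorted by average backup time length."""
    averages = average_backup_seconds(history)
    return sorted(averages, key=averages.get, reverse=descending)
```

For example, `order_by_average({"c1": [100, 200], "c2": [400], "c3": [50, 50]})` places `c2` first because its shards took longest on average.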

According to one embodiment, the backup manager server may determine the backup priority based further on the number of shards belonging to the shard cluster (also referred to as the size of the shard cluster), in addition to the average backup time length. The backup manager server may create the backup waiting queue by sorting an initial backup waiting queue by average backup time length and then adjusting the initial backup waiting queue based on the number of shards. For example, the backup manager server may create the initial backup waiting queue by sorting the shard clusters in descending order of average backup time length. In other words, the longer the average backup time length of a shard cluster, the higher the initial backup priority the backup manager server may assign to it. When the difference between the average backup time lengths of two or more shard clusters sorted adjacently in the initial backup waiting queue is less than or equal to a threshold time length, the backup manager server may create the backup waiting queue by re-sorting those adjacent shard clusters in descending order of shard cluster size. For example, the backup manager server may adjust an initial backup waiting queue that includes a first shard cluster and a second shard cluster whose average backup time lengths differ by no more than the threshold time length, by ordering the first shard cluster and the second shard cluster in descending order of size. In other words, the larger a shard cluster, the higher the backup priority to which the backup manager server may adjust it.
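The two-stage ordering above might look like the following sketch. The `Cluster` record and `build_waiting_queue` are hypothetical names, and treating a chain of adjacent clusters whose neighboring averages each differ by no more than the threshold as one group is one possible reading of "adjacently sorted", not something the description fixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    avg_backup_seconds: float
    shard_count: int  # the "size" of the shard cluster


def build_waiting_queue(clusters, threshold_seconds):
    """Sort by average backup time (descending), then break near-ties by size."""
    # Stage 1: initial queue, longest average backup time first.
    initial = sorted(clusters, key=lambda c: c.avg_backup_seconds, reverse=True)
    queue, run = [], []
    for cluster in initial:
        gap_too_large = (
            run and run[-1].avg_backup_seconds - cluster.avg_backup_seconds > threshold_seconds
        )
        if gap_too_large:
            # Stage 2: within a run of near-equal averages, larger clusters go first.
            queue.extend(sorted(run, key=lambda c: c.shard_count, reverse=True))
            run = []
        run.append(cluster)
    queue.extend(sorted(run, key=lambda c: c.shard_count, reverse=True))
    return queue
```

Because Python's sort is stable, clusters of equal size inside a run keep their order from the initial queue.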

According to one embodiment, the backup manager server may further create and/or manage, along with the backup waiting queue, one or a combination of two or more of a backup execution queue, a backup failure queue, or a backup completion queue. As described above, the backup waiting queue may include shard clusters (or information on shard clusters) whose backup has not yet started, and may be managed in a state sorted according to backup priority.

백업 실행 큐는, 백업 수행 중의 샤드 클러스터(또는 샤드 클러스터의 정보), 다시 말해, 백업이 시작되었으나 아직 종료되지 않은 샤드 클러스터(또는 샤드 클러스터의 정보)를 포함할 수 있다. 예시적으로, 백업 실행 큐는 샤드 클러스터의 백업 시작 시각에 따라 정렬된 상태로 관리될 수 있다. The backup execution queue may include a shard cluster (or information on a shard cluster) that is performing a backup, that is, a shard cluster (or information on a shard cluster) where a backup has started but has not yet ended. As an example, the backup execution queue can be managed in a sorted state according to the backup start time of the shard cluster.

The backup completion queue may include shard clusters (or information on shard clusters) whose backup has been completed according to the backup input, that is, shard clusters for which the backups of all shards belonging to the shard cluster have been completed according to the backup input. As an example, the backup completion queue may be managed in a state sorted according to the backup completion time of each shard cluster.

The backup failure queue may include shard clusters (or information on shard clusters) for which the backup manager server attempted a backup that then failed. In other words, when the backup manager server has determined that the target shard cluster can be backed up but the backup fails because of an error other than the backup-available time window condition or the number of available backup servers, the backup manager server may insert the shard cluster into the backup failure queue. For reference, when the backup manager determines that a shard cluster cannot be backed up because of the number of available backup servers or the backup-available time window condition, it may change the backup priority of the shard cluster and insert it back into the backup waiting queue so that whether it can be backed up is determined again later, without inserting the shard cluster into the backup failure queue.
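The distinction between re-queuing and recording a failure can be sketched as below. The `Outcome` enumeration and the `route` function are illustrative assumptions, and appending to the tail of the waiting queue stands in for changing the backup priority to the lowest value.

```python
from collections import deque
from enum import Enum, auto


class Outcome(Enum):
    NO_AVAILABLE_SERVER = auto()   # not enough available backup servers
    OUTSIDE_TIME_WINDOW = auto()   # outside the backup-available time window
    OTHER_ERROR = auto()           # backup was attempted and failed for another reason


def route(cluster_id, outcome, waiting, failed):
    """Place a shard cluster in the waiting queue or the failure queue."""
    if outcome in (Outcome.NO_AVAILABLE_SERVER, Outcome.OUTSIDE_TIME_WINDOW):
        # Not a failure: retry later, at the lowest priority (tail of the queue).
        waiting.append(cluster_id)
    else:
        # The backup was attempted but failed for some other reason.
        failed.append(cluster_id)
```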

The operations by which the backup manager server according to one embodiment manages the backup waiting queue, the backup execution queue, the backup completion queue, and the backup failure queue are described later in the relevant steps.

In step 320, based on receiving a backup input, the backup manager server may obtain information on a target shard cluster (target sharded cluster) from the backup waiting queue. The target shard cluster may refer to the shard cluster, among the one or more shard clusters indicated by the backup input, that the backup manager server will command to back up.

백업 입력은, 분산 데이터베이스의 적어도 일부(예: 하나 이상의 샤드 클러스터)에 대한 백업 동작을 수행할 것을 지시하는 입력을 포함할 수 있다. 백업 입력은 입력 모듈(예: 도 1의 입력 모듈(110))로부터 백업 및 복구를 위한 시스템(예: 도 1의 백업 및 복구를 위한 시스템(120))으로 전송될 수 있다. 백업 및 복구를 위한 시스템은, 입력 모듈로부터 백업 입력을 수신하는 것에 기초하여, 백업 입력을 백업 매니저 서버(예: 도 1의 백업 매니저 서버(121))로 전달할 수 있다. The backup input may include an input instructing to perform a backup operation on at least a portion of the distributed database (eg, one or more shard clusters). The backup input may be transmitted from an input module (e.g., input module 110 in FIG. 1) to a system for backup and recovery (e.g., system for backup and recovery 120 in FIG. 1). The system for backup and recovery may transmit the backup input to the backup manager server (eg, the backup manager server 121 in FIG. 1) based on receiving the backup input from the input module.

According to one embodiment, the backup input may indicate at least a portion of the distributed database as the target of the backup, and the system for backup and recovery may, based on receiving the backup input, command a backup of the at least a portion of the distributed database indicated by the backup input. According to one embodiment, the backup input may not specify the target of the backup, and the system for backup and recovery may, based on receiving a backup input that does not specify a target, command a backup of the entire distributed database.

단계(330)에서, 백업 매니저 서버는 대상 샤드 클러스터에 속하는 하나 이상의 샤드의 백업 가능 여부에 기초하여, 대상 샤드 클러스터의 백업 가능 여부를 결정할 수 있다. 백업 매니저 서버는 대상 샤드 클러스터에 속하는 하나 이상의 샤드가 모두 백업 가능한 것에 기초하여 대상 샤드 클러스터가 백업 가능한 것으로 결정할 수 있다. 다시 말해, 대상 샤드 클러스터는 대상 샤드 클러스터에 속하는 샤드 모두가 백업 가능한 경우에 백업 가능한 것으로 결정될 수 있다. In step 330, the backup manager server may determine whether the target shard cluster can be backed up based on whether one or more shards belonging to the target shard cluster can be backed up. The backup manager server may determine that the target shard cluster is capable of being backed up based on all of one or more shards belonging to the target shard cluster being capable of being backed up. In other words, the target shard cluster may be determined to be capable of being backed up if all shards belonging to the target shard cluster are capable of being backed up.

일 실시예에 따르면, 백업 매니저 서버는 샤드의 백업 가능 여부를 샤드의 세컨더리 및 해당 세컨더리에 매핑된 가용 백업 서버에 기초하여 판단할 수 있다. 대상 샤드 클러스터의 백업 가능 여부 결정 동작은 도 4에서 후술한다.According to one embodiment, the backup manager server may determine whether a shard can be backed up based on the shard's secondary and the available backup servers mapped to the secondary. The operation of determining whether a target shard cluster can be backed up is described later in FIG. 4.

The description above has mainly addressed the backup manager server determining whether the target shard cluster can be backed up based on whether its shards can be backed up, but embodiments are not limited thereto. For example, the backup manager server may determine whether the target shard cluster can be backed up based on whether its shards can be backed up and/or on a backup-available time window.

According to one embodiment, the backup manager server may determine whether the target shard cluster can be backed up based on the backup-available time window corresponding to the target shard cluster. For example, the backup manager server may obtain the backup-available time window corresponding to each shard cluster. The backup-available time window may be, for example, a time window predetermined for each shard cluster during which a backup of that shard cluster can be performed. When the time at which the determination is made falls within the backup-available time window of the target shard cluster, the backup manager server may determine that the target shard cluster can be backed up. When that time does not fall within the backup-available time window of the target shard cluster, the backup manager server may determine that the target shard cluster cannot be backed up.

According to one embodiment, the backup manager server may determine whether the target shard cluster can be backed up based on both whether its shards can be backed up and the backup-available time window. For example, the backup manager server may determine that the target shard cluster can be backed up based on all shards belonging to the target shard cluster being able to be backed up and the time at which the determination is made falling within the backup-available time window corresponding to the target shard cluster. The backup manager server may determine that the target shard cluster cannot be backed up based on at least one of the shards belonging to the target shard cluster not being able to be backed up. The backup manager server may also determine that the target shard cluster cannot be backed up when the time at which the determination is made does not fall within the backup-available time window of the target shard cluster.
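The combined condition can be illustrated with a short sketch. The function names are hypothetical, the window is assumed to be a pair of wall-clock times, and the handling of a window that wraps past midnight is an added assumption that the description does not address.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` lies inside the backup-available time window."""
    if start <= end:
        return start <= now <= end
    # Assumed behavior: a window such as 23:00-04:00 wraps past midnight.
    return now >= start or now <= end


def cluster_can_back_up(shard_ok, now, window):
    """All shards must be able to back up and `now` must be inside the window."""
    start, end = window
    return all(shard_ok) and in_window(now, start, end)
```

A single shard that cannot be backed up, or a determination time outside the window, is enough to make the whole shard cluster not backup-capable.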

단계(340)에서, 백업 매니저 서버는 대상 샤드 클러스터가 백업 가능한 것으로 결정되는 것에 기초하여, 대상 샤드 클러스터에 속하는 하나 이상의 샤드에게 백업을 명령할 수 있다. 백업 매니저 서버는, 스냅샷 백업(snapshot backup)에 선행하여 연산 로그 백업(operation log backup)을 수행함으로써 백업을 수행하도록 명령할 수 있다. In step 340, the backup manager server may order a backup of one or more shards belonging to the target shard cluster based on the target shard cluster being determined to be capable of backup. The backup manager server may command to perform a backup by performing an operation log backup prior to a snapshot backup.

Operation log backup may refer to a backup for storing the operation log in which operations on the primary are recorded. As described above with reference to FIG. 2, the operation log may include a log for data replication between the primary and the secondary of a shard, and the primary's data may be obtained by applying the operation log to the secondary's data. In addition, the shards of a shard cluster may be synchronized to the logical clock of the shard cluster through the operation times of the operation log.

스냅샷 백업은, 분산 데이터베이스의 적어도 일부(예: 하나 이상의 샤드 클러스터)에 대한 전체적인 이미지(image)를 스냅샷으로 저장하기 위한 백업을 의미할 수 있다. 스냅샷은, 특정 시점에서의 데이터베이스(예: 샤드 클러스터의 적어도 일부)의 상태를 보존하고 이후에 스냅샷에 기초하여 데이터베이스를 복구하기 위하여 고안된 개념으로서, 안정성을 중요시하는 환경에서 사용될 수 있다. 예시적으로, 스냅샷 백업은 LVM(logical volume manager)과 같은 볼륨 관리자에서 자체적으로 제공되는 스냅샷을 통해 구현될 수 있다.Snapshot backup may refer to a backup for storing the entire image of at least part of a distributed database (e.g., one or more shard clusters) as a snapshot. A snapshot is a concept designed to preserve the state of a database (e.g., at least part of a shard cluster) at a specific point in time and later restore the database based on the snapshot, and can be used in environments where stability is important. As an example, snapshot backup can be implemented through snapshots provided by a volume manager such as a logical volume manager (LVM).

A snapshot backup stores the data of a shard, in particular of a secondary of the shard, as of the point in time at which the snapshot backup was performed, and an operation log backup may additionally be required for synchronization to the logical clock between shard clusters and/or for recovery of the primary's data. The backup command operation including the operation log backup and the snapshot backup is described later with reference to FIG. 5.

도 5에서 후술하겠으나, 샤드는 백업 매니저 서버로부터 백업을 명령받는 것에 기초하여, 백업 데이터를 데이터 센터의 백업 서버에게 전송함으로써 백업을 수행할 수 있다. 백업 데이터는 연산 로그 또는 스냅샷 중 하나 또는 둘 이상의 조합을 포함할 수 있다.As will be described later in FIG. 5, the shard can perform backup by transmitting backup data to the backup server in the data center based on receiving a backup command from the backup manager server. Backup data may include one or a combination of two or more of operation logs or snapshots.

도 2에서 전술한 바와 같이, 샤드는 라우터로부터 연산 명령을 수신하면서 논리적 클럭으로 동기화될 수 있다. 백업 매니저 서버는 연산 로그 백업을 먼저 수행하고 그 이후에 스냅샷 백업을 수행함으로써, 샤드 클러스터의 샤드들은 연산 명령을 수신함에 따라 논리적 클럭이 동기화된 상태에서 스냅샷 백업을 수행하도록 명령할 수 있다.As described above in FIG. 2, shards can be synchronized to a logical clock while receiving operation instructions from the router. By performing operation log backup first and then snapshot backup, the backup manager server can command shards in a shard cluster to perform snapshot backup with their logical clocks synchronized as they receive operation commands.

The backup manager server according to one embodiment may update one or a combination of two or more of the backup waiting queue or the backup execution queue based on commanding the target shard cluster to back up. For example, when commanding the target shard cluster to back up, the backup manager server may delete the target shard cluster (or information on the target shard cluster) from the backup waiting queue. In addition, when commanding the target shard cluster to back up, the backup manager server may insert the target shard cluster (or information on the target shard cluster) into the backup execution queue, thereby managing the shard clusters whose backup is in progress through the backup execution queue.

In step 350, based on determining that the target shard cluster cannot be backed up, the backup manager server may update the backup waiting queue. The backup manager server may change the backup priority of the target shard cluster that cannot be backed up and re-sort the backup waiting queue based on the changed backup priority. For example, based on determining that the target shard cluster cannot be backed up, the backup manager server may change the backup priority of the target shard cluster to the lowest priority among the shard clusters in the backup waiting queue and insert the target shard cluster at the last position of the backup waiting queue.

In step 360, based on the backup of at least one shard belonging to the target shard cluster having finished, the backup manager server may determine whether a new target shard cluster can be backed up. As described later with reference to FIG. 4, each backup server performs the backup of a shard, so when the backup of one shard has finished, the backup server that performed it may be treated as an available backup server. In other words, even before the backups of all shards belonging to the shard cluster have finished, the number of available backup servers may increase when the backup of one shard finishes. Therefore, even if the backup of the shard cluster as a whole has not finished, the backup manager server may obtain information on a new target shard cluster from the backup waiting queue based on the backup of one shard having finished, and may determine whether the new target shard cluster can be backed up.
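Steps 350 and 360 together suggest an event-driven loop, which the following sketch illustrates under several assumptions: available backup servers are counted per region in a dictionary, `can_back_up` is a caller-supplied predicate, and only the head of the waiting queue is examined per event. None of these names come from the description.

```python
from collections import deque


def on_shard_backup_finished(region, available_servers, waiting, can_back_up):
    """Free one backup server, then re-evaluate the head of the waiting queue.

    Returns the cluster ID that may now start its backup, or None.
    """
    # The server that just finished a shard backup becomes available again.
    available_servers[region] = available_servers.get(region, 0) + 1
    if not waiting:
        return None
    candidate = waiting.popleft()
    if can_back_up(candidate, available_servers):
        return candidate
    # Step 350: not backup-capable yet, so move it to the last position.
    waiting.append(candidate)
    return None
```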

According to one embodiment, although not explicitly shown in FIG. 3, the backup manager server may update one or a combination of two or more of the backup execution queue or the backup completion queue based on the backups of all shards belonging to the target shard cluster having finished. For example, the backup manager server may delete the information on the target shard cluster from the backup execution queue. The backup manager server may insert the target shard cluster (or information on the target shard cluster) into the backup completion queue.
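The movement of a shard cluster from the waiting queue through the execution queue to the completion queue could be tracked as in this sketch. `BackupTracker` and its method names are hypothetical, and the per-cluster set of pending shards is an assumed bookkeeping detail.

```python
class BackupTracker:
    """Tracks shard clusters across the waiting, execution, and completion queues."""

    def __init__(self, waiting=None):
        self.waiting = list(waiting or [])  # ordered by backup priority
        self.running = {}                   # cluster ID -> shards still backing up
        self.completed = []                 # ordered by completion

    def start(self, cluster_id, shard_ids):
        """Backup commanded: move the cluster from waiting to execution."""
        self.waiting.remove(cluster_id)
        self.running[cluster_id] = set(shard_ids)

    def shard_finished(self, cluster_id, shard_id):
        """Record one finished shard; return True once the whole cluster is done."""
        pending = self.running[cluster_id]
        pending.discard(shard_id)
        if pending:
            return False
        # All shards finished: move the cluster from execution to completion.
        del self.running[cluster_id]
        self.completed.append(cluster_id)
        return True
```

Since a Python dictionary preserves insertion order, `running` stays ordered by backup start, matching the ordering suggested for the backup execution queue.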

도 4는 일 실시예에 따른 대상 샤드 클러스터의 백업 가능 여부를 결정하는 동작을 설명하기 위한 도면이다.FIG. 4 is a diagram illustrating an operation of determining whether a target shard cluster can be backed up according to an embodiment.

단계(410)에서, 백업 매니저 서버는 샤드의 하나 이상의 세컨더리 중에서, 백업을 위해 이용될 세컨더리를 선택할 수 있다. 예를 들어, 백업 매니저 서버는 분산 데이터베이스에 백업 동작이 미치는 영향을 고려하여 백업을 위해 이용될 세컨더리를 선택할 수 있다. 백업 매니저 서버는 세컨더리의 물리적 위치에 기초하여 백업을 위해 이용될 세컨더리를 선택할 수도 있다. 세컨더리 및 스토리지 간의 거리가 짧을수록 데이터 센터 간의 트래픽이 적게 발생할 수 있다. 예시적으로, 데이터 센터의 백업 서버 및/또는 스토리지(예: 도 1의 스토리지(140))가 A 지역에 존재하고, 제1 세컨더리는 A 지역에 존재하고 제2 세컨더리는 A 지역과 다른 B 지역에 존재하는 경우, 백업 매니저 서버는 백업 서버 및/또는 스토리지와 같은 지역(예: A 지역)에 존재하는 제1 세컨더리를 백업을 위해 이용될 세컨더리로 선택할 수 있다.At step 410, the backup manager server may select a secondary to be used for backup, from among one or more secondary of the shard. For example, the backup manager server can select the secondary to be used for backup by considering the impact of the backup operation on the distributed database. The backup manager server may select the secondary to be used for backup based on the physical location of the secondary. The shorter the distance between secondary and storage, the less traffic can occur between data centers. Illustratively, the data center's backup servers and/or storage (e.g., storage 140 in FIG. 1) exist in region A, the first secondary resides in region A, and the second secondary resides in region B, which is different from region A. If present, the backup manager server may select the first secondary that exists in the same area as the backup server and/or storage (e.g., area A) as the secondary to be used for backup.

예를 들어, 대상 샤드 클러스터가 3개의 샤드들(예: 제1 샤드, 제2 샤드, 및 제3 샤드)을 가질 수 있다. 각 샤드는 1개의 프라이머리 및 2개의 세컨더리들을 포함할 수 있다. 백업 매니저 서버는 A 지역에 적어도 하나의 세컨더리가 위치하는 경우 A 지역의 세컨더리를 선택하고, A 지역에 세컨더리가 위치하지 않는 경우 A 지역으로부터 가장 가까운 세컨더리를 선택할 수 있다. 예시적으로, 백업 서비스 매니저는 제1 샤드에 관하여 A 지역의 세컨더리, 제2 샤드에 관하여 B 지역의 세컨더리, 및 제3 샤드에 관하여 A 지역의 세컨더리를 백업에 이용될 세컨더리로 선택할 수 있다.For example, a target shard cluster may have three shards (eg, a first shard, a second shard, and a third shard). Each shard can include one primary and two secondary. If at least one secondary is located in area A, the backup manager server may select the secondary in area A, and if no secondary is located in area A, the backup manager server may select the secondary closest to area A. Illustratively, the backup service manager may select the secondary in region A for the first shard, the secondary in region B for the second shard, and the secondary in region A for the third shard as the secondary to be used for backup.
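The selection rule in this example can be sketched as follows, assuming each secondary is given as a `(name, region)` pair and that distances from the backup region are supplied in a lookup table. The function name, the data shapes, and the distance values are illustrative only.

```python
def select_secondary(secondaries, backup_region, distance_km):
    """Pick the secondary to use for the backup of one shard.

    secondaries: list of (name, region) pairs for the shard.
    distance_km: mapping from a region to its distance from backup_region.
    """
    # Prefer a secondary located in the same region as the backup server/storage.
    for name, region in secondaries:
        if region == backup_region:
            return name
    # Otherwise fall back to the secondary closest to the backup region.
    return min(secondaries, key=lambda s: distance_km[s[1]])[0]
```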

단계(420)에서, 백업 매니저 서버는 가용 백업 서버 개수에 기초하여, 샤드의 백업 가능 여부를 결정할 수 있다. 백업 매니저 서버는, 세컨더리에 매핑된 데이터 센터의 가용 백업 서버 개수에 기초하여, 해당 샤드의 백업 가능 여부를 결정할 수 있다. 일 실시예에 따르면, 각 세컨더리는 지리적 위치에 기초하여 데이터 센터와 매핑될 수 있다. 예시적으로, 세컨더리는 가장 인접한 위치에 존재하는 데이터 센터에 매핑될 수 있다.In step 420, the backup manager server may determine whether a shard can be backed up based on the number of available backup servers. The backup manager server can determine whether backup of the corresponding shard is possible based on the number of available backup servers in the data center mapped to the secondary. According to one embodiment, each secondary may be mapped to a data center based on geographic location. Illustratively, the secondary may be mapped to the data center that exists in the closest location.

For example, the backup service manager may select, as the secondaries to be used for backup, the secondary in region A for the first shard, the secondary in region B for the second shard, and the secondary in region A for the third shard. The data center in region A may have one available backup server, and the data center in region B may have two available backup servers. Based on the data center in region A having one available backup server while two secondaries in region A are to be used for backup, the backup service manager may determine that the first shard and the third shard cannot be backed up. Based on the data center in region B having two available backup servers while one secondary in region B is to be used for backup, the backup service manager may determine that the second shard can be backed up.

단계(430)에서, 백업 매니저 서버는 대상 샤드 클러스터에 속하는 하나 이상의 샤드가 모두 백업 가능한 것으로 결정되는 것에 기초하여, 샤드 클러스터가 백업 가능한 것으로 결정할 수 있다.In step 430, the backup manager server may determine that the shard cluster is capable of being backed up, based on determining that one or more shards belonging to the target shard cluster are all capable of being backed up.

단계(440)에서, 백업 매니저 서버는 대상 샤드 클러스터에 속하는 적어도 하나의 샤드가 백업 가능하지 않은 것으로 결정되는 것에 기초하여, 상기 대상 샤드 클러스터가 백업 가능하지 않은 것으로 결정할 수 있다.In step 440, the backup manager server may determine that the target shard cluster is not capable of backup based on at least one shard belonging to the target shard cluster being determined to be not capable of backup.

전술한 바와 같이, 샤드 클러스터에 속하는 샤드가 모두 백업 가능한 경우에, 해당 샤드 클러스터는 백업 가능할 수 있다. 다시 말해, 샤드 클러스터에 속하는 샤드 중 적어도 하나라도 백업 가능하지 않은 경우, 해당 샤드 클러스터는 백업 가능하지 않을 수 있다. 백업 매니저 서버는 백업 가능하지 않은 것으로 결정된 샤드 클러스터를 백업 대기 큐로 다시 삽입하여 이후에 백업 가능 여부를 다시 결정할 수 있다.As described above, if all shards belonging to a shard cluster are capable of being backed up, the corresponding shard cluster may be capable of being backed up. In other words, if at least one of the shards belonging to a shard cluster is not capable of being backed up, the corresponding shard cluster may not be capable of being backed up. The backup manager server can reinsert a shard cluster that has been determined as not available for backup into the backup waiting queue and re-determine whether or not it is possible to back up later.
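The decision of FIG. 4 can be illustrated with the numbers from the example above (regions A and B, three shards). This is a sketch under the assumption that each selected secondary is identified only by its region and that a data center's shards are backup-capable only when it has at least as many available backup servers as selected secondaries mapped to it. The function names are hypothetical.

```python
from collections import Counter


def shards_can_back_up(selected_region, available_servers):
    """Map each shard to whether its data center has enough available backup servers.

    selected_region: shard ID -> region of the secondary selected for backup.
    available_servers: region -> number of available backup servers.
    """
    demand = Counter(selected_region.values())  # selected secondaries per region
    return {
        shard: available_servers.get(region, 0) >= demand[region]
        for shard, region in selected_region.items()
    }


def cluster_has_capacity(selected_region, available_servers):
    """Steps 430 and 440: the cluster is backup-capable only if every shard is."""
    return all(shards_can_back_up(selected_region, available_servers).values())
```

With `{"shard1": "A", "shard2": "B", "shard3": "A"}` and one server in A and two in B, the first and third shards are not backup-capable, so the shard cluster as a whole is not backup-capable either.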

FIG. 5 is a diagram illustrating a backup command operation of a backup manager server according to an embodiment.

In step 510, the backup manager server may command the shards belonging to the target shard cluster to start their backups at similar times. The backup manager server may command the backup so that the backup start times of the one or more shards fall within a predetermined time range.

If the backup start times of the one or more shards fall within a predetermined time range, a smaller amount of operation log may be needed to recover the shard cluster than otherwise. In other words, when the shards' backup start times are similar, the largest of the differences between each shard's backup start time and the recovery time can be relatively small, and the data at the recovery time can be restored from the snapshot even with an operation log stored over a short span of time. The recovery time may be indicated by the recovery input, or may be the latest time recoverable from the backup data. Conversely, if the backup start times of the shards do not fall within the predetermined time range (or differ by a relatively large amount), a larger amount of operation log may be required to recover from the snapshot to the data at the recovery time. By commanding the backup so that the shards' backup start times fall within the predetermined time range, the backup manager server according to an embodiment can back up a smaller amount of operation log for recovering the shard cluster to the recovery time than it would if the backup start times were not within that range.
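The relationship between start-time spread and the amount of operation log can be made concrete with a small sketch. The time values, the window of 5 units, and both function names are hypothetical; the span computed here is simply the largest gap between any shard's backup start time and the recovery time, used as a stand-in for "amount of operation log".

```python
def within_window(start_times, window):
    # True when all backup start times fall inside one window of the given size.
    return max(start_times) - min(start_times) <= window

def oplog_span_needed(start_times, recovery_time):
    # The shard that started earliest needs the longest stretch of
    # operation log to be replayed up to the recovery time.
    return recovery_time - min(start_times)

tight = [98, 100, 99]    # shards started at nearly the same time
spread = [40, 100, 70]   # shards started far apart
recovery_time = 130

tight_span = oplog_span_needed(tight, recovery_time)
spread_span = oplog_span_needed(spread, recovery_time)
```

With these assumed numbers the tightly grouped starts need 32 time units of log, while the widely spread starts need 90.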

In step 520, the backup manager server may command the one or more shards belonging to the target shard cluster to back up the operation log. As described above with reference to FIG. 3, the backup manager server may command that the operation log backup be performed before the snapshot backup. As described above with reference to FIG. 2, the operation log is a log for data replication between the primary and the secondaries of a shard, and may carry operation times that follow a logical clock.

According to an embodiment, the backup manager server may command the one or more shards belonging to the target shard cluster to back up the operation log. On being commanded by the backup manager server to back up the operation log, a shard may back up the operation log by sending it to a backup server of the data center (e.g., a backup server of the data center mapped to the secondary selected for backup). The backup server may store the operation log received from the shard in storage. Note that the shard sends the operation log directly to the data center's backup server, and the operation log backup need not pass through an agent between the shard and the data center's backup server.

In step 530, the backup manager server may command the one or more shards belonging to the target shard cluster to perform a snapshot backup after a predetermined length of time has elapsed from the start time of the operation log backup. The backup manager server may command a snapshot backup that uses a snapshot of each shard's secondary. A shard may perform the snapshot backup by sending a snapshot of the secondary selected for backup to a backup server of the data center (e.g., a backup server of the data center mapped to the secondary selected for backup). The backup server may store the snapshot received from the shard in storage. Note that, as with the operation log backup, the shard sends the snapshot of the secondary directly to the data center's backup server, and the snapshot backup need not pass through an agent between the shard and the data center's backup server.

According to an embodiment, the length of time from the operation log backup start time to the snapshot backup start time may be set equal to or longer than the checkpoint period. From the primary's data at a checkpoint, in other words, from a snapshot of the secondary obtained after that checkpoint, the primary's data at the recovery time can be restored using the operation log, up to the same point in time across the shard cluster.
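The ordering constraint between the two backups can be sketched as a small planning helper. The `BackupPlan` type, the `plan_backup` function, and the numeric values are hypothetical; the only rule taken from the text is that the snapshot backup starts no sooner than one checkpoint period after the operation log backup starts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupPlan:
    oplog_start: float      # when the operation log backup begins
    snapshot_start: float   # when the snapshot backup begins

def plan_backup(oplog_start, checkpoint_period, delay=None):
    """Schedule the snapshot backup after the operation log backup.

    `delay` defaults to the checkpoint period (an assumed default) and
    must never be shorter than it.
    """
    if delay is None:
        delay = checkpoint_period
    if delay < checkpoint_period:
        raise ValueError("delay must be at least the checkpoint period")
    return BackupPlan(oplog_start, oplog_start + delay)

plan = plan_backup(oplog_start=1000.0, checkpoint_period=60.0)
```

Rejecting a shorter delay keeps the operation log backup window covering at least one full checkpoint before the snapshot is taken.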

The backup manager server according to an embodiment performs backup by having backup data sent from the shard to the data center's backup server without passing through an agent between the two, so even when many backup servers are used for backup at the same time, a bottleneck caused by insufficient agent throughput can be prevented.

FIG. 6 is a diagram illustrating an operation of backing up a plurality of shard clusters in parallel according to an embodiment.

In step 610, based on the target shard cluster being determined to be backupable, the backup manager server may determine whether another shard cluster can be backed up in parallel with the target shard cluster. When the target shard cluster and another shard cluster can be backed up in parallel, backing them up together can maximize the utilization of the available backup servers and shorten the overall backup time.

According to an embodiment, based on the target shard cluster being determined to be backupable, the backup manager server may change the number of available backup servers according to the backup servers to be used for backing up the target shard cluster. For example, if two backup servers of the data center in region A are used to back up the target shard cluster, the backup manager server may reduce the number of available backup servers of the data center in region A by 2. After that, the backup manager server may obtain, from the backup waiting queue, information of a shard cluster other than the target shard cluster. In other words, the backup manager server may obtain information indicating another shard cluster from the backup waiting queue. The backup manager server may determine whether the other shard cluster can be backed up based on the changed number of available backup servers.

The backup manager server may determine whether the other shard cluster can be backed up in a manner similar to the operation, described above with reference to FIG. 4, of determining whether the target shard cluster can be backed up.

In step 620, based on determining that the other shard cluster can be backed up in parallel with the target shard cluster, the backup manager server may command that the target shard cluster and the other shard cluster be backed up in parallel. On being commanded by the backup manager server to back up in parallel, the target shard cluster and the other shard cluster may perform the backup by sending backup data to the backup server assigned to each shard belonging to the target shard cluster and to each shard belonging to the other shard cluster.

In step 630, based on determining that the other shard cluster can be backed up in parallel with the target shard cluster, the backup manager server may delete the information of the target shard cluster and the information of the other shard cluster from the backup waiting queue.

According to an embodiment, when the backup manager server manages a backup execution queue, it may insert the information of the target shard cluster and the information of the other shard cluster into the backup execution queue.

In step 640, based on determining that the other shard cluster cannot be backed up in parallel with the target shard cluster, the backup manager server may delete the information of the target shard cluster from the backup waiting queue. The backup manager server may keep the information of the other shard cluster in the backup waiting queue. The backup manager server may command the backup of the target shard cluster and, later, when the number of available backup servers has increased and the other shard cluster becomes backupable, command the backup of the other shard cluster.
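Steps 610 to 640 can be sketched as a single admission pass over the waiting queue. The function name `admit_parallel`, the per-cluster demand dictionaries, and the cluster names are hypothetical. The source describes checking one other shard cluster after the target; continuing the scan over further clusters, as done here, is an assumed generalization.

```python
from collections import Counter

def admit_parallel(waiting, demand_by_cluster, available):
    """Decide which queued clusters can be backed up in parallel.

    waiting:           cluster names in priority order (target first)
    demand_by_cluster: cluster -> {region: backup servers needed}
    available:         region -> available backup servers
    Returns (admitted, still_waiting, remaining_servers).
    """
    free = Counter(available)
    admitted, still_waiting = [], []
    for cluster in waiting:
        need = demand_by_cluster[cluster]
        if all(free[region] >= n for region, n in need.items()):
            free.subtract(need)          # reserve servers for this cluster
            admitted.append(cluster)     # leaves the waiting queue
        else:
            still_waiting.append(cluster)  # stays in the waiting queue
    return admitted, still_waiting, dict(free)

demand = {
    "target": {"A": 2},
    "other1": {"A": 1, "B": 1},
    "other2": {"B": 2},
}
admitted, still_waiting, remaining = admit_parallel(
    ["target", "other1", "other2"], demand, {"A": 2, "B": 2}
)
```

In this run the target uses both region A servers, so `other1` stays queued, while `other2` fits entirely in region B and is backed up in parallel with the target.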

FIG. 7 is a diagram illustrating a recovery operation of a backup and recovery system according to an embodiment.

Hereinafter, a recovery command operation for at least a portion of a distributed database (e.g., the distributed database 130 of FIG. 1), performed by a recovery manager server (e.g., the recovery manager server 122 of FIG. 1), is described. Although the recovery command operation is mainly described as being performed by a recovery manager server that is independent of the backup manager server performing the backup command operation, the disclosure is not limited to this, and the recovery command operation may be performed by the backup manager server that performs the backup command operation (e.g., the backup command operation described with reference to FIGS. 3 to 6). In other words, the recovery manager server may be the same server as the backup manager server. Alternatively, the backup manager server and the recovery manager server may be implemented as a single server that performs both the backup command operation and the recovery command operation.

In step 710, based on receiving a recovery input for the target shard cluster from an input module (e.g., the input module 110 of FIG. 1), the recovery manager server may command that a snapshot of the target shard cluster be set as the initial recovery state of the target shard cluster in order to recover it.

The recovery input may include a command instructing that a recovery command operation be performed on at least a portion of the distributed database (e.g., one or more shard clusters). The recovery input may be provided from an input module (e.g., the input module 110 of FIG. 1) to a system for backup and recovery (e.g., the system 120 for backup and recovery of FIG. 1). Based on receiving a recovery command from the input module, the system for backup and recovery may forward the recovery input to a recovery manager server (e.g., the recovery manager server 122 of FIG. 1).

According to an embodiment, the recovery input indicates at least a portion of the distributed database as the target of recovery, and the system for backup and recovery, based on receiving the recovery input, may command recovery of the at least a portion of the distributed database indicated by the recovery input. According to an embodiment, the recovery input may not specify a target of recovery, and the system for backup and recovery, based on receiving a recovery input that does not specify a target, may command recovery of the entire distributed database.

Based on receiving the recovery input from the input module, the recovery manager server may command recovery of the one or more shard clusters indicated by the recovery input. On being commanded by the recovery manager server to recover, a shard cluster may receive the backup data stored in storage from the backup server. The recovery manager server may command the one or more shards belonging to the shard cluster to set the shard's initial recovery state based on the snapshot in the backup data. The recovery manager server may restore the overall image of a shard through the snapshot of that shard's secondary. However, until it is adjusted based on the operation log, the initial recovery state of the target shard cluster may hold the secondary's data rather than the primary's. As described below, adjusting the initial recovery state of the target shard cluster based on the operation log allows the target shard cluster to be restored with the same data as the primary.

In step 720, the recovery manager server may command that the initial recovery state set for the target shard cluster be adjusted based on the operation log. The recovery manager server may command a shard to recover its data by adjusting the shard's initial recovery state based on the operation log. By using the operation log to adjust the initial recovery state that was set from the snapshot, the recovery manager server may restore the data of the target shard cluster from the secondary's data in the snapshot to the primary's data. For example, the recovery manager server may select the operation log entries for the operations to be applied to the snapshot of the secondary. By applying the operations corresponding to the selected entries to the snapshot, the recovery manager server may restore the shard cluster to a single common point in time. The recovery manager server may select the operation log entries to be applied to the snapshot by comparing the operation times in the operation log with the reference time of the snapshot. By applying the selected operation log entries to the shard's snapshot, the recovery manager server may recover a shard that holds the primary's data.
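Steps 710 and 720 can be sketched for a single shard as follows. The data model is an assumption made for illustration: the snapshot is a plain dictionary, each operation log entry is an `(op_time, key, value)` tuple with integer logical-clock times, and every operation is a simple overwrite. The function name `recover_shard` and the upper bound at `recovery_time` are likewise hypothetical.

```python
def recover_shard(snapshot, snapshot_time, oplog, recovery_time):
    """Rebuild a shard's data at `recovery_time` (illustrative sketch).

    Step 710: start from the snapshot as the initial recovery state.
    Step 720: select operation log entries newer than the snapshot's
    reference time, up to the recovery time, and apply them in order.
    """
    state = dict(snapshot)  # initial recovery state (secondary's data)
    selected = sorted(
        (entry for entry in oplog if snapshot_time < entry[0] <= recovery_time),
        key=lambda entry: entry[0],
    )
    for _, key, value in selected:
        state[key] = value  # assumed operation type: overwrite a key
    return state

snapshot = {"a": 1, "b": 2}
oplog = [(8, "a", 0), (12, "a", 5), (15, "c", 7), (21, "b", 9)]
restored = recover_shard(snapshot, snapshot_time=10, oplog=oplog, recovery_time=20)
```

The entry at time 8 is already reflected in the snapshot and the entry at time 21 lies past the recovery time, so only the entries at times 12 and 15 are applied. Using the same `recovery_time` for every shard is what brings the whole cluster to one common point in time.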

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose or special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or all possible combinations of them.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

A method performed by a backup manager server, the method comprising:
obtaining, based on receiving a backup input for one or more shard clusters, information of a target shard cluster from a backup waiting queue sorted according to backup priority;
determining whether the target shard cluster can be backed up based on whether one or more shards belonging to the target shard cluster can be backed up; and
commanding, based on the target shard cluster being determined to be capable of backup, a backup to the one or more shards belonging to the target shard cluster such that an operation log backup is performed prior to a snapshot backup.

The method of claim 1,
wherein obtaining the information of the target shard cluster comprises:
creating the backup waiting queue sorted according to a backup priority determined based on one or a combination of two or more of the average backup time length of the shards belonging to a shard cluster or the number of shards belonging to the shard cluster.

The method of claim 1,
wherein determining whether the target shard cluster can be backed up comprises:
selecting, from among one or more secondaries of a shard, a secondary to be used for backup based on the physical location of the secondary.

The method of claim 1,
wherein determining whether the target shard cluster can be backed up comprises:
determining whether a shard belonging to the target shard cluster can be backed up based on the number of available backup servers in the data center mapped to the secondary of that shard.

The method of claim 1,
wherein determining whether the target shard cluster can be backed up comprises at least one of:
determining that the target shard cluster is capable of backup, based on determining that all of the one or more shards belonging to the target shard cluster are capable of backup; or
determining that the target shard cluster is not capable of backup, based on determining that at least one shard belonging to the target shard cluster is not capable of backup.

The method of claim 1,
wherein commanding the backup comprises:
commanding the backup so that the backup start times of the one or more shards fall within a predetermined time range.

The method of claim 1,
wherein commanding the backup comprises:
commanding a backup of an operation log for data replication between the primary and the secondary of a shard.

The method of claim 7,
wherein commanding the backup of the operation log comprises:
commanding a backup of an operation log having operation times that follow a logical clock of the target shard cluster.

The method of claim 1,
wherein commanding the backup comprises:
commanding, after a predetermined length of time has elapsed from the start time of the operation log backup, a snapshot backup using a snapshot of the secondary of each shard.

The method of claim 1, further comprising:
changing, based on the target shard cluster being determined to be not capable of backup, the backup priority of the target shard cluster and updating the backup waiting queue according to the changed backup priority.

The method of claim 1, further comprising:
determining whether a new target shard cluster can be backed up, based on the backup of at least one shard belonging to the target shard cluster having ended.

The method of claim 1, further comprising:
determining, based on the target shard cluster being determined to be capable of backup, whether another shard cluster indicated by information obtained from the backup waiting queue can be backed up in parallel with the target shard cluster.

The method of claim 1,
wherein commanding the backup comprises:
commanding, based on determining that another shard cluster can be backed up in parallel with the target shard cluster, that the target shard cluster and the other shard cluster be backed up in parallel, and
wherein the method further comprises:
deleting, based on determining that the other shard cluster can be backed up in parallel with the target shard cluster, the information of the target shard cluster and information of the other shard cluster from the backup waiting queue.

The method of claim 1, further comprising:
deleting, based on determining that the other shard cluster cannot be backed up in parallel with the target shard cluster, the information of the target shard cluster from the backup waiting queue while keeping the information of the other shard cluster in the backup waiting queue.

The method of claim 1, further comprising:
commanding, based on receiving a recovery input for the target shard cluster, that a snapshot of the target shard cluster be set as the initial recovery state of the target shard cluster; and
commanding that the set initial recovery state of the target shard cluster be adjusted based on the operation log.

The method of claim 15,
wherein commanding that the initial recovery state of the target shard cluster be adjusted comprises:
commanding that the data of a shard be recovered by using the operation log to adjust the initial recovery state of the shard set based on the snapshot.

A method performed by a system for backup and recovery of a distributed database, the method comprising:
obtaining, by a backup manager server, based on receiving a backup input for one or more shard clusters, information of a target shard cluster from a backup waiting queue sorted according to backup priority;
determining, by the backup manager server, whether the target shard cluster can be backed up based on whether one or more shards belonging to the target shard cluster can be backed up;
commanding, by the backup manager server, based on the target shard cluster being determined to be capable of backup, a backup to the one or more shards belonging to the target shard cluster such that an operation log backup is performed prior to a snapshot backup;
commanding, by a recovery manager server, based on receiving a recovery input for the target shard cluster, that a snapshot of the target shard cluster be set as the initial recovery state of the target shard cluster; and
commanding, by the recovery manager server, that the set initial recovery state of the target shard cluster be adjusted based on the operation log.

A computer program combined with hardware and stored in a computer-readable recording medium to execute the method of any one of claims 1 to 17.

A backup manager server, comprising:
a memory storing computer-executable instructions; and
a processor that accesses the memory and executes the instructions,
wherein the instructions are configured to:
based on receiving a backup input for one or more shard clusters, obtain information of a target shard cluster from a backup waiting queue sorted according to backup priority;
based on whether one or more shards belonging to the target shard cluster can be backed up, determine whether the target shard cluster can be backed up; and
based on the target shard cluster being determined to be able to be backed up, command the one or more shards belonging to the target shard cluster to perform an operation log backup prior to a snapshot backup.
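
As a non-limiting illustration, the instructions of the backup manager server (check whether the shards can be backed up, then command an operation log backup before a snapshot backup) might be sketched as follows. The `Shard` class, its `available` flag, and the command strings are hypothetical. The sketch further assumes that the cluster can be backed up only when every shard can, which the claim language does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    # Hypothetical stand-in for a shard that receives backup commands.
    name: str
    available: bool = True
    commands: List[str] = field(default_factory=list)

    def can_back_up(self) -> bool:
        return self.available

    def send(self, command: str) -> None:
        # A real system would issue a remote call; here we only record it.
        self.commands.append(command)


def back_up_cluster(shards: List[Shard]) -> bool:
    """Command a backup of the cluster; return False if it is not possible."""
    # Assumption: the cluster can be backed up only if all shards can.
    if not all(shard.can_back_up() for shard in shards):
        return False
    # Operation log backup is commanded first on every shard...
    for shard in shards:
        shard.send("oplog_backup")
    # ...and only then the snapshot backup.
    for shard in shards:
        shard.send("snapshot_backup")
    return True
```

Commanding all operation log backups before any snapshot backup is one possible ordering; commanding both in sequence shard by shard would also keep the operation log backup ahead of the snapshot backup on each shard.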

A system for backup and recovery of a distributed database, comprising:
a backup manager server including a memory of the backup manager server storing computer-executable instructions, and a processor of the backup manager server that accesses the memory of the backup manager server and executes the instructions; and
a recovery manager server including a memory of the recovery manager server storing computer-executable instructions, and a processor of the recovery manager server that accesses the memory of the recovery manager server and executes the instructions,
wherein the instructions stored in the memory of the backup manager server are configured to:
based on receiving a backup input for one or more shard clusters, obtain information of a target shard cluster from a backup waiting queue sorted according to backup priority;
based on whether one or more shards belonging to the target shard cluster can be backed up, determine whether the target shard cluster can be backed up; and
based on the target shard cluster being determined to be able to be backed up, command the one or more shards belonging to the target shard cluster to perform an operation log backup prior to a snapshot backup, and
wherein the instructions stored in the memory of the recovery manager server are configured to:
based on receiving a recovery input for the target shard cluster, command that a snapshot of the target shard cluster be set as the initial recovery state of the target shard cluster; and
command that the set initial recovery state of the target shard cluster be adjusted based on the operation log.