CN109189608B

CN109189608B - A method for ensuring the consistency of replicated transactions and a corresponding replication device

Info

Publication number: CN109189608B
Application number: CN201810917819.2A
Authority: CN
Inventors: 付铨; 孙峰; 赵家威; 杨春
Original assignee: Wuhan Dameng Database Co Ltd
Current assignee: Wuhan Dream Database Co ltd
Priority date: 2018-08-13
Filing date: 2018-08-13
Publication date: 2019-07-26
Anticipated expiration: 2038-08-13
Also published as: CN110262929B; CN109189608A; CN110262929A

Abstract

The invention provides a method for ensuring consistency of replication affairs and a corresponding replication device, comprising the following steps: acquiring an initial log serial number of an active transaction which occurs earliest in a source database, and marking the initial log serial number of the active transaction which occurs earliest as a recovery initial point; acquiring initial log serial numbers of two adjacent active transactions of a source database, and marking the initial log serial numbers of the two adjacent active transactions as range check points when the difference value of the initial log serial numbers of the two adjacent active transactions is greater than a preset threshold value; acquiring a source database transaction log, and deleting the transaction log with the sequence number of the commit log not larger than the recovery starting point and the transaction log with the sequence number of the commit log falling into a range check point; data recovery is performed according to a recovery start point, a range checkpoint, and committed transaction records. The invention adopts a range check point mechanism to compress the submitted transaction table, and can effectively reduce the scale of the submitted transaction table.

Description

A method for ensuring the consistency of replicated transactions and a corresponding replication device

【技术领域】【Technical field】

本发明涉及数据库同步技术领域，特别是涉及一种保证复制事务一致性的方法以及相应的复制装置。The invention relates to the technical field of database synchronization, in particular to a method for ensuring the consistency of replicated transactions and a corresponding replication device.

【背景技术】【Background technique】

数据库数据实时同步是提高信息系统可用性，保证业务连续性的一种技术方案。通过数据实时同步，目标数据库和源数据库的业务数据保持实时一致，当源数据库出现故障中断服务后，应用系统可快速切换至目标数据库，保证业务连续性的要求。Real-time synchronization of database data is a technical solution to improve the availability of information systems and ensure business continuity. Through real-time data synchronization, the business data of the target database and the source database are kept consistent in real time. When the source database fails and interrupts the service, the application system can quickly switch to the target database to ensure business continuity.

基于日志分析的数据库数据实时复制技术，具有对源数据库的性能和数据模式影响小、支持异构操作系统和数据库平台、数据复制性能高等特点，在应急灾备、多业务中心、异构资源整合、数据迁移等领域得到广泛应用。这种技术通过源端的日志捕获进程捕获源数据库的在线日志或归档日志，然后分析出数据库的INSERT(插入)、UPDATE(更新)以及DELETE(删除)操作转换为内部特定格式的消息包，再将消息包通过TCP/IP(TransmissionControl Protocol/Internet Protocol，简写TCP/IP)网络发送到复制系统的目标端，目标端接收消息包后，进行拆包处理，将源端的事务信息恢复成相应的SQL(Structured QueryLanguage，简写SQL)语句，通过本地数据库接口在目标数据库执行实时复制，以实现数据库数据同步。The real-time database data replication technology based on log analysis has the characteristics of small impact on the performance and data mode of the source database, support for heterogeneous operating systems and database platforms, and high data replication performance. , data migration and other fields are widely used. This technology captures the online log or archived log of the source database through the log capture process of the source end, and then analyzes the INSERT (insert), UPDATE (update) and DELETE (deletion) operations of the database into message packets in an internal specific format, and then The message packet is sent to the target end of the replication system through the TCP/IP (Transmission Control Protocol/Internet Protocol, abbreviated TCP/IP) network. After the target end receives the message packet, it unpacks and restores the transaction information of the source end to the corresponding SQL ( Structured QueryLanguage, abbreviated SQL) statement, performs real-time replication in the target database through the local database interface to achieve database data synchronization.

为保证源数据库和目标数据库的数据一致性，基于日志分析的数据库数据复制技术通常以源数据库的事务为最小复制单位，严格按照源数据库事务顺序进行实时数据复制，保障目标数据库与源数据库的事务的完整性和一致性，确保目标数据库符合源数据库的事务逻辑。因此，在基于日志分析的数据库数据复制技术中，其技术关键在于如何保障源数据库和目标数据库的事务一致性，特别是在复制系统出现故障后，目标数据库能够按照事务完整性及一致性的要求进行正确的恢复。In order to ensure the data consistency between the source database and the target database, the database data replication technology based on log analysis usually takes the transaction of the source database as the smallest replication unit, and performs real-time data replication in strict accordance with the transaction sequence of the source database to ensure the transaction between the target database and the source database. The integrity and consistency of the target database ensure that the target database conforms to the transaction logic of the source database. Therefore, in the database data replication technology based on log analysis, the technical key lies in how to ensure the transaction consistency of the source database and the target database, especially after the replication system fails, the target database can comply with the requirements of transaction integrity and consistency Do a proper recovery.

目前，为了保证目标数据库与源数据库的复制事务的完整性及一致性，一般采用的方法如下：在目标数据库中创建一个提交事务表记录已完成的提交事务信息，并采用检查点机制对提交事务表进行维护，记录恢复起始点信息。该检查点机制的主要技术特点为，首先取出源数据库中活动事务的最小日志序列号；然后在目标数据库的提交事务表中删除小于等于该最小日志序列号的所有提交事务信息；最后将该最小日志序列号插入到提交事务表中作为一条检查点记录。复制系统出现故障进行恢复时，从检查点记录的最小日志序列号开始恢复，以此保障故障恢复后的事务一致性。At present, in order to ensure the integrity and consistency of the replication transaction between the target database and the source database, the general method is as follows: create a commit transaction table in the target database to record the completed commit transaction information, and use the checkpoint mechanism to check the commit transaction. The table is maintained to record the recovery starting point information. The main technical features of the checkpoint mechanism are: first, take out the minimum log sequence number of active transactions in the source database; then delete all committed transaction information less than or equal to the minimum log sequence number in the commit transaction table of the target database; The log sequence number is inserted into the commit transaction table as a checkpoint record. When the replication system fails to recover, it starts from the minimum log sequence number recorded in the checkpoint to ensure transaction consistency after failure recovery.

但是，上述基于检查点机制保障复制系统故障恢复后的事务一致性的方法存在一定不足：当源数据库中存在长时间未提交的长事务时，在将该长事务的起始日志序列号写入提交事务表后，复制系统在故障发生前可能已经执行了大量的事务操作。因为长事务一直未提交，导致检查点对应的日志序列号不能推进，提交事务表中过期事务信息不能及时删除，造成提交事务表规模过大。在故障恢复时，目标端复制进程需要加载提交事务表信息，用于判断恢复起始点后的事务是否需要重做。提交事务表过大会造成提交事务信息加载耗时过长，增加复制系统恢复时间，同时也增加复制系统内存资源消耗。However, the above method based on the checkpoint mechanism to ensure transaction consistency after failure recovery of the replication system has certain shortcomings: when there is a long transaction that has not been committed for a long time in the source database, the starting log sequence number of the long transaction is written into the source database. After the transaction table is committed, the replication system may have performed a large number of transaction operations before the failure occurred. Because the long transaction has not been submitted, the log sequence number corresponding to the checkpoint cannot be advanced, and the expired transaction information in the submitted transaction table cannot be deleted in time, resulting in an excessively large scale of the submitted transaction table. During failure recovery, the target-side replication process needs to load and submit transaction table information to determine whether the transaction after the recovery starting point needs to be redone. Excessive submission of the transaction table will result in a long time to load the submitted transaction information, increase the recovery time of the replication system, and also increase the memory resource consumption of the replication system.

鉴于此，克服该现有技术所存在的缺陷是本技术领域亟待解决的问题。In view of this, overcoming the defects of the prior art is an urgent problem to be solved in the technical field.

【发明内容】[Content of the invention]

本发明要解决的技术问题是由于长事务的存在导致提交事务表内存储有大量过期事务信息，造成提交事务表内存过大，占用资源。同时，复制系统出现故障进行恢复时，提交事务表过大会造成提交事务信息加载耗时过长，增加复制系统恢复时间，同时也增加复制系统内存资源消耗。The technical problem to be solved by the present invention is that due to the existence of long transactions, a large amount of expired transaction information is stored in the submission transaction table, which causes the memory of the submission transaction table to be too large and occupies resources. At the same time, when the replication system fails to recover, the submission of the transaction table will take too long to load the submitted transaction information, which increases the recovery time of the replication system and also increases the memory resource consumption of the replication system.

本发明实施例采用如下技术方案：The embodiment of the present invention adopts the following technical solutions:

第一方面，本发明提供一种保证复制事务一致性的方法，所述保证复制事务一致性的方法包括：In a first aspect, the present invention provides a method for ensuring the consistency of replicated transactions, and the method for ensuring the consistency of replicated transactions includes:

获取源数据库最早发生的活动事务的起始日志序列号，将所述最早发生的活动事务的起始日志序列号标记为恢复起始点；Obtain the initial log sequence number of the earliest active transaction in the source database, and mark the initial log sequence number of the earliest active transaction as the recovery start point;

获取源数据库相邻两个活动事务的起始日志序列号，并在所述相邻两个活动事务的起始日志序列号的差值大于预设阈值时，将所述相邻两个活动事务的起始日志序列号标记为范围检查点；Obtain the starting log sequence numbers of two adjacent active transactions in the source database, and when the difference between the starting log sequence numbers of the two adjacent active transactions is greater than a preset threshold, assign the two adjacent active transactions The starting log sequence number of marked as a range checkpoint;

获取源数据库事务日志，删除提交日志序列号不大于所述恢复起始点的事务日志以及提交日志序列号落入范围检查点的事务日志以得到已提交事务记录；Obtain the source database transaction log, delete the transaction log whose commit log sequence number is not greater than the recovery starting point and the transaction log whose commit log sequence number falls within the range checkpoint to obtain the committed transaction record;

在故障恢复时，依据所述恢复起始点、所述范围检查点以及所述已提交事务记录进行数据恢复，以保证源数据库和目标数据库的事务一致性。During failure recovery, data recovery is performed according to the recovery starting point, the range checkpoint and the committed transaction record, so as to ensure transaction consistency between the source database and the target database.

优选的，所述获取源数据库最早发生的活动事务的起始日志序列号，将所述最早发生的活动事务的起始日志序列号标记为恢复起始点包括：Preferably, the obtaining the initial log sequence number of the earliest active transaction in the source database, and marking the initial log sequence number of the earliest active transaction as the recovery starting point includes:

采用定时器定时获取源数据库最早发生的活动事务的起始日志序列号；Use a timer to regularly obtain the starting log sequence number of the earliest active transaction in the source database;

判断确定当前时刻源数据库最早发生的活动事务的起始日志序列号相对于上一时刻源数据库最早发生的活动事务的起始日志序列号是否发生变更；Determine whether the start log sequence number of the earliest active transaction in the source database at the current moment has changed relative to the start log sequence number of the earliest active transaction in the source database at the previous moment;

若发生变更，则依据当前时刻源数据库最早发生的活动事务的起始日志序列号重新设置恢复起始点。If a change occurs, the recovery starting point is reset according to the starting log sequence number of the earliest active transaction in the source database at the current moment.

优选的，所述获取源数据库相邻两个活动事务的起始日志序列号，并在所述相邻两个活动事务的起始日志序列号的差值大于预设阈值时，将所述相邻两个活动事务的起始日志序列号标记为范围检查点包括：Preferably, the acquisition of the starting log sequence numbers of two adjacent active transactions in the source database, and when the difference between the starting log sequence numbers of the two adjacent active transactions is greater than a preset threshold, The starting log sequence numbers of adjacent two active transactions marked as range checkpoints include:

依次获取源数据库相邻两个活动事务的起始日志序列号；Obtain the starting log sequence numbers of two adjacent active transactions in the source database in turn;

判断确定所述相邻两个活动事务的起始日志序列号的差值是否大于预设阈值；Determine whether the difference between the initial log sequence numbers of the two adjacent active transactions is greater than a preset threshold;

若所述相邻两个活动事务的起始日志序列号的差值大于预设阈值，则将所述相邻两个活动事务的起始日志序列号标记为一个范围检查点；If the difference between the initial log sequence numbers of the two adjacent active transactions is greater than a preset threshold, marking the initial log sequence numbers of the two adjacent active transactions as a range checkpoint;

将范围检查点以二维数组形式进行存储，其中，二维数组的一行对应其中一个范围检查点。Store range checkpoints as a two-dimensional array, where one row of the two-dimensional array corresponds to one of the range checkpoints.

优选的，在故障恢复时，依据所述恢复起始点、所述范围检查点以及所述已提交事务记录进行数据恢复包括：Preferably, during failure recovery, performing data recovery according to the recovery starting point, the range checkpoint and the committed transaction record includes:

将所述恢复起始点发送至所述源数据库并接收所述源数据库所发送的操作日志，其中，所述操作日志的日志序列号大于所述恢复起始点对应的日志序列号；sending the recovery starting point to the source database and receiving the operation log sent by the source database, wherein the log sequence number of the operation log is greater than the log sequence number corresponding to the recovery starting point;

对所述操作日志进行解析获取所述操作日志对应的事务的提交日志序列号，并判断确定所述提交日志序列号是否落入所述范围检查点内；Analyze the operation log to obtain the commit log sequence number of the transaction corresponding to the operation log, and determine whether the commit log sequence number falls within the range checkpoint;

若所述提交日志序列号落入所述范围检查点内，则所述操作日志对应的事务在故障发生前已经加载至目标数据库，忽略所述操作日志。If the serial number of the commit log falls within the range checkpoint, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

优选的，所述保证复制事务一致性的方法还包括：Preferably, the method for ensuring the consistency of replicated transactions further includes:

若所述提交日志序列号未落入所述范围检查点内，则所述操作日志对应的事务在故障发生前未加载至目标数据库，所述操作日志需要在目标数据库同步。If the serial number of the commit log does not fall within the range checkpoint, the transaction corresponding to the operation log is not loaded into the target database before the failure occurs, and the operation log needs to be synchronized in the target database.

优选的，所述若所述提交日志序列号未落入所述范围检查点内，则所述操作日志对应的事务在故障发生前未加载至目标数据库，所述操作日志需要在目标数据库同步包括：Preferably, if the serial number of the commit log does not fall within the range checkpoint, the transaction corresponding to the operation log is not loaded into the target database before the failure occurs, and the operation log needs to be synchronized in the target database to include :

在接收到所述操作日志对应的事务的提交日志时，将所述操作日志对应的事务日志添加到所述已提交事务记录；When receiving the commit log of the transaction corresponding to the operation log, adding the transaction log corresponding to the operation log to the committed transaction record;

依据所述操作日志对应的事务日志对目标数据库进行数据同步。Data synchronization is performed on the target database according to the transaction log corresponding to the operation log.

对所述操作日志进行解析获取所述操作日志对应的事务标识码；Parsing the operation log to obtain a transaction identification code corresponding to the operation log;

判断确定所述事务标识码是否存在于已提交事务记录内；Judging to determine whether the transaction identification code exists in the submitted transaction record;

若存在，则所述操作日志对应的事务在故障发生前已经加载至目标数据库，忽略所述操作日志；If it exists, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored;

若不存在，则执行对所述操作日志进行解析获取所述操作日志对应的事务的提交日志序列号，并判断确定所述提交日志序列号是否落入所述范围检查点内。If it does not exist, the operation log is parsed to obtain the commit log sequence number of the transaction corresponding to the operation log, and it is determined whether the commit log sequence number falls within the range checkpoint.

优选的，所述对所述操作日志进行解析获取所述操作日志对应的事务的提交日志序列号，并判断确定所述提交日志序列号是否落入所述范围检查点内包括：Preferably, the analyzing the operation log to obtain the commit log sequence number of the transaction corresponding to the operation log, and judging and determining whether the commit log sequence number falls within the range checkpoint includes:

对所述操作日志进行解析获取所述操作日志对应的事务的提交日志序列号；Analyzing the operation log to obtain the commit log sequence number of the transaction corresponding to the operation log;

采用二分查找算法在包含多个范围检查点的二维数组中进行定位，判断确定所述提交日志序列号是否落入范围检查点内。A binary search algorithm is used to locate in a two-dimensional array including multiple range checkpoints, and it is determined whether the serial number of the commit log falls within the range checkpoint.

优选的，所述获取源数据库最早发生的活动事务的起始日志序列号，将所述最早发生的活动事务的起始日志序列号标记为恢复起始点之前还包括：Preferably, before the acquiring the initial log sequence number of the earliest active transaction in the source database and marking the initial log sequence number of the earliest active transaction as the recovery start point, the method further includes:

创建提交事务表以及未提交事务表，所述提交事务表用于存储恢复起始点、范围检查点以及已提交事务记录，所述未提交事务表用于存储未提交事务的事务日志。A committed transaction table and an uncommitted transaction table are created, where the committed transaction table is used to store recovery starting points, range checkpoints, and committed transaction records, and the uncommitted transaction table is used to store transaction logs of uncommitted transactions.

第二方面，本发明提供一种复制装置，包括至少一个处理器；以及，与所述至少一个处理器通信连接的存储器；其中，所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被程序设置为执行第一方面所述的保证复制事务一致性的方法。In a second aspect, the present invention provides a copying apparatus, comprising at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor , the instructions are set by the program to execute the method for ensuring the consistency of replicated transactions described in the first aspect.

第三方面，本发明还提供了一种非易失性计算机存储介质，所述计算机存储介质存储有计算机可执行指令，该计算机可执行指令被一个或多个处理器执行，用于完成第一方面所述的保证复制事务一致性的方法。In a third aspect, the present invention also provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being executed by one or more processors for completing the first The method described in the aspect guarantees the consistency of replicated transactions.

与现有技术相比，本发明实施例的有益效果在于：本发明依据预设的阈值设置范围检查点，并将提交日志序列号落入范围检查点的事务日志删除，采用范围检查点机制标记已提交的事务，即，使用日志序列号范围来表示批量已提交的事务，即使数据库中存在长时间未提交的长事务，也能有效及时的删除已过期的提交事务信息，对提交事务表进行了压缩，可以有效减小提交事务表的规模。同时，在复制系统发生故障还原时，可以基于范围检查点确定相应事务是否已经同步过，能够有效缩短复制系统恢复时间以及内存资源消耗。Compared with the prior art, the beneficial effects of the embodiments of the present invention are: the present invention sets a range checkpoint according to a preset threshold, deletes the transaction log whose commit log sequence number falls within the range checkpoint, and uses the range checkpoint mechanism to mark Submitted transactions, that is, using the log sequence number range to represent batches of submitted transactions, even if there are long uncommitted transactions in the database, the expired submitted transaction information can be deleted effectively and timely, and the submitted transaction table can be processed. For compression, the size of the committed transaction table can be effectively reduced. At the same time, when the replication system fails to restore, it can be determined whether the corresponding transaction has been synchronized based on the range checkpoint, which can effectively shorten the recovery time of the replication system and the consumption of memory resources.

【附图说明】【Description of drawings】

为了更清楚地说明本发明实施例的技术方案，下面将对本发明实施例中所需要使用的附图作简单地介绍。显而易见地，下面所描述的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to describe the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings that need to be used in the embodiments of the present invention. Obviously, the drawings described below are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

图1是本发明实施例提供的一种复制系统的结构示意图；1 is a schematic structural diagram of a replication system provided by an embodiment of the present invention;

图2是本发明实施例提供的一种保证复制事务一致性的方法的流程示意图；2 is a schematic flowchart of a method for ensuring the consistency of replicated transactions provided by an embodiment of the present invention;

图3a是本发明实施例提供一种提交事务表的数据结构示意图；3a is a schematic diagram of a data structure of a submission transaction table provided by an embodiment of the present invention;

图3b是本发明实施例包含多个范围检查点的二维数组数据结构示意图；3b is a schematic diagram of a two-dimensional array data structure including multiple range checkpoints according to an embodiment of the present invention;

图3c是本发明实施例提供的一种未提交事务表的数据结构示意图；3c is a schematic diagram of a data structure of an uncommitted transaction table provided by an embodiment of the present invention;

图4是图2中步骤204的具体流程示意图；Fig. 4 is the specific flow chart of step 204 in Fig. 2;

图5是本发明实施例提供的一种复制装置的结构示意图。FIG. 5 is a schematic structural diagram of a copying apparatus according to an embodiment of the present invention.

【具体实施方式】【Detailed ways】

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

实施例1: Embodiment 1 :

参阅图1，在本实施例中，复制系统包括源数据库1、复制装置以及目标数据库2，复制装置分别与源数据库1和目标数据库2连接，以通过基于日志分析的数据复制技术保证源数据库1和目标数据库2的事务保持一致，进而实现数据同步。Referring to FIG. 1, in this embodiment, the replication system includes a source database 1, a replication device and a target database 2, and the replication device is respectively connected with the source database 1 and the target database 2, so as to ensure the source database 1 through the data replication technology based on log analysis It is consistent with the transaction of the target database 2, thereby realizing data synchronization.

具体而言，复制装置包括源端复制工具31以及目标端复制工具32。其中，源端复制工具31与源数据库1连接，源端复制工具31通过源复制进程访问源数据库1；目标端复制工具32与目标数据库2连接，目标端复制工具32通过目标复制进程访问目标数据库2；源复制进程和目标复制进程交互，以将源数据库1的数据复制到目标数据库2，从而实现源数据库1和目标数据库2数据同步的功能。Specifically, the replication device includes a source-side replication tool 31 and a target-side replication tool 32 . The source-side replication tool 31 is connected to the source database 1, and the source-side replication tool 31 accesses the source database 1 through the source replication process; the target-side replication tool 32 is connected to the target database 2, and the target-side replication tool 32 accesses the target database through the target replication process 2. The source replication process and the target replication process interact to replicate the data of the source database 1 to the target database 2, thereby realizing the function of data synchronization between the source database 1 and the target database 2.

其中，源数据库1和目标数据库2可以为同构数据库，也可以为异构数据库，可以依据具体情况设计，在此，不做具体限定。其中，源数据库1和目标数据库2可以分别对应为SQLSERVER数据库、MYSQL数据库、DB2数据库、DM6数据库、DM7数据库、POSTGRESQL数据库或ORACLE数据库中的任一种。The source database 1 and the target database 2 may be homogeneous databases or heterogeneous databases, which may be designed according to specific conditions, which are not specifically limited here. Wherein, the source database 1 and the target database 2 may respectively correspond to any one of the SQLSERVER database, the MYSQL database, the DB2 database, the DM6 database, the DM7 database, the POSTGRESQL database or the ORACLE database.

在实际应用场景中，目标端复制工具32的启动时，首先通过目标复制进程在目标数据库2中查找是否存在提交事务表，如果不存在，则触发目标数据库2创建提交事务表。其中，目标数据库2依据提交事务表内的事务日志对该事务对应的对象进行相应的操作，以实现数据同步。其中，提交事务表用于存储已提交事务记录、恢复起始点以及检查点范围。提交事务表可以理解为数据登记表，用于记录已经同步的事务操作。In an actual application scenario, when the target-side replication tool 32 is started, it first searches the target database 2 for whether there is a commit transaction table through the target replication process, and if not, triggers the target database 2 to create a commit transaction table. The target database 2 performs corresponding operations on the object corresponding to the transaction according to the transaction log in the submitted transaction table, so as to realize data synchronization. Among them, the committed transaction table is used to store committed transaction records, recovery starting points, and checkpoint ranges. The commit transaction table can be understood as a data registration table, which is used to record the transaction operations that have been synchronized.

关于提交事务表的数据内存结构如图3a所示，提交事务表的数据内存结构中包括三个字段名TID、SEQID以及END_SCN。其中，TID为事务的标识码，是事务的唯一标识，SEQID为事务开始操作，END_SCN为事务提交操作。本实施例的提交事务表中有三种不同的记录信息，基于不同的记录信息数据内存结构有所差异，具体如下：The data memory structure of the commit transaction table is shown in Figure 3a. The data memory structure of the commit transaction table includes three field names TID, SEQID and END_SCN. Among them, TID is the identification code of the transaction, which is the unique identifier of the transaction, SEQID is the transaction start operation, and END_SCN is the transaction commit operation. There are three kinds of different record information in the commit transaction table of the present embodiment, and the data memory structure is different based on the different record information, and the details are as follows:

第一种记录信息：第一种记录信息用于记录恢复起始点对应的日志序列号，其TID字段内容为0，SEQID字段为恢复起始点对应的日志序列号，END_SCN值为0。在其他实施例中，TID也可以为其他特殊字符，能够依据该特殊字符区分记录信息为恢复起始点即可。同样，END_SCN也可以为其他数值。The first type of record information: The first type of record information is used to record the log sequence number corresponding to the recovery start point, the content of the TID field is 0, the SEQID field is the log sequence number corresponding to the recovery start point, and the END_SCN value is 0. In other embodiments, the TID may also be other special characters, and the record information can be distinguished as the recovery starting point according to the special characters. Similarly, END_SCN can also be other values.

第二种记录信息：第二种记录信息用于记录已提交的事务，其TID字段内容为已提交的事务的标识码，标识码是事务的唯一标识，从源数据库1的日志记录中提取，是一个正整数；SEQID为已提交的事务的起始日志序列号，从源数据库1的日志记录中提取，该日志序列号随着数据库操作不断递增；END_SCN为已提交事务的提交日志序列号，从源数据库1的日志记录中提取，该日志序列号随着数据库操作不断递增。The second type of record information: the second type of record information is used to record the submitted transaction, and the content of the TID field is the identification code of the submitted transaction. The identification code is the unique identification of the transaction, which is extracted from the log records of the source database 1. is a positive integer; SEQID is the starting log sequence number of the committed transaction, extracted from the log record of source database 1, and the log sequence number is continuously increasing with the database operation; END_SCN is the commit log sequence number of the committed transaction, Extracted from the log records of source database 1, the log sequence number is continuously incremented with database operations.

第三种记录信息：第三种记录信息用于记录范围检查点对应的日志序列号。其TID字段内容为-1，SEQID为相邻两个活动事务中较早发生的活动事务的起始序列号LSN1，END_SCN为相邻两个活动事务中较晚发生的活动事务的起始序列号LSN2，其中，LSN1和LSN2满足如下条件：LSN2-LSN1>RANGE，其中RANGE为预设阈值。其中，预设阈值依据实际情况而定，可以为400、500、1000或者其他数值。在其他实施例中，TID也可以为其他特殊字符，能够依据该特殊字符区分记录信息为范围检查点即可。The third type of record information: The third type of record information is used to record the log sequence number corresponding to the range checkpoint. The content of the TID field is -1, SEQID is the starting sequence number LSN1 of the active transaction that occurs earlier in the two adjacent active transactions, and END_SCN is the starting sequence number of the active transaction that occurs later in the two adjacent active transactions. LSN2, where LSN1 and LSN2 satisfy the following condition: LSN2-LSN1>RANGE, where RANGE is a preset threshold. The preset threshold is determined according to the actual situation, and may be 400, 500, 1000 or other values. In other embodiments, the TID may also be other special characters, and the record information can be distinguished as a range checkpoint according to the special characters.

目标复制进程在启动时需要将提交事务表中的上述三种记录信息加载至目标数据库2的内存数据结构中。目标端复制工具32使用如下查询语句首先将提交事务表信息从目标数据库2中查询出来：SELECT SEQID，TID，END_SCN FROM"提交事务表"ORDER BYSEQID。然后，将第一种记录、第二种记录以TID字段为搜索关键字加载至内存哈希结构中，将第三种记录以二维数组形式进行存储。二维数组的数据内存结构具体如图3b，该二维数组的包括两列，行数依据范围检查点的数目而定，二维数组的每一行对应一个范围检查点。其中，LSN_A和LSN_B组成一个范围检查点，LSN_C和LSN_D组成一个范围检查点，LSN_E和LSN_F组成一个范围检查点，LSN_X和LSN_Y组成一个范围检查点，以此类推。其中，左边一列的日志序列号的值对应为数组下标为[0]的元素的值，右边一列的日志序列号的值对应为数组下标为[1]的元素的值。通过该方式，将满足条件的两个日志序列号对应设置为范围检查点的最大值和最小值，以形成范围检查点区间。The target replication process needs to load the above three kinds of record information in the commit transaction table into the memory data structure of the target database 2 when starting. The target-side replication tool 32 uses the following query statement to first query the commit transaction table information from the target database 2: SELECT SEQID, TID, END_SCN FROM "commit transaction table" ORDER BYSEQID. Then, the first type of record and the second type of record are loaded into the memory hash structure with the TID field as the search key, and the third type of record is stored in the form of a two-dimensional array. The data memory structure of the two-dimensional array is specifically shown in Figure 3b. The two-dimensional array includes two columns, the number of rows depends on the number of range checkpoints, and each row of the two-dimensional array corresponds to a range checkpoint. Among them, LSN _A and LSN _B form a range checkpoint, LSN _C and LSN _D form a range checkpoint, LSN _E and LSN _F form a range checkpoint, LSN _X and LSN _Y form a range checkpoint, and so on. The value of the log sequence number in the left column corresponds to the value of the element whose index is [0] in the array, and the value of the log sequence number in the right column corresponds to the value of the element whose index is [1] in the array. In this way, the two log sequence numbers that satisfy the conditions are correspondingly set as the maximum value and the minimum value of the range checkpoint, so as to form the range checkpoint interval.

同时，目标端复制工具32在其内存中创建未提交事务表。其中，未提交事务表用于存储活动事务(未提交的事务)的事务日志。其中，事务日志包括该事务的起始操作日志(第一个操作日志)、至少一个中间操作日志以及提交操作日志(最后一个操作日志)。当接收到提交操作日志时，将该事务对应的事务日志存储至提交事务表内。关于未提交事务表的数据内存结构如图3c，未提交事务表的数据内存结构中包括两个字段名TID以及SEQID。其中，TID为事务的标识码，SEQID为未提交的事务的起始日志序列号，从源数据库1的日志记录中提取，该日志序列号随着数据库操作不断递增。At the same time, the target-side replication tool 32 creates an uncommitted transaction table in its memory. Among them, the uncommitted transaction table is used to store the transaction log of active transactions (uncommitted transactions). The transaction log includes an initial operation log (the first operation log) of the transaction, at least one intermediate operation log and a commit operation log (the last operation log). When the commit operation log is received, the transaction log corresponding to the transaction is stored in the commit transaction table. The data memory structure of the uncommitted transaction table is shown in Figure 3c. The data memory structure of the uncommitted transaction table includes two field names, TID and SEQID. Wherein, TID is the identification code of the transaction, and SEQID is the initial log sequence number of the uncommitted transaction, which is extracted from the log record of the source database 1, and the log sequence number is continuously increased with the operation of the database.

在创建好提交事务表和未提交事务表之后，复制装置依据实际情况实时获取恢复起始点和范围检查点以更新提交事务表和未提交事务表内存储的数据，以便于复制系统故障时，依据提交事务表内的恢复起始点、范围检查点以及已提交事务记录进行数据恢复。After the committed transaction table and the uncommitted transaction table are created, the replication device obtains the recovery starting point and range checkpoint in real time according to the actual situation to update the data stored in the committed transaction table and the uncommitted transaction table, so that in the event of a failure of the replication system, the Commit the recovery start point, range checkpoint, and committed transaction records within the transaction table for data recovery.

在实际应用场景中，复制系统可能会发生故障，为了保证故障还原时，源数据库1和目标数据库2事务的一致性，同时为了保证复制系统能够准确、高效的进行恢复，本实施例提供了一种保证复制事务一致性的方法，具体请参阅图2。In practical application scenarios, the replication system may fail. In order to ensure the transaction consistency between the source database 1 and the target database 2 when the failure is restored, and to ensure that the replication system can recover accurately and efficiently, this embodiment provides a A method to ensure the consistency of replicated transactions, see Figure 2 for details.

步骤201：获取源数据库最早发生的活动事务的起始日志序列号，将所述最早发生的活动事务的起始日志序列号标记为恢复起始点。Step 201: Obtain the initial log sequence number of the earliest active transaction in the source database, and mark the initial log sequence number of the earliest active transaction as a recovery start point.

其中，日志序列号用于代表数据库系统内部产生的一个物理事务，日志序列号在数据库系统的日志文件中具有全局唯一性，且日志序列号随着数据库的操作逐步递增，以区分不同的物理事务。目前，大多数数据库管理系统使用LSN(Log sequence number，简称LSN)代表数据库系统内部产生的一个物理事务，例如，SQLSERVER、MYSQL、DB2、DM6、DM7等数据库管理系统。Among them, the log sequence number is used to represent a physical transaction generated inside the database system. The log sequence number is globally unique in the log file of the database system, and the log sequence number gradually increases with the operation of the database to distinguish different physical transactions. . At present, most database management systems use LSN (Log sequence number, LSN for short) to represent a physical transaction generated inside the database system, for example, database management systems such as SQLSERVER, MYSQL, DB2, DM6, DM7, etc.

其中，活动事务是指未提交的事务。当活动事务被提交后就变更为已提交事务，该事务对应的操作会更新到数据库中。Among them, the active transaction refers to the uncommitted transaction. When an active transaction is committed, it is changed to a committed transaction, and the operations corresponding to the transaction are updated to the database.

在本实施例中，在源数据库和目标数据库数据同步的过程中，源数据库可能一直有事务操作在执行，例如，产生一个新事务或者某一个活动事务被提交等等。也就是说，随着时间的推移最早发生的活动事务所指代的对象会发生改变，那么相应的，最早发生的活动事务对应的起始日志序列号也会发生改变。In this embodiment, in the process of data synchronization between the source database and the target database, the source database may always have transaction operations being performed, for example, a new transaction is generated or a certain active transaction is committed and so on. That is to say, as time goes by, the object referred to by the earliest active transaction will change, and accordingly, the starting log sequence number corresponding to the earliest active transaction will also change.

因此，为了保证获取到最准确的恢复起始点(故障发生之前且最接近故障发生时刻点对应的恢复起始点为最准确的恢复起始点)，在优选的实施例中，采用定时器定时获取源数据库最早发生的活动事务的起始日志序列号，并判断确定当前时刻源数据库最早发生的活动事务的起始日志序列号相对于上一时刻源数据库最早发生的活动事务的起始日志序列号是否发生变更；若发生变更，则依据当前时刻源数据库最早发生的活动事务的起始日志序列号重新设置恢复起始点。并将重新设置的恢复起始点更新到提交事务表中。Therefore, in order to ensure that the most accurate recovery starting point is obtained (the recovery starting point corresponding to the time before the fault and the closest to the fault occurrence time point is the most accurate recovery starting point), in a preferred embodiment, a timer is used to obtain the source The starting log sequence number of the earliest active transaction in the database, and it is determined whether the starting log sequence number of the earliest active transaction in the source database at the current moment is relative to the starting log sequence number of the earliest active transaction in the source database at the previous moment. If a change occurs, the recovery starting point is reset according to the starting log sequence number of the earliest active transaction in the source database at the current moment. And update the reset recovery starting point to the commit transaction table.

具体而言，判断确定当前时刻源数据库最早发生的活动事务的起始日志序列号是否大于上一时刻源数据库最早发生的活动事务的起始日志序列号；若大于，则依据当前时刻源数据库最早发生的活动事务的起始日志序列号重新设置恢复起始点，并将重新设置的恢复起始点更新到提交事务表中。Specifically, it is determined whether the initial log sequence number of the earliest active transaction in the source database at the current moment is greater than the initial log sequence number of the earliest active transaction in the source database at the previous moment; The starting log sequence number of the active transaction that occurred resets the recovery starting point, and updates the reset recovery starting point to the commit transaction table.

在实际应用场景中，假设当前时刻源数据库最早发生的活动事务的起始日志序列号为LSN0，上一时刻源数据库最早发生的活动事务的起始日志序列号为LSN，当LSN0大于LSN时，在提交事务表中删除END_SCN小于等于该LSN0的所有记录以及TID等于0的记录，具体可执行如下语句：In the actual application scenario, it is assumed that the starting log sequence number of the earliest active transaction in the source database at the current moment is LSN0, and the starting log sequence number of the earliest active transaction in the source database at the previous moment is LSN. When LSN0 is greater than LSN, Delete all records whose END_SCN is less than or equal to the LSN0 and records whose TID is equal to 0 in the commit transaction table. Specifically, the following statement can be executed:

BEGINBEGIN

DELET FROM“提交事务表”WHERE END_SCN<＝LSN0OR TID＝0；DELET FROM "commit transaction table" WHERE END_SCN <= LSN0OR TID=0;

INSERT INTO“提交事务表”VALUES(0，LSN0-1，0)；INSERT INTO "commit transaction table" VALUES(0, LSN0-1, 0);

ENDEND

可以理解的是，在重新设置起始恢复点后，会将提交日志序列号不大于该重新设置的起始恢复点的事务日志从提交事务表中删除。It can be understood that, after the initial recovery point is reset, the transaction logs whose commit log sequence number is not greater than the reset initial recovery point will be deleted from the commit transaction table.

步骤202：获取源数据库相邻两个活动事务的起始日志序列号，并在所述相邻两个活动事务的起始日志序列号的差值大于预设阈值时，将所述相邻两个活动事务的起始日志序列号标记为范围检查点。Step 202: Obtain the starting log sequence numbers of two adjacent active transactions in the source database, and when the difference between the starting log sequence numbers of the two adjacent active transactions is greater than a preset threshold The starting log sequence number of an active transaction is marked as a range checkpoint.

在本实施例中，依次获取源数据库相邻两个活动事务的起始日志序列号，判断确定所述相邻两个活动事务的起始日志序列号的差值是否大于预设阈值，若所述相邻两个活动事务的起始日志序列号的差值大于预设阈值，则将所述相邻两个活动事务的起始日志序列号标记为一个范围检查点，将多个范围检查点以二维数组形式进行存储，其中，二维数组的一行对应其中一个范围检查点。In this embodiment, the starting log sequence numbers of two adjacent active transactions in the source database are sequentially obtained, and it is determined whether the difference between the starting log sequence numbers of the two adjacent active transactions is greater than a preset threshold. If the difference between the start log sequence numbers of the two adjacent active transactions is greater than the preset threshold, the start log sequence numbers of the two adjacent active transactions are marked as a range checkpoint, and multiple range checkpoints are marked. Stored as a 2D array, where one row of the 2D array corresponds to one of the range checkpoints.

例如，未提交事务表中有n个活动事务，第一个活动事务的起始日志序列号为LSN1、第二个活动事务的起始日志序列号为LSN2、第三个活动事务的起始日志序列号为LSN3……第n个活动事务的起始日志序列号为LSNn。首先，判断确定LSN2-LSN1的差值是否大于预设的阈值，如果大于预设的阈值，则将第一个活动事务的起始日志序列号为LSN1和第二个活动事务的起始日志序列号为LSN2标记为第一个范围检查点，具体可执行如下SQL语句：For example, there are n active transactions in the uncommitted transaction table, the starting log sequence number of the first active transaction is LSN1, the starting log sequence number of the second active transaction is LSN2, and the starting log sequence number of the third active transaction is LSN2. The sequence number is LSN3...the starting log sequence number of the nth active transaction is LSNn. First, it is determined whether the difference between LSN2 and LSN1 is greater than the preset threshold. If it is greater than the preset threshold, the starting log sequence number of the first active transaction is LSN1 and the starting log sequence of the second active transaction. The number LSN2 is marked as the first range checkpoint, and the following SQL statement can be executed:

BEGINBEGIN

DELETE FROM“提交事务表”WHERE SEQID>LSN1AND SEQID<LSN2AND END_SCN<LSN2AND TID>0；DELETE FROM "Commit Transaction Table" WHERE SEQID>LSN1AND SEQID<LSN2AND END_SCN<LSN2AND TID>0;

INSERT INTO“提交事务表”VALUES(-1，LSN1，LSN2)；INSERT INTO "commit transaction table" VALUES(-1, LSN1, LSN2);

ENDEND

然后，判断确定LSN3-LSN2的差值是否大于预设的阈值，如果大于预设的阈值，则将第二个活动事务的起始日志序列号为LSN2和第三个活动事务的起始日志序列号为LSN3标记为第二个范围检查点，以此类推，获取所有的范围检查点。Then, it is determined whether the difference between LSN3-LSN2 is greater than the preset threshold, and if it is greater than the preset threshold, set the starting log sequence number of the second active transaction as the starting log sequence of LSN2 and the third active transaction No. LSN3 marked as the second range checkpoint, and so on, to get all range checkpoints.

在本实施例中，将前述所有的范围检查点以二维数组的形式进行存储，其中，二维数组的一行对应其中一个范围检查点。数组下标为[0]的元素记录较小的起始日志序列号，数组下标为[1]的元素记录较大的起始日志序列号。通过前述的方式设置范围检查点，再依据范围检查点标记事务是否已经被提交，可以有效减小提交事务表所占用的内存。In this embodiment, all the aforementioned range checkpoints are stored in the form of a two-dimensional array, wherein a row of the two-dimensional array corresponds to one of the range checkpoints. The element with the index [0] of the array records the smaller starting log sequence number, and the element with the index [1] of the array records the larger starting log sequence number. By setting the range checkpoint in the foregoing manner, and then marking whether the transaction has been committed according to the range checkpoint, the memory occupied by the commit transaction table can be effectively reduced.

步骤203：获取源数据库事务日志，删除提交日志序列号小于所述恢复起始点的事务日志以及提交日志序列号落入范围检查点的事务日志以得到已提交事务记录。Step 203: Obtain the source database transaction log, delete the transaction log whose commit log sequence number is smaller than the recovery starting point and the transaction log whose commit log sequence number falls within the range checkpoint to obtain the committed transaction record.

在本实施例中，源端复制工具实施访问源数据库以获取源数据库事务日志，并将获取到的事务日志发送给目标端复制工具，目标端复制工具依据事务提交日志的日志序列号的大小执行相应的操作。In this embodiment, the source-side replication tool implements access to the source database to obtain the source database transaction log, and sends the obtained transaction log to the target-side replication tool, and the target-side replication tool executes the execution according to the size of the log sequence number of the transaction commit log corresponding action.

具体而言，目标端复制工具首先删除提交日志序列号小于恢复起始点的事务日志，然后，再删除提交日志序列号落入范围检查点的事务日志，同时将相应范围点对应的两个起始日志序列号插入到提交事务表中。如此，使用一行范围检查点记录信息，表示批量的提交事务信息，对提交事务表进行了压缩，避免长事务导致提交事务表规模过大的情况的发生。Specifically, the target-side replication tool first deletes the transaction log whose commit log sequence number is smaller than the recovery starting point, and then deletes the transaction log whose commit log sequence number falls within the range checkpoint, and simultaneously deletes the two starting points corresponding to the corresponding range point. The log sequence number is inserted into the commit transaction table. In this way, a row of range checkpoint record information is used to represent batch submitted transaction information, and the submitted transaction table is compressed, so as to avoid the occurrence of an excessively large scale of the submitted transaction table caused by long transactions.

在此需要说明的是，当有至少一个事务日志的日志序列号落入某个范围检查点内时，将该范围检查点插入到提交事务表中，表示批量的提交事务信息。当没有任何事务日志的日志序列号落入某个范围检查点内时，则该范围检查点无需记录在提交事务表内。It should be noted here that when the log sequence number of at least one transaction log falls within a range checkpoint, the range checkpoint is inserted into the commit transaction table, indicating batch commit transaction information. When the log sequence numbers of no transaction logs fall within a range checkpoint, the range checkpoint need not be recorded in the commit transaction table.

步骤204：在故障恢复时，依据所述恢复起始点、所述范围检查点以及所述已提交事务记录进行数据恢复，以保证源数据库和目标数据库的事务一致性。Step 204: During fault recovery, perform data recovery according to the recovery starting point, the range checkpoint, and the committed transaction record, so as to ensure transaction consistency between the source database and the target database.

在本实施例中，在复制系统故障恢复时，目标端复制工具首先加载提交事务表信息至内存结构中，对于范围检查点以二维数组形式进行存储，复制系统通过从提交事务表中获取的恢复起始点进行恢复。In this embodiment, when the replication system fails to recover, the target-side replication tool first loads the submitted transaction table information into the memory structure, and stores the range checkpoint in the form of a two-dimensional array. Restore the starting point to restore.

具体请参阅图3，步骤204：在故障恢复时，依据所述恢复起始点、所述范围检查点以及所述已提交事务记录进行数据恢复，以保证源数据库和目标数据库的事务一致性包括如下步骤：Please refer to FIG. 3 for details, step 204: during fault recovery, perform data recovery according to the recovery starting point, the range checkpoint and the committed transaction record, so as to ensure the transaction consistency of the source database and the target database, including the following: step:

步骤2041：将所述恢复起始点发送至所述源数据库并接收所述源数据库所发送的操作日志，其中，所述操作日志的日志序列号大于所述恢复起始点对应的日志序列号。Step 2041: Send the recovery starting point to the source database and receive the operation log sent by the source database, wherein the log sequence number of the operation log is greater than the log sequence number corresponding to the recovery starting point.

在本实施例中，复制系统在进行故障恢复时，源数据库必须获取恢复起始点，依据恢复起始点发送事务日志到目标数据库。其中，恢复起始点为故障发生时源数据库最早发生的活动事务的起始日志序列号。In this embodiment, when the replication system performs fault recovery, the source database must obtain the recovery starting point, and send the transaction log to the target database according to the recovery starting point. The recovery starting point is the starting log sequence number of the earliest active transaction in the source database when the failure occurs.

具体而言，源端复制工具启动源复制进程，并向目标端复制工具发送请求获取恢复起始点的命令消息，以获取日志分析的起始日志序列号。目标端复制工具接收到源端复制工具的请求指令后，执行如下查询：SELECT SEQID FROM“提交事务表”WHERE TID＝0，进而从提交事务表中获取恢复起始点对应的日志序列号。目标端复制工具将获取到的恢复起始点对应的日志序列号发送给源端复制工具，源端复制工具接收到恢复起始点对应的日志序列号后，在源数据库中依据该恢复起始点定位日志分析点。即，源端复制工具从源数据库获取日志序列号大于所述恢复起始点对应的日志序列号的操作日志，并将操作日志发送给目标端复制工具。Specifically, the source-side replication tool starts the source-side replication process, and sends a command message requesting to obtain the recovery starting point to the target-side replication tool, so as to obtain the starting log sequence number for log analysis. After receiving the request command from the source replication tool, the target replication tool executes the following query: SELECT SEQID FROM "commit transaction table" WHERE TID=0, and then obtains the log sequence number corresponding to the recovery starting point from the submission transaction table. The target-side replication tool sends the obtained log sequence number corresponding to the recovery starting point to the source-side replication tool. After receiving the log sequence number corresponding to the recovery starting point, the source-side replication tool locates the log in the source database according to the recovery starting point. Analysis point. That is, the source-side replication tool obtains from the source database an operation log whose log sequence number is greater than the log sequence number corresponding to the recovery starting point, and sends the operation log to the target-side replication tool.

步骤2042：对所述操作日志进行解析获取所述操作日志对应的事务标识码。Step 2042: Parse the operation log to obtain a transaction identification code corresponding to the operation log.

在本实施例中，目标端复制工具对操作日志进行解析获取操作日志对应的事务标识码。In this embodiment, the target-side replication tool parses the operation log to obtain the transaction identification code corresponding to the operation log.

步骤2043：判断确定所述事务标识码是否存在于已提交事务记录内，若存在，则执行步骤2044；若不存在，则执行步骤2045。Step 2043: Determine whether the transaction identification code exists in the submitted transaction record, if so, execute step 2044; if not, execute step 2045.

在本实施例中，接收到事务的提交日志后，首先以事务的标识码为搜索关键字在提交事务表内容进行搜索，以判断确定事务标识码是否存在于已提交事务记录内。若存在，则执行步骤2044；若不存在，则执行步骤2045。In this embodiment, after receiving the commit log of the transaction, firstly use the transaction identification code as the search key to search the content of the submitted transaction table to determine whether the transaction identification code exists in the submitted transaction record. If it exists, go to step 2044; if not, go to step 2045.

步骤2044：则所述操作日志对应的事务在故障发生前已经加载至目标数据库，忽略所述操作日志。Step 2044: The transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

若该事务标识码存在于已提交事务记录内，则说明该操作日志对应的事务在故障发生前已经加载至目标数据库，忽略该操作日志。If the transaction identification code exists in the committed transaction record, it means that the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

步骤2045：对所述操作日志进行解析获取所述操作日志对应的事务的提交日志序列号，并判断确定所述提交日志序列号是否落入所述范围检查点内。Step 2045: Analyze the operation log to obtain the commit log sequence number of the transaction corresponding to the operation log, and determine whether the commit log sequence number falls within the range checkpoint.

若该事务标识码不存在于已提交事务记录内，则对操作日志进行解析获取操作日志对应的事务的提交日志序列号，并判断确定提交日志序列号是否落入范围检查点内。为了提高判断的准确性，在优选的实施例中，若该事务标识码不存在于已提交事务记录内，则对操作日志进行解析获取操作日志对应的事务的提交日志序列号以及起始日志序列号，并判断确定提交日志序列号以及起始日志序列号是否落入范围检查点内。若提交日志序列号以及起始日志序列号均落入范围检查点内，则操作日志对应的事务在故障发生前已经加载至目标数据库，忽略操作日志。If the transaction identifier does not exist in the committed transaction record, the operation log is parsed to obtain the commit log sequence number of the transaction corresponding to the operation log, and it is determined whether the commit log sequence number falls within the range checkpoint. In order to improve the accuracy of the judgment, in a preferred embodiment, if the transaction identification code does not exist in the committed transaction record, the operation log is parsed to obtain the commit log sequence number and the start log sequence of the transaction corresponding to the operation log. number, and determine whether the commit log sequence number and the start log sequence number fall within the range checkpoint. If both the commit log sequence number and the start log sequence number fall within the range checkpoint, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

在优选的实施例中，由于提交事务表中可能存在多个范围检查点，为了提高查找的速度，可以采用二分查找算法在包含多个范围检查点的二维数组中进行定位，判断确定所述提交日志序列号是否落入范围检查点内。In a preferred embodiment, since there may be multiple range checkpoints in the commit transaction table, in order to improve the search speed, a binary search algorithm can be used to locate in a two-dimensional array containing multiple range checkpoints, and it is determined that the Whether the commit log sequence number falls within the range checkpoint.

步骤2046：若所述提交日志序列号落入所述范围检查点内，则所述操作日志对应的事务在故障发生前已经加载至目标数据库，忽略所述操作日志。Step 2046: If the serial number of the commit log falls within the range checkpoint, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

在本实施中，若提交日志序列号落入范围检查点内，则操作日志对应的事务在故障发生前已经加载至目标数据库，忽略操作日志。In this implementation, if the serial number of the commit log falls within the range checkpoint, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and the operation log is ignored.

例如，假设该提交事务的提交日志序列号为LSN_COMMIT。该某个范围检查点中较小的日志序列号为LSN_C，较大的日志序列号为LSN_D。For example, suppose the commit log sequence number for this committing transaction is LSN_COMMIT. The smaller log sequence number in the range checkpoint is LSN_C, and the larger log sequence number is LSN_D.

如果LSN_C<LSN_COMMIT<LSN_D，则提交日志序列号LSN_COMMIT落入范围检查点内，表明该提交事务在故障发生前已经加载至目标数据库，忽略该操作日志。If LSN_C<LSN_COMMIT<LSN_D, the commit log sequence number LSN_COMMIT falls within the range checkpoint, indicating that the commit transaction has been loaded into the target database before the failure occurs, and the operation log is ignored.

为提高判断的准确性，在另一个实施例中，若提交日志序列号以及起始日志序列号均落入范围检查点内，则操作日志对应的事务在故障发生前已经加载至目标数据库，忽略操作日志。In order to improve the accuracy of the judgment, in another embodiment, if both the serial number of the commit log and the serial number of the starting log fall within the range checkpoint, the transaction corresponding to the operation log has been loaded into the target database before the failure occurs, and it is ignored. Operation log.

例如，假设该提交事务的起始日志序列号为LSN_START，该提交事务的提交日志序列号为LSN_COMMIT。该某个范围检查点中较小的日志序列号为LSN_C，较大的日志序列号为LSN_D。For example, suppose the start log sequence number of the commit transaction is LSN_START, and the commit log sequence number of the commit transaction is LSN_COMMIT. The smaller log sequence number in the range checkpoint is LSN_C, and the larger log sequence number is LSN_D.

如果LSN_C<LSN_START<LSN_D，LSN_C<LSN_COMMIT<LSN_D，则起始日志序列号为LSN_START和提交日志序列号LSN_COMMIT均落入范围检查点内则表明该提交事务在故障发生前已经加载至目标数据库，忽略该操作日志。If LSN_C<LSN_START<LSN_D, LSN_C<LSN_COMMIT<LSN_D, then the start log sequence number is LSN_START and the commit log sequence number LSN_COMMIT both fall within the range checkpoint, indicating that the commit transaction has been loaded into the target database before the failure occurs, ignoring The operation log.

步骤2047：若所述提交日志序列号未落入所述范围检查点内，则所述操作日志对应的事务在故障发生前未加载至目标数据库，所述操作日志需要在目标数据库进行同步。Step 2047: If the serial number of the commit log does not fall within the range checkpoint, the transaction corresponding to the operation log is not loaded into the target database before the failure occurs, and the operation log needs to be synchronized in the target database.

若提交日志序列号未落入范围检查点内，则操作日志对应的事务在故障发生前未加载至目标数据库，将操作日志在目标数据库进行同步。具体地，在接收到操作日志对应的事务的提交日志时，首先判断确定提交日志序列号是否落入范围检查点内，若未落入范围检查点内，则将操作日志对应的事务日志添加到已提交事务记录，然后，依据操作日志对应的事务日志对目标数据库进行数据同步。If the serial number of the commit log does not fall within the range checkpoint, the transaction corresponding to the operation log is not loaded into the target database before the failure occurs, and the operation log is synchronized in the target database. Specifically, when receiving the commit log of the transaction corresponding to the operation log, first determine whether the serial number of the commit log falls within the range checkpoint, and if it does not fall within the range checkpoint, add the transaction log corresponding to the operation log to the The transaction record has been submitted, and then data synchronization is performed on the target database according to the transaction log corresponding to the operation log.

区别于现有技术，本发明依据预设的阈值设置范围检查点，并将提交日志序列号落入范围检查点的事务日志删除，采用范围检查点机制标记已提交的事务，即，使用日志序列号范围来表示批量已提交的事务，即使数据库中存在长时间未提交的长事务，也能有效及时的删除已过期的提交事务信息，对提交事务表进行了压缩，可以有效减小提交事务表的规模。同时，在复制系统发生故障还原时，可以基于范围检查点确定相应事务是否已经同步过，能够有效缩短复制系统恢复时间以及内存资源消耗。Different from the prior art, the present invention sets a range checkpoint according to a preset threshold, deletes the transaction log whose commit log sequence number falls within the range checkpoint, and adopts the range checkpoint mechanism to mark the committed transactions, that is, uses the log sequence The number range is used to represent batches of committed transactions. Even if there are long uncommitted long-term transactions in the database, the expired committed transaction information can be deleted in a timely manner, and the committed transaction table is compressed, which can effectively reduce the number of committed transaction tables. size. At the same time, when the replication system fails to restore, it can be determined whether the corresponding transaction has been synchronized based on the range checkpoint, which can effectively shorten the recovery time of the replication system and the consumption of memory resources.

实施例2： Example 2 :

请参阅图5，图5复制装置是本发明实施例提供的一种复制装置的结构示意图。本实施例的复制装置括一个或多个处理器51以及存储器52。其中，图5中以一个处理器51为例。Please refer to FIG. 5. FIG. 5 is a schematic structural diagram of a copying apparatus according to an embodiment of the present invention. The replication apparatus of this embodiment includes one or more processors 51 and a memory 52 . Among them, one processor 51 is taken as an example in FIG. 5 .

处理器51和存储器52可以通过总线或者其他方式连接，图5中以通过总线连接为例。The processor 51 and the memory 52 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 5 .

存储器52作为一种基于数据同步的非易失性计算机可读存储介质，可用于存储非易失性软件程序、非易失性计算机可执行程序以及模块，如实施例1中的保证复制事务一致性的方法以及对应的程序指令。处理器51通过运行存储在存储器52中的非易失性软件程序、指令以及模块，从而执行保证复制事务一致性的方法的各种功能应用以及数据处理，实现实施例1的保证复制事务一致性的方法的功能。The memory 52, as a non-volatile computer-readable storage medium based on data synchronization, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as ensuring the consistency of replication transactions in Embodiment 1. method and corresponding program instructions. The processor 51 executes various functional applications and data processing of the method for ensuring the consistency of replicated transactions by running the non-volatile software programs, instructions and modules stored in the memory 52, thereby realizing the guaranteed replication transaction consistency of Embodiment 1 function of the method.

其中，存储器52可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实施例中，存储器52可选包括相对于处理器51远程设置的存储器，这些远程存储器可以通过网络连接至处理器51。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 52 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 52 may optionally include memory located remotely from processor 51 , which may be connected to processor 51 via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

关于保证复制事务一致性的方法请参照图1～图4及相关的文字描述在此，不再赘述。For the method of ensuring the consistency of the replicated transaction, please refer to FIG. 1 to FIG. 4 and the related text descriptions, which will not be repeated here.

值得说明的是，上述装置和系统内的模块、单元之间的信息交互、执行过程等内容，由于与本发明的处理方法实施例基于同一构思，具体内容可参见本发明方法实施例中的叙述，此处不再赘述。It is worth noting that the information exchange, execution process and other contents between the modules and units in the above-mentioned device and the system are based on the same concept as the processing method embodiments of the present invention. For details, please refer to the descriptions in the method embodiments of the present invention. , and will not be repeated here.

本领域普通技术人员可以理解实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，该程序可以存储于一计算机可读存储介质中，存储介质可以包括：只读存储器(Read Only Memory，简写为ROM)、随机存取存储器(Random AccessMemory，简写为RAM)、磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the embodiments can be completed by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium, and the storage medium can include: Read Only Memory (abbreviated as ROM), random access memory (Random Access Memory, abbreviated as RAM), magnetic disk or optical disk, etc.

以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等，均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall be included in the protection of the present invention. within the range.

Claims

1. a kind of method for guaranteeing duplication transaction consistency, which is characterized in that the method packet for guaranteeing duplication transaction consistency It includes:

The origin log sequence number for obtaining the active transaction that source database occurs earliest, by the active transaction occurred earliest Origin log sequence number is labeled as recovery starting point；

Obtain the origin log sequence number of the two neighboring active transaction of source database, and rising in the two neighboring active transaction When the difference of beginning log serial number is greater than preset threshold, the origin log sequence number of the two neighboring active transaction is labeled as Range check point；

Obtain source database transaction journal, delete submit log serial number no more than it is described restore starting point transaction journal and Log serial number is submitted to fall into the transaction journal of range check point to obtain having been filed on transaction journal；

In fault recovery, according to the recovery starting point, range check point and it is described have been filed on transaction journal carry out Data are restored, to guarantee the transaction consistency of source database and target database；

Wherein, in fault recovery, remember according to the recovery starting point, range check point and the affairs that have been filed on Record carries out data recovery

The recovery starting point is sent to the source database and receives operation log transmitted by the source database, In, the log serial number of the operation log is greater than the corresponding log serial number of the recovery starting point；

The submission log serial number that parsing obtains the corresponding affairs of the operation log is carried out to the operation log, and is judged true Whether the fixed submission log serial number falls into the range check point；

If the submission log serial number is fallen into the range check point, the corresponding affairs of the operation log are sent out in failure It has been loaded into target database before death, has ignored the operation log.

2. the method according to claim 1 for guaranteeing duplication transaction consistency, which is characterized in that the acquisition source database The origin log sequence number of the active transaction occurred earliest, by the origin log sequence number mark of the active transaction occurred earliest Being denoted as recovery starting point includes:

Using the origin log sequence number for the active transaction that timer timing acquisition source database occurs earliest；

Judge the origin log sequence number for the active transaction for determining that current time source database occurs earliest relative to last moment Whether the origin log sequence number for the active transaction that source database occurs earliest changes；

If changing, the origin log sequence number for the active transaction that foundation current time source database occurs earliest is set again Set recovery starting point.

3. the method according to claim 1 for guaranteeing duplication transaction consistency, which is characterized in that the acquisition source database The origin log sequence number of two neighboring active transaction, and the difference of the origin log sequence number in the two neighboring active transaction When value is greater than preset threshold, include: labeled as range check point by the origin log sequence number of the two neighboring active transaction

Successively obtain the origin log sequence number of the two neighboring active transaction of source database；

Judge to determine whether the difference of the origin log sequence number of the two neighboring active transaction is greater than preset threshold；

If the difference of the origin log sequence number of the two neighboring active transaction is greater than preset threshold, will be described two neighboring The origin log sequence number of active transaction is labeled as a range check point；

Range check point is stored in the form of two-dimensional array, wherein a line of two-dimensional array corresponds to one of range inspection It makes an inventory of.

4. the method according to claim 1 for guaranteeing duplication transaction consistency, which is characterized in that the guarantee replicates affairs The method of consistency further include:

If the submission log serial number is not fallen in the range check point, the corresponding affairs of the operation log are in failure It is not loaded onto target database before occurring, the operation log is added to and does not submit transaction table.

5. the method according to claim 4 for guaranteeing duplication transaction consistency, which is characterized in that the guarantee replicates affairs The method of consistency further include:

When receiving the submission log of the corresponding affairs of the operation log, the corresponding transaction journal of the operation log is added Transaction journal is had been filed on described in being added to；

It is synchronous that data are carried out to target database according to the operation log corresponding transaction journal.

6. the according to claim 1, method of guarantee duplication transaction consistency described in 4 or 5, which is characterized in that described to guarantee again The method of transaction consistency processed further include:

Parsing is carried out to the operation log and obtains the corresponding Transaction Identifier code of the operation log；

Judgement determines that the Transaction Identifier code whether there is in having been filed in transaction journal；

If it exists, then the corresponding affairs of the operation log have been loaded into target database before failure occurs, and ignore described Operation log；

If it does not exist, then it executes and the submission log that parsing obtains the corresponding affairs of the operation log is carried out to the operation log Sequence number, and judge to determine whether the submission log serial number falls into the range check point.

7. the according to claim 1, method of guarantee duplication transaction consistency described in 4 or 5, which is characterized in that described to described Operation log carries out parsing and obtains the submission log serial number of the corresponding affairs of the operation log, and judges to determine the submission Whether log serial number falls into the range check point

The submission log serial number that parsing obtains the corresponding affairs of the operation log is carried out to the operation log；

It is positioned in the two-dimensional array comprising multiple range check points using binary chop algorithm, judges to determine the submission Whether log serial number falls into range check point.

8. the method according to claim 1 for guaranteeing duplication transaction consistency, which is characterized in that the acquisition source database The origin log sequence number of the active transaction occurred earliest, by the origin log sequence number mark of the active transaction occurred earliest It is denoted as before restoring starting point further include:

Creation submits transaction table and does not submit transaction table, and the submission transaction table is used for recovery of stomge starting point, range check Transaction journal is put and has been filed on, it is described not submit transaction table for storing the transaction journal for not submitting affairs.

9. a kind of reproducing unit, which is characterized in that including at least one processor；And it is logical at least one described processor Believe the memory of connection；Wherein, the memory is stored with the instruction that can be executed by least one described processor, described instruction It is arranged to carry out by program and guarantees the method for replicating transaction consistency as claim 1 ~ 8 is described in any item.