CN117171266B - Data synchronization method, device, equipment and storage medium

Info

Publication number: CN117171266B
Application number: CN202311091225.8A
Authority: CN
Inventors: 陈肃; 徐志超; 刘瀚林; 陈节勋; 陈诚; 陈雷
Original assignee: Beijing Zhufeng Technology Co ltd
Current assignee: Beijing Zhufeng Technology Co ltd
Priority date: 2023-08-28
Filing date: 2023-08-28
Publication date: 2024-05-14
Anticipated expiration: 2043-08-28
Also published as: CN117171266A

Abstract

The application discloses a data synchronization method, a device, equipment and a storage medium. The method comprises the following steps: acquiring the current transaction state of a data synchronization task; determining a target site from a first site and a second site according to the current transaction state; and executing a data synchronization operation with the target site as the reading start point of the data synchronization task.

Description

Data synchronization method, device, equipment and storage medium

Technical Field

The present application relates to, but is not limited to, the field of database technologies, and in particular to a data synchronization method, apparatus, device, and storage medium.

Background

Data synchronization systems generally replicate data asynchronously. In asynchronous replication mode, however, a disaster-recovery switchover can easily cause data loss, and continuous operation of the service system cannot be guaranteed.

Currently, for committed data, a data synchronization system can recover data from a unique site in the log information. For uncommitted data, however, existing data synchronization systems cannot determine a safe recovery site from which to recover the data. As a result, a large amount of data may be lost, or data may even be read repeatedly, which affects the continuous operation of the service system.

Disclosure of Invention

The application provides a data synchronization method, a device, equipment and a storage medium, which solve the problem in the related art that a data synchronization system cannot determine a safe recovery site from which to recover data, so that a large amount of data is lost or data is even read repeatedly, which in turn affects the continuous operation of the service system.

The technical scheme of the application is realized as follows:

A method of data synchronization, the method comprising:

Acquiring the current transaction state of a data synchronization task;

Determining a target site from the first site and the second site according to the current transaction state;

and executing data synchronization operation by taking the target site as a reading starting point of the data synchronization task.

A data synchronization apparatus, the apparatus comprising:

the acquisition module is used for acquiring the current transaction state of the data synchronization task;

The processing module is used for determining a target site from the first site and the second site according to the current transaction state;

and the control module is used for executing data synchronization operation by taking the target site as a reading starting point of the data synchronization task.

A data synchronization device, the device comprising:

one or more processors;

a memory for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the steps of the data synchronization method described above.

A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the data synchronization method described above.

The embodiment of the application provides a data synchronization method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring the current transaction state of a data synchronization task; determining a target site from the first site and the second site according to the current transaction state; taking the target site as a reading starting point of the data synchronization task, and executing data synchronization operation; the method solves the problems that a data synchronization system in the related art cannot determine a safe recovery site to recover data, so that a large amount of data is lost, even repeated reading of the data occurs, and the continuous operation of a service system is affected.

Drawings

FIG. 1 is a flowchart of a data synchronization method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a data synchronization device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a data synchronization device according to an embodiment of the present application;

FIG. 4 is a flowchart of a data synchronization method in a disaster recovery scenario according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of determining a safe recovery site according to an embodiment of the present application;

FIG. 6 is a flowchart of a log parsing-based data synchronization method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a persistent storage synchronization method in a disaster recovery scenario provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of another data synchronization apparatus according to an embodiment of the present application;

FIG. 9 is a diagram of a deployment architecture for implementing data synchronization in a remote manner according to an embodiment of the present application;

FIG. 10 is a layout diagram for implementing data synchronization in a local manner according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

In the following description, the terms "first", "second", "third" and the like merely distinguish similar objects and do not denote a particular ordering of the objects. Where permitted, "first", "second", and "third" may be interchanged in a particular order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

An embodiment of the present application provides a data synchronization method, referring to fig. 1, including the steps of:

S101, acquiring the current transaction state of the data synchronization task.

The current transaction state represents how far processing of the current transaction has progressed, and is either a committed state or an uncommitted state.

The data synchronization task refers to submitting data from the source database to the data cache and then sending the data in the cache to the target database to complete the data backup.

In the embodiment of the application, when the full or incremental data of the source database is acquired by log parsing, two situations may arise. In the first, the log contains only committed transactions, and the current transaction state acquired when the data synchronization operation is performed is the committed state. In the second, the log contains uncommitted transactions, and the current transaction state acquired when the data synchronization operation is performed is the uncommitted state.

Illustratively, the full or incremental data of the source database may also be obtained by way of a query.
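By way of illustration only, the two log-parsing situations described above can be sketched as follows. The log entry format (pairs of a transaction ID and an entry kind of "begin", "data", or "commit") and the function name `current_transaction_state` are assumptions made for this sketch; the application does not specify a log format, and rollback entries are not modeled.

```python
from typing import Iterable, Tuple


def current_transaction_state(log_entries: Iterable[Tuple[str, str]]) -> str:
    """Return "uncommitted" if any transaction in the log has begun but not
    committed, and "committed" otherwise.

    Each entry is a hypothetical (transaction_id, kind) pair, where kind is
    "begin", "data", or "commit".
    """
    open_transactions = set()
    for tx_id, kind in log_entries:
        if kind == "begin":
            open_transactions.add(tx_id)
        elif kind == "commit":
            open_transactions.discard(tx_id)
    return "uncommitted" if open_transactions else "committed"


# A log holding only committed transactions versus one with an open transaction.
committed_log = [("t1", "begin"), ("t1", "data"), ("t1", "commit")]
mixed_log = committed_log + [("t2", "begin"), ("t2", "data")]
```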

S102, determining a target site from the first site and the second site according to the current transaction state.

The first site represents the cache data offset of the data, and the second site represents the transaction commit progress of the data. In other words, the first site is the site in the log that corresponds to the data offset, that is, the cache data offset site, and may be the site of the current maximum data offset. The second site is the site in the log that corresponds to the unique identifier (ID) of the transaction, that is, the transaction commit site, and may be the site at which the transaction begins. Typically, the cache data offset site lies later in the log than the transaction commit site.

In the embodiment of the application, to ensure data integrity, the more reliable of the first site and the second site is selected, in combination with the current transaction state, as the safe recovery site of the data synchronization task.

Illustratively, the target site is determined from the cache data offset site and the transaction commit site according to the current transaction state, based on the current maximum transaction commit site and the maximum data offset of the target database. In the data synchronization task, if the current transaction state is the committed state, the processing progress of the current transaction can be determined from a unique site in the log, and the cache data offset site is preferentially determined as the target site so as to reduce repeated reading of data. If the current transaction state is the uncommitted state, both the transaction commit site and the cache data offset site must be considered, and the more reliable of the two is selected as the safe recovery site of the data synchronization task.

Illustratively, in a disaster recovery scenario, the more reliable of the transaction commit site and the cache data offset site is selected as the safe recovery site of the data synchronization task.

S103, executing data synchronization operation by taking the target site as a reading starting point of the data synchronization task.

In the embodiment of the application, when the target site is determined to be the first site, the first site is used as the reading starting point of the data synchronization task, the data synchronization operation is executed, and the data in the cache is sent to the target database.

In the embodiment of the application, when the target site is determined to be the second site, the second site is used as a reading starting point of the data synchronization task, the data synchronization operation is executed, and the data in the source database is sent to the target database through the cache.

The embodiment of the application provides a data synchronization method, which comprises the following steps: acquiring the current transaction state of a data synchronization task; determining a target site from the first site and the second site according to the current transaction state; taking the target site as a reading starting point of the data synchronization task, and executing data synchronization operation; the method solves the problems that a data synchronization system in the related art cannot determine a safe recovery site to recover data, so that a large amount of data is lost, even repeated reading of the data occurs, and the continuous operation of a service system is affected. In addition, the method provided by the application can carry out persistent storage on the position corresponding to the uncommitted transaction in the local transaction cache, so that even under the condition of data loss in the local transaction cache, the data can be recovered from a safe recovery position, and the integrity of the data is further ensured.

In some embodiments of the present application, determining the target site from the first site and the second site in S102 according to the current transaction state may be implemented by:

If the current transaction state is the committed state, determining the first site as the target site.

In the embodiment of the application, if the current transaction state is the committed state, the log of the source database contains no uncommitted transactions, so all data in the cache belongs to committed transactions. The processing progress of the current transaction is then determined by a unique site in the log, and, to reduce repeated reading of data, the first site, which represents the cache data offset, is determined as the target site.

In some embodiments of the present application, determining the target site from the first site and the second site in S102 according to the current transaction state may further be implemented by:

If the current transaction state is an uncommitted state and the data corresponding to the first site meets the data reading condition, determining the first site as a target site.

Satisfying the data reading condition means that the data corresponding to the first site exists and can be read.

In the embodiment of the application, if the current transaction state is the uncommitted state, the log of the source database contains uncommitted transactions, and uncommitted transactions therefore exist among the data in the cache. It is then necessary to determine whether data exists at the first site, which represents the cache data offset, for the uncommitted transaction. If the data exists, the first site has readable data, and the first site is determined as the target site.

If the current transaction state is an uncommitted state and the data corresponding to the first site does not meet the data reading condition, determining the second site as the target site.

Failing to satisfy the data reading condition means that the data corresponding to the first site does not exist, so no data can be read.

In the embodiment of the application, if the current transaction state is the uncommitted state, the log of the source database contains uncommitted transactions, and uncommitted transactions therefore exist among the data in the cache. It is then necessary to determine whether data exists at the first site, which represents the cache data offset, for the uncommitted transaction. If the data does not exist, the first site has no readable data. Reading must then continue from the second site, which represents the transaction commit progress of the data for the uncommitted transaction, so that data continues to be read from the source database, and the second site is determined as the target site.
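The three branches described above (committed state; uncommitted state with readable data at the first site; uncommitted state without readable data at the first site) can be sketched as a single selection function. This is a minimal sketch rather than the claimed implementation: representing sites as integers and the names `TxState`, `Sites`, and `choose_target_site` are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TxState(Enum):
    COMMITTED = "committed"
    UNCOMMITTED = "uncommitted"


@dataclass(frozen=True)
class Sites:
    first_site: int   # cache data offset site (hypothetical integer encoding)
    second_site: int  # transaction commit site (hypothetical integer encoding)


def choose_target_site(state: TxState, sites: Sites, first_site_readable: bool) -> int:
    """Return the site used as the reading start point of the synchronization task."""
    if state is TxState.COMMITTED:
        # Only committed transactions are present: resume at the cache data
        # offset to reduce repeated reading.
        return sites.first_site
    if first_site_readable:
        # Uncommitted state, but data at the first site exists and can be read.
        return sites.first_site
    # Uncommitted state and nothing readable at the first site: fall back to
    # the transaction commit site and continue reading from the source database.
    return sites.second_site
```

For example, with `Sites(first_site=120, second_site=80)`, an uncommitted state with no readable data at the first site selects site 80, while every other combination selects site 120.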

In some embodiments of the present application, the first site includes a first sub-site, and determining the first site as the target site if the current transaction state is the committed state may be implemented by:

If the current transaction state is the committed state and the data corresponding to the first sub-site exists, determining the first sub-site as the target site.

The first site comprises a first sub-site, and the first sub-site is a site corresponding to the data offset in the first cache.

Wherein the first cache is a data cache. The data cache is middleware between the source database and the target database and decouples reading from writing: data read from the source database is first submitted to the data cache, and the data in the data cache is then written to the target database. In the full data reading stage, using the data cache as middleware allows reading from the source database to complete in a shorter time, which reduces the memory and performance overhead on the source database. In one-to-many data distribution scenarios, using the data cache as middleware keeps the writing processes from affecting one another, so that the low write performance of an individual target database does not hold back the overall synchronization progress.
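One way to picture this decoupling is an append-only buffer in which each target database keeps its own read offset, so a slow writer does not hold back a fast one. The sketch below assumes that design; the class `DataCache` and its methods are hypothetical and are not taken from the application.

```python
from typing import Dict, List


class DataCache:
    """Append-only buffer with an independent read offset per target database."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> None:
        # The reader of the source database only appends; it never waits for writers.
        self._records.append(record)

    def register(self, target: str) -> None:
        # A newly registered target starts reading from the beginning of the buffer.
        self._offsets.setdefault(target, 0)

    def poll(self, target: str, max_records: int) -> List[str]:
        # Return the next batch for this target and advance only its own offset.
        start = self._offsets[target]
        batch = self._records[start:start + max_records]
        self._offsets[target] = start + len(batch)
        return batch


cache = DataCache()
for row in ("r1", "r2", "r3"):
    cache.append(row)
cache.register("fast_target")
cache.register("slow_target")
fast_batch = cache.poll("fast_target", 10)  # drains everything available
slow_batch = cache.poll("slow_target", 1)   # takes one record at its own pace
```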

In the embodiment of the application, if the current transaction state is the committed state, the log of the source database contains no uncommitted transactions, so all data in the cache belongs to committed transactions. It is then necessary to determine whether data exists at the first sub-site, which represents the cache data offset, for the committed transaction. If the data exists, it can be read, and the first sub-site is determined as the target site.

In some embodiments of the present application, S103 uses a target site as a reading start point of a data synchronization task, and performs a data synchronization operation, which may be implemented by the following steps:

and reading data from the first cache by using the first sub-site as a reading starting point of the data synchronization task, and sending the read data to the target database.

In the embodiment of the application, once the target site is determined to be the first sub-site, the data corresponding to the first sub-site is known to exist. With the first sub-site as the reading start point of the data synchronization task, data is read from the data cache in which the first sub-site is located, and the read data is sent to the target database to perform the data synchronization operation.

In an exemplary embodiment, once the target site is determined to be the site corresponding to the data offset in the data cache, the data cache is known to hold readable data. Data is then read from the data cache with that site as the reading start point, and the read data is sent to the target database to perform the data synchronization operation.

In some embodiments of the application, the first site comprises a first sub-site and a second sub-site; determining the first site as the target site if the current transaction state is an uncommitted state and the data corresponding to the first site meets the data reading condition may be implemented by:

If the current transaction state is an uncommitted state, the data corresponding to the first sub-site does not exist, and the data corresponding to the second sub-site exists, determining the second sub-site as the target site.

The first sub-site is the site corresponding to the data offset in the first cache, and the second sub-site is the site corresponding to the data offset in the second cache.

Wherein the second cache is a transaction cache. The transaction cache is middleware between the source database and the data cache. It holds uncommitted transaction data until the transaction is committed and only then sends the committed data to the data cache, which ensures that all data written to the data cache is committed data and thereby ensures data integrity.

In the embodiment of the application, if the current transaction state is the uncommitted state, the log of the source database contains uncommitted transactions, and uncommitted transactions therefore exist in the transaction cache. It is first determined whether data exists at the first sub-site, which corresponds to the data offset in the data cache, for the uncommitted transaction. If it does not, the first sub-site has no readable data, and it is further determined whether data exists at the second sub-site, which corresponds to the data offset in the transaction cache, for the uncommitted transaction. If it does, the second sub-site has readable data, and the second sub-site is therefore determined as the target site.

reading data from the source database by using the second sub-site as a reading starting point of the data synchronization task, and sending the read data to a second cache; the second cache is different from the first cache.

In the embodiment of the application, once the target site is determined to be the second sub-site, the data corresponding to the second sub-site is known to exist; that is, the second sub-site is the site corresponding to the data offset of the uncommitted transaction held in the transaction cache. The transaction commit operation must then continue until the transaction has been fully committed. Therefore, with the second sub-site as the reading start point of the data synchronization task, data continues to be read from the source database and is sent to the transaction cache to advance the transaction commit progress, so that complete data can be obtained from the transaction cache when the data synchronization operation is subsequently performed.

In some embodiments of the present application, the second sub-site is used as a reading start point of the data synchronization task, the data is read from the source database, and the read data is sent to the second cache, which can be implemented by the following steps:

receiving a first transaction request, and reading transaction data from a source database by using a second sub-site as a reading starting point of a data synchronization task;

Adding a fourth sub-site and a fifth sub-site to the read transaction data to obtain transaction data with the fourth sub-site and the fifth sub-site added;

and acquiring data to be transmitted according to the transaction data added with the fourth sub-site and the fifth sub-site, and transmitting the data to be transmitted to the second cache.

Wherein the first transaction request includes a transaction start marker, which indicates that the transaction has begun to commit.

The fourth sub-site comprises the site corresponding to the data offset in the log, and the fifth sub-site comprises the site corresponding to the transaction ID in the log.

In the embodiment of the application, when a request to start a transaction is received, data needs to be read from the source database. With the second sub-site as the reading start point of the data synchronization task, transaction data continues to be read from the source database; the site corresponding to the data offset in the log and the site corresponding to the transaction ID in the log are added to each piece of read transaction data; and the data with the fourth sub-site and the fifth sub-site added is sent to the transaction cache. Until the data has been sent, the current transaction state is the uncommitted state.

In the embodiment of the application, after the data to be sent is obtained from the transaction data with the fourth sub-site and the fifth sub-site added and is sent to the second cache, the method may further include the following step:

and receiving a second transaction request, reading data to be sent from the second cache, and sending the read data to be sent to the first cache.

Wherein the second transaction request includes a transaction end marker, which indicates that the transaction commit is complete.

The second cache is a transaction cache, and the first cache is a data cache.

In the embodiment of the application, receipt of a request to end the transaction indicates that the data read from the source database has been submitted to the transaction cache. The committed data is then sent to the data cache, which ensures that the data stored in the data cache is committed data and thereby ensures data integrity.
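The flow from the first transaction request to the second transaction request can be sketched as follows. This is an illustrative sketch under stated assumptions: the class `TransactionStager`, the record type `TaggedRecord`, in-memory dictionaries and lists standing in for the two caches, and integer sites are all hypothetical choices not specified by the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaggedRecord:
    payload: str
    offset_site: int  # fourth sub-site: site of the data offset in the log
    tx_site: int      # fifth sub-site: site of the transaction ID in the log


class TransactionStager:
    """Holds uncommitted records in a transaction cache until commit."""

    def __init__(self) -> None:
        self.transaction_cache: Dict[str, List[TaggedRecord]] = {}
        self.data_cache: List[TaggedRecord] = []
        self._tx_sites: Dict[str, int] = {}

    def begin(self, tx_id: str, tx_site: int) -> None:
        # First transaction request (transaction start marker).
        self._tx_sites[tx_id] = tx_site
        self.transaction_cache[tx_id] = []

    def add(self, tx_id: str, payload: str, offset_site: int) -> None:
        # Tag each record with both sites before staging it.
        record = TaggedRecord(payload, offset_site, self._tx_sites[tx_id])
        self.transaction_cache[tx_id].append(record)

    def commit(self, tx_id: str) -> None:
        # Second transaction request (transaction end marker): only now do
        # the records move to the data cache.
        self.data_cache.extend(self.transaction_cache.pop(tx_id))
        del self._tx_sites[tx_id]


stager = TransactionStager()
stager.begin("tx1", tx_site=100)
stager.add("tx1", "row-a", offset_site=101)
stager.add("tx1", "row-b", offset_site=102)
staged_before_commit = list(stager.data_cache)  # still empty at this point
stager.commit("tx1")
```

Because records reach `data_cache` only inside `commit`, the data cache in this sketch never holds data from an uncommitted transaction, which is the property the paragraph above relies on.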

In some embodiments of the application, the first site comprises a first sub-site and a second sub-site, and the second site comprises a third sub-site; determining the second site as the target site if the current transaction state is an uncommitted state and the data corresponding to the first site does not meet the data reading condition may be implemented by:

If the current transaction state is an uncommitted state, the data corresponding to the first sub-site does not exist, and the data corresponding to the second sub-site does not exist, determining the third sub-site as the target site.

The first sub-site is a site corresponding to the data offset in the data cache, the second sub-site is a site corresponding to the data offset in the transaction cache, and the third sub-site is a site corresponding to the transaction ID in the transaction cache.

In the embodiment of the application, if the current transaction state is the uncommitted state, the log of the source database contains uncommitted transactions, and uncommitted transactions therefore exist among the data in the cache. It is first determined whether data exists at the first sub-site, which corresponds to the data offset in the data cache, for the uncommitted transaction. If it does not, the first sub-site has no readable data, and it is further determined whether data exists at the second sub-site, which corresponds to the data offset in the transaction cache, for the uncommitted transaction. If that data does not exist either, the second sub-site has no readable data. The third sub-site, which corresponds to the transaction ID in the transaction cache for the uncommitted transaction, must then be used as the reading start point so that the data is re-parsed from the source database, and the third sub-site is therefore determined as the target site.

In an embodiment of the present application, the third sub-site includes a minimum uncommitted site in the transaction cache.

It can be appreciated that if the current transaction state is the uncommitted state and no data exists at either the first sub-site, which corresponds to the data offset in the data cache, or the second sub-site, which corresponds to the data offset in the transaction cache, the minimum uncommitted site corresponding to the transaction ID in the transaction cache for the uncommitted transaction is taken as the target site and used as the reading start point, so that the data can be re-parsed from the source database, thereby ensuring data integrity.
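Under one plausible reading, the minimum uncommitted site is the earliest start site among the transactions that are still open, since resuming there guarantees that no open transaction is only partially re-read. The sketch below follows that reading; the mapping of transaction IDs to integer start sites and the function name `minimum_uncommitted_site` are assumptions for illustration.

```python
from typing import Dict, Optional


def minimum_uncommitted_site(open_transactions: Dict[str, int]) -> Optional[int]:
    """Return the earliest start site among open (uncommitted) transactions.

    `open_transactions` maps a transaction ID to the site at which that
    transaction begins in the log. Returns None when nothing is open.
    """
    if not open_transactions:
        return None
    return min(open_transactions.values())


# Three open transactions; recovery would restart at the earliest of them.
restart_site = minimum_uncommitted_site({"tx7": 340, "tx9": 310, "tx12": 400})
```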

And reading data from the source database by using the third sub-site as a reading starting point of the data synchronization task, and sending the read data to the second cache.

In the embodiment of the application, once the target site is determined to be the third sub-site, it is known that neither the data cache nor the transaction cache holds readable data, so the data must be re-parsed from the source database. With the third sub-site as the reading start point, data is read from the position in the source database corresponding to the third sub-site, and the read data is sent to the transaction cache so that the subsequent data synchronization operations can be performed.
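Taken together, the sub-site embodiments describe three recovery cases, each with its own start site, read source, and destination. The sketch below covers only the combinations that the text describes and raises an error for any other combination; the names `SubSites`, `RecoveryPlan`, and `plan_recovery`, and the integer encoding of sites, are hypothetical.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class SubSites:
    first: int   # data offset site in the data cache
    second: int  # data offset site in the transaction cache
    third: int   # transaction ID site (minimum uncommitted site)


class RecoveryPlan(NamedTuple):
    start_site: int
    read_from: str
    send_to: str


def plan_recovery(committed: bool, sites: SubSites,
                  data_cache_has_data: bool, tx_cache_has_data: bool) -> RecoveryPlan:
    """Map the described cases to a start site, a read source and a destination."""
    if committed and data_cache_has_data:
        return RecoveryPlan(sites.first, "data cache", "target database")
    if not committed and not data_cache_has_data:
        if tx_cache_has_data:
            return RecoveryPlan(sites.second, "source database", "transaction cache")
        # Neither cache has readable data: re-parse from the source database.
        return RecoveryPlan(sites.third, "source database", "transaction cache")
    # Combinations that the sub-site embodiments do not spell out.
    raise ValueError("combination not covered by the described embodiments")
```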

In some embodiments of the present application, the first site and the second site are obtained from a third cache; the third cache is used to persist the first site and the second site.

The third cache is a site persistence cache, which stores the first site and the second site persistently.
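The application does not say how the site persistence cache is stored. As a minimal sketch, assuming a local JSON file is an acceptable stand-in, the two sites can be written atomically so that a crash during a write never leaves a half-written file. The functions `save_sites` and `load_sites` and the file layout are assumptions introduced here.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def save_sites(path: str, first_site: int, second_site: int) -> None:
    """Persist both sites by writing a temporary file and renaming it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"first_site": first_site, "second_site": second_site}, handle)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes reach the disk
    os.replace(tmp_path, path)     # atomically swap in the new file


def load_sites(path: str) -> Optional[Tuple[int, int]]:
    """Return (first_site, second_site), or None if nothing was persisted yet."""
    try:
        with open(path) as handle:
            stored = json.load(handle)
    except FileNotFoundError:
        return None
    return stored["first_site"], stored["second_site"]
```

After a restart, a recovery routine could call `load_sites` first and fall back to a full read when it returns `None`.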

The embodiment of the application provides a data synchronization device 200, referring to fig. 2, the data synchronization device 200 includes an acquisition module 201, a processing module 202, and a control module 203; wherein,

An obtaining module 201, configured to obtain a current transaction state of a data synchronization task;

A processing module 202, configured to determine a target site from the first site and the second site according to the current transaction state;

The control module 203 is configured to perform a data synchronization operation with the target site as a read start point of the data synchronization task.

In some embodiments of the present application, the processing module 202 is configured to determine the first site as the target site if the current transaction state is the committed state.

In some embodiments of the present application, the processing module 202 is configured to determine the first site as the target site if the current transaction state is an uncommitted state and the data corresponding to the first site satisfies the data reading condition.

In some embodiments of the present application, the processing module 202 is configured to determine the second site as the target site if the current transaction state is an uncommitted state and the data corresponding to the first site does not satisfy the data reading condition.

In some embodiments of the present application, the first site includes a first sub-site, and the processing module 202 is further configured to determine the first sub-site as the target site if the current transaction state is the committed state and data corresponding to the first sub-site exists.

In some embodiments of the present application, the control module 203 is configured to read data from the first cache with the first sub-site as a read start point of the data synchronization task, and send the read data to the target database.

In some embodiments of the present application, the first site includes a first sub-site and a second sub-site, and the processing module 202 is further configured to determine that the second sub-site is the target site if the current transaction state is an uncommitted state, data corresponding to the first sub-site does not exist, and data corresponding to the second sub-site exists.

In some embodiments of the present application, the control module 203 is configured to read data from the source database with the second sub-site as a read start point of the data synchronization task, and send the read data to the second cache; the second cache is different from the first cache.

In some embodiments of the present application, the processing module 202 is configured to receive the first transaction request and read the transaction data from the source database with the second sub-site as a read start point for the data synchronization task.

In some embodiments of the present application, the processing module 202 is configured to add transaction data of the fourth sub-site and the fifth sub-site to the read transaction data, to obtain data to be sent according to the transaction data to which the fourth sub-site and the fifth sub-site have been added, and to send the data to be sent to the second cache.

In some embodiments of the present application, the processing module 202 is configured to receive the second transaction request, read data to be sent from the second cache, and send the read data to be sent to the first cache.

In some embodiments of the present application, the first site includes a first sub-site and a second sub-site, the second site includes a third sub-site, and the processing module 202 is further configured to determine that the third sub-site is the target site if the current transaction state is an uncommitted state, data corresponding to the first sub-site does not exist, and data corresponding to the second sub-site does not exist.

In some embodiments of the present application, the control module 203 is configured to read data from the source database with the third sub-site as a read start point of the data synchronization task, and send the read data to the second cache.

In some embodiments of the present application, the obtaining module 201 is configured to obtain the first site and the second site from the third cache; the third cache is used to persist the first site and the second site.

An embodiment of the present application provides a data synchronization apparatus 300. Referring to fig. 3, the data synchronization apparatus 300 includes a bus 301, a processor 302, a memory 303, and a communication interface 304. The processor 302, the communication interface 304, and the memory 303 communicate with one another through the bus 301.

One or more processors 302 are configured to execute one or more computer programs stored in the memory 303 to perform the steps of the data synchronization method of the embodiments of the present application.

The memory 303 is configured to store one or more computer programs.

In the following, taking a disaster recovery scenario as an example, an embodiment of a data synchronization method supporting disaster recovery is provided, and the data synchronization method provided by the present application is described with reference to fig. 4:

The method comprises the following steps:

S401: initializing site information and starting a data synchronization task.

S402: determining a safe recovery site from the site information, and setting the safe recovery site as the reading start point.

Here, determining the safe recovery site from the site information and setting it as the reading start point in S402 can be achieved by S4021 to S4026 shown in fig. 5:

Taking as an example the determination of the safe recovery site from the minimum uncommitted site (min-uncommitted-offset) and the site corresponding to the data offset (data-offset), the process proceeds to S4021.

S4021: acquiring the writing progress from the initialized site information;

S4022: according to the writing progress, first consuming data from the cache;

S4023: if min-uncommitted-offset exists in the read site information, proceeding to S4024; otherwise, proceeding to S4026;

S4024: checking whether cache data corresponding to min-uncommitted-offset exists in the local transaction cache; if it exists, proceeding to S4026; otherwise, proceeding to S4025;

S4025: determining min-uncommitted-offset as the safe recovery site;

S4026: determining data-offset as the safe recovery site.

S403: when the safe recovery site is determined to be min-uncommitted-offset, reading data from the source database with min-uncommitted-offset as the reading start point, adding site information to the data, and proceeding to S404.

Here, reading data from the source database and adding site information to the data in S403 can be achieved by S4031 to S4038 shown in fig. 6:

S4031: the data synchronization task sequentially reads the next log block of the source database;

S4032: if the log block type is the transaction start flag, proceeding to S4034; otherwise, proceeding to S4033;

S4033: if the log block type is the transaction end flag, proceeding to S4036; otherwise, proceeding to S4038;

S4034: the data synchronization task adds the site at which the transaction starts to the set of uncommitted sites (uncommitted-offset);

S4035: if the site of the data block is smaller than the recorded min-uncommitted-offset, the data synchronization task reassigns the initial value of min-uncommitted-offset, and the process returns to S4031;

S4036: the data synchronization task sends the data corresponding to the transaction in the transaction cache to the data cache;

S4037: the data synchronization task deletes the start site corresponding to the transaction from the uncommitted-offset set and updates min-uncommitted-offset to the minimum value of the remaining elements in the set, and the process returns to S4031;

S4038: the data synchronization task writes the data block into the transaction cache, and the process returns to S4031.

S404: writing the data read from the source database into the cache, and after each batch of data is written, recording the maximum "site" of the batch as the "reading progress".

S405: reading data from the cache, writing the read data into the target database, and after each batch of data is written, recording the maximum "site" and the maximum "cache offset" of the batch as the "writing progress".

S406: saving the reading progress and the writing progress to the persistent storage.

S407: synchronizing the persistent storage data of the primary running environment to the persistent storage of the disaster recovery environment.

S408: when the safe recovery site is determined to be data-offset, reading data from the cache with data-offset as the reading start point, recording the site information of the data, and proceeding to S405.

As can be appreciated, in S401, site information is initialized and the data synchronization task is started.

Task configuration information and site information are initialized from the persistent storage, and the data synchronization task is started.

The task configuration information includes at least: the source database read by the task, the target database written to, the mapping between tables and fields, the processing parallelism, the rate limit, the write batch size, the error data handling policy, and the logging policy.

After the data synchronization task is started, the data of the source database is read according to the initialized site information. In the query-based acquisition mode, the site information is embodied as a read value of an incremental expression, for example the maximum update time of the last batch of data; a query statement is then constructed to acquire the incremental data between the last read and the current read. In the log-parsing-based acquisition mode, the site information is embodied as a value representing the progress of the log, such as the global transaction identifier (Global Transaction Identifier, GTID) of MySQL or the system change number (System Change Number, SCN) of Oracle. In addition, if there is an uncommitted transaction in the log, the site information includes a "minimum uncommitted site" (min-uncommitted-offset) and a "site corresponding to the data offset" (data-offset). In a disaster recovery scenario, when a disaster recovery switch occurs, a safe recovery site can be determined from these sites, so that data is recovered safely and data loss is avoided. If there are no uncommitted transactions in the log, min-uncommitted-offset may be empty; data-offset is the actual data offset of the data in the log. When the data synchronization task determines that the reading progress does not need to be recovered from min-uncommitted-offset, the data is recovered directly with the data-offset recorded in the log as the reading start point, which reduces repeated reading of data.
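The query-based acquisition mode can be illustrated with a minimal sketch. It assumes a hypothetical table `orders` whose `updated_at` column serves as the incremental expression; the table, the column names, and the function `read_increment` are illustrative assumptions and are not taken from the application.

```python
import sqlite3


def read_increment(conn, last_site):
    """Fetch rows changed since last_site and return (rows, new_site).

    last_site is the maximum updated_at value of the previous batch
    (the hypothetical incremental expression). It is returned unchanged
    when no new rows are found.
    """
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_site,),
    ).fetchall()
    new_site = rows[-1][2] if rows else last_site
    return rows, new_site


# In-memory database standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, payload TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 105), (3, "c", 110)],
)

# Resume from the site recorded for the previous batch.
batch, site = read_increment(conn, last_site=100)
```

The strict `>` comparison is a simplification: it assumes that no further rows share the update time recorded as the last site.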

The site information is typically added to each piece of data as a fixed field and written to the persistent storage during subsequent processing. If the control message mode is adopted, a control message containing the site information is sent separately at a certain time or count interval, which reduces the amount of data written to the persistent storage.
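As a rough sketch of the control message mode, the generator below emits one control message carrying the latest site after every fixed number of records, instead of attaching the site to each record. The function name `with_control_messages`, the tuple layout, and the count-based interval are assumptions made for illustration only.

```python
def with_control_messages(records, interval):
    """Yield data records, plus a control message carrying the latest
    site after every `interval` records (count-based interval)."""
    count = 0
    for site, payload in records:
        yield ("data", payload)
        count += 1
        if count % interval == 0:
            # Only the control message carries site information.
            yield ("control", site)


stream = list(
    with_control_messages([(1, "a"), (2, "b"), (3, "c"), (4, "d")], interval=2)
)
```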

After the data synchronization task is started, a safe recovery site is determined from the obtained site information and used as the reading start point.

Further, a method is provided that automatically decides whether to recover the reading site from min-uncommitted-offset or from data-offset. In this case, determining the safe recovery site from the site information and setting it as the reading start point in S402 can, referring to fig. 5, be achieved by the following steps:

S4021: acquiring the writing progress from the initialized site information.

The data synchronization task reads the configuration data and the progress data from the persistent storage and then starts.

Wherein the progress data includes a "reading progress" and a "writing progress". The former determines where the data synchronization task normally starts to continue collecting the data of the source database; the latter determines where the data sync task starts consuming from the cache and writing to the target database, and where to safely recover the data in the event of a restart/disaster recovery switch.

S4022: according to the writing progress, first consuming data from the cache.

The data synchronization task consumes data from the cache according to the writing progress.

The data synchronization task first tries to consume data starting from the maximum "cache offset" in the writing progress; if the corresponding data cannot be found, it takes the maximum "site" in the writing progress as the position from which to start reading. If the cache is non-persistent, the data corresponding to that site is not guaranteed to be found in the cache. If the cache is persistent, then before the cache meets its cleanup condition (for example, data is set to be reclaimed 24 hours after being stored), the data synchronization task should be able to find the data corresponding to the maximum "cache offset" in the data cache, and the site of the last piece of data in the cache should coincide with the site recorded in the writing progress. If the data is not found, there are two possibilities: 1) a disaster recovery switch has occurred; 2) the local write process did not run correctly for some reason, so the cached data was reclaimed before being written to the target. In either case, to ensure that no data is lost, the maximum "site" in the writing progress is preferred as the reading start point. If the data synchronization task can find in the cache the data corresponding to the maximum "site" in the writing progress, it continues to consume data from that site and write it into the target database. Otherwise, it sets the maximum "cache offset" of the writing progress to the maximum offset of the current cache, and continues consuming after the reading process has written data into the cache.

S4023: if min-uncommitted-offset exists in the read site information, proceeding to S4024; otherwise, proceeding to S4026.

The data synchronization task determines whether the site data read from the persistent storage contains min-uncommitted-offset. If yes, the process proceeds to S4024; if not, it proceeds to S4026. If the log of the database contains no uncommitted transactions, no min-uncommitted-offset is generated, and to reduce repeated reading of data the process goes directly to S4026 to begin reading data. Otherwise, the process goes to S4024 to determine whether the uncommitted transaction is cached locally.

S4024: if cache data corresponding to min-uncommitted-offset exists in the local transaction cache, proceeding to S4026; otherwise, proceeding to S4025.

The data synchronization task looks up the cache data corresponding to min-uncommitted-offset in the local transaction cache. If it exists, the process proceeds to S4026. If it does not exist, the process proceeds to S4025: when a disaster recovery switch occurs, the cache data corresponding to min-uncommitted-offset is lost, and the reading progress must then be reset as in S4025.

S4025: determining min-uncommitted-offset as the safe recovery site.

The data synchronization task sets min-uncommitted-offset as the reading start point. min-uncommitted-offset represents the smallest site in the parsed log at which a transaction has not been committed; parsing restarts from this site, which ensures that data can be re-parsed from a safe recovery site when a disaster recovery switch occurs.

S4026: determining data-offset as the safe recovery site.

The data synchronization task sets data-offset as the reading start point, so that after a disaster recovery switch occurs, the data synchronization task starts recovering data from the position in the cache corresponding to the maximum "cache offset" in the writing progress.
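The decision made in S4023 to S4026 can be sketched as a small function. This is only an illustration under stated assumptions: the function name `choose_recovery_site`, the integer representation of sites, and the modelling of the local transaction cache as a set of sites are all hypothetical.

```python
from typing import Optional, Set, Tuple


def choose_recovery_site(
    min_uncommitted_offset: Optional[int],
    data_offset: int,
    transaction_cache_sites: Set[int],
) -> Tuple[str, int]:
    """Return (name, value) of the site to use as the reading start point."""
    # S4023: no min-uncommitted-offset recorded, go to S4026.
    if min_uncommitted_offset is None:
        return ("data-offset", data_offset)
    # S4024: cached data for min-uncommitted-offset still exists, go to S4026.
    if min_uncommitted_offset in transaction_cache_sites:
        return ("data-offset", data_offset)
    # S4025: cached data is missing (for example after a disaster
    # recovery switch), so re-parse from min-uncommitted-offset.
    return ("min-uncommitted-offset", min_uncommitted_offset)
```

For example, `choose_recovery_site(300, 500, set())` selects min-uncommitted-offset because the transaction cache no longer holds the data for site 300.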

The data synchronization task takes the determined safe recovery site as the reading start point. Further, a data synchronization method based on log parsing is provided. Taking the case where the safe recovery site min-uncommitted-offset is the reading start point as an example, reading data from the source database and adding site information to the data in S403 can, referring to fig. 6, be implemented by the following steps:

S4031: the data synchronization task sequentially reads the next log block of the source database.

S4032: if the log block type is the transaction start flag, the process proceeds to S4034, otherwise, the process proceeds to S4033.

S4033: if the log block type is the transaction end flag, the process proceeds to S4036, otherwise, the process proceeds to S4038.

The transaction start flag contains a transaction ID and the corresponding site information. Based on the transaction ID and the site information, the data synchronization task first stores all data under the transaction ID in the local transaction cache, and commits the data to the data cache after the transaction end flag is parsed.

It will be appreciated that when a transaction start request is received, data needs to be read from the source database; a transaction ID and the corresponding site information are added to each piece of transaction data read, and the data with the site information added is sent to the transaction cache. Each piece of data sent to the transaction cache contains its corresponding site information, which at this point includes data-offset and min-uncommitted-offset, and the current transaction state is uncommitted until the data has been completely sent.

S4034: the data synchronization task adds the site at which the transaction begins to the uncommitted-offset set.

The data synchronization task adds the site corresponding to the transaction ID parsed from the source database to the uncommitted-offset set and persists the set in the local transaction cache. As a result, when a disaster recovery switch occurs, the data synchronization task can preferentially parse directly from data-offset, avoiding the larger amount of repeated data that parsing from min-uncommitted-offset would produce.

For example, the local transaction cache may be implemented on a local disk.

S4035: if the site of the data block is smaller than the recorded min-uncommitted-offset, the data synchronization task reassigns the initial value of min-uncommitted-offset, and the process returns to S4031.

Because the log is recorded sequentially, the start site of a transaction parsed later is normally greater than min-uncommitted-offset; when the start site of a parsed transaction is smaller than the recorded min-uncommitted-offset, the data synchronization task needs to reassign the initial value of min-uncommitted-offset.

S4036: the data synchronization task sends the data corresponding to the transaction in the transaction cache to the data cache.

When the transaction end flag is parsed, this indicates that the data read from the source database for the transaction has been committed to the transaction cache. The data synchronization task then sends the data corresponding to the transaction from the local transaction cache to the data cache, which ensures that only committed data is stored in the data cache and that the data is complete.

S4037: the data synchronization task deletes the start site corresponding to the transaction from the uncommitted-offset set, updates min-uncommitted-offset to the minimum value of the remaining elements in the set, and returns to S4031. In this way, the information of the committed transaction is removed and the update of the uncommitted-offset set is completed.
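The loop of S4031 to S4038 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `SyncState` class, the `handle_block` function, the block kinds `"begin"`, `"end"`, and `"data"`, and the use of plain dictionaries and lists for the caches are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyncState:
    # Transaction ID -> start site of each uncommitted transaction.
    uncommitted: Dict[str, int] = field(default_factory=dict)
    min_uncommitted_offset: Optional[int] = None
    # Transaction ID -> rows buffered until the transaction ends.
    transaction_cache: Dict[str, List[str]] = field(default_factory=dict)
    # Rows of committed transactions only.
    data_cache: List[str] = field(default_factory=list)


def handle_block(state, kind, txn_id, site, payload=None):
    """Process one log block read in S4031."""
    if kind == "begin":
        # S4034: record the start site of the transaction.
        state.uncommitted[txn_id] = site
        state.transaction_cache[txn_id] = []
        # S4035: lower min-uncommitted-offset if this site is smaller.
        current = state.min_uncommitted_offset
        if current is None or site < current:
            state.min_uncommitted_offset = site
    elif kind == "end":
        # S4036: move the transaction's rows to the data cache.
        state.data_cache.extend(state.transaction_cache.pop(txn_id, []))
        # S4037: drop the start site and recompute the minimum.
        state.uncommitted.pop(txn_id, None)
        state.min_uncommitted_offset = min(
            state.uncommitted.values(), default=None
        )
    else:
        # S4038: buffer the data block in the transaction cache.
        state.transaction_cache.setdefault(txn_id, []).append(payload)


state = SyncState()
for block in [
    ("begin", "t1", 10, None),
    ("data", "t1", 11, "r1"),
    ("begin", "t2", 12, None),
    ("data", "t2", 13, "r2"),
    ("end", "t1", 14, None),
]:
    handle_block(state, *block)
```

After these five blocks, only the rows of transaction `t1` are in the data cache, and min-uncommitted-offset has advanced to the start site of `t2`, the only transaction still uncommitted.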

To decouple reading from writing and to allow flexible data distribution, the data synchronization task first writes the data with site information added into a cache. The cache may or may not be persistent. Illustratively, typical persistent caches include Kafka, Pulsar, RocketMQ, or other message queues, and typical non-persistent caches include memory-based queues. With a persistent cache, the capacity of the cache can be considered approximately unlimited, and the data synchronization task can start reading the next batch as soon as one batch of data has been written into the cache. With a non-persistent cache, the capacity of the cache is relatively limited; after the read thread of the data synchronization task writes data into the queue, it can only wait until the write thread has finished writing to the target database and the data in the queue has been cleared before it continues writing the next batch of data.

When the data cache and the transaction cache are both non-persistent caches, the data in the transaction cache is lost after a disaster recovery switch occurs. In this case, taking the writing progress as the reference, a safe recovery site is determined from the site information as the reading start point, and the data is re-read and recovered from the source database. When the data cache and the transaction cache are both persistent caches, the data synchronization task can still access the data already written to the transaction cache and the data cache after a disaster recovery switch occurs. In this case, taking the reading progress as the reference, the data is read and recovered from the data cache according to the site information; if the data does not exist in the data cache, it is read and recovered from the transaction cache; and if the data does not exist in the transaction cache either, the minimum uncommitted site is taken as the reading start point, and the data is re-read and recovered from the source database.
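The fallback order described above can be summarized in a short sketch. The function `pick_recovery_source` and its boolean parameters are hypothetical; they only encode whether the caches are persistent and whether the required data is still present in each cache.

```python
def pick_recovery_source(persistent, in_data_cache=False, in_transaction_cache=False):
    """Return where data is recovered from after a disaster recovery switch."""
    if not persistent:
        # Non-persistent caches are lost: re-read from the source database,
        # starting at the safe recovery site.
        return "source database (safe recovery site)"
    if in_data_cache:
        return "data cache"
    if in_transaction_cache:
        return "transaction cache"
    # Neither persistent cache holds the data.
    return "source database (min-uncommitted-offset)"
```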

For example, the persistent storage of data may be performed in a synchronous or asynchronous manner. In the synchronous mode, the read thread waits until the data synchronization task has finished persisting the reading progress of one batch of data, and only then starts reading the next batch. In the asynchronous mode, the read thread starts reading the next batch of data as soon as it has written the reading progress into a memory queue, and the progress data is persisted independently by an asynchronous thread. In an actual scenario, data persistence can be performed in the asynchronous or synchronous mode according to actual requirements, and the application is not limited in this regard.
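The difference between the two modes can be sketched with Python's standard `queue` and `threading` modules. `ProgressStore` is a hypothetical stand-in for the persistent storage, and `persist_sync` and `AsyncPersister` are illustrative names; none of them comes from the application.

```python
import queue
import threading


class ProgressStore:
    """Stand-in for persistent storage; keeps the last saved progress."""

    def __init__(self):
        self.saved = None

    def save(self, progress):
        self.saved = progress


def persist_sync(store, progress):
    # Synchronous mode: the caller (read thread) blocks until saved.
    store.save(progress)


class AsyncPersister:
    """Asynchronous mode: progress goes to a memory queue and a
    separate thread persists it."""

    def __init__(self, store):
        self._store = store
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, progress):
        # Returns immediately; the read thread can fetch the next batch.
        self._queue.put(progress)

    def _run(self):
        while True:
            progress = self._queue.get()
            if progress is None:  # sentinel: stop the worker
                break
            self._store.save(progress)

    def close(self):
        self._queue.put(None)
        self._thread.join()


store = ProgressStore()
persister = AsyncPersister(store)
persister.submit(200)
persister.submit(300)
persister.close()  # drains the queue before returning
```

The trade-off in this sketch is the usual one: the asynchronous path does not block the reader, but progress still in the memory queue is lost if the process stops before the worker thread saves it.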

For example, the data synchronization task may read data from the cache in batches, and send the data read in batches to the target database, thereby improving the writing efficiency.
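Batch writing together with the "writing progress" of S405 can be sketched as below. The cache is modelled as a list of `(cache_offset, site, row)` tuples and the target database as a plain list; these structures and the function `write_in_batches` are assumptions made for illustration.

```python
def write_in_batches(cache, target, batch_size):
    """Write cached rows to the target in batches and return the
    writing progress recorded after each batch."""
    progress_log = []
    for start in range(0, len(cache), batch_size):
        batch = cache[start:start + batch_size]
        target.extend(row for _, _, row in batch)
        # Writing progress: maximum site and maximum cache offset of the batch.
        progress_log.append(
            {
                "max_site": max(site for _, site, _ in batch),
                "max_cache_offset": max(offset for offset, _, _ in batch),
            }
        )
    return progress_log


cache = [(0, 100, "a"), (1, 101, "b"), (2, 105, "c")]
target = []
progress = write_in_batches(cache, target, batch_size=2)
```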

The manner in which the "write progress" is persisted may be, for example, in a synchronous or asynchronous manner, as the application is not limited in this regard.

The progress persistence may be a system-level process/thread or may be subordinate to each data synchronization task, as the application is not limited in this regard.

The persistent storage of the original primary running environment synchronizes its data, namely the configuration data and the progress data, to the persistent storage of the disaster recovery environment.

By synchronizing the configuration data and the progress data in the disaster recovery environment, when the running environment of the system fails, the progress can be stored and migrated to the disaster recovery environment for continuous running, and the integrity of the data is ensured.

Illustratively, the databases include, but are not limited to, relational databases, distributed databases, open source distributed relational databases, and the like. The data acquisition modes comprise a log-based analysis mode and a query-based acquisition mode. Under the acquisition method based on log analysis, the site information in the log is required to be kept continuous when disaster recovery switching occurs.

By way of example, the disaster recovery switch may be performed manually: a system administrator sends a switching instruction through the disaster recovery control system. The disaster recovery switch may also be based on liveness detection: when the disaster recovery system detects that the heartbeat detection program of the original environment has not sent data for more than a certain time, it automatically starts the data synchronization task of the disaster recovery environment. The disaster recovery switch may also be based on manual confirmation: a system administrator manually confirms that the system of the original environment is not working and then directly starts the data synchronization task of the disaster recovery environment. The mode of disaster recovery switching is not particularly limited in the present application.
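The heartbeat-based trigger can be reduced to a timeout check, sketched below. The function name `should_switch` and the use of seconds as the time unit are assumptions; the application does not specify the threshold or how timestamps are obtained.

```python
def should_switch(last_heartbeat, now, timeout):
    """Return True when no heartbeat has arrived for longer than timeout.

    All three arguments are in seconds on the same clock.
    """
    return (now - last_heartbeat) > timeout
```

In practice the timestamps would typically come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments do not trigger a spurious switch.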

In the following, taking a disaster recovery scenario as an example, an embodiment of a data synchronization method based on persistent storage is provided. Referring to fig. 7, the data synchronization method provided in the embodiment of the present application is described as follows:

S501: a data write request.

The site persistence module issues a data write request to a persistent storage Master (Master).

S502: a data synchronization request.

The persistent storage Master issues a data synchronization request to the persistent storage standby (Standby).

S503: a synchronization receipt response.

The persistent storage Standby sends a synchronization receipt response to the persistent storage Master. The actual data persistence is then completed in the manner of S505.

S504: write success.

The persistent storage Master returns a response to the site persistence module that the write was successful.

S505: data persistence.

Persistent storage Standby completes data persistence in a synchronous/semi-synchronous/asynchronous manner.

In the embodiment of the application, the data synchronization relationship is established between the persistent storage Master of the original environment and the persistent storage Standby of the disaster recovery environment, so that the persistent site information can be synchronized into the disaster recovery environment, the data synchronization task can be executed in the disaster recovery environment immediately, and the data consistency between the disaster recovery environment and the original environment is maintained.
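The message flow of S501 to S505 can be sketched with two small classes. This is an in-process illustration only: the class and method names are hypothetical, network communication is replaced by direct method calls, and the assumption that the Master also stores the record locally is not stated in the application.

```python
class Standby:
    """Persistent storage Standby of the disaster recovery environment."""

    def __init__(self):
        self.received = []   # acknowledged, not yet persisted
        self.persisted = []

    def receive(self, record):
        # S502 -> S503: accept the record and acknowledge receipt.
        self.received.append(record)
        return True

    def persist(self):
        # S505: complete the actual persistence later.
        self.persisted.extend(self.received)
        self.received.clear()


class Master:
    """Persistent storage Master of the original environment."""

    def __init__(self, standby):
        self.standby = standby
        self.persisted = []

    def write(self, record):
        # S501: write request from the site persistence module.
        self.persisted.append(record)  # assumption: Master stores locally
        acknowledged = self.standby.receive(record)  # S502 / S503
        return acknowledged  # S504: report write success


standby = Standby()
master = Master(standby)
ok = master.write({"reading_progress": 120})
```

In this sketch the write is reported as successful once the Standby has acknowledged receipt, before `standby.persist()` runs, which mirrors the ordering of S503, S504, and S505.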

Next, taking a disaster recovery scenario as an example, another embodiment of a data synchronization device is provided, and referring to fig. 8, 9, and 10, a description will be given of the data synchronization device provided by the present application:

Another data synchronization apparatus 800 (the data synchronization apparatus 800 in fig. 8 corresponds to the data synchronization apparatus 200 in fig. 2) according to some embodiments of the present application includes: an acquisition unit 801, a parsing unit 802, a conversion unit 803, a loading unit 804, and a progress persistence unit 805; wherein,

An acquisition unit 801 for acquiring a log from a source database.

The log may be acquired by the acquisition unit remotely or locally.

Referring to fig. 9, a deployment architecture for implementing data synchronization in a remote manner includes: a source database host 901, on which a database process runs; a target database host 902, on which a database process runs; and a data synchronization device host 903, on which the data synchronization system process runs.

Referring to fig. 10, a deployment architecture for implementing data synchronization based on a local manner includes: the source database host 901, the target database host 902, the data synchronization device host 903, the acquisition agent process 904 running on the source database host 901, and the loading agent process 905 running on the target database host 902.

When implemented in a remote manner, the acquisition unit runs on a remote server. As shown in fig. 9, the data synchronization device host 903 starts the data synchronization system process, and the data of the database on the source database host 901 is acquired remotely through the database process. When implemented in a local manner, the acquisition unit runs separately on the source database host 901 in the form of an acquisition agent. As shown in fig. 10, the data synchronization system process of the data synchronization device host 903 is started, and the data of the database on the source database host 901 is acquired through the local acquisition agent process 904.

A parsing unit 802 for obtaining plaintext data definition language (Data Definition Language, DDL) and data manipulation language (Data Manipulation Language, DML) statements from the source database log. Illustratively, DDL includes operations such as table creation, renaming, column addition, column modification, and column deletion, and DML includes data insertion, deletion, and modification. In the query-based acquisition mode, the parsing unit passes the DDL and DML obtained by the query directly to the conversion unit and adds site information to the data.

Illustratively, the parsing unit may generate separate control messages for persisting the site. It will be appreciated that a control message may carry the site data to the data cache at a set time or count interval.

A conversion unit 803 for converting the DDL and the DML into a unified data encoding format.

A loading unit 804 for writing data into the target database, thereby implementing data synchronization.

Writing to the target database may be accomplished remotely or locally. When implemented in a remote manner, the loading unit runs on a remote server. As shown in fig. 9, the data synchronization device host 903 starts the data synchronization system process, and the acquired data is loaded remotely to the target database host 902 through the database process. When implemented in a local manner, the loading unit runs separately on the target database host 902 in the form of a loading agent. As shown in fig. 10, the data synchronization system process of the data synchronization device host 903 is started, and data is loaded to the target database host 902 through the local loading agent process 905.

A progress persistence unit 805 for persistently storing the progress data.

By way of example, progress may be written to persistent storage in a synchronous/asynchronous manner, as the application is not limited in this regard.

Illustratively, the acquisition agent and the loading agent may be implemented in any desired combination, such as, but not limited to, an acquisition agent alone, or a loading agent alone.

Embodiments of the present application provide a computer readable storage medium storing one or more programs executable by one or more processors to implement the implementation procedure of the data synchronization control method provided in the embodiments corresponding to figs. 1 and 4 to 7, which will not be described again herein.

The bus mentioned above may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, a peripheral component interconnect express (Peripheral Component Interconnect Express, PCIe) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is used in the figure, but this does not mean that there is only one bus or only one type of bus. The communication interface is used for communication between the server and other devices. The computer storage medium/memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM); it may also be any of various terminals, such as mobile phones, computers, tablet devices, and personal digital assistants, that include one or any combination of the above memories. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment of the present application" or "the foregoing embodiments" or "some implementations" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "an embodiment of the application" or "the foregoing embodiment" or "some embodiments" or "some implementations" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be made through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.

The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.

The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.

The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.

Alternatively, the above-described integrated units of the application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a data synchronization device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.

It should be noted that the drawings in the embodiments of the present application are only for illustrating schematic positions of respective devices on the terminal device, and do not represent actual positions in the terminal device, the actual positions of respective devices or respective areas may be changed or shifted according to actual situations (for example, structures of the terminal device), and proportions of different parts in the terminal device in the drawings do not represent actual proportions.

The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of data synchronization, comprising:

acquiring the current transaction state of a data synchronization task;

determining a target site from a first site and a second site according to the current transaction state, wherein the first site is used for representing the cache data offset of the data, and the second site is used for representing the transaction commit progress of the data;

taking the target site as a reading starting point of the data synchronization task, and executing a data synchronization operation;

wherein the first site includes a first sub-site and a second sub-site, and the executing of the data synchronization operation with the target site as the reading starting point of the data synchronization task includes:

if the current transaction state is an uncommitted state, the data corresponding to the first sub-site does not exist, and the data corresponding to the second sub-site exists, receiving a first transaction request, and reading transaction data from a source database with the second sub-site as the reading starting point of the data synchronization task, wherein the second sub-site is a site corresponding to a data offset in a second cache, the second cache is used for caching uncommitted transaction data, the first sub-site is a site corresponding to a data offset in a first cache, and the first cache is used for caching committed transaction data;

adding a fourth sub-site and a fifth sub-site to the read transaction data to obtain the transaction data added with the fourth sub-site and the fifth sub-site, wherein the fourth sub-site comprises a site corresponding to a data offset in a log, and the fifth sub-site comprises a site corresponding to a transaction unique identifier in the log; and

obtaining data to be sent according to the transaction data added with the fourth sub-site and the fifth sub-site, and sending the data to be sent to the second cache.

2. The method of claim 1, wherein determining a target site from the first site and the second site based on the current transaction state comprises:

if the current transaction state is a commit state, determining the first site as the target site.

3. The method of claim 1, wherein determining a target site from the first site and the second site based on the current transaction state comprises:

if the current transaction state is an uncommitted state and the data corresponding to the first site meets the data reading condition, determining the first site as the target site.

4. The method of claim 1, wherein determining a target site from the first site and the second site based on the current transaction state comprises:

if the current transaction state is an uncommitted state and the data corresponding to the first site does not meet the data reading condition, determining the second site as the target site.
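The three-way selection recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TxState` and `choose_target_site`, and the boolean `first_site_readable` that stands in for the "data reading condition", are hypothetical and introduced only for illustration.

```python
from enum import Enum


class TxState(Enum):
    """Hypothetical encoding of the current transaction state."""
    COMMITTED = "committed"
    UNCOMMITTED = "uncommitted"


def choose_target_site(state, first_site, second_site, first_site_readable):
    """Return the site to use as the reading starting point.

    first_site_readable is an assumed flag that stands in for the
    "data reading condition" of claims 3 and 4.
    """
    if state is TxState.COMMITTED:
        return first_site       # claim 2: commit state
    if first_site_readable:
        return first_site       # claim 3: uncommitted, condition met
    return second_site          # claim 4: uncommitted, condition not met
```

In this sketch the second site is consulted only when the transaction is uncommitted and the data at the first site cannot be read.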

5. The method of claim 2, wherein determining the first site as the target site if the current transaction state is a commit state comprises:

and if the current transaction state is a commit state and the data corresponding to the first sub-site exists, determining the first sub-site as the target site.

6. The method of claim 5, wherein the performing a data synchronization operation with the target site as a read start point of the data synchronization task comprises:

taking the first sub-site as a reading starting point of the data synchronization task, reading data from the first cache, and sending the read data to a target database.

7. The method of claim 1, wherein the performing a data synchronization operation with the target site as a read start point of the data synchronization task comprises:

and taking the second sub-site as a reading starting point of the data synchronization task, reading data from a source database, and sending the read data to the second cache.

8. The method according to claim 1, wherein after obtaining data to be sent according to the transaction data added with the fourth sub-site and the fifth sub-site and sending the data to be sent to the second cache, the method further comprises:

receiving a second transaction request, reading the data to be sent from the second cache, and sending the read data to be sent to the first cache.
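The two-stage flow of claims 1 and 8 (tag the records that were read, stage them in the second cache, then move them to the first cache on the second transaction request) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the class `TwoStageCache`, the one-per-record offset increment, and the selection of records by transaction identifier on promotion are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedRecord:
    payload: str
    log_offset: int  # plays the role of the fourth sub-site (offset in the log)
    txn_id: str      # plays the role of the fifth sub-site (transaction identifier)


class TwoStageCache:
    """Hypothetical in-memory stand-in for the first and second caches."""

    def __init__(self):
        self.second_cache = deque()  # uncommitted transaction data
        self.first_cache = deque()   # committed transaction data

    def stage(self, payloads, start_offset, txn_id):
        """First transaction request: tag each record and stage it."""
        for i, payload in enumerate(payloads):
            # Assumption: log offsets advance by one per record.
            self.second_cache.append(TaggedRecord(payload, start_offset + i, txn_id))

    def promote(self, txn_id):
        """Second transaction request: move one transaction's records
        from the second cache to the first cache; return how many moved."""
        remaining = deque()
        moved = 0
        for record in self.second_cache:
            if record.txn_id == txn_id:
                self.first_cache.append(record)
                moved += 1
            else:
                remaining.append(record)
        self.second_cache = remaining
        return moved
```

Because every staged record carries both tags in this sketch, a restart could resume either by log offset or by transaction identifier.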

9. The method of claim 4, wherein the first site comprises the first sub-site and the second sub-site, the second site comprises a third sub-site, and the third sub-site is a site corresponding to a transaction unique identifier in the second cache; and determining the second site as the target site if the current transaction state is an uncommitted state and the data corresponding to the first site does not satisfy the data reading condition comprises:

if the current transaction state is an uncommitted state, the data corresponding to the first sub-site does not exist, and the data corresponding to the second sub-site does not exist, determining the third sub-site as the target site.

10. The method of claim 9, wherein the performing a data synchronization operation with the target site as a read start point of the data synchronization task comprises:

taking the third sub-site as a reading starting point of the data synchronization task, reading data from a source database, and sending the read data to the second cache.
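Claims 1, 5 to 7, 9, and 10 together describe which sub-site is used, where data is read from, and where it is sent. A minimal Python sketch of that case analysis follows. The function `plan_read`, the `ReadPlan` tuple, and the string labels are hypothetical; combinations that these claims do not spell out return `None` rather than a guessed behavior.

```python
from typing import NamedTuple, Optional


class ReadPlan(NamedTuple):
    start: str   # sub-site used as the reading starting point
    source: str  # where the data is read from
    sink: str    # where the read data is sent


def plan_read(committed: bool, first_exists: bool,
              second_exists: bool) -> Optional[ReadPlan]:
    """Map the transaction state and sub-site data availability to a plan."""
    if committed:
        if first_exists:
            # Claims 5 and 6: resume from the first cache, send to the target.
            return ReadPlan("first_sub_site", "first_cache", "target_database")
        return None  # not spelled out at sub-site level in these claims
    if first_exists:
        return None  # claim 3 case; the sub-site choice is not recited here
    if second_exists:
        # Claims 1 and 7: resume from the second cache offset.
        return ReadPlan("second_sub_site", "source_database", "second_cache")
    # Claims 9 and 10: fall back to the transaction identifier.
    return ReadPlan("third_sub_site", "source_database", "second_cache")
```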

11. The method according to claim 1, further comprising: acquiring the first site and the second site from a third cache, wherein the third cache is used to persist the first site and the second site.
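As an illustration of claim 11, the sketch below persists the two sites to a small JSON file and reads them back after a restart. The claims do not say how the third cache is implemented, so the file-backed storage, the atomic-rename technique, and the names `save_sites` and `load_sites` are assumptions made only for this example.

```python
import json
import os


def save_sites(path, first_site, second_site):
    """Persist both sites, replacing any previous snapshot atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"first_site": first_site, "second_site": second_site}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one


def load_sites(path):
    """Return (first_site, second_site), or None if nothing was persisted."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    return data["first_site"], data["second_site"]
```

Writing to a temporary file and renaming it means that an interruption during a save leaves the previous snapshot intact.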

12. A data synchronization device, comprising:

a processing module, configured to determine a target site from a first site and a second site according to the current transaction state, wherein the first site is used for representing the cache data offset of the data, and the second site is used for representing the transaction commit progress of the data;

a control module, configured to execute a data synchronization operation by taking the target site as a reading starting point of the data synchronization task; wherein the first site includes a first sub-site and a second sub-site, and the executing of the data synchronization operation with the target site as the reading starting point of the data synchronization task includes: if the current transaction state is an uncommitted state, the data corresponding to the first sub-site does not exist, and the data corresponding to the second sub-site exists, receiving a first transaction request, and reading transaction data from a source database with the second sub-site as the reading starting point of the data synchronization task, wherein the second sub-site is a site corresponding to a data offset in a second cache, the second cache is used for caching uncommitted transaction data, the first sub-site is a site corresponding to a data offset in a first cache, and the first cache is used for caching committed transaction data; adding a fourth sub-site and a fifth sub-site to the read transaction data to obtain the transaction data added with the fourth sub-site and the fifth sub-site, wherein the fourth sub-site comprises a site corresponding to a data offset in a log, and the fifth sub-site comprises a site corresponding to a transaction unique identifier in the log; and obtaining data to be sent according to the transaction data added with the fourth sub-site and the fifth sub-site, and sending the data to be sent to the second cache.

13. A data synchronization device, the device comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data synchronization method of any one of claims 1 to 11.

14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the data synchronization method according to any one of claims 1 to 11.