Disclosure of Invention
In view of the above problems with the transaction processing of the distributed database system in the prior art, the present invention is proposed to provide a transaction processing method and apparatus of the distributed database system, which overcome the above problems or at least partially solve the above problems.
The invention provides a transaction processing method of a distributed database system, which comprises the following steps:
the global transaction processing center server receives a transaction Structured Query Language (SQL) statement submitted by a user and generates an SQL syntax tree according to the SQL statement;
the global transaction processing center server traverses the SQL syntax tree and generates an SQL execution tree which can be executed at each database node of the distributed database system according to the distribution condition of the distributed database system;
the global transaction processing center server applies for a global transaction identifier GTID corresponding to the SQL execution tree, traverses the SQL execution tree, and issues SQL statements on execution nodes of the SQL execution tree to corresponding database nodes for execution, wherein the GTID is carried in the SQL statements;
and the global transaction processing center server receives the execution results, traverses the SQL execution tree according to the execution results, returns the execution results to the user after all SQL statements on the SQL execution tree have been executed successfully, and releases the GTID.
Preferably, when each database node executes an SQL statement, the corresponding GTID is stored in an implicit column of the operation data of the SQL statement stored in that database node, and the before and after values of the operation data of the SQL statement are written into the database log of each database node;
the method further comprises a distributed rollback operation: when the execution results submitted by some database nodes indicate that the SQL statement executed successfully while the execution results submitted by other database nodes indicate that it failed, the global transaction processing center server acquires the database logs of the database nodes on which the SQL statement executed successfully, obtains from those logs, by means of the GTID, the successfully executed SQL statements and the before and after values of the operation data, constructs reverse SQL statements from them, and rolls back the successfully executed SQL statements so that their data returns to a state consistent with that of the SQL statements that failed to execute.
Preferably, the receiving, by the global transaction processing center server, the structured query language SQL statement submitted by the user, and generating the SQL syntax tree according to the SQL statement specifically includes:
the global transaction processing center server receives a transaction SQL statement submitted by a user;
the global transaction processing center server constructs an SQL syntax analyzer according to the SQL syntax rules and the distributed application rules, parses the transaction SQL statement with the SQL syntax analyzer to generate an SQL syntax tree, and reconstructs the required SQL statements from the SQL syntax tree, wherein the SQL syntax tree comprises SQL statement nodes and SQL statement operation nodes, and two SQL statement nodes that are in a master-slave relationship are connected through an SQL statement operation node.
Preferably, the traversing, by the global transaction center server, the SQL syntax tree, and generating, according to the distribution of the distributed database system, an SQL execution tree that can be executed at each database node of the distributed database system specifically includes:
the global transaction processing center server traverses the SQL syntax tree depth-first from the root node and starts the merging operation once a leaf node is reached: nodes in a parent-child relationship or a sibling relationship that can be placed in a single SQL statement and issued to a database node are merged into one execution node; when the current nodes cannot be merged, they are issued separately as individual execution nodes of the SQL execution tree; the merging operation finally generates the SQL execution tree that can be executed at each database node of the distributed database system.
Preferably, when generating the SQL execution tree that can be executed at each database node of the distributed database system, the method further includes:
when all data involved in an SQL statement on the SQL execution tree is located on the same database node, the SQL statement is marked as directly issued to the corresponding database node for execution; when the data involved in an SQL statement on the SQL execution tree is distributed over different database nodes, the SQL statement is marked as requiring summarization, and the sub-SQL statements decomposed from it are marked as directly issued to the corresponding database nodes for execution;
the issuing of the SQL statements on the SQL execution tree to the corresponding database nodes for execution specifically includes:
when an SQL statement is marked as directly issued to the corresponding database node for execution, the global transaction processing center server issues the SQL statement directly to that database node for execution;
when an SQL statement is marked as requiring summarization, the global transaction processing center server issues the sub-SQL statements of the SQL statement to the respective database nodes for execution, receives the execution results fed back by each database node, and performs a data summarization operation on the execution results.
Preferably, the applying, by the global transaction processing center server, the global transaction identifier GTID corresponding to the SQL execution tree specifically includes:
step 1, the global transaction processing center server queries the GTIDs in the operation data of the SQL statements on each execution node of the SQL execution tree;
step 2, the global transaction processing center server reads the queried GTIDs, compares them with the list of currently active GTIDs of the system stored in the global transaction processing center server, and determines whether the operation data corresponding to the read GTIDs is active;
step 3, when the operation data is determined to be active, the global transaction processing center server determines whether the transaction operation has timed out or whether the number of retries of the transaction operation exceeds a preset retry threshold; if the transaction operation has timed out or the retry threshold is exceeded, the distributed rollback operation is executed; otherwise, step 1 is executed again;
step 4, when the operation data is determined not to be active, the global transaction processing center server applies for a new GTID;
step 5, the global transaction processing center server stores the GTID corresponding to the currently active operation data in the list of currently active GTIDs of the system;
step 6, when another transaction applies to operate on data associated with a GTID in the list of currently active GTIDs of the system, that transaction is prohibited from operating.
The invention also provides a transaction processing device of the distributed database system, which comprises:
the client module is used for receiving the transaction Structured Query Language (SQL) statement submitted by the user;
the syntax parsing module is used for generating an SQL syntax tree according to the SQL statement;
the SQL processing module is used for traversing the SQL syntax tree and generating an SQL execution tree which can be executed at each database node of the distributed database system according to the distribution condition of the distributed database system;
the global transaction identification module is used for applying for a global transaction identifier (GTID) corresponding to the SQL execution tree, wherein the GTID is carried in the SQL statements on the execution nodes of the SQL execution tree;
the execution module is used for traversing the SQL execution tree and sending the SQL sentences on the execution nodes of the SQL execution tree to the routing module;
the routing module is used for issuing the SQL sentences issued by the execution module to the corresponding database nodes for execution and feeding back the execution results to the execution module;
the execution module is further configured to: receive the execution results, traverse the SQL execution tree according to the execution results, return the execution results to the user after all SQL statements on the SQL execution tree have been executed successfully, and control the global transaction identification module to release the GTID.
Preferably, the execution module is further configured to: when the execution results submitted by some database nodes indicate that the SQL statement executed successfully while the execution results submitted by other database nodes indicate that it failed, acquire the database logs of the database nodes on which the SQL statement executed successfully, obtain from those logs, by means of the GTID, the successfully executed SQL statements and the before and after values of the operation data, construct reverse SQL statements from them, and roll back the successfully executed SQL statements so that their data returns to a state consistent with that of the SQL statements that failed to execute.
Preferably, the syntax parsing module is specifically configured to:
construct an SQL syntax analyzer according to the SQL syntax rules and the distributed application rules, parse the transaction SQL statement with the SQL syntax analyzer to generate an SQL syntax tree, and reconstruct the required SQL statements from the SQL syntax tree, wherein the SQL syntax tree comprises SQL statement nodes and SQL statement operation nodes, and two SQL statement nodes that are in a master-slave relationship are connected through an SQL statement operation node.
Preferably, the SQL processing module is specifically configured to:
traverse the SQL syntax tree depth-first from the root node and start the merging operation once a leaf node is reached; merge nodes in a parent-child relationship or a sibling relationship that can be placed in a single SQL statement and issued to a database node into one execution node; when the current nodes cannot be merged, issue them separately as individual execution nodes of the SQL execution tree; and finally, through the merging operation, generate the SQL execution tree that can be executed at each database node of the distributed database system.
Preferably, the SQL processing module is further configured to:
when generating the SQL execution tree that can be executed at each database node of the distributed database system: when all data involved in an SQL statement on the SQL execution tree is located on the same database node, mark the SQL statement as directly issued to the corresponding database node for execution; when the data involved in an SQL statement on the SQL execution tree is distributed over different database nodes, mark the SQL statement as requiring summarization and mark the sub-SQL statements decomposed from it as directly issued to the corresponding database nodes for execution;
the routing module is specifically configured to: when an SQL statement is marked as directly issued to the corresponding database node for execution, issue the SQL statement directly to that database node for execution; when an SQL statement is marked as requiring summarization, issue the sub-SQL statements of the SQL statement to the respective database nodes for execution, and then send the execution results fed back by each database node to the execution module;
the execution module is further configured to: perform a data summarization operation on the execution results.
Preferably, the execution module specifically includes:
the locking sub-module is used for querying the GTIDs in the operation data of the SQL statements on each execution node of the SQL execution tree, and for prohibiting another transaction from operating when that transaction applies to operate on data associated with a GTID in the list of currently active GTIDs of the system;
the judging sub-module is used for reading the queried GTIDs, comparing them with the list of currently active GTIDs of the system stored in the global transaction processing center server, and determining whether the operation data corresponding to the read GTIDs is active;
the processing sub-module is used for: when the operation data is determined to be active, determining whether the transaction operation has timed out or whether the number of retries of the transaction operation exceeds a preset retry threshold, executing the distributed rollback operation if so, and otherwise calling the locking sub-module again; and, when the operation data is determined not to be active, controlling the global transaction identification module to apply for a new GTID and storing the GTID corresponding to the currently active operation data in the list of currently active GTIDs of the system.
The invention has the following beneficial effects:
by applying for a GTID whenever a transaction operation involves a distributed multi-node database, the GTID ensures that the data operations performed by the transaction either all succeed or all fail, even if the transaction is distributed over different nodes.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To solve the above problems in the prior art, the present invention provides a method and an apparatus for transaction processing in a distributed database system based on a Global Transaction Identifier (GTID). Every transaction operation must apply for a GTID; even if a transaction is distributed over different nodes, the GTID ensures that the data operations performed by the transaction either all succeed or all fail. On the premise of satisfying ACID (the four basic properties required for correct execution of database transactions: Atomicity, Consistency, Isolation and Durability), system concurrency is maximized as far as possible, thereby improving the overall processing efficiency of the system. In addition, when a distributed transaction Structured Query Language (SQL) statement is executed, the technical solution of the embodiment of the present invention may write the SQL statement into a database log. When some submissions succeed and others fail, the database log ensures that the successfully submitted data can be correctly rolled back, which guarantees strong data consistency. The embodiment of the invention is particularly suitable for OLTP systems with high requirements on performance and data consistency, for example in the fields of finance, securities and electronic commerce.
The present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Method embodiment
According to an embodiment of the present invention, a transaction processing method of a distributed database system is provided, fig. 1 is a flowchart of the transaction processing method of the distributed database system according to the embodiment of the present invention, and as shown in fig. 1, the transaction processing method of the distributed database system according to the embodiment of the present invention includes the following processes:
step 101, the global transaction processing center server receives a transaction Structured Query Language (SQL) statement submitted by a user and generates an SQL syntax tree according to the SQL statement; this specifically comprises the following steps:
step 1, the global transaction processing center server receives a transaction SQL statement submitted by a user;
step 2, the global transaction processing center server constructs an SQL syntax analyzer according to the SQL syntax rules and the distributed application rules, parses the transaction SQL statement with the SQL syntax analyzer to generate an SQL syntax tree, and reconstructs the required SQL statements from the SQL syntax tree, wherein the SQL syntax tree comprises SQL statement nodes and SQL statement operation nodes, and two SQL statement nodes that are in a master-slave relationship are connected through an SQL statement operation node.
step 102, the global transaction processing center server traverses the SQL syntax tree and generates, according to the data distribution of the distributed database system, an SQL execution tree that can be executed at each database node of the distributed database system; this specifically comprises the following:
the global transaction processing center server traverses the SQL syntax tree depth-first from the root node and starts the merging operation once a leaf node is reached: nodes in a parent-child relationship or a sibling relationship that can be placed in a single SQL statement and issued to a database node are merged into one execution node; when the current nodes cannot be merged, they are issued separately as individual execution nodes of the SQL execution tree; the merging operation finally generates the SQL execution tree that can be executed at each database node of the distributed database system.
In the embodiment of the invention, when the SQL execution tree that can be executed at each database node of the distributed database system is generated, an SQL statement on the SQL execution tree is marked as directly issued to the corresponding database node for execution if all data involved in it is located on the same database node; if the data involved in it is distributed over different database nodes, the SQL statement is marked as requiring summarization, and the sub-SQL statements decomposed from it are marked as directly issued to the corresponding database nodes for execution;
step 103, the global transaction processing center server applies for a global transaction identifier (GTID) corresponding to the SQL execution tree, traverses the SQL execution tree, and issues the SQL statements on the execution nodes of the SQL execution tree to the corresponding database nodes for execution, wherein the GTID is carried in the SQL statements;
in step 103, issuing the SQL statements on the SQL execution tree to the corresponding database nodes for execution specifically includes:
step 1, when an SQL statement is marked as directly issued to the corresponding database node for execution, the global transaction processing center server issues the SQL statement directly to that database node for execution;
step 2, when an SQL statement is marked as requiring summarization and the sub-SQL statements decomposed from it are marked as directly issued to the corresponding database nodes for execution, the global transaction processing center server issues the sub-SQL statements of the SQL statement to the respective database nodes for execution, receives the execution results fed back by each database node, and performs a data summarization operation on the execution results.
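The two issuing paths can be illustrated with a minimal Python sketch. The mark values, the `MarkedStatement` class, the `dispatch` function and the `send` callback are hypothetical names introduced only for illustration; the text does not specify how the GTID is carried in the SQL statement, so it is passed here as a separate argument, and "summarization" is reduced to concatenating the returned rows.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple
# Hypothetical mark values; the text only says that statements are "marked".
DIRECT = "direct"
SUMMARIZE = "summarize"


@dataclass
class MarkedStatement:
    sql: str
    mark: str
    db_node: str = ""  # target database node, used when mark == DIRECT
    # Sub-SQL statement per database node, used when mark == SUMMARIZE.
    sub_statements: Dict[str, str] = field(default_factory=dict)


def dispatch(stmt: MarkedStatement, gtid: int,
             send: Callable[[str, str, int], List[Row]]) -> List[Row]:
    """Issue a marked statement; send(db_node, sql, gtid) returns result rows."""
    if stmt.mark == DIRECT:
        # All data lives on one node: issue the statement as it is.
        return send(stmt.db_node, stmt.sql, gtid)
    # Data is spread over several nodes: issue each sub-statement,
    # then summarize (here: simply concatenate) the partial results.
    rows: List[Row] = []
    for db_node, sub_sql in stmt.sub_statements.items():
        rows.extend(send(db_node, sub_sql, gtid))
    return rows


def fake_send(db_node: str, sql: str, gtid: int) -> List[Row]:
    # Stand-in for the routing layer: returns canned rows per node.
    data = {"db1": [(1, "a")], "db2": [(2, "b")]}
    return data[db_node]
```

A real summarization step would typically also re-apply sorting, grouping or aggregation to the combined rows; that is left out of this sketch.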
In step 103, the application of the global transaction identifier GTID corresponding to the SQL execution tree by the global transaction processing center server specifically includes:
step 1, the global transaction processing center server queries the GTIDs in the operation data of the SQL statements on each execution node of the SQL execution tree;
step 2, the global transaction processing center server reads the queried GTIDs, compares them with the list of currently active GTIDs of the system stored in the global transaction processing center server, and determines whether the operation data corresponding to the read GTIDs is active;
step 3, when the operation data is determined to be active, the global transaction processing center server determines whether the transaction operation has timed out or whether the number of retries of the transaction operation exceeds a preset retry threshold; if the transaction operation has timed out or the retry threshold is exceeded, the distributed rollback operation is executed; otherwise, step 1 is executed again;
step 4, when the operation data is determined not to be active, the global transaction processing center server applies for a new GTID;
step 5, the global transaction processing center server stores the GTID corresponding to the currently active operation data in the list of currently active GTIDs of the system;
step 6, when another transaction applies to operate on data associated with a GTID in the list of currently active GTIDs of the system, that transaction is prohibited from operating.
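Steps 1 to 6 above can be sketched as a small retry loop. This is only an illustration under stated assumptions: the class `GtidAllocator`, the exception `TransactionBlocked`, the `read_row_gtids` callback (which stands for querying the implicit columns of the operation data), the default timeout and retry values, and the use of an increasing integer as GTID are all hypothetical and not taken from the text.

```python
import time
from typing import Callable, Iterable, Optional, Set


class TransactionBlocked(Exception):
    """Raised when the data stays held past the timeout or retry limit."""


class GtidAllocator:
    def __init__(self) -> None:
        self._next_gtid = 1
        self.active: Set[int] = set()  # GTIDs currently active in the system

    def apply(self, read_row_gtids: Callable[[], Iterable[int]],
              timeout_s: float = 1.0, max_retries: int = 3,
              wait_s: float = 0.01,
              rollback: Optional[Callable[[], None]] = None) -> int:
        start = time.monotonic()
        retries = 0
        while True:
            # Steps 1-2: query the GTIDs stored with the operation data and
            # compare them with the active list.
            if not (set(read_row_gtids()) & self.active):
                # Steps 4-5: the data is not active, so allocate a new GTID
                # and record it in the active list.
                gtid = self._next_gtid
                self._next_gtid += 1
                self.active.add(gtid)
                return gtid
            # Step 3: the data is active; give up on timeout or once the
            # retry threshold is exceeded, otherwise go back to step 1.
            retries += 1
            if time.monotonic() - start > timeout_s or retries > max_retries:
                if rollback is not None:
                    rollback()  # stands for the distributed rollback operation
                raise TransactionBlocked("operation data is held by another transaction")
            time.sleep(wait_s)

    def release(self, gtid: int) -> None:
        self.active.discard(gtid)
```

Step 6 is covered implicitly: while a GTID remains in `active`, any other caller whose data carries that GTID cannot obtain a GTID of its own and therefore cannot proceed.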
Step 104, the global transaction processing center server receives the execution results, traverses the SQL execution tree according to the execution results, returns the execution results to the user after all SQL statements on the SQL execution tree have been executed successfully, and releases the GTID.
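The all-or-nothing decision of step 104 can be sketched as follows. The function name `complete_transaction` and its callbacks are hypothetical; releasing the GTID after a rollback as well as after success is an assumption based on the statement that the GTID is released when the transaction operation ends.

```python
from typing import Callable, Dict, List


def complete_transaction(results: Dict[str, bool],
                         rollback: Callable[[List[str]], None],
                         release_gtid: Callable[[], None]) -> bool:
    """results maps a database node name to whether its statement succeeded."""
    succeeded = [node for node, ok in results.items() if ok]
    failed = [node for node, ok in results.items() if not ok]
    try:
        if failed:
            # Partial success: undo the work on the nodes that succeeded.
            if succeeded:
                rollback(succeeded)
            return False
        return True
    finally:
        # Assumption: the GTID is released once the transaction ends,
        # whether it ended in success or in a rollback.
        release_gtid()
```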
In the embodiment of the invention, when each database node executes an SQL statement, the corresponding GTID must be stored in the implicit column of the operation data of the SQL statement stored in that database node, and the before and after values of the operation data of the SQL statement are written into the database log of each database node;
according to the technical solution of the embodiment of the invention, when it is determined that not all SQL statements on the SQL execution tree were executed successfully, a distributed rollback operation must be carried out, which specifically comprises the following processing: when the execution results submitted by some database nodes indicate that the SQL statement executed successfully while the execution results submitted by other database nodes indicate that it failed, the global transaction processing center server acquires the database logs of the database nodes on which the SQL statement executed successfully, obtains from those logs, by means of the GTID, the successfully executed SQL statements and the before and after values of the operation data, constructs reverse SQL statements from them, and rolls back the successfully executed SQL statements so that their data returns to a state consistent with that of the SQL statements that failed to execute.
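How reverse SQL statements might be built from logged before and after values is sketched below in Python, using the standard `sqlite3` module as a stand-in for a database node. The log record layout (`gtid`, `op`, `table`, `before`, `after`) and the function names are assumptions made for illustration; the text does not define the log format. Table and column names are taken from the log and interpolated directly, so the sketch assumes a trusted log and rows without NULL values.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

LogRecord = Dict[str, Any]  # hypothetical layout: gtid, op, table, before, after


def reverse_statement(rec: LogRecord) -> Tuple[str, List[Any]]:
    """Build the SQL statement that undoes one logged operation."""
    table, before, after = rec["table"], rec["before"], rec["after"]
    if rec["op"] == "insert":
        where = " AND ".join(f"{col} = ?" for col in after)
        return f"DELETE FROM {table} WHERE {where}", list(after.values())
    if rec["op"] == "delete":
        cols = ", ".join(before)
        marks = ", ".join("?" for _ in before)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(before.values())
    if rec["op"] == "update":
        sets = ", ".join(f"{col} = ?" for col in before)
        where = " AND ".join(f"{col} = ?" for col in after)
        return (f"UPDATE {table} SET {sets} WHERE {where}",
                list(before.values()) + list(after.values()))
    raise ValueError(f"unsupported operation: {rec['op']}")


def build_rollback(log: List[LogRecord], gtid: int) -> List[Tuple[str, List[Any]]]:
    """Select the records of one GTID and reverse them, newest first."""
    own = [rec for rec in log if rec["gtid"] == gtid]
    return [reverse_statement(rec) for rec in reversed(own)]


def demo() -> List[Tuple[int, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE acct (id INTEGER, bal INTEGER)")
    conn.execute("INSERT INTO acct VALUES (1, 100)")
    # Work done on this node under GTID 7, then under GTID 8.
    conn.execute("UPDATE acct SET bal = 50 WHERE id = 1")
    conn.execute("INSERT INTO acct VALUES (2, 50)")
    conn.execute("INSERT INTO acct VALUES (3, 10)")
    log = [
        {"gtid": 7, "op": "update", "table": "acct",
         "before": {"id": 1, "bal": 100}, "after": {"id": 1, "bal": 50}},
        {"gtid": 7, "op": "insert", "table": "acct",
         "before": None, "after": {"id": 2, "bal": 50}},
        {"gtid": 8, "op": "insert", "table": "acct",
         "before": None, "after": {"id": 3, "bal": 10}},
    ]
    # Roll back only the work of GTID 7.
    for sql, params in build_rollback(log, 7):
        conn.execute(sql, params)
    return conn.execute("SELECT id, bal FROM acct ORDER BY id").fetchall()
```

In this sketch the records are reversed newest first, so that later changes are undone before the earlier ones they depend on; the work done under GTID 8 is left untouched.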
The above technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 2 is a schematic structural diagram of a system implementing the transaction processing method of a distributed database system according to an embodiment of the present invention. As shown in fig. 2, the system mainly includes: a client module 201, a metadata cache module 202, a syntax parsing module 203, an SQL processing module 204, an execution module 205, a routing module 206, a global transaction identification (GTID) module 207, a database monitoring module 208, and a distributed database cluster 209.
The user submits transaction SQL statements and the like through the client module 201 (including a computer, a mobile phone, a browser and the like).
The metadata cache module 202 is used for caching metadata, which comprises table definition information, distribution information of each node of the distributed database, and internal information of the global transaction processing center server. The metadata cache module 202 supports metadata update and query operations. When the system metadata changes, the information stored by the metadata cache module is updated synchronously. The metadata cache module also accepts queries from other modules and returns the requested metadata.
The syntax parsing module 203 is used for constructing the distributed transaction SQL statement into an SQL syntax tree suitable for single-machine execution. The syntax parsing module 203 parses the transaction SQL statement into an SQL syntax tree. There are two types of nodes in the SQL syntax tree: LEX nodes (SQL statement nodes) and UNIT nodes (SQL statement operation nodes). A LEX node holds the tables, the SQL statement, join operations, where operations, having operations, group operations, order operations and other information related to the SQL statement. A UNIT node represents a subquery, a Union operation, a Join operation and the like.
If there is a master-slave relationship between two LEX nodes, the master LEX must be connected to the slave LEX through a UNIT node, as is the case for subqueries, Union operations, Join operations and the like. For nodes in a master-slave relationship, the child nodes are accessed depth-first and recursively starting from the master node. Nodes on the same level are siblings and can be accessed and executed in parallel, for example the two nodes of a Union operation. The root node of the SQL syntax tree is a UNIT node. The execution module 205 recursively accesses child nodes from the root node until all nodes have been executed.
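The LEX/UNIT structure described above can be illustrated with a minimal Python sketch. The class names `Lex` and `Unit`, their fields, and the `depth_first` helper are hypothetical; they only mirror the rule that a master LEX reaches its slave LEX through a UNIT node and that the root is a UNIT node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Lex:
    """SQL statement node; slave statements hang below it via UNIT nodes."""
    sql: str
    units: List["Unit"] = field(default_factory=list)


@dataclass
class Unit:
    """SQL statement operation node, e.g. root, subquery, union or join."""
    kind: str
    lexes: List[Lex] = field(default_factory=list)


def depth_first(unit: Unit, depth: int = 0) -> List[Tuple[int, str, str]]:
    """Visit the tree depth-first from a UNIT node, recording each visit."""
    visited = [(depth, "UNIT", unit.kind)]
    for lex in unit.lexes:  # LEX nodes under one UNIT are siblings
        visited.append((depth + 1, "LEX", lex.sql))
        for sub in lex.units:
            visited.extend(depth_first(sub, depth + 2))
    return visited


# Tree for: insert into tb1(id, name) select id, name from tb2
tree = Unit("root", [
    Lex("insert into tb1(id, name)", [
        Unit("subquery", [Lex("select id, name from tb2")]),
    ]),
])
```

Calling `depth_first(tree)` visits the root UNIT, the insert LEX, the subquery UNIT and finally the select LEX, in that order.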
The SQL processing module 204 is configured to convert the SQL syntax tree produced by the syntax parsing module into an SQL execution tree suitable for distributed database transaction operations. The conversion from the SQL syntax tree to the SQL execution tree is shown in fig. 3. The distributed transaction execution plan tree contains the following types of nodes:
SQLNode: a node for which it has not yet been determined whether its SQL statement can be issued directly or must first be merged with the SQL statements of other nodes before being issued, for example select, insert, delete and update statements and combinations thereof.
UnionNode: a node that performs a union operation; it performs the union operation on the data of two or more child nodes and then returns the data to its parent node.
JoinNode: a node that performs left join, right join and outer join operations; it performs the join operation on the data of two or more child nodes and then returns the data to its parent node.
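The way UnionNode and JoinNode combine child data before handing it to their parent can be sketched as follows. This is a simplified illustration, not the described implementation: `SQLNode` simply holds canned rows in place of issuing SQL, the union removes duplicates as an SQL UNION would, and `JoinNode` is restricted to a left join of exactly two children on the first column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SQLNode:
    rows: List[Tuple]  # stand-in for rows returned by a database node

    def evaluate(self) -> List[Tuple]:
        return list(self.rows)


@dataclass
class UnionNode:
    children: List[object]

    def evaluate(self) -> List[Tuple]:
        # Combine the rows of all children, dropping duplicates.
        seen, out = set(), []
        for child in self.children:
            for row in child.evaluate():
                if row not in seen:
                    seen.add(row)
                    out.append(row)
        return out


@dataclass
class JoinNode:
    left: object
    right: object

    def evaluate(self) -> List[Tuple]:
        # Left join on the first column of each row (an assumption).
        right_rows = self.right.evaluate()
        pad = (None,) * (len(right_rows[0]) - 1) if right_rows else ()
        out = []
        for l_row in self.left.evaluate():
            matches = [r for r in right_rows if r[0] == l_row[0]]
            if matches:
                out.extend(l_row + r[1:] for r in matches)
            else:
                out.append(l_row + pad)  # unmatched left row, padded with None
        return out
```

Because every node exposes the same `evaluate` method, a parent node can treat SQLNode, UnionNode and JoinNode children uniformly.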
To convert the SQL syntax tree into the SQL execution tree, the SQL syntax tree is traversed depth-first from the root node, and the merging operation starts once a leaf node is reached. Merging follows two principles: nodes of the SQL syntax tree in a parent-child relationship are merged into one execution node if they can be placed in a single SQL statement and issued to a database for execution; likewise, nodes in a sibling relationship are merged into one execution node if they can be placed in a single SQL statement and issued to a database for execution. If the current nodes cannot be merged, they must be issued separately as individual SQL statements for execution, after which a summarization operation is performed.
For example, consider insert into tb1(id, name) select id, name from tb2. According to the parsed SQL syntax tree, the select must fetch the data before the insert operation is performed. Analysis shows that tb1 and tb2 are located on the same database node and the statement can be issued directly for execution, so the two nodes in the parent-child relationship can be merged, as shown by the dashed box marking the parent-child merge in fig. 3.
Consider likewise a select statement over tb3 and tb4 (select ... from tb3, tb4). According to the parsed SQL syntax tree, the data of tb3 and tb4 must be fetched first, and the join operation is then performed. Analysis shows that tb3 and tb4 are located on the same database node and the statement can be issued directly for execution, so the two nodes in the sibling relationship can be merged, as shown by the dashed box marking the sibling merge in fig. 4.
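The merging rule illustrated by these two examples can be sketched as a bottom-up pass over the syntax tree. The sketch is a simplification under stated assumptions: `SyntaxNode`, `ExecNode`, `build_exec_tree` and the `location` map (table name to database node) are hypothetical names, and "can be placed in a single SQL statement" is approximated by "all tables involved are located on the same database node".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyntaxNode:
    sql: str
    tables: List[str] = field(default_factory=list)
    children: List["SyntaxNode"] = field(default_factory=list)


@dataclass
class ExecNode:
    fragments: List[str]
    db_node: Optional[str]  # None: children run separately and are summarized
    children: List["ExecNode"] = field(default_factory=list)


def build_exec_tree(node: SyntaxNode, location: Dict[str, str]) -> ExecNode:
    # Depth-first: convert the children first, so merging starts at the leaves.
    kids = [build_exec_tree(child, location) for child in node.children]
    # Database nodes touched by this node and by its already-converted children.
    homes = {location[t] for t in node.tables} | {k.db_node for k in kids}
    if len(homes) == 1 and None not in homes:
        # Parent and children (or siblings) all sit on one database node:
        # merge them into a single execution node.
        fragments = [node.sql] + [f for k in kids for f in k.fragments]
        return ExecNode(fragments, homes.pop())
    # Cannot merge: keep the children as separate execution nodes.
    return ExecNode([node.sql], None, kids)


tree = SyntaxNode("insert into tb1(id, name)", ["tb1"],
                  [SyntaxNode("select id, name from tb2", ["tb2"])])
```

With `{"tb1": "db1", "tb2": "db1"}` the two nodes collapse into one execution node on `db1`; with `tb2` located on `db2` the select remains a separate child whose result must be fetched before the insert is issued.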
The process of generating the SQL execution tree is described below in conjunction with FIG. 3.
In step 1, the execution module 205 reads the SQL syntax tree parsed by the syntax parsing module 203.
In step 2, the execution module 205 initializes the temporary stack and the SQL execution tree stack.
In step 3, the execution module 205 reads the root element of the SQL syntax tree and pushes it onto the temporary stack.
In step 4, the following operations are repeated until the temporary stack is empty.
Step 41, the top element of the temporary stack is read; if it is a non-leaf node element, it is popped and converted into a Master-Slaver structure of the SQL statement. There may be multiple Slaver elements, which are siblings of one another. The Master and Slaver elements are then pushed onto the temporary stack in sequence.
Step 42, if the top element of the temporary stack is an SQL statement that can be issued directly to the database for execution, the following operations of step 42 are executed repeatedly:
Step 421, the top element of the SQL execution tree stack is read. If the SQL execution tree stack is empty, the top element of the temporary stack is pushed onto the SQL execution tree stack and step 42 is exited.
Step 422, if the top of the SQL execution tree stack contains one element, it is determined whether the top element of the temporary stack and the top element of the SQL execution tree stack can be merged; if so, both are popped, merged, and the merged element is pushed onto the SQL execution tree stack.
Step 423, if the top of the SQL execution tree stack contains two or more elements, the top element of the SQL execution tree stack is read and it is determined whether merging is possible. If not, the top element of the temporary stack is pushed directly onto the SQL execution tree stack and step 42 is exited. If so, the top element of the SQL execution tree stack is popped and merged with the top element of the temporary stack into one element, the merged element becomes the top element of the temporary stack, and step 42 is repeated.
The execution module 205 is configured to recursively process the SQL execution tree from its root node and issue the SQL statements to the routing module 206. (Recursive execution here does not mean that the execution module itself executes the SQL statements; it mainly obtains the SQL statements, and the database nodes execute them.) The execution module 205 applies for and releases the GTID as needed and, when operating on the execution nodes of the SQL execution tree, performs operations such as sorting, grouping, and join as needed.
The routing module 206 is configured to select a suitable database connection for the SQL statements issued by the execution module 205, issue the SQL statements to the database node for execution, and return the operation result to the execution module 205.
The GTID module 207 is configured to apply for, release, and query GTIDs. The GTID module 207 may maintain a list of the GTIDs of the transaction operation data that is currently active. An implicit column is added to the operation data stored in the distributed database system to store the GTID, and the GTID in the implicit column is updated synchronously whenever the data is updated during a distributed transaction operation. Each transaction operation applies for a GTID; during the transaction operation, the applied GTID is updated or inserted into the implicit column of the data, and the list of GTIDs of the distributed transaction operations in the current system is updated in real time. When a distributed transaction operation is involved, it is first checked whether the operation data is active, that is, whether the GTID of the operation data belongs to another transaction; if the operation data is being operated on by another transaction, the transaction operation cannot proceed. A GTID is applied for when a distributed transaction operation across database nodes starts, and is released when the transaction operation ends.
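The behavior of the GTID module 207 can be pictured with the following minimal in-memory sketch. The class name `GtidRegistry`, its method names, and the implicit column name `_gtid` are hypothetical; the description does not prescribe an interface or a storage format for the active GTID list.

```python
import itertools
import threading
from typing import Optional, Set


class GtidRegistry:
    """In-memory stand-in for the list of currently active GTIDs (hypothetical API)."""

    def __init__(self) -> None:
        self._next = itertools.count(1)
        self._active: Set[int] = set()
        self._lock = threading.Lock()

    def apply(self) -> int:
        # Called when a distributed transaction operation starts.
        with self._lock:
            gtid = next(self._next)
            self._active.add(gtid)
            return gtid

    def release(self, gtid: int) -> None:
        # Called when the transaction operation ends.
        with self._lock:
            self._active.discard(gtid)

    def is_active(self, gtid: Optional[int]) -> bool:
        with self._lock:
            return gtid in self._active


def can_operate(row: dict, registry: GtidRegistry) -> bool:
    # "_gtid" is an assumed name for the implicit column of the operation data.
    # Data may be operated on only if its GTID is not held by an active transaction.
    return not registry.is_active(row.get("_gtid"))


registry = GtidRegistry()
gtid = registry.apply()
row = {"id": 1, "balance": 100, "_gtid": gtid}
print(can_operate(row, registry))   # the row is held by an active transaction
registry.release(gtid)
print(can_operate(row, registry))   # the GTID has been released
```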
The database monitoring module 208 is configured to monitor the status of each database of the distributed database cluster and to perform rollback operations on committed transactions.
The distributed database cluster module 209 is a distributed database cluster formed by a plurality of nodes. Each DBgroup has a primary database and a standby database; the DBgroups may be database nodes physically located at different sites, and a plurality of DBgroups form the distributed database cluster. When a transaction operation involves different database nodes, distributed transaction control is required so that, when a plurality of nodes execute the transaction SQL statements, the statements either all succeed or all fail, which ensures data consistency. When execution partly succeeds and partly fails, a distributed transaction rollback operation is needed to roll the data back to a consistent state.
Single-node transaction rollback is guaranteed by the database itself. Distributed database transaction rollback addresses the case in which a transaction operation across database nodes is committed successfully on some nodes and fails on others; the successfully committed nodes must then be rolled back to ensure data consistency. Distributed database transaction operations are recorded in the database log in units of transactions and include query, insert, update, and delete operations. The rollback operation flow for committed distributed transactions is described below in conjunction with fig. 4.
In step 401, the execution module 205 initiates a rollback operation, traverses the nodes of the SQL execution tree being executed, and issues a rollback command for each node operation that has already been issued and executed.
In step 402, the execution module 205 determines whether the transaction has been committed to the database and executed successfully. If the transaction has not been committed successfully, step 403 is entered; otherwise step 404 is entered and the rollback operation flow for committed transactions begins.
In step 403, a rollback operation is initiated directly. In this step, for a transaction that has not been successfully committed, the rollback of the uncommitted transaction is guaranteed by the database, which ensures data consistency.
In step 404, the database log of the database node is examined. When each database node performs a transaction operation, it records a database log containing the before and after values of the modified operation data, for purposes such as primary/standby replication of the database nodes. The database monitoring module 208 uses the GTID and the time point to locate the transaction block in the database log, and from it obtains the SQL statements corresponding to the transaction operation and the before and after values of the operation data.
In step 405, after the database monitoring module 208 has found the SQL statements and the before and after values of the operation data according to the GTID and the time point, it constructs the reverse SQL statements. Once a reverse SQL statement is constructed, the database monitoring module 208 issues it directly for execution.
For example, for an insert statement, the GTID is assigned to the implicit column of the table at insert time, and the delete reverse SQL statement is constructed directly from that GTID.
For an update statement, a re-assignment of the GTID is added to the update statement, and the update reverse SQL statement is constructed from the GTID and the before and after values of the updated operation data.
For a delete statement, because the delete statement cannot carry GTID information, a transaction that contains only delete statements cannot be rolled back directly. In this case, an auxiliary SQL update operation is added, the SQL statement block is found through the GTID of the auxiliary update statement, and the insert reverse SQL statement is constructed. For a transaction in which the delete statement is accompanied by other operations, the statement block can be found through the GTID and the insert reverse SQL statement is constructed directly.
For a select statement, no data is modified, so no reverse SQL statement needs to be constructed.
In step 406, the execution module 205 determines whether the rollback operation of every database node is completed. If not, the flow returns to step 401 and the rollback operation continues recursively. If so, the operation ends and the execution module 205 returns the operation result to the client module 201.
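Steps 404 and 405 can be sketched as follows. The log record layout (`op`, `table`, `gtid`, `ts`, `before`, `after`), the implicit column name `_gtid`, and the `?` placeholder style are assumptions for illustration; a real database log has its own format, and the auxiliary update used for delete-only transactions is not modeled here.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical log record: op, table, gtid, ts, before (old values), after (new values).
LogEntry = Dict[str, Any]


def find_transaction_block(log: List[LogEntry], gtid: int, since: float) -> List[LogEntry]:
    # Step 404: select the entries of one transaction by GTID and time point.
    return [e for e in log if e["gtid"] == gtid and e["ts"] >= since]


def build_reverse_sql(entry: LogEntry, gtid_column: str = "_gtid") -> Optional[Tuple[str, list]]:
    # Step 405: build a parameterized reverse statement for one log entry.
    op, table = entry["op"].lower(), entry["table"]
    if op == "select":
        return None  # nothing was modified
    if op == "insert":
        # Inserted rows carry the GTID, so they can be deleted by GTID alone.
        return f"DELETE FROM {table} WHERE {gtid_column} = ?", [entry["gtid"]]
    before = entry["before"]
    cols = sorted(before)
    values = [before[c] for c in cols]
    if op == "update":
        assignments = ", ".join(f"{c} = ?" for c in cols)
        sql = f"UPDATE {table} SET {assignments} WHERE {gtid_column} = ?"
        return sql, values + [entry["gtid"]]
    if op == "delete":
        col_list = ", ".join(cols)
        marks = ", ".join("?" for _ in cols)
        return f"INSERT INTO {table} ({col_list}) VALUES ({marks})", values
    raise ValueError(f"unsupported operation: {op}")


def build_rollback(log: List[LogEntry], gtid: int, since: float) -> List[Tuple[str, list]]:
    # Reverse statements are applied in the opposite order of the original operations.
    block = find_transaction_block(log, gtid, since)
    reversed_sql = [build_reverse_sql(e) for e in reversed(block)]
    return [s for s in reversed_sql if s is not None]
```

Returning the statement text together with its parameter list, rather than a formatted string, is a design choice of this sketch so that the before values are never spliced into the SQL text.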
As can be seen from the above processing, in the embodiment of the present invention, the client module 201 of the global transaction processing center server receives the transaction SQL statement submitted by the user; a syntax parsing module 203 of the global transaction processing center server generates an SQL syntax tree according to SQL statements submitted by a user; the SQL processing module 204 of the global transaction center server traverses the SQL syntax tree and generates an SQL execution tree which can be executed at each distributed database node by combining the distributed data distribution condition; the execution module 205 of the global transaction center server traverses the SQL execution tree, applies for GTID from the GTID module of the global transaction center server as needed, and issues the SQL statement on the SQL execution tree to the routing module 206; the routing module 206 issues the SQL statements to the database for execution, and returns the execution results to the execution module 205 of the global transaction center server; the execution module 205 traverses the execution tree according to the returned result, and after the traversal is completed, returns the result to the client module 201 of the global transaction center server, and notifies the GTID module to release the GTID; the client module 201 returns the operation result to the user, so that the distributed transaction operation is completed.
Further, the specific steps of the global transaction processing center server receiving the transaction SQL statement submitted by the client module 201 are as follows:
The global transaction center server receives the SQL statements submitted by the client module 201, and its syntax parsing module 203 parses them into a syntax tree. The task of syntax parsing is to formulate an SQL syntax analyzer according to SQL syntax rules and distribution-oriented application rules; the analyzer can analyze the input SQL statements to generate a syntax semantic tree and can reconstruct the required SQL statements from that tree.
The syntax tree generated by the syntax parsing module 203 can only be executed on a single-node database and cannot be executed on a distributed database. The SQL processing module 204 of the global transaction center server therefore traverses the SQL syntax tree and generates an SQL execution tree suitable for execution on the distributed database. While traversing the SQL syntax tree to generate the SQL execution tree, the SQL processing module 204 handles two cases. When all the operation data involved in an SQL statement is distributed on the same database node, the statement can be issued directly to that database node for execution, and it is marked as directly issuable to the corresponding database node. When the data involved in an SQL statement is distributed on different database nodes, the statement is decomposed into sub-SQL statements that are issued to the respective database nodes for execution; the result data is then passed to the execution module 205, which summarizes it. In this case the SQL statement is marked as requiring a summary operation (for example, a Union operation or a Join operation), and the sub-SQL statements are marked as directly issuable to the database for execution. Each execution node on the SQL execution tree generated from the SQL syntax tree is therefore either an SQL statement that can be issued directly to the database for execution or an execution node that requires a summary operation.
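The two marking cases can be illustrated with a small sketch. The `placement` catalog (table name to the set of database nodes holding its data), the mark names, and the decomposition rule (the same statement text is sent to every node that holds relevant data) are simplifying assumptions; the actual decomposition into sub-SQL statements depends on the statement and on the data distribution rules.

```python
from typing import Dict, Iterable, Set


def plan_statement(sql: str, tables: Iterable[str], placement: Dict[str, Set[str]]) -> dict:
    # Collect every database node that holds data touched by the statement.
    nodes: Set[str] = set()
    for table in tables:
        nodes |= placement[table]
    if len(nodes) == 1:
        # All data on one node: issue the statement directly.
        return {"mark": "direct", "sub_statements": {next(iter(nodes)): sql}}
    # Data on several nodes: issue sub-statements and summarize the results.
    return {"mark": "summarize", "sub_statements": {n: sql for n in sorted(nodes)}}


placement = {"users": {"db1"}, "orders": {"db1", "db2"}}
print(plan_statement("SELECT * FROM users", ["users"], placement))
print(plan_statement("SELECT * FROM orders", ["orders"], placement))
```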
After the SQL processing module 204 generates the SQL execution tree suitable for the distributed database, the execution module 205 applies to the GTID module 207 for a GTID as required by the transaction SQL. The GTID is used for the SQL statements of a transaction that must be executed on different database nodes, to ensure that the operations either all succeed or all fail. The execution module 205 traverses the execution plan tree depth-first and recursively from the root node, and issues the SQL statements of the execution nodes to the routing module 206.
When the routing module 206 receives an SQL statement submitted by the execution module 205, it selects an appropriate database connection according to the distribution of the data involved in the SQL statement, issues the statement, and waits for the database to return an execution result.
After receiving the operation result or data returned by the routing module 206, the execution module 205 performs further operations such as sorting, merging, and deduplication. If the SQL execution tree has not been fully executed, the execution module 205 continues to execute the remaining execution nodes recursively. After the recursive execution of the SQL execution tree is complete, the execution module 205 sends a GTID release request to the GTID module 207 to release the GTID, and returns the transaction operation result to the client.
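The summary step performed by the execution module 205 on the partial results can be sketched as below. The function name `summarize` and the representation of rows as tuples are assumptions; the sketch covers only merging, deduplication, and sorting, not grouping or join.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Sequence


def summarize(
    partials: Iterable[Sequence[Hashable]],
    key: Optional[Callable] = None,
    distinct: bool = True,
) -> List[Hashable]:
    # Merge the rows returned by each database node into one list.
    rows = [row for part in partials for row in part]
    if distinct:
        # Deduplicate while keeping the first occurrence of each row.
        rows = list(dict.fromkeys(rows))
    # Sort the merged rows, optionally by a caller-supplied key.
    return sorted(rows, key=key)


from_db1 = [(2, "b"), (1, "a")]
from_db2 = [(1, "a"), (3, "c")]
print(summarize([from_db1, from_db2]))
```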
Fig. 5 is a flowchart of detailed processing of a transaction processing method of a distributed database system according to an embodiment of the present invention, and as shown in fig. 5, the detailed processing specifically includes the following processing:
In step 501, the client module 201 issues a transaction SQL statement.
In step 502, the syntax parsing module 203 parses the SQL statement issued by the client module 201 into an SQL syntax tree. The syntax parsing module 203 can analyze the input SQL statements, generate syntax semantic trees, and reconstruct the required SQL statements from the syntax semantic trees. The generation process of the SQL syntax tree is described in fig. 3.
In step 503, because the SQL syntax tree generated by the syntax parsing module 203 can only be executed on a single-node database and not on a distributed database, the SQL processing module 204 of the global transaction center server traverses the SQL syntax tree and generates an SQL execution tree suitable for execution on the distributed database. During this traversal, when all the data involved in an SQL statement is distributed on the same database node, the statement can be issued directly to that database node for execution and is marked as directly issuable to the corresponding database node. When the data involved in an SQL statement is distributed on different database nodes, the statement is decomposed into sub-SQL statements that are issued to the respective database nodes for execution, and the resulting data is passed to the execution module 205, which summarizes it; the SQL statement is marked as requiring a summary operation (such as a Union operation or a Join operation), and the sub-SQL statements are marked as directly issuable to the database nodes for execution. Each execution node on the SQL execution tree generated from the SQL syntax tree is either an SQL statement that can be issued directly to the database for execution or an execution node that requires a summary operation. The process of generating the SQL execution tree and performing traversal execution is shown in fig. 3.
In step 504, the execution module 205 queries the GTID of the operation data on each node.
In step 505, because the operation data that was read may be involved in another distributed transaction operation, the execution module 205 compares the GTID that was read with the list of currently active GTIDs stored in the GTID module 207 and determines whether the GTID of the operation data of the transaction is active. If the operation data is active, step 506 is entered; otherwise step 507 is entered.
In step 506, the execution module 205 determines whether the transaction operation has timed out or whether the number of retries exceeds the permitted number of retries. If so, the flow proceeds to step 511 to execute the distributed transaction rollback operation; otherwise the flow returns to step 504 to query the data and lock it again.
In step 507, the execution module 205 has determined that the operation data is inactive and that the transaction operation may proceed, so it applies for a new GTID from the GTID module 207.
In step 508, the execution module 205 performs the operations of the nodes of the plan tree recursively, starting from the root node. Each execution node on the SQL execution tree may be a directly issued SQL statement, a Union operation, a merge operation, or a Join operation, or may be an operation node that needs to be further decomposed into a parent-child relationship, in which the execution of the parent node depends on the execution of the child nodes.
In step 509, the execution module 205 issues the SQL statement of the execution node on the SQL execution tree to the routing module 206, and the routing module 206 selects a suitable database to execute it. Each SQL statement to be issued indicates the database node to which it is to be issued, and the routing module 206 issues the SQL statement according to the selection rule.
In step 510, the routing module 206 determines whether the SQL statements sent to the nodes were executed successfully. If execution failed, step 511 is entered to execute the distributed rollback operation; otherwise step 512 is entered.
In step 511, the execution module 205 performs a distributed rollback operation; the detailed processing of the distributed rollback operation is shown in fig. 4.
In step 512, the routing module 206 returns the result to the corresponding operation node on the execution plan tree for further operation by the execution module 205.
In step 513, the execution module 205 determines whether all operation nodes on the SQL execution tree have been executed. If so, step 514 is entered; otherwise the flow returns to step 508 and the operation nodes of the plan tree continue to be executed recursively.
In step 514, the execution module 205 sends a release GTID request to the GTID module 207.
In step 515, the execution module 205 returns the operation result to the client module 201.
In the above processing, before any other transaction operates on the data, it is necessary to determine whether the GTID of the data is active. An active GTID indicates that a transaction is already operating on the data, so other transactions are automatically prohibited from operating on it.
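The control flow of steps 504 to 515 can be summarized in the following sketch. All names are hypothetical: `read_data_gtids` stands in for the query of the implicit columns, `execute` for the routing module and database nodes, and `rollback` for the distributed rollback operation. A retry count is used in place of a timeout, and the module-level counter stands in for the GTID module as the only source of GTIDs.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, List, Set

_gtid_counter = itertools.count(1)  # stand-in for the GTID module


def run_transaction(
    statements: Dict[str, str],                     # database node -> SQL statement
    read_data_gtids: Callable[[], Iterable[int]],   # step 504
    active: Set[int],                               # list of currently active GTIDs
    execute: Callable[[str, str, int], bool],       # returns True on success
    rollback: Callable[[List[str], int], None],
    max_retries: int = 3,
    retry_delay: float = 0.0,
) -> bool:
    # Steps 504 to 506: wait until no involved data is held by an active GTID.
    for _ in range(max_retries + 1):
        if not any(g in active for g in read_data_gtids()):
            break
        time.sleep(retry_delay)
    else:
        # Retries exhausted. Nothing has been issued yet in this sketch,
        # so there is nothing to undo.
        return False

    gtid = next(_gtid_counter)                      # step 507
    active.add(gtid)
    succeeded: List[str] = []
    try:
        for node, sql in statements.items():        # steps 508 to 510
            if execute(node, sql, gtid):
                succeeded.append(node)
            else:
                rollback(succeeded, gtid)           # step 511
                return False
        return True                                 # steps 512, 513 and 515
    finally:
        # Step 514; this sketch also releases the GTID after a rollback.
        active.discard(gtid)
```

A caller would pass, for example, `lambda node, sql, gtid: True` as `execute` to simulate the all-success path, or a function that fails for one node to exercise the rollback path.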
In summary, the technical solution of the embodiment of the present invention differs from distributed transaction methods such as two-phase commit. When a transaction operation on a distributed multi-node database is involved, a GTID is applied for, and even if the transaction is distributed across different nodes, the GTID ensures that the transaction operation either succeeds completely or fails completely. On the one hand, when a distributed transaction is involved, all SQL statement operations are written to the database log, so that a rollback operation can be performed from the database log even when software or hardware faults occur, which ensures strong data consistency. On the other hand, because the GTID-based distributed transaction method adds a write lock to the relevant data only during data write operations, the lock granularity is small and the concurrency of the system is preserved. The technical solution of the embodiment of the present invention is particularly suitable for scenarios that require high concurrency and strong data consistency, such as finance, e-commerce, and securities.
System embodiment
According to an embodiment of the present invention, there is provided a transaction processing apparatus of a distributed database system. Fig. 6 is a schematic structural diagram of the transaction processing apparatus of the distributed database system according to the embodiment of the present invention. As shown in fig. 6, the apparatus includes: a client module 601 (corresponding to the client module 201 shown in fig. 2), a syntax parsing module 602 (corresponding to the syntax parsing module 203 shown in fig. 2), an SQL processing module 603 (corresponding to the SQL processing module 204 shown in fig. 2), a global transaction identification module 604 (corresponding to the GTID module 207 shown in fig. 2), an execution module 605 (corresponding to the execution module 205 shown in fig. 2), and a routing module 606 (corresponding to the routing module 206 shown in fig. 2). Each module of the embodiment of the present invention is described in detail below.
A client module 601, configured to receive a transaction Structured Query Language (SQL) statement submitted by a user;
A syntax parsing module 602, configured to generate an SQL syntax tree according to the SQL statement; the syntax parsing module 602 is specifically configured to:
formulate an SQL syntax analyzer according to the SQL syntax rules and the distributed application rules, analyze the transaction SQL statement with the SQL syntax analyzer to generate an SQL syntax tree, and reconstruct the required SQL statement from the SQL syntax tree, wherein the SQL syntax tree comprises SQL statement nodes and SQL statement operation nodes, and two SQL statement nodes that are in a master-slave relationship are connected by an SQL statement operation node.
The SQL processing module 603 is configured to traverse the SQL syntax tree and generate an SQL execution tree that can be executed at each database node of the distributed database system according to the distribution of the distributed database system; the SQL processing module 603 is specifically configured to:
traverse the SQL syntax tree depth-first from the root node; after reaching the leaf nodes, start the merging operation, in which nodes in parent-child and sibling relationships that can be issued to a database node as a single SQL statement are merged into one execution node; when the current nodes cannot be merged, assign them to execution nodes of the SQL execution tree; and, once the merging operation is complete, generate the SQL execution tree that can be executed at each database node of the distributed database system.
Preferably, the SQL processing module 603 is further configured to: when generating the SQL execution tree that can be executed at each database node of the distributed database system, mark an SQL statement on the SQL execution tree as directly issuable to the corresponding database node for execution when the data involved in it is distributed on the same database node; and, when the data involved in an SQL statement on the SQL execution tree is distributed on different database nodes, mark the SQL statement as requiring summarization and mark the sub-SQL statements decomposed from it as directly issuable to the corresponding database nodes for execution;
the global transaction identification module 604 is configured to apply for a global transaction identification GTID corresponding to the SQL execution tree, where the GTID is carried in an SQL statement on an execution node of the SQL execution tree;
the execution module 605 is configured to traverse the SQL execution tree and issue the SQL statements on the execution nodes of the SQL execution tree to the routing module 606;
the execution module 605 is further configured to: and receiving an execution result, traversing the SQL execution tree according to the execution result, returning the execution result to the user after the SQL statements on the SQL execution tree are successfully and completely executed, and controlling the global transaction identification module 604 to release the GTID.
The execution module 605 is further configured to: when the execution results submitted by some database nodes indicate that the SQL statement was executed successfully and the execution results submitted by other database nodes indicate that it failed, acquire the database log of each database node that successfully executed the SQL statement, obtain from the database log, through the GTID, the successfully executed SQL statement and the before and after values of the operation data, construct a reverse SQL statement from the successfully executed SQL statement and the before and after values of the operation data, and perform a rollback operation on the successfully executed SQL statement so that it is rolled back to a state consistent with the unsuccessfully executed SQL statement.
The routing module 606 is configured to issue the SQL statements issued by the execution module 605 to corresponding database nodes for execution, and feed back execution results to the execution module 605;
the routing module 606 is specifically configured to: when an SQL statement is marked as directly issuable to the corresponding database node, issue the SQL statement directly to that database node for execution; and, when an SQL statement is marked as requiring summarization and the sub-SQL statements decomposed from it are marked as directly issuable to the corresponding database nodes, issue the sub-SQL statements to the respective database nodes for execution and send the execution results fed back by the database nodes to the execution module 605;
the execution module 605 is further configured to: and carrying out data summarization operation on the execution result.
The execution module 605 specifically includes:
the locking sub-module is configured to query the GTID in the operation data of the SQL statement on each execution node of the SQL execution tree, and, when another transaction applies to operate on data whose GTID is in the list of currently active GTIDs of the system, to prohibit that transaction from operating;
the judging sub-module is configured to read the queried GTID, compare it with the list of currently active GTIDs of the system stored in the global transaction processing center server, and judge whether the operation data corresponding to the read GTID is active;
the processing sub-module is configured to: when the operation data is determined to be active, judge whether the transaction operation has timed out or whether the number of retries of the transaction operation exceeds a preset retry threshold, execute the distributed rollback operation if the transaction operation has timed out or the retry threshold is exceeded, and otherwise call the locking sub-module; and, when the operation data is determined to be inactive, control the global transaction identification module 604 to apply for a new GTID and store the GTID corresponding to the currently active operation data in the list of currently active GTIDs.
The processing of each module in the embodiment of the present invention may be understood by referring to the corresponding description in the method embodiment, and is not described herein again.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the client in an embodiment may be adaptively changed and provided in one or more clients different from the embodiment. The modules of the embodiments may be combined into one module and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-assemblies. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or client so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a client loaded with a ranking website according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.