CN113836238B - Batch processing method and device for data commands - Google Patents

Batch processing method and device for data commands

Info

Publication number
CN113836238B
Authority
CN
China
Prior art keywords
data
command
commands
cluster
batch processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111164186.0A
Other languages
Chinese (zh)
Other versions
CN113836238A (en)
Inventor
何华峰
刘宇霆
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a batch processing method and device for data commands, an electronic device, and a storage medium. The method may include: reading a data command set for a target database cluster from a command queue; determining the data command groups in the data command set that respectively correspond to the cluster nodes in the database cluster; when any data command group contains a plurality of data commands, merging all data commands in that group into a corresponding batch processing request based on pipeline technology; and submitting the batch processing request to the target cluster node corresponding to that group, so that the target cluster node processes the data commands contained in the batch processing request in batch.

Description

Batch processing method and device for data commands
Technical Field
The application relates to the field of databases, in particular to a batch processing method and device for data commands.
Background
A database is a core component of computer software, and a database cluster is a form of database organization that contains a plurality of cluster nodes. In the related art, when a management tool submits data commands to a database cluster for processing, the database cluster often supports only submitting and processing one data command at a time, which makes it difficult to meet efficiency requirements.
Disclosure of Invention
In view of the above, the present application provides a method and apparatus for batch processing of data commands.
Specifically, the application is realized by the following technical scheme:
According to a first aspect of the present application, there is provided a batch processing method for data commands, the method comprising:
reading a data command set for a target database cluster from a command queue;
determining data command groups in the data command set, wherein the data command groups respectively correspond to the cluster nodes in the database cluster;
when any data command group contains a plurality of data commands, merging all the data commands in that data command group into a corresponding batch processing request based on pipeline technology; and
submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
According to a second aspect of the present application, there is provided a batch processing apparatus for data commands, the apparatus comprising:
a reading unit, configured to read a data command set for a target database cluster from a command queue;
a determining unit, configured to determine the data command groups in the data command set that respectively correspond to the cluster nodes in the database cluster;
a merging unit, configured to merge, when any data command group contains a plurality of data commands, all the data commands in that data command group into a corresponding batch processing request based on pipeline technology; and
a submitting unit, configured to submit the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
According to a third aspect of the present application, there is provided an electronic device comprising:
A processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method as described in the embodiments of the first aspect described above by executing the executable instructions.
According to a fourth aspect of embodiments of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method as described in the embodiments of the first aspect above.
As can be seen from the above technical scheme, data commands are stored in a command queue, the data command group corresponding to each node in the database cluster is obtained from the command queue, and the data commands in the same data command group are then merged into a batch processing request using pipeline technology and submitted to the corresponding cluster node in one batch. In other words, the scheme distributes the data commands among the cluster nodes, merges the commands distributed to each node into a batch processing request, and submits each request to its cluster node in a single interaction. Compared with submitting one data command at a time, this reduces the number of data interactions, reduces the consumption of transmission resources, and improves the efficiency of processing data commands.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of batch processing of data commands according to an exemplary embodiment of the application;
FIG. 2 is a schematic diagram of a network architecture to which a data command batch processing method according to an embodiment of the present application is applied;
FIG. 3 is a specific flow diagram illustrating a method of data command batch processing according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device, according to an exemplary embodiment of the application;
Fig. 5 is a block diagram illustrating a data command batch processing apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. Depending on the context, the term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
The embodiments of the present application will be described in detail.
FIG. 1 is a flow chart illustrating a method of batch processing of data commands according to an exemplary embodiment of the application. As shown in fig. 1, the above method may include the steps of:
Step 102, reading a data command set for a target database cluster from a command queue.
In one embodiment, the method is applied to a tool that can operate on a database, such as an ETL (Extract-Transform-Load) tool; the application does not limit the kind of tool.
In an embodiment, the target database cluster includes a plurality of database nodes, which cooperate with one another to realize the basic functions of the database cluster. Externally, the database nodes may take the form of different physical servers or of different logical units divided within the same server; the application does not limit the form of the database cluster. The data commands may be stored in a command queue and are used to indicate specific operations to be performed on the target database cluster. For example, the data commands may include a data write command for writing data to the target database cluster, in which case the data to be written may be carried in the data write command; the data commands may also include a read command for reading specific data from the target database cluster. The application does not limit the types of data commands, and the data command set may include only one type of data command or multiple types.
In an embodiment, the data commands in the command queue are arranged in time order, and the relative order of the data commands contained in a batch processing request is the same as their relative order in the command queue. The command queue maintains a first-in first-out data structure: a data command stored in the queue earlier is arranged before a data command stored later, so that when data commands are read from the queue, the command stored first is read first and the remaining commands are then read in order. The time order may be the order in which the queue receives the data commands, or the order of the generation time or sending time of the data commands. In the latter case, the generation time or sending time may be recorded in each data command; when storing data commands in the command queue, the ETL tool parses each data command to obtain its generation time or sending time and stores the commands in the queue in that order.
In an embodiment, when the number of data commands in the command queue reaches a preset threshold, a set of data commands for the target database cluster is read from the command queue. In order to prevent backlog of data due to an excessive number of data commands in the command queue, a preset threshold may be set according to the actual processing capacity.
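The queue behaviour described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `QueuedCommand`, `CommandQueue`, and `read_command_set` are hypothetical, the commands are ordered by a recorded generation time, and draining the whole queue once the threshold is reached is an assumption, since the text does not state how many commands one read returns.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedCommand:
    created_at: float                      # generation or sending time recorded in the command
    seq: int                               # tie-breaker that keeps arrival order for equal times
    payload: tuple = field(compare=False)  # e.g. ("SET", "k1", "v1")


class CommandQueue:
    """Time-ordered queue that releases a command set once a threshold is reached."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._heap = []
        self._counter = itertools.count()

    def put(self, created_at: float, payload: tuple) -> None:
        heapq.heappush(self._heap, QueuedCommand(created_at, next(self._counter), payload))

    def read_command_set(self) -> list:
        """Return all queued payloads, oldest first, or [] while below the threshold."""
        if len(self._heap) < self.threshold:
            return []
        command_set = []
        while self._heap:
            command_set.append(heapq.heappop(self._heap).payload)
        return command_set


queue = CommandQueue(threshold=3)
queue.put(2.0, ("GET", "k1"))
queue.put(1.0, ("SET", "k1", "v1"))
first_read = queue.read_command_set()   # below the threshold, nothing is read
queue.put(3.0, ("SET", "k2", "v2"))
second_read = queue.read_command_set()  # threshold reached, commands come out oldest first
```

In this sketch the threshold bounds how many commands can accumulate before a read is triggered; a real tool would pick it according to the actual processing capacity, as the text notes.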
Step 104, determining data command groups in the data command set, wherein the data command groups respectively correspond to all cluster nodes in the database cluster.
In one embodiment, since the data commands are processed by the database cluster, the processing load may be spread across the nodes in the cluster: the data commands are divided into a plurality of data command groups, and each data command group has a corresponding cluster node that processes the data commands in that group.
Specifically, the data commands may be allocated by matching hash slots. Each cluster node in the database cluster manages its own hash slot range. The ETL tool may calculate the hash slot corresponding to any data command in the data command set from the key of that data command, and allocate the data command to the cluster node whose hash slot range contains that slot. For example, assume that the entire database cluster defines 30000 hash slots and includes three cluster nodes, where node A manages hash slots 0-10000, node B manages hash slots 10001-20000, and node C manages hash slots 20001-30000. The ETL tool may apply the CRC16 algorithm to the key contained in any data command; if the result is 3000, which falls within the hash slot range managed by node A, the data command is allocated to the data command group corresponding to node A and is processed by node A.
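The slot lookup in this example can be sketched as below. The sketch rests on assumptions that the text does not state: the CRC16 result is reduced modulo the total slot count (the convention used by Redis Cluster), the XMODEM variant of CRC16 is used via Python's `binascii.crc_hqx`, and the names `hash_slot`, `node_for_slot`, and `node_for_key` are hypothetical.

```python
import binascii

TOTAL_SLOTS = 30000  # slot count from the example above

# Inclusive hash slot ranges managed by each node, as given in the example.
NODE_SLOT_RANGES = {
    "A": (0, 10000),
    "B": (10001, 20000),
    "C": (20001, 30000),
}


def hash_slot(key: str, total_slots: int = TOTAL_SLOTS) -> int:
    """CRC16 (XMODEM variant) of the key, reduced modulo the slot count (assumption)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % total_slots


def node_for_slot(slot: int) -> str:
    """Return the node whose inclusive range contains the slot."""
    for node, (low, high) in NODE_SLOT_RANGES.items():
        if low <= slot <= high:
            return node
    raise LookupError(f"no node manages slot {slot}")


def node_for_key(key: str) -> str:
    """Map a command key to the node that should process the command."""
    return node_for_slot(hash_slot(key))


example_node = node_for_slot(3000)  # the slot value used in the text's example
```

Because the slot is taken modulo 30000 in this sketch, computed slots lie in 0-29999, so the upper bound of node C's range is never produced.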
In one case, the data commands in the command queue are arranged in time order, with earlier commands placed before later ones. When dependencies exist between data commands, disturbing their order could leave the database cluster unable to process them, so the original order of the data commands is preserved when they are allocated to the data command groups. For example, assume that a data write command is generated first and a data read command is generated later, so that the write command precedes the read command in the command queue. If both commands are allocated to the same data command group, the write command still precedes the read command within that group. The database cluster therefore processes the write command before the read command, and the read cannot fail because the data has not yet been written.
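A minimal sketch of order-preserving grouping follows. The function `group_by_node` and the fixed routing table `ROUTES` are hypothetical; a real tool would route by hash slot rather than by a lookup table. The point illustrated is only that appending commands in queue order keeps their relative order inside each group.

```python
from collections import defaultdict

# Hypothetical routing table standing in for the hash slot calculation.
ROUTES = {"k1": "A", "k2": "B"}


def group_by_node(commands, node_of):
    """Split commands into per-node groups without disturbing their relative order."""
    groups = defaultdict(list)
    for command in commands:                        # iterate in queue order
        groups[node_of(command)].append(command)    # appending keeps that order per group
    return dict(groups)


command_set = [
    ("SET", "k1", "v1"),  # write generated first
    ("SET", "k2", "v2"),
    ("GET", "k1"),        # read generated later, depends on the first write
    ("GET", "k2"),
]
groups = group_by_node(command_set, node_of=lambda command: ROUTES[command[1]])
```

In each resulting group the write still precedes the read that depends on it, which is the property the paragraph above requires.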
Step 106, when any data command group contains a plurality of data commands, merging all the data commands in the data command group into corresponding batch processing requests based on pipeline technology.
Step 108, submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
In one embodiment, when any data command group contains a plurality of data commands, submitting each data command to the corresponding cluster node separately would increase the number of interactions. All data commands in the data command group may therefore be merged, based on pipeline technology, into one batch processing request that contains every data command in the group. With pipelining, the corresponding cluster node obtains all data commands in the group from a single submission of the batch processing request, and after processing them it may combine the processing results into a single returned result instead of returning results repeatedly.
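The saving in interactions can be illustrated with a small simulation. `FakeNode` is a hypothetical in-memory stand-in for a cluster node that counts round trips; it is not a real database client, and the two submit helpers are illustrative names.

```python
class FakeNode:
    """In-memory stand-in for a cluster node that counts network round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def _execute(self, command):
        op, key, *rest = command
        if op == "SET":
            self.store[key] = rest[0]
            return "OK"
        if op == "GET":
            return self.store.get(key)
        raise ValueError(f"unsupported command: {op}")

    def submit(self, batch):
        """Process every command of one request in order and return one combined result."""
        self.round_trips += 1
        return [self._execute(command) for command in batch]


def submit_one_by_one(node, commands):
    """Baseline: one request, and therefore one round trip, per command."""
    return [node.submit([command])[0] for command in commands]


def submit_pipelined(node, commands):
    """Pipelined: all commands of the group travel in a single batch request."""
    return node.submit(list(commands))


commands = [("SET", "k", "v"), ("GET", "k"), ("SET", "k", "w"), ("GET", "k")]

naive_node = FakeNode()
naive_results = submit_one_by_one(naive_node, commands)

pipelined_node = FakeNode()
pipelined_results = submit_pipelined(pipelined_node, commands)
```

Both paths produce the same results in the same order; only the number of round trips differs (four against one for this group).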
In an embodiment, when the batch processing request is submitted to the target cluster node corresponding to the data command group, a target connection corresponding to the target cluster node may be obtained from the connection pool of the database cluster, and the batch processing request is submitted to the target cluster node over that connection. The connections in the connection pool are established when the database cluster is initialized, and every cluster node has a corresponding connection. To improve submission efficiency, an already initialized connection may be borrowed directly from the connection pool when a batch processing request is submitted.
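Borrowing and returning a pooled connection might look like the sketch below. `Connection` and `ConnectionPool` are hypothetical classes written for illustration, with one pre-initialized connection per node; they do not reproduce the API of any real client library.

```python
from contextlib import contextmanager


class Connection:
    """Hypothetical connection to one cluster node; records what was sent."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.sent = []

    def send(self, batch_request):
        self.sent.append(batch_request)
        return len(batch_request)  # number of commands handed to the node


class ConnectionPool:
    """Pool whose connections are created once, when the pool is initialized."""

    def __init__(self, node_names):
        self._idle = {name: [Connection(name)] for name in node_names}

    @contextmanager
    def borrow(self, node_name):
        connection = self._idle[node_name].pop()  # raises IndexError if none is idle
        try:
            yield connection
        finally:
            self._idle[node_name].append(connection)  # always return it to the pool

    def idle_count(self, node_name):
        return len(self._idle[node_name])


pool = ConnectionPool(["A", "B"])
with pool.borrow("A") as connection:
    idle_while_borrowed = pool.idle_count("A")
    submitted = connection.send([("SET", "k1", "v1"), ("SET", "k2", "v2")])
idle_after_return = pool.idle_count("A")
```

The context manager returns the connection even if the submission raises, so a failed batch does not leak the pooled connection.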
In one embodiment, when execution of any data command fails, execution of the data commands that have not yet been executed is stopped; the cluster nodes contained in the database cluster and the hash slot range managed by each cluster node are determined again; the unexecuted data commands are formed into new data command groups corresponding to the newly determined cluster nodes according to the newly determined hash slot ranges; and all data commands in each new data command group are merged into a corresponding batch processing request based on pipeline technology. In this embodiment, execution of a data command may fail if, for example, the storage space of a cluster node cannot hold the data to be written, or a cluster node is disconnected for network reasons. In that case, to keep the database cluster running normally, the number of available cluster nodes may be reduced or increased, and because the total number of hash slots is fixed, the distribution of hash slots changes with the number of nodes. To adapt to the change in cluster nodes without interrupting the data processing flow, when any data command fails, the cluster node may be stopped from executing further data commands. After the database cluster has redistributed the hash slots among the nodes, the cluster nodes contained in the database cluster and the hash slot range managed by each node are determined again. According to the redistribution result, the unexecuted commands are divided again into data command groups corresponding to the current nodes, and newly generated batch processing requests are submitted using pipeline technology.
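The regrouping step can be sketched as follows. Several details are assumptions made for illustration: each command carries a precomputed slot, the failed command itself is treated as unexecuted and is therefore regrouped, and the slot ranges shown for the remaining nodes are invented. The function names are hypothetical.

```python
def node_for_slot(slot, ranges):
    """Return the node whose inclusive slot range contains the slot."""
    for node, (low, high) in ranges.items():
        if low <= slot <= high:
            return node
    raise LookupError(f"no node manages slot {slot}")


def regroup_unexecuted(commands, first_unexecuted, slot_of, new_ranges):
    """Form new per-node groups from the commands that were not executed.

    commands         -- the original, ordered group that was being processed
    first_unexecuted -- index of the first command that did not execute
                        (assumption: this is the command that failed)
    slot_of          -- function mapping a command to its hash slot
    new_ranges       -- slot ranges re-determined after the cluster changed
    """
    groups = {}
    for command in commands[first_unexecuted:]:  # keeps the original order
        node = node_for_slot(slot_of(command), new_ranges)
        groups.setdefault(node, []).append(command)
    return groups


# Illustrative scenario: all three commands were bound for a node that became
# unavailable, and the cluster redistributed its slots over nodes A and C.
new_ranges = {"A": (0, 8191), "C": (8192, 16383)}
pending = [(6000, "SET", "k1"), (7000, "SET", "k2"), (9000, "SET", "k3")]

regrouped = regroup_unexecuted(
    pending,
    first_unexecuted=1,               # the first command succeeded, the second failed
    slot_of=lambda command: command[0],
    new_ranges=new_ranges,
)
```

Each new group can then be merged into a batch processing request and submitted in the same way as the original groups.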
In an embodiment, the above method may also be applied to batch data synchronization between two databases. For example, when the data commands are data write commands, the ETL tool may extract data from the source database in batches and carry the data to be written in the data write commands, so that each data write command contains one complete piece of data to be written and the cluster node corresponding to the data write command can be calculated from the key of that data. The database cluster serves as the target database, and with this method the data in the source database can be written into the target database in batches, which improves the efficiency of data synchronization.
As can be seen from the above technical scheme, data commands are stored in a command queue, the data command group corresponding to each node in the database cluster is obtained from the command queue, and the data commands in the same data command group are then merged into a batch processing request using pipeline technology and submitted to the corresponding cluster node in one batch. In other words, the scheme distributes the data commands among the cluster nodes, merges the commands distributed to each node into a batch processing request, and submits each request to its cluster node in a single interaction. Compared with submitting one data command at a time, this reduces the number of data interactions, reduces the consumption of transmission resources, and improves the efficiency of processing data commands. In addition, the application provides a fault handling mechanism for failures that occur while data commands are being processed: unprocessed data commands are redistributed according to the updated set of nodes in the database cluster, so that command processing can continue without interruption.
Fig. 2 is a schematic diagram of a network architecture to which the data command batch processing method according to the embodiment of the present application is applied. As shown in fig. 2, ETL tool 22 may batch extract data from source database 21 and write to database cluster 23, where database cluster 23 contains a plurality of cluster nodes 1, 2.
FIG. 3 is a flowchart of a data command batch processing method according to an embodiment of the present application, and the steps of the method of FIG. 3 are described in detail below with reference to FIG. 2:
Step 302, extracting data to be written from a source database.
The ETL tool 22 may extract data to be written from the source database 21 in batches and write the data to the database cluster 23. The application does not limit the type of the source database; for example, the source database may be a distributed database such as a database cluster, or a relational or non-relational database. It should be noted that this embodiment takes a redis cluster as an example of the database cluster.
Step 304, a data write command is generated and stored in a command queue.
The ETL tool 22 may generate data write commands from the data to be written, where each data write command contains the original data that needs to be written, and the data to be written usually takes the form of a key-value pair (Key-Value). The generated data write commands may be stored in a command queue, in which they are arranged in time order. The command queue maintains a first-in first-out data structure: a data write command stored in the queue earlier is arranged before a data write command stored later, so that when data write commands are subsequently read from the queue, the command stored first is read first and the remaining commands are then read in order. The time order may be the order in which the queue receives the data write commands, or the order of their generation time or sending time. In the latter case, the generation time or sending time may be recorded in each data write command; when storing data write commands in the command queue, the ETL tool parses each command to obtain its generation time or sending time and stores the commands in the queue in that order.
Step 306, a set of data write commands is read.
Step 308, determining a data write command group corresponding to each cluster node.
In the above two steps, the ETL tool 22 may read the data write command set from the command queue and allocate the data write commands according to the hash slot range managed by each redis node in the redis cluster 23, so that each redis node corresponds to a group of data write commands that it needs to process. Specifically, the redis cluster 23 includes a plurality of redis nodes, and the entire redis cluster 23 defines 16384 hash slots (slots), of which each redis node manages a part. Assume that the redis cluster 23 includes 3 redis nodes, where node A manages slots 0-5500, node B manages slots 5501-11000, and node C manages slots 11001-16383. The ETL tool may apply the CRC16 algorithm to the key (Key) of the data to be written contained in any data write command and take the result modulo 16384, which is how a redis cluster maps keys to slots; if the result is 3000, which falls within the slot range managed by node A, the data write command is allocated to the data write command group corresponding to node A and is processed by node A.
Step 310, generating a batch processing request.
In this step, taking node A as an example, the ETL tool 22 may merge the data write commands allocated to node A to generate a batch processing request corresponding to node A. Specifically, the message of the batch processing request generated by the merge contains all data write commands that node A needs to process.
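One way to picture the merged message is as the concatenation of the individual commands in the redis wire format (RESP), in which each command is an array of bulk strings. The text does not specify the message layout, so the sketch below is an assumption; `encode_command` and `build_batch_request` are hypothetical helper names.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def build_batch_request(commands) -> bytes:
    """Concatenate the encoded commands, in order, into one request payload."""
    return b"".join(encode_command(*command) for command in commands)


write_commands_for_node_a = [("SET", "k1", "v1"), ("SET", "k2", "v2")]
batch_request = build_batch_request(write_commands_for_node_a)
single_command = encode_command("SET", "k", "v")
```

Sending `batch_request` over one connection delivers both write commands to the node in a single submission, and the node replies to them in the same order.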
Step 312, obtaining a connection from the connection pool.
In the redis cluster 23, interaction with a redis node requires establishing a connection between a jedis object and the redis node and exchanging data over that connection. Typically, each redis node corresponds to one jedis object. To avoid repeatedly establishing connections during interaction, the redis cluster 23 may provide a connection pool that stores the connections between jedis objects and redis nodes; the connections stored in the pool are initialized when the redis cluster 23 is started. In this step, the ETL tool 22 needs to submit the batch processing request to the corresponding redis node. It may determine the connection to borrow according to the node corresponding to the batch processing request and obtain that connection from the connection pool for use in the submission of step 314. It should be noted that the borrowed connection may be released and returned to the connection pool after the submission is completed.
Step 314, submitting the batch processing request to the database cluster.
In this step, the ETL tool 22 may use the borrowed connection to submit the batch processing request to the corresponding redis node in the redis cluster 23. The redis node responds to the data write commands contained in the batch processing request and writes the data to be written that they carry, which completes the batch writing of the data.
Corresponding to the above method embodiments, the present specification also provides an embodiment of an apparatus.
Fig. 4 is a schematic diagram of an electronic device according to an exemplary embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and may of course also include hardware required by other services. The processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and runs it, forming the batch processing apparatus for data commands at the logical level. Of course, in addition to a software implementation, the application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
Fig. 5 is a block diagram illustrating a data command batch processing apparatus according to an exemplary embodiment of the present application. Referring to fig. 5, the apparatus includes a reading unit 502, a determining unit 504, a merging unit 506, and a submitting unit 508, wherein:
a reading unit 502, configured to read a data command set for a target database cluster from a command queue;
a determining unit 504, configured to determine the data command groups in the data command set that respectively correspond to the cluster nodes in the database cluster;
a merging unit 506, configured to merge, when any data command group contains a plurality of data commands, all the data commands in that data command group into a corresponding batch processing request based on pipeline technology; and
a submitting unit 508, configured to submit the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
Optionally, reading the data command set for the target database cluster from the command queue includes:
when the number of data commands in the command queue reaches a preset threshold, reading the data command set for the target database cluster from the command queue.
Optionally, the data commands in the command queue are sequentially arranged according to a time sequence, and the relative sequence of the data commands contained in the batch processing request is the same as that of the data commands in the command queue.
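The threshold-triggered read can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Command` tuple shape and the function name `read_command_set` are hypothetical, and the choice to take exactly `threshold` commands from the head of the queue is an assumption, since the text does not say how many commands one read returns.

```python
from collections import deque
from typing import Deque, List, Tuple

Command = Tuple[str, str, str]  # (operation, key, value); hypothetical shape


def read_command_set(command_queue: Deque[Command], threshold: int) -> List[Command]:
    """Return `threshold` commands, oldest first, once enough are queued."""
    if len(command_queue) < threshold:
        return []  # not enough commands yet; leave the queue untouched
    # popleft() takes from the head, so chronological order is preserved.
    return [command_queue.popleft() for _ in range(threshold)]
```

Reading from the head of a FIFO structure is what lets the later batch requests keep the relative order the commands had in the queue.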
Optionally, determining the data command groups in the data command set that respectively correspond to each cluster node in the database cluster includes:
calculating the hash slot corresponding to each data command according to the key value of that data command in the data command set;
respectively determining the hash slot range managed by each cluster node contained in the database cluster;
determining, from the data command set, the data command group corresponding to each cluster node contained in the database cluster, wherein the hash slot corresponding to each data command in the data command group allocated to any cluster node falls within the hash slot range managed by that cluster node.
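The three steps above can be sketched as follows. The hash function is an assumption: this section does not specify one, so the sketch borrows the Redis Cluster convention of CRC16 (XMODEM variant) modulo 16384 and ignores hash tags. The names `key_slot`, `group_by_node`, and the one-contiguous-range-per-node layout of `SlotRanges` are hypothetical simplifications.

```python
from binascii import crc_hqx
from typing import Dict, List, Tuple

SLOT_COUNT = 16384  # assumption: slot count used by Redis Cluster

Command = Tuple[str, str, str]           # (operation, key, value); hypothetical
SlotRanges = Dict[str, Tuple[int, int]]  # node name -> inclusive (first, last) slot


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the key, modulo the slot count."""
    return crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def group_by_node(commands: List[Command], ranges: SlotRanges) -> Dict[str, List[Command]]:
    """Assign each command to the node whose slot range contains its slot."""
    groups: Dict[str, List[Command]] = {}
    for command in commands:  # iterating in queue order keeps relative order
        slot = key_slot(command[1])
        for node, (first, last) in ranges.items():
            if first <= slot <= last:
                groups.setdefault(node, []).append(command)
                break
        else:
            raise LookupError(f"no node manages slot {slot}")
    return groups
```

Each resulting group can then be merged into one batch processing request for its node; a group holding a single command needs no merging.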
Optionally, the apparatus further includes a fault handling unit 510, configured to:
stop executing the data commands that have not yet been executed when any data command fails to execute;
re-determine the cluster nodes contained in the database cluster and the hash slot range managed by each cluster node;
form, according to the newly determined hash slot ranges, the data commands that have not yet been executed into new data command groups respectively corresponding to the newly determined cluster nodes, and then merge all data commands in each new data command group into a corresponding batch processing request based on pipeline technology.
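The fault handling flow can be sketched as a retry loop. Several details here are assumptions rather than statements from the text: the failed command itself is treated as not yet executed, `send_batch` is a hypothetical callback that returns how many commands of a batch succeeded before the first failure, `fetch_ranges` is a hypothetical callback that returns the current node-to-slot-range mapping, and the `max_rounds` limit is added only so the sketch terminates.

```python
from typing import Callable, Dict, List, Tuple

SlotRanges = Dict[str, Tuple[int, int]]  # node name -> inclusive (first, last) slot


def submit_with_regroup(
    keys: List[str],
    slot_of: Callable[[str], int],
    fetch_ranges: Callable[[], SlotRanges],
    send_batch: Callable[[str, List[str]], int],
    max_rounds: int = 3,
) -> None:
    """Submit per-node batches; on a failure, regroup what is left and retry."""
    pending = list(enumerate(keys))  # keep queue positions to restore order later
    for _ in range(max_rounds):
        if not pending:
            return
        ranges = fetch_ranges()  # (re)determine nodes and their slot ranges
        groups: Dict[str, List[Tuple[int, str]]] = {}
        for item in pending:
            slot = slot_of(item[1])
            node = next(n for n, (lo, hi) in ranges.items() if lo <= slot <= hi)
            groups.setdefault(node, []).append(item)
        leftover: List[Tuple[int, str]] = []
        failed = False
        for node, batch in groups.items():
            # After the first failure, nothing further is submitted this round.
            done = 0 if failed else send_batch(node, [key for _, key in batch])
            failed = failed or done < len(batch)
            leftover.extend(batch[done:])
        pending = sorted(leftover)  # back to original queue order
    if pending:
        raise RuntimeError(f"{len(pending)} commands still unexecuted")
```

Sorting the leftover commands by their original queue position before regrouping keeps the relative order inside each new batch the same as in the command queue.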
Optionally, the apparatus further includes a connection obtaining unit 512, and submitting the batch processing request to the target cluster node corresponding to the data command group includes:
obtaining a target connection corresponding to the target cluster node from a connection pool of the database cluster;
submitting the batch processing request to the target cluster node by using the target connection.
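A minimal sketch of borrowing a per-node connection from a pool and returning it after submission is shown below. Everything in it is illustrative: `FakeConnection` stands in for a real network connection, and `ConnectionPool`, `idle_count`, and `submit_batch` are hypothetical names; the text does not describe how the pool is organized internally.

```python
import queue
from contextlib import contextmanager
from typing import Dict, Iterator, List


class FakeConnection:
    """Stand-in for a real network connection to one cluster node."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.sent: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


class ConnectionPool:
    """Keeps idle connections per node (hypothetical structure)."""

    def __init__(self, nodes: List[str], size: int = 2) -> None:
        self._idle: Dict[str, "queue.Queue[FakeConnection]"] = {}
        for node in nodes:
            self._idle[node] = queue.Queue()
            for _ in range(size):
                self._idle[node].put(FakeConnection(node))

    @contextmanager
    def connection(self, node: str) -> Iterator[FakeConnection]:
        conn = self._idle[node].get()   # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle[node].put(conn)  # always hand the connection back

    def idle_count(self, node: str) -> int:
        return self._idle[node].qsize()


def submit_batch(pool: ConnectionPool, node: str, request: bytes) -> None:
    """Send one batch processing request over a pooled connection."""
    with pool.connection(node) as conn:
        conn.send(request)
```

Reusing pooled connections avoids opening a new connection for every batch processing request.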
Optionally, the data command is a data write command, and the data write command contains data to be written that has been extracted from a source database;
the database cluster is a destination database, and the data write command is used to write the contained data to be written into the destination database.
For the implementation of the functions and roles of each unit in the above apparatus, see the implementation of the corresponding steps in the above method; details are not repeated here.
Since the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without creative effort.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, e.g., a memory including instructions, where the instructions are executable by a processor of the data command batch processing apparatus to implement the method described in any of the above embodiments. For example, the method may include:
reading a data command set for a target database cluster from a command queue; determining the data command groups in the data command set that respectively correspond to each cluster node in the database cluster; when any data command group contains a plurality of data commands, merging all data commands in that data command group into a corresponding batch processing request based on pipeline technology; and submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node batch-processes the data commands contained in the batch processing request.
The non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc., and the present application is not limited thereto.
The foregoing is merely a description of preferred embodiments of the present application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (7)

1. A method for batch processing of data commands, characterized in that the method comprises:
reading a data command set for a target database cluster from a command queue;
calculating the hash slot corresponding to each data command according to the key value of that data command in the data command set;
respectively determining the hash slot range managed by each cluster node included in the database cluster;
determining, from the data command set, the data command group corresponding to each cluster node included in the database cluster, wherein the hash slot corresponding to each data command in the data command group allocated to any cluster node falls within the hash slot range managed by that cluster node;
when any data command group contains multiple data commands, merging all data commands in that data command group into a corresponding batch processing request based on pipeline technology, wherein the data commands in the command queue are arranged in chronological order, and the relative order of the data commands contained in the batch processing request is the same as their relative order in the command queue;
submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands included in the batch processing request;
when any data command fails to execute, stopping execution of the data commands that have not yet been executed; re-determining the cluster nodes included in the database cluster and the hash slot range managed by each cluster node; and, according to the newly determined hash slot ranges, forming the data commands that have not yet been executed into new data command groups respectively corresponding to the newly determined cluster nodes, and then merging all data commands in each new data command group into a corresponding batch processing request based on pipeline technology.

2. The method according to claim 1, characterized in that reading the data command set for the target database cluster from the command queue comprises:
when the number of data commands in the command queue reaches a preset threshold, reading the data command set for the target database cluster from the command queue.

3. The method according to claim 1, characterized in that submitting the batch processing request to the target cluster node corresponding to the data command group comprises:
obtaining a target connection corresponding to the target cluster node from a connection pool of the database cluster;
submitting the batch processing request to the target cluster node by using the target connection.

4. The method according to claim 1, characterized in that:
the data command is a data write command, and the data write command includes data to be written that has been extracted from a source database;
the database cluster is a destination database, and the data write command is used to write the contained data to be written into the destination database.

5. A batch processing device for data commands, characterized in that the device comprises:
a reading unit, configured to read a data command set for a target database cluster from a command queue;
a determining unit, configured to calculate the hash slot corresponding to each data command according to the key value of that data command in the data command set; respectively determine the hash slot range managed by each cluster node included in the database cluster; and determine, from the data command set, the data command group corresponding to each cluster node included in the database cluster, wherein the hash slot corresponding to each data command in the data command group allocated to any cluster node falls within the hash slot range managed by that cluster node;
a merging unit, configured to merge, when any data command group contains multiple data commands, all data commands in that data command group into a corresponding batch processing request based on pipeline technology, wherein the data commands in the command queue are arranged in chronological order, and the relative order of the data commands contained in the batch processing request is the same as their relative order in the command queue;
a submitting unit, configured to submit the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node performs batch processing on the data commands included in the batch processing request;
wherein, when any data command fails to execute, execution of the data commands that have not yet been executed is stopped; the cluster nodes included in the database cluster and the hash slot range managed by each cluster node are re-determined; and, according to the newly determined hash slot ranges, the data commands that have not yet been executed are formed into new data command groups respectively corresponding to the newly determined cluster nodes, and all data commands in each new data command group are then merged into a corresponding batch processing request based on pipeline technology.

6. An electronic device, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method according to any one of claims 1 to 4 by running the executable instructions.

7. A computer-readable storage medium having computer instructions stored thereon, characterized in that, when the instructions are executed by a processor, the steps of the method according to any one of claims 1 to 4 are implemented.
CN202111164186.0A 2021-09-30 2021-09-30 Batch processing method and device for data commands Active CN113836238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111164186.0A CN113836238B (en) 2021-09-30 2021-09-30 Batch processing method and device for data commands

Publications (2)

Publication Number Publication Date
CN113836238A CN113836238A (en) 2021-12-24
CN113836238B true CN113836238B (en) 2024-12-17

Family

ID=78967951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164186.0A Active CN113836238B (en) 2021-09-30 2021-09-30 Batch processing method and device for data commands

Country Status (1)

Country Link
CN (1) CN113836238B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116068936A (en) * 2022-12-22 2023-05-05 长园深瑞继保自动化有限公司 Substation auxiliary equipment control system, method and electronic equipment
CN117609178A (en) * 2023-10-08 2024-02-27 中信数字创新(上海)科技有限公司 An application-oriented heterogeneous database compatible implementation system
CN120296084A (en) * 2024-01-09 2025-07-11 杭州阿里云飞天信息技术有限公司 Data synchronization method, system, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426451A (en) * 2015-11-11 2016-03-23 深圳市华讯方舟科技有限公司 Key value pair-based data processing method and system
WO2018161881A1 (en) * 2017-03-09 2018-09-13 腾讯科技(深圳)有限公司 Structuralized data processing method, data storage medium, and computer apparatus

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI447646B (en) * 2011-11-18 2014-08-01 Asmedia Technology Inc Data transmission device and method for merging multiple instruction
CN104424105B (en) * 2013-08-26 2017-08-25 华为技术有限公司 The read-write processing method and device of a kind of internal storage data
CN106202459A (en) * 2016-07-14 2016-12-07 华南师范大学 Relevant database storage performance optimization method under virtualized environment and system
US11443026B2 (en) * 2016-10-20 2022-09-13 International Business Machines Corporation Synchronizing data across multiple instances of an application in a cloud
KR102707637B1 (en) * 2019-04-29 2024-09-13 에스케이하이닉스 주식회사 Semiconductor memory device performing command merging and operating method thereof
CN112182445A (en) * 2019-07-02 2021-01-05 北京京东尚科信息技术有限公司 Method and device for optimizing client page performance
CN112398669B (en) * 2019-08-15 2023-09-26 北京京东尚科信息技术有限公司 Hadoop deployment method and device
CN110910921A (en) * 2019-11-29 2020-03-24 深圳市国微电子有限公司 A kind of command reading and writing method, device and computer storage medium
CN113220608B (en) * 2021-06-09 2022-06-28 湖南国科微电子股份有限公司 NVMe command processor and processing method thereof

Also Published As

Publication number Publication date
CN113836238A (en) 2021-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant