Disclosure of Invention
The embodiment of the invention provides a relational database data ordering method and device, which are used for optimizing the efficiency of external ordering by combining the performance of the existing hardware.
The embodiment of the invention provides a relational database data sorting method which is applied to a processing environment with a plurality of sorting processes and comprises the following steps:
Allocating the use memory of each ordering process participating in ordering for the ordering process based on the available memory size;
dividing the data file to be ordered into a plurality of data blocks based on the use memory of the ordering process, and
Dividing a data file to be ordered into at least two parts of data, wherein each part of data comprises at least two data blocks;
Reading a data block from the partial data to order by using two ordering processes, wherein the directions of reading the data blocks by the two ordering processes are opposite;
Merging the ordered files output by each ordering process to obtain the ordered files.
In some embodiments, reading a block of data from the portion of data to sort includes:
configuring a shared memory;
Recording the ordering duration of the data blocks corresponding to the execution of the two ordering processes through the shared memory;
Judging the size relation between the difference of the sequencing time lengths of the two sequencing processes and the preset time length;
And adjusting the reading sequence of the data blocks of the two sorting processes based on the size relation.
In some embodiments, adjusting the order of reading the data blocks of the two ordering processes based on the size relationship comprises:
If the difference between the ordering time length of the first ordering process and the ordering time length of the second ordering process of the two ordering processes is greater than the preset time length, the reading sequence of the second ordering process on the next data block is changed.
In some embodiments, the shared memory records file numbers of the data blocks processed by each sorting process.
In some embodiments, merging ordered files output by each ordering process includes:
calling merging processes with the same quantity as the sorting processes participating in sorting, merging ordered files output by the sorting processes through the merging processes to obtain sorting subfiles;
Merging the sorting subfiles to obtain the sorting file.
In some embodiments, the merge processes and the sort processes are run in parallel.
In some embodiments, dividing the data file to be sorted into at least two portions of data comprises:
And dividing the data file to be sequenced into a half of the number of the partial data.
The embodiment of the invention also provides a relational database data sorting device, which comprises a processor, wherein the processor is provided with a plurality of sorting processes, and is configured to:
Allocating the use memory of each ordering process participating in ordering for the ordering process based on the available memory size;
dividing the data file to be ordered into a plurality of data blocks based on the use memory of the ordering process, and
Dividing a data file to be ordered into at least two parts of data, wherein each part of data comprises at least two data blocks;
Reading a data block from the partial data to order by using two ordering processes, wherein the directions of reading the data blocks by the two ordering processes are opposite;
Merging the ordered files output by each ordering process to obtain the ordered files.
The embodiment of the invention also provides a computer readable storage medium, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the relational database data ordering method according to the embodiments of the disclosure are realized.
According to the embodiment of the invention, the external ordering is changed into parallel execution, the external ordering is divided into a plurality of ordering processes, and the steps of reading, ordering and writing are executed in parallel, so that the ordering performance can be greatly improved under new hardware, and a special data reading mode is designed, so that the problem of data disorder possibly caused by concurrent reading can be effectively avoided.
The foregoing description is only an overview of the present invention, and is intended to be implemented in accordance with the teachings of the present invention in order that the same may be more clearly understood and to make the same and other objects, features and advantages of the present invention more readily apparent.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a relational database data sorting method which is applied to a processing environment with a plurality of sorting processes, as shown in fig. 1, and comprises the following steps:
In step S101, based on the available memory size, the used memory of each sorting process participating in the sorting is allocated to the sorting process. For example, the current available memory size is 400M, and there are 4 sorting processes participating in the sorting, so each sorting process can be allocated to obtain 100M of used memory.
In step S102, the data file to be sorted is divided into a plurality of data blocks based on the usage memory of the sorting process. Specifically, based on the foregoing example, the size of the plurality of data blocks that divide the data file to be sorted cannot exceed 100M, and the plurality of data blocks divided in this example is 100M.
In step S103, the data file to be sorted is divided into at least two parts of data, each part of data comprising at least two data blocks. In some embodiments, dividing the data file to be sorted into at least two portions of data includes dividing the data file to be sorted by half the number of sorting processes. For example, in the case of including 4 sorting processes, the data file to be sorted may be divided into two parts of data, specifically, may be divided from the middle of the data file to be sorted. For example, in the case of including 6 sorting processes, the data file to be sorted may be divided into three parts of data, and the specific dividing manner is not limited herein.
In step S104, two sorting processes are used to read one data block from the partial data for sorting, and the directions of reading the data blocks by the two sorting processes are opposite. As shown in fig. 2, the present example illustrates a procedure that includes 4 sorting procedures. According to the foregoing embodiment, the data file to be sorted is divided into a plurality of data blocks chunk, and the size of each chunk is the calculated available memory size for each sorting process. (e.g., each ordering process may use 100MB of memory, then each chunk may be 100MB in size). Each sorting process then reads the chunk in the order shown in fig. 2, specifically, in fig. 2, two parts of data, the first part of data is read and sorted by process 1 and process 2, and the second part of data is read and sorted by process 3 and process 4, wherein the directions of data reading by process 1 and process 2, process 3 and process 4 are opposite. The opposite direction of data reading refers to process 1 sequential reading, process 2 reverse reading, e.g., total chunk number is 2j, process 1 sequential reading chunk1, and process 2 reverse reading chunkj, process 3 sequential reading from chunkj +1, and process 4 reverse reading from chunk2 j.
In this example, process 1 reads sequentially from chunk1 means that process 1 reads from chunk1 until the last byte of chunk1, and then process 1 continues to read chunk2 accordingly. Process 2 reads in reverse from chunkj means that process 2 reads from the last byte of chunkj all the way to the first byte of chunkj, and then process 2 continues to read chunkj-1 accordingly. Through the design, the problem of data disorder possibly caused by concurrent reading can be effectively avoided.
In step S105, the ordered files output by each sorting process are merged to obtain a sorted file.
According to the embodiment of the invention, the external ordering is changed into parallel execution, and the external ordering is divided into a plurality of ordering processes, and the steps of reading- > ordering- > writing out are executed in parallel, so that the ordering performance can be greatly improved under new hardware.
In some embodiments, reading a block of data from the portion of data to sort includes:
a shared memory is configured.
Recording the ordering duration of the data blocks corresponding to the execution of the two ordering processes through the shared memory;
Judging the size relation between the difference of the sequencing time lengths of the two sequencing processes and the preset time length;
And adjusting the reading sequence of the data blocks of the two sorting processes based on the size relation.
In some embodiments, the shared memory records file numbers of the data blocks processed by each sorting process. Also illustrated with an ordering process of 4, process 1 and process 2 are not fixed processing j/2 chunks. The 4 sorting processes are configured with a shared memory area, and the current processing chunk number of each process is synchronized in the shared memory. And recording the ordering time length of the data blocks corresponding to the execution of the two ordering processes through the shared memory, and judging the size relation between the difference of the ordering time lengths of the two ordering processes and the preset time length. Therefore, the processing efficiency is optimized according to the speed of processing data by the sequencing process, and more chunk can be processed by the process with high reading and sequencing speed.
In some embodiments, adjusting the order of reading the data blocks of the two sorting processes based on the size relationship includes changing the order of reading the next data block of the second sorting process if the difference between the sorting duration of the first sorting process and the sorting duration of the second sorting process is greater than a preset duration.
In particular, the arrangement of data in a database is often substantially ordered, but in an indefinite order. Such as ordering sequence 10,11,12,13,15,17,20, it is clear that positive order processing would be more efficient, and reverse order processing would be more efficient, such as ordering sequence 90,80,70,50,30,20,10. The sorting referred to in the present invention is generally from small to large, and may be in other order.
In this example, process 1 reads chunk1 and ranks, and Process 2 reads chunkj and ranks. The time period t1 consumed is recorded into the shared memory after the process 1 finishes sequencing the trunk 1, and the time period t2 consumed is recorded into the shared memory after the process 2 finishes sequencing the chunkj.
At this time, by comparing t1 and t2, if t1-t2> s (s is a configurable preset time period), it is indicated that the performance of reading and ordering the chunk in reverse order is better, and if t2-t1> s, it is indicated that the performance of reading and ordering the chunk in forward order is better.
As shown in FIG. 3, in the case of t1-t2> s, then process 1 starts with the chunkj-1 block and instead reads in reverse order.
In the case of t2-t1> s, then process 2 starts with the chunk2 block and reads in positive order.
And so on, as shown in fig. 4, the sorting process reads the chunk to the memory, and outputs the physical file after sorting until all chunk are sorted, and the process ends.
In some embodiments, merging ordered files output by each ordering process includes:
And calling merging processes with the same number as the sorting processes participating in sorting, merging ordered files output by the sorting processes through the merging processes to obtain sorting subfiles, wherein in some embodiments, all the merging processes and all the sorting processes run in parallel. Merging the sorting subfiles to obtain the sorting file.
Specifically, as shown in fig. 5, after the sorting by the sorting process, a plurality of sorted chunk files are obtained. For example, sorting 10G of data would result in a total of 100 sorted files if each chunk were 100 MB. To maximize parallelism, the merge process starts the merge process simultaneously during the 100 file outputs. Each merging process merges the ordered files output by the corresponding sorting process until all ordered files are merged to generate a final ordered file. And finally, 4 ordered files are generated by the four processes, and finally, the 4 files are required to be subjected to merging and sorting to obtain a final sorting file.
In summary, the external ordering algorithm in the database of the embodiment enables the external ordering to be performed in parallel, so that the capability of the current hardware can be fully exerted, and the ordering efficiency of the database is improved.
The embodiment of the invention also provides a relational database data sorting device, which comprises a processor, wherein the processor is provided with a plurality of sorting processes, and is configured to:
Allocating the use memory of each ordering process participating in ordering for the ordering process based on the available memory size;
dividing the data file to be ordered into a plurality of data blocks based on the use memory of the ordering process, and
Dividing a data file to be ordered into at least two parts of data, wherein each part of data comprises at least two data blocks;
Reading a data block from the partial data to order by using two ordering processes, wherein the directions of reading the data blocks by the two ordering processes are opposite;
Merging the ordered files output by each ordering process to obtain the ordered files.
The embodiment of the invention also provides a computer readable storage medium, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the relational database data ordering method according to the embodiments of the disclosure are realized.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.