CN114461173B - A method and device for sorting data in a relational database - Google Patents

Info

Publication number
CN114461173B
CN114461173B CN202210129279.8A CN202210129279A CN114461173B CN 114461173 B CN114461173 B CN 114461173B CN 202210129279 A CN202210129279 A CN 202210129279A CN 114461173 B CN114461173 B CN 114461173B
Authority
CN
China
Prior art keywords
sorting
data
processes
sorted
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210129279.8A
Other languages
Chinese (zh)
Other versions
CN114461173A (en)
Inventor
李鹏
周红润
郑晓军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Highgo Base Software Co ltd
Original Assignee
Highgo Base Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Highgo Base Software Co ltd filed Critical Highgo Base Software Co ltd
Priority to CN202210129279.8A priority Critical patent/CN114461173B/en
Publication of CN114461173A publication Critical patent/CN114461173A/en
Application granted granted Critical
Publication of CN114461173B publication Critical patent/CN114461173B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
    • G06F7/24 Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers sorting methods in general
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a relational database data sorting method and device, applied to a processing environment with multiple sorting processes. The method includes: allocating, based on the available memory size, the memory to be used by each sorting process participating in the sort; dividing the data file to be sorted into multiple data blocks based on the memory used by a sorting process; dividing the data file to be sorted into at least two parts of data; using two sorting processes to read data blocks from a part of data for sorting; and merging the ordered files output by the sorting processes to obtain a sorted file. By changing external sorting to parallel execution, in which several sorting processes carry out the read, sort and write steps in parallel, the embodiments of the present invention can greatly improve sorting performance on new hardware. A specially designed data reading scheme effectively avoids the data disorder that concurrent reading might otherwise cause.

Description

Relational database data sorting method and device
Technical Field
The present invention relates to the field of database technology, and in particular to a method and a device for sorting relational database data.
Background
A database manages data and frequently needs to sort it. When the amount of data is large, the database may sort the data with an external sorting algorithm.
Database sorting falls into two types: internal sorting and external sorting. Internal sorting is used when the data volume is small: all of the data is read into memory and sorted there, so it is fast. When the data volume is large, the data to be sorted has to be staged on disk and sorted externally, and external sorting is very slow.
To speed up external sorting, the prior art reads a small amount of data into memory at a time, sorts it internally (the in-memory sorting described above), and writes the sorted portion to a disk file. Repeating this cycle yields a number of sorted data files. Finally, the data files on disk are merge-sorted: the first number is read from each file, the smallest is written to the final result file, and the next number is read from the file it came from, until all files are exhausted.
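The prior-art procedure described above can be sketched as follows. This is a minimal illustration rather than any database's actual implementation: the function name `external_sort`, the one-integer-per-line run file format, and the use of `heapq.merge` for the merge phase are all assumptions made for the example, and the merged result is returned as a list only to keep the sketch short (a real external sort would stream it to a result file).

```python
import heapq
import os
import tempfile


def external_sort(numbers, chunk_size, workdir):
    """Sort integers with bounded memory: write sorted runs, then k-way merge them."""
    run_paths = []
    chunk = []

    def flush():
        # Internal sort of one in-memory chunk, written out as a sorted run file.
        chunk.sort()
        path = os.path.join(workdir, f"run_{len(run_paths)}.txt")
        with open(path, "w") as f:
            f.writelines(f"{n}\n" for n in chunk)
        run_paths.append(path)
        chunk.clear()

    for n in numbers:
        chunk.append(n)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    # Merge phase: repeatedly take the smallest head element among all runs.
    files = [open(p) for p in run_paths]
    try:
        return [int(line) for line in heapq.merge(*files, key=int)]
    finally:
        for f in files:
            f.close()


with tempfile.TemporaryDirectory() as d:
    result = external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3, workdir=d)
```

With `chunk_size=3` the seven inputs produce three run files, which the merge phase combines into one ordered sequence.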
Disclosure of Invention
The embodiments of the present invention provide a relational database data sorting method and device that improve the efficiency of external sorting by taking advantage of the performance of current hardware.
An embodiment of the present invention provides a relational database data sorting method, applied to a processing environment with a plurality of sorting processes, comprising:
allocating, based on the available memory size, the memory to be used by each sorting process participating in the sort;
dividing the data file to be sorted into a plurality of data blocks based on the memory used by the sorting process; and
dividing the data file to be sorted into at least two parts of data, each part of data comprising at least two data blocks;
using two sorting processes to read a data block from the part of data for sorting, the two sorting processes reading data blocks in opposite directions;
merging the ordered files output by the sorting processes to obtain a sorted file.
In some embodiments, reading a data block from the part of data for sorting includes:
configuring a shared memory;
recording, in the shared memory, how long each of the two sorting processes takes to sort its data block;
comparing the difference between the sorting durations of the two sorting processes with a preset duration;
adjusting the order in which the two sorting processes read data blocks based on the result of the comparison.
In some embodiments, adjusting the order in which the two sorting processes read data blocks based on the result of the comparison includes:
if the difference between the sorting duration of the first sorting process and the sorting duration of the second sorting process is greater than the preset duration, changing the order in which the second sorting process reads the next data block.
In some embodiments, the shared memory records the file numbers of the data blocks processed by each sorting process.
In some embodiments, merging the ordered files output by the sorting processes includes:
invoking as many merging processes as there are sorting processes participating in the sort, each merging process merging the ordered files output by a sorting process to obtain a sorted subfile;
merging the sorted subfiles to obtain the sorted file.
In some embodiments, the merging processes and the sorting processes run in parallel.
In some embodiments, dividing the data file to be sorted into at least two parts of data includes:
dividing the data file to be sorted such that the number of parts is half the number of sorting processes.
An embodiment of the present invention also provides a relational database data sorting device comprising a processor, wherein the processor has a plurality of sorting processes and is configured to:
allocate, based on the available memory size, the memory to be used by each sorting process participating in the sort;
divide the data file to be sorted into a plurality of data blocks based on the memory used by the sorting process; and
divide the data file to be sorted into at least two parts of data, each part of data comprising at least two data blocks;
use two sorting processes to read a data block from the part of data for sorting, the two sorting processes reading data blocks in opposite directions;
merge the ordered files output by the sorting processes to obtain a sorted file.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the relational database data sorting method according to the embodiments of the present disclosure are implemented.
According to the embodiments of the present invention, external sorting is changed to parallel execution: the work is divided among several sorting processes that carry out the read, sort and write steps in parallel, so sorting performance can be greatly improved on new hardware. A specially designed data reading scheme also effectively avoids the data disorder that concurrent reading might otherwise cause.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments are set out below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a basic flow diagram of a relational database data sorting method according to an embodiment of the present application;
FIG. 2 is an example of how the processes read data in a relational database data sorting method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the read order adjustment of a sorting process in a relational database data sorting method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the sorting flow of a relational database data sorting method according to an embodiment of the present application;
FIG. 5 is an example of merging sorted files in a relational database data sorting method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a relational database data sorting method, applied to a processing environment with a plurality of sorting processes. As shown in FIG. 1, the method comprises the following steps:
In step S101, memory is allocated to each sorting process participating in the sort, based on the available memory size. For example, if 400M of memory is currently available and 4 sorting processes participate in the sort, each sorting process can be allocated 100M of memory.
In step S102, the data file to be sorted is divided into a plurality of data blocks based on the memory used by a sorting process. Continuing the example, no data block may exceed 100M, and in this example each data block is 100M.
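The arithmetic of steps S101 and S102 can be sketched as follows. The function names `per_process_memory` and `chunk_ranges` are hypothetical, and the sketch assumes an even split of memory and byte-offset chunk boundaries; the source does not say how chunk boundaries are aligned with record boundaries, so that detail is left out.

```python
import math

MB = 1024 * 1024


def per_process_memory(available_bytes, num_sort_processes):
    """Step S101 (assumed even split): memory budget for each sorting process."""
    if num_sort_processes < 1:
        raise ValueError("need at least one sorting process")
    return available_bytes // num_sort_processes


def chunk_ranges(file_size, chunk_size):
    """Step S102: (offset, length) of each chunk; only the last may be shorter."""
    count = math.ceil(file_size / chunk_size)
    return [
        (i * chunk_size, min(chunk_size, file_size - i * chunk_size))
        for i in range(count)
    ]


budget = per_process_memory(400 * MB, 4)   # 100 MB per process, as in the example
chunks = chunk_ranges(1000 * MB, budget)   # a hypothetical 1000 MB file -> 10 chunks
```

Capping the chunk size at the per-process budget is what lets each sorting process hold one whole chunk in memory while it sorts.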
In step S103, the data file to be sorted is divided into at least two parts of data, each part comprising at least two data blocks. In some embodiments, the number of parts is half the number of sorting processes. For example, with 4 sorting processes the data file to be sorted may be divided into two parts, specifically by splitting it in the middle. With 6 sorting processes the data file to be sorted may be divided into three parts; the specific manner of division is not limited here.
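One possible way to carry out the division in step S103 is sketched below. The function name `partition_chunks`, the 1-based chunk numbering, the requirement of an even process count, and the choice to give any leftover chunks to the earliest parts are all assumptions; the source explicitly leaves the manner of division open.

```python
def partition_chunks(num_chunks, num_sort_processes):
    """Split chunk numbers 1..num_chunks into num_sort_processes // 2 contiguous parts."""
    if num_sort_processes < 2 or num_sort_processes % 2:
        raise ValueError("expected an even number of sorting processes (>= 2)")
    num_parts = num_sort_processes // 2
    if num_chunks < 2 * num_parts:
        raise ValueError("each part needs at least two chunks")
    base, extra = divmod(num_chunks, num_parts)
    parts, start = [], 1
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)  # spread the remainder over the first parts
        parts.append(list(range(start, start + size)))
        start += size
    return parts


two_parts = partition_chunks(8, 4)     # 4 processes -> 2 parts, split in the middle
three_parts = partition_chunks(10, 6)  # 6 processes -> 3 parts
```

Each part is then served by one pair of sorting processes, which is why the number of parts is half the number of processes.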
In step S104, two sorting processes each read one data block at a time from a part of data for sorting, and the two sorting processes read data blocks in opposite directions. FIG. 2 illustrates a case with 4 sorting processes. As in the foregoing embodiment, the data file to be sorted is divided into a plurality of data blocks (chunks), and the size of each chunk equals the memory calculated for each sorting process (for example, if each sorting process may use 100MB of memory, each chunk is 100MB). Each sorting process then reads chunks in the order shown in FIG. 2. Specifically, FIG. 2 shows two parts of data: the first part is read and sorted by process 1 and process 2, and the second part by process 3 and process 4, where process 1 and process 2 read in opposite directions, as do process 3 and process 4. Reading in opposite directions means that one process of a pair reads forward and the other reads in reverse. For example, if the total number of chunks is 2j, process 1 reads forward starting from chunk 1, process 2 reads in reverse starting from chunk j, process 3 reads forward starting from chunk j+1, and process 4 reads in reverse starting from chunk 2j.
In this example, "process 1 reads forward from chunk 1" means that process 1 reads from the first byte of chunk 1 to its last byte and then continues with chunk 2 in the same way. "Process 2 reads in reverse from chunk j" means that process 2 reads from the last byte of chunk j back to its first byte and then continues with chunk j-1 in the same way. This design effectively avoids the data disorder that concurrent reading might otherwise cause.
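The opposite read directions can be illustrated with the sketch below. Both helpers are hypothetical: `read_plan` assumes the two processes of a pair advance at the same pace and simply alternate, and `iter_blocks_backward` is one possible reading of "from the last byte to the first byte", fetching a chunk in fixed-size blocks from its tail toward its head. The source does not specify the I/O granularity, so the block size is an assumption.

```python
import io


def read_plan(first_chunk, last_chunk):
    """Chunk numbers visited by the forward reader and by the reverse reader of one part."""
    forward, backward = [], []
    lo, hi = first_chunk, last_chunk
    while lo <= hi:
        forward.append(lo)       # forward reader takes the lowest unread chunk
        lo += 1
        if lo <= hi:
            backward.append(hi)  # reverse reader takes the highest unread chunk
            hi -= 1
    return forward, backward


def iter_blocks_backward(f, offset, length, block_size):
    """Yield the blocks of one chunk, starting with the block at its end."""
    end = offset + length
    while end > offset:
        start = max(offset, end - block_size)
        f.seek(start)
        yield f.read(end - start)
        end = start


forward, backward = read_plan(1, 4)        # part 1 of FIG. 2 with j = 4
data = io.BytesIO(b"abcdefghij")
tail_first = list(iter_blocks_backward(data, offset=2, length=6, block_size=4))
```

Because the two readers start at opposite ends of the part and move toward each other, they never request the same chunk until the part is exhausted.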
In step S105, the ordered files output by the sorting processes are merged to obtain a sorted file.
According to the embodiments of the present invention, external sorting is changed to parallel execution: the work is divided among several sorting processes that carry out the read, sort and write steps in parallel, so sorting performance can be greatly improved on new hardware.
In some embodiments, reading a data block from the part of data for sorting includes:
configuring a shared memory;
recording, in the shared memory, how long each of the two sorting processes takes to sort its data block;
comparing the difference between the sorting durations of the two sorting processes with a preset duration;
adjusting the order in which the two sorting processes read data blocks based on the result of the comparison.
In some embodiments, the shared memory records the file numbers of the data blocks processed by each sorting process. Continuing the example with 4 sorting processes, process 1 and process 2 are not each fixed to processing j/2 chunks. The 4 sorting processes are configured with a shared memory area, and the number of the chunk each process is currently handling is synchronized in that shared memory. The shared memory also records how long each of the two sorting processes takes to sort its data block, and the difference between the two durations is compared with the preset duration. Processing efficiency is thus tuned to the speed at which each sorting process handles data, and a process that reads and sorts faster can process more chunks.
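The bookkeeping described above can be sketched as a small shared record from which each process claims its next chunk. This is only a stand-in: the class name `SharedPart` is hypothetical, and a `threading.Lock` inside one Python process is assumed in place of a real shared memory region synchronized across operating-system processes.

```python
import threading


class SharedPart:
    """Progress record for one part, shared by its forward and reverse readers."""

    def __init__(self, first_chunk, last_chunk):
        self._lock = threading.Lock()
        self._next_front = first_chunk
        self._next_back = last_chunk
        self.current = {}  # process id -> chunk number it is currently handling

    def claim(self, process_id, from_front):
        """Atomically hand out the next unclaimed chunk, or None when the part is done."""
        with self._lock:
            if self._next_front > self._next_back:
                return None
            if from_front:
                chunk = self._next_front
                self._next_front += 1
            else:
                chunk = self._next_back
                self._next_back -= 1
            self.current[process_id] = chunk
            return chunk


part = SharedPart(1, 5)
claims = [
    part.claim(1, from_front=True),   # process 1 takes chunk 1
    part.claim(2, from_front=False),  # process 2 takes chunk 5
    part.claim(1, from_front=True),   # process 1 is faster and takes chunk 2
    part.claim(1, from_front=True),   # ... and chunk 3
    part.claim(2, from_front=False),  # process 2 takes chunk 4
    part.claim(1, from_front=True),   # nothing left
]
```

Because chunks are claimed on demand rather than assigned up front, the faster process of a pair ends up handling more than half of the part.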
In some embodiments, adjusting the order in which the two sorting processes read data blocks based on the result of the comparison includes: if the difference between the sorting duration of the first sorting process and the sorting duration of the second sorting process is greater than a preset duration, changing the order in which the second sorting process reads the next data block.
In particular, data in a database is often already largely ordered, but the direction of that order is not known in advance. For a sequence such as 10, 11, 12, 13, 15, 17, 20, forward-order processing is clearly more efficient, whereas for a sequence such as 90, 80, 70, 50, 30, 20, 10, reverse-order processing is more efficient. Sorting in the present invention generally means ascending order, although other orders are possible.
In this example, process 1 reads and sorts chunk 1, and process 2 reads and sorts chunk j. After process 1 finishes sorting chunk 1, it records the time taken, t1, in the shared memory; after process 2 finishes sorting chunk j, it records the time taken, t2, in the shared memory.
t1 and t2 are then compared. If t1-t2 > s (where s is a configurable preset duration), reading and sorting chunks in reverse order performs better; if t2-t1 > s, reading and sorting chunks in forward order performs better.
As shown in FIG. 3, when t1-t2 > s, process 1 switches to reading in reverse order, starting from chunk j-1.
When t2-t1 > s, process 2 switches to reading in forward order, starting from chunk 2.
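The comparison of t1 and t2 against the threshold s can be written as a small decision function. This is a sketch of the rule as stated for the first round only (chunk 1 and chunk j already sorted); the names `timed_sort` and `adjust_read_order` and the tuple returned are hypothetical, and what the other process of the pair does after a switch is not specified in the source, so it is not modelled.

```python
import time


def timed_sort(records):
    """Sort one chunk in memory and report how long the sort took, in seconds."""
    start = time.perf_counter()
    ordered = sorted(records)
    return ordered, time.perf_counter() - start


def adjust_read_order(t1, t2, s, j):
    """Return (process, next_chunk, direction) for the process that switches, or None."""
    if t1 - t2 > s:
        # Reverse reading performed better: process 1 switches to reverse order.
        return (1, j - 1, "reverse")
    if t2 - t1 > s:
        # Forward reading performed better: process 2 switches to forward order.
        return (2, 2, "forward")
    return None  # difference within the threshold: keep the current directions


decision = adjust_read_order(t1=9.0, t2=4.0, s=2.0, j=10)
```

Requiring the gap to exceed s, rather than comparing t1 and t2 directly, keeps small timing noise from flipping the read direction.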
The procedure continues in this way. As shown in FIG. 4, each sorting process reads a chunk into memory, sorts it, and outputs it as a physical file; when all chunks have been sorted, the sorting process ends.
In some embodiments, merging the ordered files output by the sorting processes includes:
invoking as many merging processes as there are sorting processes participating in the sort, each merging process merging the ordered files output by a sorting process to obtain a sorted subfile, wherein in some embodiments the merging processes and the sorting processes run in parallel; and merging the sorted subfiles to obtain the sorted file.
Specifically, as shown in FIG. 5, sorting by the sorting processes yields a number of sorted chunk files. For example, sorting 10G of data with 100 MB chunks yields 100 sorted files in total. To maximize parallelism, the merging processes start merging while the 100 files are still being output. Each merging process merges the ordered files output by its corresponding sorting process until all of those ordered files have been merged into a single ordered file. The four merging processes thus produce 4 ordered files, and these 4 files are finally merge-sorted to obtain the final sorted file.
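The two-level merge can be sketched as below. In-memory lists stand in for the sorted chunk files, `heapq.merge` is used as the k-way merge, and the merges run one after another rather than in parallel processes; the name `merge_runs` and the sample data are made up for the illustration.

```python
import heapq


def merge_runs(runs):
    """k-way merge of already sorted sequences into one sorted list."""
    return list(heapq.merge(*runs))


# Sorted chunk files produced by each of the 4 sorting processes (hypothetical data).
runs_by_process = {
    1: [[1, 9], [4, 12]],
    2: [[3, 7], [2, 8]],
    3: [[5, 11]],
    4: [[6, 10], [0, 13]],
}

# First level: one merging process per sorting process yields 4 sorted subfiles.
subfiles = [merge_runs(runs) for runs in runs_by_process.values()]

# Second level: the 4 subfiles are merged into the final sorted file.
final = merge_runs(subfiles)
```

Splitting the merge into per-process subfiles lets the first level proceed in parallel, leaving only a small final merge of as many inputs as there are sorting processes.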
In summary, the external sorting algorithm of this embodiment allows external sorting in the database to be performed in parallel, so that the capabilities of current hardware can be fully exploited and the sorting efficiency of the database is improved.
An embodiment of the present invention also provides a relational database data sorting device comprising a processor, wherein the processor has a plurality of sorting processes and is configured to:
allocate, based on the available memory size, the memory to be used by each sorting process participating in the sort;
divide the data file to be sorted into a plurality of data blocks based on the memory used by the sorting process; and
divide the data file to be sorted into at least two parts of data, each part of data comprising at least two data blocks;
use two sorting processes to read a data block from the part of data for sorting, the two sorting processes reading data blocks in opposite directions;
merge the ordered files output by the sorting processes to obtain a sorted file.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the relational database data sorting method according to the embodiments of the present disclosure are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.

Claims (9)

1. A relational database data sorting method, characterized in that it is applied to a processing environment with multiple sorting processes and comprises:
allocating, based on the available memory size, the memory to be used by each sorting process participating in the sort;
dividing the data file to be sorted into a plurality of data blocks based on the memory used by the sorting process; and
dividing the data file to be sorted into at least two parts of data, each part of data comprising at least two data blocks;
using two sorting processes to read a data block from the part of data for sorting, the two sorting processes reading data blocks in opposite directions, wherein the first process reads from the first byte of a data block to the last byte of the data block, and the second process reads from the last byte of a data block to the first byte of the data block;
merging the ordered files output by the sorting processes to obtain a sorted file.
2. The relational database data sorting method according to claim 1, characterized in that reading a data block from the part of data for sorting comprises:
configuring a shared memory;
recording, in the shared memory, the sorting durations of the two sorting processes for their corresponding data blocks;
determining the relationship between the difference of the sorting durations of the two sorting processes and a preset duration;
adjusting the order in which the two sorting processes read data blocks based on that relationship.
3. The relational database data sorting method according to claim 2, characterized in that adjusting the order in which the two sorting processes read data blocks based on that relationship comprises:
if the difference between the sorting duration of the first sorting process and the sorting duration of the second sorting process is greater than the preset duration, changing the order in which the first sorting process reads the next data block;
if the difference between the sorting duration of the second sorting process and the sorting duration of the first sorting process is greater than the preset duration, changing the order in which the second sorting process reads the next data block.
4. The relational database data sorting method according to claim 2, characterized in that the shared memory records the file numbers of the data blocks processed by each sorting process.
5. The relational database data sorting method according to claim 1, characterized in that merging the ordered files output by the sorting processes comprises:
invoking the same number of merging processes as there are sorting processes participating in the sort, and merging, by each merging process, the ordered files output by a sorting process to obtain sorted subfiles;
merging the sorted subfiles to obtain the sorted file.
6. The relational database data sorting method according to claim 5, characterized in that the merging processes and the sorting processes run in parallel.
7. The relational database data sorting method according to claim 1, characterized in that dividing the data file to be sorted into at least two parts of data comprises:
dividing the data file to be sorted such that the number of parts of data is half the number of sorting processes.
8. A relational database data sorting device, characterized in that it comprises a processor, the processor having a plurality of sorting processes and being configured to:
allocate, based on the available memory size, the memory to be used by each sorting process participating in the sort;
divide the data file to be sorted into a plurality of data blocks based on the memory used by the sorting process; and
divide the data file to be sorted into at least two parts of data, each part of data comprising at least two data blocks;
use two sorting processes to read a data block from the part of data for sorting, the two sorting processes reading data blocks in opposite directions, wherein the first process reads from the first byte of a data block to the last byte of the data block, and the second process reads from the last byte of a data block to the first byte of the data block;
merge the ordered files output by the sorting processes to obtain a sorted file.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the relational database data sorting method according to any one of claims 1 to 7 are implemented.
CN202210129279.8A 2022-02-11 2022-02-11 A method and device for sorting data in a relational database Active CN114461173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210129279.8A CN114461173B (en) 2022-02-11 2022-02-11 A method and device for sorting data in a relational database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210129279.8A CN114461173B (en) 2022-02-11 2022-02-11 A method and device for sorting data in a relational database

Publications (2)

Publication Number Publication Date
CN114461173A CN114461173A (en) 2022-05-10
CN114461173B CN114461173B (en) 2025-02-07

Family

ID=81413161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210129279.8A Active CN114461173B (en) 2022-02-11 2022-02-11 A method and device for sorting data in a relational database

Country Status (1)

Country Link
CN (1) CN114461173B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934643B (en) * 2023-01-06 2025-07-04 济南浪潮数据技术有限公司 A file sorting method, system, device and storage medium
CN118394852B (en) * 2024-06-26 2024-11-12 支付宝(杭州)信息技术有限公司 Method, device and graph database system for online importing graph data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968496A (en) * 2012-12-04 2013-03-13 天津神舟通用数据技术有限公司 Parallel sequencing method based on task derivation and double buffering mechanism
CN107077488A (en) * 2014-10-07 2017-08-18 甲骨文国际公司 parallel merge

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5852826A (en) * 1996-01-26 1998-12-22 Sequent Computer Systems, Inc. Parallel merge sort method and apparatus
JP2007213423A (en) * 2006-02-10 2007-08-23 Akuseru:Kk Bubble sort circuit and data compression system using the same
US8280895B2 (en) * 2009-07-03 2012-10-02 Barracuda Networks Inc Multi-streamed method for optimizing data transfer through parallelized interlacing of data based upon sorted characteristics to minimize latencies inherent in the system
CN104123304B (en) * 2013-04-28 2018-05-29 国际商业机器公司 The sorting in parallel system and method for data-driven
CN111913955A (en) * 2020-06-22 2020-11-10 中科驭数(北京)科技有限公司 Data sorting processing device, method and storage medium

Also Published As

Publication number Publication date
CN114461173A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN114461173B (en) A method and device for sorting data in a relational database
US11169978B2 (en) Distributed pipeline optimization for data preparation
US6725225B1 (en) Data management apparatus and method for efficiently generating a blocked transposed file and converting that file using a stored compression method
JP4669067B2 (en) Dynamic fragment mapping
CN109325032B (en) Index data storage and retrieval method, device and storage medium
US20200210399A1 (en) Signature-based cache optimization for data preparation
CN114816258B (en) NVM external sorting method, device and NVM memory
US20050144167A1 (en) Parallel merge/sort processing device, method, and program
US10642815B2 (en) Step editor for data preparation
CN109240607B (en) File reading method and device
KR20160100216A (en) Method and device for constructing on-line real-time updating of massive audio fingerprint database
AU2016394744A1 (en) Database-archiving method and apparatus that generate index information, and method and apparatus for searching archived database comprising index information
CN107423321B (en) Method and device suitable for cloud storage of large-batch small files
CN114595066B (en) Processing method and device for reserved memory, electronic equipment and medium
CN111625505A (en) File splitting method and device
CN109325022A (en) A kind of data processing method and device
JP6772883B2 (en) Reading program, reading method and information processing device
WO2025020787A1 (en) File merging method and device
CN114816322B (en) SSD external ordering method, SSD external ordering device and SSD memory
CN109189345B (en) Online data sorting method, device, equipment and storage medium
CN114546943B (en) Multi-process call-based database file ordering optimization method and device
CN114546944B (en) Multi-process load balancing database file ordering optimization method and device
US20140351298A1 (en) Method and apparatus for distributed processing of file
CN113051225B (en) ORC optimized data storage format and data reading and writing method based on block data
CN115563116A (en) Database table scanning method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant