CN106775438B - An I/O Scheduling Method Based on the Read and Write Characteristics of Solid State Disk - Google Patents
- Publication number: CN106775438B (application CN201510827493.0A)
- Authority: CN (China)
- Prior art keywords: request, read, write, merged, jump
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Abstract
The invention discloses an I/O scheduling method based on the distinct read and write characteristics of solid-state disks (SSDs). The optimal clustered-page size for merged requests inside the SSD is obtained through black-box testing. Read and write requests are separated and processed in separate batches, avoiding the mutual interference that arises in the SSD's mixed read/write mode. Because sequential reads perform far better than random reads, read requests are not only queued in a linked list in order of arrival but also sorted in a red-black tree by starting access address, so as to construct sequentiality among them; the sorted read requests are then merged forward and backward, with the merged request never exceeding the clustered-page size. Because sequential and random writes perform essentially the same, write requests are only queued in the linked list in order of arrival and need not be sorted in a red-black tree.
Description
Technical Field
The invention belongs to the technical field of solid-state disk storage, and more specifically relates to an I/O scheduling method based on the distinct read and write characteristics of solid-state disks.
Background Art
A solid-state disk (SSD) is a drive that uses NAND flash memory chips for permanent data storage. As the production cost of NAND flash chips has fallen, SSD prices have become increasingly affordable, and SSDs are now widely used in data centers, personal computers, and mobile devices of all kinds. However, because storage systems have long been built from hard disk drives (HDDs), most existing system software is designed and optimized for HDDs; simply replacing an HDD with an SSD does not fully exploit the SSD's high performance. Optimizing system software specifically for SSDs is therefore of great significance.
An HDD is a mechanically rotating disk; its read and write operations involve the following main steps:
1. Seek: the head actuator moves the heads to the cylinder to be accessed and selects the head to be used. 2. Rotational latency: once the correct platter and track are located, the platter rotates; during rotation the head reads the sector-header information of the sector directly beneath it and compares it with the sector number to be read or written. If they do not match, the platter keeps rotating; once they match, data transfer begins from that sector. 3. The data transfer completes the operation. A large fraction of HDD read/write time is spent on seeking and rotational latency, and the optimizations in the Linux kernel's I/O scheduling algorithms are aimed mainly at reducing these two costs.
In Linux, the existing I/O scheduling algorithms are mainly the Noop, CFQ, Deadline, and Anticipatory algorithms. CFQ, Deadline, and Anticipatory target the HDD's long seek time by merging contiguous read and write requests, so that once a seek completes the disk can transfer data contiguously. Noop does not merge requests at all; it simply inserts each new request at the tail of the queue.
These algorithms, however, have the following problems on SSDs: 1. On an SSD, sequential and random writes perform essentially the same, so merging write requests does not raise SSD throughput and instead costs system performance. 2. On an SSD, merging read requests improves read efficiency only while the merged request stays within the clustered-page size; growing a read request beyond the clustered-page size yields no further improvement.
Summary of the Invention
In view of the above defects or needs for improvement in the prior art, the present invention provides an I/O scheduling method based on the distinct read and write characteristics of solid-state disks. Its purpose is to solve the technical problems of existing I/O scheduling methods in which contiguous read requests either grow too large after merging or are not merged at all, and write requests undergo unnecessary merging.
To achieve the above object, according to one aspect of the present invention, an I/O scheduling method based on the read and write characteristics of solid-state disks is provided, comprising the following steps:
(1) Receive a bio request issued by an upper-layer application and processed by the file system, and determine whether it is a read request or a write request; if a read request, go to step (2); if a write request, go to step (3);
(2) Determine whether the read request can be merged with an adjacent read request. If not, go directly to step (4). If it can be merged and the merged size does not exceed the clustered-page size, merge it with the adjacent read request; the merged read request may be merged again with its neighbors, always ensuring that the result does not exceed the clustered-page size. Place the final merged read request into the read queue and the read red-black tree for the system to read data, then go to step (4);
(3) Determine whether the write request can be merged with the cached write request; if so, merge them, insert the merged write request at the tail of the scheduler queue, and go to step (4). Otherwise, determine whether it can be merged with a write request in the hash table; if so, merge them, insert the merged write request at the tail of the scheduler queue, and go to step (4);
(4) Dispatch the requests in the scheduler queue to the request queue of the device driver.
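The top-level flow of steps (1)-(4) can be sketched as a simplified model. All names here (`classify_and_queue`, the tuple layout) are illustrative, not the patent's actual implementation; a plain sorted list stands in for the read red-black tree.

```python
# Minimal sketch of steps (1)-(4): reads are additionally ordered by starting
# address, writes keep only their arrival order. Hypothetical names throughout.

def classify_and_queue(bios):
    """Step (1): split incoming bio requests into the read and write paths.
    Each bio is a tuple (op, start_sector, sector_count)."""
    read_tree = []    # reads, kept sorted by start address (models the red-black tree)
    write_queue = []  # writes, kept strictly in arrival order (the linked list)
    for op, start, sectors in bios:
        if op == "read":
            read_tree.append((start, sectors))
            read_tree.sort()              # models red-black tree ordering
        else:
            write_queue.append((start, sectors))
    return read_tree, write_queue

bios = [("read", 100, 8), ("write", 0, 8), ("read", 40, 8), ("write", 200, 8)]
reads, writes = classify_and_queue(bios)
print(reads)   # reads come out sorted by start address
print(writes)  # writes come out in arrival order
```

The asymmetry is the point of the method: only reads benefit from constructed sequentiality, so only reads pay the sorting cost.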
Preferably, step (2) comprises the following sub-steps:
(2.1) For the read request, determine whether a last-merged read request exists; if so, go to step (2.2); otherwise go to step (2.3);
(2.2) Determine whether the read request can be merged with the last-merged read request. If it cannot, go to step (2.3); if it can be merged forward, go to step (2.4); if it can be merged backward, go to step (2.5);
(2.3) Determine whether the read request can be merged backward with a request in the hash list; if so, go to step (2.5); otherwise go to step (2.6);
(2.4) Merge the read request into the forward-mergeable read request Rfrq, then determine whether the merged Rfrq can be further forward-merged with Rprev, the read request that precedes Rfrq in the scheduler's read red-black tree (sorted by starting access address). If so, go to step (2.7); otherwise go to step (2.8);
(2.5) Merge the read request into the backward-mergeable request Rbrq, then determine whether the merged Rbrq can be further backward-merged with Rnext, the request that follows Rbrq in the scheduler's read red-black tree (sorted by starting access address). If so, go to step (2.9); otherwise go to step (2.10);
(2.6) Determine whether the read request can be merged forward with a request in the scheduler's read red-black tree; if so, go to step (2.4); otherwise go to step (2.11);
(2.7) Complete the forward merge of read request Rfrq into read request Rprev; delete Rfrq from the scheduler's read queue, the read red-black tree, and the generic block layer's hash list; adjust Rprev's position in the hash list according to the size of the merged read request; set the merged request's latest service time to the smaller of Rfrq's and Rprev's latest service times; finally mark Rprev as the last-merged read request;
(2.8) Set the last-merged read request to Rfrq, and update Rfrq's starting address in the read red-black tree;
(2.9) Complete the backward merge of request Rnext into request Rbrq; delete Rnext from the scheduler's read queue, the read red-black tree, and the generic block layer's hash list; update Rbrq's position in the hash list; set the merged read request's service time to the earlier of Rbrq's and Rnext's service times; mark Rbrq as the last-merged read request, and go to step (2.14);
(2.10) Mark Rbrq as the last-merged read request, update Rbrq's position in the hash list, and end the conversion;
(2.11) Allocate a request Rnrq for the bio request and initialize Rnrq with it; determine whether Rnrq has the mergeable attribute; if so, go to step (2.12); otherwise go to step (2.13);
(2.12) Add Rnrq to the hash list, and determine whether a last-merged read request exists. If it exists, go directly to step (2.13); otherwise first mark Rnrq as the last-merged read request, then go to step (2.13);
(2.13) Add Rnrq to the tail of the scheduler's read queue and to the read red-black tree.
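The read-side merge logic of sub-steps (2.1)-(2.13) can be sketched as follows. This is a deliberately simplified model: a sorted Python list stands in for the red-black tree, the FIFO queue, hash list, and service times are omitted, and `CLUSTER_PAGE` is an assumed example value, not a measured one.

```python
# Simplified sketch of step (2): merge a read request into an address-sorted
# queue, forward and backward, capped at the clustered-page size.
import bisect

CLUSTER_PAGE = 64  # merged-request cap in sectors; an assumed example value


def insert_read(tree, start, sectors):
    """tree: list of [start, sectors], sorted by start address, standing in
    for the read red-black tree. Merges where requests are exactly adjacent."""
    i = bisect.bisect(tree, [start, sectors])
    merged_at = None
    # backward merge: the new request begins exactly where the previous one ends
    if i > 0 and tree[i - 1][0] + tree[i - 1][1] == start \
            and tree[i - 1][1] + sectors <= CLUSTER_PAGE:
        tree[i - 1][1] += sectors
        merged_at = i - 1
    # forward merge: the new request ends exactly where the next queued one begins
    elif i < len(tree) and start + sectors == tree[i][0] \
            and sectors + tree[i][1] <= CLUSTER_PAGE:
        tree[i] = [start, sectors + tree[i][1]]
        merged_at = i
    if merged_at is None:
        tree.insert(i, [start, sectors])  # no neighbor to merge with
        return
    # the merged request may merge once more with its successor, still capped
    j = merged_at
    if j + 1 < len(tree) and tree[j][0] + tree[j][1] == tree[j + 1][0] \
            and tree[j][1] + tree[j + 1][1] <= CLUSTER_PAGE:
        tree[j][1] += tree[j + 1][1]
        del tree[j + 1]


tree = [[0, 8], [16, 8]]
insert_read(tree, 8, 8)      # bridges the two queued requests into one
print(tree)                  # [[0, 24]]
insert_read(tree, 24, 64)    # would exceed CLUSTER_PAGE: queued separately
print(tree)                  # [[0, 24], [24, 64]]
```

The second call shows the cap in action: even a perfectly contiguous request is left unmerged once the result would exceed the clustered-page size.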
Preferably, step (3) comprises the following sub-steps:
(3.1) For the write request, determine whether a last-merged write request exists; if so, go to step (3.2); otherwise go to step (3.3);
(3.2) Determine whether the bio request can be merged with the last-merged write request; if forward merging is possible, go to step (3.4); if backward merging is possible, go to step (3.5); if no merge is possible, go to step (3.3);
(3.3) Determine whether the bio request can be merged backward with a request in the hash list; if so, go to step (3.5); otherwise go to step (3.6);
(3.4) Forward-merge the bio request with the last-merged write request, then determine whether the result can be further merged with Wprev, the preceding request in the scheduler's write list. If it can be merged with Wprev, go to step (3.7); otherwise go to step (3.8);
(3.5) Complete the backward merge of the bio request into the backward-mergeable request Wbrq, then determine whether the merged Wbrq can be further merged with Wnext, the request that follows Wbrq in the scheduler's write list. If so, go to step (3.9); otherwise go to step (3.10);
(3.6) Allocate a new request Wnrq for the bio request and initialize it with the bio request. Then determine whether Wnrq has the mergeable attribute; if so, go to step (3.11); otherwise go to step (3.12);
(3.7) Complete the merge of Wfrq into Wprev; delete Wfrq from the scheduler's write list and from the hash list; adjust Wprev's position in the hash list; finally mark Wprev as the last-merged write request. The bio request has now been converted into a write request, and the conversion ends;
(3.8) Mark Wfrq as the last-merged write request. The bio request has now been converted into a write request, and the conversion ends;
(3.9) Complete the merge of Wbrq with Wnext; delete Wnext from the scheduler's write list and from the hash list; adjust Wbrq's position in the hash list; finally mark Wbrq as the last-merged write request. The bio request has now been converted into a write request, and the conversion ends;
(3.10) Adjust Wbrq's position in the hash list and mark Wbrq as the last-merged write request. The bio request has now been converted into a write request, and the conversion ends;
(3.11) Add Wnrq to the hash list and determine whether a last-merged write request exists. If not, mark Wnrq as the last-merged write request and go to step (3.12); otherwise go directly to step (3.12);
(3.12) Add Wnrq to the tail of the scheduler's write list, thereby converting the bio request into a write request.
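The write path of sub-steps (3.1)-(3.12) can be sketched in the same simplified style. Note the contrast with the read path: writes stay in arrival order and are only merged backward. The dict keyed by end address is a stand-in for the hash list; `insert_write` and the `cap` default are illustrative names, not the patent's code.

```python
# Simplified sketch of step (3): writes keep arrival order; a queued write
# absorbs a new bio only when the bio starts exactly where that write ends.

def insert_write(queue, by_end, start, sectors, cap=64):
    """queue: list of [start, sectors] in arrival order;
    by_end: maps a queued write's end address to its index in queue."""
    j = by_end.get(start)                 # a queued write ending at our start?
    if j is not None and queue[j][1] + sectors <= cap:
        del by_end[start]                 # its end address moves forward
        queue[j][1] += sectors            # backward merge; position unchanged
        by_end[queue[j][0] + queue[j][1]] = j
    else:
        queue.append([start, sectors])    # no merge: new request at the tail
        by_end[start + sectors] = len(queue) - 1


queue, by_end = [], {}
for start, sectors in [(0, 8), (100, 8), (8, 8), (200, 8)]:
    insert_write(queue, by_end, start, sectors)
print(queue)   # [[0, 16], [100, 8], [200, 8]]
```

The bio at sector 8 merges backward into the queued write at sector 0; nothing is ever reordered, because on an SSD sequential and random writes perform essentially the same.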
Preferably, step (4) comprises the following sub-steps:
(4.1) Determine whether the scheduler queue is currently dispatching in a read batch or a write batch; if in a read batch, go to step (4.2); if in a write batch, go to step (4.3);
(4.2) Determine whether the total number of requests dispatched in the read batch has exceeded the threshold. If so, go to step (4.4); otherwise go to step (4.5);
(4.3) Determine whether the total number of requests dispatched in the write batch has reached the upper limit. If so, go to step (4.4); otherwise go to step (4.6);
(4.4) Begin creating a new batch, and determine whether a read request exists in the scheduler queue; if so, go to step (4.9); otherwise go to step (4.10);
(4.5) Set the request Drq to be dispatched to next_rq[read], and go to step (4.7);
(4.6) Set the request Drq to be dispatched to next_rq[write], and go to step (4.8);
(4.7) Dispatch Drq to the request queue of the device driver layer; delete Drq from the scheduler's read list and read red-black tree; set next_rq[read] to the request that follows Drq in the scheduler's read red-black tree; finally count this operation and end this dispatch;
(4.8) Dispatch Drq to the request queue of the device driver layer; delete Drq from the scheduler's write list; set next_rq[write] to the request that follows Drq in the scheduler's write list; finally count this operation and end this dispatch;
(4.9) Determine whether any request exists in the scheduler's write list. If not, go to step (4.11). If so, further determine whether a timed-out write request exists or the number of times write requests have been starved has reached the upper limit; if true, go to step (4.12); otherwise go to step (4.11);
(4.10) Determine whether any request exists in the scheduler's write list; if so, go to step (4.12); otherwise no request awaits dispatch in the scheduler, and this dispatch ends;
(4.11) Determine whether next_rq[read] exists. If so, go to step (4.13); otherwise go to step (4.14);
(4.12) Reset the write-starvation count to 0, then determine whether a timed-out write request exists or next_rq[write] is empty. If true, go to step (4.15); otherwise go to step (4.16);
(4.13) Set the request Drq to be dispatched to next_rq[read], set batching = 0 to begin a new read batch, and go to step (4.7);
(4.14) Set the request Drq to be dispatched to the first request in the scheduler's read list, set batching = 0 to begin a new read batch, and go to step (4.7);
(4.15) Set the request Drq to be dispatched to the first request in the scheduler's write list, set batching = 0 to begin a new write batch, and go to step (4.8);
(4.16) Set the request Drq to be dispatched to next_rq[write], set batching = 0 to begin a new write batch, and go to step (4.8).
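The dispatch policy of sub-steps (4.1)-(4.16) boils down to batched dispatch that prefers reads but forces a write batch once writes have been starved long enough. The sketch below keeps only that skeleton; `BATCH`, `MAX_STARVED`, and all names are assumed example values, and the per-request timeouts and next_rq bookkeeping are omitted.

```python
# Much-simplified sketch of step (4): batch dispatch with read preference and
# a write-starvation limit. Requests are opaque tokens here.

BATCH = 4        # requests dispatched per batch before re-deciding
MAX_STARVED = 2  # read batches tolerated while writes sit waiting


def dispatch_all(reads, writes):
    """Drain both queues, returning the dispatch order as (kind, req) pairs."""
    order, starved = [], 0
    while reads or writes:
        # steps (4.9)/(4.12): writes win only once starved long enough,
        # or when there is nothing left to read
        if writes and (starved >= MAX_STARVED or not reads):
            batch, kind, starved = writes, "W", 0
        else:
            batch, kind = reads, "R"
            starved += 1 if writes else 0
        for _ in range(min(BATCH, len(batch))):  # one batch: steps (4.7)/(4.8)
            order.append((kind, batch.pop(0)))
    return order


reads = list(range(10))         # ten queued reads
writes = list(range(100, 103))  # three queued writes
print(dispatch_all(reads, writes))
```

With these parameters, two read batches run before the waiting writes are dispatched as a batch, after which the remaining reads finish; this is the read/write separation that avoids mixed-mode interference on the SSD.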
In general, compared with the prior art, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) The invention solves the problem in existing methods that contiguous read requests either grow too large after merging or are not merged at all: by adopting steps (2.2), (2.4), (2.5), and (2.7), it addresses both the Noop algorithm's failure to merge read requests and the excessive merging of read requests in the CFQ, Deadline, and Anticipatory algorithms.
(2) The invention solves the problem of unnecessary merging of contiguous write requests in existing methods: by adopting steps (3.4), (3.5), (3.7), and (3.9), it avoids the merging of contiguous write requests performed by existing I/O scheduling algorithms.
(3) The invention fully exploits the different read and write performance of SSDs under different access modes: because sequential reads far outperform random reads on an SSD, read requests are queued in the linked list by arrival time and also sorted in the red-black tree by starting access address to construct sequentiality, and the sorted read requests undergo both forward and backward merging; because sequential and random writes perform essentially the same, write requests are only queued in the linked list by arrival time, are not sorted in a red-black tree, and undergo only backward merging.
(4) The invention fully exploits the rich parallelism of SSDs: when merging read and write requests, it takes into account that although the smallest read/write unit in an SSD is the page, the effective smallest unit may be the clustered page because of the SSD's internal parallelism. The clustered-page size is obtained through black-box testing, and merging guarantees that a merged request never exceeds it. Once a request reaches the clustered-page size, further merging brings no performance gain; on the contrary, it leaves the SSD idle and waiting, degrading performance.
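The black-box measurement in effect (4) can be illustrated with a small experiment: issue reads of increasing size and find the knee where bandwidth stops improving; that size is taken as the clustered-page size. A synthetic latency model stands in for a real SSD here (on real hardware one would time O_DIRECT reads instead), and all names and constants are illustrative assumptions.

```python
# Sketch of black-box clustered-page-size discovery against a synthetic device.

TRUE_CLUSTER = 64  # sectors; the device property hidden from the "tester"


def read_latency_us(sectors):
    """Synthetic device: one flat cost per clustered page touched."""
    pages = -(-sectors // TRUE_CLUSTER)   # ceiling division
    return 80 * pages                     # 80 us per clustered-page access


def find_cluster_page(max_sectors=512):
    """Double the read size until bandwidth (sectors per microsecond) stops
    improving; the last size that still helped is the clustered-page size."""
    size = 1
    while size < max_sectors:
        bw_now = size / read_latency_us(size)
        bw_next = (2 * size) / read_latency_us(2 * size)
        if bw_next <= bw_now:             # knee reached: no further gain
            return size
        size *= 2
    return size


print(find_cluster_page())
```

Below the knee, doubling the request size spreads the fixed access cost over more data; at the knee the request already saturates the device's internal parallelism, which is exactly why the scheduler caps merging there.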
Brief Description of the Drawings
Figure 1 shows the I/O path of solid-state disk accesses in the Linux operating system.
Figure 2 is a flow chart of the I/O scheduling method of the invention based on the distinct read and write characteristics of solid-state disks.
Figure 3 shows the conversion of a read bio request into a request in this scheduling method.
Figure 4 shows the conversion of a write bio request into a request in this scheduling method.
Figure 5 shows the dispatch of requests to the device driver layer in this scheduling method.
Detailed Description of the Embodiments
To make the object, technical solution, and advantages of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. Moreover, the technical features involved in the various embodiments described below may be combined with one another as long as they do not conflict.
As shown in Figure 1, an I/O request initiated by an upper-layer application through the system-call read/write module first passes through the virtual file system (VFS) layer, reaches the file system that actually needs to be accessed, then passes through the generic block layer and the I/O scheduling layer, and finally reaches the SCSI device driver layer before data can be requested from the underlying disk. The I/O scheduling method of the invention, based on the distinct read and write characteristics of solid-state disks, operates at the I/O scheduling layer and performs the following two functions:
1. In cooperation with the generic block layer, it encapsulates the bio requests issued by the file system layer into requests according to a given strategy and adds them to the scheduling layer's request queue;
2. It dispatches the requests in the I/O scheduling layer's request queue to the request queue of the SCSI device driver layer below according to a given strategy. The read requests arriving at the SSD thus exhibit stronger sequentiality, the SSD's internal parallelism is exploited reasonably and moderately, and mutual interference between read and write requests is avoided, improving the I/O performance of the whole SSD system.
The overall idea of the invention is as follows. It fully accounts for the mutual interference of read and write requests in an SSD by separating them and processing reads and writes in batches. Because sequential reads far outperform random reads, read requests are queued in the linked list by arrival time and also sorted in the red-black tree by starting access address to construct sequentiality, and the sorted read requests undergo forward and backward merging. Because sequential and random writes perform essentially the same, write requests are only queued in the linked list by arrival time, need not be sorted in a red-black tree, and undergo only backward merging. Because the SSD offers an optimal exploitable degree of parallelism, merging stops once a merged request reaches the clustered-page size: a request of that size already fully exploits the SSD's rich parallelism, and further merging would bring no performance gain while leaving the SSD idle and waiting, degrading performance.
The invention is implemented in the I/O scheduling layer of the Linux kernel and performs the following two functions: 1. in cooperation with the generic block layer, it encapsulates the bio requests issued by the file system layer into requests according to a given strategy and adds them to the scheduling layer's request queue; 2. it dispatches the requests in the I/O scheduling layer's request queue to the request queue of the SCSI device driver layer below according to a given strategy. The read requests arriving at the SSD thus exhibit stronger sequentiality, the SSD's internal parallelism is exploited reasonably and moderately, and mutual interference between read and write requests is avoided, improving the I/O performance of the whole SSD system.
As shown in Figure 2, the IO scheduling method of the present invention based on the read and write characteristics of a solid state disk comprises the following steps:
(1) Receive a bio request that originates from an upper-layer application and has been processed by the file system (in this embodiment, the ext4 file system), and determine whether the bio request is a read request or a write request; if it is a read request, go to step (2); if it is a write request, go to step (3);
(2) Determine whether the read request can be merged with an adjacent read request. If it cannot, go directly to step (4). If it can, and the merged size does not exceed the cluster-page size, merge it with the adjacent read request; the merged read request may in turn be merged with its own adjacent read requests, again ensuring that the merged size does not exceed the cluster-page size. Place the final merged read request into the read queue and the read red-black tree so that the system can read the data, then go to step (4).
(3) Determine whether the write request can be merged with the cached write request; if so, merge them, insert the merged write request at the tail of the scheduler queue, and go to step (4). Otherwise, determine whether it can be merged with a write request in the hash table; if so, merge them, insert the merged write request at the tail of the scheduler queue, and go to step (4);
(4) Dispatch the requests in the scheduler queue to the request queue of the device driver; the process ends.
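Steps (1)-(4) above can be sketched as a small scheduler skeleton. This is an illustrative simplification, not the patent's kernel code: the names `Bio`, `Scheduler`, `add_bio`, and `dispatch_all` are assumptions, the merge steps (2)/(3) are collapsed to a plain append, and real requests live in kernel request queues rather than Python lists.

```python
class Bio:
    """A toy stand-in for a block-layer bio request."""
    def __init__(self, start, size, is_read):
        self.start, self.size, self.is_read = start, size, is_read

class Scheduler:
    def __init__(self):
        self.read_queue = []    # stands in for the read list + read red-black tree
        self.write_queue = []   # FIFO write list
        self.dispatched = []    # stands in for the device-driver request queue

    def add_bio(self, bio):
        # step (1): route by request type; steps (2)/(3) would try to merge first
        queue = self.read_queue if bio.is_read else self.write_queue
        queue.append(bio)

    def dispatch_all(self):
        # step (4): hand queued requests down to the device driver
        self.dispatched += self.read_queue + self.write_queue
        self.read_queue, self.write_queue = [], []

sched = Scheduler()
sched.add_bio(Bio(0, 8, True))
sched.add_bio(Bio(64, 8, False))
sched.dispatch_all()
```

The point of the skeleton is only the separation: reads and writes never share a queue, so they can later be batched independently.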
As shown in Figure 3, step (2) of the method of the present invention comprises the following sub-steps:
(2.1) For the read request, determine whether a previously merged read request exists; if so, go to step (2.2), otherwise go to step (2.3);
(2.2) Determine whether the read request can be merged with the previously merged read request. If it cannot be merged, go to step (2.3); if it can be merged forward, go to step (2.4); if it can be merged backward, go to step (2.5). Specifically, if the read request is contiguous in address with the preceding read request and the merged request size does not exceed the cluster-page size, a forward merge is possible; if its address is contiguous with that of the following read request and the merged read request size does not exceed the cluster-page size, a backward merge is possible;
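The two merge tests of step (2.2) reduce to an address-contiguity check plus the cluster-page cap. A hedged sketch, with addresses in sectors; `CLUSTER_PAGE_SECTORS` is an assumed device-specific constant, since the patent does not fix a value:

```python
CLUSTER_PAGE_SECTORS = 64  # assumed cluster-page size; device-specific in practice

def can_front_merge(bio_start, bio_size, req_start, req_size):
    # the incoming bio ends exactly where the queued request begins,
    # and the merged request still fits within one cluster page
    return (bio_start + bio_size == req_start
            and bio_size + req_size <= CLUSTER_PAGE_SECTORS)

def can_back_merge(bio_start, bio_size, req_start, req_size):
    # the incoming bio begins exactly where the queued request ends
    return (req_start + req_size == bio_start
            and bio_size + req_size <= CLUSTER_PAGE_SECTORS)
```

The size cap is what keeps merging "moderate": a request at the cluster-page size already saturates the SSD's internal parallelism, so larger merges are refused.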
The advantage of this step is that, by caching the previously merged request, the latency of searching for a request that can be merged with the current read request is greatly shortened at almost no extra cost, improving merge efficiency and performance. In addition, because the merged request never exceeds the cluster-page size, the abundant internal parallelism of the SSD can be fully exploited without over-merging, which would leave the SSD device idle and waiting; this further improves performance;
(2.3) Determine whether the read request can be merged backward with a request in the hash list; if so, go to step (2.5), otherwise go to step (2.6);
(2.4) Complete the merge of the read request into the forward-mergeable read request Rfrq, and determine whether the resulting Rfrq can be merged forward again with Rprev, the read request that precedes Rfrq in the scheduler's read red-black tree sorted by starting access address. If so, go to step (2.7); otherwise go to step (2.8);
(2.5) Complete the merge of the read request into the backward-mergeable request Rbrq, and determine whether the resulting Rbrq can be merged backward again with Rnext, the request that follows Rbrq in the scheduler's read red-black tree sorted by starting access address. If so, go to step (2.9); otherwise go to step (2.10);
(2.6) Determine whether the read request can be merged forward with a request in the scheduler's read red-black tree; if so, go to step (2.4), otherwise go to step (2.11);
(2.7) Complete the forward merge of read request Rfrq with read request Rprev; delete the Rfrq request from the scheduler's read queue, the read red-black tree, and the generic block layer's hash list; adjust Rprev's position in the hash list according to the size of the merged read request; set the latest service time of the merged request to the smaller of the latest service times of Rfrq and Rprev; and finally record the merged request as the previously merged read request, ending this conversion. In other words, this step deletes the leftover copy of the absorbed read request, adjusts the service time and hash-list position of the merged read request, and records the address of the previously merged read request;
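The bookkeeping of steps (2.7)/(2.9), where the surviving request absorbs the other and keeps the earlier (smaller) latest-service time so the merged request cannot miss either original deadline, can be sketched as follows. The dict fields and the `queue` list are illustrative stand-ins for the kernel request structures.

```python
def merge_requests(survivor, absorbed, queue):
    # the survivor covers both address ranges after the merge
    survivor["start"] = min(survivor["start"], absorbed["start"])
    survivor["size"] += absorbed["size"]
    # keep the EARLIER latest-service time of the two originals
    survivor["deadline"] = min(survivor["deadline"], absorbed["deadline"])
    queue.remove(absorbed)  # delete the leftover copy, as in step (2.7)
    return survivor

queue = [{"start": 0, "size": 8, "deadline": 120},
         {"start": 8, "size": 8, "deadline": 100}]
merged = merge_requests(queue[0], queue[1], queue)
```

Taking the minimum of the two deadlines is what makes merging safe with respect to the anti-starvation guarantee described later.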
The advantage of this step is that forward merging aggregates small requests into large ones, greatly improving system throughput;
(2.8) Set the previously merged read request to Rfrq, and update Rfrq's starting address in the read red-black tree;
(2.9) Complete the backward merge of request Rbrq with request Rnext; delete the Rnext request from the scheduler's read queue, the read red-black tree, and the generic block layer's hash list; update the position of the Rbrq request in the hash list; set the service time of the merged read request to the earlier of the service times of Rbrq and Rnext; mark Rbrq as the previously merged read request; and go to step (2.14);
The advantage of this step is that backward merging aggregates small requests into large ones, greatly improving system throughput;
(2.10) Mark Rbrq as the previously merged read request, and update the position of the Rbrq request in the hash list; this conversion ends;
(2.11) Allocate a request Rnrq for the bio request and initialize Rnrq with it, then determine whether Rnrq has the mergeable attribute; if so, go to step (2.12), otherwise go to step (2.13);
(2.12) Add the Rnrq request to the hash list, and determine whether a previously merged read request exists. If one exists, go directly to step (2.13); otherwise first mark Rnrq as the previously merged read request and then go to step (2.13);
(2.13) Add the Rnrq request to the tail of the read queue in the scheduler and to the read red-black tree;
The advantages of this step are: a latest service time is set for each read request and timed-out read requests are served first, so that no read request goes unserved indefinitely; in addition, read requests are not only queued in a linked list by arrival time but also sorted in a red-black tree by starting access address, constructing sequentiality among read requests so as to fully exploit the read performance of sequential mode, which is far superior to that of random mode;
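The dual indexing of read requests described above, one FIFO structure for arrival order and deadlines, one address-ordered structure for sequential dispatch, can be illustrated as follows. The kernel uses a red-black tree for O(log n) ordered insertion; this sketch substitutes Python's `bisect` over a sorted list to show the same idea, and all names are illustrative.

```python
import bisect
from collections import deque

fifo = deque()   # arrival order: used for the latest-service-time (deadline) check
by_addr = []     # start addresses kept sorted: stand-in for the read red-black tree

def queue_read(start, size):
    req = {"start": start, "size": size}
    fifo.append(req)                 # queued by arrival time
    bisect.insort(by_addr, start)    # and indexed by starting access address
    return req

# requests arriving in a random address order
for s in (96, 0, 32):
    queue_read(s, 8)
```

Dispatching in `by_addr` order turns a random arrival pattern into a sequential access pattern on the SSD, while `fifo` still lets the scheduler find the oldest, most deadline-critical request.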
As shown in Figure 4, step (3) of the method of the present invention comprises the following sub-steps:
(3.1) For the write request, determine whether a previously merged write request exists; if so, go to step (3.2), otherwise go to step (3.3);
(3.2) Determine whether the bio request can be merged with the previously merged write request; if it can be merged forward, go to step (3.4); if it can be merged backward, go to step (3.5); if it cannot be merged, go to step (3.3);
The advantage of this step is that, by caching the previously merged request, the latency of searching for a request that can be merged with the current bio request is greatly shortened at almost no extra cost, improving merge efficiency and performance. In addition, because the merged request never exceeds the cluster-page size, the abundant internal parallelism of the SSD can be fully exploited without over-merging, which would leave the SSD device idle and waiting; this further improves performance;
(3.3) Determine whether the bio request can be merged backward with a request in the hash list; if so, go to step (3.5), otherwise go to step (3.6);
(3.4) Merge the bio request forward into the previously merged write request, and determine whether the result can continue to be merged with Wprev, the request that precedes it in the scheduler's write list. If it can be merged with Wprev, go to step (3.7); otherwise go to step (3.8);
(3.5) Complete the backward merge of the bio request into the backward-mergeable request Wbrq, and determine whether the resulting Wbrq can continue to be merged with Wnext, the request that follows Wbrq in the scheduler's write list. If so, go to step (3.9); otherwise go to step (3.10);
The advantage of this step is that backward merging aggregates small requests into large ones, greatly improving system throughput;
(3.6) Allocate a new request Wnrq for the bio request and initialize Wnrq with it, then determine whether Wnrq has the mergeable attribute; if so, go to step (3.11), otherwise go to step (3.12);
(3.7) Complete the merge of Wfrq with Wprev, delete Wfrq from the scheduler's write list and from the hash list, adjust Wprev's position in the hash list, and finally mark Wprev as the previously merged write request; the bio request has thus been converted into a write request, and this conversion ends;
(3.8) Mark Wfrq as the previously merged write request; the bio request has thus been converted into a write request, and this conversion ends;
(3.9) Complete the merge of Wbrq with Wnext, delete Wnext from the scheduler's write list and from the hash list, adjust Wbrq's position in the hash list, and finally mark Wbrq as the previously merged write request; the bio request has thus been converted into a write request, and this conversion ends;
(3.10) Adjust Wbrq's position in the hash list and mark Wbrq as the previously merged write request; the bio request has thus been converted into a write request, and this conversion ends;
(3.11) Add Wnrq to the hash list, and determine whether a previously merged write request exists. If it does not, mark Wnrq as the previously merged write request and go to step (3.12); otherwise go directly to step (3.12);
(3.12) Add Wnrq to the tail of the scheduler's write list, thereby converting the bio request into a write request;
The advantage of this step is that it takes full account of the fact that sequential-write and random-write performance in an SSD are essentially the same: write requests are queued in the linked list by arrival time only, and there is no need to sort them in a red-black tree by starting access address to construct sequentiality, which eliminates the performance cost of sorting and improves performance;
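A minimal sketch of the write path described in step (3): writes are appended in arrival order only, and a hash table keyed by a request's end sector makes the backward-merge probe of steps (3.3)/(3.5) an O(1) lookup. The names `write_list` and `merge_hash` are assumptions, and the cluster-page size cap is omitted here for brevity.

```python
write_list = []   # FIFO: arrival order only, no red-black tree for writes
merge_hash = {}   # end_sector -> request, for the backward-merge lookup

def queue_write(start, size):
    prev = merge_hash.pop(start, None)  # is there a queued write ending where this bio starts?
    if prev is not None:                # backward merge, as in step (3.5)
        prev["size"] += size
        merge_hash[prev["start"] + prev["size"]] = prev  # re-key by the new end sector
        return prev
    req = {"start": start, "size": size}
    write_list.append(req)              # FIFO append only; writes are never sorted
    merge_hash[start + size] = req
    return req

queue_write(0, 8)
queue_write(8, 8)    # contiguous: merges into the first write
queue_write(64, 8)   # random address: still just appended, never sorted
```

Because random and sequential writes perform alike on the SSD, skipping the sorted index for writes saves its maintenance cost with no loss of throughput.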
As shown in Figure 5, step (4) of the method of the present invention comprises the following sub-steps:
(4.1) Determine whether request dispatching from the scheduler queue is currently in the read-batching phase or the write-batching phase. If it is in the read-batching phase, go to step (4.2); if it is in the write-batching phase, go to step (4.3). Specifically, this is determined from the value of the next_rq[dir] variable at run time. The advantage of this step is that read requests and write requests are dispatched in batches, avoiding the performance degradation caused by mutual interference between reads and writes in the SSD;
(4.2) Determine whether the total number of requests dispatched in the read-batching phase has exceeded a threshold. If it has, go to step (4.4); otherwise go to step (4.5). In the present invention, the threshold lies in the range (0, 16];
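The batch accounting of steps (4.1)-(4.4) can be sketched as a simple phase function. `FIFO_BATCH = 16` is an assumed value at the upper end of the patent's (0, 16] range, and the write-starvation handling of step (4.9) is deliberately omitted here; this only shows how a batch quota keeps reads and writes from interleaving.

```python
FIFO_BATCH = 16  # assumed batch threshold; the patent only fixes the range (0, 16]

def next_phase(phase, batching, reads_pending, writes_pending):
    """Pick the dispatch direction for the next request (simplified sketch)."""
    if batching < FIFO_BATCH:
        return phase                  # current batch still has quota: keep going
    # quota used up: create a new batch, preferring reads (they get priority)
    if reads_pending:
        return "read"
    return "write" if writes_pending else None
```

Keeping a whole batch in one direction is what avoids the read/write interference inside the SSD that step (4.1) warns about.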
(4.3) Determine whether the total number of requests dispatched in the write-batching phase has reached its upper limit; if so, go to step (4.4), otherwise go to step (4.6);
(4.4) Create a new batching phase and determine whether a corresponding read request exists in the scheduler queue; if so, go to step (4.9), otherwise go to step (4.10);
(4.5) Set Drq, the request to be dispatched from the scheduler, to next_rq[read], and go to step (4.7);
(4.6) Set Drq, the request to be dispatched from the scheduler, to next_rq[write], and go to step (4.8);
(4.7) Dispatch Drq to the request queue of the device driver layer, delete Drq from the scheduler's read list and read red-black tree, set next_rq[read] to the request that follows Drq in the scheduler's read red-black tree, and finally count the operation; this request dispatch ends;
The advantage of this step is that, in the read-batch-dispatch phase, the next read request is obtained from the red-black tree, using the tree's ordering to construct sequentiality among read requests and fully exploiting the sequential-read performance of the SSD, which is far higher than its random-read performance;
(4.8) Dispatch Drq to the request queue of the device driver layer, delete Drq from the scheduler's write list, set next_rq[write] to the request that follows Drq in the scheduler's write list, and finally count the operation; this request dispatch ends;
The advantage of this step is that, in the write-batch-dispatch phase, the next write request is obtained from the linked list; there is no need to construct sequentiality among writes, which saves the cost of sorting and exploits the fact that random-write performance in an SSD is essentially the same as sequential-write performance;
(4.9) Determine whether any request exists in the scheduler's write list. If not, go to step (4.11). If so, further determine whether a write request has timed out or the number of times write requests have been starved has reached its upper limit; if either condition holds, go to step (4.12), otherwise go to step (4.11);
The advantage of this step is that a batch-dispatch phase for read requests is created preferentially, giving read requests higher priority to satisfy users' stricter response-time requirements for reads; in addition, to prevent write requests from being starved, a latest service time is also set for write requests, and when a write request times out, the timed-out write request is dispatched first;
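The anti-starvation rule behind step (4.9) can be reduced to one predicate: force a write batch once the oldest write has passed its deadline, or once writes have been skipped too many times in favor of reads. The limit value and all names here are assumptions; the patent only requires such an upper bound to exist.

```python
WRITES_STARVED = 2  # assumed cap on how often reads may preempt pending writes

def must_service_writes(now, oldest_write_deadline, starved):
    # oldest_write_deadline is None when the write list is empty
    timed_out = oldest_write_deadline is not None and now >= oldest_write_deadline
    return timed_out or starved >= WRITES_STARVED
```

The scheduler would increment `starved` each time it starts a read batch while writes are pending, and reset it to 0 when a write batch begins, as in step (4.12).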
(4.10) Determine whether any request exists in the scheduler's write list. If so, go to step (4.12); otherwise there is no request to dispatch in the scheduler, and this request dispatch ends;
(4.11) Determine whether next_rq[read] exists. If so, go to step (4.13); otherwise go to step (4.14);
(4.12) Prepare for the batch-processing phase of write requests by resetting the write-starvation count to 0. Then determine whether a write request has timed out or next_rq[write] is empty; if either condition holds, go to step (4.15), otherwise go to step (4.16);
(4.13) Set the request Drq to be dispatched to next_rq[read], set batching = 0, begin creating a read-batching phase, and go to step (4.7);
(4.14) Set the request Drq to be dispatched to the first request in the scheduler's read list, set batching = 0, begin creating a read-batching phase, and go to step (4.7);
(4.15) Set the request Drq to be dispatched to the first request in the scheduler's write list, set batching = 0, begin creating a write-batching phase, and go to step (4.8);
(4.16) Set the request Drq to be dispatched to next_rq[write], set batching = 0, begin creating a write-batching phase, and go to step (4.8).
Those skilled in the art will readily understand that the foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510827493.0A CN106775438B (en) | 2015-11-25 | 2015-11-25 | An I/O Scheduling Method Based on the Read and Write Characteristics of Solid State Disk |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106775438A CN106775438A (en) | 2017-05-31 |
| CN106775438B true CN106775438B (en) | 2019-08-30 |
Family
ID=58963938
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |