HK1188854A - Scheduling of I/O writes in a storage environment
Description
Technical Field
The present invention relates to computer networks, and more particularly to computing data storage systems.
Background
As computer memory storage and data bandwidth increase, so do the amount and complexity of the data that enterprises manage. Large distributed storage systems, such as data centers, typically support many business operations. A distributed storage system may be coupled to a number of client computers interconnected by one or more networks. If any portion of the distributed storage system performs poorly or becomes unavailable, company operations may be impaired or stopped altogether. Such distributed systems therefore attempt to maintain high standards for data availability and high-performance functionality.
In the storage system itself, the file system and storage-level input/output (I/O) schedulers typically determine the order of read and write operations, as well as the manner in which they are performed. For example, non-sequential read and write operations may be more costly (e.g., in terms of time and/or resources) to perform than sequential read and write operations on a storage device. The I/O scheduler may therefore attempt to reduce the number of out-of-order operations. In addition, the I/O scheduler may provide other functionality, such as starvation prevention, request coalescing, and inter-process fairness.
Read and write response times, among other characteristics, can differ considerably between storage devices. Such differences may be characteristic of the underlying technology itself. Thus, the techniques and mechanisms associated with the selected data storage devices may determine the methods used for efficient I/O scheduling. For example, many current algorithms were developed for systems that utilize Hard Disk Drives (HDDs). HDDs contain one or more rotating disks, each covered with magnetic media. The disks rotate at several thousand revolutions per minute. In addition, an electromagnetic actuator is responsible for positioning magnetic read/write heads over the rotating disks. The mechanical and electromechanical design of a device affects its I/O characteristics. Unfortunately, friction, wear, vibration, and mechanical misalignment can create reliability problems and affect the I/O characteristics of an HDD. Many current I/O schedulers are designed with the input/output (I/O) characteristics of HDDs in mind.
An example of another type of storage medium is the Solid State Drive (SSD). In contrast to HDDs, SSDs utilize solid-state memory, rather than magnetic media, to hold persistent data. Solid-state memory may comprise flash memory cells. Flash memory has many features that differ from those of hard disk drives. For example, flash memory cells are typically erased in large blocks before being rewritten or reprogrammed. Flash memory is also typically organized in complex arrangements, such as dies, packages, planes, and blocks. The size and parallelism of the chosen arrangement, the wear of the flash memory over time, and the interconnect and transfer speeds of the devices all vary. Additionally, such devices may include a Flash Translation Layer (FTL) that manages storage on the device. The algorithms employed by the FTL vary, and may likewise affect the behavior and/or performance of the device. Consequently, systems that use flash-based SSDs for storage while relying on I/O schedulers designed for devices with different characteristics, such as hard disk drives, often fail to achieve high performance and predictable latency.
In view of the foregoing, there is a need for a system and method for efficiently scheduling read and write operations among a plurality of storage devices.
Disclosure of Invention
Various embodiments of a computer system and method for efficiently scheduling read and write operations among a plurality of solid-state storage devices are disclosed.
In one embodiment, a computer system includes a plurality of client computers configured to transmit read requests and write requests over a network to one or more data storage arrays coupled to the network to receive those requests. Contemplated are data storage arrays comprising a plurality of storage locations on a plurality of storage devices. In various embodiments, the storage devices are configured in a Redundant Array of Independent Drives (RAID) arrangement for data storage and protection. The storage devices may include solid-state memory technology for data storage, such as flash memory cells. Characteristics of the corresponding storage devices are used to schedule I/O requests to the storage devices. Characteristics may include predicted response time to an I/O request, device age, any corresponding cache size, access rates, error rates, current I/O requests, completed I/O requests, and so forth.
In one embodiment, an I/O scheduler is configured to receive read requests and write requests and schedule them for processing by a plurality of storage devices. Depending on the operations being serviced, the storage devices may exhibit varying latencies, and may also exhibit unscheduled or unpredictable behavior at various times that causes performance to differ from what is expected or desired. In various embodiments, this behavior corresponds to a device that is operating properly (i.e., not in an error state) but performing below expected or desired levels of latency and/or throughput. Such behavior and performance may be referred to as "variable performance" behavior. For example, flash-based storage technologies may exhibit these variable-performance behaviors. A storage controller is contemplated that is configured to receive requests targeting the data storage medium, the requests including a first category of operation and a second category of operation. The controller is further configured to schedule requests of the first category for immediate processing by the plurality of storage devices and to queue requests of the second category for later processing by the plurality of storage devices. Operations of the first category correspond to operations with lower expected latency, and operations of the second category correspond to operations with higher expected latency. The low-latency operations may correspond to read operations, and the high-latency operations may include write operations. Embodiments are also contemplated in which, after queuing a plurality of requests corresponding to the second category of operation, the storage controller is configured to stop processing requests corresponding to the first category of operation and to process only those requests corresponding to the second category.
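By way of illustration only, the following Python sketch shows one way the two-category scheduling described above might be realized. The class and parameter names (e.g., TwoCategoryScheduler, write_batch_threshold) are assumptions introduced for this example and are not part of the disclosed embodiments.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    kind: str  # "read" (first category) or "write" (second category)

class Device:
    """Stand-in for a storage device; issue() would submit to hardware."""
    def issue(self, request: Request) -> None:
        print(f"issuing {request.kind}")

class TwoCategoryScheduler:
    """Reads are scheduled for immediate processing; writes are queued
    for later processing and drained as a batch once enough accumulate."""
    def __init__(self, device: Device, write_batch_threshold: int = 64):
        self.device = device
        self.write_queue: deque[Request] = deque()
        self.write_batch_threshold = write_batch_threshold

    def submit(self, request: Request) -> None:
        if request.kind == "read":
            self.device.issue(request)        # lower expected latency
        else:
            self.write_queue.append(request)  # higher expected latency
        # Stop processing first-category requests and process only the
        # queued second-category requests once the threshold is reached.
        if len(self.write_queue) >= self.write_batch_threshold:
            self.drain_writes()

    def drain_writes(self) -> None:
        while self.write_queue:
            self.device.issue(self.write_queue.popleft())
```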
These and other embodiments will become apparent from the following description and the accompanying drawings.
Drawings
FIG. 1 is a generalized block diagram illustrating one embodiment of a network architecture.
FIG. 2 depicts a conceptual model according to one embodiment of a computing system.
FIG. 3 is a generalized flow diagram illustrating one embodiment of a method of adjusting I/O scheduling to reduce unpredictable variable I/O response times across a data storage subsystem.
FIG. 4 is a generalized flow diagram illustrating one embodiment of a method of segregating operations issued to a storage device.
FIG. 5 is a generalized flow diagram illustrating one embodiment of a method of creating a model to characterize the behavior of storage devices in a storage subsystem.
FIG. 6 is a generalized block diagram illustrating one embodiment of a storage subsystem.
FIG. 7 is a generalized block diagram illustrating one embodiment of a device unit.
FIG. 8 is a generalized block diagram illustrating one embodiment of a state table.
FIG. 9 is a generalized flow diagram illustrating another embodiment of a method of adjusting I/O scheduling to reduce unpredictable variable I/O response times across a data storage subsystem.
FIG. 10 is a generalized flow diagram illustrating one embodiment of a method for maintaining low-latency read operations on shared data storage.
FIG. 11 is a generalized flow diagram illustrating one embodiment of a method to reduce the number of storage devices exhibiting variable I/O response times.
FIG. 12 is a generalized flow diagram illustrating another embodiment of a method for maintaining low-latency read operations on shared data storage.
While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, one of ordinary skill in the art will realize that the invention may be practiced without these specific details. In some instances, well-known circuits, structures, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the present invention.
Referring to FIG. 1, a generalized block diagram of one embodiment of a network architecture 100 is shown. As described further below, one embodiment of network architecture 100 includes client computer systems 110a-110b interconnected by a network 180 and connected to data storage arrays 120a-120b. The network 180 may be coupled to a second network 190 through a switch 140. Client computer system 110c is coupled to client computer systems 110a-110b and data storage arrays 120a-120b via network 190. Additionally, the network 190 may be coupled to the Internet 160 or other external network through a switch 150.
Note that in alternative embodiments, the number and variety of client computers and servers, switches, networks, data storage arrays, and data storage devices is not limited to those shown in FIG. 1. At various times, one or more clients may operate offline. In addition, during operation, the number and types of individual client computer connections may change as users connect, disconnect, and reconnect to network architecture 100. Further, while the present description generally discusses network-attached storage, the systems and methods described herein are also applicable to direct-attached storage systems, which may include a host operating system configured to implement one or more aspects of the described methods. Numerous such alternatives are possible and contemplated. A further description of the various components shown in FIG. 1 is provided briefly below. First, an overview of some of the features provided by the data storage arrays 120a-120b is described.
In network architecture 100, each data storage array 120a-120b may be used for the sharing of data between different servers and computers, such as client computer systems 110a-110c. In addition, the data storage arrays 120a-120b may be used for disk mirroring, backup and restore, archival and retrieval of archived data, and data migration from one storage device to another. In an alternative embodiment, one or more client computer systems 110a-110c may be linked to each other via a fast Local Area Network (LAN) to form a cluster. Such clients may share storage resources, such as cluster shared volumes that exist within one of the data storage arrays 120a-120b.
Each data storage array 120a-120b includes a storage subsystem 170 for data storage. The storage subsystem 170 may include a plurality of storage devices 176a-176m. These storage devices 176a-176m may provide data storage services to client computer systems 110a-110c. Each of the storage devices 176a-176m utilizes particular techniques and mechanisms for data storage. The variety of techniques and mechanisms used within each storage device 176a-176m may be used, at least in part, to determine the algorithms for controlling and scheduling read and write operations with respect to each storage device 176a-176m. The logic used in these algorithms may be embodied in one or more of the base Operating System (OS) 132, the file system 134, one or more global I/O schedulers 178 within the storage subsystem controller 174, control logic within each storage device 176a-176m, and so forth. Additionally, the logic, algorithms, and control mechanisms described herein may comprise hardware and/or software.
Each storage device 176a-176m may be configured to receive read requests and write requests, and include a plurality of data storage locations, each addressable as a row and a column in an array. In one embodiment, the data storage locations within the storage devices 176a-176m may be arranged as logical redundant storage containers or RAID arrays (redundant arrays of inexpensive/independent disks). In some embodiments, each storage device 176a-176m may utilize a data storage technology different from that of a conventional Hard Disk Drive (HDD). For example, one or more of the storage devices 176a-176m may include or be further coupled to storage comprised of solid-state memory to hold persistent data. In other embodiments, one or more of the storage devices 176a-176m may include or be further coupled to storage utilizing other technologies, such as spin-torque transfer technology, Magnetoresistive Random Access Memory (MRAM) technology, shingled disks, memristors, phase change memory, or other storage technologies. These different storage methods and techniques may result in different I/O characteristics between storage devices.
In one embodiment, the included solid state memory comprises Solid State Drive (SSD) technology. Typically, SSD technology utilizes flash memory cells. As is well known in the art, flash memory cells retain a binary value according to the range of electrons trapped and stored in the floating gate. A fully erased flash memory cell holds no or very little electrons in the floating gate. A particular binary value, such as binary 1 for Single Level Cell (SLC) flash, is associated with an erased flash memory cell. Multi-level cell (MLC) flash has a binary value of 11 associated with an erased flash memory cell. A flash memory cell traps a given range of electrons in the floating gate after a voltage higher than a given threshold voltage is applied to the control gate within the flash memory cell. Thus, another particular binary value, such as binary 0 for SLC flash, is associated with the flash memory cell being programmed (written). An MLC flash cell may have one of multiple binary values associated with the programmed memory cell depending on the voltage applied to the control gate.
Differences in technology and mechanisms between HDD technology and SSD technology can result in differences in the input/output (I/O) characteristics of the data storage devices 176a-176m. Generally, SSD technology provides lower read access latency than HDD technology. However, the write performance of SSDs is typically slower than the read performance and can be greatly affected by the availability of free, programmable blocks within the SSD. Since the write performance of SSDs is significantly slower than their read performance, problems arise with respect to certain functions or operations that expect latencies similar to those of reads. In addition, long write latencies that affect read latencies may make scheduling more difficult. Thus, different algorithms may be used for I/O scheduling in the various data storage arrays 120a-120b.
In one embodiment, where different kinds of operations, such as read operations and write operations, have different latencies, the I/O scheduling algorithm may isolate these operations and process them separately for scheduling. For example, in one or more of the storage devices 176a-176m, the device itself may batch write operations, such as by storing the write operations in an internal cache. When these caches reach a given occupancy threshold, or at some other time, the corresponding storage devices 176a-176m may flush the caches. Often, these cache flushes introduce additional latency to reads and/or writes at unpredictable times, thereby making it difficult to efficiently schedule operations. The I/O scheduler may then utilize characteristics of the storage device, such as the size of the cache, or the measured idle time, in order to predict when such cache flushes will occur. Knowing the characteristics of each of the one or more storage devices 176a-176m may result in more efficient I/O scheduling. In one embodiment, the global I/O scheduler 178 may detect that a given one of the one or more storage devices 176a-176m exhibits a long response time to I/O requests at an unpredictable time. In response, global I/O scheduler 178 may schedule a given operation on the given device to cause the device to resume exhibiting the expected behavior. In one embodiment, such an operation may be a cache flush command, a trim command, an erase command, and the like. More details regarding I/O scheduling will be discussed below.
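As an illustration of the cache-flush prediction described above, the following sketch estimates when a device-initiated flush is likely, based on a known cache size, the amount of write data sent since the last flush, and a measured idle time. The class name, thresholds, and tracking scheme are assumptions for illustration, not taken from the disclosure.

```python
import time

class FlushPredictor:
    """Predicts whether a storage device is likely to initiate an
    internal cache flush, using its known cache size, the write data
    sent since the last flush, and the measured idle time."""

    def __init__(self, cache_size_bytes: int, idle_flush_secs: float = 2.0):
        self.cache_size_bytes = cache_size_bytes
        self.idle_flush_secs = idle_flush_secs  # assumed idle-flush behavior
        self.bytes_since_flush = 0
        self.last_op_time = time.monotonic()

    def note_write(self, nbytes: int) -> None:
        # Track the amount of write data sent to the device.
        self.bytes_since_flush += nbytes
        self.last_op_time = time.monotonic()

    def note_flush(self) -> None:
        # A scheduled flush empties the device cache.
        self.bytes_since_flush = 0

    def flush_likely(self) -> bool:
        # Flush is likely once the cache should be full, or once the
        # device has been idle long enough to flush on its own.
        idle = time.monotonic() - self.last_op_time
        return (self.bytes_since_flush >= self.cache_size_bytes
                or idle >= self.idle_flush_secs)
```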
Components of a network architecture
Again, as shown, network architecture 100 includes client computer systems 110a-110c interconnected to each other and to data storage arrays 120a-120b via networks 180 and 190. Networks 180 and 190 may include a variety of technologies including wireless connections, direct Local Area Network (LAN) connections, Wide Area Network (WAN) connections such as the internet, routers, storage area networks, ethernet, and so forth. Networks 180 and 190 may include one or more LANs that may also be wireless. Networks 180 and 190 may also include Remote Direct Memory Access (RDMA) hardware and/or software, transmission control protocol/internet protocol (TCP/IP) hardware and/or software, routers, repeaters, switches, grids, and/or others. In networks 180 and 190, protocols such as fibre channel, fibre channel over ethernet (FCoE), iSCSI, and the like may be utilized. Switch 140 may utilize protocols associated with networks 180 and 190. Network 190 may interface with a set of communication protocols for internet 160, such as Transmission Control Protocol (TCP) and Internet Protocol (IP), or TCP/IP. Switch 150 may be a TCP/IP switch.
Client computer systems 110a-110c represent a number of fixed or mobile computers, such as desktop Personal Computers (PCs), servers, server farms, workstations, laptop computers, handheld computers, Personal Digital Assistants (PDAs), smart phones, and the like. In general, the client computer systems 110a-110c include one or more processors, each including one or more processor cores. Each processor core includes circuitry to execute instructions according to a predefined general-purpose instruction set. For example, the x86 instruction set architecture may be selected. Alternatively, any other general-purpose instruction set architecture may be selected. The processor cores may access cache memory subsystems for data and computer program instructions. The cache memory subsystems may be coupled to a memory hierarchy comprising Random Access Memory (RAM) and storage.
Each processor core and memory hierarchy within a client computer system may be connected to a network interface. In addition to hardware components, each client computer system 110a-110c may include a base Operating System (OS) stored within the memory hierarchy. The base OS may represent any of a variety of operating systems (e.g., DART, among others). Thus, the base OS is operable to provide various services to end users and to provide a software architecture operable to support the execution of various programs. In addition, each client computer system 110a-110c may include a hypervisor for supporting Virtual Machines (VMs). It is well known to those skilled in the art that virtualization may be utilized in desktop computers and servers to completely or partially separate software, such as an OS, from the hardware of the system. Virtualization may provide end users with the illusion of running multiple OSs on the same machine, each OS having its own resources and having access to logical storage entities (e.g., LUNs) established on storage devices 176a-176m within each data storage array 120a-120b.
Each data storage array 120a-120b may be used for the sharing of data between different servers, such as client computer systems 110a-110c. Each data storage array 120a-120b includes a storage subsystem 170 for data storage. The storage subsystem 170 may include a plurality of storage devices 176a-176m. These storage devices 176a-176m may all be SSDs. Controller 174 may include logic to process received read/write requests. For example, at least the algorithms briefly described above may be implemented in the controller 174. Random Access Memory (RAM) 172 may be used to batch operations, such as received write requests. In various embodiments, non-volatile memory (e.g., NVRAM) may be used when write operations (or other operations) are batched.
The base OS 132, file system 134, any OS drivers (not shown), and other software stored in storage medium 130 may provide functions for providing access to files, and the management of such functions. The base OS 132 and OS drivers may contain instructions stored on the storage medium 130 that are executable by processor 122 to perform one or more memory access operations in the storage subsystem 170 corresponding to a received request. The system shown in FIG. 1 generally includes one or more file servers and/or block servers.
Each data storage array 120a-120b may be connected to a network 180 using a network interface 124. Similar to client computer systems 110a-110c, in one embodiment, the functionality of network interface 124 may be embodied on a network adapter card. The functionality of the network interface 124 may be implemented using both hardware and software. Random Access Memory (RAM) and Read Only Memory (ROM) may be included on the network card implementation of network interface 124. One or more Application Specific Integrated Circuits (ASICs) may be used to provide the functionality of the network interface 124.
In one embodiment, a data storage model may be built that attempts to optimize I/O performance. In one embodiment, the model is based at least in part on characteristics of storage devices within the storage system. For example, in a storage system utilizing solid state storage technology, characteristics of a particular device may be used to build a model about the device, which in turn may be used to inform a corresponding I/O scheduling algorithm. For example, if a particular storage device being used exhibits a relatively high write latency compared to a read latency, such characteristics may be considered in scheduling operations. Note that what is considered relatively high or low may vary with a given system, the type of data processed, the amount of data processed, the timing of the data, and so forth. Generally, the system can be programmed to determine what constitutes a low or high latency, and/or what constitutes a significant difference between the two.
In general, any model built for a device or computing system will be incomplete. In a real system, there are typically too many variables to fully model the system. In some cases, incomplete but still valuable models can be built. As described more fully below, embodiments are described herein in which storage devices are modeled based on their characteristics. In various embodiments, I/O scheduling is performed based on certain predictions as to how a device behaves. From knowledge of the characteristics of the devices, certain device behaviors are more predictable than others. To more efficiently schedule operations for optimal I/O performance, greater control over the behavior of the system is desired. Unexpected or unpredictable device behavior makes it more difficult to schedule operations. Algorithms are therefore built that attempt to minimize unpredictable or unexpected behavior in the system.
FIG. 2 provides a conceptual illustration of a device or system being modeled, and approaches for minimizing unpredictable behavior within the device or system. In a first block 200, an ideal scenario is depicted. Shown in block 200 is a system 204 and a model 202 of that system. In one embodiment, the system may be a single device. Alternatively, the system may comprise many devices and/or components. As described above, the model 202 may not be a complete model of the system 204 it seeks to model. Nevertheless, the model 202 captures behaviors that are meaningful to it. In one embodiment, the model 202 seeks to model a computing storage system. In the ideal scenario 200, the actual behavior of the system 204 is "aligned" with the behavior of the model 202. In other words, the behavior of the system 204 is generally consistent with those behaviors the model 202 seeks to capture. While the system behavior 204 is consistent with the behavior of the model 202, the system behavior is generally more predictable. Thus, scheduling of operations (e.g., read operations and write operations) within the system may be performed more effectively.
For example, if it is desired to optimize read response times, reads may be scheduled so that they are serviced in a timely manner when other behaviors of the system are relatively predictable. On the other hand, if the system behavior is relatively unpredictable, then confidence in the ability to schedule those reads so that they deliver results when needed is reduced. Block 210 illustrates a scenario in which the system behavior (smaller circle) is not aligned with the behavior of the model of the system (larger circle). In this case, the system exhibits behavior outside the scope of the model. Thus, the system behavior is less predictable and the scheduling of operations may become less effective. For example, if solid-state storage devices are used in the storage system and these devices independently initiate actions that cause them to service requests with greater (or unexpected) latency, then any operations that may have been scheduled for such a device may also experience greater or unexpected latency. One example of such a device-initiated action is an internal cache flush.
To address the issue of unexpected or unscheduled system behavior and the corresponding variable performance, the model built may include actions it may take to restore the system to a less uncertain state. In other words, if the system begins to exhibit behavior that reduces the model's ability to predict system behavior, the model has incorporated within it certain actions it may take to restore the system to a state in which certain unexpected behavior is eliminated or becomes unlikely. In the example shown, an action 212 is shown that attempts to "move" the system to a state more closely aligned with the model. Action 212 may be referred to as a "reactive" action or operation because it is performed in response to detecting system behavior outside of the model. After performing action 212, a more desirable state 220 may be reached.
While it is desirable to build models that can react to unpredictable behavior and thereby move the system to a more desirable state, the existence of these unpredictable behaviors can still interfere with efficient scheduling. Accordingly, it is desirable to minimize the occurrence of unexpected behaviors or events. In one embodiment, a model is built that includes actions or operations to avoid or reduce the occurrence of unexpected behaviors. These actions may be referred to as "proactive" actions or operations, as they are typically performed proactively in order to avoid the occurrence of certain actions or events, or to change the timing of certain actions or events. Block 230 in FIG. 2 illustrates a scenario in which the system behavior (smaller circle) is within the behavior of the model (larger circle). Nonetheless, the model takes action 232 to move the system behavior in a manner that keeps it within the model and may align it more closely. The system behavior in block 230 may be viewed as approaching a state in which it exhibits behavior outside the model. In this case, the model may have some basis for believing the system is approaching such a state. For example, if the I/O scheduler has transmitted many write operations to a given device, the scheduler may anticipate that the device may perform an internal cache flush operation at some point in the future. Rather than waiting for such an event to occur, the scheduler may proactively schedule a cache flush operation for the device so that the cache flush occurs at a time chosen by the scheduler. Alternatively, or in addition to the above, such a proactive operation may be performed at any time. While a cache flush still occurs, its occurrence is not unexpected; it has become part of the scheduler's overall schedule and can therefore be managed in a more efficient and intelligent manner. After this proactive action 232 is taken, the system is generally considered to be in a more predictable state 240. This is because the cache flush was scheduled and performed, and the device is therefore less likely to autonomously initiate an internal cache flush (i.e., its cache has already been flushed). By combining both reactive and proactive actions or operations within the model, higher system predictability may be achieved, and improved scheduling may likewise be achieved.
Referring now to FIG. 3, one embodiment of a method 300 for I/O scheduling to reduce unpredictable behavior is illustrated. The components included in the network architecture 100 and data storage arrays 120a-120b described above may generally operate in accordance with the method 300. The steps in this embodiment are shown in sequence. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in another embodiment, some steps may not be present.
At block 302, the I/O scheduler schedules read and write operations for one or more storage devices. In various embodiments, the I/O scheduler may maintain a separate queue (either physically or logically) for each storage device. In addition, the I/O scheduler may include a separate queue for each class of operation supported by the corresponding storage device. For example, the I/O scheduler may maintain at least a separate read queue and a separate write queue for the SSD. At block 304, the I/O scheduler may monitor the behavior of the one or more storage devices. In one embodiment, the I/O scheduler may include a model (e.g., a behavior class model and/or an algorithm based at least in part on the model of the device) of the corresponding storage device and receive state data from the storage device for input into the model. By utilizing known and/or observed characteristics of the storage devices, models within the I/O scheduler may model and predict the behavior of the storage devices.
The I/O scheduler may detect characteristics of a given storage device that affect, or may affect, I/O performance. For example, various characteristics and states of the device, as well as various characteristics and states of the I/O traffic, may be maintained, as described further below. By observing these characteristics and states, the I/O scheduler may predict that a given device will soon enter a state in which it exhibits high-latency I/O behavior. For example, in one embodiment, the I/O scheduler may detect or predict that an internal cache flush, which affects the response times of requests to the storage device, will occur within the device. For example, in one embodiment, a storage device that has been idle for a given amount of time may flush its internal cache. In some embodiments, whether a given device is idle may be based on observations from outside the device. For example, if no operation has been scheduled for the device for a period of time, the device may be considered to have been idle for approximately that period. In such embodiments, the device may in fact be busy due to internally initiated activities, but such internally initiated activity is not considered when determining whether the device is idle. In other embodiments, internally initiated activity of a device may be considered when determining whether the device is idle or busy. By observing the behavior of the device, and noting that it has been idle for a given period, the scheduler can predict when an internal cache flush is likely to occur. In other embodiments, the scheduler may also poll the various devices to determine their states or conditions. In any event, the scheduler may be configured to determine the likelihood of unscheduled behaviors, such as internal cache flushes, and initiate proactive operations to prevent such behaviors from occurring. In this way, the scheduler controls the timing of events in the device and the system, enabling better scheduling of operations.
Various characteristics may be used as a basis for making predictions about device behavior. In various embodiments, the scheduler may maintain a history of the state of currently pending operations and/or recent operations corresponding to the storage devices. In some embodiments, the I/O scheduler may know the size of the cache within a device and/or its cache policy, and keep a count of the number of write requests sent to the storage device. In other embodiments, other mechanisms may be used to determine the status of a cache within a device (e.g., direct polling-type access to the device). In addition, the I/O scheduler may track the amount of data in write requests sent to the storage device. The I/O scheduler may then detect when the number of write requests, or the total amount of data corresponding to the write requests, reaches a given threshold. If the I/O scheduler detects such a condition (conditional block 306), then the I/O scheduler may schedule a particular operation for the device in block 308. Such operations may generally correspond to the proactive operations described above. For example, the I/O scheduler may place a cache flush command in the corresponding queue to force the storage device to perform a cache flush at a time selected by the scheduler. Alternatively, the I/O scheduler may place a virtual read operation in the queue to determine whether any cache flush on the storage device has completed. In addition, the scheduler may query a device for status information (e.g., idle, busy, etc.). These and other characteristics and operations are possible and contemplated. Additionally, in various embodiments, proactive operations may be scheduled to bring an SSD back into a desired state. For example, the SSD firmware and/or mapping table may enter a state in which the device becomes stuck or persistently slow. Simply resetting or power cycling the drive may clear the fault condition in the firmware. However, if the condition is persistent (e.g., a bug in the firmware that cannot handle the current state of the mapping table), another fix is to reformat the drive to completely clear and reset the FTL, and then repopulate it or use it for other data.
The above actions may be performed to avoid or reduce the number of occurrences of unpredictable variable response times. At the same time, the I/O scheduler may detect the occurrence of any variable behavior of a given storage device at unpredictable times. If the I/O scheduler detects such a condition (conditional block 310), then the I/O scheduler may place an operation in the corresponding queue of the storage device in block 312. In this case, the operation generally corresponds to the reactive operations described above. The operation may serve both to reduce the amount of time the storage device exhibits the variable behavior and to detect the end of that behavior. In various embodiments, the proactive and/or reactive operations generally include any operation capable of bringing a device (at least in part) into a known state. For example, initiating a cache flush operation may cause the device to reach a flushed-cache state. A device whose cache is empty is less likely to initiate an internal cache flush than a device whose cache is not empty. Some examples of proactive and/or reactive operations include cache flush operations, erase operations, secure erase operations, trim operations, sleep operations, power cycling, and reset operations.
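For illustration only, the proactive/reactive operations listed above might be represented in software as follows; the enumeration and helper names are assumptions introduced for this sketch.

```python
from collections import deque
from enum import Enum, auto

class MaintenanceOp(Enum):
    """The proactive/reactive operations enumerated above, each of which
    can bring a device (at least in part) into a known state."""
    CACHE_FLUSH = auto()
    ERASE = auto()
    SECURE_ERASE = auto()
    TRIM = auto()
    SLEEP = auto()
    POWER_CYCLE = auto()
    RESET = auto()

def schedule_restorative_op(device_queue: deque,
                            op: MaintenanceOp = MaintenanceOp.CACHE_FLUSH) -> None:
    # Place the operation ahead of pending I/O: a device whose cache was
    # just flushed is unlikely to initiate an internal flush on its own.
    device_queue.appendleft(op)
```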
Referring now to FIG. 4, one embodiment of a method 400 of segregating operations issued to a storage device is shown. The steps in this embodiment are shown in sequence. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and some steps may not be present in another embodiment. In various embodiments, operations of a first category and operations of a second category may be segregated for scheduling. For example, in one embodiment, the first category of operations may be given a higher scheduling priority than the second category. In such an embodiment, operations of the first category may be scheduled for relatively quick processing while operations of the second category are queued for later processing (in effect, their processing is deferred). At a given time, processing of operations of the first category may be suspended while previously queued operations of the second category are processed. Subsequently, processing of the second category of operations may be stopped again, and processing priority returned to the first category. When to suspend processing of one category and begin processing of another may be based on time periods, accumulated data, transaction frequency, available resources (e.g., queue utilization), any combination of these, or any other desired condition, as appropriate.
SSDs generally exhibit better performance than HDDs for random read and write requests. However, due to their characteristics, SSDs generally exhibit poorer performance for random write requests than for read requests. Unlike HDDs, the relative latencies of read requests and write requests in SSDs are quite different: write requests typically take significantly longer than read requests, because programming a flash memory cell takes longer than reading one. In addition, the latency of write operations is quite variable due to additional operations that may need to be performed as part of the write. For example, for flash memory cells that have already been modified, an erase operation may be performed before a write or program operation. In addition, the erase operation may be performed on a block basis. In this case, all flash memory cells within a block (erase segment) are erased together. Because a block is large, comprising multiple pages, the operation may take a long time. Alternatively, the FTL may remap the write to a block that has already been erased. In either case, the additional operations associated with performing a write result in writes having relatively high latency variability, and relatively long latency, compared to reads. Other storage devices may exhibit different characteristics depending on the type of request. In addition to the above, some storage devices may provide poor and/or variable performance if read requests and write requests are mixed. Thus, to improve performance, various embodiments may segregate read requests and write requests. Note that although the above discussion specifically refers to read operations and write operations, the systems and methods described herein are applicable to other operations as well. In such other embodiments, other relatively high- and low-latency operations may likewise be identified and segregated for scheduling. Additionally, in some embodiments, reads and writes may be classified as operations of a first category, while other operations, such as cache flush and trim operations, may be classified as corresponding to a second category. Various combinations are possible and contemplated.
At block 402, an I/O scheduler may receive and cache an I/O request to a given storage device of one or more storage devices. At block 404, low latency I/O requests may be sent to the storage device, typically in preference to high latency requests. For example, depending on the storage technology used by the storage device, read requests may have a lower latency than write requests and other command types, and thus may be issued first. Thus, write requests may be accumulated while read requests are given priority for sending (i.e., transmitted to the device prior to the write request). At some point in time, the I/O scheduler may stop sending read requests to the device and begin sending write requests. In one embodiment, the write request may be sent in the form of a series of multiple writes. Thus, the overhead associated with a write request may be amortized over multiple write requests. In this manner, high latency requests (e.g., write requests) and low latency requests (e.g., read requests) may be isolated to be handled separately.
In block 406, the I/O scheduler determines whether a particular condition exists indicating that high-latency requests should be transmitted to the device. For example, in one embodiment, detecting such a condition may include detecting that a given number of high-latency I/O requests, or a given amount of corresponding data, has accumulated and reached a given threshold. Alternatively, the rate at which high-latency requests are received may reach some threshold. Numerous such conditions are possible and contemplated. In one embodiment, the high-latency requests are write requests. If such a condition occurs (conditional block 408), then the I/O scheduler may begin sending high-latency I/O requests to the given storage device in block 410. The number of such requests sent may vary with the given algorithm. The number may correspond to a fixed or programmable number of writes, or to an amount of data. Alternatively, writes may be sent for a given period of time. For example, the period may continue until the particular condition ceases to exist (e.g., the rate of received write requests falls), or until a particular condition occurs. Alternatively, any combination of the above may be used to determine when to start and when to stop sending high-latency requests to the device. In some embodiments, the first read request after a series of write requests may be slower than other read requests. To avoid scheduling a "real" read request in the transmit slot immediately following a series of write requests, the I/O scheduler may be configured to automatically schedule a "virtual" read following the series of write requests. In this context, a "real" read is a read of data requested by a user or application, while a "virtual" read is an artificially generated read whose data may be discarded. In various embodiments, the write requests are not determined to have completed until completion of the virtual read is detected. Additionally, in various embodiments, a cache flush may follow a series of write requests and be used to determine when the writes have completed.
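A minimal sketch of blocks 406-410 follows, assuming illustrative thresholds and a generic issue function; the trailing "virtual" read follows the batch as described above. The class and parameter names are assumptions for this example.

```python
class WriteBatcher:
    """Accumulates write requests and releases them as a batch once a
    count or byte threshold is reached (conditional block 408), then
    appends a 'virtual' read (data discarded) so that the slow first
    read after the writes is absorbed and write completion detected."""

    def __init__(self, issue_fn, max_pending: int = 32, max_bytes: int = 4 << 20):
        self.issue_fn = issue_fn        # callable(kind: str, nbytes: int)
        self.pending = []               # byte sizes of deferred writes
        self.max_pending = max_pending
        self.max_bytes = max_bytes

    def add_write(self, nbytes: int) -> None:
        self.pending.append(nbytes)
        # Condition check of block 406/408: count or total data threshold.
        if (len(self.pending) >= self.max_pending
                or sum(self.pending) >= self.max_bytes):
            self.flush_batch()

    def flush_batch(self) -> None:
        for nbytes in self.pending:     # overhead amortized over the batch
            self.issue_fn("write", nbytes)
        self.pending.clear()
        self.issue_fn("virtual_read", 0)  # completion marks the writes done

# Usage example with a stand-in issue function:
batcher = WriteBatcher(lambda kind, n: print(kind, n), max_pending=4)
for _ in range(4):
    batcher.add_write(4096)
```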
Referring now to FIG. 5, one embodiment of a method 500 of creating a model characterizing the behavior of storage devices in a storage subsystem is shown. The steps in this embodiment are shown in order. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in another embodiment, some steps may not be present.
At block 502, one or more storage devices to be used in a storage subsystem may be selected. At block 504, various characteristics of each device may be identified, such as cache size, general read and write response times, storage topology, age of the device, and so forth. At block 506, one or more characteristics that affect I/O performance of a given storage device may be identified.
At block 508, one or more actions that affect the timing and/or occurrence of those characteristics for a given device may be determined. Examples may include cache flushes and the execution of a given operation (such as an erase operation on an SSD). For example, a forced operation such as a cache flush may reduce the occurrence of variable response times on the SSD at unpredictable times. At block 510, a model may be built for each of the one or more selected devices based on the corresponding characteristics and actions. The model may be implemented in software, such as within an I/O scheduler in a storage controller.
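For illustration, the characteristics and actions identified in blocks 504-510 might be collected into a simple software model as sketched below; the field names, types, and defaults are assumptions introduced for this example.

```python
from dataclasses import dataclass, field

@dataclass
class DeviceCharacteristics:
    """Characteristics identified in blocks 504-506 (illustrative fields)."""
    cache_size_bytes: int
    typical_read_latency_ms: float
    typical_write_latency_ms: float
    age_days: int

@dataclass
class DeviceModel:
    """Per-device model built in block 510 from characteristics and actions."""
    chars: DeviceCharacteristics
    # Actions from block 508 that influence when variable-latency
    # behavior occurs, e.g., a forced cache flush or erase.
    corrective_actions: list = field(
        default_factory=lambda: ["cache_flush", "erase"])

    def expected_latency_ms(self, kind: str) -> float:
        # The scheduler can consult the model for expected latencies.
        return (self.chars.typical_read_latency_ms if kind == "read"
                else self.chars.typical_write_latency_ms)
```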
Referring now to FIG. 6, a generalized block diagram of one embodiment of a storage subsystem is shown. In the illustrated embodiment, each of the storage devices 176a-176m is shown within a single device group. However, in other embodiments, one or more of the storage devices 176a-176m may be divided into two or more of the device groups 173a-173m. Device units 600a-600w may each include one or more corresponding operation queues and state tables for a storage device. These device units may be stored in RAM 172. A corresponding I/O scheduler 178 may be included for each device group 173a-173m. Each I/O scheduler 178 may include a monitor 610 that tracks status data for each storage device within the corresponding device group. Scheduling logic 620 may make decisions as to which requests to send to the corresponding storage devices and determine the timing for sending the requests.
Referring now to FIG. 7, a generalized block diagram of one embodiment of a device unit 600 is shown. Device unit 600 may include a device queue 710 and tables 720. Device queue 710 may include a read queue 712, a write queue 714, and one or more other queues, such as other-operation queue 716. Each queue may contain a plurality of entries 730 for holding one or more corresponding requests. For example, the device unit corresponding to an SSD may include queues holding at least read requests, write requests, trim requests, erase requests, and so forth. Tables 720 may contain one or more state tables 722a-722b, each containing a plurality of entries 730 for holding state data. In various embodiments, the queues shown in FIG. 7 may be physically and/or logically separate. Also note that while the queues and tables are shown as including a particular number of entries, the entries themselves do not necessarily correspond to one another. In addition, the number of queues and tables may differ from that shown in the figure. Additionally, entries within a given queue, or across queues, may be prioritized. For example, read requests may have a high, medium, or low priority that affects the order in which they are sent to the device. In addition, such priorities may change with various conditions. For example, a low-priority read that reaches a certain age may have its priority raised. Numerous such prioritization schemes and techniques are known to those skilled in the art. All such approaches are contemplated and may be used in conjunction with the systems and methods described herein.
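The queue organization and priority aging described above might look as follows in software; the entry layout, priority levels, and aging rule are illustrative assumptions rather than part of the disclosure.

```python
import heapq
import itertools
import time

class DeviceUnit:
    """Sketch of a device unit with separate read/write/other queues
    (per FIG. 7). Read entries are ordered by priority, then arrival."""

    PRIORITIES = {"high": 0, "medium": 1, "low": 2}

    def __init__(self):
        self._seq = itertools.count()   # tie-breaker for heap ordering
        self.read_queue = []            # (priority, seq, enqueue_time, request)
        self.write_queue = []
        self.other_queue = []           # trim, erase, cache flush, ...

    def enqueue_read(self, request, priority: str = "medium") -> None:
        heapq.heappush(self.read_queue,
                       (self.PRIORITIES[priority], next(self._seq),
                        time.monotonic(), request))

    def age_reads(self, max_age_secs: float = 0.5) -> None:
        # Raise the priority of reads that have waited past max_age_secs
        # by one level (low -> medium, medium -> high).
        now = time.monotonic()
        aged = [(max(0, p - 1), s, t, r) if now - t > max_age_secs
                else (p, s, t, r)
                for (p, s, t, r) in self.read_queue]
        heapq.heapify(aged)
        self.read_queue = aged
```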
Referring now to FIG. 8, a generalized block diagram illustrating one embodiment of a state table, such as the state table shown in FIG. 7, is shown. In one embodiment, such a table may include data corresponding to the status, errors, wear-level information, and other information for a given storage device. The corresponding I/O scheduler may access this information so that it may better schedule I/O requests to the storage devices. In one embodiment, the information may include at least one or more of a device age 802, an error rate 804, a total number of errors detected on the device 806, a number of recoverable errors 808, a number of unrecoverable errors 810, an access rate 812 of the device, an age of the data stored 814, a corresponding cache size 816, a corresponding cache flush idle time 818, one or more allocation states for allocated space 820-822, a concurrency level 824, and expected times for various operations 826. The allocation states may include filled, empty, error, and so forth. The concurrency level for a given device may include information about the device's ability to handle multiple operations simultaneously. For example, if a device has 4 flash chips and each chip is capable of one transfer at a time, the device is capable of up to 4 parallel operations. Whether or not particular operations are performed in parallel may depend on how the data is laid out on the device. For example, if the data accessed by a request resides on a single chip, that request may be performed in parallel with a request accessing data on a different chip. However, if the data accessed by a request is split across multiple chips, the requests may interfere with each other. Thus, a device is capable of a maximum of N parallel/concurrent operations (e.g., 4 parallel operations in the case of the device described above having 4 chips). Alternatively, the maximum concurrency may be based on the kinds of operations involved. In any event, when scheduling operations, the scheduler may consider the stored information representing the concurrency level N and the number of pending transactions M.
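For illustration, one state-table entry might be represented as sketched below; the types and units are assumptions, keyed to the reference numerals enumerated above.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    """One state-table entry mirroring the fields enumerated above
    (reference numerals 802-826); types and units are assumptions."""
    device_age_days: int            # 802
    error_rate: float               # 804
    errors_total: int               # 806
    errors_recoverable: int         # 808
    errors_unrecoverable: int       # 810
    access_rate_iops: float         # 812
    data_age_secs: float            # 814
    cache_size_bytes: int           # 816
    cache_flush_idle_secs: float    # 818
    allocation_states: tuple        # 820-822 (e.g., "filled", "empty", "error")
    concurrency_level: int          # 824: max parallel ops (e.g., 4 chips -> 4)
    expected_op_latency_ms: dict    # 826: per-operation expected times
```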
Referring now to FIG. 9, another embodiment of a method 900 of adjusting I/O scheduling to reduce unpredictable variable I/O response times on a data storage subsystem is shown. The components included in the network architecture 100 and data storage arrays 120a-120b described above may generally operate in accordance with the method 900. For purposes of discussion, the various steps in this embodiment are shown in sequential order. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in another embodiment, some steps may not be present.
In block 902, the I/O scheduler may monitor the behavior of each of the storage devices. Conditional blocks 904-908 illustrate one embodiment of detecting characteristics of a given device that may affect I/O performance, as described above with respect to conditional block 306 of method 300. In one embodiment, if the I/O scheduler detects that a given storage device has exceeded a given idle time (conditional block 904), or detects that a corresponding cache has exceeded an occupancy threshold (conditional block 906), or detects that cached data has exceeded a data-age threshold (conditional block 908), then the I/O scheduler may issue a forced (proactive) operation to the given storage device at block 910. In this case, the scheduler may predict that an internal cache flush will soon occur at an unpredictable time. To avoid the occurrence of such an event, the I/O scheduler proactively schedules an operation to circumvent it.
Note that circumvention of an event, as described above, may mean that the event does not occur, or that it does not occur at an unpredictable or unexpected time. In other words, the scheduler generally prefers that a given event occur according to the scheduler's timing, rather than otherwise. In this sense, a long-latency event that occurs because the scheduler scheduled it is preferable to the same event occurring unexpectedly. Timers and counters within scheduling logic 620 may be used in conjunction with monitor 610 to perform at least these detections. One example of a forced operation sent to a given storage device is a cache flush. Another example of a forced operation is an erase request. As part of the scheduling, a forced operation may be sent from the I/O scheduler to a corresponding one of the device queues 710 within the corresponding device unit 600.
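A compact sketch of conditional blocks 904-910 follows; the attribute names and thresholds on the monitored object are assumptions introduced for this example.

```python
def monitor_device(dev, now: float) -> None:
    """Issue a forced (proactive) operation when idle time, cache
    occupancy, or cached-data age crosses a threshold. `dev` is a
    hypothetical object exposing the state the monitor tracks."""
    if (now - dev.last_io_time > dev.idle_threshold_secs          # block 904
            or dev.cache_occupancy > dev.occupancy_threshold      # block 906
            or now - dev.oldest_cached_write > dev.data_age_limit):  # block 908
        dev.queue.append("CACHE_FLUSH")   # block 910: forced operation
```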
Referring now to FIG. 10, one embodiment of a method 1000 for maintaining low-latency read operations on shared data storage is shown. The components included in the network architecture 100 and data storage arrays 120a-120b described above may generally operate in accordance with the method 1000. For purposes of discussion, the various steps in this embodiment are shown in sequential order. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in another embodiment, some steps may not be present.
At block 1002, the amount of redundancy in the RAID architecture of the storage subsystem may be determined for use within a given device group 173. For example, in a 4+2 RAID group, 2 of the storage devices may be used to hold erasure-correcting code (ECC) information, such as parity information. This information may be used as part of reconstruct read requests. In one embodiment, when a number of storage devices are detected to exhibit variable I/O response times, reconstruct read requests may be used during normal I/O scheduling to improve the performance of the device group. At block 1004, a maximum number of devices within the device group that may be simultaneously busy, or exhibiting variable response times, is determined. This maximum number may be referred to as the target number. In one embodiment, the storage devices are SSDs, which may exhibit variable response times due to executing write requests, erase requests, or cache flushes. In one embodiment, the target number is chosen such that reconstruct reads remain possible.
In one embodiment, the I/O scheduler may detect a condition that justifies raising the target number, even to a point where reconstruct reads are no longer possible. For example, the number of pending write requests for a given device may reach a wait threshold (i.e., the write requests have been pending for a significant period of time and it is determined that they should wait no longer). Alternatively, a given number of higher-priority write requests that cannot be accumulated for later transmission, as described above, may be detected. If the I/O scheduler detects such a condition (conditional block 1006), then in block 1008 the I/O scheduler may increment or decrement the target number based on the one or more detected conditions. For example, if a sufficient number of high-priority write requests are pending, or some other condition occurs, the I/O scheduler may allow the target number to exceed the amount of supported redundancy. At block 1010, the I/O scheduler may determine that N storage devices within the device group exhibit variable I/O response times. If N is greater than the target number (conditional block 1012), then at block 1014 the storage devices may be scheduled in a manner that decreases N. Otherwise, at block 1016, the I/O scheduler may schedule requests in a manner that improves performance, for example by utilizing reconstruct read requests as further described below.
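For illustration, the target-number logic of blocks 1006-1016 might be sketched as follows; `group` is a hypothetical object and all attribute and helper names are assumptions.

```python
def schedule_device_group(group) -> None:
    """Sketch of blocks 1006-1016. A 4+2 RAID group tolerates two
    busy/variable devices while reconstruct reads remain possible."""
    # Blocks 1006-1008: raise the target if pending high-priority writes
    # have waited too long; otherwise keep it at the redundancy amount.
    if group.max_write_wait() > group.wait_threshold:
        group.target = group.redundancy + 1   # may forgo reconstruct reads
    else:
        group.target = group.redundancy       # e.g., 2 for a 4+2 group

    # Blocks 1010-1016: count devices showing variable response times.
    n = sum(1 for d in group.devices if d.variable_latency)
    if n > group.target:
        group.reduce_variable_devices()       # schedule so as to decrease N
    else:
        group.schedule_with_reconstruct_reads()  # exploit the redundancy
```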
Referring now to FIG. 11, one embodiment of a method 1100 of reducing the number of storage devices exhibiting variable I/O response times is illustrated. The steps in this embodiment are shown in order. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in another embodiment, some steps may not be present.
At block 1102, the I/O scheduler may determine to reduce the number N of storage devices within the storage subsystem that execute high latency operations that may result in variable response times at unpredictable times. At block 1104, the I/O scheduler may select a given device to perform high latency operations. At block 1106, the I/O scheduler may pause the execution of the high-latency operation on the given device and decrement N. For example, the I/O scheduler may stop sending write requests and erase requests to a given storage device. In addition, the corresponding I/O scheduler may suspend execution of the sent write requests and erase requests. At block 1108, the I/O scheduler may initiate execution of a low-latency operation (such as a read request) on the given device. These read requests may include reconstruct read requests. Thus, the device comes out of the long latency response state and N is reduced.
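A sketch of blocks 1104-1108 follows, with assumed scheduler helpers; it mirrors the steps above: suspend high-latency operations on the chosen device, then feed it low-latency reads so it exits the long-latency state.

```python
def quiesce_device(scheduler, device) -> None:
    """Stop issuing high-latency operations to the chosen device
    (block 1106), then issue low-latency reads to it (block 1108)."""
    scheduler.suspend(device, kinds=("write", "erase"))  # block 1106
    scheduler.n_variable -= 1                            # decrement N
    for read in scheduler.pending_reads_for(device):     # block 1108
        device.issue(read)      # may include reconstruct read requests
```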
Referring now to FIG. 12, one embodiment of a method 1200 for servicing read operations with efficient latency on shared data storage is shown. The components included in the network architecture 100 and data storage arrays 120a-120b described above may generally operate in accordance with this method. For purposes of discussion, the steps in this embodiment are shown in sequential order. However, some steps may occur in a different order than shown, some steps may be performed simultaneously, some steps may be combined with other steps, and in other embodiments some steps may be absent.
The method of FIG. 12 may represent one embodiment of the steps taken to perform step 1016 in method 1000. In block 1201, the I/O scheduler receives an original read request directed to a first device that is exhibiting variable response time behavior. The first device may exhibit variable response times due to receiving a particular scheduled operation (i.e., a known cause) or due to some unknown cause. What constitutes a variable response time may be determined based at least in part on an expected latency for a given operation. For example, based on the characteristics of the device and/or its recent operating history, a response to a given read may be expected within a given period of time. One approach is to determine an average response latency for the device together with a delta that reflects a range of acceptable response latencies. The delta may be selected to cover 99% of transactions, or any other suitable percentage of transactions. If no response is received within the expected period of time, initiation of a reconstruct read may be triggered.
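The expected-latency check of block 1201 might look like the sketch below. The running average and the 99% coverage figure come from the text; the sample-window mechanism and class name are assumptions made for illustration.

```python
import statistics

class ResponseModel:
    """Per-device latency model: average plus a delta sized to cover
    ~99% of recently observed transactions."""

    def __init__(self, window=1000, coverage=0.99):
        self.samples = []        # recent response latencies, in seconds
        self.window = window
        self.coverage = coverage

    def record(self, latency):
        self.samples.append(latency)
        del self.samples[:-self.window]   # keep only the recent window

    def expected_within(self):
        mean = statistics.fmean(self.samples)
        ordered = sorted(self.samples)
        idx = max(0, int(len(ordered) * self.coverage) - 1)
        delta = ordered[idx] - mean       # acceptable range above average
        return mean + delta

    def overdue(self, elapsed):
        # True once initiation of a reconstruct read may be triggered.
        return elapsed > self.expected_within()
```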
Generally speaking, whether to initiate a reconstruct read may be based on a cost-benefit analysis comparing the cost of performing the reconstruct read with the (potential) benefit of obtaining its results. For example, if no response to the original read request is received from a given device within a given period of time, it may be predicted that the device is servicing an operation whose latency will exceed that of a reconstruct read were one initiated. A reconstruct read may therefore be initiated, for example to maintain a given level of read service performance. Other factors may also be considered when making this determination, such as the current load, the types of requests being received, the priority of requests, the state of other devices in the system, various characteristics as described in FIGs. 7 and 8, and so on. Note further that while a reconstruct read may be initiated due to a long response latency of the original read, the original read request is still expected to complete. In fact, both the original read and the reconstruct read may complete successfully and provide results; the reconstruct read is not required in order for the original request to be serviced. This stands in contrast to a latency caused by an error condition, in which the long latency is accompanied by some indication that the transaction will not (or may not) complete successfully. For example, a device timeout caused by an inability to read a given memory location yields a response that is not expected to complete, and in that case a reconstruct read may be needed in order to service the request. Thus, in various embodiments the system effectively includes at least two timeout conditions for a given device. The first timeout corresponds to a period of time after which a reconstruct read may be initiated, even though one is not necessarily required. In this way, reconstruct reads are incorporated into the scheduling algorithm as a normal part of non-error-related scheduling. The second timeout, occurring after the first, represents a period of time after which an error condition is deemed to have occurred. In this case a reconstruct read may also be initiated, since the device signaling the error is not expected to service the original read.
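The two-timeout scheme can be sketched as follows. The constant names and values are hypothetical placeholders; the text does not fix particular durations.

```python
FIRST_TIMEOUT = 0.050    # seconds; a reconstruct read becomes an option
SECOND_TIMEOUT = 1.000   # seconds; treat as an error condition

def classify_pending_read(elapsed, error_indicated):
    """Map a pending read's elapsed time onto the two timeouts."""
    if error_indicated or elapsed > SECOND_TIMEOUT:
        # The original read is no longer expected to complete; a
        # reconstruct read is needed to service the request.
        return "reconstruct_required"
    if elapsed > FIRST_TIMEOUT:
        # Error-free case: a reconstruct read may be scheduled as a
        # normal scheduling decision, yet the original read is still
        # expected to complete on its own.
        return "reconstruct_optional"
    return "wait"
```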
In view of the above, the I/O scheduler may then determine whether to initiate a reconstruct read corresponding to the original read (decision block 1202). A reconstruct read typically entails one or more reads serviced by devices other than the first device. Many factors may be considered in making this determination. Generally speaking, the I/O scheduler performs a cost/benefit analysis of whether it is "better" to attempt to service the original read with the first device or to issue a reconstruct read. What is "better" in a given situation may be variable, programmable, and dynamically determinable. For example, an algorithm may always favor faster read response times, in which case the question is whether the reconstruct read can (or is likely to) complete before the original device services the original read. Alternatively, the algorithm may favor a reduced system load at a given time, in which case the I/O scheduler may choose not to incur the overhead of a reconstruct read, even if the reconstruct read would complete faster than the original read. More nuanced balances of speed and overhead may also be used. In various embodiments, the algorithm may be programmed with an initial weighting (e.g., always prefer speed regardless of load). Such a weighting may be constant, or may be programmed to vary dynamically according to various conditions, such as the time, the rate of received I/O requests, the priorities of received requests, whether a particular task is detected (e.g., a backup operation currently in progress), detection of a failure, and so on.
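One way to express the programmable weighting is a weighted cost/benefit comparison, as in the fragment below. The parameter names and the linear weighting scheme are illustrative assumptions, not the disclosed algorithm.

```python
def should_reconstruct(t_original, t_reconstruct, extra_load,
                       speed_weight):
    """Decide whether to issue a reconstruct read (decision block 1202).

    t_original    -- predicted remaining latency of the original read
    t_reconstruct -- predicted latency of a reconstruct read
    extra_load    -- overhead the reconstruct read adds to the system
    speed_weight  -- 1.0 always favors speed; 0.0 always favors load
    """
    benefit = speed_weight * (t_original - t_reconstruct)
    cost = (1.0 - speed_weight) * extra_load
    return benefit > cost
```

With `speed_weight` at 1.0 the check reduces to "will the reconstruct read finish first?"; lower weights let system load veto the reconstruct read. The weight itself could be varied at runtime based on the conditions listed above.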
If the scheduler decides not to initiate a reconstruct read, the read may be serviced by the originally targeted device (block 1203). Alternatively, a reconstruct read may be initiated (block 1204). In one embodiment, the other devices selected to service the reconstruct read are devices identified as exhibiting non-variable behavior. By selecting devices exhibiting non-variable (i.e., more predictable) behavior, the I/O scheduler can better predict how long servicing the reconstruct read is likely to take. In addition to a device's variable/non-variable behavior, the I/O scheduler may consider other aspects of each device. For example, in selecting a particular device to service a reconstruct read, the I/O scheduler may evaluate the number of outstanding requests for the device (e.g., how full its request queue is), the priorities of its currently pending requests, and the expected processing speed of the device itself (e.g., some devices may represent older or otherwise inherently slower technology than others). Further, the scheduler may seek to schedule the reconstruct reads such that the corresponding results from each device are returned at approximately the same time. In that case, the scheduler may disfavor a particular device whose processing time is predicted to differ significantly from that of the other devices, even if that particular device is much faster than the others. Numerous such factors and conditions may be considered and are contemplated.
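Device selection for the reconstruct read might be scored as sketched below, assuming enough non-variable candidates exist. The device attributes and the queue-depth-based estimate are assumptions that mirror the considerations listed above.

```python
def pick_reconstruct_devices(candidates, needed):
    """Choose `needed` devices to service a reconstruct read."""
    # Prefer devices with predictable (non-variable) behavior.
    usable = [d for d in candidates if not d.variable]
    est = {d: d.queue_depth * d.mean_service_time for d in usable}
    ordered = sorted(usable, key=est.get)

    # Slide a window over the sorted estimates and take the tightest
    # cluster, so results come back at about the same time even when
    # one device would individually be much faster than the rest.
    windows = (ordered[i:i + needed]
               for i in range(len(ordered) - needed + 1))
    return min(windows, key=lambda grp: est[grp[-1]] - est[grp[0]])
```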
In one embodiment, the reconstruct read request may inherit the priority of the original read request. In other embodiments, the reconstruct read request may be given a different priority. If the I/O scheduler detects that a selected second (other) device receiving a corresponding reconstruct read request is itself now exhibiting variable response time behavior (conditional block 1205), and the second device is expected to remain variable until after the first device is expected to become non-variable (conditional block 1206), then in block 1208 the I/O scheduler may issue the original read request to the first device. In one embodiment, a timer may be used to predict when a storage device exhibiting variable response times will again provide non-variable response times. Control of method 1200 passes from block 1208 through block C to conditional block 1212. If the second device is not expected to remain in the variable state longer than the first device (conditional block 1206), control of method 1200 moves to block 1210, where the read request is serviced with the issued reconstruct read request.
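The timer-based prediction of conditional block 1206 could be as simple as the following sketch. The per-device fields (`variable_since`, `mean_variable_duration`) and the use of the mean episode length as the estimate are assumptions.

```python
def expected_nonvariable_time(device):
    # Timer-based prediction: the device entered the variable state at
    # `variable_since` and is assumed to stay there for roughly the
    # mean duration of its past variable episodes.
    return device.variable_since + device.mean_variable_duration

def reissue_to_first_device(first, second):
    """Conditional block 1206: True when the second device is expected
    to stay variable past the first device's predicted recovery."""
    return expected_nonvariable_time(second) > expected_nonvariable_time(first)
```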
If the I/O scheduler detects that a given variable device has become non-variable (conditional block 1212), the I/O scheduler issues the original read request to that device in block 1214. The I/O scheduler may designate the device as non-variable and decrement N (the number of storage devices detected to provide variable I/O response times). If the original read request completes before the alternate reconstruct read request (conditional block 1216), the I/O scheduler services the read with the original read request in block 1218. In various embodiments, the scheduler may then cancel the reconstruct read request; alternatively, the reconstruct read request may be allowed to complete and its data simply discarded. Otherwise, at block 1220, the I/O scheduler services the read with the reconstruct read request, and may cancel the original read request (or discard its returned data).
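Blocks 1216 through 1220 amount to a race between the two requests, as sketched below with a hypothetical request API (`cancel`, `data`); the first completion services the read and the loser is cancelled or its data dropped.

```python
class ReadRace:
    """Blocks 1216-1220: the original and reconstruct reads race; the
    first completion services the request, the other is discarded."""

    def __init__(self, original, reconstruct):
        self.original = original
        self.reconstruct = reconstruct
        self.winner = None

    def on_complete(self, request):
        if self.winner is not None:
            return None                  # late arrival: drop its data
        self.winner = request
        loser = (self.reconstruct if request is self.original
                 else self.original)
        loser.cancel()                   # or let it finish, data unused
        return request.data              # service the read with this
```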
Note that the above-described embodiments may comprise software. In such embodiments, the program instructions that implement the methods and/or mechanisms may be conveyed or stored on a computer-readable medium. Numerous types of media configured to store program instructions are available, including hard disks, floppy disks, CD-ROM, DVD, flash memory, programmable ROMs (PROMs), random access memory (RAM), and various other forms of volatile or non-volatile storage.
In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud computing environment. In such embodiments, resources may be provided as services over the Internet according to one or more of several models. Such models may include infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). In IaaS, the computer infrastructure is delivered as a service, and the computing equipment is typically owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to build software solutions may be provided as a service and hosted by the service provider. SaaS typically involves a service provider licensing software as an on-demand service. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated. In addition, while the above description focuses on networked storage devices and controllers, the methods and mechanisms described above may also be applied in systems with directly attached storage devices, host operating systems, and so on.
Although the embodiments described above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Claims (20)
1. A computer system, comprising:
a data storage medium comprising a plurality of storage devices configured to hold data; and a data storage controller coupled to the data storage medium;
wherein the data storage controller is configured to:
receive requests targeting the data storage medium, the requests including a first category of operations and a second category of operations;
schedule requests of the first category for immediate processing by the plurality of storage devices; and
queue requests of the second category for later processing by the plurality of storage devices.
2. The computer system of claim 1, wherein the first category of operations corresponds to operations having an expected lower latency and the second category of operations corresponds to operations having an expected higher latency.
3. The computer system of claim 2, wherein the first category of operations corresponds to read requests, and wherein the second category of operations corresponds to write requests, cache flushes, or trim operations.
4. The computer system of claim 1, wherein the plurality of storage devices are solid state storage devices, and wherein each of the solid state storage devices services read requests with a relatively lower latency and write requests with a relatively higher latency.
5. The computer system of claim 1, wherein after queuing a plurality of requests corresponding to the second category of operations, the storage controller is configured to stop processing requests corresponding to the first category of operations and to process only those requests corresponding to the second category of operations.
6. The computer system of claim 5, wherein the plurality of requests corresponding to the second category of operations correspond to write requests, and wherein the storage controller is configured to automatically schedule a dummy read to a given device to follow the write requests.
7. The computer system of claim 5, wherein the storage controller is configured to suspend processing of those requests corresponding to the second category of operations in response to detecting that another one of the plurality of devices exhibits a longer response latency.
8. The computer system of claim 2, wherein the plurality of requests corresponding to the second category of operations correspond to write requests, and wherein the storage controller is configured to stream said write requests for processing as a plurality of discrete units of data, the storage controller being capable of stopping said streaming after any of said units.
9. The computer system of claim 1, wherein the storage controller is configured to immediately stop processing requests corresponding to the first category and begin processing queued requests corresponding to the second category in response to detecting a given condition.
10. The computer system of claim 9, wherein the condition comprises at least one of: a certain number of requests of the second category have been queued, a period of time has elapsed since requests of the second category were last processed, and no new requests have been received for a given period of time.
11. A method for use in a computing system, the method comprising:
receiving requests targeting a data storage medium, the data storage medium comprising a plurality of storage devices configured to hold data, the requests including a first category of operations and a second category of operations;
scheduling requests of the first category for immediate processing by the plurality of storage devices; and
queuing requests of the second category for later processing by the plurality of storage devices.
12. The method of claim 11, wherein the first category of operations corresponds to operations having an expected lower latency and the second category of operations corresponds to operations having an expected higher latency.
13. The method of claim 12, wherein the first category of operations corresponds to read requests, and wherein the second category of operations corresponds to write requests, cache flushes, or trim operations.
14. The method of claim 11, wherein the plurality of storage devices are solid state storage devices, and wherein each of the solid state storage devices services read requests with a relatively lower latency and write requests with a relatively higher latency.
15. The method of claim 11, wherein after queuing the plurality of requests corresponding to the second category of operations, the method includes ceasing processing of requests corresponding to the first category of operations and processing only those requests corresponding to the second category of operations.
16. The method of claim 15, wherein the plurality of requests corresponding to the second category of operations correspond to write requests, and wherein the method includes automatically scheduling a dummy read to a given device to follow the write requests.
17. The method of claim 15, wherein the method includes suspending processing of those requests corresponding to the second category of operations in response to detecting that another one of the plurality of devices exhibits a longer response latency.
18. The method of claim 12, wherein the plurality of requests corresponding to the second category of operations correspond to write requests, and wherein the method includes streaming the write requests for processing as a plurality of discrete units of data, the streaming being stoppable after any of the units.
19. A computer-readable storage medium containing program instructions, wherein the program instructions, when executed by a processing device, are operable to:
receive requests targeting a data storage medium, the data storage medium comprising a plurality of storage devices configured to hold data, the requests including a first category of operations and a second category of operations;
schedule requests of the first category for immediate processing by the plurality of storage devices; and
queue requests of the second category for later processing by the plurality of storage devices.
20. The computer-readable medium of claim 19, wherein the first category of operations corresponds to operations having an expected lower latency and the second category of operations corresponds to operations having an expected higher latency.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/882,877 | 2010-09-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1188854A true HK1188854A (en) | 2014-05-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20220244865A1 | | Scheduling Of Reconstructive I/O Read Operations In A Storage Environment |
| US10228865B1 | | Maintaining a target number of storage devices for variable I/O response times in a storage system |
| US10126982B1 | | Adjusting a number of storage devices in a storage system that may be utilized to simultaneously service high latency operations |
| US9684460B1 | | Proactively correcting behavior that may affect I/O performance in a non-volatile semiconductor storage device |
| HK1188854A | 2014-05-16 | Scheduling of i/o writes in a storage environment |
| HK1188853A | | Scheduling of reconstructive i/o read operations in a storage environment |
| HK1188856A | | Scheduling of i/o in an ssd environment |