CN115495394B - Data prefetching method and data prefetching device - Google Patents
- Publication number: CN115495394B (application CN202211102695.5A)
- Authority
- CN
- China
- Prior art keywords: data, access, mode, history, prefetching
- Prior art date
- Legal status: Active (the status listed is an assumption, not a legal conclusion)
Classifications
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
- G06F12/0877—Cache access modes
- G06F12/0882—Page mode
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A data prefetching method and a data prefetching device are provided. The data prefetching method comprises: receiving a first access instruction, wherein the address corresponding to the first access instruction belongs to a first storage area of a memory; determining prefetch data according to a historical access pattern of the first storage area, the historical access pattern indicating the historical access data of the first storage area, and the prefetch data comprising part or all of the historical access data; and sending a prefetch instruction to a memory controller according to the prefetch data, the prefetch instruction being used to store the prefetch data in a prefetch cache area. According to the embodiments of the application, part or all of the historical access data of the area to which the address corresponding to an access instruction belongs is stored in the cache area in advance, so that access instructions to be executed can obtain the data they need from the cache area. The method enables the prefetched data to cover the data required by instructions to be executed as completely as possible with the fewest operations, which helps reduce the power consumption of the memory.
Description
Technical Field
The present application relates to the field of data prefetching technologies, and in particular, to a data prefetching method and a data prefetching device.
Background
The rapid growth of modern processor speeds, compared with the slow growth of memory speeds, has resulted in processors spending a significant amount of time waiting for memory data to return. By storing the data to be used in the cache area in advance, data prefetching can avoid the delay caused by waiting for data. The related art improves the coverage of data prefetching by issuing a large number of prefetches, which may negatively affect memory power consumption.
Disclosure of Invention
The application provides a data prefetching method and a data prefetching device. Various aspects of embodiments of the application are described below.
A first aspect provides a data prefetching method, comprising: receiving a first access instruction, wherein the address corresponding to the first access instruction belongs to a first storage area of a memory; determining prefetch data according to a historical access pattern of the first storage area, the historical access pattern indicating the historical access data of the first storage area, and the prefetch data comprising part or all of the historical access data; and sending a prefetch instruction to a memory controller according to the prefetch data, the prefetch instruction being used to store the prefetch data in a prefetch cache area.
A second aspect provides a data prefetching device comprising a prefetch cache area and a processor. The prefetch cache area is used to store prefetch data. The processor is used to: receive a first access instruction, wherein the address corresponding to the first access instruction belongs to a first storage area of a memory; determine the prefetch data according to a historical access pattern of the first storage area, the historical access pattern indicating the historical access data of the first storage area, and the prefetch data comprising part or all of the historical access data; and send a prefetch instruction to a memory controller according to the prefetch data, the prefetch instruction being used to store the prefetch data in the prefetch cache area.
According to the embodiments of the application, part or all of the historical access data of the area to which the address corresponding to an access instruction belongs is stored in the cache area in advance, according to the historical access pattern of that area. This enables the prefetched data to cover the data required by instructions to be executed as completely as possible with the fewest operations, ensuring both the accuracy and the coverage of the prefetched data and thereby reducing the power consumption of the memory.
Drawings
Fig. 1 is a flow chart of a data prefetching method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a data prefetching apparatus according to an embodiment of the application.
Fig. 3 is a schematic diagram of a data prefetching apparatus according to another embodiment of the application.
Fig. 4 is a schematic diagram of a workflow of the data prefetching apparatus 300.
Fig. 5 is a schematic diagram of a data prefetching system according to an embodiment of the application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present application.
During operation, the processor generally needs to access or store data or instructions in the memory to realize corresponding functions. The memory controller acts as a bridge between the processor and the memory. For example, the memory controller may perform read and/or write operations according to an access instruction for the memory (hereinafter also called a memory access instruction) sent by the processor.
The rapid growth of modern processor speeds and the slow growth of memory speeds have resulted in processors spending a significant amount of time waiting for memory data to return, causing processor stalls or run-time delays. By storing data that may be accessed in the future in the prefetch cache area in advance, data prefetching can avoid a cache miss when the data is actually used, thereby eliminating the processor delay caused by data access latency.
Data prefetching mainly relies on the locality of program execution. Locality includes temporal locality and spatial locality. Temporal locality means that the data about to be used is likely to be the data currently in use; it can be exploited by storing currently accessed data in a cache for future use. Examples include for and while loops and recursive calls in the C++ language. Spatial locality means that the data about to be used is likely to be adjacent or close in address space to the data currently in use.
Based on temporal and spatial locality, data prefetching techniques typically store the data in use, together with data near its address, in a cache region for use by subsequent access instructions. The related art ensures that prefetched data covers more of the data that subsequent access instructions may use by issuing a large number of prefetch operations.
Excessive prefetching, however, reduces the accuracy of data prefetching: much of the prefetched data may never be needed by subsequent accesses. This approach improves the coverage of data prefetching by sacrificing accuracy, and excessive erroneous prefetching may negatively impact the power consumption of the memory.
In some power-consumption-sensitive scenarios, the power consumed by data prefetching matters greatly. Mobile devices, for example, are relatively sensitive to it. Taking a mobile device as an example, the power-consumption situation of data prefetching is described below.
With the large-scale popularity of mobile platforms, the power consumption of mobile devices has become a focus of attention. Because battery technology develops slowly, low-power design has become an effective way to extend the battery life of mobile devices.
In a mobile device, the memory accounts for a large part of the overall power consumption, so optimizing memory power consumption can effectively reduce the overall power consumption of the phone's on-chip processor. The prefetch strategy in the related art trades a larger number of prefetches for higher coverage, but too many erroneous prefetches may increase the bandwidth of the dynamic memory controller (DMC), increase the number of row activations per unit time, and decrease the DMC idle rate, leading to a significant increase in power consumption.
It can be seen that increasing the accuracy of data prefetching helps to reduce its negative impact on power consumption. In some embodiments, the content of an instruction to be executed can be obtained from its instruction address, locating the data to be accessed and achieving high-precision data prefetching. For example, in the last-level cache, the address of the instruction that issued an access can be obtained, distinguishing accesses to the same address that originate from different instructions. However, the devices accessing memory are varied, including the central processing unit (CPU) and the graphics processing unit (GPU), and for some devices the instruction address is difficult to obtain.
To address the above problems, an embodiment of the present application provides a data prefetching method. According to the historical access pattern of the area to which the address corresponding to an access instruction belongs, part or all of that area's historical access data is stored in the cache area in advance, so that access instructions to be executed can quickly obtain the data they need from the cache area. Even without an instruction address, this method lets the prefetched data cover the data required by instructions to be executed as completely as possible with the fewest operations, ensuring both the accuracy and the coverage of data prefetching and thereby reducing the power consumption of the memory.
Fig. 1 is a flowchart illustrating a data prefetching method according to an embodiment of the present application. The data prefetching method 100 includes step S110, step S120, and step S130.
In step S110, a first access instruction is received, where an address corresponding to the first access instruction belongs to a first storage area of the memory.
The first memory access instruction may be sent by a device accessing the memory. For example, the device accessing memory may be a CPU, GPU, or the like.
The first access instruction may be used to write data or instructions into the memory, or to access data or instructions in the memory. As one implementation, during memory access, the first access instruction may indicate the location of the data or instruction to be accessed. For example, it may include the address or address range of the data to be accessed and the access type. As one example, the first access instruction may include a read-enable command and the address of the data to be read.
The address corresponding to the first access instruction belongs to a first storage area of the memory. That is, the location to be accessed by the first access instruction belongs to the first storage area. The address corresponding to the first access instruction in the memory may be a target address included in the first access instruction, or may be a target address mapped by the first access instruction.
The first storage area is a storage area in the memory. For example, the first storage area may be a storage area of any size in the memory. As an example, the first storage area may be one memory page in the memory. As another example, a memory page in the memory may include a plurality of memory blocks, and the first memory region may be one of the plurality of memory blocks. As yet another example, the first storage area may be one or more memory rows in a memory page.
There are various ways to locate the location of the first storage area during the data pre-fetching process. As one implementation, the location of the first storage area may be located by an index. For example, when the first storage area is a page in the memory, the memory page number may be used as an index of the first storage area.
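The page-number indexing described above can be sketched as follows. This is a minimal sketch, assuming 4 KB memory pages; the page size and helper names are illustrative, not taken from the patent.

```python
PAGE_SHIFT = 12  # assumed 4 KB memory pages; the patent does not fix a page size

def page_index(address: int) -> int:
    """Locate the first storage area by index: the memory page number
    (the upper bits of the address) serves as the index of the region."""
    return address >> PAGE_SHIFT

def page_offset(address: int) -> int:
    """Offset of the access within its memory page."""
    return address & ((1 << PAGE_SHIFT) - 1)
```

Two addresses within the same 4 KB page then share one index, so all accesses to that page update the same history entry.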
In step S120, pre-fetch data is determined according to a history access mode of the first storage area, where the history access mode is used to indicate history access data of the first storage area, and the pre-fetch data includes part or all of the history access data.
The historical access pattern of the first storage region may be used to indicate the historical access data of the first storage region. As one implementation, it may indicate the historical access data through historical access addresses. In some cases, the particular value held at a historical access address may be modified or updated while the address itself remains unchanged; the historical access data is then the (possibly updated) value stored at that address.
Take the history access data a as an example: in an a++ operation, the value of a is increased by 1 each time the statement executes. The updated value of a is typically stored at the same address, overwriting the original value. Thus the history access data a can be indicated by the storage address of a.
The historical access data can be all accessed data of the first storage area, or can be accessed data of the first storage area by one application. The historical access data may also be data that is accessed in the first memory region during a historical time window. As one implementation, a historical time window may be partitioned according to the number of accesses. For example, every 3000 accesses act as a historical time window.
Correspondingly, the historical access pattern may consist of all access behaviors targeting the first storage area, of one application's access behaviors toward the first storage area, or of the access behaviors targeting the first storage area within a historical time window.
Different applications or programs may access the data of the first storage area, but they do not necessarily access exactly the same data. The historical access pattern may therefore include multiple patterns; for example, the patterns may be the access behaviors of multiple applications toward the first storage area, each pattern indicating how one application accessed it. That is, the historical access data may include multiple groups of historical data, each group corresponding to one application's accesses to the first storage area.
The prefetched data may be determined based on the historical access pattern of the first memory region, that is, based on the historical access behavior of the first memory region. As one implementation, the prefetch data may be determined from historical access data of the first storage region.
For the prefetched data to cover as much of the data to be used as possible, that is, to improve coverage, the prefetch data may be all of the historical access data. But when the historical access data is large, the prefetch data may be only part of it. For example, the prefetch data may be a quantity of historical data adjacent to the currently accessed data. As one example, it may be the historical access data contained in the row where the current access data is located and the two adjacent rows. As another example, the amount of prefetched data may be determined by the size of the cache region used to store it and by the power-consumption requirements: the more data prefetched, the more power the prefetch operation may consume. As yet another example, the prefetch data may be all the historical data, within the historical access data, of the application corresponding to the current access data.
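The adjacent-row example above can be sketched as a simple selection over history grouped by memory row. The row granularity and dictionary layout are assumptions for illustration only.

```python
def adjacent_row_prefetch(history_by_row: dict, current_row: int) -> dict:
    """Select, from historical access data grouped by memory row, the data in
    the row of the current access and the two adjacent rows (one example
    policy from the text)."""
    wanted = {current_row - 1, current_row, current_row + 1}
    return {row: data for row, data in history_by_row.items() if row in wanted}
```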
In step S130, a prefetch instruction is sent to the memory controller according to the prefetch data, where the prefetch instruction is used to store the prefetch data in the prefetch buffer.
A prefetch instruction is generated according to the address corresponding to the prefetch data. As one implementation, the prefetch instruction may include the address corresponding to the prefetch data and a read-enable command. After the prefetch instruction is sent to the memory controller, the prefetch data can be read from the memory.
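Step S130 can be sketched as packaging a read-enable command with the prefetch addresses. The PrefetchInstruction structure and its field names are hypothetical, chosen only to mirror the description above.

```python
from dataclasses import dataclass

@dataclass
class PrefetchInstruction:
    read_enable: bool   # hypothetical field: the read-enable command
    addresses: tuple    # addresses corresponding to the prefetch data

def make_prefetch_instruction(prefetch_addresses) -> PrefetchInstruction:
    """Generate a prefetch instruction from the addresses of the prefetch
    data; the memory controller then reads those addresses into the
    prefetch cache area."""
    return PrefetchInstruction(read_enable=True,
                               addresses=tuple(sorted(set(prefetch_addresses))))
```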
The prefetch buffer may be used to store prefetch data read from memory. The pre-fetch buffer may be part of an existing buffer or a newly added dedicated buffer.
According to the historical access pattern of the area to which the address corresponding to the access instruction belongs, part or all of that area's historical access data is stored in the cache area in advance, so that access instructions to be executed can quickly obtain the data they need from the cache area. Even without an instruction address, this method lets the prefetched data cover the data required by instructions to be executed as completely as possible with the fewest operations, ensuring both the accuracy and the coverage of data prefetching and thereby reducing the power consumption of the memory.
Additional data prefetch operations carry a power cost; therefore, prefetch operations should be reduced as much as possible while guaranteeing data coverage. As one implementation, the prefetch data may be determined based on both the historical access pattern and the current access pattern.
The current access pattern may be the access pattern of the currently executing instructions toward the first storage area, or the access pattern toward the first storage area within a certain time, or a certain number of accesses, counted back from the present. The time span or access count may be chosen according to the actual usage scenario; for example, the access count may be 3000. The current access pattern may also be the access pattern toward the first storage area within the time window containing the current time.
The current access mode is used to indicate current access data of the first memory region. The current access data may be data accessed in the first memory area in the current access mode. The current access data may be quite different from the historical access data. For example, the historical access data and the current access data correspond to different programs, and the data accessed by the processor during the process of running the different programs is also different. The current access data may also be part or all of the historical access data. For example, when the same program is executed again by the processor, if the current time is the time when the execution of the program ends, the current access data may be the same as the historical access data corresponding to when the program was last executed by the processor. If the current time is in the process of executing the program, the current access data can be part of the corresponding historical access data when the processor executes the program last time. As another example, some of the data accessed when the processor executes different programs may be identical.
As described above, the prefetch data may be part or all of the historical access data. Since some of the prefetch data so determined may be identical to the data of the current access pattern, the current access data is removed from it to obtain the final prefetch data. That is, the prefetch data is the part (or all) of the historical access data excluding the current access data.
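The exclusion rule above amounts to a set difference. A sketch, with addresses standing in for the data they indicate:

```python
def final_prefetch(history_addresses: set, current_addresses: set) -> set:
    """Final prefetch data: the chosen part (or all) of the historical access
    data, minus the data already covered by the current access pattern."""
    return history_addresses - current_addresses
```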
The programs run by the processor may be updated or modified over the device's life cycle, and new programs may be added, so the processor's access behavior toward the first storage area may also change. The historical access pattern therefore needs to be updated to maintain the coverage of data prefetching. As one implementation, the historical access pattern of the first storage area may be updated based on its current access pattern.
In some embodiments, the historical access patterns may include a history pattern and a history candidate pattern. Based on the current access pattern, the history pattern, and the history candidate pattern, the updated history pattern and history candidate pattern may be determined according to scoring rules.
The updated history pattern may be obtained as follows. First, obtain the union of the current access pattern and the history candidate pattern, and the union of the current access pattern and the history pattern. For example, the union of the current access pattern and the history candidate pattern may be the union of the addresses contained in each; that is, it includes both the data indicated by the current access pattern and the data indicated by the history candidate pattern.
Second, the two unions are scored according to a scoring rule. The score is related to the coverage and accuracy of the evaluated access pattern with respect to the actual access behavior. As one implementation, the accesses contained in the current access pattern are taken as the actual access behavior, and the two unions are scored to evaluate how well each new pattern covers the actual accesses and how accurate it is. As one example, an access pattern that completely covers the accesses that actually occurred scores highest. As another example, patterns may be scored by accuracy, with higher accuracy scoring higher. As yet another example, patterns may be scored by the number of accesses they cover, with more covered accesses scoring higher.
Finally, the union with the higher score is taken as the updated history pattern. Since both unions contain the current access pattern, both completely cover the actual access behavior; with full coverage, fewer prefetch operations mean higher accuracy. That is, in one implementation, the smaller union scores higher and is taken as the updated history pattern.
The updated history candidate pattern may be obtained as follows. First, obtain the intersection of the current access pattern and the history candidate pattern, and the intersection of the current access pattern and the history pattern. For example, the intersection of the current access pattern and the history candidate pattern may be the intersection of the addresses contained in each; that is, it includes the data common to the current access pattern and the history candidate pattern.
Second, the two intersections are scored according to a scoring rule, again related to the coverage and accuracy of the evaluated pattern with respect to the actual access behavior. As one implementation, the scoring rules may be the same as those described above.
Finally, the intersection with the higher score is taken as the updated history candidate pattern. Because the data indicated by an intersection is common to multiple patterns, using it as prefetch data gives a higher probability of hitting the data to be accessed, that is, higher accuracy. In that case, the larger of the two intersections has the higher coverage. That is, in one implementation, the larger intersection scores higher and is taken as the updated history candidate pattern.
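The two update steps can be sketched together. The concrete scoring rule here (among the unions, full coverage with the fewest prefetches wins; among the intersections, the one covering the most accesses wins) is one possible reading of the rules above, not the patent's definitive scoring function.

```python
def update_patterns(current: set, history: set, candidate: set):
    """Update the history pattern and the history candidate pattern.

    History pattern: both unions contain the current access pattern, so both
    fully cover the actual accesses; the smaller union needs fewer prefetch
    operations at full coverage, so it scores higher.
    Candidate pattern: an intersection holds data common to several patterns,
    so it is high-precision; the larger intersection covers more accesses,
    so it scores higher.
    """
    union_a, union_b = current | candidate, current | history
    new_history = union_a if len(union_a) <= len(union_b) else union_b

    inter_a, inter_b = current & candidate, current & history
    new_candidate = inter_a if len(inter_a) >= len(inter_b) else inter_b
    return new_history, new_candidate
```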
The embodiments of this application not only determine prefetch data from the historical access pattern but also learn access behavior to generate that pattern. As one implementation, access behavior may be learned using a combination of a filter table, an aggregate table, and a history pattern table. The learning process is described below using the page numbers of memory pages as the index into the filter table, the aggregate table, and the history pattern table.
In some cases, access behavior includes random, sporadic accesses. It is difficult to learn regular patterns from them, and attempting to do so only adds overhead. As one implementation, the filter table screens out scattered, random, hard-to-learn access behavior: while the number of accesses to the first storage area is below a first threshold, the behavior is recorded only in the filter table. As an example, the first threshold may bound the number of accesses to the first storage area, or the number of distinct locations of the first storage area that were accessed.
Before recording an access in the filter table, the table is first queried for an existing record of access behavior toward the first storage area; that is, whether the first storage area has been accessed recently. For example, the memory page number can be derived from the address corresponding to the access instruction, and the filter table looked up by page number. If an entry for the page exists, it is updated according to the current access; otherwise, a new record of the current access is allocated in the filter table.
To keep the filter table recording valid access behavior comprehensively, its entries are managed. For example, when the interval between an entry's last access and the current access exceeds a preset time, the entry is reset to record new access behavior; the behavior it held is treated as random, scattered access and its contents are discarded. As one example, the preset time may be the duration of 3000 accesses. As another example, when the filter table is full and a new access pattern needs to be recorded, an entry may be replaced with the new pattern. The replacement rule may follow the static re-reference interval prediction (SRRIP) policy, which replaces entries according to access frequency.
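The filter-table lookup, update, staleness reset, and promotion flow described above can be sketched as follows. The threshold values are assumptions, and plain LRU replacement stands in for the SRRIP policy named in the text.

```python
from collections import OrderedDict

FIRST_THRESHOLD = 4    # assumed: distinct offsets before promotion
RESET_INTERVAL = 3000  # accesses; stale entries are discarded (per the example above)

class FilterTable:
    """Minimal sketch of the filter table, indexed by memory page number.
    LRU replacement stands in for the SRRIP policy described in the text."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> [accessed_offsets, last_access_time]

    def access(self, page: int, offset: int, now: int):
        entry = self.entries.get(page)
        if entry is not None and now - entry[1] > RESET_INTERVAL:
            # Stale entry: the behavior was random/scattered, discard it.
            del self.entries[page]
            entry = None
        if entry is None:
            entry = self.entries[page] = [set(), now]
        entry[0].add(offset)
        entry[1] = now
        self.entries.move_to_end(page)        # LRU bookkeeping
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the coldest entry
        if len(entry[0]) > FIRST_THRESHOLD:
            # Accesses exceed the first threshold: promote to the aggregate table.
            del self.entries[page]
            return entry[0]
        return None
```

A page accessed at enough distinct offsets is returned for insertion into the aggregate table; everything else stays in, or ages out of, the filter table.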
Since each entry in the filter table incurs overhead, the overhead is bounded by limiting the number of entries. As one implementation, the filter table may be a 128-entry lookup table. As one example, the structure of each entry may be as shown in Table 1, where each entry costs 7.875 B, or 0.98 KB in total. Taking a dynamic random access memory with four channels as an example, the total overhead of the filter tables is 3.93 KB.
Table 1: entry structure of the filter table (field widths in bits)

| valid | tag | offsets | wrFlag | pos | lastAccess | mayReuse | rrpv |
|-------|-----|---------|--------|-----|------------|----------|------|
| 1 | 23 (mid: 3 bit + pn: 20 bit) | 18 | 1 | 2 | 16 | 1 | 2 |
See Table 1, where valid is the valid bit: when valid is 1 the entry's content is valid; when valid is 0 the entry is invalid and can be used to store a new entry. tag holds the page number, i.e. the index of the entry. offsets records the offsets, within the memory page, of the addresses of the recorded accesses; for example, the first threshold may be a bound on the number of offsets. wrFlag is a flag bit for write accesses. pos and lastAccess relate to the location of the most recent access. mayReuse and rrpv reflect how often the entry is referenced, or its probability of being accessed again; rrpv is the flag bit used by the SRRIP replacement policy.
When the number of accesses associated with an entry in the filter table exceeds the first threshold, the entry is moved into the aggregate table; that is, the accesses associated with the entry are considered non-random, non-sporadic accesses.
The aggregation table may be used to observe access behavior over a period of time. As one implementation, after an access instruction is received, a match may first be attempted in the aggregation table. If the aggregation table includes an entry associated with the access instruction, that is, an entry for the memory page containing the address of the access instruction, the entry may be updated according to the instruction. If the aggregation table does not include an entry associated with the access instruction, the instruction may be forwarded to the filter table.
For ease of use, the entries in the aggregation table are typically managed as well. For example, the SRRIP replacement algorithm may be used to manage the aggregation table in the manner of cache management.
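For reference, victim selection in a generic SRRIP-style policy can be sketched as follows (this is the standard RRIP idea, not necessarily the exact implementation used here): each entry carries a small re-reference prediction value (rrpv), the victim is an entry whose rrpv is at the maximum, and every entry is aged when no such entry exists.

```python
RRPV_MAX = 3  # maximum of a 2-bit re-reference prediction value

def srrip_victim(rrpvs):
    """Return the index to evict: the first entry with rrpv == RRPV_MAX,
    aging all entries until one qualifies (generic SRRIP sketch)."""
    while True:
        for i, v in enumerate(rrpvs):
            if v == RRPV_MAX:
                return i
        # No candidate: age every entry in place and scan again.
        for i in range(len(rrpvs)):
            rrpvs[i] += 1

print(srrip_victim([2, 1, 3, 2]))  # entry 2 already holds the maximum rrpv
```

On a hit, the matched entry's rrpv would typically be reset to 0, so frequently re-referenced entries survive longer.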
In some embodiments, the structure of each entry in the aggregate table may be as shown in table 2.
Table 2 table entry structure of aggregation table
See Table 2, where valid is the valid bit. For example, when valid is 1, the entry's contents are valid; when valid is 0, the entry's contents are invalid and the slot can be used to store a new entry. tag may represent a page number, i.e., the index of the entry. wrFlag may be a flag bit for write-access behavior. lastAccess may be the data location corresponding to the most recent access behavior. mayReuse and rrpv may reflect how frequently the entry is referenced, or the probability of its being accessed again, where rrpv is the flag field of the SRRIP replacement policy.
pattern may be used to indicate the data accessed in the current mode. As one implementation, the accessed data may be indicated in bitmap fashion. For example, if a page in memory is divided into 16 data blocks, a 16-bit binary sequence may represent the access status of the 16 data blocks. As an example, when the fourth bit of the 16-bit value is 1, the data in the fourth data block of the memory page has been accessed; when the sixth bit is 0, the data in the sixth data block of the memory page has not been accessed.
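Under these assumptions (a 16-block page tracked by a 16-bit bitmap, bit i standing for block i+1), recording and testing the access status might look like:

```python
BLOCKS_PER_PAGE = 16  # page divided into 16 data blocks

def record_access(pattern, block):
    """Set the bit of an accessed block in the 16-bit access pattern."""
    return pattern | (1 << block)

def was_accessed(pattern, block):
    """True if the block's bit is 1, i.e., the block was accessed."""
    return bool(pattern & (1 << block))

pattern = 0
pattern = record_access(pattern, 3)   # fourth data block -> bit index 3
print(format(pattern, '016b'))        # 0000000000001000
print(was_accessed(pattern, 3))       # True:  fourth block was accessed
print(was_accessed(pattern, 5))       # False: sixth block not accessed
```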
The history pattern table may be used to store access patterns that have been learned. In response to the number of accesses elapsing between the current access and the last access to the first storage area in the aggregation table being greater than a second threshold, the access pattern is moved from the aggregation table to the history pattern table. That is, an entry in the aggregation table satisfying this condition may be considered learned for the present stage.
On the other hand, the history pattern table may also be used to determine the data to prefetch. As one implementation, after an access instruction is received, the history pattern table may be searched for a match. If a matching entry exists in the history pattern table, the historical access data of the page may be determined from that entry's access pattern.
Prefetch data may be determined from the historical access data. For example, all of the historical access data may be taken as the prefetch data. For another example, if the aggregation table also contains an entry matching the access instruction, the prefetch data may be the historical access data excluding the current access data indicated by that aggregation-table entry. Taking the aforementioned memory page divided into data blocks as an example, the prefetch data comprises the historical access blocks other than the currently accessed blocks. As one example, the address of each prefetched block may be determined from the address range assigned to that block when the memory page was divided.
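With the bitmap encoding described above, "the historical access data excluding the current access data" reduces to a bitwise difference (a sketch under the 16-block assumption):

```python
def prefetch_blocks(history_pattern, current_pattern, blocks=16):
    """Blocks recorded in the learned history pattern but not yet
    accessed in the current pattern: history AND NOT current."""
    mask = history_pattern & ~current_pattern
    return [b for b in range(blocks) if mask & (1 << b)]

history = 0b0000_0000_1011_0110   # learned access pattern for the page
current = 0b0000_0000_0000_0110   # blocks already touched this time
print(prefetch_blocks(history, current))  # [4, 5, 7]
```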
In some embodiments, the structure of each entry in the history pattern table may be as shown in Table 3.
Table 3 table entry structure of history pattern table
Referring to Table 3, the history pattern table records two access patterns, namely a history pattern and a history candidate pattern. tag may represent a page number, i.e., the index of the entry. wrFlags may be a flag bit for write-access behavior. readCount may be the number of reads. wrType may be the write type of the write behavior among the recorded accesses. mayReuse may reflect the probability of the entry being accessed again. scoreHigh may be used to mark the updated history pattern, and scoreCandidate may be used to mark the updated history candidate pattern. pattern may indicate the data accessed in the history pattern, and patternCandidate may indicate the data accessed in the history candidate pattern.
For ease of understanding, the complete data prefetching method provided by the embodiments of the application is described below from the two angles of data prefetching and of learning access patterns, in combination with the flow of access instructions.
First, from the angle of data prefetching. When the processor needs to access data in memory during operation, a corresponding access instruction is generated. According to the access instruction, the prefetch buffer (which stores data prefetched before the current access instruction) may be queried for the data to be used. If the data to be used is present in the prefetch buffer, it may be returned to the processor. If the data to be used is not present in the prefetch buffer, the access instruction may be forwarded to the memory controller so as to read the data from memory.
Meanwhile, according to the access instruction, the entries in the history pattern table and the aggregation table may be queried to prefetch data for subsequent instructions in advance. For example, if an entry corresponding to the access instruction exists in the history pattern table, the prefetch data may be determined from the historical access data indicated by that entry. For another example, if an entry corresponding to the access instruction exists in the history pattern table and a corresponding entry also exists in the aggregation table, the prefetch data may be determined from the historical access data indicated by the history-pattern entry together with the current access data indicated by the aggregation-table entry. As one example, the prefetch data may be the historical access data excluding the current access data.
Finally, the prefetch instruction corresponding to the prefetch data can be sent to the memory controller to store the prefetch data into the prefetch buffer.
Next, from the angle of learning access patterns. The aggregation table receives an access instruction for the memory and, based on the address carried by the instruction, determines whether it contains an access record for the page where the target address lies. If an entry corresponding to that page's access record exists in the aggregation table, it may be updated according to the instruction. If no such entry exists, the access instruction may be forwarded to the filter table.
After the filter table receives the access instruction, it determines whether a corresponding entry exists. If an entry corresponding to the access instruction exists in the filter table, that entry may be updated according to the instruction. If no such entry exists, a new entry may be created to record the access behavior corresponding to the instruction.
The entry management and migration rules of the aggregation table and the filter table are as follows.
When the number of accesses elapsing between an aggregation-table entry's last access and the current access exceeds a second threshold, learning of the access behavior of the corresponding memory page is considered complete for the present stage, and the entry may be moved into the history pattern table. In addition, when the aggregation table is full and a newly generated access pattern cannot be stored, the entry with the largest access count since its last access may be moved into the history pattern table according to access frequency, and the newly generated access pattern stored in the aggregation table.
When the number of accesses to the memory page corresponding to a filter-table entry exceeds a first threshold, the entry may be moved into the aggregation table. When the interval between an entry's last access and the current access exceeds a certain number of accesses, the entry's contents are discarded. When the filter table is full and a newly generated access pattern cannot be stored, the entry with the largest access count since its last access may be deleted according to access frequency, and the newly generated access pattern stored in the filter table.
When an entry in the aggregation table is moved into the history pattern table, it is determined whether the history pattern table already contains an entry for the same memory page. If not, a new entry may be created in the history pattern table to store the aggregation-table entry. If so, the corresponding entry in the history pattern table may be updated according to the aggregation-table entry. The update rule for entries in the history pattern table may employ the update strategy described previously.
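The update strategy referred to above can be sketched with bitmap patterns and a hypothetical scoring function (the coverage-plus-accuracy score below is an assumption consistent with the description; the union/intersection selection rule follows the text: the new history pattern is the higher-scoring of current ∪ candidate and current ∪ history, and the new candidate pattern is the higher-scoring of current ∩ candidate and current ∩ history):

```python
def popcount(x):
    return bin(x).count("1")

def score(pattern, observed):
    """Hypothetical score: coverage (observed blocks the pattern would
    have prefetched) plus accuracy (pattern blocks actually observed)."""
    if pattern == 0:
        return 0.0
    hit = popcount(pattern & observed)
    coverage = hit / max(popcount(observed), 1)
    accuracy = hit / popcount(pattern)
    return coverage + accuracy

def update(history, candidate, current):
    """Pick the higher-scoring union as the new history pattern and
    the higher-scoring intersection as the new candidate pattern."""
    new_history = max(current | candidate, current | history,
                      key=lambda p: score(p, current))
    new_candidate = max(current & candidate, current & history,
                        key=lambda p: score(p, current))
    return new_history, new_candidate

h, c = update(history=0b1100, candidate=0b0110, current=0b0110)
print(format(h, "04b"), format(c, "04b"))  # 0110 0110
```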
The method embodiments of the present application are described in detail above in connection with fig. 1, and the apparatus embodiments of the present application are described in detail below in connection with figs. 2 to 5. It is to be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments; for parts not described in detail, reference may be made to the preceding method embodiments.
Fig. 2 is a schematic diagram of a data prefetching apparatus according to an embodiment of the application. The data prefetching apparatus 200 may include a prefetch buffer 210 and a processor 220.
Prefetch buffer 210 is used to store prefetch data.
A processor 220 for performing the following operations:
Receiving a first access instruction, wherein an address corresponding to the first access instruction belongs to a first storage area of a memory;
Determining pre-fetch data according to a history access mode of the first storage area, wherein the history access mode is used for indicating history access data of the first storage area, and the pre-fetch data comprises part or all of the history access data;
And sending a prefetch instruction to a memory controller according to the prefetch data, wherein the prefetch instruction is used for storing the prefetch data into a prefetch cache area.
Optionally, determining the prefetched data according to the historical access mode of the first storage area includes determining the prefetched data according to the historical access mode and the current access mode, wherein the current access mode is used for indicating the current access data of the first storage area, and the prefetched data is data except the current access data in part or all of the historical access data.
Optionally, the historical access mode is updated according to the current access mode of the first storage area.
Optionally, the history access mode includes a history mode and a history candidate mode, and the updating the history access mode according to the current access mode of the first storage area includes:
Optionally, according to the current access mode, the history mode and the history candidate mode, the updated history mode and history candidate mode are determined based on a scoring rule, wherein the updated history mode is determined by the higher-scoring result between the union of the current access mode and the history candidate mode and the union of the current access mode and the history mode, and the updated history candidate mode is determined by the higher-scoring result between the intersection of the current access mode and the history candidate mode and the intersection of the current access mode and the history mode.
Optionally, the result of the scoring rule is associated with the coverage rate and accuracy of the actual memory access behavior of the memory access pattern to be evaluated.
Optionally, the index of the first storage area is a page number of the memory page.
Optionally, the processor is further configured to record a memory access behavior for the first storage area, record the memory access behavior in a filtering table if the memory access frequency for the first storage area is less than a first threshold, and move the memory access behavior record from the filtering table to an aggregation table if the memory access frequency for the first storage area is greater than the first threshold.
Optionally, the processor is further configured to move the memory behavior record from the aggregate table to a history pattern table in response to a number of accesses of an interval between an access to the memory and a last access to the first storage area being greater than a second threshold.
Fig. 3 is a schematic structural diagram of a data prefetching apparatus according to another embodiment of the present application. The data prefetching apparatus 300 includes a controller 310, a memory access request queue 320, a prefetch request queue 330, a prefetch buffer 340, a filter table 350, an aggregate table 360, and a history pattern table 370. Wherein filter table 350, aggregate table 360, and history pattern table 370 constitute prefetch component 380.
The memory access request queue 320 may be used to store access instructions for the memory. The controller 310 may be configured to control the prefetch component 380 to determine, according to the data prefetching method described above, the prefetch data corresponding to an access instruction. The prefetch request queue 330 may be used to store prefetch instructions, which cause the prefetch data corresponding to an access instruction to be cached into the prefetch buffer 340. The prefetch buffer 340 may be used to store the prefetch data.
FIG. 4 is a schematic diagram of a workflow of the data prefetching apparatus 300. The workflow diagram 400 includes steps S410 to S460.
In step S410, a memory access request is received.
In step S420, the memory request is stored in memory request queue 320.
In step S430, the access behavior is observed in the aggregate table.
In step S440, scattered access requests are filtered through the filter table.
In step S450, the access pattern is recorded in the history pattern table.
In step S460, a prefetch request is issued according to the access mode in the history mode table.
Fig. 5 is a schematic diagram of a data prefetching system according to an embodiment of the present application. The data prefetch system 500 includes an upper layer device 510, a data prefetch apparatus 300, and a memory controller 520.
The data prefetching apparatus 300 may determine the prefetched data according to the access instruction of the upper layer device 510 for the memory. Meanwhile, the data prefetching apparatus 300 may send a prefetch instruction corresponding to the prefetched data to the memory controller 520. Memory controller 520 may read the prefetch data from memory according to the prefetch instruction and return to data prefetching apparatus 300.
It should be appreciated that in embodiments of the present application, the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should be understood that in embodiments of the present application, "B corresponding to A" means that B is associated with A, and B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It should be understood that the term "and/or" merely describes an association relationship between the associated objects, indicating that three relationships may exist; for example, "A and/or B" may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can read, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital versatile disc (DVD)), or a semiconductor medium (e.g., solid-state disk (SSD)), etc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (12)
1. A data prefetching method, the data prefetching method comprising:
Receiving a first access instruction, wherein an address corresponding to the first access instruction belongs to a first storage area of a memory;
Determining pre-fetch data according to a history access mode of the first storage area, wherein the history access mode is used for indicating history access data of the first storage area, and the pre-fetch data comprises part or all of the history access data;
according to the prefetched data, a prefetching instruction is sent to a memory controller, and the prefetching instruction is used for storing the prefetched data into a prefetching cache area;
the history access mode comprises a history mode and a history candidate mode, and the data prefetching method further comprises the following steps:
determining the updated history mode and the updated history candidate mode based on a scoring rule according to the current access mode, the history mode and the history candidate mode;
The updated history mode is determined by a result with high scores in a union of the current access mode and the history candidate mode and a union of the current access mode and the history mode, and the updated history candidate mode is determined by a result with high scores in an intersection of the current access mode and the history candidate mode and an intersection of the current access mode and the history mode;
The current access mode is used for indicating current access data of the first storage area, the history mode is the currently used history access mode, and the history candidate mode is the candidate history access mode.
2. The method of claim 1, wherein determining prefetched data based on a historical access pattern of the first storage area comprises:
and determining the prefetched data according to the historical access mode and the current access mode, wherein the prefetched data is data except the current access data in part or all of the historical access data.
3. The data pre-fetching method according to claim 1 or 2, wherein the result of the scoring rule is related to the coverage and accuracy of the access pattern to be evaluated for the actual access behavior.
4. The method of claim 1, wherein the index of the first storage area is a page number of a memory page.
5. The data prefetching method of claim 1, wherein said data prefetching method further comprises:
recording access behaviors aiming at the first storage area;
if the access times of the first storage area are smaller than a first threshold value, recording the access behaviors in a filtering table;
and if the access times of the first storage area are greater than a first threshold value, moving the access behavior record from the filtering table to the aggregation table.
6. The data prefetching method of claim 5 wherein said data prefetching method further comprises:
And in response to the number of accesses in the interval between the current access to the memory and the last access to the first storage area being greater than a second threshold, moving the access behavior record from the aggregation table to a history pattern table.
7. A data prefetching apparatus, the data prefetching apparatus comprising:
the prefetching cache area is used for storing prefetched data;
A processor for performing the operations of:
Receiving a first access instruction, wherein an address corresponding to the first access instruction belongs to a first storage area of a memory;
Determining pre-fetch data according to a history access mode of the first storage area, wherein the history access mode is used for indicating history access data of the first storage area, and the pre-fetch data comprises part or all of the history access data;
according to the prefetched data, a prefetching instruction is sent to a memory controller, and the prefetching instruction is used for storing the prefetched data into a prefetching cache area;
the history access mode includes a history mode and a history candidate mode, and the processor is further configured to:
determining the updated history mode and the updated history candidate mode based on a scoring rule according to the current access mode, the history mode and the history candidate mode;
The updated history mode is determined by a result with high scores in a union of the current access mode and the history candidate mode and a union of the current access mode and the history mode, and the updated history candidate mode is determined by a result with high scores in an intersection of the current access mode and the history candidate mode and an intersection of the current access mode and the history mode;
The current access mode is used for indicating current access data of the first storage area, the history mode is the currently used history access mode, and the history candidate mode is the candidate history access mode.
8. The data prefetching apparatus of claim 7 wherein said determining prefetched data based on a historical access pattern of said first memory region comprises:
and determining the prefetched data according to the historical access mode and the current access mode, wherein the prefetched data is data except the current access data in part or all of the historical access data.
9. The data prefetching apparatus according to claim 7 or 8, wherein the result of said scoring rule is associated with the coverage and accuracy of the access pattern to be evaluated for the actual access behavior.
10. The data prefetching apparatus of claim 7 wherein the index of the first memory region is a page number of a memory page.
11. The data prefetching apparatus of claim 7 wherein the processor is further configured to:
recording access behaviors aiming at the first storage area;
if the access times of the first storage area are smaller than a first threshold value, recording the access behaviors in a filtering table;
and if the access times of the first storage area are greater than a first threshold value, moving the access behavior record from the filtering table to the aggregation table.
12. The data prefetching apparatus of claim 11 wherein the processor is further configured to:
And in response to the number of accesses in the interval between the current access to the memory and the last access to the first storage area being greater than a second threshold, moving the access behavior record from the aggregation table to a history pattern table.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211102695.5A CN115495394B (en) | 2022-09-09 | 2022-09-09 | Data prefetching method and data prefetching device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211102695.5A CN115495394B (en) | 2022-09-09 | 2022-09-09 | Data prefetching method and data prefetching device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115495394A CN115495394A (en) | 2022-12-20 |
| CN115495394B true CN115495394B (en) | 2025-09-26 |
Family
ID=84468625
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211102695.5A Active CN115495394B (en) | 2022-09-09 | 2022-09-09 | Data prefetching method and data prefetching device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115495394B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118260218A (en) * | 2022-12-27 | 2024-06-28 | 华为技术有限公司 | Data processing method, device, chip and computer readable storage medium |
| CN118277292A (en) * | 2022-12-30 | 2024-07-02 | 华为技术有限公司 | Data pre-fetching method and data pre-fetching device |
| CN117217977B (en) * | 2023-05-26 | 2024-07-19 | 摩尔线程智能科技(北京)有限责任公司 | GPU data access processing method, device and storage medium |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114756481A (en) * | 2022-04-07 | 2022-07-15 | 中国人民解放军国防科技大学 | Data prefetching method and device supporting multiple memory access modes |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105183663B (en) * | 2010-03-29 | 2018-11-27 | 威盛电子股份有限公司 | Prefetch unit and data prefetch method |
| US9471497B2 (en) * | 2014-01-24 | 2016-10-18 | Netapp, Inc. | Methods for combining access history and sequentiality for intelligent prefetching and devices thereof |
| CN104572501B (en) * | 2015-01-08 | 2017-05-17 | 北京航空航天大学 | Access trace locality analysis-based shared buffer optimization method in multi-core environment |
| KR101681423B1 (en) * | 2015-08-11 | 2016-11-30 | 인하대학교 산학협력단 | Instructions and Data Prefetch Method using Displacement History Buffer and Systems thereof |
| CN105930281B (en) * | 2016-05-12 | 2019-01-15 | 清华大学 | With the matched on piece cache prefetching mechanism of configuration information driving data memory access mode |
| US11249909B2 (en) * | 2018-12-28 | 2022-02-15 | Intel Corporation | Systems and methods for adaptive multipath probability (AMP) prefetcher |
2022
- 2022-09-09 CN CN202211102695.5A patent/CN115495394B/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114756481A (en) * | 2022-04-07 | 2022-07-15 | 中国人民解放军国防科技大学 | Data prefetching method and device supporting multiple memory access modes |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115495394A (en) | 2022-12-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN115495394B (en) | Data prefetching method and data prefetching device | |
| CN110297787B (en) | Method, device and equipment for accessing memory by I/O equipment | |
| CN105677580A (en) | Method and device for accessing cache | |
| US8364904B2 (en) | Horizontal cache persistence in a multi-compute node, symmetric multiprocessing computer | |
| US11593268B2 (en) | Method, electronic device and computer program product for managing cache | |
| US12026099B2 (en) | System and method for storing cache location information for cache entry transfer | |
| US12099451B2 (en) | Re-reference interval prediction (RRIP) with pseudo-LRU supplemental age information | |
| CN109478163B (en) | System and method for identifying a pending memory access request at a cache entry | |
| US20210240631A1 (en) | Cache memory device, system including the same, and method of operating the same | |
| US20250165400A1 (en) | Data reduction method, apparatus, and system | |
| CN119988255A (en) | A method, device, equipment and storage medium for memory access cache management | |
| CN113010454A (en) | Data reading and writing method, device, terminal and storage medium | |
| US7979640B2 (en) | Cache line duplication in response to a way prediction conflict | |
| JP2019521410A (en) | Set cache entry age based on hints from different cache levels | |
| US20190155604A1 (en) | Selective prefetching in multithreaded processing units | |
| CN117687931A (en) | Address mapping information activation method, electronic equipment and computer readable storage device | |
| CN111694504B (en) | Method and device for processing read request | |
| JP2002182978A (en) | Storage subsystem and information processing system | |
| CN113391974A (en) | Memory monitoring method, device, processor and storage medium | |
| US12423243B2 (en) | Systems and methods for reducing cache fills | |
| US20250147904A1 (en) | Page detection using recency score filters | |
| US20230100230A1 (en) | Using request class and reuse recording in one cache for insertion policies of another cache | |
| CN117453608A (en) | NVMe command processing method, computer equipment, storage medium and product | |
| CN117971731A (en) | Hardware implementation device of LRU approximation algorithm, method and device for updating LRU value | |
| CN117687929A (en) | Verify cached request generators, methods, devices, media, procedures |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant |