US20060155934A1 - System and method for reducing unnecessary cache operations - Google Patents
- Publication number
- US20060155934A1 (application US11/032,875)
- Authority
- US
- United States
- Prior art keywords
- data
- write
- cache
- memory cache
- present
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
- G06F12/0897—Caches characterised by their organisation or structure with two or more cache hierarchy levels
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0804—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
 
Definitions
- From step 310, the process continues to step 314, which illustrates L2 cache controller 212 placing the selected victim line into write-back queue 205.
- The process then proceeds to step 316, which depicts L2 cache controller 212 determining whether or not the selected victim line is dirty.
- A dirty cache line is considered a Modified granule: it is valid only in the cache storing the modified coherency granule, and its value has not been written to memory 104 or any other type of data storage device (e.g., CD-ROM, hard disk drive, floppy diskette drive, or others).
- A clean line may be marked with an Exclusive or Shared tag. If L2 cache controller 212 determines the selected victim line is dirty, the process continues to step 320, which illustrates L2 cache 204 issuing a write-back request to L3 cache 108. However, if L2 cache controller 212 determines the selected victim line is not dirty, the process continues to step 318, which depicts L2 cache controller 212 examining the contents of write-back history table 206 for an entry that indicates the selected victim line is also present in L3 cache 108.
- Step 312 illustrates L2 cache controller 212 replacing the selected victim line without first writing the selected victim line to L3 cache 108 or memory 104.
- The process then returns to step 302 and continues in an iterative fashion.
- Returning to step 318, if processor core 200 determines by reference to write-back history table 206 that the selected victim line is not present in L3 cache 108, the process continues to step 320.
- Step 322 depicts L3 cache controller 110 snooping a write-back request issued from L2 cache 204.
- Step 324 illustrates L3 cache controller 110 determining whether or not the selected victim line had been modified while present in L2 cache 204. If L3 cache 108 determines that the selected victim line had been modified while present in L2 cache 204, the process continues to step 326, which depicts L3 cache 108 accepting the selected victim line from L2 cache 204.
- L3 cache 108 must accept and cache the selected victim line in order to preserve the changes made to the selected victim line while present in L2 cache 204 .
- The process then continues to step 327, which illustrates L3 cache 108 performing a castout of a data line from L3 cache 108 according to an algorithm such as a least recently used (LRU) algorithm.
- When a selected victim line has not been modified while present in L2 cache 204, L2 cache 204 first examines the contents of write-back history table 206 to determine if the selected victim line is already present in L3 cache 108.
- When processor core 200 initially requests and processes a cache line, the line is written only to L1 cache 202 and L2 cache 204. The line is not written to lookaside L3 cache 108 until L2 cache 204 casts out a line to make room for a new data line requested by processor core 200. Therefore, each time a clean (not modified) line is selected to be cast out of L2 cache 204, L2 cache 204 must examine write-back history table 206 to determine whether or not a copy of the selected victim line is already present in L3 cache 108.
- Step 328 illustrates L3 cache 108 determining whether or not the selected victim line is already present in L3 cache 108. If L3 cache 108 determines that the selected victim line is not already present in L3 cache 108, the process continues to step 326, which depicts L3 cache 108 accepting the selected victim line received from L2 cache 204. This step is not necessary to preserve a copy of the selected victim line, since the data in the selected victim line in L2 cache 204 also resides in memory 104, but it is advantageous for latency purposes.
- The process then continues to step 327, which illustrates L3 cache 108 casting out a data line, if necessary, to accommodate the selected victim line from L2 cache 204.
- The process then returns to step 302 and continues in an iterative fashion.
- Step 330 illustrates L3 cache 108 setting an inhibit bit in the response to the write-back request from L2 cache 204.
- Setting the inhibit bit indicates to L2 cache 204 that the selected victim line is already present in L3 cache 108 and that L2 cache 204 may replace the selected victim line with newly-requested data without casting out the selected victim line to L3 cache 108.
- An actual inhibit bit is not required to be set by L3 cache 108; L3 cache 108 may achieve the same result by sending an alternate bus response that indicates the validity of the selected victim line in L3 cache 108.
- Step 332 depicts L3 cache 108 sending out the response, with the inhibit bit set, on intra-chip interconnect 208.
- The process proceeds to step 334, which illustrates L2 cache 204 receiving the response from L3 cache 108.
- Step 336 depicts L2 cache 204 determining whether or not the response includes an inhibit bit set by L3 cache 108. If L2 cache 204 determines that the response includes an inhibit bit set by L3 cache 108, the process continues to step 338.
- Step 338 illustrates L2 cache 204 allocating an entry in write-back history table 206 indicating that the selected victim line is already present and valid in L3 cache 108.
- If L2 cache 204 determines that write-back history table 206 is full, the least recently accessed entries in write-back history table 206 are simply overwritten when L2 cache 204 allocates a new entry in step 338.
- When the same line is later selected as a victim again, L2 cache 204 will determine that the line is valid and present in L3 cache 108 by locating the entry in write-back history table 206 and will evict the selected victim line without attempting to write it back to L3 cache 108 or memory 104. The process then returns to step 302 and proceeds in an iterative fashion.
- Returning to step 336, if L2 cache 204 determines that the response does not include an inhibit bit set by L3 cache 108, the process continues to step 340, which illustrates L2 cache 204 ignoring the response sent by L3 cache 108. The process then returns to step 302 and continues in an iterative fashion.
- The present invention is a system and method of reducing unnecessary cache operations within a data processing system.
- The L2 cache controller selects a line (e.g., a victim line) to be cast out of the L2 cache to make room for a newly-requested cache line.
- The L2 cache controller determines the state of the victim line with respect to the L3 cache and memory by examining the contents of a write-back history table.
- The write-back history table includes entries that indicate the lines that have been recently evicted from the L2 cache and written to the L3 cache.
- If an entry indicates that the victim line is valid in the L3 cache, the victim line is replaced by the newly-requested line without first writing the victim line to the L3 cache or memory. If, however, no entry in the write-back history table indicates that the victim line was recently evicted from the L2 cache and is valid in the L3 cache, an entry is made in the write-back history table once the L3 cache responds that it holds a valid copy of the line.
- This system and method inhibits the redundant writing of cache lines to an L3 cache or memory if the cache lines are determined to be both present in the L3 cache or memory and valid (e.g., unmodified). Therefore, this system and method reduces the bandwidth otherwise wasted on such redundant writes.
- The present invention may be implemented to reduce the unnecessary writing of any type of data among different levels of any memory hierarchy.
- Examples of such implementations include the use of random access memory, writeable storage media (e.g., floppy diskette, hard disk drive, read/write CD-ROM, optical media), and flash memory among the different levels of the memory hierarchy.
- The present invention may also be implemented among different levels of data processing systems, where a client might suppress the writing of data to a server if the data is already present and valid on the server.
- Programs defining functions of the present invention can be delivered to a data storage system or a computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., a floppy diskette, hard disk drive, read/write CD-ROM, optical media), and communication media, such as computer and telephone networks including Ethernet.
- Such signal-bearing media, when carrying or encoding computer-readable instructions that direct method functions of the present invention, represent alternative embodiments of the present invention.
- The present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.
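The castout path of steps 314 through 340, including the L3 side's inhibit response, can be sketched as a small Python simulation. This is an illustrative model, not the patented hardware: the names `L3Cache`, `snoop_writeback`, `L2Castout`, `cast_out`, and the request counters are hypothetical; the write-back history table is simplified to an unbounded `set`; the write-back request is assumed to tell L3 whether the line was modified; and the L3 castout of step 327 is not modeled.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class L3Cache:
    """Hypothetical lookaside L3: a set of resident line addresses."""
    lines: Set[int] = field(default_factory=set)
    accepted_writebacks: int = 0

    def snoop_writeback(self, addr: int, modified: bool) -> bool:
        """Steps 322-332: return True if the response carries the inhibit bit."""
        if not modified and addr in self.lines:
            return True  # step 330: clean copy already present; inhibit the castout
        # Step 326: accept the line (making room via step 327 is not modeled here).
        self.lines.add(addr)
        self.accepted_writebacks += 1
        return False


@dataclass
class L2Castout:
    """Hypothetical L2-side castout logic for one victim at a time."""
    l3: L3Cache
    wbht: Set[int] = field(default_factory=set)  # simplified write-back history table
    writeback_requests: int = 0

    def cast_out(self, addr: int, dirty: bool) -> None:
        """Steps 316-340 for a victim line already placed in the write-back queue."""
        if not dirty and addr in self.wbht:
            return  # steps 318 -> 312: replace without writing to L3 or memory
        self.writeback_requests += 1  # step 320: issue a write-back request
        inhibit = self.l3.snoop_writeback(addr, modified=dirty)
        if inhibit:
            self.wbht.add(addr)  # step 338: remember that L3 holds a valid copy
        # Step 340: a response without the inhibit bit is simply ignored.
```

Casting out the same clean line three times shows the intended effect: the first write-back is accepted by L3, the second is answered with the inhibit bit and recorded in the table, and the third is dropped in L2 without any write-back request. A dirty victim always takes the write-back path, since L3 must preserve its changes.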
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache. 
  Description
-  1. Technical Field
-  The present invention relates in general to data processing systems, and more particularly, to an improved multi-processor data processing system. Still more particularly, the present invention relates to improved cache operation within multi-processor data processing systems.
-  2. Description of the Related Art
-  A conventional multi-processor data processing system (referred to hereinafter as an MP) typically includes a system memory, input/output (I/O) devices, multiple processing elements that each include a processor and one or more levels of high-speed cache memory, and a system bus coupling the processing elements to each other and to the system memory and I/O devices. The processors all utilize common instruction sets and communication protocols, have similar hardware architectures, and are generally provided with similar memory hierarchies.
-  Caches are commonly utilized to temporarily store values that might be accessed by a processor in order to speed up processing by reducing access latency as compared to loading needed values from memory. Each cache includes a cache array and a cache directory. An associated cache controller manages the transfer of data and instructions between the processor core or system memory and the cache. Typically, the cache directory also contains a series of bits utilized to track the coherency states of the data in the cache.
-  With multiple caches within the memory hierarchy, a coherent structure is required for valid execution results in the MP. This coherent structure provides a single view of the contents of the memory to all of the processors and other memory access devices (e.g., I/O devices). A coherent memory hierarchy is maintained through the utilization of a coherency protocol, such as the MESI protocol. In the MESI protocol, an indication of a coherency state is stored in association with each coherency granule (e.g., a cache line or sector) of one or more levels of cache memories. Each coherency granule can have one of the four MESI states, which is indicated by bits in the cache directory.
-  The MESI protocol allows a cache line of data to be tagged with one of four states: “M” (modified), “E” (exclusive), “S” (shared), or “I” (invalid). The Modified state indicates that a coherency granule is valid only in the cache storing the modified coherency granule and that the value of the modified coherency granule has not been written to system memory. When a coherency granule is indicated as Exclusive, then, of all caches at that level of the memory hierarchy, only that cache holds the coherency granule. The data in the Exclusive state is, however, consistent with system memory. If a coherency granule is marked as Shared in a cache directory, the coherency granule is resident in the associated cache and in at least one other cache at the same level of the memory hierarchy, and all of the copies of the coherency granule are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
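The four MESI states, and the property the later castout logic relies on (only a Modified line differs from system memory), can be sketched as follows. This is an illustrative Python model rather than anything specified in the patent; the names `MesiState`, `is_valid`, `is_dirty`, and `consistent_with_memory` are hypothetical.

```python
from enum import Enum


class MesiState(Enum):
    """The four MESI coherency states tracked per coherency granule."""
    MODIFIED = "M"   # valid only in this cache; not yet written to system memory
    EXCLUSIVE = "E"  # only cached copy at this level; consistent with memory
    SHARED = "S"     # also held by at least one other cache; consistent with memory
    INVALID = "I"    # data and address tag are both invalid


def is_valid(state: MesiState) -> bool:
    return state is not MesiState.INVALID


def is_dirty(state: MesiState) -> bool:
    # Only Modified data differs from system memory and must be written back.
    return state is MesiState.MODIFIED


def consistent_with_memory(state: MesiState) -> bool:
    # Exclusive and Shared copies match system memory; Invalid holds no usable data.
    return state in (MesiState.EXCLUSIVE, MesiState.SHARED)
```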
-  The state to which each coherency granule (e.g., cache line or sector) is set is dependent upon both a previous state of the data within the cache line and the type of memory access request received from a requesting device (e.g., the processor). Accordingly, maintaining memory coherency in the MP requires that the processors communicate messages across the system bus indicating their intention to read or write to memory locations. For example, when a processor desires to write data to a memory location, the processor must first inform all other processing elements of its intention to write data to the memory location and receive permission from all other processing elements to carry out the write operation. The permission messages received by the requesting processor indicate that all other cached copies of the contents of the memory location have been invalidated, thereby guaranteeing that the other processors will not access their stale local data.
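A minimal sketch of the invalidate-before-write exchange described above, assuming a simplified model in which each cache is a Python dict from line address to a MESI letter and the bus broadcast is a loop over the other caches. The function name `request_write` is hypothetical, and the sketch ignores the data transfer a real protocol would need when another cache holds the line in the Modified state.

```python
from typing import Dict, List


def request_write(caches: List[Dict[int, str]], writer: int, addr: int) -> None:
    """Broadcast the intent to write `addr`; every other cache invalidates its copy.

    Once all other copies are invalid (the 'permission' step), the writer
    holds the line in the Modified state.
    """
    for i, cache in enumerate(caches):
        if i != writer and cache.get(addr, "I") != "I":
            cache[addr] = "I"  # stale copy can no longer be read
    caches[writer][addr] = "M"
```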
-  In some MP systems, the cache hierarchy includes at least two levels. The level one (L1), or upper-level, cache is usually a private cache associated with a particular processor core in an MP system. The processor core first looks for data in the upper-level cache. If the requested data is not found in the upper-level cache, the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) for the requested data. The lowest level cache (e.g., L3) is often shared among several processor cores.
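The lookup order described above (upper-level cache first, then lower-level caches, then memory) can be sketched as a simple search. The `lookup` function and the representation of each level as a name paired with a dict are assumptions made for illustration only; fills into upper levels after a lower-level hit are not modeled.

```python
from typing import Dict, List, Optional, Tuple


def lookup(
    levels: List[Tuple[str, Dict[int, bytes]]],
    memory: Dict[int, bytes],
    addr: int,
) -> Tuple[str, Optional[bytes]]:
    """Search the caches from upper (L1) to lower (L3), then fall back to memory."""
    for name, contents in levels:
        if addr in contents:
            return name, contents[addr]  # first (highest) level holding the line wins
    return "memory", memory.get(addr)
```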
-  Typically, when a congruence class of one of the upper-level caches becomes full, data lines are “evicted” or written to the lower-level cache for storage. However, in any memory hierarchy, there may be several copies of the same data residing in the memory hierarchy at the same time. The policy of evicting lines to provide for more space in the upper-level cache may result in unnecessary writes to lower-level caches, which results in increased bandwidth demands.
-  Therefore, there is a need for a more intelligent system and method for managing a multi-level memory hierarchy to reduce unnecessary inter-cache communication.
-  A system and method for cache management in a data processing system are disclosed. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.
-  The above-mentioned features, as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
-  The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
-  FIG. 1 is a block diagram of an exemplary multi-processor data processing system in which a preferred embodiment of the present invention may be implemented;
-  FIG. 2 is a more detailed block diagram of a processing unit in accordance with a preferred embodiment of the present invention; and
-  FIG. 3 is a high-level logical flowchart illustrating an exemplary cache operation in accordance with a preferred embodiment of the present invention.
-  With reference now to the figures, and in particular, with reference to FIG. 1, there is illustrated a block diagram of a multi-processor data processing system 100 in which a preferred embodiment of the present invention may be implemented. As depicted, multi-processor data processing system 100 includes multiple processing units 102, which are each coupled to a respective one of memories 104. Each processing unit 102 is further coupled to an interconnect 106 that supports the communication of data, instructions, and control information between processing units 102. Also, lookaside L3 caches 108 are preferably coupled to processing units 102. Because of the lower latencies of cache memories as compared with memories 104, L3 caches 108 are utilized by processing units 102 as castout storage facilities for recently accessed data. Each processing unit 102 is preferably implemented as a single integrated circuit comprising a semiconductor substrate having integrated circuitry formed thereon. Multiple processing units 102 and at least a portion of interconnect 106 may advantageously be packaged together on a common backplane or chip carrier.
-  Those skilled in the art will appreciate that multi-processor data processing system 100 can include many additional components not specifically illustrated in FIG. 1. Because such additional components are not necessary for an understanding of the present invention, they are not illustrated in FIG. 1 or discussed further herein. It should also be understood, however, that the enhancements to data processing systems to reduce unnecessary cache operations provided by the present invention are applicable to data processing systems of any system architecture and are in no way limited to the generalized multi-processor architecture or symmetric multi-processing (SMP) architecture illustrated in FIG. 1.
-  Referring now to FIG. 2, there is illustrated a more detailed block diagram of a processing unit 102 of FIG. 1. As illustrated, processing unit 102 includes multiple processor cores 200, each of which includes an L1 cache 202. Coupled to each processor core 200 is a respective L2 cache 204, which further includes write-back queue 205 and an L2 cache controller 212 that sends a write-back request to lookaside L3 cache 108 when a selected line (e.g., a victim line) to be cast out to lookaside L3 cache 108 is loaded into write-back queue 205. However, persons with ordinary skill in this art will appreciate that an alternate embodiment of the present invention may include multiple processor cores 200 sharing a single L2 cache 204. Also coupled to L2 cache 204 is write-back history table 206. L2 cache 204 allocates an entry within write-back history table 206 if L2 cache 204 receives a return message indicating that the victim line is already present in L3 cache 108. Write-back queue 205, described herein in more detail in conjunction with FIG. 3, buffers victim lines designated to be evicted from L2 cache 204. Write-back history table 206, also described herein in more detail in conjunction with FIG. 3, tracks cache lines of data and instructions that have been evicted from L2 cache 204 and are also present in L3 cache 108. Each assembly 210 that includes processor core 200, L2 cache 204, and write-back history table 206 is coupled by intra-chip interconnect 208 to other assemblies 210 within processing unit 102. L3 cache 108, coupled to processing unit 102, also includes L3 cache controller 110.
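Write-back history table 206 can be pictured as a small, bounded record of line addresses. The sketch below is an assumption-laden Python model: the class name `WriteBackHistoryTable`, the default capacity of 64 entries, and the use of `collections.OrderedDict` are illustrative choices, not details from the patent. It follows the behavior described for step 338, where the least recently accessed entry is overwritten once the table is full.

```python
from collections import OrderedDict


class WriteBackHistoryTable:
    """Hypothetical bounded record of L2-evicted line addresses known to be valid in L3.

    When full, allocating a new entry overwrites the least recently accessed one.
    """

    def __init__(self, capacity: int = 64) -> None:  # capacity is an assumed value
        self.capacity = capacity
        self._entries: "OrderedDict[int, None]" = OrderedDict()

    def allocate(self, line_addr: int) -> None:
        if line_addr in self._entries:
            self._entries.move_to_end(line_addr)  # refresh an existing entry
            return
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # drop the least recently accessed entry
        self._entries[line_addr] = None

    def contains(self, line_addr: int) -> bool:
        hit = line_addr in self._entries
        if hit:
            self._entries.move_to_end(line_addr)  # in this sketch, a lookup counts as an access
        return hit
```

Counting a lookup as an access is a design choice of this sketch. The text here does not say how an entry is handled if L3 cache 108 later casts out the line, so the sketch does not model that case.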
-  With reference now to FIG. 3, there is illustrated a high-level logical flowchart of an exemplary method of reducing unnecessary cache operations in accordance with a preferred embodiment of the present invention. The process depicted in FIG. 3 involves operations by both L2 cache 204 (top part of FIG. 3) and L3 cache 108 (bottom part of FIG. 3). The process begins at step 300 and continues to step 302, which illustrates L2 cache controller 212 determining whether or not there is a miss in L2 cache 204 on a cache line requested by the affiliated processor core 200. If no cache miss has occurred, the process iterates at step 302.
-  However, if L2 cache controller 212 determines that there is a request for data that missed in L2 cache 204, the process moves to step 306, which illustrates L2 cache controller 212 determining whether or not write-back history table information is utilized in the replacement policy. The replacement policy may be varied by code running in multi-processor data processing system 100, by a hardware switch that is physically toggled by a user, or by another method. If L2 cache controller 212 determines that write-back history table information is utilized in the replacement policy, the process moves to step 308, which depicts processor core 200 determining whether or not write-back history table 206 includes an entry that indicates that there is a cache line in L2 cache 204 that is also included in L3 cache 108. If processor core 200 determines, by accessing write-back history table 206, that there is a cache line in L2 cache 204 that is also included in L3 cache 108, the process moves to step 312, which illustrates a replacement of the cache line in L2 cache 204 that is determined to be also included in L3 cache 108. The process then returns to step 302 and proceeds in an iterative fashion.
-  Returning to step 308, if L2 cache controller 212 determines, by accessing write-back history table 206, that there is no cache line in the congruence class of L2 cache 204 that is also included in L3 cache 108, the process moves to step 310, which depicts L2 cache controller 212 utilizing a least-recently used (LRU) algorithm or another replacement algorithm to select a victim cache line. Returning to step 306, if L2 cache controller 212 determines that write-back history table information is not utilized in the replacement policy, L2 cache controller 212 also selects a victim cache line, as depicted in step 310, utilizing a default replacement policy.
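Steps 306 through 310 amount to a victim-selection function with an optional preference. In this minimal Python sketch, `select_victim`, its parameters, and the choice to scan candidates in LRU order (so that, among several lines the history table reports as present in L3, the least recently used one is chosen) are assumptions for illustration; the description above does not specify how ties are broken.

```python
from typing import Callable, Sequence


def select_victim(
    ways: Sequence[int],
    lru_order: Sequence[int],
    use_history: bool,
    in_history: Callable[[int], bool],
) -> int:
    """Return the index of the way to evict from one L2 congruence class.

    ways[i] is the line address held in way i; lru_order lists way indices
    from least to most recently used; in_history stands in for a lookup in
    write-back history table 206. All names are hypothetical.
    """
    if use_history:  # step 306: is history information part of the policy?
        for way in lru_order:  # step 308: look for a line also held in L3
            if in_history(ways[way]):
                return way  # step 312: replace the line L3 already holds
    return lru_order[0]  # step 310: default LRU choice


ways = [0xA0, 0xB0, 0xC0, 0xD0]
lru_order = [2, 0, 3, 1]  # way 2 is least recently used
history = {0xD0}
print(select_victim(ways, lru_order, True, lambda a: a in history))   # 3
print(select_victim(ways, lru_order, False, lambda a: a in history))  # 2
```

Scanning in LRU order means the history preference overrides recency only when a history hit exists in the class; otherwise the function falls back to plain LRU, as in step 310.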
-  After step 310, the process continues to step 314, which illustrates L2 cache controller 212 placing the selected victim line into write-back queue 205. The process then proceeds to step 316, which depicts L2 cache controller 212 determining whether or not the selected victim line is dirty. Many data processing systems, including exemplary multi-processor data processing system 100, preferably utilize a coherency protocol such as the MESI protocol. For example, a dirty cache line is in the Modified state: the coherency granule is valid only in the cache storing it, and its modified value has not been written to memory 104 or any other type of data storage device (e.g., CD-ROM, hard disk drive, floppy diskette drive, or others). A clean line may be marked with an Exclusive or Shared tag. If L2 cache controller 212 determines the selected victim line is dirty, the process continues to step 320, which illustrates L2 cache 204 issuing a write-back request to L3 cache 108. However, if L2 cache controller 212 determines the selected victim line is not dirty, the process continues to step 318, which depicts L2 cache controller 212 examining the contents of write-back history table 206 for an entry that indicates the selected victim line is also present in L3 cache 108. If L2 cache controller 212 determines the selected victim line is also present in L3 cache 108, the process proceeds to step 312, which illustrates L2 cache controller 212 replacing the selected victim line without first writing the selected victim line to L3 cache 108 or memory 104. The process then returns to step 302 and continues in an iterative fashion.
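The dirty check in step 316 is what makes the history-table shortcut safe: only a clean line, whose data still resides in memory 104, may be dropped on the strength of a history entry, because the L3 copy of a modified line would be stale. A minimal Python sketch of steps 314 through 320, assuming MESI states and hypothetical names (`Mesi`, `CacheLine`, `evict_from_l2`) and string results standing in for hardware actions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Mesi(Enum):
    MODIFIED = "M"   # dirty: only valid copy, not yet written to memory
    EXCLUSIVE = "E"  # clean
    SHARED = "S"     # clean
    INVALID = "I"


@dataclass
class CacheLine:
    addr: int
    state: Mesi


def evict_from_l2(
    victim: CacheLine,
    write_back_queue: List[CacheLine],
    in_history: Callable[[int], bool],
) -> str:
    """Return "write_back_request" or "drop" for a selected victim line."""
    write_back_queue.append(victim)  # step 314: buffer the victim
    if victim.state is Mesi.MODIFIED:  # step 316: dirty lines always go to L3
        return "write_back_request"  # step 320
    if in_history(victim.addr):  # step 318: clean and already held by L3
        write_back_queue.pop()  # step 312: replace without any write-back
        return "drop"
    return "write_back_request"  # step 320: clean, but L3 status unknown


queue: List[CacheLine] = []
held_in_l3 = {0x80}
print(evict_from_l2(CacheLine(0x80, Mesi.MODIFIED), queue, lambda a: a in held_in_l3))  # write_back_request
print(evict_from_l2(CacheLine(0x80, Mesi.SHARED), queue, lambda a: a in held_in_l3))    # drop
print(len(queue))  # 1
```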
-  Returning to step 318, if processor core 200 determines by reference to write-back history table 206 that the selected victim line is not present in L3 cache 108, the process continues to step 320, in which L2 cache 204 issues a write-back request to L3 cache 108. The process then continues to step 322, which depicts L3 cache controller 110 snooping the write-back request issued from L2 cache 204. The process then continues to step 324, which illustrates L3 cache controller 110 determining whether or not the selected victim line had been modified while present in L2 cache 204. If L3 cache 108 determines that the selected victim line had been modified while present in L2 cache 204, the process continues to step 326, which depicts L3 cache 108 accepting the selected victim line from L2 cache 204. L3 cache 108 must accept and cache a modified victim line in order to preserve the changes made to it while it was present in L2 cache 204. The process then continues to step 327, which illustrates L3 cache 108 performing, if necessary, a castout of a data line from L3 cache 108 according to an algorithm such as a least recently used (LRU) algorithm. The process then returns to step 302 and proceeds in an iterative fashion.
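Steps 326 and 327 describe L3 cache 108 acting as a victim cache that makes room by casting out an LRU line. The sketch below models it as a small fully associative Python structure; the class name, the fully associative organization, and the returned `(address, dirty)` castout tuple are assumptions for illustration (a real L3 cache would typically be set-associative).

```python
from collections import OrderedDict
from typing import Optional, Tuple


class VictimL3:
    """Hypothetical fully associative model of lookaside L3 cache 108."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> dirty flag, LRU first

    def accept(self, addr: int, dirty: bool) -> Optional[Tuple[int, bool]]:
        """Step 326: cache an incoming victim line.

        Step 327: if the cache is full, cast out the least recently used line
        and return it as (address, dirty); a dirty castout must be written to
        memory, while a clean one can simply be dropped. Returns None when no
        castout is needed.
        """
        castout = None
        if addr in self.lines:
            # Keep the line dirty if either copy differs from memory.
            dirty = self.lines.pop(addr) or dirty
        elif len(self.lines) >= self.capacity:
            castout = self.lines.popitem(last=False)
        self.lines[addr] = dirty
        return castout


l3 = VictimL3(capacity=2)
l3.accept(0x1, dirty=True)
l3.accept(0x2, dirty=False)
print(l3.accept(0x3, dirty=False))  # (1, True): line 0x1 is cast out
print(list(l3.lines))               # [2, 3]
```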
-  The purpose of the present invention is to reduce unnecessary inter-cache operations. When a selected victim line has not been modified while present in L2 cache 204, L2 cache 204 first examines the contents of write-back history table 206 to determine whether the selected victim line is already present in L3 cache 108.
-  When processor core 200 initially requests and processes a cache line, the line may be written only to L1 cache 202 and L2 cache 204. The line is not written to lookaside L3 cache 108 until L2 cache 204 casts out the line to make room for a new data line requested by processor core 200. Therefore, each time a clean (unmodified) line is selected to be cast out of L2 cache 204, L2 cache 204 must examine write-back history table 206 to determine whether or not a copy of the selected victim line is already present in L3 cache 108. If an entry in write-back history table 206 indicates that a copy of the selected victim line is already present in L3 cache 108, which serves as a castout or victim cache, casting out the selected victim line to L3 cache 108 would be an unnecessary operation. The present invention inhibits these unnecessary operations.
-  Therefore, returning to step 324, if L3 cache 108 determines that the selected victim line has not been modified while present in L2 cache 204, the process continues to step 328, which illustrates L3 cache 108 determining whether or not the selected victim line is already present in L3 cache 108. If L3 cache 108 determines that the selected victim line is not already present in L3 cache 108, the process continues to step 326, which depicts L3 cache 108 accepting the selected victim line received from L2 cache 204. This step is not necessary to preserve a copy of the selected victim line, since the data in the selected victim line in L2 cache 204 also resides in memory 104, but it is advantageous for latency purposes. If L3 cache 108 is full of castout lines from L2 cache 204, a data line from L3 cache 108 must be cast out, preferably to memory 104, to accommodate the selected victim line from L2 cache 204. Therefore, the process continues to step 327, which illustrates L3 cache 108 casting out a data line to accommodate the selected victim line from L2 cache 204, if necessary. The process then returns to step 302 and continues in an iterative fashion.
-  Returning to step 328, if L3 cache 108 determines that the selected victim line is valid in L3 cache 108, the process continues to step 330, which illustrates L3 cache 108 setting an inhibit bit in the response to the write-back request from L2 cache 204. Setting the inhibit bit indicates to L2 cache 204 that the selected victim line is already present in L3 cache 108 and that L2 cache 204 may replace the selected victim line with newly-requested data without casting out the selected victim line to L3 cache 108. Persons with ordinary skill in this art will appreciate that an actual inhibit bit is not required to be set by L3 cache 108. L3 cache 108 may achieve the same result by sending an alternate bus response that indicates the validity of the selected victim line in L3 cache 108.
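The decision L3 cache 108 makes in steps 324 through 330 reduces to a small truth table over two conditions. A minimal Python sketch, in which the response is modeled as an integer with a hypothetical `INHIBIT_BIT` at bit 0 (a real design may instead use a distinct bus response, as noted above):

```python
INHIBIT_BIT = 1 << 0  # hypothetical position of the inhibit bit in the response


def l3_snoop_write_back(modified_in_l2: bool, valid_in_l3: bool) -> int:
    """Return the response L3 cache 108 sends for a snooped write-back request.

    Step 324: a modified victim must always be accepted (steps 326-327).
    Step 328: a clean victim that L3 already holds valid is redundant, so L3
    sets the inhibit bit (step 330); otherwise L3 accepts the clean line.
    """
    response = 0
    if not modified_in_l2 and valid_in_l3:
        response |= INHIBIT_BIT
    return response


for modified in (False, True):
    for valid in (False, True):
        inhibit = bool(l3_snoop_write_back(modified, valid) & INHIBIT_BIT)
        print(f"modified={modified} valid_in_l3={valid} inhibit={inhibit}")
```

Only the clean, already-valid combination sets the bit; every other combination leads L3 to accept the line.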
-  The process then continues to step 332, which depicts L3 cache 108 sending the response, with the inhibit bit set, out on intra-chip interconnect 208. The process proceeds to step 334, which illustrates L2 cache 204 receiving the response from L3 cache 108. The process then continues to step 336, which depicts L2 cache 204 determining whether or not the response includes an inhibit bit set by L3 cache 108. If L2 cache 204 determines that the response includes an inhibit bit set by L3 cache 108, the process moves to step 338, which illustrates L2 cache 204 allocating an entry in write-back history table 206 indicating that the selected victim line is already present and valid in L3 cache 108. If write-back history table 206 is full, the least recently accessed entry in write-back history table 206 is simply overwritten when L2 cache 204 allocates a new entry in step 338. On the next cycle of the process, if the selected victim line is again considered as a candidate for castout from L2 cache 204, L2 cache 204 will determine that the line is valid and present in L3 cache 108 by locating the entry in write-back history table 206 and will evict the selected victim line without attempting to write it back to L3 cache 108 or memory 104. The process then returns to step 302 and proceeds in an iterative fashion. However, returning to step 336, if L2 cache 204 determines that the response does not include an inhibit bit set by L3 cache 108, the process continues to step 340, which illustrates L2 cache 204 ignoring the response sent by L3 cache 108. The process then returns to step 302 and continues in an iterative fashion.
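On the L2 side, steps 334 through 340 either record the inhibit response in write-back history table 206 or discard the response. A minimal Python sketch, again with a hypothetical bit-0 `INHIBIT_BIT` and with the history table modeled as an `OrderedDict` whose first key is the least recently accessed entry; `l2_handle_response` is an illustrative name, not part of the described design:

```python
from collections import OrderedDict

INHIBIT_BIT = 1 << 0  # hypothetical position of the inhibit bit in the response


def l2_handle_response(response: int, victim_addr: int,
                       history: "OrderedDict[int, None]", capacity: int) -> bool:
    """Return True if a history entry was allocated for victim_addr."""
    if not (response & INHIBIT_BIT):  # step 336
        return False  # step 340: ignore the response
    # Step 338: allocate an entry saying victim_addr is valid in L3.
    if victim_addr in history:
        history.move_to_end(victim_addr)
    else:
        if len(history) >= capacity:
            history.popitem(last=False)  # overwrite least recently accessed
        history[victim_addr] = None
    return True


history: "OrderedDict[int, None]" = OrderedDict()
l2_handle_response(INHIBIT_BIT, 0x10, history, capacity=2)
l2_handle_response(0, 0x20, history, capacity=2)  # no inhibit bit: ignored
l2_handle_response(INHIBIT_BIT, 0x30, history, capacity=2)
l2_handle_response(INHIBIT_BIT, 0x40, history, capacity=2)  # 0x10 overwritten
print([hex(a) for a in history])  # ['0x30', '0x40']
```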
-  As has been described, the present invention is a system and method of reducing unnecessary cache operations within a data processing system. When an L2 cache controller detects a cache miss in an L2 cache, the L2 cache controller selects a line (e.g., a victim line) to be cast out of the L2 cache to make room for a newly-requested cache line. The L2 cache controller determines the state of the victim line with respect to the L3 cache and memory by examining the contents of a write-back history table. The write-back history table includes entries that indicate the lines that have been recently evicted from the L2 cache and written to the L3 cache. If an entry in the write-back history table indicates that the victim line has been recently evicted from the L2 cache and that the line is also valid in the L3 cache, the victim line is replaced by the newly-requested line without first writing the victim line to the L3 cache or memory. If, however, the write-back history table contains no such entry for the victim line, the L2 cache issues a write-back request, and if the L3 cache responds that it already holds a valid copy of the line, an entry is made in the write-back history table to reflect that status. This system and method inhibits the redundant writing of cache lines to an L3 cache or memory if the cache lines are determined to be both present in the L3 cache or memory and valid (e.g., unmodified). Therefore, this system and method reduces the bandwidth normally wasted by such redundant writing of cache lines.
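The bandwidth argument can be made concrete with a toy trace-driven model. The Python sketch below is illustrative only and rests on simplifying assumptions: all accesses are reads (so every victim is clean), the L2 is fully associative with LRU replacement, the L3 is unbounded and keeps its copy when it supplies a line back to the L2, and history entries are never invalidated. The function name is hypothetical. It counts write-back requests issued by the L2 with and without the write-back history table.

```python
from collections import OrderedDict
from typing import Iterable


def count_write_back_requests(trace: Iterable[int], l2_capacity: int,
                              use_history: bool) -> int:
    """Count L2 write-back requests for a read-only access trace (toy model)."""
    l2 = OrderedDict()  # resident line addresses, least recently used first
    l3 = set()          # lines valid in the victim L3 (unbounded here)
    history = set()     # write-back history table (unbounded here)
    requests = 0
    for addr in trace:
        if addr in l2:  # L2 hit
            l2.move_to_end(addr)
            continue
        if len(l2) >= l2_capacity:  # L2 miss with a full cache: pick LRU victim
            victim, _ = l2.popitem(last=False)
            if use_history and victim in history:
                pass  # clean and known to be in L3: dropped, no request
            else:
                requests += 1  # write-back request goes out on the interconnect
                if victim in l3:
                    history.add(victim)  # L3 answered with the inhibit bit set
                else:
                    l3.add(victim)  # L3 accepted the clean castout
        l2[addr] = None
    return requests


trace = [1, 2, 3] * 5  # three lines cycling through a two-line L2
print(count_write_back_requests(trace, 2, use_history=False))  # 13
print(count_write_back_requests(trace, 2, use_history=True))   # 6
```

In this cyclic trace every access after the first two misses, so there are 13 evictions. With the history table, the first six evictions still generate requests (three that fill the L3 and three that are answered with the inhibit bit), after which every eviction is suppressed. Savings on real workloads would depend on access patterns, history-table size, and how often L3 castouts leave entries stale.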
-  Of course, persons having ordinary skill in this art are aware that while this preferred embodiment of the present invention reduces the unnecessary writing of cache lines between an L2 cache and an L3 cache and/or memory, the present invention may be implemented to reduce the unnecessary writing of any type of data among different levels of any memory hierarchy. Examples of such implementations include the use of random access memory, writable storage media (e.g., floppy diskette, hard disk drive, read/write CD-ROM, optical media), and flash memory among the different levels of the memory hierarchy. The present invention may also be implemented among different levels of data processing systems, where a client might suppress the writing of data to a server if the data is already present and valid on the server.
-  It should be understood that at least some aspects of the present invention may alternatively be implemented in a program product. Programs defining functions of the present invention can be delivered to a data storage system or a computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., a floppy diskette, hard disk drive, read/write CD-ROM, optical media), and communication media, such as computer and telephone networks including Ethernet. It should be understood, therefore, that such signal-bearing media, when carrying or encoding computer-readable instructions that direct method functions of the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.
-  While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.
Claims (14)
 1. A method for cache management in a data processing system, wherein said data processing system includes a processor and a memory hierarchy, wherein said memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure, said method comprising: 
  in response to replacing data from said upper memory cache, examining said write-back data structure to determine whether or not said data is present in said lower memory cache; and 
 if said data is present in said lower memory cache, replacing said data in said upper memory cache without casting out said data to said lower memory cache. 
  2. The method in claim 1, further comprising:
  determining whether or not write-back data in said write-back data structure is utilized to determine whether or not said data is to be replaced in said upper memory cache; and 
 in response to determining said write-back data is not utilized, utilizing a least-recently used algorithm to determine whether or not to replace said data. 
  3. The method in claim 1, further comprising:
  in response to determining said data is to be replaced, placing said data in a write-back queue; 
 writing said data to said lower memory cache; 
 determining whether or not said data had been modified in said upper memory cache; 
 in response to determining said data had been modified in said upper memory cache, determining whether or not said data is present in said lower memory cache; 
 in response to determining said data is present in said lower memory cache, issuing a message to said upper memory cache indicating said data is present in said lower memory cache; and 
 in response to receiving said message, updating said write-back data structure. 
  4. A computer program product, comprising: 
  code when executed emulates a data processing system, said data processing system including a first processing unit, a second processing unit, and a write-back data structure, said first processing unit, in response to replacing data from said first processing unit, examining said write-back data structure to determine whether or not said data is present in said second processing unit; and
 code when executed emulates said first processing unit replacing said data in said first processing unit without casting out said data to said second processing unit, if said data is present in said second processing unit. 
  5. The computer program product in claim 4, further comprising:
  code when executed emulates said data processing system determining whether or not write-back data in said write-back data structure is utilized to determine whether or not said data is to be replaced in said first processing unit; and 
 code when executed emulates said data processing system utilizing a least-recently used algorithm to determine whether or not to replace said data, in response to determining said write-back data is not utilized.
  6. The computer program product in claim 4, further comprising:
  code when executed emulates said data processing system placing said data in a write-back queue, in response to determining said data is to be replaced; 
 code when executed emulates said data processing system writing said data to said second processing unit; 
 code when executed emulates said data processing system determining whether or not said data had been modified in said first processing unit; and 
 code when executed emulates said data processing system issuing a message to said first processing unit indicating said data is present in said second processing unit, in response to determining said data is present in said second processing unit. 
  7. A processor, comprising: 
  a processor core; 
 a memory hierarchy, coupled to said processor core, said memory hierarchy further including an upper memory cache and a lower memory cache; and 
 a write-back data structure, coupled to said memory hierarchy, wherein said upper memory cache examines said write-back data structure to determine whether or not data is present in said lower memory cache, in response to replacing said data from said upper memory cache, and if said data is present in said lower memory cache, replacing said data in said upper memory cache without casting out said data to said lower memory cache. 
  8. The processor in claim 7, said processor core further comprises:
  a circuit to determine whether or not write-back data in said write-back data structure is utilized to determine whether or not said data is replaced in said upper memory cache; and 
 a circuit utilizing a least-recently used algorithm to determine whether or not to replace said data, in response to determining said write-back data is not utilized. 
  9. The processor in claim 7, further comprising:
  a write-back queue for queuing data once said processor core determines said data is to be replaced; 
 a first flag, included in said data, to indicate whether or not said data had been modified in said upper memory cache;
 a second flag, included in an entry in said write-back data structure, indicating said data is present in said lower memory cache; and 
 a message generator for issuing a message to said upper memory cache indicating said data is present in said lower memory cache.
  10. A data processing system, comprising: 
  a plurality of processors, in accordance with claim 7;  
 a memory; and 
 an interconnect coupling said memory and said plurality of processors. 
  11. The data processing system in claim 10, wherein said plurality of processors further comprise:
  a circuit to determine whether or not write-back data in said write-back data structure is utilized to determine whether or not said data is replaced in said upper memory cache; and 
 a circuit utilizing a least-recently used algorithm to determine whether or not to replace said data, in response to determining said write-back data is not utilized. 
  12. The data processing system in claim 10, further comprising:
  a write-back queue for queuing data once said processor core determines said data is to be replaced; 
 a first flag, included in said data, to indicate whether or not said data had been modified in said upper memory cache; 
 a second flag, included in an entry in said write-back data structure, indicating said data is present in said lower memory cache; and 
 a message generator for issuing a message to said upper memory cache indicating said data is present in said lower memory cache.
  13. A multi-chip module, with a plurality of processors in accordance with claim 7, wherein said plurality of processors further comprise:
  a circuit to determine whether or not write-back data in said write-back data structure is utilized to determine whether or not said data is replaced in said upper memory cache; and 
 a circuit utilizing a least-recently used algorithm to determine whether or not to replace said data, in response to determining said write-back data is not utilized. 
  14. The multi-chip module in claim 13, further comprising:
  a write-back queue for queuing data once said processor core determines said data is to be replaced; 
 a first flag, included in said data, to indicate whether or not said data had been modified in said upper memory cache; 
 a second flag, included in an entry in said write-back data structure, indicating said data is present in said lower memory cache; and 
 a message generator for issuing a message to said upper memory cache indicating said data is present in said lower memory cache.
 Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US11/032,875 US20060155934A1 (en) | 2005-01-11 | 2005-01-11 | System and method for reducing unnecessary cache operations | 
| US11/674,960 US7698508B2 (en) | 2005-01-11 | 2007-02-14 | System and method for reducing unnecessary cache operations | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US11/032,875 US20060155934A1 (en) | 2005-01-11 | 2005-01-11 | System and method for reducing unnecessary cache operations | 
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US11/674,960 Division US7698508B2 (en) | 2005-01-11 | 2007-02-14 | System and method for reducing unnecessary cache operations | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| US20060155934A1 true US20060155934A1 (en) | 2006-07-13 | 
Family ID: 36654610
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US11/032,875 Abandoned US20060155934A1 (en) | 2005-01-11 | 2005-01-11 | System and method for reducing unnecessary cache operations | 
| US11/674,960 Expired - Fee Related US7698508B2 (en) | 2005-01-11 | 2007-02-14 | System and method for reducing unnecessary cache operations | 
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US11/674,960 Expired - Fee Related US7698508B2 (en) | 2005-01-11 | 2007-02-14 | System and method for reducing unnecessary cache operations | 
Country Status (1)
| Country | Link | 
|---|---|
| US (2) | US20060155934A1 (en) | 
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20060218352A1 (en) * | 2005-03-22 | 2006-09-28 | Shannon Christopher J | Cache eviction technique for reducing cache eviction traffic | 
| US20080162818A1 (en) * | 2006-12-28 | 2008-07-03 | Fujitsu Limited | Cache-memory control apparatus, cache-memory control method and computer product | 
| US20080307167A1 (en) * | 2007-06-05 | 2008-12-11 | Ramesh Gunna | Converting Victim Writeback to a Fill | 
| US20080307166A1 (en) * | 2007-06-05 | 2008-12-11 | Ramesh Gunna | Store Handling in a Processor | 
| US20130185494A1 (en) * | 2012-01-17 | 2013-07-18 | International Business Machines Corporation | Populating a first stride of tracks from a first cache to write to a second stride in a second cache | 
| US20130262767A1 (en) * | 2012-03-28 | 2013-10-03 | Futurewei Technologies, Inc. | Concurrently Accessed Set Associative Overflow Cache | 
| US8825953B2 (en) | 2012-01-17 | 2014-09-02 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using a stride number ordering of strides in the second cache to consolidate strides in the second cache | 
| US8825957B2 (en) | 2012-01-17 | 2014-09-02 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using an occupancy of valid tracks in strides in the second cache to consolidate strides in the second cache | 
| US8825944B2 (en) | 2011-05-23 | 2014-09-02 | International Business Machines Corporation | Populating strides of tracks to demote from a first cache to a second cache | 
| US20140369348A1 (en) * | 2013-06-17 | 2014-12-18 | Futurewei Technologies, Inc. | Enhanced Flow Entry Table Cache Replacement in a Software-Defined Networking Switch | 
| US9021201B2 (en) | 2012-01-17 | 2015-04-28 | International Business Machines Corporation | Demoting partial tracks from a first cache to a second cache | 
| JP2015111435A (en) * | 2007-01-31 | 2015-06-18 | Qualcomm Incorporated | Apparatus and methods for reducing castouts in multi-level cache hierarchy | 
| US20170024329A1 (en) * | 2015-07-22 | 2017-01-26 | Fujitsu Limited | Arithmetic processing device and arithmetic processing device control method | 
| US10152425B2 (en) * | 2016-06-13 | 2018-12-11 | Advanced Micro Devices, Inc. | Cache entry replacement based on availability of entries at another cache | 
| US20190034335A1 (en) * | 2016-02-03 | 2019-01-31 | Swarm64 As | Cache and method | 
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US7890700B2 (en) * | 2008-03-19 | 2011-02-15 | International Business Machines Corporation | Method, system, and computer program product for cross-invalidation handling in a multi-level private cache | 
| US8327072B2 (en) | 2008-07-23 | 2012-12-04 | International Business Machines Corporation | Victim cache replacement | 
| US8347037B2 (en) * | 2008-10-22 | 2013-01-01 | International Business Machines Corporation | Victim cache replacement | 
| US8209489B2 (en) * | 2008-10-22 | 2012-06-26 | International Business Machines Corporation | Victim cache prefetching | 
| US8499124B2 (en) * | 2008-12-16 | 2013-07-30 | International Business Machines Corporation | Handling castout cache lines in a victim cache | 
| US8117397B2 (en) * | 2008-12-16 | 2012-02-14 | International Business Machines Corporation | Victim cache line selection | 
| US8225045B2 (en) * | 2008-12-16 | 2012-07-17 | International Business Machines Corporation | Lateral cache-to-cache cast-in | 
| US8489819B2 (en) * | 2008-12-19 | 2013-07-16 | International Business Machines Corporation | Victim cache lateral castout targeting | 
| US8949540B2 (en) | 2009-03-11 | 2015-02-03 | International Business Machines Corporation | Lateral castout (LCO) of victim cache line in data-invalid state | 
| US8285939B2 (en) * | 2009-04-08 | 2012-10-09 | International Business Machines Corporation | Lateral castout target selection | 
| US8347036B2 (en) * | 2009-04-09 | 2013-01-01 | International Business Machines Corporation | Empirically based dynamic control of transmission of victim cache lateral castouts | 
| US8312220B2 (en) * | 2009-04-09 | 2012-11-13 | International Business Machines Corporation | Mode-based castout destination selection | 
| US8327073B2 (en) * | 2009-04-09 | 2012-12-04 | International Business Machines Corporation | Empirically based dynamic control of acceptance of victim cache lateral castouts | 
| US8996812B2 (en) * | 2009-06-19 | 2015-03-31 | International Business Machines Corporation | Write-back coherency data cache for resolving read/write conflicts | 
| US9189403B2 (en) | 2009-12-30 | 2015-11-17 | International Business Machines Corporation | Selective cache-to-cache lateral castouts | 
| US8352687B2 (en) | 2010-06-23 | 2013-01-08 | International Business Machines Corporation | Performance optimization and dynamic resource reservation for guaranteed coherency updates in a multi-level cache hierarchy | 
| US9189424B2 (en) * | 2011-05-31 | 2015-11-17 | Hewlett-Packard Development Company, L.P. | External cache operation based on clean castout messages | 
| US10210087B1 (en) * | 2015-03-31 | 2019-02-19 | EMC IP Holding Company LLC | Reducing index operations in a cache | 
| US10078591B2 (en) | 2016-09-27 | 2018-09-18 | International Business Machines Corporation | Data storage cache management | 
| US10915461B2 (en) * | 2019-03-05 | 2021-02-09 | International Business Machines Corporation | Multilevel cache eviction management | 
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US5317720A (en) * | 1990-06-29 | 1994-05-31 | Digital Equipment Corporation | Processor system with writeback cache using writeback and non writeback transactions stored in separate queues | 
| US6282615B1 (en) * | 1999-11-09 | 2001-08-28 | International Business Machines Corporation | Multiprocessor system bus with a data-less castout mechanism | 
| US6629210B1 (en) * | 2000-10-26 | 2003-09-30 | International Business Machines Corporation | Intelligent cache management mechanism via processor access sequence analysis | 
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US5564035A (en) * | 1994-03-23 | 1996-10-08 | Intel Corporation | Exclusive and/or partially inclusive extension cache system and method to minimize swapping therein | 
| US5829038A (en) * | 1996-06-20 | 1998-10-27 | Intel Corporation | Backward inquiry to lower level caches prior to the eviction of a modified line from a higher level cache in a microprocessor hierarchical cache structure | 
| US6134634A (en) * | 1996-12-20 | 2000-10-17 | Texas Instruments Incorporated | Method and apparatus for preemptive cache write-back | 
| US5787478A (en) * | 1997-03-05 | 1998-07-28 | International Business Machines Corporation | Method and system for implementing a cache coherency mechanism for utilization within a non-inclusive cache memory hierarchy | 
| US6349367B1 (en) * | 1999-08-04 | 2002-02-19 | International Business Machines Corporation | Method and system for communication in which a castout operation is cancelled in response to snoop responses | 
| US7334089B2 (en) * | 2003-05-20 | 2008-02-19 | Newisys, Inc. | Methods and apparatus for providing cache state information | 
- 2005-01-11: US11/032,875 (US20060155934A1), Abandoned
- 2007-02-14: US11/674,960 (US7698508B2), Expired - Fee Related
Cited By (34)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US7277992B2 (en) * | 2005-03-22 | 2007-10-02 | Intel Corporation | Cache eviction technique for reducing cache eviction traffic | 
| US20060218352A1 (en) * | 2005-03-22 | 2006-09-28 | Shannon Christopher J | Cache eviction technique for reducing cache eviction traffic | 
| US7743215B2 (en) * | 2006-12-28 | 2010-06-22 | Fujitsu Limited | Cache-memory control apparatus, cache-memory control method and computer product | 
| US20080162818A1 (en) * | 2006-12-28 | 2008-07-03 | Fujitsu Limited | Cache-memory control apparatus, cache-memory control method and computer product | 
| JP2015111435A (en) * | 2007-01-31 | 2015-06-18 | Qualcomm Incorporated | Apparatus and methods for reducing castouts in multi-level cache hierarchy | 
| US7836262B2 (en) * | 2007-06-05 | 2010-11-16 | Apple Inc. | Converting victim writeback to a fill | 
| US20080307166A1 (en) * | 2007-06-05 | 2008-12-11 | Ramesh Gunna | Store Handling in a Processor | 
| US20110047336A1 (en) * | 2007-06-05 | 2011-02-24 | Ramesh Gunna | Converting Victim Writeback to a Fill | 
| US8131946B2 (en) | 2007-06-05 | 2012-03-06 | Apple Inc. | Converting victim writeback to a fill | 
| US8239638B2 (en) | 2007-06-05 | 2012-08-07 | Apple Inc. | Store handling in a processor | 
| US8364907B2 (en) | 2007-06-05 | 2013-01-29 | Apple Inc. | Converting victim writeback to a fill | 
| US20080307167A1 (en) * | 2007-06-05 | 2008-12-11 | Ramesh Gunna | Converting Victim Writeback to a Fill | 
| US8892841B2 (en) | 2007-06-05 | 2014-11-18 | Apple Inc. | Store handling in a processor | 
| US8850106B2 (en) | 2011-05-23 | 2014-09-30 | International Business Machines Corporation | Populating strides of tracks to demote from a first cache to a second cache | 
| US8825944B2 (en) | 2011-05-23 | 2014-09-02 | International Business Machines Corporation | Populating strides of tracks to demote from a first cache to a second cache | 
| US8825956B2 (en) | 2012-01-17 | 2014-09-02 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using a stride number ordering of strides in the second cache to consolidate strides in the second cache | 
| US8966178B2 (en) * | 2012-01-17 | 2015-02-24 | International Business Machines Corporation | Populating a first stride of tracks from a first cache to write to a second stride in a second cache | 
| US8825953B2 (en) | 2012-01-17 | 2014-09-02 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using a stride number ordering of strides in the second cache to consolidate strides in the second cache | 
| US8832377B2 (en) | 2012-01-17 | 2014-09-09 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using an occupancy of valid tracks in strides in the second cache to consolidate strides in the second cache | 
| US9471496B2 (en) | 2012-01-17 | 2016-10-18 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using a stride number ordering of strides in the second cache to consolidate strides in the second cache | 
| US20130185478A1 (en) * | 2012-01-17 | 2013-07-18 | International Business Machines Corporation | Populating a first stride of tracks from a first cache to write to a second stride in a second cache | 
| US20130185494A1 (en) * | 2012-01-17 | 2013-07-18 | International Business Machines Corporation | Populating a first stride of tracks from a first cache to write to a second stride in a second cache | 
| US9026732B2 (en) | 2012-01-17 | 2015-05-05 | International Business Machines Corporation | Demoting partial tracks from a first cache to a second cache | 
| US8959279B2 (en) * | 2012-01-17 | 2015-02-17 | International Business Machines Corporation | Populating a first stride of tracks from a first cache to write to a second stride in a second cache | 
| US8825957B2 (en) | 2012-01-17 | 2014-09-02 | International Business Machines Corporation | Demoting tracks from a first cache to a second cache by using an occupancy of valid tracks in strides in the second cache to consolidate strides in the second cache | 
| US9021201B2 (en) | 2012-01-17 | 2015-04-28 | International Business Machines Corporation | Demoting partial tracks from a first cache to a second cache | 
| CN104169892A (en) * | 2012-03-28 | 2014-11-26 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Concurrently accessed set associative overflow cache | 
| US20130262767A1 (en) * | 2012-03-28 | 2013-10-03 | Futurewei Technologies, Inc. | Concurrently Accessed Set Associative Overflow Cache | 
| US20140369348A1 (en) * | 2013-06-17 | 2014-12-18 | Futurewei Technologies, Inc. | Enhanced Flow Entry Table Cache Replacement in a Software-Defined Networking Switch | 
| US9160650B2 (en) * | 2013-06-17 | 2015-10-13 | Futurewei Technologies, Inc. | Enhanced flow entry table cache replacement in a software-defined networking switch | 
| US20170024329A1 (en) * | 2015-07-22 | 2017-01-26 | Fujitsu Limited | Arithmetic processing device and arithmetic processing device control method | 
| US10545870B2 (en) * | 2015-07-22 | 2020-01-28 | Fujitsu Limited | Arithmetic processing device and arithmetic processing device control method | 
| US20190034335A1 (en) * | 2016-02-03 | 2019-01-31 | Swarm64 As | Cache and method | 
| US10152425B2 (en) * | 2016-06-13 | 2018-12-11 | Advanced Micro Devices, Inc. | Cache entry replacement based on availability of entries at another cache | 
Also Published As
| Publication number | Publication date | 
|---|---|
| US20070136535A1 (en) | 2007-06-14 | 
| US7698508B2 (en) | 2010-04-13 | 
Similar Documents
| Publication | Title |
|---|---|
| US7698508B2 (en) | System and method for reducing unnecessary cache operations |
| US10078592B2 (en) | Resolving multi-core shared cache access conflicts |
| US7827354B2 (en) | Victim cache using direct intervention |
| US7711902B2 (en) | Area effective cache with pseudo associative memory |
| US8015365B2 (en) | Reducing back invalidation transactions from a snoop filter |
| US9032145B2 (en) | Memory device and method having on-board address protection system for facilitating interface with multiple processors, and computer system using same |
| US8606997B2 (en) | Cache hierarchy with bounds on levels accessed |
| US7305523B2 (en) | Cache memory direct intervention |
| US20080147986A1 (en) | Line swapping scheme to reduce back invalidations in a snoop filter |
| US20020169935A1 (en) | System of and method for memory arbitration using multiple queues |
| JP2000250812A (en) | Memory cache system and managing method therefor |
| US6915396B2 (en) | Fast priority determination circuit with rotating priority |
| US7277992B2 (en) | Cache eviction technique for reducing cache eviction traffic |
| US7281092B2 (en) | System and method of managing cache hierarchies with adaptive mechanisms |
| EP0936552B1 (en) | Pseudo precise I-cache inclusivity for vertical caches |
| US7117312B1 (en) | Mechanism and method employing a plurality of hash functions for cache snoop filtering |
| US7325102B1 (en) | Mechanism and method for cache snoop filtering |
| US8473686B2 (en) | Computer cache system with stratified replacement |
| US7434007B2 (en) | Management of cache memories in a data processing apparatus |
| US7380068B2 (en) | System and method for contention-based cache performance optimization |
| US6347363B1 (en) | Merged vertical cache controller mechanism with combined cache controller and snoop queries for in-line caches |
| US6397303B1 (en) | Data processing system, cache, and method of cache management including an O state for memory-consistent cache lines |
| US6918021B2 (en) | System of and method for flow control within a tag pipeline |
| JP3732397B2 (en) | Cash system |
| US6356982B1 (en) | Dynamic mechanism to upgrade o state memory-consistent cache lines |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAJAMONY, RAMAKRISHNAN;SHAFI, HAZIM;SPEIGHT, WILLIAM EVAN;AND OTHERS;REEL/FRAME:015738/0077;SIGNING DATES FROM 20041210 TO 20041216 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |