
CN118349494A - Computing system, chip and related methods for translating virtual addresses

Info

Publication number
CN118349494A
CN118349494A (application CN202310072049.7A)
Authority
CN
China
Prior art keywords
llc
address
size
core
command
Prior art date
Legal status
Pending
Application number
CN202310072049.7A
Other languages
Chinese (zh)
Inventor
王灿
万超
陈洁君
李云
李寅
张淼
Current Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202310072049.7A
Publication of CN118349494A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application discloses a computing system, a chip, and related methods for translating virtual addresses. The computing system includes a computing core, at least one last level cache (LLC) core, and a memory management unit. The memory management unit is configured to: receive a virtual-address-based LLC command; obtain a first contiguous physical space size and corresponding physical address information according to the virtual address information of the virtual-address-based LLC command; generate at least one physical-address-based LLC command; and send the at least one physical-address-based LLC command to the at least one LLC core. The at least one LLC core is configured to distribute the LLC data of the virtual-address-based LLC command among the at least one LLC core according to the at least one physical-address-based LLC command.

Description

Computing system, chip and related methods for translating virtual addresses
Technical Field
The present invention relates to a computing system, a chip, and related methods, and more particularly to a computing system capable of translating virtual addresses and related methods.
Background
In the design of modern central processing units and graphics processors, a computing system may include a cache system. The cache system may include a virtual address (Virtual Address, VA) cache near the compute core and a physical address (Physical Address, PA) cache near the memory. Application programs almost always use virtual addresses of virtual memory to reference related resources. For example, when an application program needs to issue a command such as an invalidate to a physical address cache, the address range it supplies is based on virtual addresses of the virtual memory. A memory management unit (Memory Management Unit, MMU) may further be included between the virtual address cache and the physical address cache. The memory management unit may perform translations between virtual addresses and physical addresses. Therefore, how to perform the translation between virtual addresses and physical addresses more efficiently in a computing system, so as to further accelerate the execution speed of application programs, is an important task in the art.
Disclosure of Invention
One embodiment of the application relates to a method of processing virtual-address-based last level cache (LLC) commands. The method includes: obtaining a first contiguous physical space size and corresponding physical address (PA) information according to the virtual address (VA) information of a virtual-address-based LLC command; and distributing the LLC data of the virtual-address-based LLC command to at least one LLC core according to the PA information.
Another embodiment of the application relates to a computing system. The computing system includes a computing core, at least one last level cache (LLC) core, and a memory management unit. The at least one LLC core is communicatively coupled to the computing core. The memory management unit is communicatively coupled to the computing core and the at least one LLC core. The memory management unit is configured to: receive a virtual-address-based LLC command; obtain a first contiguous physical space size and corresponding physical address information according to the virtual address information of the virtual-address-based LLC command; generate at least one physical-address-based LLC command; and send the at least one physical-address-based LLC command to the at least one LLC core. The at least one LLC core is configured to distribute the LLC data of the virtual-address-based LLC command among the at least one LLC core according to the at least one physical-address-based LLC command.
The computing system and related methods can accelerate the disassembly, distribution, and execution flows of cache commands, thereby improving the hardware utilization of the computing system.
Drawings
Aspects of the disclosure are better understood from the following embodiments when read in conjunction with the accompanying drawings. It should be noted that, in accordance with standard practice in the industry, the various structures are not drawn to scale. In fact, the dimensions of the various structures may be arbitrarily increased or decreased for clarity of discussion.
FIG. 1 is a schematic diagram of one embodiment of a computing system of the present application.
FIG. 2 is a schematic diagram of an embodiment of a page table entry (PTE) and a page directory entry (PDE) of the present application.
FIG. 3 is a schematic diagram of one embodiment of the virtual address and multi-level page table mapping of the present application.
FIG. 4 is a schematic diagram of a command disassembling method according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a command distribution method according to an embodiment of the present application.
Fig. 6A and 6B are schematic diagrams illustrating another embodiment of the command distribution method according to the present application.
FIG. 7 is a schematic diagram of an embodiment of a method of the present application for processing virtual-address-based last level cache (LLC) commands.
Detailed Description
The following disclosure provides many different embodiments, or examples, of the different means for implementing the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. Of course, such is merely an example and is not intended to be limiting. For example, in the following description, the formation of a first member over or on a second member may include embodiments in which the first member and the second member are formed in direct contact, and may also include embodiments in which additional members may be formed between the first member and the second member such that the first member and the second member may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Moreover, for ease of description, spatially relative terms such as "beneath," "below," "lower," "above," "upper," "over," and the like may be used herein to describe one component or member's relationship to another component or member as illustrated in the figures. In addition to the orientations depicted in the drawings, the spatially relative terms are intended to encompass different orientations of the device in use or operation. The apparatus may be otherwise oriented (rotated 90 degrees or otherwise) and the spatially relative descriptors used herein interpreted accordingly.
As used herein, terms such as "first," "second," and "third" describe various components, regions, layers and/or sections, but such components, regions, layers and/or sections should not be limited by such terms. Such terms may be used only to distinguish one component, region, layer or section from another. The terms such as "first," "second," and "third" when used herein do not imply a sequence or order unless clearly indicated by the context.
The singular forms "a," "an," and "the" may include plural forms as well, unless the context clearly indicates otherwise. The term "coupled," along with its derivatives, may be used herein to describe structural relationships between parts. "Connected" may be used to describe two or more elements in direct physical or electrical contact with each other. "Coupled" may be used to indicate that two or more elements are in direct or indirect (with intervening elements between them) physical or electrical contact with each other, and/or that the two or more elements cooperate or interact with each other.
FIG. 1 is a schematic diagram of one embodiment of a computing system of the present application. The computing system 100 may be adapted to perform the processes or methods disclosed herein. Computing system 100 includes computing cores 110 and 120. A register 111 may be included in compute core 110, and a register 121 may be included in compute core 120. Computing system 100 includes first level caches (L1 caches) 112 and 122 and second level caches (L2 caches) 113 and 123. The first level cache 112 and the second level cache 113 may be caches private to compute core 110. The first level cache 122 and the second level cache 123 may be caches private to compute core 120. The computing system 100 includes a third level cache (L3 cache) 150. Third level cache 150 may be a cache shared by computing cores 110 and 120. The computing system 100 includes a memory management unit (Memory Management Unit, MMU) 130 and a translation lookaside buffer (Translation Lookaside Buffer, TLB) 140. The computing system 100 includes a bus interface 160.
The first level cache 112 or 122 is closer to the compute core 110 or 120; in this embodiment, the first level cache 112 or 122 may have a capacity of 32KB. The third level cache 150 is farthest from the compute core 110 or 120 and may have a capacity of 6144KB. The second level cache 113 or 123 is farther from the compute core 110 or 120 than the first level cache 112 or 122, but closer than the third level cache 150. The second level cache 113 or 123 may have a capacity between that of the first level cache and the third level cache. Register 111 or 121 is closest to compute core 110 or 120 and has a smaller capacity than first level cache 112 or 122. Register 111 or 121 is accessed at the fastest speed, followed by the first level cache 112 or 122 and then the second level cache 113 or 123; the third level cache 150 has the slowest access speed.
In summary, the closer a memory is to the compute core (i.e., the higher its level), the more expensive, the faster, and the smaller in capacity it is; the farther it is from the compute core (i.e., the lower its level), the cheaper, the slower, and the larger in capacity it is.
According to some embodiments of the present application, the operation of the compute core 110 or 120 to access data using virtual addresses may be as follows. First, the compute core 110 or 120 may attempt to access the desired data from the first level cache 112 or 122 based on the virtual address, where the index of the first level cache 112 or 122 is established based on the virtual address. If the data indicated by the virtual address is not available in the first level cache 112 or 122, the data indicated by the virtual address is further requested from the memory management unit 130.
The memory management unit 130 may first query the translation lookaside buffer 140 for the physical address to which the virtual address maps. The translation lookaside buffer 140 is a fast but small memory. Storing the virtual addresses that are most likely to be queried, together with their mapped physical addresses, in the translation lookaside buffer 140 can speed up the virtual-to-physical address translation and thus the data access.
If an entry for the virtual address exists in the translation lookaside buffer 140, the virtual address is translated to a physical address based on the contents of the translation lookaside buffer 140. If the virtual address cannot be successfully translated to a physical address, a page fault event occurs, and the subsequent processing is performed by an upper layer (e.g., an operating system).
If the virtual address is successfully translated to a physical address, the corresponding data may be accessed from the second level cache 113 or 123 according to the translated physical address. The index of the second level cache 113 or 123 may be established based on the physical address. In some embodiments, the compute core 110 or 120 may also attempt to access the desired data from the second level cache 113 or 123 based on the virtual address. If the corresponding data of the physical address cannot be found in the second level cache 113 or 123, the third level cache 150 is searched for the corresponding data of the physical address. If the corresponding data of the physical address cannot be found in the third level cache 150, the data may be further requested from a memory (not shown in FIG. 1), for example, via the bus interface 160. In some embodiments, the third level cache may be a last level cache (Last Level Cache, LLC).
If the entry for the virtual address is not present in the translation lookaside buffer 140, then the memory management unit 130 obtains the entry for the virtual address from a page table in main memory (e.g., via the bus interface 160) and translates the virtual address to a physical address based on the associated entry in the page table (e.g., a page table entry, PTE). If the virtual address cannot be successfully translated to a physical address, a page fault event occurs, and the subsequent processing is performed by an upper layer (e.g., an operating system). If the virtual address is successfully translated to a physical address, the corresponding data may be requested from the memory based on the translated physical address.
In addition, the memory management unit 130 may store the entries of the page table in the translation lookaside buffer 140 to speed up the translation of subsequent virtual addresses. Furthermore, the memory management unit 130 may store the data requested from memory into the first level cache 112 or 122, the second level cache 113 or 123, or the third level cache 150 to speed up subsequent data accesses. The entries in the translation lookaside buffer 140, as well as the data in the caches, may be replaced using an algorithm such as LRU (Least Recently Used), LFU (Least Frequently Used), or FIFO (First In First Out).
FIG. 2 is a diagram illustrating an embodiment of page table entries (PTEs) and page directory entries (PDEs) of the present application. In this embodiment, page table entry 210 may contain 64 bits (bit 0 to bit 63). Bits 12 through 47 (36 bits in total) of page table entry 210 are address 211. Address 211 may be part of a virtual address; it may be the 36 bits of the virtual address extending from the most significant bit (Most Significant Bit, MSB) toward the lower bits, i.e., the most significant 36 bits of the virtual address. V (valid) at bit 0 of page table entry 210 indicates whether this page table entry is valid. S (system) at bit 1 indicates whether the address of this page table entry points to local memory or system memory. SN (snoop) at bit 3 is used to indicate whether the address of this page table entry supports peripheral component interconnect (PCI) cache coherency. E (executable) at bit 4 indicates whether the data at the address of this page table entry is executable. R (readable) at bit 5 is used to indicate whether the PA address of this page table entry is readable. W (writeable) at bit 6 is used to indicate whether the PA address of this page table entry is writable. The "fragment" field at bits 7 through 10 represents a fragment of the physical memory space; it may be used to indicate the size of the contiguous space in which the physical page mapped by this page table entry is located, or equivalently how many contiguous physical pages that contiguous space includes. RE (resident) is used to indicate whether the physical page mapped by this page table entry is present in memory. PS (page size) is used to indicate the size of the physical page (e.g., 4KB or 64KB) mapped by this page table entry. "Level" may be used to indicate the level of this entry (e.g., PTE, PDE, PD1 (page directory entry 1), or PD0 (page directory entry 0)). "Offset" may be used to indicate the offset, within this page or its corresponding physical page, of the data to be accessed, i.e., the intra-page address of the data to be accessed. Bits not explicitly described in page table entry 210 may be reserved bits for future functional expansion.
In this embodiment, page directory entry 220 may contain 64 bits (bit 0 to bit 63). Bits 12 through 47 (36 bits in total) of page directory entry 220 are address 221. Address 221 may be part of a virtual address; it may be the 36 bits of the virtual address extending from the most significant bit (MSB) toward the lower bits, i.e., the most significant 36 bits of the virtual address. V (valid) at bit 0 of page directory entry 220 indicates whether this page directory entry is valid. S (system) at bit 1 indicates whether the address of this page directory entry points to local memory or system memory. SN (snoop) at bit 3 is used to indicate whether the address of this page directory entry supports peripheral component interconnect (PCI) cache coherency. E (executable) at bit 4 indicates whether the data at the address of this page directory entry is executable. R (readable) at bit 5 is used to indicate whether the PA address of this page directory entry is readable. W (writeable) at bit 6 is used to indicate whether the PA address of this page directory entry is writable. The "fragment" field at bits 7 through 10 represents a fragment of the physical memory space; it may be used to indicate the size of the contiguous space in which the physical page directory mapped by this page directory entry is located, or equivalently how many contiguous physical page directories that contiguous space includes. RE (resident) indicates whether the physical page directory mapped by this page directory entry is present in memory. PS (page size) is used to indicate the size of the physical page (e.g., 4KB or 64KB) corresponding to this page directory entry. "Level" may be used to indicate the level of this entry (e.g., PTE, PDE, PD1 (page directory entry 1), or PD0 (page directory entry 0)). "Offset" may be used to indicate the offset, within the corresponding page or physical page, of the data to be accessed, i.e., the intra-page address of the data to be accessed. Bits not explicitly described in page directory entry 220 may be reserved bits for future functional expansion.
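As a minimal sketch of the 64-bit entry layout just described (an illustration, not the patent's implementation; the example entry value is hypothetical), the explicitly specified fields can be decoded in C as follows:

#include <stdint.h>
#include <stdio.h>

#define PTE_V(e)        (((e) >> 0) & 0x1)   /* valid */
#define PTE_S(e)        (((e) >> 1) & 0x1)   /* local vs. system memory */
#define PTE_SN(e)       (((e) >> 3) & 0x1)   /* PCI cache coherency (snoop) */
#define PTE_E(e)        (((e) >> 4) & 0x1)   /* executable */
#define PTE_R(e)        (((e) >> 5) & 0x1)   /* readable */
#define PTE_W(e)        (((e) >> 6) & 0x1)   /* writable */
#define PTE_FRAGMENT(e) (((e) >> 7) & 0xF)   /* bits 7..10: log2(contiguous pages) */
#define PTE_ADDR(e)     (((e) >> 12) & 0xFFFFFFFFFULL) /* bits 12..47: 36-bit address */

int main(void) {
    uint64_t pte = (0x123456789ULL << 12) | (3 << 7) | 0x71; /* hypothetical entry */
    printf("valid=%llu fragment=%llu (=> %llu contiguous pages) addr=0x%llx\n",
           (unsigned long long)PTE_V(pte),
           (unsigned long long)PTE_FRAGMENT(pte),
           1ULL << PTE_FRAGMENT(pte),
           (unsigned long long)PTE_ADDR(pte));
    return 0;
}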
FIG. 3 is a schematic diagram of one embodiment of the virtual address and multi-level page table mapping of the present application. Virtual address 300 may be 48 bits in length, bits 0 through 47. Bits 0 through 11 of virtual address 300 may be intra-page address 301, with intra-page address 301 being 12 bits in length. Bits 12 through 20 of virtual address 300 may be PTE address 302, PTE address 302 being 9 bits in length. Bits 21 through 29 of virtual address 300 may be PDE address 303, PDE address 303 being 9 bits in length. Bits 30 through 38 of virtual address 300 may be PD1 (page directory entry 1) address 304, with the length of PD1 address 304 being 9 bits. Bits 39 through 47 of virtual address 300 may be PD0 (page directory entry 0) address 305, with the length of PD0 address 305 being 9 bits.
Referring to the computing system architecture of FIG. 1, the computing cores 110 or 120 may access relevant data using the virtual address 300. The memory management unit 130 may first query the translation look-up buffer 140 for the physical address mapped by the virtual address 300. If the entry for the virtual address is not present in the translation look-aside buffer 140, then the memory management unit 130 will retrieve the entry for the virtual address from a table (e.g., page table, directory table, etc.) in memory (e.g., via bus interface 160) and translate the virtual address to a physical address based on the associated entry in the table.
In an embodiment of multi-level page table mapping, memory management unit 130 may look up PD0 table 310 in the translation lookaside buffer 140 or in memory based on PD0 address 305. Based on PD0 address 305, the query result of PD0 table 310 points to PD1 table 320 among the plurality of PD1 tables. The memory management unit 130 may then query PD1 table 320 according to PD1 address 304. Based on PD1 address 304, the query result of PD1 table 320 points to PDE table 330 among the plurality of PDE tables. The memory management unit 130 may look up PDE table 330 based on PDE address 303. Based on PDE address 303, the query result of PDE table 330 points to PTE table 340 among the plurality of PTE tables. The memory management unit 130 may query PTE table 340 based on PTE address 302. Based on PTE address 302, the query result of PTE table 340 points to physical page 350 among the plurality of physical pages. The memory management unit 130 may obtain the location of the corresponding data in physical page 350 according to intra-page address 301. Based on intra-page address 301, the data corresponding to the virtual address may be accessed at the specific location in physical page 350.
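A minimal sketch of this four-level walk follows, assuming for illustration that each table is an in-memory array of 512 64-bit entries whose value is simply the base of the next-level table; validity, level, and fragment decoding are omitted:

#include <stdint.h>

/* Extract the 9-bit index for a given level from a 48-bit virtual address. */
#define VA_IDX(va, lo) (((va) >> (lo)) & 0x1FFULL)

/* Hypothetical walker: each entry directly stores the base of the next-level
 * table (or, at the PTE level, the physical page base). */
static uint64_t translate(const uint64_t *pd0_table, uint64_t va) {
    const uint64_t *pd1_table = (const uint64_t *)(uintptr_t)pd0_table[VA_IDX(va, 39)];
    const uint64_t *pde_table = (const uint64_t *)(uintptr_t)pd1_table[VA_IDX(va, 30)];
    const uint64_t *pte_table = (const uint64_t *)(uintptr_t)pde_table[VA_IDX(va, 21)];
    uint64_t page_base        = pte_table[VA_IDX(va, 12)];
    return page_base | (va & 0xFFFULL);   /* 12-bit intra-page address 301 */
}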
In some embodiments, PDE address 303 may point to one large page, which may include 512 (i.e., 2^9) pages or physical pages. In other words, if a page is 4KB in size, the large page indicated by PDE address 303 may be 2MB (i.e., 4KB × 512 = 2048KB). PD1 address 304 may point to one PD1 page, which may include 512 (i.e., 2^9) large pages. In other words, if a page is 4KB in size, the PD1 page indicated by PD1 address 304 may be 1GB (i.e., 2MB × 512 = 1024MB). PD0 address 305 may point to one PD0 page, which may include 512 (i.e., 2^9) PD1 pages. In other words, if a page is 4KB in size, the PD0 page indicated by PD0 address 305 may be 512GB (i.e., 1GB × 512 = 512GB).
Furthermore, in some embodiments, a page may be 64KB in size. In that case, the intra-page address 301 may be 16 bits in length, the PTE address 302 may be 5 bits in length, the PDE address 303 may be 9 bits in length, the PD1 address 304 may be 9 bits in length, and the PD0 address 305 may be 9 bits in length.
Referring again to FIG. 2, the "fragment" field in page table entry 210 represents a fragment of the physical memory space; it may be used to indicate the size of the contiguous physical space in which the physical page mapped by the page table entry is located, or equivalently how many contiguous physical pages that space includes. The number of contiguous physical pages is a power of 2. For example, when the "fragment" information in page table entry 210 is 0, the number of contiguous physical pages is 1 (i.e., 2^0), i.e., there are no additional contiguous physical pages. When the "fragment" information is 1, the number of contiguous physical pages is 2 (i.e., 2^1). When it is 2, the number of contiguous physical pages is 4 (i.e., 2^2). When it is 3, the number of contiguous physical pages is 8 (i.e., 2^3), and so on. The physical address of the contiguous physical space defined by the fragment information may be the base address of that contiguous physical space.
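For example, the size of the contiguous physical space implied by a fragment value can be computed as follows (a sketch; the function name is illustrative):

#include <stdint.h>

/* Number of bytes in the contiguous physical space implied by the fragment
 * field: 2^fragment contiguous pages of page_size bytes each. */
static inline uint64_t fragment_bytes(unsigned fragment, uint64_t page_size) {
    return page_size << fragment;
}
/* e.g. fragment_bytes(3, 4096) == 32768: eight contiguous 4KB pages */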
If the contiguous physical page space is greater than the upper bound that a PTE address can describe (e.g., 512 physical pages for PTE address 302 of FIG. 3), the contiguous physical page space may be set up as a large page (e.g., the large page pointed to by PDE address 303). Large pages may be distinguished from pages pointed to by PTEs using the "Level" information in page table entry 210 or page directory entry 220. The physical address of the contiguous physical space defined by the large page may be the base address of that contiguous physical space. When a large page is used to define the contiguous physical space, no PTE points into the large page.
Common cache commands may include invalidate, flush, clean, and the like, which invalidate the data in the cache or write it back to the next-level storage. The invalidate command may mark the contents of one or more cache lines as invalid, thereby causing all subsequent accesses to the specified cache lines to miss. The clean command may write back the contents of one or more cache lines marked dirty to memory. In some embodiments, the invalidate command and the clean command may be the same or similar operations. The flush command may write back the contents of one or more cache lines marked as dirty to memory and then invalidate those cache lines.
The operating range of a cache command may include: (1) all data in the cache, (2) the data of a particular process in the cache, or (3) the data within a particular VA address range of a particular process in the cache. The computing system, computing core, chip, and related methods disclosed herein may process the data within a particular VA address range of a particular process in a cache.
In addition, the last level cache (LLC) closest to memory (e.g., the third level cache 150 shown in FIG. 1) is typically a cache accessed using physical addresses. The LLC closest to memory is typically larger in capacity, and there are usually multiple LLC instances. Such LLC instances may need to be interleaved across the physical address space. If balancing the bandwidth utilization of each LLC is further considered, the interleaving may use hash values of multiple physical address bits.
According to some embodiments of the present application, for virtual-address-based LLC commands, the memory management unit (or, alternatively, a cache command controller) may perform the following operations: (1) command disassembly, (2) command distribution, (3) receiving feedback signals, and (4) exception handling.
FIG. 4 is a schematic diagram of a command disassembling method according to an embodiment of the present application. A virtual address based LLC command may contain virtual address information. The virtual address information may indicate a virtual address section 410. VA base address 413 and VA top address 414 can be obtained from the virtual address information. VA size 415 may be calculated from VA base address 413 and VA top address 414. VA size 415 may indicate the number of pages or the number of large pages in virtual address section 410.
The memory management unit may query the corresponding page table or directory table according to VA base address 413 to obtain PA base address 423, together with the fragment information and level information in the hit page table entry or directory entry. The size of the contiguous physical space 420 in which PA base address 423 is located may be obtained from the fragment information and level information obtained from the page table entry or directory entry.
From the fragment information, the level information, and PA base address 423, the PA top address 424 at the end of contiguous physical space 420 may be calculated. From PA base address 423 and PA top address 424, PA size 425 may be calculated. PA size 425 does not indicate the size of contiguous physical space 420; it indicates the size of the portion of contiguous physical space 420 that corresponds to virtual address section 411.
When the PA size 425 is smaller than the VA size 415, the PA base address 423 and the PA top address 424 may be sent to a subsequent stage for command distribution operations. In other words, when PA size 425 is less than VA size 415, a portion of LLC data based on virtual address LLC commands may be distributed based on PA base address 423 and PA top address 424, which corresponds to physical space 421. The physical space 422 is independent of LLC commands based on virtual addresses.
When PA size 425 is less than VA size 415, the next round of the loop occurs. In the next round, VA base address 433 of virtual address section 430 is calculated based on VA base address 413; VA base address 433 may be equal to VA base address 413 plus PA size 425 plus 1. VA size 435 of virtual address section 430 may be calculated based on VA size 415 and PA size 425; VA size 435 may be equal to VA size 415 minus PA size 425. VA top address 434 of virtual address section 430 is equal to VA top address 414. In some embodiments, VA size 435 may also be calculated based on VA top address 434 and VA base address 433. VA size 435 of virtual address section 430 may indicate the number of pages or the number of large pages in virtual address section 430.
The memory management unit may query the corresponding page table or directory table according to VA base address 433 to obtain PA base address 443, together with the fragment information and level information in the hit page table entry or directory entry. The size of the contiguous physical space 440 in which PA base address 443 is located may be obtained from the fragment information and level information obtained from the page table entry or directory entry.
From the fragment information, the level information, and PA base address 443, the PA top address 444 at the end of contiguous physical space 440 may be calculated. PA size 445 may also be calculated from PA base address 443 and PA top address 444. The size of contiguous physical space 440 may also be equal to PA size 445. PA size 445 likewise indicates the size of the portion of contiguous physical space 440 that corresponds to virtual address section 431.
When PA size 445 is smaller than VA size 435, PA base address 443 and PA top address 444 may be sent to a subsequent stage for command distribution operations. In other words, when PA size 445 is smaller than VA size 435, a portion of LLC data based on virtual address LLC command may be distributed based on PA base address 443 and PA top address 444, which corresponds to physical space 441.
When PA size 445 is less than VA size 435, the next round of the loop occurs. In the next round, VA base address 453 of virtual address section 450 is calculated based on VA base address 433; VA base address 453 may be equal to VA base address 433 plus PA size 445 plus 1. VA size 455 of virtual address section 450 may be calculated based on VA size 435 and PA size 445; VA size 455 may be equal to VA size 435 minus PA size 445. VA top address 454 of virtual address section 450 is equal to VA top address 434. In some embodiments, VA size 455 may also be calculated based on VA top address 454 and VA base address 453. VA size 455 of virtual address section 450 may indicate the number of pages or the number of large pages in virtual address section 450.
The memory management unit may look up the corresponding page table or directory table according to VA base address 453 to obtain PA base address 463, together with the fragment information and level information in the hit page table entry or directory entry. The size of the contiguous physical space 460 in which PA base address 463 is located may be obtained from the fragment information and level information obtained from the page table entry or directory entry.
From the fragment information, the level information, and PA base address 463, the PA top address 464 at the end of contiguous physical space 460 may be calculated. From PA base address 463 and PA top address 464, PA size 465 may be calculated.
When PA size 465 is greater than or equal to VA size 455, end PA top address 466 may be calculated based on PA base address 463 and VA size 455. The end PA top address 466 may be equal to the PA base address 463 plus VA size 455. The PA base address 463 and the end PA top address 466 may be sent to a subsequent stage for command distribution operations. In other words, when the PA size 465 is greater than or equal to the VA size 455, a portion of the LLC data based on the virtual address LLC command, which corresponds to the physical space 461, may be distributed based on the PA base address 463 and the end PA top address 466. The physical spaces 462 and 467 are independent of virtual address based LLC commands.
For VA-based commands, a command disassembly operation may be performed first to obtain one contiguous physical space (or contiguous PA address segment) at a time. Data is then written to the cache based on the contiguous physical space (or contiguous PA address segment); for example, commands for the contiguous physical space (or contiguous PA address segment) are issued to the cache. By using the contiguous physical space information obtained from the fragment and/or large page information of the page table, the command disassembly flow can disassemble a command into the minimum number of physical address commands or data. The command distribution operation may be decoupled from the command disassembly operation; the two can be executed independently of each other, and they may be performed in parallel. Therefore, the computing system and related methods of the present application can accelerate the disassembly, distribution, and execution flows of cache commands, so as to improve the hardware utilization of the computing system. In addition, if a command is PA-based, the command distribution operation may be performed directly without performing the command disassembly operation. A sketch of the disassembly loop follows.
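In the sketch below, translate_base() and contiguous_size() are hypothetical stand-ins for the page-table lookups of FIG. 4, all quantities are in page-granular units, and the "+1" stepping follows the convention used in the description above:

#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins for the FIG. 4 lookups: every VA maps to PA = VA + 0x50,
 * and every contiguous physical space spans 4 units. A real implementation
 * derives both from the fragment/level fields of the hit page table entry
 * or directory entry. */
static uint64_t translate_base(uint64_t va_base)  { return va_base + 0x50; }
static uint64_t contiguous_size(uint64_t va_base) { (void)va_base; return 4; }

static void dispatch(uint64_t pa_base, uint64_t pa_top) {
    printf("distribute PA segment [0x%llx, 0x%llx]\n",
           (unsigned long long)pa_base, (unsigned long long)pa_top);
}

/* Disassemble the VA range [va_base, va_top] of an LLC command into
 * contiguous PA segments, mirroring the FIG. 4 loop. */
static void disassemble(uint64_t va_base, uint64_t va_top) {
    uint64_t va_size = va_top - va_base;
    for (;;) {
        uint64_t pa_base = translate_base(va_base);
        uint64_t pa_size = contiguous_size(va_base);
        if (pa_size < va_size) {
            dispatch(pa_base, pa_base + pa_size);  /* partial segment */
            va_base += pa_size + 1;                /* next round's VA base */
            va_size -= pa_size;
        } else {
            dispatch(pa_base, pa_base + va_size);  /* ending PA top address */
            break;
        }
    }
}

int main(void) { disassemble(0x100, 0x10A); return 0; }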
FIG. 5 is a schematic diagram of a command distribution method according to an embodiment of the present application. In an embodiment of the present application, the physical address may be 48 bits in length. FIG. 5 includes LLC core 510, LLC core 511, LLC core 512, and LLC core 513. LLC core 510 includes channels 520, 521, 522, and 523. LLC core 511 includes channels 524, 525, 526, and 527. LLC core 512 includes channels 528, 529, 530, and 531. LLC core 513 includes channels 532, 533, 534, and 535. Each of the 16 channels in FIG. 5 includes rows R0, R1, R2, R3, and so on. Each channel in FIG. 5 may include one or more rows.
The capacity of a row (e.g., row R0) of one of LLC cores 510, 511, 512, and 513 may be 1KB. LLC cores 510, 511, 512, and 513 are interleaved at a granularity of 1KB of physical space. Each of LLC cores 510, 511, 512, and 513 includes four channels. The capacity of a row of a channel (e.g., row R0 of channel 521) may be 256B. The 16 channels in FIG. 5 are interleaved at a granularity of 256B of physical space.
According to an embodiment of the application, the 10th and 11th bits (e.g., PA[11:10]) of the physical address may be an identifier indicating one of the four LLC cores. The 10th and 11th bits (e.g., PA[11:10]) of the physical address may indicate to which LLC core the corresponding data or command should be distributed.
According to embodiments of the present application, the 8th and 9th bits (e.g., PA[9:8]) of the physical address may be an identifier indicating one of the four channels in an LLC core. The 8th and 9th bits (e.g., PA[9:8]) of the physical address may indicate to which channel of a given LLC core the corresponding data or command should be distributed.
According to an embodiment of the present application, the 12th through 47th bits (e.g., PA[47:12]) of the physical address may be an identifier indicating a row. The 12th through 47th bits (e.g., PA[47:12]) of the physical address may indicate to which row the corresponding data or command should be distributed.
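As a sketch, these three fields can be decoded as follows (function names are illustrative):

#include <stdint.h>

/* Decode the distribution fields of a 48-bit physical address as described
 * above: PA[11:10] selects one of the four LLC cores, PA[9:8] selects one of
 * the four channels within that core, and PA[47:12] selects the row. */
static inline unsigned pa_llc_core(uint64_t pa) { return (unsigned)((pa >> 10) & 0x3); }
static inline unsigned pa_channel(uint64_t pa)  { return (unsigned)((pa >> 8)  & 0x3); }
static inline uint64_t pa_row(uint64_t pa)      { return (pa >> 12) & 0xFFFFFFFFFULL; }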
Position 591 in FIG. 5 may be the PA base address of one contiguous physical space (or contiguous PA address segment). Because the PA addresses of a contiguous physical space (or contiguous PA address segment) are also contiguous, the corresponding data or commands can be distributed sequentially according to the 8th through 47th bits (e.g., PA[47:8]) of the PA address. According to an embodiment of the present application, the corresponding data or commands may be distributed sequentially from left to right and from bottom to top, starting at position 591. Position 592 in FIG. 5 may be the PA top address of the contiguous physical space (or contiguous PA address segment). The corresponding data or commands of the contiguous physical space (or contiguous PA address segment) are distributed sequentially up to position 592.
According to another embodiment of the present application, the channel identifiers of 16 channels (channel 520, channel 521, channel 522, channel 523, channel 524, channel 525, channel 526, channel 527, channel 528, channel 529, channel 530, channel 531, channel 532, channel 533, channel 534, and channel 535) may be numbered sequentially from 0 to 15.
In some embodiments, let i = 0 to 15 be the channel ID, let the command base address of channel i be base_per_ch[i], and let the command top address of channel i be top_per_ch[i]. Taking FIG. 5 as an example, base_per_ch[4] is the address of row R0 of channel 523, and top_per_ch[4] is the address of row R3 of channel 523. The channel ID may be 0 to 15, and the 8th through 11th bits of the physical address (e.g., PA[11:8]) may be used to indicate the channel ID. In some embodiments, let the start channel ID be start_ch_id and the PA base address be pa_base; start_ch_id is the 8th through 11th bits of pa_base, i.e., start_ch_id = pa_base[11:8]. Let the end channel ID be end_ch_id and the PA top address be pa_top; end_ch_id is the 8th through 11th bits of pa_top, i.e., end_ch_id = pa_top[11:8].
If i < start_ch_id, then base_per_ch[i] = pa_base[47:12] + 1; otherwise, base_per_ch[i] = pa_base[47:12]. Taking FIG. 5 as an example, start_ch_id is 1 (i.e., channel 521). When i = 0 (i.e., channel 520), base_per_ch[0] = pa_base[47:12] + 1, where pa_base[47:12] points to row R0 and pa_base[47:12] + 1 points to row R1. When i = 2 (i.e., channel 522), base_per_ch[2] = pa_base[47:12], where pa_base[47:12] points to row R0. Thus, taking FIG. 5 as an example, on the channel with channel ID 0 (i.e., channel 520), data or commands are filled in starting from row R1; on the channel with channel ID 2 (i.e., channel 522), data or commands are filled in starting from row R0.
If i > end_ch_id, then top_per_ch[i] = pa_top[47:12] - 1; otherwise, top_per_ch[i] = pa_top[47:12]. Taking FIG. 5 as an example, end_ch_id is 10 (i.e., channel 530). When i = 11 (i.e., channel 531), top_per_ch[11] = pa_top[47:12] - 1, where pa_top[47:12] points to row R3 and pa_top[47:12] - 1 points to row R2. When i = 9 (i.e., channel 529), top_per_ch[9] = pa_top[47:12], where pa_top[47:12] points to row R3. Thus, taking FIG. 5 as an example, on the channel with channel ID 11 (i.e., channel 531), data or commands are filled in at most up to row R2; on the channel with channel ID 9 (i.e., channel 529), data or commands are filled in up to row R3.
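The per-channel base/top computation above can be sketched as follows (the function name is illustrative; base_per_ch and top_per_ch follow the naming used in the description):

#include <stdint.h>

/* For a contiguous PA segment [pa_base, pa_top], compute each channel's
 * command base row and top row per the rules above: channels before
 * start_ch_id begin one row later; channels after end_ch_id stop one row
 * earlier. Channel IDs are PA[11:8]; rows are PA[47:12]. */
static void per_channel_rows(uint64_t pa_base, uint64_t pa_top,
                             uint64_t base_per_ch[16], uint64_t top_per_ch[16]) {
    unsigned start_ch_id = (unsigned)((pa_base >> 8) & 0xF);
    unsigned end_ch_id   = (unsigned)((pa_top  >> 8) & 0xF);
    uint64_t base_row    = (pa_base >> 12) & 0xFFFFFFFFFULL;
    uint64_t top_row     = (pa_top  >> 12) & 0xFFFFFFFFFULL;
    for (unsigned i = 0; i < 16; i++) {
        base_per_ch[i] = (i < start_ch_id) ? base_row + 1 : base_row;
        top_per_ch[i]  = (i > end_ch_id)   ? top_row  - 1 : top_row;
    }
}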
To balance the load between the LLC cores and/or channels, the LLC may support determining the LLC core ID and channel ID using hash values of more physical address bits. The memory management unit also supports determining the LLC core ID and channel ID using hash values of more physical address bits.
FIGS. 6A and 6B are schematic diagrams illustrating another embodiment of the command distribution method of the present application. Each of FIGS. 6A and 6B includes LLC core 610, LLC core 611, LLC core 612, and LLC core 613. Each of FIGS. 6A and 6B includes a plurality of rows, including row R0, row R1, row R2, row R3, and so on. To illustrate the present application more simply, each LLC core of FIGS. 6A and 6B includes only one channel, and LLC cores 610, 611, 612, and 613 are interleaved at a granularity of 1KB of physical space. The LLC command address ranges of LLC cores 610, 611, 612, and 613 are 1KB aligned. The capacity of a row of an LLC core (e.g., row R0 of LLC core 610) may be 1KB. Because the LLC command address range is 1KB aligned and the intra-page address is the lowest 10 bits of the physical address (i.e., bits 0 through 9), the bits used to determine the LLC core ID and/or channel ID may begin at bit 10 of the physical address.
Referring to FIG. 6A, on a 4KB-aligned physical address segment, row R0 contains both LLC cores that are not covered and LLC cores that are covered. Let the command base address for LLC core ID i in FIG. 6A be base_per_llc[i]. In row R0 of FIG. 6A, if the LLC core with LLC core ID i is not covered, then base_per_llc[i] = pa_base[47:12] + 1. In row R0 of FIG. 6A, if the LLC core with LLC core ID i is covered, then base_per_llc[i] = pa_base[47:12]. Referring to FIG. 6A, when i equals 0 (i.e., LLC core 610), LLC core 610 in row R0 is not covered, so base_per_llc[0] = pa_base[47:12] + 1, where pa_base[47:12] points to row R0 and pa_base[47:12] + 1 points to row R1. When i equals 1 (i.e., LLC core 611), LLC core 611 is covered in row R0, so base_per_llc[1] = pa_base[47:12], where pa_base[47:12] points to row R0. Thus, taking FIG. 6A as an example, on the LLC core with LLC core ID 0 (i.e., LLC core 610), data or commands are filled in starting from row R1; on the LLC core with LLC core ID 1 (i.e., LLC core 611), data or commands are filled in starting from row R0.
Referring to FIG. 6A, on a 4KB-aligned physical address segment, row R3 also contains both LLC cores that are not covered and LLC cores that are covered. Let the command top address for LLC core ID i in FIG. 6A be top_per_llc[i]. In row R3 of FIG. 6A, if the LLC core with LLC core ID i is not covered, then top_per_llc[i] = pa_top[47:12] - 1. In row R3 of FIG. 6A, if the LLC core with LLC core ID i is covered, then top_per_llc[i] = pa_top[47:12]. Referring to FIG. 6A, when i equals 2 (i.e., LLC core 612), LLC core 612 is not covered in row R3, so top_per_llc[2] = pa_top[47:12] - 1, where pa_top[47:12] points to row R3 and pa_top[47:12] - 1 points to row R2. When i equals 1 (i.e., LLC core 611), LLC core 611 is covered in row R3, so top_per_llc[1] = pa_top[47:12], where pa_top[47:12] points to row R3. Thus, taking FIG. 6A as an example, on the LLC core with LLC core ID 2 (i.e., LLC core 612), data or commands are filled in at most up to row R2; on the LLC core with LLC core ID 1 (i.e., LLC core 611), data or commands are filled in up to row R3.
The hash function may be calculated using all bits above bit 10, or it may be configured according to program characteristics. In some embodiments, the hash function for the LLC core ID may be designed to XOR selected higher physical address bits into each bit of the LLC core ID (the exact formula is not reproduced in this text), where llc_id[0] is bit 0 of the LLC core ID, llc_id[1] is bit 1 of the LLC core ID, and ⊕ denotes an exclusive-or (XOR) operation. The hash function is designed to determine the LLC core ID based on the high bits of the physical address segment.
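The following is an illustrative example of such a hash in C. The source text does not reproduce the exact formula, so the specific bit selection (XOR-folding PA[10], PA[14], PA[18] into llc_id[0] and PA[11], PA[15], PA[19] into llc_id[1]) is an assumption; only the principle of mixing higher physical address bits into the 2-bit core ID is taken from the description above:

#include <stdint.h>

/* Illustrative 2-bit LLC core ID hash (assumed bit selection, see above). */
static unsigned llc_core_id_hash(uint64_t pa) {
    unsigned llc_id0 = (unsigned)(((pa >> 10) ^ (pa >> 14) ^ (pa >> 18)) & 1); /* llc_id[0] */
    unsigned llc_id1 = (unsigned)(((pa >> 11) ^ (pa >> 15) ^ (pa >> 19)) & 1); /* llc_id[1] */
    return (llc_id1 << 1) | llc_id0;
}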
Referring to FIG. 6A, before calculation via the hash function, data 604 is located in row R3 of LLC core 610 and data 605 is located in row R3 of LLC core 611. Referring to FIG. 6B, in some embodiments, after calculation via the hash function, data 604 is located in row R3 of LLC core 612 and data 605 is located in row R3 of LLC core 610.
Referring to FIG. 6A, before calculation via the hash function, data 601 is located in row R0 of LLC core 611, data 602 is located in row R0 of LLC core 612, and data 603 is located in row R0 of LLC core 613. Referring to FIG. 6B, in some embodiments, after calculation via the hash function, data 601 is located in row R0 of LLC core 613, data 602 is located in row R0 of LLC core 610, and data 603 is located in row R0 of LLC core 611.
Referring to FIGS. 6A and 6B, the LLC cores of the intermediate address segments (i.e., rows R1 and R2) are all covered. Therefore, even after hashing the 1KB address segments of rows R1 and R2, the LLC cores of rows R1 and R2 remain covered. In some embodiments, if an intermediate address segment contains both covered and uncovered LLC cores, the hash function determining the LLC core ID based on the high bits of the physical address segment can still be applied to both the covered and uncovered LLC cores.
By determining the LLC core ID and/or channel ID based on the high bits of the physical address, the load between each LLC core and each channel may be balanced. In some embodiments, frequent accesses to particular LLC cores or channels due to programmer errors may be mitigated.
According to some embodiments of the application, the memory management unit may support feedback control for the LLC. For example, to reduce the impact of command execution on LLC performance, embodiments of the present application may allow the LLC to execute multiple physical address commands simultaneously. According to some embodiments of the application, the memory management unit may send an end flag (e.g., a last tag) to the LLC after the entire command has been disassembled and sent. The memory management unit may send the last tag only to the LLCs to which commands were sent; accordingly, for the other LLCs, the memory management unit does not expect to receive a feedback signal in response to the last tag. Upon receiving the end flag, an LLC may send a feedback signal to the memory management unit after all of its commands have been executed. After receiving the feedback signals from the LLCs, the memory management unit may send another feedback signal to the upstream module (e.g., the compute core). Therefore, the application can reduce unnecessary waiting time of the memory management unit and the upstream module.
In some embodiments, when the disassembly of a virtual-address-based LLC command is completed and the distribution of the data of that command is completed, the memory management unit may transmit an end flag to a set of LLC cores, where the LLC data of the virtual-address-based LLC command was distributed to each LLC core of the set. After each LLC core of the set completes its associated commands, that LLC core may send a feedback signal to the memory management unit. Upon receiving the feedback signals from the LLC cores of the set, the memory management unit may transmit another feedback signal to an upstream module (e.g., a compute core). In some embodiments, the memory management unit transmits the other feedback signal to the upstream module (e.g., the compute core) only after receiving a feedback signal from every LLC core of the set, where the other feedback signal may indicate that the virtual-address-based LLC command has completed execution. Therefore, the application can reduce unnecessary waiting time of the memory management unit and the upstream module.
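A sketch of this end-flag/feedback bookkeeping follows; the structure and names are illustrative, not the patent's implementation:

#include <stdbool.h>

/* Track, per VA-based LLC command, which LLC cores received at least one PA
 * command (and therefore an end flag) and which have fed back completion.
 * The upstream module is signaled only when every participating core is done. */
typedef struct {
    bool sent[16];   /* core received part of this command + the end flag */
    bool done[16];   /* core acknowledged completion of its commands      */
} cmd_tracker;

static bool ready_to_signal_upstream(const cmd_tracker *t) {
    for (int i = 0; i < 16; i++)
        if (t->sent[i] && !t->done[i])
            return false;   /* still waiting on a participating LLC core */
    return true;
}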
According to some embodiments of the application, the memory management unit may support exception handling for virtual-address-based LLC commands. When the memory management unit detects that an exception occurs for a virtual-address-based LLC command, it may interrupt the disassembly of the current command, send an end flag to the LLC (or LLC core), and send a corresponding feedback signal to the upstream module (e.g., the compute core). The corresponding feedback signal may indicate that the memory management unit generated or returned an exception during address translation (e.g., an ECC error or an address decoding error), or that the address translation result of the memory management unit points to a physical address outside the LLC (e.g., to memory).
According to some embodiments of the application, the LLC (or LLC core) may send a feedback signal to the memory management unit after it has fully executed the received commands. Based on the feedback signal from the LLC (or LLC core) and the detected command exception, the memory management unit may send a corresponding feedback signal to the upstream module (e.g., the compute core). The corresponding feedback signal may indicate that the memory management unit generated or returned an exception during address translation (e.g., an ECC error or an address decoding error), or that the address translation result of the memory management unit points to a physical address outside the LLC (e.g., to memory). Therefore, the application can accurately detect the related exceptions and effectively reduce the penalty time.
According to the embodiments of the present application, the memory management unit disclosed herein may support physical address cache commands based on virtual address ranges. The memory management unit according to the present application can disassemble a command into the minimum number of physical address commands based on the fragment and/or large page information in the page table. The memory management unit disclosed in the application can decouple the flows of command disassembly, command distribution, and LLC command execution, and can execute these flows in parallel to accelerate them. The memory management unit disclosed in the application can further support hash-based command distribution, where the LLC core ID and/or LLC channel ID used for distribution are obtained using a hash. The memory management unit disclosed in the application can also support the related exception handling.
FIG. 7 is a schematic diagram of an embodiment of the present application for processing virtual-address-based last level cache (LLC) commands. The method 700 in FIG. 7 may include operations 701 and 703. In operation 701, the memory management unit may obtain a first contiguous physical space size and corresponding physical address (Physical Address, PA) information according to the virtual address (Virtual Address, VA) information of an LLC command. The LLC command is a virtual-address-based LLC command.
In operation 703, the memory management unit may distribute LLC data associated with the LLC command based on the physical address information. LLC data associated with LLC commands may be distributed into one or more LLC cores. LLC data associated with LLC commands may be distributed into one or more channels of one or more LLC cores.
The foregoing outlines structures, flows, or methods of several embodiments so that those skilled in the art may better understand aspects of the disclosure. Those skilled in the art will appreciate that the present disclosure may be readily utilized as a basis for designing or modifying other structures, processes, or methods for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions, procedures, or methods do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

Claims (21)

1. A method of processing virtual address-based last level cache commands, comprising:
Acquiring a first continuous physical space size and corresponding physical address information based on the virtual address information of the virtual address LLC command; and
Distributing the LLC data based on the virtual address LLC command to at least one LLC core based on the PA information.
2. The method as recited in claim 1, further comprising:
Acquiring a first VA base address and a first VA top address based on the VA information of the virtual address LLC command;
Calculating a first VA size based on the first VA base address and the first VA top address;
Acquiring a first PA base address and the first continuous physical space size based on the first VA base address;
Calculating a first PA top address and a first PA size based on the first PA base address and the first contiguous physical space size;
Distributing a first portion of the LLC data to the at least one LLC core based on the first PA-base address and the first PA-top address when the first PA size is smaller than the first VA size;
When the first PA size is not less than the first VA size, an ending PA top address is equal to the first PA base address plus the first VA size, and the LLC data is distributed to the at least one LLC core based on the first PA base address and the ending PA top address.
3. The method of claim 2, wherein the first contiguous physical space size of the PA base address is determined based on a fragment attribute in a respective page table entry and based on a respective page directory entry.
4. The method as recited in claim 2, further comprising:
When the first PA size is smaller than the first VA size, the second VA base address is the first VA base address plus the first PA size plus 1, and the second VA size is the first VA size minus the first PA size.
5. The method as recited in claim 4, further comprising:
Acquiring a second PA base address and a second contiguous physical space size based on the second VA base address;
Calculating a second PA top address and a second PA size based on the second PA base address and the second contiguous physical space size;
Distributing a second portion of the LLC data to the at least one LLC core based on the second PA base address and the second PA top address when the second PA size is smaller than the second VA size; and
When the second PA size is not smaller than the second VA size, setting the ending PA top address equal to the second PA base address plus the second VA size, and distributing a third portion of the LLC data to the at least one LLC core based on the second PA base address and the ending PA top address, wherein the LLC data consists of the first portion and the third portion of the LLC data.
6. The method as recited in claim 2, further comprising:
Distributing a fourth portion of the LLC data to one of the at least one LLC core based on a first portion of a PA address; and
Distributing the fourth portion of the LLC data to one of at least one channel of the respective LLC core based on a second portion of the PA address.
7. The method as recited in claim 6, further comprising:
Distributing the fourth portion of the LLC data to one of the at least one LLC core based on the first portion and a third portion of the PA address; and
Distributing the fourth portion of the LLC data to one of at least one channel of the respective LLC core based on the second portion and the third portion of the PA address.
8. The method as recited in claim 1, further comprising:
Transmitting a first end flag to a first set of LLC cores when distribution of the LLC data of the virtual-address-based LLC command is complete, the LLC data of the virtual-address-based LLC command being distributed to each LLC core of the first set.
9. The method as recited in claim 8, further comprising:
Receiving a first feedback signal from each LLC core of the first set of LLC cores; and
Transmitting a second feedback signal.
10. The method of claim 9, wherein the second feedback signal indicates one of:
completion of execution of the virtual-address-based LLC command, a VA information translation exception, or that the PA information is not directed to the LLC.
11. A computing system, comprising:
A computing core;
at least one last level cache (LLC) core communicatively coupled to the computing core; and
A memory management unit communicatively coupled to the computing core and the at least one LLC core,
Wherein the memory management unit is configured to:
receive a virtual-address-based LLC command;
acquire a first contiguous physical space size and corresponding physical address (PA) information based on virtual address (VA) information of the virtual-address-based LLC command;
generate at least one physical-address-based LLC command; and
send the at least one physical-address-based LLC command to the at least one LLC core;
Wherein the at least one LLC core is configured to:
distribute the LLC data of the virtual-address-based LLC command to the at least one LLC core based on the at least one physical-address-based LLC command.
12. The computing system of claim 11, wherein the memory management unit is configured to:
acquire a first VA base address and a first VA top address based on the VA information of the virtual-address-based LLC command;
calculate a first VA size based on the first VA base address and the first VA top address;
acquire a first PA base address and the first contiguous physical space size based on the first VA base address;
calculate a first PA top address and a first PA size based on the first PA base address and the first contiguous physical space size;
distribute a first portion of the LLC data to the at least one LLC core based on the first PA base address and the first PA top address when the first PA size is smaller than the first VA size; and
when the first PA size is not smaller than the first VA size, set an ending PA top address equal to the first PA base address plus the first VA size, and distribute the LLC data to the at least one LLC core based on the first PA base address and the ending PA top address.
13. The computing system of claim 12, wherein the memory management unit is configured to:
determine the first contiguous physical space size of the PA base address based on a fragment attribute in a respective page table entry; and
determine the first contiguous physical space size of the PA base address based on a respective page directory entry.
14. The computing system of claim 12, wherein the memory management unit is configured to:
when the first PA size is smaller than the first VA size, set a second VA base address to the first VA base address plus the first PA size plus 1, and set a second VA size to the first VA size minus the first PA size.
15. The computing system of claim 14, wherein the memory management unit is configured to:
calculate a second VA top address based on the second VA base address and the second VA size;
acquire a second PA base address and a second contiguous physical space size based on the second VA base address;
calculate a second PA top address and a second PA size based on the second PA base address and the second contiguous physical space size;
distribute a second portion of the LLC data to the at least one LLC core based on the second PA base address and the second PA top address when the second PA size is smaller than the second VA size; and
when the second PA size is not smaller than the second VA size, set the ending PA top address equal to the second PA base address plus the second VA size, and distribute a third portion of the LLC data to the at least one LLC core based on the second PA base address and the ending PA top address, wherein the LLC data consists of the first portion and the third portion of the LLC data.
16. The computing system of claim 12, wherein:
the memory management unit is configured to perform the following operations:
distribute a fourth portion of the LLC data to one of the at least one LLC core based on a first portion of a PA address of the at least one physical-address-based LLC command;
The at least one LLC core is configured to:
distribute the fourth portion of the LLC data to one of at least one channel of the respective LLC core based on a second portion of the PA address.
17. The computing system of claim 16, wherein:
the memory management unit is configured to perform the following operations:
distribute the fourth portion of the LLC data to one of the at least one LLC core based on the first portion and a third portion of the PA address; and
The at least one LLC core is configured to:
distribute the fourth portion of the LLC data to one of at least one channel of the respective LLC core based on the second portion and the third portion of the PA address.
18. The computing system of claim 11, wherein the memory management unit is configured to:
when sending of the at least one physical-address-based LLC command is complete, communicate a first end flag to a first set of LLC cores, the at least one physical-address-based LLC command being distributed to each LLC core of the first set.
19. The computing system of claim 18, wherein the memory management unit is configured to:
receive a first feedback signal from each LLC core of the first set of LLC cores; and
transmit a second feedback signal.
20. The computing system of claim 19, wherein the second feedback signal indicates one of:
completion of execution of the virtual-address-based LLC command, a VA information translation exception, or that the PA information is not directed to the LLC.
21. A chip, comprising the computing system of any one of claims 11 to 20.
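By way of illustration only, the following C sketch shows one possible shape for the end-flag and feedback handshake recited in claims 8 to 10 and 18 to 20. All helper functions, the core count bound, and the status encoding are assumptions for the example, not details fixed by the claims.

    #include <stdbool.h>
    #include <stdint.h>

    /* Status values the second feedback signal may indicate (claims 10, 20). */
    typedef enum {
        LLC_CMD_DONE,          /* command executed to completion            */
        LLC_CMD_VA_XLATE_EXC,  /* VA information translation exception      */
        LLC_CMD_PA_NOT_IN_LLC  /* PA information is not directed to the LLC */
    } llc_cmd_status_t;

    #define MAX_LLC_CORES 8u   /* assumed upper bound on LLC cores */

    /* Hypothetical signalling helpers. */
    extern void send_end_flag(uint32_t core_id);
    extern void wait_first_feedback(uint32_t core_id);
    extern void send_second_feedback(llc_cmd_status_t status);

    /* After the last PA command is sent, flag every LLC core that received
     * a portion of the command, wait for each core's feedback, then report
     * the overall outcome back to the command issuer. */
    void finish_va_llc_cmd(const bool touched[MAX_LLC_CORES],
                           llc_cmd_status_t status)
    {
        for (uint32_t id = 0; id < MAX_LLC_CORES; id++)
            if (touched[id])
                send_end_flag(id);           /* first end flag */

        for (uint32_t id = 0; id < MAX_LLC_CORES; id++)
            if (touched[id])
                wait_first_feedback(id);     /* first feedback signal */

        send_second_feedback(status);        /* second feedback signal */
    }

Tracking only the set of touched cores keeps the handshake proportional to the cores actually involved, rather than broadcasting to every LLC core in the system.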