WO2018166337A1 - Data processing method and device - Google Patents
- Publication number
- WO2018166337A1 (PCT/CN2018/077026)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- cache
- address
- external storage
- module
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
Definitions
- the present application relates to the field of computer technology, and in particular, to a data processing method and apparatus.
- the hardware system of a computer device needs to support both the big-endian and the little-endian data formats.
- however, the processor instruction pipeline in the computer device often supports only one data format, for example, only the little-endian format or only the big-endian format.
- the data read by the computer device from the cache is sent to the processor instruction pipeline.
- in one case, the data format used by the external storage is consistent with the data format supported by the processor instruction pipeline (endianness match);
- in the other case, the data format used by the external storage is inconsistent with the data format supported by the processor instruction pipeline (endianness mismatch).
- in the prior art, when the endianness is mismatched, the data read from the Cache must pass through the load hit path, namely data alignment (a right shift determined by the address and access size), endian conversion, and sign extension, before it can be sent to the processor instruction pipeline for subsequent processing.
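For contrast, the prior-art load hit path described above can be sketched as three serial stages. This is a minimal Python model under stated assumptions, not the circuit of FIG. 1: the Cache is assumed to hold big-endian data unchanged, `window` stands for one 2P-byte Cache read indexed by the low address bits, and the function name `prior_art_load` is hypothetical.

```python
def prior_art_load(window: bytes, address: int, k: int, signed: bool) -> int:
    """Hypothetical model of the prior-art load hit path on an endianness mismatch.

    window  -- one 2P-byte Cache read; window[i] is the byte whose low address bits are i
    address -- first address of the K-byte access
    """
    w = len(window)                                # 2P, assumed to be a power of two
    offset = address & (w - 1)                     # Address[n:0]
    aligned = window[offset:] + window[:offset]    # stage 1: data alignment (rotate right)
    swapped = aligned[:k][::-1]                    # stage 2: endian conversion of the K bytes
    # stage 3: sign (or zero) extension, done here by Python's integer conversion
    return int.from_bytes(swapped, "little", signed=signed)


cache_window = bytearray(16)                            # assumed 2P = 16
cache_window[7:11] = (0x11223344).to_bytes(4, "big")    # big-endian data kept as-is in the Cache
print(hex(prior_art_load(bytes(cache_window), 7, 4, signed=False)))  # 0x11223344
```

Every loaded value passes through all three stages in series; the endian conversion stage is the part this application aims to remove from the load hit path.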
- the embodiments of the present application provide a data processing method and apparatus, so as to at least reduce the load hit latency when the big-endian/little-endian data formats are inconsistent.
- the embodiment of the present application provides the following technical solutions:
- a data processing method, comprising: acquiring a read instruction sent by a processor instruction pipeline, where the read instruction includes address information of first data to be read in an external storage of a cache (Cache), the read/write width of the Cache is 2P bytes, the number of bytes of the first data is K with $K \le P$, and both K and P are positive integers; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in third data, the third data is obtained by performing endianness conversion on the cache block (cacheline) containing the first data, and the size of the cacheline is $\ge 2P$; and reading fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache.
- $\sim$ denotes bitwise negation (inversion);
- $n = \log_2 P$;
- Address[n:0] denotes the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache;
- the second data is sent to the processor instruction pipeline.
- because the data read from the Cache has already undergone endianness conversion, no endianness conversion is needed after the data is read from the Cache. This avoids the additional load hit latency that, in the prior art, is introduced by the endian conversion logic placed after the Cache read when the endianness is mismatched, and therefore reduces the load hit latency in the mismatch case.
- optionally, before the fourth data of 2P bytes is read from the Cache according to the address information of the second data in the Cache, the method further includes: reading the cacheline containing the first data from the external storage of the Cache; performing endianness conversion on the cacheline containing the first data to obtain the third data; and writing the third data into the Cache.
- based on this scheme, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endianness conversion.
- consequently, no endianness conversion is needed after data is read from the Cache, which avoids the increase in load hit latency caused in the prior art by the endian conversion logic after the Cache read, and reduces the load hit latency when the endianness is mismatched.
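- optionally, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data.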
- optionally, before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage, the method further includes: reading the cacheline containing the first data from the external storage of the Cache, and writing the cacheline containing the first data into the Cache. Based on this scheme, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache.
- optionally, the method further includes: acquiring a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the byte count T of the eighth data, with $T \le P$ and T an integer; and when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a third byte count to obtain ninth data of 2P bytes, where
- $\mathrm{Index}_3 = \sim(\mathrm{Address}[n:0] + T - 1)$ and $\mathrm{Index}_3$ denotes the third byte count; the ninth data is then written into the Cache.
- the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
- the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
- an embodiment of the present application provides a data processing apparatus, which has a function of implementing behavior of a data processing apparatus in the foregoing method embodiment.
- this function can be implemented by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules corresponding to the functions described above.
- an embodiment of the present application provides a data processing apparatus, including a processor, a memory, a bus, and a communication interface; the memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus; when the data processing apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data processing apparatus performs the data processing method of any of the foregoing first aspects.
- an embodiment of the present application provides a computer readable storage medium, configured to store computer software instructions used by the data processing apparatus, which, when run on a computer, enable the computer to perform the data processing method of any of the foregoing first aspects.
- an embodiment of the present application provides a computer program product comprising instructions, which when executed on a computer, enable the computer to perform the data processing method of any of the above first aspects.
- FIG. 1 is a logic block diagram of data processing when the big-endian/little-endian data formats are inconsistent in the prior art;
- FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied;
- FIG. 3 is a schematic structural diagram of hardware of a data processing apparatus according to an embodiment of the present disclosure
- FIG. 4 is a schematic flowchart 1 of a data processing method according to an embodiment of the present application.
- FIG. 5 is a logic block diagram of data processing when the big-endian/little-endian data formats are inconsistent according to an embodiment of the present disclosure;
- FIG. 6 is a second schematic flowchart of a data processing method according to an embodiment of the present disclosure.
- FIG. 7 is a schematic diagram 1 of an example of a data processing method according to an embodiment of the present disclosure.
- FIG. 8 is a schematic diagram 2 of an example of a data processing method according to an embodiment of the present disclosure.
- FIG. 9 is a schematic diagram 3 of an example of a data processing method according to an embodiment of the present disclosure.
- FIG. 10 is a schematic diagram 4 of an example of a data processing method according to an embodiment of the present disclosure.
- FIG. 11 is a schematic structural diagram 1 of a data processing apparatus according to an embodiment of the present disclosure.
- FIG. 12 is a second schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.
- FIG. 13 is a schematic structural diagram 3 of a data processing apparatus according to an embodiment of the present disclosure.
- FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied.
- the multi-core system 100 includes a bus 101, a multi-core processor 102 connected to the bus 101, and a memory 103 connected to the bus 101.
- the memory 103 may be a random access memory (RAM), a dynamic random access memory (DRAM), or the like; this is not specifically limited.
- the bus 101 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
- the bus can be divided into an address bus, a data bus, a control bus, and the like. For convenience of representation, only one thick line is shown in Figure 2, but it does not mean that there is only one bus or one type of bus.
- the multi-core processor 102 includes a plurality of processor cores, such as a processor core 102a, a processor core 102b, ..., and a processor core 102c; each core may be a central processing unit (CPU) core or a graphics processing unit (GPU) core, which is not specifically limited in this embodiment.
- these processor cores are mainly used to perform calculations, and each processor core has its own level 1 cache (L1C) and level 2 cache (L2C);
- the processor cores share a last level cache (LLC), and multiple multi-core processors share one memory.
- when a processor core receives an instruction to read data, it first checks whether the address exists in the L1C. If it does, the processor core reads the data directly from the L1C; if it does not, the processor core continues to search the L2C, and so on.
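The lookup order just described can be modelled in a few lines. This is an illustration only: each level is a hypothetical Python dict from address to data, `lookup` is a made-up name, and filling the upper levels on a miss is not modelled.

```python
def lookup(address: int, levels):
    """Search the levels in order (e.g. L1C, L2C, LLC, memory); return (level, data)."""
    for name, contents in levels:
        if address in contents:
            return name, contents[address]
    raise KeyError(hex(address))


hierarchy = [("L1C", {}), ("L2C", {0x40: 7}), ("LLC", {0x40: 7, 0x80: 9}), ("memory", {0xC0: 1})]
print(lookup(0x80, hierarchy))  # ('LLC', 9)
```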
- the embodiment of the present application is also applicable to a single-core system or a system including a Cache having a similar hierarchical storage structure, which is not specifically limited in this embodiment of the present application.
- FIG. 3 is a schematic diagram showing the hardware structure of a data processing apparatus 30 according to an embodiment of the present application.
- the data processing device 30 includes a processor 301, a memory 302, a communication interface 304, and a bus 303.
- the processor 301, the communication interface 304, and the memory 302 are connected to one another via a bus 303.
- the processor 301 is the control center of the data processing device 30 and connects the various portions of the entire data processing device 30 via the bus 303; by running or executing the software programs and/or modules stored in the memory 302 and invoking the data stored in the memory 302, it performs the various functions of the data processing device 30 and processes data, thereby monitoring the data processing device 30 as a whole.
- the processor 301 can be any one of the processor cores in FIG. 2 above.
- the memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the data processing device 30 by running software programs and modules stored in the memory 302.
- the memory 302 mainly includes a program storage area and a data storage area, where the program storage area can store an operating system, an application program required by at least one function, and the like, and the data storage area can store data created through use of the data processing apparatus 30, and the like.
- the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
- the memory 302 can be the memory in FIG. 2 above.
- the bus 303 can be a PCI bus or an EISA bus or the like.
- the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 3, but it does not mean that there is only one bus or one type of bus.
- bus 303 may be the bus of FIG. 2 described above.
- Communication interface 304 is used for communication of data processing device 30 with external devices.
- the data processing device 30 may also include a radio frequency (RF) circuit, an audio circuit, a communication interface, and/or a plurality of sensors, which are not specifically limited in this embodiment of the present application.
- when the processor 301 is a processor core in FIG. 2, the memory 302 is the memory in FIG. 2, and the bus 303 is the bus in FIG. 2, the data processing device 30 provided by the embodiment of the present application may be the multi-core system in FIG. 2; this is not specifically limited in the embodiment of the present application.
- as shown in FIG. 4, which is a schematic flowchart of a data processing method, the method includes the following steps:
- S401: the data processing device acquires a read instruction sent by the processor instruction pipeline, where the read instruction includes the address information of the first data to be read in the external storage of the Cache.
- the read/write width of the Cache is 2P bytes, the number of bytes of the first data is K with $K \le P$, and both K and P are positive integers.
- a read/write width of 2P bytes means that each read from the Cache returns 2P bytes of data and each write into the Cache writes 2P bytes of data.
- for example, when the data processing device is the multi-core system in FIG. 2, the Cache in step S401 may be the L1C, and the external storage of the Cache may be the L2C, the LLC, the memory, or the like; this is not specifically limited in the embodiment of the present application.
- S402: the data processing apparatus determines the address information of the second data in the Cache according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in the third data, the third data is obtained by performing endianness conversion on the cacheline containing the first data, and the size of the cacheline is $\ge 2P$.
- S403: the data processing device reads fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data.
- S404: the data processing device cyclically rotates the fourth data right by a first byte count to obtain fifth data, where the fifth data includes the second data, and the second data occupies the lowest K byte addresses of the 2P byte addresses of the fifth data.
- $\mathrm{Index}_1 = \sim(\mathrm{Address}[n:0] + K - 1)$, where:
- $\mathrm{Index}_1$ denotes the first byte count;
- $\sim$ denotes bitwise inversion, evaluated over the low n+1 bits;
- $n = \log_2 P$;
- Address[n:0] denotes the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache.
- the data is rotated right to right-align it.
- the data must be right-aligned because, when the processor instruction pipeline reads data, the rightmost byte is the byte corresponding to the first address of the read instruction; the data therefore needs to be right-aligned before being sent to the processor instruction pipeline. This description applies throughout and is not repeated below.
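The read path just described (reading the 2P-byte fourth data, rotating it right by $\mathrm{Index}_1$, and keeping the lowest K bytes) can be illustrated with a short Python model. It is a sketch under assumptions: P = 8 (a 16-byte read width, matching the 16-byte example later in this document), the 2P-byte read is modelled as a `bytes` object whose index i is the byte with low address bits i, and the helper names are hypothetical. It shows that a right rotation alone recovers a little-endian value from a byte-reversed line, with no endian conversion stage.

```python
P = 8                  # assumed: Cache read/write width 2P = 16 bytes
WIDTH = 2 * P
MASK = WIDTH - 1       # selects Address[n:0], the low n+1 address bits (n = log2 P)


def rotate_right(data: bytes, r: int) -> bytes:
    """Cyclic right rotation by r bytes: the byte at offset r moves to offset 0."""
    r %= len(data)
    return data[r:] + data[:r]


def index1(address: int, k: int) -> int:
    """Index1 = ~(Address[n:0] + K - 1), truncated to n+1 bits."""
    return ~((address & MASK) + k - 1) & MASK


def load_mismatch(fourth_data: bytes, address: int, k: int) -> bytes:
    """Rotate right by Index1, then return the second data (the lowest K bytes)."""
    fifth_data = rotate_right(fourth_data, index1(address, k))
    return fifth_data[:k]


# External storage holds 0x11223344 in big-endian order at address 0x7.
external = bytearray(WIDTH)
external[7:11] = (0x11223344).to_bytes(4, "big")
cache_line = bytes(reversed(external))   # the fill path byte-reverses the line
second_data = load_mismatch(cache_line, 7, 4)
print(hex(int.from_bytes(second_data, "little")))  # 0x11223344
```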
- S405: the data processing device sends the second data to the processor instruction pipeline.
- the data read from the Cache has already undergone endianness conversion, so no endianness conversion is needed after the data is read from the Cache.
- FIG. 5 is the logic block diagram of data processing when the endianness is mismatched, according to the embodiment of the present application.
- the endianness conversion is located on the write channel of the Cache, that is, the conversion is performed while the data is written from the external storage into the Cache.
- on a read, the already-converted data can therefore be read directly from the Cache and right-aligned to obtain the required data.
- the sign extension in FIG. 5 is an optional operation in the data processing method provided by the present application.
- the specific implementation may refer to the existing processing manner, which is not specifically limited in this embodiment of the present application.
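The text leaves sign extension to existing practice. As one hedged illustration (the helper name `sign_extend` is hypothetical), a right-aligned little-endian value of K bytes can be widened to 2P bytes by replicating the top bit of its most significant byte:

```python
def sign_extend(second_data: bytes, width: int) -> bytes:
    """Widen a right-aligned little-endian value to `width` bytes.

    second_data[-1] is the most significant byte, so its top bit is the sign bit.
    """
    if len(second_data) > width:
        raise ValueError("value is wider than the target width")
    fill = 0xFF if second_data[-1] & 0x80 else 0x00
    return second_data + bytes([fill]) * (width - len(second_data))


widened = sign_extend(b"\xfe\xff", 16)
print(int.from_bytes(widened, "little", signed=True))  # -2
```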
- in summary, the address information of the second data in the Cache is determined according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in the third data, and the third data is obtained by performing endianness conversion on the cacheline containing the first data; the fourth data of 2P bytes, which contains the second data, is read from the Cache according to the address information of the second data in the Cache; the fourth data is then rotated right by the first byte count to obtain the fifth data, in which the second data occupies the lowest K byte addresses of the 2P byte addresses; and the second data is sent to the processor instruction pipeline.
- because the data read from the Cache has already undergone endianness conversion, no endianness conversion is needed after the read, which avoids the load hit latency that the prior art adds through endian conversion logic after the Cache read, and thus reduces the load hit latency when the endianness is mismatched.
- optionally, before the fourth data is read from the Cache, the data processing method may further include:
- the data processing device reads the cacheline containing the first data from the external storage of the Cache, performs endianness conversion on it to obtain the third data, and writes the third data into the Cache.
- because the size of the cacheline is $\ge 2P$ and the size of the third data equals the size of the cacheline, the third data may be written into the Cache in several write operations. For example, if the size of the cacheline is 4P, it is written in two write operations; this is not specifically limited in this embodiment of the present application.
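The mismatch fill path can be sketched as follows. Assumptions, not stated in the text: the endianness conversion reverses the whole cacheline bytewise (as in the 16-byte example later in this document), and a cacheline larger than 2P is handed to the Cache as consecutive 2P-byte writes; `fill_mismatch` is a hypothetical name.

```python
def fill_mismatch(cacheline: bytes, width: int) -> list:
    """Byte-reverse a cacheline (endianness conversion on the write channel) and
    split the resulting third data into Cache writes of `width` (= 2P) bytes."""
    if len(cacheline) < width or len(cacheline) % width:
        raise ValueError("cacheline size must be a multiple of 2P and at least 2P")
    third_data = cacheline[::-1]
    return [third_data[i:i + width] for i in range(0, len(third_data), width)]


writes = fill_mismatch(bytes(range(32)), 16)   # a 4P-byte cacheline with P = 8
print(len(writes))  # 2
```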
- based on this scheme, when the endianness is mismatched, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endianness conversion. Consequently, no endianness conversion is needed after data is read from the Cache, which avoids the increase in load hit latency caused in the prior art by the endian conversion logic after the Cache read, and reduces the load hit latency when the endianness is mismatched.
- optionally, as shown in FIG. 6, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the method may further include the following steps:
- S406: the data processing device reads sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data.
- S407: the data processing device rotates the sixth data right by a second byte count to obtain seventh data.
- the seventh data includes the first data, and the first data occupies the lowest K byte addresses of the 2P byte addresses of the seventh data.
- $\mathrm{Index}_2 = \mathrm{Address}[n:0]$, where:
- $\mathrm{Index}_2$ denotes the second byte count;
- $n = \log_2 P$;
- Address[n:0] denotes the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache.
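With matching endianness, the only change on the read path is the rotation amount: the 2P-byte sixth data is rotated right by $\mathrm{Index}_2$ = Address[n:0]. A minimal sketch, again assuming 2P = 16 and using a hypothetical helper name:

```python
def load_match(sixth_data: bytes, address: int, k: int) -> bytes:
    """Rotate right by Index2 = Address[n:0], then return the lowest K bytes."""
    index2 = address & (len(sixth_data) - 1)        # Address[n:0]; width is a power of two
    seventh_data = sixth_data[index2:] + sixth_data[:index2]
    return seventh_data[:k]


line = bytearray(16)
line[7:11] = (0x11223344).to_bytes(4, "little")     # little-endian on both sides
print(hex(int.from_bytes(load_match(bytes(line), 7, 4), "little")))  # 0x11223344
```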
- S408: the data processing device sends the first data to the processor instruction pipeline.
- the data read from the Cache is the data written to the Cache from the external storage.
- the data in the Cache can be sent to the processor instruction pipeline when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
- the method may further include:
- the data processing device reads the cacheline containing the first data from the external storage of the Cache; and writes the cacheline containing the first data into the Cache.
- because the read/write width of the Cache is 2P bytes and the size of the cacheline is $\ge 2P$, the cacheline containing the first data may be written into the Cache in several write operations. For example, if the size of the cacheline is 4P, it is written in two write operations; this is not specifically limited in this embodiment of the present application.
- the data in the external storage can be written into the Cache.
- the data processing method provided by the embodiment of the present application may further include: the data processing device acquires a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes of the eighth data T.
- $T \le P$, and T is an integer.
- the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
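The mismatch store path, summarised earlier as rotating the T-byte eighth data left by $\mathrm{Index}_3 = \sim(\mathrm{Address}[n:0] + T - 1)$, can be checked the same way. In this sketch the names are hypothetical, 2P = 16 is assumed, and the byte-enable mask that limits the 2P-byte write to the T target bytes is an assumption; the text does not describe it.

```python
def store_mismatch(eighth_data: bytes, address: int, width: int):
    """Return (ninth_data, byte_enable) for a store into a byte-reversed cacheline."""
    t = len(eighth_data)
    mask = width - 1
    index3 = ~((address & mask) + t - 1) & mask                       # Index3
    padded = eighth_data + bytes(width - t)                           # right-aligned, zero-padded to 2P
    ninth_data = padded[width - index3:] + padded[:width - index3]    # rotate left by Index3
    # Assumed byte enables: the T bytes starting at offset Index3 (mod 2P).
    byte_enable = [(i - index3) % width < t for i in range(width)]
    return ninth_data, byte_enable


ninth, enable = store_mismatch((0x11223344).to_bytes(4, "little"), 7, 16)
external_view = bytes(reversed(ninth))   # undo the line reversal to see storage order
print(external_view[7:11].hex())  # 11223344
```

The value lands in the Cache in little-endian order and, seen through the line reversal, in big-endian order in the external storage, so no separate endian swap is needed on the store path either.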
- the method may further include: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, shifting the eighth data to the left
- the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
- for example, assume that the processor instruction pipeline supports only the little-endian data format, the external storage supports the big-endian data format, and the size of the cacheline is 16 bytes.
- the data processing device byte-reverses the 16-byte cacheline from the external storage and stores it in the Cache, which is equivalent to storing the 16 bytes of big-endian data in the Cache in little-endian format.
- when the processor instruction pipeline initiates a read instruction, the little-endian data can be read directly from the Cache.
- because the cacheline is reversed when it is written into the Cache, the right-rotation index used for right alignment must be inverted to compensate.
- the data to be read consists of the bytes B0, B1, B2 and B3 (B0 being the least significant byte) at addresses 0x7, 0x8, 0x9 and 0xA in the external storage. Because the data is stored in big-endian format in the external storage, the entire cacheline is byte-reversed before being written into the Cache, so that:
- B0 is written to the 0x5 address in the Cache
- B1 is written to the 0x6 address in the Cache
- B2 is written to the 0x7 address in the Cache
- B3 is written to the 0x8 address in the Cache.
- when the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9 and 0xA, the Cache outputs the cacheline data shown as mem_data_o.
- the target data B3, B2, B1 and B0 is obtained by cyclically rotating mem_data_o right by Index bytes for right alignment, where
- $\mathrm{Index} = \sim(\mathrm{Address}[n:0] + K - 1)$; here Address[n:0] = 0x7 and K = 4, so Index = ~0xA = 0x5 when evaluated over the low 4 bits (2P = 16).
- the data B3, B2, B1 and B0, which now occupies addresses 0x3, 0x2, 0x1 and 0x0 respectively, is sent to the processor instruction pipeline.
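This first example can be replayed symbolically. The sketch assumes 2P = 16 (so Address[n:0] is the low 4 bits) and, as the Cache addresses listed above imply, that B0 is the least significant byte, so the big-endian external storage holds B3 at 0x7 and B0 at 0xA; "--" marks unrelated bytes.

```python
external = ["--"] * 16
external[0x7:0xB] = ["B3", "B2", "B1", "B0"]        # big-endian: most significant byte first
cache = external[::-1]                              # whole 16-byte cacheline byte-reversed

index = ~(0x7 + 4 - 1) & 0xF                        # Index = ~(Address[n:0] + K - 1)
mem_data_o = cache                                  # the 16-byte Cache output
aligned = mem_data_o[index:] + mem_data_o[:index]   # rotate right by Index bytes
print(index, aligned[:4])  # 5 ['B0', 'B1', 'B2', 'B3']
```

B0 ends up at 0x0 and B3 at 0x3, i.e. the value is right-aligned in little-endian order without any byte swap after the Cache read.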
- in a second example, the data to be read consists of the bytes B0, B1, B2 and B3 at addresses 0xE, 0xF, 0x10 and 0x11 in the external storage, so the access crosses a cacheline boundary. Because the data is stored in big-endian format in the external storage, each entire cacheline must be byte-reversed before being written into the Cache, as shown in FIG.
- B0 is written to the 0x1E address in the Cache
- B1 is written to the 0x1F address in the Cache
- B2 is written to the 0x0 address in the Cache
- B3 is written to the 0x1 address in the Cache.
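- the data is thus stored in little-endian form in the Cache.
- when the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10 and 0x11, the Cache outputs the cacheline data shown as mem_data_o.
- the target data B3, B2, B1 and B0 is obtained by cyclically rotating mem_data_o right by Index bytes for right alignment, where
- $\mathrm{Index} = \sim(\mathrm{Address}[n:0] + K - 1)$; here Address[n:0] = 0xE and K = 4, and 0xE + 3 = 0x11, whose low 4 bits are 0x1, so Index = ~0x1 = 0xE.
- the data B3, B2, B1 and B0, which now occupies addresses 0x3, 0x2, 0x1 and 0x0 respectively, is sent to the processor instruction pipeline.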
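The cacheline-crossing case needs a model of how one 16-byte Cache output is assembled from two lines. The sketch assumes each 16-byte line is reversed independently and that byte i of mem_data_o comes from whichever touched Cache address has low 4 bits equal to i; the text does not describe this merging, so treat it as an assumption.

```python
external = ["--"] * 32                              # two 16-byte cachelines
external[0xE:0x12] = ["B3", "B2", "B1", "B0"]       # big-endian, crossing 0xF/0x10
cache = external[0x00:0x10][::-1] + external[0x10:0x20][::-1]   # reverse each line

start = 0x1E                                        # first touched Cache address (B0), wrapping after 0x1F
mem_data_o = ["--"] * 16
for addr in range(start, start + 16):
    mem_data_o[addr % 16] = cache[addr % 32]        # assumed: offset = low 4 address bits

index = ~(0xE + 4 - 1) & 0xF                        # ~0x11 over 4 bits = 0xE
aligned = mem_data_o[index:] + mem_data_o[:index]   # rotate right by Index bytes
print(hex(index), aligned[:4])  # 0xe ['B0', 'B1', 'B2', 'B3']
```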
- if the external storage instead supports the little-endian data format, the data processing device writes the 16-byte cacheline from the external storage directly into the Cache.
- when the processor instruction pipeline initiates a read instruction, the little-endian data can be read directly from the Cache.
- because the cacheline is not reversed when it is written into the Cache, the right-rotation index used for right alignment does not need to be inverted.
- the data to be read consists of the bytes B0, B1, B2 and B3 at addresses 0x7, 0x8, 0x9 and 0xA in the external storage. Because the data is stored in little-endian format in the external storage, the cacheline is written into the Cache unchanged, so that:
- B0 is written to the 0x7 address in the Cache
- B1 is written to the 0x8 address in the Cache
- B2 is written to the 0x9 address in the Cache
- B3 is written to the 0xA address in the Cache.
- the data is still stored in little-endian form in the Cache.
- when the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9 and 0xA, the Cache outputs the cacheline data shown as mem_data_o.
- the target data B3, B2, B1 and B0 is obtained by cyclically rotating mem_data_o right by Index bytes for right alignment, where
- $\mathrm{Index} = \mathrm{Address}[n:0]$; here Index = 0x7.
- the data B3, B2, B1 and B0, which now occupies addresses 0x3, 0x2, 0x1 and 0x0 respectively, is sent to the processor instruction pipeline.
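The third example reduces to a plain rotation by the address offset. A symbolic sketch, assuming 2P = 16 and B0 as the least significant byte (so little-endian storage puts B0 at 0x7):

```python
external = ["--"] * 16
external[0x7:0xB] = ["B0", "B1", "B2", "B3"]   # little-endian: least significant byte first
cache = list(external)                          # the line is written into the Cache unchanged

index = 0x7 & 0xF                               # Index = Address[n:0]
aligned = cache[index:] + cache[:index]         # rotate right by Index bytes
print(index, aligned[:4])  # 7 ['B0', 'B1', 'B2', 'B3']
```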
- in a fourth example, the data to be read consists of the bytes B0, B1, B2 and B3 at addresses 0xE, 0xF, 0x10 and 0x11 in the external storage, so the access crosses a cacheline boundary. Because the data is stored in little-endian format in the external storage, each cacheline can be written into the Cache directly, as shown in FIG. B0 is written to address 0xE in the Cache, B1 to 0xF, B2 to 0x10, and B3 to 0x11.
- when the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10 and 0x11, the Cache outputs the cacheline data shown as mem_data_o.
- the target data B3, B2, B1 and B0 is obtained by cyclically rotating mem_data_o right by Index bytes for right alignment, where
- $\mathrm{Index} = \mathrm{Address}[n:0]$; here Index = 0xE.
- the data B3, B2, B1 and B0, which now occupies addresses 0x3, 0x2, 0x1 and 0x0 respectively, is sent to the processor instruction pipeline.
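For the crossing little-endian case, the same assumed merge of two lines into one 16-byte output (byte i taken from the touched Cache address whose low 4 bits equal i) combined with a rotation by Address[n:0] gives the expected result. The merge rule is an assumption, not something the text specifies.

```python
external = ["--"] * 32
external[0xE:0x12] = ["B0", "B1", "B2", "B3"]  # little-endian, crossing 0xF/0x10
cache = list(external)                          # both lines written unchanged

mem_data_o = ["--"] * 16
for addr in range(0xE, 0xE + 16):
    mem_data_o[addr % 16] = cache[addr % 32]    # assumed: offset = low 4 address bits

index = 0xE & 0xF                               # Index = Address[n:0]
aligned = mem_data_o[index:] + mem_data_o[:index]
print(hex(index), aligned[:4])  # 0xe ['B0', 'B1', 'B2', 'B3']
```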
- the above examples assume that the processor instruction pipeline supports only the little-endian data format, with the external storage supporting either the big-endian format (first two examples) or the little-endian format (last two examples).
- the scheme applies equally when, for example, the processor instruction pipeline supports only the big-endian data format and the external storage supports the big-endian data format; this embodiment does not specifically limit this.
- the solution provided by the embodiment of the present application is mainly introduced from the perspective of the data processing method performed by the data processing apparatus.
- the above data processing apparatus includes a hardware structure and/or a software module corresponding to each function in order to implement the above functions.
- with reference to the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
- in the embodiments of the present application, the data processing device may be divided into functional modules according to the foregoing method examples.
- each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
- the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the module in the embodiment of the present application is schematic, and is only a logical function division, and the actual implementation may have another division manner.
- FIG. 11 shows a possible structural diagram of the data processing apparatus 110 involved in the above embodiment.
- the data processing apparatus 110 includes an obtaining module 1101, a determining module 1102, a reading module 1103, a shifting module 1104, and a sending module 1105.
- the obtaining module 1101 is configured to support the data processing apparatus 110 in performing step S401 shown in FIG. 4;
- the determining module 1102 is configured to support the data processing apparatus 110 in performing step S402 shown in FIG. 4;
- the reading module 1103 is configured to support the data processing apparatus 110 in performing step S403 shown in FIG. 4;
- the shifting module 1104 is configured to support the data processing apparatus 110 in performing step S404 shown in FIG. 4;
- the sending module 1105 is configured to support the data processing apparatus 110 in performing step S405 shown in FIG. 4.
- the data processing apparatus 110 may further include a format conversion module 1106 and a write module 1107.
- the reading module 1103 is further configured to: before the 2P bytes of fourth data are read from the Cache according to the address information of the second data in the Cache, read the cacheline containing the first data from the external storage of the Cache.
- the format conversion module 1106 is configured to perform big-endian/little-endian data format conversion on the cacheline containing the first data to obtain the third data, and the writing module 1107 is configured to write the third data into the Cache.
- the reading module 1103 is further configured to support the data processing apparatus 110 in performing step S406 shown in FIG. 6; the shifting module 1104 is further configured to support the data processing apparatus 110 in performing step S407 shown in FIG. 6; and the sending module 1105 is further configured to support the data processing apparatus 110 in performing step S408 shown in FIG. 6.
- the reading module 1103 is further configured to: before the 2P bytes of sixth data are read from the Cache according to the address information of the first data in the external storage, read the cacheline containing the first data from the external storage of the Cache; and the writing module 1107 is further configured to write the cacheline containing the first data into the Cache.
- the obtaining module 1101 is further configured to obtain a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer.
- the writing module 1107 is further configured to write the ninth data into the Cache.
- FIG. 13 is a schematic diagram showing a possible structure of the data processing apparatus involved in the foregoing embodiment.
- the data processing apparatus 130 includes a processing module 1301 and a communication module 1302.
- the processing module 1301 can be used to perform the operations of the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the format conversion module 1106, and the writing module 1107 in FIG. 11 or FIG. 12;
- the communication module 1302 can be used to perform the operations of the sending module 1105 in FIG. 11 or FIG. 12.
- for details, refer to the embodiments shown in FIG. 11 or FIG. 12; details are not described herein again.
- in the embodiments, the data processing apparatus is presented with its functional modules divided according to the corresponding functions, or with its functional modules divided in an integrated manner.
- a “module” herein may refer to a particular ASIC, circuitry, processor and memory that executes one or more software or firmware programs, integrated logic circuitry, and/or other devices that provide the functionality described above.
- the data processing apparatus 110 or the data processing apparatus 130 may take the form shown in FIG. 3.
- for example, the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, and the sending module 1105 in FIG. 11 may be implemented by the processor 301 and the memory 302 of FIG. 3.
- specifically, the reading module 1103, the shifting module 1104, and the sending module 1105 may be executed by the processor 301 invoking the application code stored in the memory 302; this embodiment does not impose any limitation on this.
- the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the sending module 1105, the format conversion module 1106, and the writing module 1107 in FIG. 12 may be implemented by the processor 301 and the memory 302 of FIG. 3.
- specifically, the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the sending module 1105, the format conversion module 1106, and the writing module 1107 may be executed by the processor 301 invoking the application code stored in the memory 302; the embodiments of the present application do not impose any limitation on this.
- the processing module 1301 and the communication module 1302 in FIG. 13 may be implemented by the processor 301 and the memory 302 of FIG. 3.
- specifically, the processing module 1301 and the communication module 1302 may be executed by the processor 301 invoking the application code stored in the memory 302; the embodiments of the present application do not impose any limitation on this.
- the data processing apparatus provided in the embodiments of the present application can be used to perform the foregoing data processing method; therefore, for the technical effects it can achieve, refer to the foregoing method embodiments.
- all or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
- when implemented by a software program, the embodiments may be implemented completely or partially in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- when the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated completely or partially.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
- the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
Abstract
Disclosed in the present application are a data processing method and device, for use in reducing the load hit latency when the big-endian and little-endian data formats do not match. The method comprises: obtaining a read instruction sent by a processor instruction pipeline, wherein the read instruction comprises address information of first data to be read in an external memory of a cache, the read/write width of the cache is 2P bytes, and the number of bytes of the first data, K, is less than or equal to P; when the data format supported by the external memory does not match the data format supported by the processor instruction pipeline, determining address information of second data in the cache according to the address information of the first data in the external memory, wherein the second data is the data in third data that corresponds to the first data, and the third data is obtained by converting a cacheline comprising the first data between big-endian and little-endian format; reading 2P bytes of fourth data from the cache according to the address information of the second data in the cache; rotating the fourth data right by a first number of bytes to obtain fifth data; and sending the second data in the fifth data to the processor instruction pipeline.
Description
This application claims priority to Chinese Patent Application No. 201710157711.3, filed with the Chinese Patent Office on March 16, 2017 and entitled "Data Processing Method and Device", which is incorporated herein by reference in its entirety.
The present application relates to the field of computer technology, and in particular, to a data processing method and apparatus.
Generally, the hardware system of a computer device needs to support both the big-endian and the little-endian data formats. However, to simplify the design, the processor instruction pipeline in a computer device currently often supports only one data format, for example, only the little-endian format or only the big-endian format.
Consequently, after the computer device writes data from the external memory of a cache into the cache in units of cachelines, the data read from the cache must first be matched to the right data format before the computer device sends it to the processor instruction pipeline for processing. When the data format used by the external storage is consistent with the data format supported by the processor instruction pipeline (endianness match), the data can be sent to the pipeline for processing. When the two formats are inconsistent (endianness mismatch), as shown in FIG. 1, the data read from the Cache must pass through the load hit path, namely right-shift data alignment, endian conversion, and sign extension, before it can be sent to the processor instruction pipeline for subsequent processing. In this process, the endian conversion logic introduced by the format mismatch inevitably increases the load hit latency. Because the load hit path is often the critical path of a cache design, reducing the load hit latency under an endianness mismatch is an urgent problem to be solved.
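To make the latency argument concrete, the conventional mismatch path of FIG. 1 can be sketched in Python with its three stages as separate functions. This is a minimal illustration under stated assumptions, not the circuit of FIG. 1: a cache row is modeled as a `bytes` object whose index equals the byte offset, the Cache holds data in the external storage's (big-endian) order, the pipeline is little-endian, the destination register is 64 bits wide, and all function names are hypothetical. The point is that `endian_convert` sits in series between alignment and sign extension on every load hit.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit destination register


def align_right(row: bytes, offset: int) -> bytes:
    """Stage 1: right-shift alignment, moving the byte at `offset` to offset 0."""
    return row[offset:] + bytes(offset)


def endian_convert(data: bytes) -> bytes:
    """Stage 2: big/little-endian conversion, needed only on a mismatch."""
    return data[::-1]


def sign_extend(data: bytes, signed: bool) -> int:
    """Stage 3: widen the K-byte little-endian value to the register width."""
    return int.from_bytes(data, "little", signed=signed) & MASK64


def conventional_load_hit(row: bytes, address: int, k: int, signed: bool) -> int:
    offset = address & (len(row) - 1)   # byte offset of the data inside the row
    data = align_right(row, offset)[:k]
    data = endian_convert(data)          # the extra serial stage on a mismatch
    return sign_extend(data, signed)


# Cache row holding big-endian bytes exactly as they sit in external storage.
row = bytes([0xA0, 0xA1, 0xA2, 0xA3, 0x11, 0x22, 0x33, 0x44])
word = conventional_load_hit(row, 4, 4, signed=False)  # 0x11223344
half = conventional_load_hit(row, 2, 2, signed=True)   # 0xFFFFFFFFFFFFA2A3
```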
Summary of the Invention
The embodiments of the present application provide a data processing method and apparatus, to at least reduce the load hit latency when the big-endian and little-endian data formats are inconsistent.
To achieve the foregoing objective, the embodiments of the present application provide the following technical solutions:
In a first aspect, a data processing method is provided. The method includes: obtaining a read instruction sent by a processor instruction pipeline, where the read instruction includes address information of first data to be read in an external storage of a Cache, the read/write width of the Cache is 2P bytes, the number of bytes of the first data is K ≤ P, and both K and P are positive integers; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, where the second data is the data in third data that corresponds to the first data, the third data is obtained by performing big-endian/little-endian format conversion on the cache block (cacheline) containing the first data, and the size of the cacheline is ≥ 2P; reading 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data; rotating the fourth data right by a first number of bytes to obtain fifth data, where the fifth data includes the second data, and the second data occupies the lowest K byte addresses of the 2P-byte address range corresponding to the fifth data; and sending the second data to the processor instruction pipeline. The first number of bytes is

$$\mathrm{Index1} = \mathord{\sim}\left(\mathrm{Address}[n:0] + K - 1\right), \qquad n = \log_2 P$$

where ~ denotes bitwise negation and Address[n:0] is the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache. That is, in the embodiments of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data read from the Cache has already undergone big-endian/little-endian format conversion. No format conversion is therefore needed after the data is read from the Cache. This avoids the prior-art problem that the conversion logic, introduced on the read path by the format mismatch, increases the load hit latency, and so reduces the load hit latency when the big-endian and little-endian formats are inconsistent.
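The read path of the first aspect can be sketched in Python as follows. The assumptions are not stated in the application: a 2P-byte row is a `bytes` object whose index equals the byte offset; "rotate right" moves bytes toward lower offsets; the pipeline is little-endian and the external storage big-endian; the fill-time endian conversion is modeled as reversing the 2P-byte row; Index1 is evaluated on n + 1 bits; and the K-byte access does not cross a 2P-byte boundary. All function names are hypothetical.

```python
def rotate_right(row: bytes, count: int) -> bytes:
    """Rotate a 2P-byte row right: the byte at offset `count` moves to offset 0."""
    count %= len(row)
    return row[count:] + row[:count]


def mismatch_load_hit(cache_row: bytes, address: int, k: int) -> bytes:
    """Return the second data from a row that was endian-converted at fill time."""
    two_p = len(cache_row)
    mask = two_p - 1                         # keeps Address[n:0], i.e. n + 1 = log2(2P) bits
    offset = address & mask                  # Address[n:0]
    index1 = ~(offset + k - 1) & mask        # Index1 = ~(Address[n:0] + K - 1), on n + 1 bits
    fifth = rotate_right(cache_row, index1)  # second data now sits in the low K bytes
    return fifth[:k]


# External storage row (big-endian) and the same row after conversion at fill time.
external_row = bytes([0xA0, 0xA1, 0xA2, 0xA3, 0x11, 0x22, 0x33, 0x44])
cache_row = external_row[::-1]   # assumed model of the endian conversion: reverse the row

word = mismatch_load_hit(cache_row, 4, 4)
half = mismatch_load_hit(cache_row, 2, 2)
```

Because the conversion was applied when the line was filled, the low K bytes already come out in the pipeline's byte order: `int.from_bytes(word, "little")` gives 0x11223344, the value stored big-endian in external storage. Compared with the conventional path, the only byte-steering step left after the read is one rotation.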
In a possible design, before the 2P bytes of fourth data are read from the Cache according to the address information of the second data in the Cache, the method further includes: reading the cacheline containing the first data from the external storage of the Cache; performing big-endian/little-endian format conversion on the cacheline containing the first data to obtain the third data; and writing the third data into the Cache. With this design, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, data from the external storage is written into the Cache already in converted form. Consequently, no format conversion is needed after data is read from the Cache, which avoids the prior-art increase in load hit latency caused by the conversion logic on the read path and reduces the load hit latency when the big-endian and little-endian formats are inconsistent.
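The fill step of this design can be sketched in Python. The application does not specify the granularity of the cacheline conversion here, so this sketch assumes that each aligned 2P-byte unit of the cacheline is reversed; a whole-line reversal would give the same in-row offsets when the line size is a multiple of 2P, but would mirror the order of the rows. The dictionary standing in for the Cache and all names are hypothetical.

```python
def convert_cacheline(line: bytes, two_p: int) -> bytes:
    """Assumed endian conversion: reverse each aligned 2P-byte unit of the cacheline."""
    if len(line) < two_p or len(line) % two_p:
        raise ValueError("cacheline size must be a multiple of 2P and at least 2P")
    out = bytearray()
    for base in range(0, len(line), two_p):
        out += line[base:base + two_p][::-1]
    return bytes(out)


def fill_on_mismatch(cache: dict, external: bytes, line_addr: int,
                     line_size: int, two_p: int) -> None:
    """Read the cacheline from external storage, convert it, and write the third data."""
    line = external[line_addr:line_addr + line_size]
    cache[line_addr] = convert_cacheline(line, two_p)


external = bytes(range(32))   # toy external storage
cache = {}                    # toy Cache: line address -> stored bytes
fill_on_mismatch(cache, external, line_addr=16, line_size=16, two_p=8)
```

In this model the conversion is its own inverse, so applying `convert_cacheline` twice returns the original line.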
In a possible design, after the read instruction sent by the processor instruction pipeline is obtained, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading 2P bytes of sixth data from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data; rotating the sixth data right by a second number of bytes to obtain seventh data, where the seventh data includes the first data, and the first data occupies the lowest K byte addresses of the 2P-byte address range corresponding to the seventh data; and sending the first data to the processor instruction pipeline. The second number of bytes is

$$\mathrm{Index2} = \mathrm{Address}[n:0], \qquad n = \log_2 P$$

where Address[n:0] is the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache. With this design, data in the Cache can be sent to the processor instruction pipeline when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
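For the matching case, the same rotator is used with Index2 = Address[n:0]. A minimal Python sketch, under the same hypothetical row model as before (index equals byte offset, rotate right moves bytes toward lower offsets, no 2P-boundary crossing):

```python
def rotate_right(row: bytes, count: int) -> bytes:
    """Rotate a 2P-byte row right: the byte at offset `count` moves to offset 0."""
    count %= len(row)
    return row[count:] + row[:count]


def match_load_hit(cache_row: bytes, address: int, k: int) -> bytes:
    mask = len(cache_row) - 1
    index2 = address & mask                    # Index2 = Address[n:0]
    seventh = rotate_right(cache_row, index2)  # first data moves to the low K bytes
    return seventh[:k]


cache_row = bytes(range(0x10, 0x18))       # unconverted row copied from external storage
half = match_load_hit(cache_row, 0x1C, 2)  # 0x1C & 7 == 4, so bytes 0x14, 0x15
```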
In a possible design, before the 2P bytes of sixth data are read from the Cache according to the address information of the first data in the external storage, the method further includes: reading the cacheline containing the first data from the external storage of the Cache, and writing the cacheline containing the first data into the Cache. With this design, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, data in the external storage can be written into the Cache.
In a possible design, the method further includes: obtaining a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a third number of bytes to obtain 2P bytes of ninth data, where the third number of bytes is

$$\mathrm{Index3} = \mathord{\sim}\left(\mathrm{Address}[n:0] + T - 1\right)$$

and writing the ninth data into the Cache. With this design, data from the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
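The mismatch store can be sketched in Python as the mirror image of the mismatch load. Assumptions not stated in the application: the eighth data arrives in the low T bytes in the pipeline's (little-endian) order, Index3 is evaluated on n + 1 bits, the store does not cross a 2P-byte boundary, and only the T target bytes of the row are updated. The application only says the ninth data is written into the Cache, so `merge_into_row` is a hypothetical byte-enable merge.

```python
def rotate_left(row: bytes, count: int) -> bytes:
    """Rotate a row left: the byte at offset 0 moves to offset `count`."""
    count %= len(row)
    return row[len(row) - count:] + row[:len(row) - count]


def mismatch_store(eighth: bytes, address: int, two_p: int):
    t = len(eighth)
    mask = two_p - 1
    index3 = ~((address & mask) + t - 1) & mask   # Index3 = ~(Address[n:0] + T - 1)
    padded = eighth + bytes(two_p - t)            # eighth data in the low T bytes
    return rotate_left(padded, index3), index3    # ninth data (2P bytes)


def merge_into_row(cache_row: bytes, ninth: bytes, start: int, t: int) -> bytes:
    """Hypothetical byte-enable merge: update only the T bytes starting at `start`."""
    out = bytearray(cache_row)
    out[start:start + t] = ninth[start:start + t]
    return bytes(out)


# Converted row (big-endian source reversed at fill); store 0xBEEF at address 2.
cache_row = bytes([0x44, 0x33, 0x22, 0x11, 0xA3, 0xA2, 0xA1, 0xA0])
ninth, index3 = mismatch_store(bytes([0xEF, 0xBE]), address=2, two_p=8)
updated = merge_into_row(cache_row, ninth, index3, 2)
```

Reversing `updated` back to the external storage's order puts the bytes 0xBE, 0xEF at offsets 2 and 3, which is 0xBEEF in big-endian form.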
In a possible design, after the write instruction sent by the processor instruction pipeline is obtained, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a fourth number of bytes to obtain 2P bytes of tenth data, where the fourth number of bytes is

$$\mathrm{Index4} = \mathrm{Address}[n:0]$$

and writing the tenth data into the Cache. With this design, data from the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
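The matching store differs from the mismatch store only in the rotation amount. A minimal Python sketch under the same hypothetical row model (eighth data in the low T bytes, rotate left moves bytes toward higher offsets):

```python
def rotate_left(row: bytes, count: int) -> bytes:
    """Rotate a row left: the byte at offset 0 moves to offset `count`."""
    count %= len(row)
    return row[len(row) - count:] + row[:len(row) - count]


def match_store(eighth: bytes, address: int, two_p: int):
    index4 = address & (two_p - 1)                # Index4 = Address[n:0]
    padded = eighth + bytes(two_p - len(eighth))  # eighth data in the low T bytes
    return rotate_left(padded, index4), index4    # tenth data (2P bytes)


tenth, index4 = match_store(bytes([0xEF, 0xBE]), address=0x1A, two_p=8)
```

In this sketch both store paths share one left rotator and both load paths share one right rotator; only the index computation depends on whether the formats match.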
In a second aspect, an embodiment of the present application provides a data processing apparatus that has the function of implementing the behavior of the data processing apparatus in the foregoing method embodiments. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing function.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including a processor, a memory, a bus, and a communication interface. The memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus. When the data processing apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data processing apparatus performs the data processing method according to any design of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, configured to store computer software instructions used by the foregoing data processing apparatus. When the instructions run on a computer, the computer can perform the data processing method according to any design of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on a computer, the computer can perform the data processing method according to any design of the first aspect.
For the technical effects of any design of the second to fifth aspects, refer to the technical effects of the corresponding designs of the first aspect; details are not described herein again.
FIG. 1 is a logic block diagram of data processing in the prior art when the big-endian and little-endian data formats are inconsistent;
FIG. 2 is a schematic architecture diagram of a multi-core system with hierarchical storage to which the embodiments of the present application apply;
FIG. 3 is a schematic diagram of the hardware structure of a data processing apparatus according to an embodiment of the present application;
FIG. 4 is a first schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 5 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent according to an embodiment of the present application;
FIG. 6 is a second schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 7 is a first schematic diagram of an example of a data processing method according to an embodiment of the present application;
FIG. 8 is a second schematic diagram of an example of a data processing method according to an embodiment of the present application;
FIG. 9 is a third schematic diagram of an example of a data processing method according to an embodiment of the present application;
FIG. 10 is a fourth schematic diagram of an example of a data processing method according to an embodiment of the present application;
FIG. 11 is a first schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 12 is a second schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 13 is a third schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. In the descriptions of the present application, unless otherwise stated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: only A exists, both A and B exist, and only B exists. In addition, in the descriptions of the present application, "a plurality of" means two or more.
FIG. 2 is a schematic architecture diagram of a multi-core system with hierarchical storage to which the embodiments of the present application apply. As shown in FIG. 2, the multi-core system 100 includes a bus 101, a multi-core processor 102 connected to the bus 101, and a memory 103 connected to the bus 101.
The memory 103 may specifically be a random access memory (RAM), a dynamic random access memory (DRAM), or the like; this is not specifically limited in the embodiments of the present application.
The bus 101 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is drawn as a single thick line in FIG. 2, but this does not mean that there is only one bus or only one type of bus.
The multi-core processor 102 includes a plurality of processor cores, such as a processor core 102a, a processor core 102b, ..., and a processor core 102c. These processor cores may specifically be central processing unit (CPU) cores or graphics processing unit (GPU) cores; this is not specifically limited in the embodiments of the present application. The processor cores are mainly used to perform computation. Each processor core has its own level 1 cache (L1C) and level 2 cache (L2C); a plurality of processor cores share one last level cache (LLC); and a plurality of multi-core processors share one memory. When a processor core receives an instruction to read data, it first checks whether the address exists in the L1C. If it does, the processor core reads the data directly from the L1C; if it does not, the processor core continues to search the L2C, and so on.
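The lookup order described above (L1C, then L2C, then LLC, then memory) can be sketched as a toy Python model. It is only an illustration: each level is a dictionary keyed by address, and fills, replacement, and coherence between cores are deliberately not modeled. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def hierarchical_read(levels: List[Tuple[str, Dict[int, bytes]]],
                      memory: Dict[int, bytes], address: int) -> Tuple[str, bytes]:
    """Probe L1C, then L2C, then LLC; fall back to memory if every level misses."""
    for name, cache in levels:
        if address in cache:
            return name, cache[address]   # hit: data comes from the first level holding it
    return "memory", memory[address]


l1c: Dict[int, bytes] = {}
l2c = {0x40: b"L2 copy"}
llc = {0x40: b"LLC copy", 0x80: b"LLC only"}
memory = {0x40: b"mem", 0x80: b"mem", 0xC0: b"mem only"}
levels = [("L1C", l1c), ("L2C", l2c), ("LLC", llc)]
```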
Certainly, the embodiments of the present application are also applicable to a single-core system with a similar hierarchical storage structure or to another system containing a Cache; this is not specifically limited in the embodiments of the present application.
FIG. 3 is a schematic diagram of the hardware structure of a data processing apparatus 30 according to an embodiment of the present application. The data processing apparatus 30 includes a processor 301, a memory 302, a communication interface 304, and a bus 303. The processor 301, the communication interface 304, and the memory 302 are connected to one another through the bus 303.
The processor 301 is the control center of the data processing apparatus 30 and connects all parts of the apparatus through the bus 303. By running or executing the software programs and/or modules stored in the memory 302 and invoking the data stored in the memory 302, it performs the various functions of the data processing apparatus 30 and processes data, thereby monitoring the data processing apparatus 30 as a whole.
Optionally, the processor 301 may be any one of the processor cores in FIG. 2.
The memory 302 may be configured to store software programs and modules. The processor 301 runs the software programs and modules stored in the memory 302 to perform the various functional applications and data processing of the data processing apparatus 30. The memory 302 mainly includes a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created during use of the data processing apparatus 30, and the like. In addition, the memory 302 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
Optionally, the memory 302 may be the memory in FIG. 2.
The bus 303 may be a PCI bus, an EISA bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is drawn as a single thick line in FIG. 3, but this does not mean that there is only one bus or only one type of bus.
可选的,该总线303可以是上述图2中的总线。Alternatively, the bus 303 may be the bus of FIG. 2 described above.
通信接口304用于数据处理装置30与外部设备的通信。 Communication interface 304 is used for communication of data processing device 30 with external devices.
尽管未示出,数据处理装置30还可能包括射频(英文:Radio Frequency,缩写:RF)电路、音频电路、通信接口和/或多种传感器,本申请实施例对此不作具体限定。Although not shown, the data processing device 30 may also include a radio frequency (English: Radio Frequency, abbreviated as RF) circuit, an audio circuit, a communication interface, and/or a plurality of sensors, which are not specifically limited in this embodiment of the present application.
其中,当处理器301为上述图2中的任意一个处理器核,存储器302为上述图2中的存储器,总线303为上述图2中的总线时,本申请实施例提供的数据处理装置30可以是上述图2中的多核系统,本申请实施例对该情况不作具体限定。The data processing device 30 provided by the embodiment of the present application may be used when the processor 301 is the processor core of the foregoing FIG. 2, the memory 302 is the memory in FIG. 2, and the bus 303 is the bus in FIG. It is the multi-core system in FIG. 2 above, and the embodiment of the present application does not specifically limit the situation.
FIG. 4 is a schematic flowchart of a data processing method according to an embodiment of this application. The method includes the following steps.
S401. The data processing device obtains a read instruction sent by the processor instruction pipeline. The read instruction includes the address information, in the external storage of the Cache, of first data to be read.
The read/write width of the Cache is 2P bytes. The first data is K bytes long, where K ≤ P and K and P are both positive integers.
In the embodiments of this application, a Cache read/write width of 2P bytes means that each read from the Cache returns 2P bytes of data and each write into the Cache stores 2P bytes of data.
When the data processing device in the embodiments of this application is the multi-core system in FIG. 2, the Cache in step S401 may be the L1C, and the external storage of the Cache may be the L2C, the LLC, the memory, or the like. This is not specifically limited in the embodiments of this application.
S402. When the data format supported by the external storage differs from the data format supported by the processor instruction pipeline, the data processing device determines the address information of second data in the Cache, based on the address information of the first data in the external storage. The second data is the portion of third data that corresponds to the first data. The third data is obtained by performing big-endian/little-endian (endianness) conversion on the cache line (cacheline) that contains the first data. The cacheline size is at least 2P bytes.
S403. The data processing device reads 2P bytes of fourth data from the Cache, based on the address information of the second data in the Cache. The fourth data contains the second data.
S404. The data processing device cyclically rotates the fourth data right by a first byte count to obtain fifth data. The fifth data contains the second data, which occupies the lowest K byte addresses of the 2P-byte address range of the fifth data.
Here, $\mathrm{Index1} = \mathord{\sim}(\mathrm{Address}[n:0] + K - 1)$, where:
- Index1 denotes the first byte count;
- ~ denotes bitwise inversion;
- $n = \log_2 P$;
- Address[n:0] is the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache.

Because $2^{n+1} = 2P$, the sum and its inversion are both taken over these n+1 bits. Index1 therefore always lies in the range 0 to 2P - 1.
Note that in the embodiments of this application, the data is cyclically rotated right in order to right-align it. Right alignment is required because, when the processor instruction pipeline reads data, the rightmost byte is the byte corresponding to the start address of the read instruction. The data must therefore be right-aligned before it is sent to the processor instruction pipeline. This explanation applies throughout and is not repeated below.
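The read path of steps S403 to S405 can be sketched as a small software model. This is not the hardware design. The sketch makes two assumptions:
- A 2P-byte Cache read is represented as a Python `bytes` object whose index 0 is the lowest byte lane, i.e. the rightmost byte in the figures.
- The Cache already holds byte-reversed (converted) lines.

The helper names `index1`, `rotate_right`, and `read_mismatched` are hypothetical.

```python
def index1(address: int, k: int, p: int) -> int:
    """First byte count for step S404: ~(Address[n:0] + K - 1) on the low n+1 bits."""
    mask = 2 * p - 1                    # 2P is a power of two, so this keeps n+1 bits
    return ~((address & mask) + k - 1) & mask


def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes: lane i moves to lane (i - amount) mod len."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def read_mismatched(fourth_data: bytes, address: int, k: int, p: int) -> bytes:
    """Steps S404-S405 when external storage and pipeline endianness differ.

    fourth_data is the 2P-byte word read from the Cache in step S403.
    """
    if len(fourth_data) != 2 * p or not 1 <= k <= p:
        raise ValueError("expected a 2P-byte read and 1 <= K <= P")
    fifth_data = rotate_right(fourth_data, index1(address, k, p))
    return fifth_data[:k]               # second data: the lowest K byte lanes
```

With 2P = 16, a 4-byte load from external address 0x7 gives `index1(7, 4, 8)`, which evaluates to 5.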
S405. The data processing device sends the second data to the processor instruction pipeline.
That is, in the embodiments of this application, when the data format supported by the external storage differs from the data format supported by the processor instruction pipeline, the data read from the Cache has already undergone endianness conversion. No further endianness conversion is therefore needed after the data is read from the Cache.
FIG. 5 is a logic block diagram of data processing when the endianness formats differ, according to an embodiment of this application. As FIG. 5 shows, when the formats differ, the endianness conversion is placed in the write channel of the Cache. In other words, the conversion is performed while data is being written from the external storage into the Cache. Consequently, when data is read from the Cache under such a mismatch, the already-converted data can be read directly from the Cache and right-aligned to obtain the required data.
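The claim behind FIG. 5 is that converting on the Cache write channel makes read-side conversion unnecessary. This can be checked with a small model under the following assumptions:
- A single 16-byte cacheline equals the 2P read width.
- Loads do not cross the line.
- As a stand-in for the prior-art read-side conversion, the K loaded bytes are byte-swapped after a plain right alignment. The source does not detail the prior-art logic.

All function names are hypothetical.

```python
def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes: lane i moves to lane (i - amount) mod len."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def load_convert_after_read(line: bytes, address: int, k: int) -> bytes:
    """Assumed conventional path: right-align the raw line, then byte-swap K bytes."""
    aligned = rotate_right(line, address % len(line))
    return aligned[:k][::-1]            # conversion sits after the Cache read


def load_convert_on_fill(line: bytes, address: int, k: int) -> bytes:
    """FIG. 5 path: the line is byte-reversed on fill; the read only rotates by Index1."""
    mask = len(line) - 1
    cached = line[::-1]                 # endianness conversion in the write channel
    index1 = ~((address & mask) + k - 1) & mask
    return rotate_right(cached, index1)[:k]


def paths_agree(line: bytes) -> bool:
    """Compare both paths for every in-line load of 1 <= K <= P bytes."""
    p = len(line) // 2
    return all(
        load_convert_after_read(line, a, k) == load_convert_on_fill(line, a, k)
        for k in range(1, p + 1)
        for a in range(len(line) - k + 1)
    )
```

Both functions return the same K bytes. The difference is only where the byte reversal happens, which is the latency argument the text makes.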
Note that the sign extension shown in FIG. 5 is an optional operation in the data processing method provided in this application. It may be implemented with existing approaches and is not specifically limited in the embodiments of this application.
The data processing method provided in the embodiments of this application works as follows when the data format supported by the external storage differs from that supported by the processor instruction pipeline. First, the address information of the second data in the Cache is determined from the address information of the first data in the external storage. The second data is the portion of the third data that corresponds to the first data, and the third data is obtained by performing endianness conversion on the cacheline containing the first data. Next, based on the address information of the second data in the Cache, 2P bytes of fourth data containing the second data are read from the Cache. The fourth data is then cyclically rotated right by the first byte count to obtain fifth data containing the second data. Finally, the second data, which occupies the lowest K byte addresses of the 2P-byte address range of the fifth data, is sent to the processor instruction pipeline. In other words, when the formats differ, the data read from the Cache has already been converted, so no endianness conversion is needed after the read. In the prior art, endianness conversion logic placed after the Cache read increases the load-hit latency when the formats differ. This method avoids that increase and thus reduces the load-hit latency in that case.
Further, before the data processing device reads the 2P bytes of fourth data from the Cache based on the address information of the second data in the Cache (step S403), the method may further include the following:
The data processing device reads the cacheline containing the first data from the external storage of the Cache, performs endianness conversion on that cacheline to obtain the third data, and writes the third data into the Cache.
Note that in the embodiments of this application, the Cache read/write width is 2P bytes, the cacheline size is at least 2P bytes, and the third data is one cacheline in size. The third data may therefore be written into the Cache over several write operations. For example, if the cacheline size is 4P bytes, the third data is written in two write operations. This is not specifically limited in the embodiments of this application.
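The fill step can be modeled as splitting one cacheline into 2P-byte write operations. The sketch makes these assumptions:
- The cacheline size is a whole multiple of the 2P-byte write width.
- Write i goes to the i-th 2P-byte slot of the line.
- When the formats differ, the whole line is byte-reversed first, as the text describes.

Passing `mismatched=False` models the unconverted fill used before step S406. The function name `fill_beats` is hypothetical.

```python
def fill_beats(cacheline: bytes, p: int, mismatched: bool) -> list:
    """Split one cacheline into the 2P-byte write operations that install it.

    With an endianness mismatch the whole line is byte-reversed first
    (producing the third data); otherwise it is written unchanged.
    """
    width = 2 * p
    if len(cacheline) < width or len(cacheline) % width:
        raise ValueError("this sketch assumes the line size is a multiple of 2P")
    data = cacheline[::-1] if mismatched else cacheline
    return [data[i:i + width] for i in range(0, len(data), width)]
```

For a 4P-byte line this yields two writes, matching the example in the text.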
Based on this solution, when the data format supported by the external storage differs from that supported by the processor instruction pipeline, data from the external storage can be written into the Cache in endianness-converted form. No endianness conversion is then needed after data is read from the Cache. This avoids the load-hit latency increase that the prior art incurs from conversion logic placed after the Cache read, and reduces the load-hit latency when the formats differ.
Optionally, as shown in FIG. 6, after the data processing device obtains the read instruction sent by the processor instruction pipeline (step S401), the method may further include the following steps.
S406. When the data format supported by the external storage is the same as the data format supported by the processor instruction pipeline, the data processing device reads 2P bytes of sixth data from the Cache, based on the address information of the first data in the external storage. The sixth data contains the first data.
S407. The data processing device cyclically rotates the sixth data right by a second byte count to obtain seventh data. The seventh data contains the first data, which occupies the lowest K byte addresses of the 2P-byte address range of the seventh data.
Here, $\mathrm{Index2} = \mathrm{Address}[n:0]$, where:
- Index2 denotes the second byte count;
- $n = \log_2 P$;
- Address[n:0] is the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache.
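When both sides use the same endianness, the read path differs from the mismatched case only in the rotation amount. The index is the raw low address bits, with no inversion. The sketch below uses the same assumed lane convention as a plain software model: byte 0 is the lowest lane. The name `read_matched` is hypothetical.

```python
def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes: lane i moves to lane (i - amount) mod len."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def read_matched(sixth_data: bytes, address: int, k: int, p: int) -> bytes:
    """Steps S407-S408 when external storage and pipeline share one endianness."""
    if len(sixth_data) != 2 * p or not 1 <= k <= p:
        raise ValueError("expected a 2P-byte read and 1 <= K <= P")
    index2 = address & (2 * p - 1)      # Index2 = Address[n:0], no inversion
    seventh_data = rotate_right(sixth_data, index2)
    return seventh_data[:k]             # first data: the lowest K byte lanes
```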
S408. The data processing device sends the first data to the processor instruction pipeline.
That is, in the embodiments of this application, when the data format supported by the external storage is the same as that supported by the processor instruction pipeline, no endianness conversion is needed. The data read from the Cache is therefore exactly the data that was written into the Cache from the external storage.
Based on this solution, when the data format supported by the external storage is the same as that supported by the processor instruction pipeline, the data in the Cache can be sent to the processor instruction pipeline.
Further, before the data processing device reads the 2P bytes of sixth data from the Cache based on the address information of the first data in the external storage (step S406), the method may further include the following:
The data processing device reads the cacheline containing the first data from the external storage of the Cache and writes that cacheline into the Cache.
Note that in the embodiments of this application, the Cache read/write width is 2P bytes and the cacheline size is at least 2P bytes. The cacheline containing the first data may therefore be written into the Cache over several write operations. For example, if the cacheline size is 4P bytes, it is written in two write operations. This is not specifically limited in the embodiments of this application.
Based on this solution, when the data format supported by the external storage is the same as that supported by the processor instruction pipeline, the data in the external storage can be written into the Cache.
Optionally, the data processing method provided in the embodiments of this application may further include the following steps:
- The data processing device obtains a write instruction sent by the processor instruction pipeline. The write instruction includes eighth data to be written and the byte count T of the eighth data, where T ≤ P and T is an integer.
- When the data format supported by the external storage differs from that supported by the processor instruction pipeline, the data processing device cyclically rotates the eighth data left by a third byte count to obtain 2P bytes of ninth data. Here Index3 = ~(Address[n:0] + T - 1), and Index3 denotes the third byte count.
- The data processing device writes the ninth data into the Cache.
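The store path under an endianness mismatch is the inverse of the read path: the T store bytes start in the lowest lanes and are rotated left by Index3. This is a software sketch only. It assumes the same lane convention as the read sketches (byte 0 is the lowest lane). It does not model which of the 2P lanes are actually updated (byte enables), a detail the source does not describe. The names are hypothetical.

```python
def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation by whole bytes: lane i moves to lane (i + amount) mod len."""
    amount %= len(data)
    return data[len(data) - amount:] + data[:len(data) - amount]


def write_mismatched(eighth_data: bytes, address: int, p: int) -> bytes:
    """Build the 2P-byte ninth data for a T-byte store under an endianness mismatch."""
    width, t = 2 * p, len(eighth_data)
    if not 1 <= t <= p:
        raise ValueError("expected 1 <= T <= P")
    mask = width - 1
    index3 = ~((address & mask) + t - 1) & mask   # Index3 = ~(Address[n:0] + T - 1)
    padded = eighth_data + bytes(width - t)       # store bytes start in the lowest lanes
    return rotate_left(padded, index3)
```

A 4-byte store to external address 0x7 lands in lanes 5 to 8. This matches the Cache layout of the big-endian example for FIG. 7.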
Based on this solution, when the data format supported by the external storage differs from that supported by the processor instruction pipeline, data from the processor instruction pipeline can be written into the Cache.
Optionally, after the data processing device obtains the write instruction sent by the processor instruction pipeline, the method may further include the following steps:
- When the data format supported by the external storage is the same as that supported by the processor instruction pipeline, the eighth data is cyclically rotated left by a fourth byte count to obtain 2P bytes of tenth data. Here Index4 = Address[n:0], and Index4 denotes the fourth byte count.
- The tenth data is written into the Cache.
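With matching endianness, the store rotation uses the raw low address bits. The following minimal sketch assumes the same lane convention as above and does not model byte enables. `write_matched` is a hypothetical name.

```python
def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation by whole bytes: lane i moves to lane (i + amount) mod len."""
    amount %= len(data)
    return data[len(data) - amount:] + data[:len(data) - amount]


def write_matched(eighth_data: bytes, address: int, p: int) -> bytes:
    """Build the 2P-byte tenth data for a T-byte store with matching endianness."""
    width, t = 2 * p, len(eighth_data)
    if not 1 <= t <= p:
        raise ValueError("expected 1 <= T <= P")
    index4 = address & (width - 1)                # Index4 = Address[n:0]
    return rotate_left(eighth_data + bytes(width - t), index4)
```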
Based on this solution, when the data format supported by the external storage is the same as that supported by the processor instruction pipeline, data from the processor instruction pipeline can be written into the Cache.
The data processing method provided in the embodiments of this application is further described below with reference to a specific example.
For example, assume the following:
- 2P = 16 bytes, so n = 3 and the rotation indexes use the low 4 address bits.
- The processor instruction pipeline supports only the little-endian data format.
- The external storage supports both the big-endian and the little-endian data formats.
- The cacheline size is 16 bytes.

Then:
When the external storage operates in the big-endian data format:
The data processing device byte-reverses each 16-byte cacheline from the external storage and stores it in the Cache. This is equivalent to storing 16 bytes of big-endian data in the Cache in little-endian format. When the processor instruction pipeline issues a read instruction, little-endian data can be read directly from the Cache. Because the cacheline was reversed when it was written into the Cache, the rotate-right Index used for right alignment must be inverted accordingly to compensate.
For example, in FIG. 7, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0x7, 0x8, 0x9, and 0xA of the external storage. Because the external storage holds data in big-endian mode, the entire cacheline is byte-reversed before it is written into the Cache, as shown in FIG. 7. B0 is written to address 0x5 in the Cache, B1 to address 0x6, B2 to address 0x7, and B3 to address 0x8. The data is now stored in the Cache in little-endian form. When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data, shown as mem_data_o. The target data B3, B2, B1, and B0 is then obtained simply by rotating right by the appropriate right-alignment Index, where Index = ~(Address[n:0] + K - 1). In this example, Address[n:0] = 7 and K = 4, so Index = ~(7 + 4 - 1) = ~10. On the low 4 bits, ~0b1010 = 0b0101 = 5, so the data is rotated right by 5 bytes, as shown in FIG. 7. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
Correspondingly, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it is rotated left by 5 bytes to obtain the Cache contents shown in FIG. 7. Details are not repeated here.
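The FIG. 7 walk-through can be replayed with a short model. The model assumes that the big-endian word occupies external addresses 0x7 to 0xA with B3 (the most significant byte) at 0x7 and B0 at 0xA. This is consistent with the Cache addresses 0x5 to 0x8 given in the text. The byte values 0xB0 to 0xB3 are placeholders.

```python
B0, B1, B2, B3 = 0xB0, 0xB1, 0xB2, 0xB3       # placeholder byte values


def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes (lane i moves to lane i - amount)."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation, expressed as a right rotation by -amount."""
    return rotate_right(data, -amount)


# Big-endian external line (assumed layout): B3 at 0x7 ... B0 at 0xA.
external = bytearray(16)
external[0x7:0xB] = bytes([B3, B2, B1, B0])
cache_line = bytes(external[::-1])             # byte-reversed on fill

address, k = 0x7, 4
index = ~((address & 0xF) + k - 1) & 0xF       # ~(7 + 4 - 1) = ~0b1010 -> 0b0101 = 5
to_pipeline = rotate_right(cache_line, index)[:k]

# Store path: rotating the pipeline bytes left by the same Index restores the lanes.
store_word = rotate_left(to_pipeline + bytes(16 - k), index)
```

`to_pipeline` holds B0 in its lowest lane, so read as little-endian it is the word B3 B2 B1 B0.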
Alternatively, in FIG. 8, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0xE, 0xF, 0x10, and 0x11 of the external storage, which spans two cachelines. Because the external storage holds data in big-endian mode, each entire cacheline is byte-reversed before it is written into the Cache, as shown in FIG. 8. B0 is written to address 0x1E in the Cache, B1 to address 0x1F, B2 to address 0x0, and B3 to address 0x1. The data is now stored in the Cache in little-endian form. When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data, shown as mem_data_o. The target data B3, B2, B1, and B0 is then obtained simply by rotating right by the appropriate right-alignment Index, where Index = ~(Address[n:0] + K - 1). In this example, Address[n:0] = 14 and K = 4, so Index = ~(14 + 4 - 1) = ~17. On the low 4 bits, 17 is 0b0001 and ~0b0001 = 0b1110 = 14, so the data is rotated right by 14 bytes, as shown in FIG. 8. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
Correspondingly, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it is rotated left by 14 bytes to obtain the Cache contents shown in FIG. 8. Details are not repeated here.
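The cross-cacheline case of FIG. 8 can be modeled the same way, with extra assumptions:
- B3 (most significant) sits at external address 0xE and B0 at 0x11. This reproduces the Cache addresses 0x1E, 0x1F, 0x0, and 0x1 stated in the text.
- Byte lane i of the 16-byte output mem_data_o carries the requested byte whose Cache address satisfies address % 16 == i.
- Unused lanes are left at zero.

The source does not spell out how the two lines are merged into one read, so the lane model in particular is an assumption. The helper `cache_addr` is hypothetical.

```python
B0, B1, B2, B3 = 0xB0, 0xB1, 0xB2, 0xB3       # placeholder byte values


def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes (lane i moves to lane i - amount)."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation, expressed as a right rotation by -amount."""
    return rotate_right(data, -amount)


def cache_addr(ext_addr: int) -> int:
    """Cache address of an external byte after its 16-byte line is reversed on fill."""
    base = ext_addr & ~0xF
    return base + 0xF - (ext_addr - base)


# Big-endian data spanning two lines (assumed layout): B3 at 0xE ... B0 at 0x11.
external = bytearray(32)
external[0xE:0x12] = bytes([B3, B2, B1, B0])
cache = bytes(external[0:16][::-1] + external[16:32][::-1])   # per-line reversal

# Assumed mem_data_o model: each requested byte appears in lane (cache address % 16).
mem_data_o = bytearray(16)
for ext in range(0xE, 0x12):
    c = cache_addr(ext)
    mem_data_o[c & 0xF] = cache[c]

address, k = 0xE, 4
index = ~((address & 0xF) + k - 1) & 0xF      # ~(14 + 4 - 1) = ~17 -> 14 on 4 bits
to_pipeline = rotate_right(bytes(mem_data_o), index)[:k]
store_word = rotate_left(to_pipeline + bytes(16 - k), index)
```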
When the external storage operates in the little-endian data format:
The data processing device writes each 16-byte cacheline from the external storage directly into the Cache. When the processor instruction pipeline issues a read instruction, little-endian data can be read directly from the Cache. Because the cacheline was not reversed when it was written into the Cache, the rotate-right Index used for right alignment needs no inversion compensation.
For example, in FIG. 9, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0x7, 0x8, 0x9, and 0xA of the external storage. Because the external storage holds data in little-endian mode, the entire cacheline can be written directly into the Cache, as shown in FIG. 9. B0 is written to address 0x7 in the Cache, B1 to address 0x8, B2 to address 0x9, and B3 to address 0xA. The data remains stored in the Cache in little-endian form. When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data, shown as mem_data_o. The target data B3, B2, B1, and B0 is then obtained simply by rotating right by the appropriate right-alignment Index, where Index = Address[n:0]. In this example, Address[n:0] = 7, so Index = 7 and the data is rotated right by 7 bytes, as shown in FIG. 9. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
Correspondingly, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it is rotated left by 7 bytes to obtain the Cache contents shown in FIG. 9. Details are not repeated here.
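The little-endian case of FIG. 9 needs no reversal, so the model reduces to a plain rotation by Address[n:0]. The byte layout follows the text (B0 at 0x7 through B3 at 0xA). The byte values are placeholders.

```python
B0, B1, B2, B3 = 0xB0, 0xB1, 0xB2, 0xB3       # placeholder byte values


def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes (lane i moves to lane i - amount)."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation, expressed as a right rotation by -amount."""
    return rotate_right(data, -amount)


# Little-endian external line: written into the Cache unchanged.
external = bytearray(16)
external[0x7:0xB] = bytes([B0, B1, B2, B3])
cache_line = bytes(external)

address, k = 0x7, 4
index = address & 0xF                          # Index = Address[n:0] = 7, no inversion
to_pipeline = rotate_right(cache_line, index)[:k]

# Store path: rotate the K pipeline bytes left by the same Index.
store_word = rotate_left(to_pipeline + bytes(16 - k), index)
```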
Alternatively, in FIG. 10, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0xE, 0xF, 0x10, and 0x11 of the external storage, which spans two cachelines. Because the external storage holds data in little-endian mode, the cachelines can be written directly into the Cache, as shown in FIG. 10. B0 is written to address 0xE in the Cache, B1 to address 0xF, B2 to address 0x10, and B3 to address 0x11. The data remains stored in the Cache in little-endian form. When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data, shown as mem_data_o. The target data B3, B2, B1, and B0 is then obtained simply by rotating right by the appropriate right-alignment Index, where Index = Address[n:0]. In this example, Address[n:0] = 14, so Index = 14 and the data is rotated right by 14 bytes, as shown in FIG. 10. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
Correspondingly, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it is rotated left by 14 bytes to obtain the Cache contents shown in FIG. 10. Details are not repeated here.
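For the little-endian cross-cacheline case of FIG. 10, the sketch keeps the byte layout from the text (B0 at 0xE through B3 at 0x11). It makes the same assumption as before about mem_data_o: lane i carries the requested byte whose Cache address satisfies address % 16 == i. That lane model is not stated in the source.

```python
B0, B1, B2, B3 = 0xB0, 0xB1, 0xB2, 0xB3       # placeholder byte values


def rotate_right(data: bytes, amount: int) -> bytes:
    """Cyclic right rotation by whole bytes (lane i moves to lane i - amount)."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def rotate_left(data: bytes, amount: int) -> bytes:
    """Cyclic left rotation, expressed as a right rotation by -amount."""
    return rotate_right(data, -amount)


external = bytearray(32)                       # two 16-byte cachelines
external[0xE:0x12] = bytes([B0, B1, B2, B3])   # little-endian, crosses the line boundary
cache = bytes(external)                        # both lines written unchanged

# Assumed mem_data_o model: each requested byte appears in lane (address % 16).
mem_data_o = bytearray(16)
for addr in range(0xE, 0x12):
    mem_data_o[addr & 0xF] = cache[addr]

address, k = 0xE, 4
index = address & 0xF                          # Index = Address[n:0] = 14
to_pipeline = rotate_right(bytes(mem_data_o), index)[:k]
store_word = rotate_left(to_pipeline + bytes(16 - k), index)
```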
Note that the above example assumes that the processor instruction pipeline supports only the little-endian data format and that the external storage supports both the big-endian and little-endian data formats. Alternatively, the processor instruction pipeline may support only the big-endian data format while the external storage supports both formats. This is not specifically limited in the embodiments of this application.
The foregoing mainly describes the solutions provided in the embodiments of this application from the perspective of the data processing method performed by the data processing device. It can be understood that, to implement these functions, the data processing device includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the embodiments of this application, the data processing device may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. Note that the module division in the embodiments of this application is illustrative and is merely a logical function division. Other divisions are possible in actual implementations.
比如,在采用对应各个功能划分各个功能模块的情况下,图11示出了上述实施例中所涉及的数据处理装置110的一种可能的结构示意图。该数据处理装置110包括:获取模块1101、确定模块1102、读取模块1103、移位模块1104和发送模块1105。其中,获取模块1101用于支持数据处理装置110执行图4所示的步骤S401;确定模块1102用于支持数据处理装置110执行图4所示的步骤S402;读取模块1103用于支持数据处理装置110执行图4所示的步骤S403;移位模块1104用于支持数据处理装置110执行图4所示的步骤S404;发送模块1105用于支持数据处理装置110执行图4所示的步骤S405。For example, in the case of dividing each functional module by corresponding functions, FIG. 11 shows a possible structural diagram of the data processing apparatus 110 involved in the above embodiment. The data processing apparatus 110 includes an acquisition module 1101, a determination module 1102, a reading module 1103, a shifting module 1104, and a transmitting module 1105. The obtaining module 1101 is configured to support the data processing device 110 to perform step S401 shown in FIG. 4; the determining module 1102 is configured to support the data processing device 110 to perform step S402 shown in FIG. 4; and the reading module 1103 is configured to support the data processing device. 110 executes step S403 shown in FIG. 4; the shifting module 1104 is configured to support the data processing apparatus 110 to perform step S404 shown in FIG. 4; and the transmitting module 1105 is configured to support the data processing apparatus 110 to perform step S405 shown in FIG.
可选的,如图12所示,数据处理装置110还可以包括格式转换模块1106和写入模块1107。读取模块1103,还用于在根据第二数据在Cache中的地址信息,从Cache中读取2P字节的第四数据之前,从Cache的外部存储中读取包含第一数据的cacheline。格式转换模块1106,用于将包含第一数据的cacheline进行大小端数据格式转换,得到第三数据;1107写入模块,用于将第三数据写入Cache中。Optionally, as shown in FIG. 12, the data processing apparatus 110 may further include a format conversion module 1106 and a write module 1107. The reading module 1103 is further configured to: before reading the second data of 2 Pbytes from the Cache according to the address information in the Cache according to the second data, reading the cacheline containing the first data from the external storage of the Cache. The format conversion module 1106 is configured to perform a size end data format conversion on the cacheline containing the first data to obtain third data, and a 1107 write module to write the third data into the Cache.
Optionally, the reading module 1103 is further configured to support the data processing apparatus 110 in performing step S406 shown in FIG. 6. The shifting module 1104 is further configured to support the data processing apparatus 110 in performing step S407 shown in FIG. 6. The sending module 1105 is further configured to support the data processing apparatus 110 in performing step S408 shown in FIG. 6.
Optionally, the reading module 1103 is further configured to read the cacheline containing the first data from the external storage of the Cache. It does so before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage. The writing module 1107 is further configured to write the cacheline containing the first data into the Cache.
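Steps S406 to S408 appear to match the consistent-format read of claims 3 and 4. In this path the cacheline is copied into the Cache unchanged, and the shifting module rotates right by Index2 = Address[n:0]. The minimal Python sketch below assumes that P is a power of two and that the K bytes lie inside one aligned 2P-byte block. The function names are hypothetical.

```python
def rotate_right(block: bytes, s: int) -> bytes:
    """Cyclic right rotation by s bytes: byte s moves to position 0."""
    s %= len(block)
    return block[s:] + block[:s]


def read_matched(cache_block: bytes, address: int, k: int, p: int) -> bytes:
    """Consistent-format read (sketch): the cacheline was copied into the Cache
    unchanged, so rotating right by Index2 = Address[n:0] brings the first
    data down to the lowest K byte addresses."""
    mask = 2 * p - 1                    # 2P = 2**(n+1), so this selects Address[n:0]
    index2 = address & mask             # Index2 = Address[n:0]
    seventh = rotate_right(cache_block, index2)
    return seventh[:k]                  # first data


block = bytes(range(0x10, 0x18))        # 8-byte block at 0x100, P = 4, no conversion
print(read_matched(block, 0x105, 3, 4).hex())   # 151617
```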
Optionally, the obtaining module 1101 is further configured to obtain a write instruction sent by the processor instruction pipeline. The write instruction includes the eighth data to be written and the number of bytes T of the eighth data, where T ≤ P and T is an integer. The shifting module 1104 is further configured to act when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline. In that case, it rotates the eighth data left by a third byte count to obtain ninth data of 2P bytes, where Index3 = ~(Address[n:0] + T - 1) and Index3 denotes the third byte count. The writing module 1107 is further configured to write the ninth data into the Cache.
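The following minimal Python sketch illustrates the inconsistent-format write. It assumes that:

- the Cache block already holds byte-reversed data;
- P is a power of two;
- the T bytes do not cross an aligned 2P-byte block.

The source does not say how only T of the 2P bytes are committed to the Cache. The byte-enable merge below is a hypothetical stand-in, and the function names are illustrative.

```python
def rotate_left(block: bytes, s: int) -> bytes:
    """Cyclic left rotation by s bytes: byte 0 moves to position s."""
    s %= len(block)
    return block[len(block) - s:] + block[:len(block) - s]


def write_mismatched(cache_block: bytes, address: int, data: bytes, p: int) -> bytes:
    """Inconsistent-format write (sketch): place the T-byte eighth data into a
    byte-reversed 2P-byte Cache block and return the updated block."""
    width, t = 2 * p, len(data)
    assert len(cache_block) == width and 0 < t <= p
    mask = width - 1                     # selects Address[n:0] (2P a power of two)
    low = address & mask
    index3 = ~(low + t - 1) & mask       # Index3 = ~(Address[n:0] + T - 1)
    ninth = rotate_left(data.ljust(width, b"\x00"), index3)   # 2P-byte ninth data
    # Hypothetical byte-enable merge: only the T rotated bytes are written.
    out = bytearray(cache_block)
    for i in range(t):
        pos = (index3 + i) % width
        out[pos] = ninth[pos]
    return bytes(out)


cache = write_mismatched(bytes(8), 0x102, bytes([0xAA, 0xBB]), 4)
print(cache.hex())          # 00000000aabb0000
print(cache[::-1].hex())    # 0000bbaa00000000
```

Reversing the resulting block models converting the cacheline back for the external storage. The reversed block holds the T bytes in the opposite byte order at offset Address[n:0], as the format mismatch requires.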
Optionally, the shifting module 1104 is further configured to act after the obtaining module 1101 obtains the write instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline. In that case, it rotates the eighth data left by a fourth byte count to obtain tenth data of 2P bytes, where Index4 = Address[n:0] and Index4 denotes the fourth byte count. The writing module 1107 is further configured to write the tenth data into the Cache.
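When the formats agree, the write needs only the rotation by Index4 = Address[n:0]. The Python sketch below assumes that P is a power of two and that the T bytes do not cross an aligned 2P-byte block. It uses a hypothetical byte-enable merge to commit only T bytes, and the function names are illustrative.

```python
def rotate_left(block: bytes, s: int) -> bytes:
    """Cyclic left rotation by s bytes: byte 0 moves to position s."""
    s %= len(block)
    return block[len(block) - s:] + block[:len(block) - s]


def write_matched(cache_block: bytes, address: int, data: bytes, p: int) -> bytes:
    """Consistent-format write (sketch): move the T-byte eighth data to
    offset Address[n:0] of the 2P-byte Cache block."""
    width, t = 2 * p, len(data)
    assert len(cache_block) == width and 0 < t <= p
    index4 = address & (width - 1)       # Index4 = Address[n:0]
    tenth = rotate_left(data.ljust(width, b"\x00"), index4)   # 2P-byte tenth data
    out = bytearray(cache_block)         # hypothetical byte-enable merge of T bytes
    for i in range(t):
        pos = (index4 + i) % width
        out[pos] = tenth[pos]
    return bytes(out)


print(write_matched(bytes(8), 0x105, bytes([0xAA, 0xBB]), 4).hex())   # 0000000000aabb00
```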
All related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules, and details are not described herein again.
When the functional modules are divided in an integrated manner, FIG. 13 shows a possible schematic structural diagram of the data processing apparatus in the foregoing embodiments. The data processing apparatus 130 includes a processing module 1301 and a communication module 1302. The processing module 1301 may be configured to perform the operations that the following modules in FIG. 11 or FIG. 12 can perform: the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the format conversion module 1106, and the writing module 1107. The communication module 1302 may be configured to perform the operations that the sending module 1105 in FIG. 11 or FIG. 12 can perform. For details, refer to the embodiments shown in FIG. 11 or FIG. 12; details are not described herein again.
All related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules, and details are not described herein again.
In the embodiments of the invention, the data processing apparatus is presented in one of two forms: with its functional modules divided according to their respective functions, or with its functional modules divided in an integrated manner. A "module" here may refer to any of the following, or any combination of them: an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, or another component capable of providing the foregoing functions. In a simple embodiment, a person skilled in the art will appreciate that the data processing apparatus 110 or the data processing apparatus 130 may take the form shown in FIG. 3. For example, the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, and the sending module 1105 in FIG. 11 may be implemented by the processor 301 and the memory 303 in FIG. 3. Specifically, these modules may be executed by the processor 301 invoking application program code stored in the memory 303. The embodiments of this application impose no limitation on this.
Alternatively, for example, the modules in FIG. 12 may be implemented by the processor 301 and the memory 303 in FIG. 3. These are the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the sending module 1105, the format conversion module 1106, and the writing module 1107. Specifically, these modules may be executed by the processor 301 invoking application program code stored in the memory 303. The embodiments of this application impose no limitation on this. Alternatively, for example, the processing module 1301 and the communication module 1302 in FIG. 13 may be implemented by the processor 301 and the memory 303 in FIG. 3. Specifically, they may be executed by the processor 301 invoking application program code stored in the memory 303. The embodiments of this application impose no limitation on this.
The data processing apparatus provided in the embodiments of this application can be used to perform the foregoing data processing method. For the technical effects it can achieve, refer to the foregoing method embodiments; details are not described herein again.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center. The transmission may be wired (for example, over coaxial cable, optical fiber, or a Digital Subscriber Line (DSL)) or wireless (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer. It may also be a data storage device, such as a server or a data center, that integrates one or more usable media.
The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a Solid State Disk (SSD)), or the like.
Although this application is described herein with reference to various embodiments, a person skilled in the art can understand and implement other variations of the disclosed embodiments in practicing the claimed application. They can do so by studying the accompanying drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several functions recited in the claims. The fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to good effect.
Although this application is described with reference to specific features and embodiments thereof, various modifications and combinations can evidently be made without departing from the spirit and scope of this application. Accordingly, the specification and accompanying drawings are merely exemplary descriptions of this application as defined by the appended claims. They are considered to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. A person skilled in the art can make various modifications and variations to this application without departing from its spirit and scope. This application is intended to cover these modifications and variations, provided that they fall within the scope of the claims of this application and their equivalent technologies.
Claims (13)
- 1. A data processing method, wherein the method comprises:
  obtaining a read instruction sent by a processor instruction pipeline, wherein:
    - the read instruction comprises address information of first data to be read in an external storage of a cache (Cache);
    - a read/write width of the Cache is 2P bytes;
    - a number of bytes K of the first data satisfies K ≤ P, and K and P are both positive integers;
  when a data format supported by the external storage is inconsistent with a data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, wherein:
    - the second data is the data, within third data, that corresponds to the first data;
    - the third data is the data obtained by performing big-endian/little-endian format conversion on a cache block (cacheline) containing the first data;
    - the size of the cacheline is ≥ 2P;
  reading fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, wherein the fourth data contains the second data;
  rotating the fourth data right by a first byte count to obtain fifth data, wherein:
    - the fifth data contains the second data, and the second data is the data at the lowest K of the 2P byte addresses corresponding to the fifth data;
    - Index1 = ~(Address[n:0] + K - 1), where Index1 denotes the first byte count and ~ denotes bitwise negation;
    - n = log2(P), and Address[n:0] denotes the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache; and
  sending the second data to the processor instruction pipeline.
- 2. The method according to claim 1, further comprising, before the reading of the fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache:
  reading the cacheline containing the first data from the external storage of the Cache;
  performing big-endian/little-endian data format conversion on the cacheline containing the first data to obtain the third data; and
  writing the third data into the Cache.
- 3. The method according to claim 1 or 2, further comprising, after the obtaining of the read instruction sent by the processor instruction pipeline:
  when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, wherein the sixth data contains the first data;
  rotating the sixth data right by a second byte count to obtain seventh data, wherein:
    - the seventh data contains the first data, and the first data is the data at the lowest K of the 2P byte addresses corresponding to the seventh data;
    - Index2 = Address[n:0], where Index2 denotes the second byte count;
    - n = log2(P), and Address[n:0] denotes the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache; and
  sending the first data to the processor instruction pipeline.
- 4. The method according to claim 3, further comprising, before the reading of the sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage:
  reading the cacheline containing the first data from the external storage of the Cache; and
  writing the cacheline containing the first data into the Cache.
- 5. The method according to any one of claims 1 to 4, further comprising:
  obtaining a write instruction sent by the processor instruction pipeline, wherein the write instruction comprises eighth data to be written and a number of bytes T of the eighth data, T ≤ P, and T is an integer;
  when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a third byte count to obtain ninth data of 2P bytes, wherein Index3 = ~(Address[n:0] + T - 1) and Index3 denotes the third byte count; and
  writing the ninth data into the Cache.
- 6. The method according to claim 5, further comprising, after the obtaining of the write instruction sent by the processor instruction pipeline:
  when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a fourth byte count to obtain tenth data of 2P bytes, wherein Index4 = Address[n:0] and Index4 denotes the fourth byte count; and
  writing the tenth data into the Cache.
- 7. A data processing apparatus, wherein the apparatus comprises an obtaining module, a determining module, a reading module, a shifting module, and a sending module;
  the obtaining module is configured to obtain a read instruction sent by a processor instruction pipeline, wherein:
    - the read instruction comprises address information of first data to be read in an external storage of a cache (Cache);
    - a read/write width of the Cache is 2P bytes;
    - a number of bytes K of the first data satisfies K ≤ P, and K and P are both positive integers;
  the determining module is configured to, when a data format supported by the external storage is inconsistent with a data format supported by the processor instruction pipeline, determine address information of second data in the Cache according to the address information of the first data in the external storage, wherein:
    - the second data is the data, within third data, that corresponds to the first data;
    - the third data is the data obtained by performing big-endian/little-endian format conversion on a cache block (cacheline) containing the first data;
    - the size of the cacheline is ≥ 2P;
  the reading module is configured to read fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, wherein the fourth data contains the second data;
  the shifting module is configured to rotate the fourth data right by a first byte count to obtain fifth data, wherein:
    - the fifth data contains the second data, and the second data is the data at the lowest K of the 2P byte addresses corresponding to the fifth data;
    - Index1 = ~(Address[n:0] + K - 1), where Index1 denotes the first byte count and ~ denotes bitwise negation;
    - n = log2(P), and Address[n:0] denotes the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache; and
  the sending module is configured to send the second data to the processor instruction pipeline.
- 8. The apparatus according to claim 7, wherein the apparatus further comprises a format conversion module and a writing module;
  the reading module is further configured to read the cacheline containing the first data from the external storage of the Cache before the fourth data of 2P bytes is read from the Cache according to the address information of the second data in the Cache;
  the format conversion module is configured to perform big-endian/little-endian data format conversion on the cacheline containing the first data to obtain the third data; and
  the writing module is configured to write the third data into the Cache.
- 9. The apparatus according to claim 7 or 8, wherein:
  the reading module is further configured to act after the obtaining module obtains the read instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline. In that case, it reads sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, wherein the sixth data contains the first data;
  the shifting module is further configured to rotate the sixth data right by a second byte count to obtain seventh data, wherein:
    - the seventh data contains the first data, and the first data is the data at the lowest K of the 2P byte addresses corresponding to the seventh data;
    - Index2 = Address[n:0], where Index2 denotes the second byte count;
    - n = log2(P), and Address[n:0] denotes the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache; and
  the sending module is further configured to send the first data to the processor instruction pipeline.
- 10. The apparatus according to claim 9, wherein:
  the reading module is further configured to read the cacheline containing the first data from the external storage of the Cache before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage; and
  the writing module is further configured to write the cacheline containing the first data into the Cache.
- 11. The apparatus according to any one of claims 7 to 10, wherein the apparatus further comprises a writing module;
  the obtaining module is further configured to obtain a write instruction sent by the processor instruction pipeline, wherein the write instruction comprises eighth data to be written and a number of bytes T of the eighth data, T ≤ P, and T is an integer;
  the shifting module is further configured to, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotate the eighth data left by a third byte count to obtain ninth data of 2P bytes, wherein Index3 = ~(Address[n:0] + T - 1) and Index3 denotes the third byte count; and
  the writing module is further configured to write the ninth data into the Cache.
- 12. The apparatus according to claim 11, wherein:
  the shifting module is further configured to act after the obtaining module obtains the write instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline. In that case, it rotates the eighth data left by a fourth byte count to obtain tenth data of 2P bytes, wherein Index4 = Address[n:0] and Index4 denotes the fourth byte count; and
  the writing module is further configured to write the tenth data into the Cache.
- 13. A data processing apparatus, comprising a processor, a memory, a bus, and a communication interface, wherein:
  the memory is configured to store computer-executable instructions;
  the processor is connected to the memory through the bus; and
  when the data processing apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data processing apparatus performs the data processing method according to any one of claims 1 to 6.
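The index expressions in claims 1, 3, 5, and 6 can be checked with a short derivation. It rests on the following illustrative assumptions, which are not taken from the claims:

- The big-endian/little-endian conversion reverses the byte order of a cacheline of L bytes, where L is a multiple of 2P.
- 2P = 2^{n+1}.
- The accessed bytes do not cross an aligned 2P-byte block.

Let a = Address[n:0] be the offset of the first data inside that block.

$$
\begin{aligned}
q &= 2P\,b + i \;\longmapsto\; L-1-q = 2P\left(\tfrac{L}{2P}-b-1\right) + (2P-1-i)
  &&\text{(in-block offset } i \mapsto 2P-1-i\text{)}\\
[a,\; a+K-1] &\longmapsto [\,2P-1-(a+K-1),\; 2P-1-a\,]
  &&\text{(bytes of the first data)}\\
2P-1-x &= (2^{n+1}-1)-x = {\sim}x, \quad 0 \le x < 2^{n+1}
  &&\text{(} {\sim} \text{ over } n+1 \text{ bits)}\\
\mathrm{Index1} &= {\sim}(a+K-1), \quad \mathrm{Index3} = {\sim}(a+T-1), \quad \mathrm{Index2} = \mathrm{Index4} = a
\end{aligned}
$$

Rotating the 2P-byte block right by Index1 therefore moves the lowest converted byte to position 0. Rotating left by Index3 is the matching placement for a write. The K (or T) bytes come out in reversed order, which is the format conversion itself. Without conversion, the data already starts at offset a, so Index2 and Index4 reduce to Address[n:0].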
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710157711.3A CN108628638B (en) | 2017-03-16 | 2017-03-16 | Data processing method and device |
CN201710157711.3 | 2017-03-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018166337A1 (en) | 2018-09-20 |
Family
ID=63521829
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/077026 WO2018166337A1 (en) | 2017-03-16 | 2018-02-23 | Data processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108628638B (en) |
WO (1) | WO2018166337A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109683959B (en) * | 2018-12-24 | 2020-12-01 | 安谋科技(中国)有限公司 | Instruction execution method of processor and processor thereof |
CN113157635B (en) * | 2019-09-25 | 2024-01-05 | 支付宝(杭州)信息技术有限公司 | Method and device for realizing contract call on FPGA |
CN111125715A (en) * | 2019-12-18 | 2020-05-08 | 深圳忆联信息系统有限公司 | TCG data processing acceleration method and device based on solid state disk, computer equipment and storage medium |
CN111258785B (en) * | 2020-01-20 | 2023-09-08 | 北京百度网讯科技有限公司 | Data shuffling method and device |
CN113766270B (en) * | 2021-02-26 | 2024-06-18 | 北京沃东天骏信息技术有限公司 | Video playing method, system, server, terminal equipment and electronic equipment |
CN113778526B (en) * | 2021-11-12 | 2022-02-22 | 北京微核芯科技有限公司 | Cache-based pipeline execution method and device |
CN117093510B (en) * | 2023-05-30 | 2024-04-09 | 中国人民解放军军事科学院国防科技创新研究院 | Cache high-efficiency indexing method for general purpose of size end |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040230765A1 (en) * | 2003-03-19 | 2004-11-18 | Kazutoshi Funahashi | Data sharing apparatus and processor for sharing data between processors of different endianness |
CN102135941A (en) * | 2010-08-26 | 2011-07-27 | 华为技术有限公司 | Method and device for writing data from cache to memory |
CN104156323A (en) * | 2014-08-07 | 2014-11-19 | 浪潮(北京)电子信息产业有限公司 | Method and device for reading length of data block of cache memory in self-adaption mode |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9489307B2 (en) * | 2012-10-24 | 2016-11-08 | Texas Instruments Incorporated | Multi domain bridge with auto snoop response |
- 2017-03-16: CN201710157711.3A (CN108628638B), active
- 2018-02-23: PCT/CN2018/077026 (WO2018166337A1), application filing, active
Also Published As
Publication number | Publication date |
---|---|
CN108628638B (en) | 2021-02-09 |
CN108628638A (en) | 2018-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018166337A1 (en) | Data processing method and device | |
US11294675B2 (en) | Writing prefetched data into intra-core caches of cores identified by prefetching instructions | |
US20250123986A1 (en) | Computing system, pci device manager and initialization method thereof | |
TWI463332B (en) | Provision of extended addressing modes in a single instruction multiple data (simd) data processor | |
CN100428214C (en) | Data sharing device for sharing data among processors with different byte orders and processor | |
US11436146B2 (en) | Storage control apparatus, processing apparatus, computer system, and storage control method | |
US9063860B2 (en) | Method and system for optimizing prefetching of cache memory lines | |
WO2020000482A1 (en) | Nvme-based data reading method, apparatus and system | |
US11275683B2 (en) | Method, apparatus, device and computer-readable storage medium for storage management | |
CN114924793A (en) | Processing unit, computing device and instruction processing method | |
US8359433B2 (en) | Method and system of handling non-aligned memory accesses | |
CN112000589A (en) | Data writing method, data reading device and electronic equipment | |
CN114490439A (en) | Data writing, reading and communication method based on lock-free ring shared memory | |
CN105843771A (en) | Method for communication among EDMA (enhanced direct memory access) devices with different bandwidths in multi-core DSP (digital signal processor) | |
CN110781107A (en) | Low-delay fusion IO control method and device based on DRAM interface | |
US10394733B2 (en) | Data transfer using a descriptor | |
US10802828B1 (en) | Instruction memory | |
WO2019084789A1 (en) | Direct memory access controller, data reading method, and data writing method | |
CN114924792A (en) | Instruction decoding unit, instruction execution unit, and related devices and methods | |
CN113609041A (en) | Data transmission method and system | |
CN119135601B (en) | Stream table unloading system, method, equipment and cluster | |
US11487680B2 (en) | Apparatus and method for burst mode data storage | |
CN119645919B (en) | Data transmission method, device, computing device and storage medium | |
CN120670158A (en) | Memory semantics processing method, device, equipment, medium and product | |
CN114358179A (en) | Prefetch training method for processor, processing device, processor and computing device |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18766836; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 18766836; Country of ref document: EP; Kind code of ref document: A1 |