WO2018166337A1

WO2018166337A1 - Data processing method and device

Info

Publication number: WO2018166337A1
Application number: PCT/CN2018/077026
Authority: WO
Inventors: 徐志通; 孙璐; 熊礼文; 崔鲁平; 陈俊锐; 余谓为; 李又麟
Original assignee: 华为技术有限公司
Priority date: 2017-03-16
Filing date: 2018-02-23
Publication date: 2018-09-20
Also published as: CN108628638B; CN108628638A

Abstract

Disclosed in the present application are a data processing method and device, for use in reducing the load hit latency when big-endian data format and little-endian data format mismatch. The method comprises: obtaining a read instruction sent by a processor instruction pipeline, wherein the read instruction comprises address information of first data to be read in an external memory of a cache, the read/write width of the cache is 2P bytes, and the number of bytes of the first data, i.e., K, is less than or equal to P; when the data format supported by the external memory mismatches the data format supported by the processor instruction pipeline, determining address information of second data in the cache according to the address information of the first data in the external memory, wherein the second data is data in third data and corresponding to the first data, and the third data is data obtained by converting a cacheline comprising the first data between big endian format and little endian format; reading 2P bytes of fourth data from the cache according to the address information of the second data in the cache; rotating the fourth data right by a first byte to obtain fifth data; and sending the second data in the fifth data to the processor instruction pipeline.

Description

Data processing method and device

The present application claims the priority of the Chinese Patent Application, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present application relates to the field of computer technology, and in particular, to a data processing method and apparatus.

Background technique

Usually, the hardware system of a computer device needs to support both the big-endian data format and the little-endian data format. However, to simplify the design, the processor instruction pipeline in the computer device often supports only one data format, for example, only the little-endian data format or only the big-endian data format.

In this case, after the computer device writes data from the external storage of a cache (Cache) into the Cache in the form of a cacheline, the data that the computer device reads from the Cache must be checked against the data format before it is sent to the processor instruction pipeline for processing. When the data format in which the external storage operates is consistent with the data format supported by the processor instruction pipeline (endianness match), the data can be sent to the processor instruction pipeline for processing. When the data format in which the external storage operates is inconsistent with the data format supported by the processor instruction pipeline (endianness mismatch), as shown in Figure 1, the data read from the Cache must pass through load hit path stages such as data alignment (right shift), endian conversion, and sign extension before it can be sent to the processor instruction pipeline for subsequent processing. In this process, the endian conversion logic introduced by the endianness mismatch inevitably increases the load hit latency. Because the load hit path is often the critical path of a Cache design, how to reduce the load hit latency when the big-endian and little-endian data formats are inconsistent is an urgent problem to be solved.
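The prior-art load hit path described above can be pictured with a minimal Python sketch. This is not the circuit of Figure 1: the function names, the 16-byte line width, and the modelling of a Cache line as a `bytes` object indexed by byte address are assumptions made for illustration only. The sketch shows that three stages sit between the Cache read and the pipeline when the formats mismatch.

```python
LINE_BYTES = 16  # assumed Cache read width in bytes (illustrative only)

def rotate_right(line: bytes, count: int) -> bytes:
    """Rotate so that the byte at address `count` lands at address 0."""
    count %= len(line)
    return line[count:] + line[:count]

def prior_art_load(line: bytes, address: int, size: int, signed: bool = False) -> int:
    """Hypothetical model: big-endian data feeding a little-endian pipeline."""
    offset = address % LINE_BYTES
    assert offset + size <= LINE_BYTES, "sketch assumes no line crossing"
    aligned = rotate_right(line, offset)[:size]   # stage 1: data alignment
    converted = aligned[::-1]                     # stage 2: endian conversion
    # stage 3: sign extension, modelled by a signed little-endian interpretation
    return int.from_bytes(converted, "little", signed=signed)
```

In this model the endian conversion in stage 2 is the work that the present application aims to take out of the read path.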

Summary of the invention

The embodiment of the present application provides a data processing method and apparatus, so as to at least reduce the load hit latency when the big-endian and little-endian data formats are inconsistent.

To achieve the above objective, the embodiment of the present application provides the following technical solutions:

In a first aspect, a data processing method is provided. The method includes: acquiring a read instruction sent by a processor instruction pipeline, where the read instruction includes address information of first data to be read in an external storage of a cache (Cache), the read/write width of the Cache is 2P bytes, the number of bytes of the first data is K, K ≤ P, and both K and P are positive integers; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determining address information of second data in the Cache according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in third data, the third data is data obtained by performing big-endian/little-endian format conversion on the cache block (cacheline) containing the first data, and the size of the cacheline is ≥ 2P; reading 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data; rotating the fourth data right by a first number of bytes to obtain fifth data, where the fifth data includes the second data, the second data is the data at the low K byte addresses of the 2P byte addresses corresponding to the fifth data, Index1 = ~(Address[n:0] + K - 1), Index1 represents the first number of bytes, ~ represents bitwise negation, n = log₂P, and Address[n:0] represents the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache; and sending the second data to the processor instruction pipeline. That is, in the embodiment of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data read from the Cache has already undergone endian format conversion, so no endian format conversion is needed after the data is read from the Cache. This avoids the prior-art problem that the endian conversion logic, applied after data is read from the Cache because of the format mismatch, increases the load hit latency, and thus reduces the load hit latency when the big-endian and little-endian data formats are inconsistent.

In a possible design, before reading the 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache, the method further includes: reading the cacheline containing the first data from the external storage of the Cache; performing endian format conversion on the cacheline containing the first data to obtain the third data; and writing the third data into the Cache. Based on this scheme, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endian format conversion. Therefore, no endian format conversion is needed after the data is read from the Cache. This avoids the prior-art increase in load hit latency caused by the endian conversion logic applied after data is read from the Cache, and reduces the load hit latency when the big-endian and little-endian data formats are inconsistent.

In a possible design, after acquiring the read instruction sent by the processor instruction pipeline, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading 2P bytes of sixth data from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data; rotating the sixth data right by a second number of bytes to obtain seventh data, where the seventh data includes the first data, the first data is the data at the low K byte addresses of the 2P byte addresses corresponding to the seventh data, Index2 = Address[n:0], Index2 represents the second number of bytes, n = log₂P, and Address[n:0] represents the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache; and sending the first data to the processor instruction pipeline. Based on this scheme, the data in the Cache can be sent to the processor instruction pipeline when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.

In a possible design, before reading the 2P bytes of sixth data from the Cache according to the address information of the first data in the external storage, the method further includes: reading the cacheline containing the first data from the external storage of the Cache; and writing the cacheline containing the first data into the Cache. Based on this scheme, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache.

In a possible design, the method further includes: acquiring a write instruction sent by the processor instruction pipeline, where the write instruction includes eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a third number of bytes to obtain 2P bytes of ninth data, where Index3 = ~(Address[n:0] + T - 1) and Index3 represents the third number of bytes; and writing the ninth data into the Cache. Based on this scheme, the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.

In a possible design, after acquiring the write instruction sent by the processor instruction pipeline, the method further includes: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a fourth number of bytes to obtain 2P bytes of tenth data, where Index4 = Address[n:0] and Index4 represents the fourth number of bytes; and writing the tenth data into the Cache. Based on this scheme, the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.

In a second aspect, an embodiment of the present application provides a data processing apparatus that has the function of implementing the behavior of the data processing apparatus in the foregoing method embodiment. The function can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.

In a third aspect, an embodiment of the present application provides a data processing apparatus, including a processor, a memory, a bus, and a communication interface. The memory is configured to store computer-executable instructions, and the processor is connected to the memory through the bus. When the data processing apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data processing apparatus performs the data processing method of any one of the designs of the first aspect described above.

In a fourth aspect, an embodiment of the present application provides a computer readable storage medium configured to store computer software instructions used by the data processing apparatus. When the instructions are run on a computer, the computer is enabled to perform the data processing method of any one of the designs of the foregoing first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product comprising instructions, which when executed on a computer, enable the computer to perform the data processing method of any of the above first aspects.

For the technical effects brought by any one of the second aspect to the fifth aspect, refer to the technical effects brought by different design modes in the first aspect, and details are not described herein again.

Drawings

FIG. 1 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent in the prior art;

FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied;

FIG. 3 is a schematic structural diagram of hardware of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 4 is a first schematic flowchart of a data processing method according to an embodiment of the present application;

FIG. 5 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent according to an embodiment of the present disclosure;

FIG. 6 is a second schematic flowchart of a data processing method according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram 1 of an example of a data processing method according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram 2 of an example of a data processing method according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram 3 of an example of a data processing method according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram 4 of an example of a data processing method according to an embodiment of the present disclosure;

FIG. 11 is a first schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 12 is a second schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 13 is a third schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.

Detailed description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. In the description of the present application, unless otherwise stated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, and B exists alone. In addition, in the description of the present application, "a plurality" means two or more.

FIG. 2 is a schematic structural diagram of a hierarchical storage multi-core system to which the embodiment of the present application is applied. As shown in FIG. 2, the multi-core system 100 includes a bus 101, a multi-core processor 102 connected to the bus 101, and a memory 103 connected to the bus 101.

The memory 103 may be a random access memory (RAM), a dynamic random access memory (DRAM), or the like, which is not specifically limited in this embodiment of the present application.

The bus 101 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, and the like. For convenience of representation, only one thick line is shown in Figure 2, but it does not mean that there is only one bus or one type of bus.

The multi-core processor 102 includes a plurality of processor cores, such as a processor core 102a, a processor core 102b, ..., and a processor core 102c. Each processor core may be a central processing unit (CPU) core or a graphics processing unit (GPU) core, which is not specifically limited in this embodiment of the present application. These processor cores are mainly used to perform calculations. Each processor core has its own level 1 cache (L1C) and level 2 cache (L2C); the plurality of processor cores share a last level cache (LLC); and multiple multi-core processors share a single memory. When a processor core receives an instruction to read data, it first checks whether the address exists in the L1C. If it does, the processor core reads the data directly from the L1C. If it does not, the processor core continues to look in the L2C, and so on.
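The lookup order described above can be illustrated with a small Python sketch. The dictionaries standing in for the L1C, L2C, LLC, and memory, and the function name `lookup`, are hypothetical; the sketch only models the search order and omits the cacheline fill that a real hierarchy performs on a miss.

```python
from typing import Dict, List, Optional, Tuple

def lookup(address: int, levels: List[Tuple[str, Dict[int, int]]],
           memory: Dict[int, int]) -> Tuple[str, Optional[int]]:
    """Return (name of the level that served the read, data)."""
    for name, contents in levels:
        if address in contents:            # hit at this level
            return name, contents[address]
    return "memory", memory.get(address)   # every cache level missed

# Illustrative contents only.
levels = [("L1C", {0x10: 0xAA}), ("L2C", {0x20: 0xBB}), ("LLC", {0x30: 0xCC})]
memory = {0x40: 0xDD}
```

For example, `lookup(0x20, levels, memory)` misses in the L1C and is served by the L2C.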

Of course, the embodiment of the present application is also applicable to a single-core system or a system including a Cache having a similar hierarchical storage structure, which is not specifically limited in this embodiment of the present application.

FIG. 3 is a schematic diagram showing the hardware structure of a data processing apparatus 30 according to an embodiment of the present application. The data processing device 30 includes a processor 301, a memory 302, a communication interface 304, and a bus 303. The processor 301, the communication interface 304, and the memory 302 are connected to one another via a bus 303.

The processor 301 is the control center of the data processing device 30. It connects the various parts of the entire data processing device 30 via the bus 303 and, by running or executing software programs and/or modules stored in the memory 302 and invoking data stored in the memory 302, performs the various functions of the data processing device 30 and processes data, thereby monitoring the data processing device 30 as a whole.

Optionally, the processor 301 can be any one of the processor cores in FIG. 2 above.

The memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the data processing device 30 by running the software programs and modules stored in the memory 302. The memory 302 mainly includes a program storage area and a data storage area, where the program storage area can store an operating system, an application required for at least one function, and the like, and the data storage area can store data created according to the use of the data processing device 30, and the like. Moreover, the memory 302 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

Optionally, the memory 302 can be the memory in FIG. 2 above.

The bus 303 can be a PCI bus or an EISA bus or the like. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 3, but it does not mean that there is only one bus or one type of bus.

Optionally, the bus 303 can be the bus in FIG. 2 above.

Communication interface 304 is used for communication of data processing device 30 with external devices.

Although not shown, the data processing device 30 may also include a radio frequency (English: Radio Frequency, abbreviated as RF) circuit, an audio circuit, a communication interface, and/or a plurality of sensors, which are not specifically limited in this embodiment of the present application.

When the processor 301 is a processor core in FIG. 2 above, the memory 302 is the memory in FIG. 2, and the bus 303 is the bus in FIG. 2, the data processing device 30 provided by the embodiment of the present application may be the multi-core system in FIG. 2 above, which is not specifically limited in this embodiment of the present application.

FIG. 4 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method includes the following steps:

S401. The data processing device acquires a read instruction sent by the processor instruction pipeline, where the read instruction includes address information of the first data to be read in an external storage of the Cache.

The read/write width of the Cache is 2P bytes, and the number of bytes of the first data is K≤P, and both K and P are positive integers.

In the embodiment of the present application, the read/write width of the Cache being 2P bytes means that 2P bytes of data are read each time data is read from the Cache, and 2P bytes of data are written each time data is written into the Cache.

When the data processing device in the embodiment of the present application is the multi-core system in FIG. 2, the Cache in step S401 may be the L1C, and the external storage of the Cache may be the L2C, the LLC, the memory, or the like, which is not specifically limited in this embodiment of the present application.

S402. When the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data processing device determines the address information of the second data in the Cache according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in the third data, the third data is the data obtained by performing big-endian/little-endian format conversion on the cacheline containing the first data, and the size of the cacheline is ≥ 2P.

S403. The data processing device reads 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache, where the fourth data includes the second data.

S404. The data processing device rotates the fourth data right by the first number of bytes to obtain the fifth data, where the fifth data includes the second data, and the second data is the data at the low K byte addresses of the 2P byte addresses corresponding to the fifth data.

Here, Index1 = ~(Address[n:0] + K - 1), where Index1 represents the first number of bytes, ~ represents bitwise negation, n = log₂P, and Address[n:0] represents the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache.

It should be noted that, in the embodiment of the present application, the data is rotated right in order to right-align it. The data is right-aligned because, when the processor instruction pipeline reads data, the rightmost byte is the byte corresponding to the first address of the read instruction, so the data must be right-aligned before it is sent to the processor instruction pipeline. This applies throughout and is not repeated below.

S405. The data processing device sends the second data to the processor instruction pipeline.

That is, in the embodiment of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data read from the Cache has already undergone endian format conversion, so no endian format conversion is needed after the data is read from the Cache.
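Steps S403 to S405 can be sketched in Python as follows. This is a minimal model under stated assumptions, not the claimed hardware: a Cache line is a `bytes` object indexed by byte address, "rotate right by r bytes" means the byte at address r moves to address 0, P is a power of two, the first data does not cross a 2P-byte boundary, and the function names are hypothetical.

```python
def index1(address: int, k: int, p: int) -> int:
    """Index1 = ~(Address[n:0] + K - 1), kept to n + 1 bits where n = log2(P)."""
    mask = 2 * p - 1                      # n + 1 low bits set, since 2P = 2**(n+1)
    return ~((address & mask) + k - 1) & mask

def rotate_right(line: bytes, count: int) -> bytes:
    """Byte at address `count` moves to address 0 (the rightmost position)."""
    count %= len(line)
    return line[count:] + line[:count]

def load_mismatch(cache_line: bytes, address: int, k: int, p: int) -> bytes:
    """Hypothetical model of S403 to S405 on one 2P-byte read of converted data."""
    assert len(cache_line) == 2 * p and 1 <= k <= p
    assert (address % (2 * p)) + k <= 2 * p, "sketch assumes no line crossing"
    fifth = rotate_right(cache_line, index1(address, k, p))
    return fifth[:k]                      # second data: the low K byte addresses
```

With P = 8 and a Cache line that holds a byte-reversed copy of a 16-byte external line, `load_mismatch(cache_line, 7, 4, 8)` returns the four bytes from external addresses 0x7 to 0xA in little-endian order, without any conversion step after the read.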

FIG. 5 is a logic block diagram of data processing when the big-endian and little-endian data formats are inconsistent, provided by the embodiment of the present application. As can be seen from FIG. 5, in the embodiment of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the endian format conversion is located in the write channel of the Cache; that is, the endian conversion is performed while the data is written from the external storage into the Cache. Consequently, when data is read from the Cache in this case, the already converted data can be read directly from the Cache and right-aligned to obtain the required data.

It should be noted that the sign extension in FIG. 5 is an optional operation in the data processing method provided by the present application. For the specific implementation, refer to existing processing methods; this is not specifically limited in this embodiment of the present application.

According to the data processing method provided by the embodiment of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the address information of the second data in the Cache is determined according to the address information of the first data in the external storage, where the second data is the data corresponding to the first data in the third data, and the third data is the data obtained by performing endian format conversion on the cacheline containing the first data. Then 2P bytes of fourth data, which includes the second data, are read from the Cache according to the address information of the second data in the Cache; the fourth data is rotated right by the first number of bytes to obtain the fifth data, in which the second data occupies the low K byte addresses of the corresponding 2P byte addresses; and the second data is sent to the processor instruction pipeline. That is, in the embodiment of the present application, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data read from the Cache has already undergone endian format conversion, so no endian format conversion is needed after the data is read from the Cache. This avoids the prior-art problem that the endian conversion logic, applied after data is read from the Cache because of the format mismatch, increases the load hit latency, and thus reduces the load hit latency when the big-endian and little-endian data formats are inconsistent.

Further, before the data processing device reads the 2P bytes of fourth data from the Cache according to the address information of the second data in the Cache (step S403), the method may further include:

The data processing device reads the cacheline containing the first data from the external storage of the Cache; performs endian format conversion on the cacheline containing the first data to obtain the third data; and writes the third data into the Cache.

It should be noted that, in the embodiment of the present application, because the read/write width of the Cache is 2P bytes, the size of the cacheline is ≥ 2P, and the size of the third data equals the size of the cacheline, the third data may be written into the Cache in several write operations. For example, if the size of the cacheline is 4P, it is written in two write operations. This is not specifically limited in this embodiment of the present application.

Based on this scheme, when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache, and the data written into the Cache has already undergone endian format conversion. Therefore, no endian format conversion is needed after the data is read from the Cache. This avoids the prior-art increase in load hit latency caused by the endian conversion logic applied after data is read from the Cache, and reduces the load hit latency when the big-endian and little-endian data formats are inconsistent.
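The fill path can be sketched in Python as follows. Two assumptions are made here: the endian format conversion is modelled as a byte reversal of the whole cacheline, which is how the worked example later in this description performs it, and the order in which the converted 2P-byte pieces are written is chosen arbitrarily. The function name `fill_converted` is hypothetical.

```python
from typing import List

def fill_converted(cacheline: bytes, p: int) -> List[bytes]:
    """Byte-reverse a cacheline and split it into 2P-byte Cache writes."""
    width = 2 * p
    assert len(cacheline) >= width and len(cacheline) % width == 0
    third = cacheline[::-1]               # endian conversion of the whole line
    # One list entry per Cache write; a 4P cacheline yields two writes.
    return [third[i:i + width] for i in range(0, len(third), width)]
```

With P = 8, a 32-byte (4P) cacheline yields two 16-byte writes, matching the two write operations mentioned above for a 4P cacheline.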

Optionally, as shown in FIG. 6, after the data processing device acquires the read instruction sent by the processor instruction pipeline (step S401), the method may further include the following steps:

S406. When the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the data processing device reads 2P bytes of sixth data from the Cache according to the address information of the first data in the external storage, where the sixth data includes the first data.

S407. The data processing device rotates the sixth data right by the second number of bytes to obtain the seventh data, where the seventh data includes the first data, and the first data is the data at the low K byte addresses of the 2P byte addresses corresponding to the seventh data.

Here, Index2 = Address[n:0], where Index2 represents the second number of bytes, n = log₂P, and Address[n:0] represents the value of the low (n+1) bits of the first address of the first data in the external storage of the Cache.

S408. The data processing device sends the first data to the processor instruction pipeline.

That is, in the embodiment of the present application, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, no endian format conversion is required. Therefore, the data read from the Cache is the data that was written into the Cache from the external storage.

Based on the scheme, the data in the Cache can be sent to the processor instruction pipeline when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
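Steps S406 to S408 can be sketched the same way. The model is an illustration under assumptions: a Cache line is a `bytes` object indexed by byte address, the Cache line is an unmodified copy of the external data, the first data does not cross a 2P-byte boundary, and `load_match` is a hypothetical name.

```python
def load_match(cache_line: bytes, address: int, k: int, p: int) -> bytes:
    """Read 2P bytes, rotate right by Index2, keep the low K bytes."""
    width = 2 * p
    assert len(cache_line) == width and 1 <= k <= p
    index2 = address & (width - 1)        # Index2 = Address[n:0]
    assert index2 + k <= width, "sketch assumes no line crossing"
    # Rotate right by Index2 bytes: the byte at address Index2 moves to address 0.
    seventh = cache_line[index2:] + cache_line[:index2]
    return seventh[:k]                    # first data at the low K byte addresses
```

Because no conversion took place on the fill, the returned bytes are exactly the K bytes starting at the requested address.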

Further, before the data processing device reads the 2P bytes of sixth data from the Cache according to the address information of the first data in the external storage (step S406), the method may further include:

The data processing device reads the cacheline containing the first data from the external storage of the Cache; and writes the cacheline containing the first data into the Cache.

It should be noted that, in the embodiment of the present application, because the read/write width of the Cache is 2P bytes and the size of the cacheline is ≥ 2P, the cacheline containing the first data may be written into the Cache in several write operations. For example, if the size of the cacheline is 4P, it is written in two write operations. This is not specifically limited in this embodiment of the present application.

Based on the scheme, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the data in the external storage can be written into the Cache.

Optionally, the data processing method provided by the embodiment of the present application may further include: the data processing device acquires a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes T of the eighth data, T ≤ P, and T is an integer; when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, the data processing device rotates the eighth data left by the third number of bytes to obtain 2P bytes of ninth data, where Index3 = ~(Address[n:0] + T - 1) and Index3 represents the third number of bytes; and the data processing device writes the ninth data into the Cache.

Based on the scheme, the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline.
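The write path for the mismatch case can be sketched as follows. The description does not state how the T pipeline bytes are extended to 2P bytes or how the ninth data is merged with bytes already in the Cache, so the zero padding below is an assumption, as are the function names and the convention that "rotate left by r bytes" moves the byte at address 0 to address r.

```python
def rotate_left(line: bytes, count: int) -> bytes:
    """Byte at address 0 moves to address `count`."""
    count %= len(line)
    split = len(line) - count
    return line[split:] + line[:split]

def store_mismatch(eighth: bytes, address: int, p: int) -> bytes:
    """Place T pipeline bytes into a 2P-byte vector using Index3."""
    width, t = 2 * p, len(eighth)
    mask = width - 1
    assert 1 <= t <= p and (address & mask) + t <= width
    index3 = ~((address & mask) + t - 1) & mask   # Index3 = ~(Address[n:0] + T - 1)
    padded = eighth + bytes(width - t)            # right-aligned, zero padded (assumption)
    return rotate_left(padded, index3)
```

With P = 8, storing four little-endian bytes at address 0x7 gives Index3 = 5, so the bytes land at Cache addresses 0x5 to 0x8. Byte-reversing that line puts them at addresses 0x7 to 0xA in big-endian order.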

Optionally, after the data processing device acquires the write instruction sent by the processor instruction pipeline, the method may further include: when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, the eighth data is rotated left by a fourth number of bytes to obtain the tenth data of 2P bytes, where Index4 = Address[n:0] and Index4 represents the fourth number of bytes; and the tenth data is written into the Cache.

Based on the scheme, the data in the processor instruction pipeline can be written into the Cache when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline.
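The two write cases above differ only in the rotation count. The following minimal sketch shows how Index3 and Index4 could be computed and applied, assuming that P is a power of two and that the data to be written arrives right-aligned in the low T bytes of a 2P-byte word. The names `rotate_left`, `write_index`, and `align_for_write` are hypothetical, and the zero padding stands in for byte lanes that a real design would presumably mask out.

```python
def rotate_left(data: bytes, count: int) -> bytes:
    """Rotate toward higher addresses: out[(j + count) % W] = data[j]."""
    count %= len(data)
    split = len(data) - count
    return data[split:] + data[:split]


def write_index(address: int, t: int, p: int, formats_match: bool) -> int:
    """Rotation count for a write of t bytes; p must be a power of two."""
    mask = 2 * p - 1                 # keeps the low (n+1) bits, n = log2(P)
    low = address & mask             # Address[n:0]
    if formats_match:
        return low                   # Index4 = Address[n:0]
    return ~(low + t - 1) & mask     # Index3 = ~(Address[n:0] + T - 1)


def align_for_write(payload: bytes, address: int, p: int, formats_match: bool) -> bytes:
    """Place a right-aligned payload on the byte lanes where the Cache expects it."""
    padded = payload + bytes(2 * p - len(payload))   # padding bytes are placeholders
    return rotate_left(padded, write_index(address, len(payload), p, formats_match))


payload = bytes([0xA0, 0xA1, 0xA2, 0xA3])
print(write_index(0x7, 4, 8, formats_match=False))   # 5
print(write_index(0x7, 4, 8, formats_match=True))    # 7
```

With Address[n:0] = 7 and T = 4, the payload lands on lanes 5 to 8 when the formats are inconsistent and on lanes 7 to 10 when they are consistent.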

The data processing method provided by the embodiment of the present application is further described below with reference to a specific example.

Take the following as an example: 2P = 16 bytes, the processor instruction pipeline supports only the little-endian data format, the external storage supports the big-endian and little-endian data formats, and the size of the cacheline is 16 bytes.

When the external storage works in the big-endian data format:

The data processing device reverses the byte order of the 16-byte cacheline from the external storage and stores it in the Cache, which is equivalent to storing the 16-byte big-endian data in the Cache in little-endian format. When the processor instruction pipeline initiates a read instruction, little-endian data can be read directly from the Cache. In addition, because the cacheline is byte-reversed when it is written to the Cache, the index used for the right rotation that right-aligns the data must be compensated by inversion.
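The byte reversal on fill, and the reason why the rotation index has to be inverted, can be illustrated with a short sketch. The function name `fill_cacheline` is hypothetical. For a 16-byte cacheline, a byte at external offset a ends up at Cache offset 15 - a, which equals the bitwise inversion of a over 4 bits.

```python
def fill_cacheline(external_line: bytes, formats_match: bool) -> bytes:
    """Return the cacheline as stored in the Cache.

    When the formats are inconsistent, the whole line is byte-reversed.
    """
    return external_line if formats_match else external_line[::-1]


LINE_SIZE = 16
external_line = bytes(range(LINE_SIZE))   # each byte holds its own external offset
cached = fill_cacheline(external_line, formats_match=False)

# Offset a in the external line maps to offset ~a (over 4 bits) in the Cache.
for offset in range(LINE_SIZE):
    mirrored = ~offset & (LINE_SIZE - 1)
    assert cached[mirrored] == external_line[offset]

print(cached[0x8])   # 7: the byte from external offset 0x7 now sits at Cache offset 0x8
```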

For example, in Figure 7, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0x7, 0x8, 0x9, and 0xA in the external storage. Because the data is stored in big-endian mode in the external storage, the entire cacheline needs to be byte-reversed before it is written to the Cache, as shown in Figure 7: B0 is written to address 0x5 in the Cache, B1 to address 0x6, B2 to address 0x7, and B3 to address 0x8. At this point, the data is stored in little-endian form in the Cache. When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data shown as mem_data_o. The target data B3, B2, B1, and B0 can then be right-aligned by rotating mem_data_o right by Index bytes, where Index = ~(Address[n:0] + K - 1). In this example, Address[n:0] = 7 and K = 4, so Address[n:0] + K - 1 = 10 = 0b1010, and inverting these 4 bits gives Index = 0b0101 = 5; that is, the data is rotated right by 5 bytes, as shown in Figure 7. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
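The read in this example can be replayed with a small sketch. To avoid depending on the byte labels in the figure, each byte of the external cacheline holds its own external offset. The helper names `rotate_right` and `read_index` are hypothetical, and the sketch assumes that a right rotation moves bytes toward lower addresses.

```python
def rotate_right(data: bytes, count: int) -> bytes:
    """Rotate toward lower addresses: out[j] = data[(j + count) % W]."""
    count %= len(data)
    return data[count:] + data[:count]


def read_index(address: int, k: int, p: int, formats_match: bool) -> int:
    """Rotation count for a read of k bytes; p must be a power of two."""
    mask = 2 * p - 1                 # keeps the low (n+1) bits, n = log2(P)
    low = address & mask             # Address[n:0]
    return low if formats_match else ~(low + k - 1) & mask


P, K, ADDRESS = 8, 4, 0x7
external_line = bytes(range(2 * P))      # big-endian side, byte value = external offset
cache_line = external_line[::-1]         # byte-reversed when written to the Cache

index = read_index(ADDRESS, K, P, formats_match=False)
aligned = rotate_right(cache_line, index)

print(index)               # 5
print(list(aligned[:K]))   # [10, 9, 8, 7]
```

The low K bytes of the rotated word are the bytes from external addresses 0xA down to 0x7, that is, the same four bytes in reversed (little-endian) order.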

Of course, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it needs to be rotated left by 5 bytes to obtain the data in the Cache shown in FIG. 7. Details are not described herein again.

Or, for example, in FIG. 8, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0xE, 0xF, 0x10, and 0x11 in the external storage, so that the read crosses a cacheline boundary. Because the data is stored in big-endian mode in the external storage, each cacheline needs to be byte-reversed before it is written to the Cache, as shown in FIG. 8: B0 is written to address 0x1E in the Cache, B1 to address 0x1F, B2 to address 0x0, and B3 to address 0x1. At this point, the data is stored in little-endian form in the Cache. When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data shown as mem_data_o. The target data B3, B2, B1, and B0 can then be right-aligned by rotating mem_data_o right by Index bytes, where Index = ~(Address[n:0] + K - 1). In this example, Address[n:0] = 14 and K = 4, so Address[n:0] + K - 1 = 17 = 0b10001; keeping the low 4 bits gives 0b0001, and inverting them gives Index = 0b1110 = 14; that is, the data is rotated right by 14 bytes, as shown in FIG. 8. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
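The two index calculations above depend on truncating the sum to the low (n+1) bits before inverting it. The sketch below makes that truncation explicit and checks it against an equivalent modular form, Index = (-(Address + K)) mod 2P, which follows from the identity ~x = -x - 1. The function names are hypothetical, and the modular form is an observation added here for illustration, not a formula stated in the application.

```python
def mismatched_index(address: int, k: int, p: int) -> int:
    """Index = ~(Address[n:0] + K - 1), kept to (n+1) bits; p is a power of two."""
    mask = 2 * p - 1
    last = (address & mask) + k - 1    # offset of the last requested byte
    return ~last & mask                # invert, then keep the low (n+1) bits


def mismatched_index_mod(address: int, k: int, p: int) -> int:
    """Equivalent modular form of the same index."""
    return -(address + k) % (2 * p)


P = 8
for address, k in [(0x7, 4), (0xE, 4)]:
    print(address, k, mismatched_index(address, k, P))
# 7 4 5
# 14 4 14
```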

Of course, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it needs to be rotated left by 14 bytes to obtain the data in the Cache shown in FIG. 8. Details are not described herein again.

When the external storage works in the little-endian data format:

The data processing device writes the 16-byte cacheline from the external storage directly into the Cache. When the processor instruction pipeline initiates a read instruction, little-endian data can be read directly from the Cache. In addition, because the cacheline is not reversed when it is written to the Cache, the index used for the right rotation that right-aligns the data does not need to be compensated by inversion.

For example, in Figure 9, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0x7, 0x8, 0x9, and 0xA in the external storage. Because the data is stored in little-endian mode in the external storage, the entire cacheline can be written directly into the Cache, as shown in Figure 9: B0 is written to address 0x7 in the Cache, B1 to address 0x8, B2 to address 0x9, and B3 to address 0xA. At this point, the data is still stored in little-endian form in the Cache. When the processor instruction pipeline needs to read the data at addresses 0x7, 0x8, 0x9, and 0xA, the Cache outputs the cacheline data shown as mem_data_o. The target data B3, B2, B1, and B0 can then be right-aligned by rotating mem_data_o right by Index bytes, where Index = Address[n:0]. In this example, Address[n:0] = 7, so Index = 7; that is, the data is rotated right by 7 bytes, as shown in Figure 9. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.

Of course, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it needs to be rotated left by 7 bytes to obtain the data in the Cache shown in FIG. 9. Details are not described herein again.

Or, for example, in FIG. 10, assume that the data to be read is B0, B1, B2, and B3 in the external storage, that is, the data at addresses 0xE, 0xF, 0x10, and 0x11 in the external storage, so that the read crosses a cacheline boundary. Because the data is stored in little-endian mode in the external storage, each cacheline can be written directly into the Cache, as shown in FIG. 10: B0 is written to address 0xE in the Cache, B1 to address 0xF, B2 to address 0x10, and B3 to address 0x11. At this point, the data is still stored in little-endian form in the Cache. When the processor instruction pipeline needs to read the data at addresses 0xE, 0xF, 0x10, and 0x11, the Cache outputs the cacheline data shown as mem_data_o. The target data B3, B2, B1, and B0 can then be right-aligned by rotating mem_data_o right by Index bytes, where Index = Address[n:0]. In this example, Address[n:0] = 14, so Index = 14; that is, the data is rotated right by 14 bytes, as shown in FIG. 10. Finally, the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 is sent to the processor instruction pipeline.
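The cross-cacheline read in this example can be sketched as follows. The way mem_data_o is assembled is an assumption made only for this illustration: byte lane j is taken to hold the byte whose Cache address lies in the 2P-byte window starting at the requested address and whose low (n+1) address bits equal j. The text only shows that lanes 14, 15, 0, and 1 hold the four requested bytes. The names `read_lanes` and `rotate_right` are hypothetical.

```python
def read_lanes(cache: bytes, address: int, width: int) -> bytes:
    """Assumed model of mem_data_o for a read that may span two cachelines."""
    lanes = bytearray(width)
    for a in range(address, address + width):
        lanes[a % width] = cache[a]      # lane number = low bits of the address
    return bytes(lanes)


def rotate_right(data: bytes, count: int) -> bytes:
    """Rotate toward lower addresses: out[j] = data[(j + count) % W]."""
    count %= len(data)
    return data[count:] + data[:count]


WIDTH = 16                               # 2P bytes
cache = bytes(range(32))                 # two cachelines, byte value = Cache address
ADDRESS = 0xE

mem_data_o = read_lanes(cache, ADDRESS, WIDTH)
aligned = rotate_right(mem_data_o, ADDRESS % WIDTH)   # Index = Address[n:0] = 14

print(list(aligned[:4]))                 # [14, 15, 16, 17]
```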

Of course, when the data B3, B2, B1, and B0 at addresses 0x0, 0x1, 0x2, and 0x3 in the processor instruction pipeline is written into the Cache, it needs to be rotated left by 14 bytes to obtain the data in the Cache shown in FIG. 10. Details are not described herein again.
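In each of the four examples, the write uses a left rotation by the same count that the read uses for its right rotation, so the two operations undo each other. A minimal sketch of this round trip, with hypothetical helper names and placeholder byte values:

```python
def rotate_left(data: bytes, count: int) -> bytes:
    """Rotate toward higher addresses: out[(j + count) % W] = data[j]."""
    count %= len(data)
    split = len(data) - count
    return data[split:] + data[:split]


def rotate_right(data: bytes, count: int) -> bytes:
    """Rotate toward lower addresses: out[j] = data[(j + count) % W]."""
    count %= len(data)
    return data[count:] + data[:count]


# Four right-aligned data bytes in a 16-byte (2P) word.
word = bytes([0xB0, 0xB1, 0xB2, 0xB3]) + bytes(12)

for index in (5, 14, 7):                 # rotation counts used in the examples
    stored = rotate_left(word, index)    # write path
    assert rotate_right(stored, index) == word   # read path restores alignment
```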

It should be noted that the above examples are described by using the case in which the processor instruction pipeline supports only the little-endian data format and the external storage supports the big-endian or little-endian data format. Of course, the processor instruction pipeline may instead support only the big-endian data format while the external storage supports the big-endian or little-endian data format. This is not specifically limited in this embodiment of the present application.

The solution provided by the embodiment of the present application is mainly described from the perspective of the data processing method performed by the data processing apparatus. It can be understood that, in order to implement the above functions, the data processing apparatus includes a hardware structure and/or a software module corresponding to each function. Those skilled in the art will readily appreciate that, in combination with the elements and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

In the embodiment of the present application, the data processing device may be divided into functional modules according to the foregoing method example. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division; another division manner may be used in actual implementation.

For example, in the case where each functional module corresponds to one function, FIG. 11 shows a possible structural diagram of the data processing apparatus 110 involved in the above embodiment. The data processing apparatus 110 includes an obtaining module 1101, a determining module 1102, a reading module 1103, a shifting module 1104, and a sending module 1105. The obtaining module 1101 is configured to support the data processing apparatus 110 in performing step S401 shown in FIG. 4; the determining module 1102 is configured to support the data processing apparatus 110 in performing step S402 shown in FIG. 4; the reading module 1103 is configured to support the data processing apparatus 110 in performing step S403 shown in FIG. 4; the shifting module 1104 is configured to support the data processing apparatus 110 in performing step S404 shown in FIG. 4; and the sending module 1105 is configured to support the data processing apparatus 110 in performing step S405 shown in FIG. 4.

Optionally, as shown in FIG. 12, the data processing apparatus 110 may further include a format conversion module 1106 and a writing module 1107. The reading module 1103 is further configured to: before reading the fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, read the cacheline containing the first data from the external storage of the Cache. The format conversion module 1106 is configured to perform big-endian/little-endian data format conversion on the cacheline containing the first data to obtain the third data, and the writing module 1107 is configured to write the third data into the Cache.

Optionally, the reading module 1103 is further configured to support the data processing device 110 in performing step S406 shown in FIG. 6; the shifting module 1104 is further configured to support the data processing device 110 in performing step S407 shown in FIG. 6; and the sending module 1105 is further configured to support the data processing device 110 in performing step S408 shown in FIG. 6.

Optionally, the reading module 1103 is further configured to: before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage, read the cacheline containing the first data from the external storage of the Cache; the writing module 1107 is further configured to write the cacheline containing the first data into the Cache.

Optionally, the obtaining module 1101 is further configured to acquire a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes T of the eighth data, where T ≤ P and T is an integer. The shifting module 1104 is further configured to: when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotate the eighth data left by a third number of bytes to obtain the ninth data of 2P bytes, where Index3 = ~(Address[n:0] + T - 1) and Index3 represents the third number of bytes. The writing module 1107 is further configured to write the ninth data into the Cache.

Optionally, the shifting module 1104 is further configured to: after the obtaining module 1101 acquires the write instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotate the eighth data left by a fourth number of bytes to obtain the tenth data of 2P bytes, where Index4 = Address[n:0] and Index4 represents the fourth number of bytes; the writing module 1107 is further configured to write the tenth data into the Cache.

For all related content of the steps involved in the foregoing method embodiments, refer to the functional descriptions of the corresponding functional modules; details are not described herein again.

FIG. 13 is a schematic diagram showing a possible structure of the data processing apparatus involved in the foregoing embodiment. The data processing apparatus 130 includes a processing module 1301 and a communication module 1302. The processing module 1301 can be used to perform the operations that can be performed by the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the format conversion module 1106, and the writing module 1107 in FIG. 11 or FIG. 12; the communication module 1302 can be used to perform the operations that can be performed by the sending module 1105 in FIG. 11 or FIG. 12. For details, refer to the embodiment shown in FIG. 11 or FIG. 12; details are not described herein again.

In the embodiment of the present application, the data processing device is presented in a form in which each functional module corresponds to one function, or in a form in which the functional modules are divided in an integrated manner. A "module" herein may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that provides the functionality described above. In a simple embodiment, those skilled in the art will appreciate that the data processing device 110 or the data processing device 130 may take the form shown in FIG. 3. For example, the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, and the sending module 1105 in FIG. 11 can be implemented by the processor 301 and the memory 303 of FIG. 3. Specifically, the functions of the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, and the sending module 1105 can be performed by the processor 301 calling the application code stored in the memory 303, which is not limited in this embodiment of the present application. Alternatively, for example, the obtaining module 1101, the determining module 1102, the reading module 1103, the shifting module 1104, the sending module 1105, the format conversion module 1106, and the writing module 1107 in FIG. 12 may be implemented by the processor 301 and the memory 303 of FIG. 3. Specifically, the functions of these modules can be performed by the processor 301 calling the application code stored in the memory 303, which is not limited in this embodiment of the present application. Alternatively, for example, the processing module 1301 and the communication module 1302 in FIG. 13 may be implemented by the processor 301 and the memory 303 of FIG. 3. Specifically, the functions of the processing module 1301 and the communication module 1302 can be performed by the processor 301 calling the application code stored in the memory 303, which is not limited in this embodiment of the present application.

The data processing device provided by the embodiment of the present application can be used to perform the foregoing data processing method. Therefore, for the technical effects that can be obtained, refer to the foregoing method embodiments; details are not described herein again.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Although the present application has been described herein in connection with the various embodiments, those skilled in the art can achieve other variations of the disclosed embodiments. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several of the functions recited in the claims. Certain measures are recited in mutually different dependent claims, but this does not mean that these measures cannot be combined to produce a good effect.

While the present application has been described in connection with specific features and embodiments thereof, various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the description and drawings are to be regarded as illustrative. It will be apparent to those skilled in the art that various modifications and changes can be made to the present application without departing from its spirit and scope. Thus, the present application is intended to cover such modifications and variations.

Claims

A data processing method, the method comprising:

Obtaining a read instruction sent by the processor instruction pipeline, where the read instruction includes address information of the first data to be read in an external storage of the Cache, wherein the read/write width of the Cache is 2P bytes, the number of bytes K of the first data satisfies K ≤ P, and K and P are positive integers;

When the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determining address information of the second data in the Cache according to the address information of the first data in the external storage, wherein the second data is the data corresponding to the first data in the third data, the third data is data obtained by performing big-endian/little-endian format conversion on a cacheline that includes the first data, and the size of the cacheline is ≥ 2P;

Reading, according to the address information of the second data in the Cache, the fourth data of 2P bytes from the Cache, where the fourth data includes the second data;

Rotating the fourth data right by a first number of bytes to obtain the fifth data, where the fifth data includes the second data, the second data is the data at the low K byte addresses among the 2P byte addresses corresponding to the fifth data, Index1 = ~(Address[n:0] + K - 1), Index1 represents the first number of bytes, ~ represents bitwise inversion, n = log2 P, and Address[n:0] represents the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache;

The second data is sent to the processor instruction pipeline.
The method according to claim 1, wherein before the reading of the fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, the method further includes:

Reading a cacheline containing the first data from an external storage of the Cache;

Performing big-endian/little-endian data format conversion on the cacheline containing the first data to obtain the third data;

The third data is written into the Cache.
The method according to claim 1 or 2, wherein after the obtaining of the read instruction sent by the processor instruction pipeline, the method further includes:

When the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, reading the sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, wherein the sixth data includes the first data;

Rotating the sixth data right by a second number of bytes to obtain the seventh data, where the seventh data includes the first data, the first data is the data at the low K byte addresses among the 2P byte addresses corresponding to the seventh data, Index2 = Address[n:0], Index2 represents the second number of bytes, n = log2 P, and Address[n:0] represents the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache;

Transmitting the first data to the processor instruction pipeline.
The method according to claim 3, wherein before the reading of the sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, the method further includes:

Reading a cacheline containing the first data from an external storage of the Cache;

Writing the cacheline containing the first data into the Cache.
The method according to any one of claims 1 to 4, wherein the method further comprises:

Obtaining a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes T of the eighth data, where T ≤ P and T is an integer;

When the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a third number of bytes to obtain the ninth data of 2P bytes, wherein Index3 = ~(Address[n:0] + T - 1), and Index3 represents the third number of bytes;

The ninth data is written into the Cache.
The method of claim 5, wherein after the obtaining of the write instruction sent by the processor instruction pipeline, the method further includes:

When the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotating the eighth data left by a fourth number of bytes to obtain the tenth data of 2P bytes, wherein Index4 = Address[n:0], and Index4 represents the fourth number of bytes;

The tenth data is written into the Cache.
A data processing device, comprising: an obtaining module, a determining module, a reading module, a shifting module, and a sending module;

The obtaining module is configured to acquire a read instruction sent by the processor instruction pipeline, where the read instruction includes address information of the first data to be read in an external storage of the Cache, where the read/write width of the Cache is 2P bytes, the number of bytes K of the first data satisfies K ≤ P, and K and P are positive integers;

The determining module is configured to: when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, determine address information of the second data in the Cache according to the address information of the first data in the external storage, wherein the second data is the data corresponding to the first data in the third data, the third data is data obtained by performing big-endian/little-endian format conversion on a cacheline that includes the first data, and the size of the cacheline is ≥ 2P;

The reading module is configured to read, according to the address information of the second data in the Cache, the fourth data of 2P bytes from the Cache, where the fourth data includes the second data;

The shifting module is configured to rotate the fourth data right by a first number of bytes to obtain the fifth data, where the fifth data includes the second data, the second data is the data at the low K byte addresses among the 2P byte addresses corresponding to the fifth data, Index1 = ~(Address[n:0] + K - 1), Index1 represents the first number of bytes, ~ represents bitwise inversion, n = log2 P, and Address[n:0] represents the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache;

The sending module is configured to send the second data to the processor instruction pipeline.
The device according to claim 7, wherein the device further comprises a format conversion module and a writing module;

The reading module is further configured to: before reading the fourth data of 2P bytes from the Cache according to the address information of the second data in the Cache, read a cacheline containing the first data from the external storage of the Cache;

The format conversion module is configured to perform big-endian/little-endian data format conversion on the cacheline including the first data to obtain the third data;

The writing module is configured to write the third data into the Cache.
The device according to claim 7 or 8, wherein:

The reading module is further configured to: after the obtaining module acquires the read instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, read the sixth data of 2P bytes from the Cache according to the address information of the first data in the external storage, wherein the sixth data includes the first data;

The shifting module is further configured to rotate the sixth data right by a second number of bytes to obtain the seventh data, where the seventh data includes the first data, the first data is the data at the low K byte addresses among the 2P byte addresses corresponding to the seventh data, Index2 = Address[n:0], Index2 represents the second number of bytes, n = log2 P, and Address[n:0] represents the value of the low (n+1) bits of the start address of the first data in the external storage of the Cache;

The sending module is further configured to send the first data to the processor instruction pipeline.
The device of claim 9 wherein:

The reading module is further configured to: before the sixth data of 2P bytes is read from the Cache according to the address information of the first data in the external storage, read a cacheline containing the first data from the external storage of the Cache;

The writing module is further configured to write the cacheline including the first data into the Cache.
The device according to any one of claims 7 to 10, wherein the device further comprises a writing module;

The obtaining module is further configured to acquire a write instruction sent by the processor instruction pipeline, where the write instruction includes the eighth data to be written and the number of bytes T of the eighth data, where T ≤ P and T is an integer;

The shifting module is further configured to: when the data format supported by the external storage is inconsistent with the data format supported by the processor instruction pipeline, rotate the eighth data left by a third number of bytes to obtain the ninth data of 2P bytes, wherein Index3 = ~(Address[n:0] + T - 1), and Index3 represents the third number of bytes;

The writing module is further configured to write the ninth data into the Cache.
The device of claim 11 wherein:

The shifting module is further configured to: after the obtaining module acquires the write instruction sent by the processor instruction pipeline, when the data format supported by the external storage is consistent with the data format supported by the processor instruction pipeline, rotate the eighth data left by a fourth number of bytes to obtain the tenth data of 2P bytes, wherein Index4 = Address[n:0], and Index4 represents the fourth number of bytes;

The writing module is further configured to write the tenth data into the Cache.
A data processing device, comprising: a processor, a memory, a bus, and a communication interface;

The memory is configured to store computer-executable instructions, and the processor is coupled to the memory via the bus; when the data processing device is in operation, the processor executes the computer-executable instructions stored in the memory, so that the data processing device performs the data processing method according to any one of claims 1 to 6.