CN118295709A - Instruction processing method, chip, electronic equipment and storage medium - Google Patents
- Publication number
- CN118295709A CN118295709A CN202410466327.1A CN202410466327A CN118295709A CN 118295709 A CN118295709 A CN 118295709A CN 202410466327 A CN202410466327 A CN 202410466327A CN 118295709 A CN118295709 A CN 118295709A
- Authority
- CN
- China
- Prior art keywords
- cache
- instructions
- instruction
- function
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30047—Prefetch instructions; cache control instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3869—Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
本申请涉及指令处理技术领域,特别涉及一种指令处理方法、芯片、电子设备及存储介质。该方法中,在不可内联的子函数的函数入口插入第一预取指令,以及在不可内联的子函数的函数末尾插入第一预取指令对应的第一标签,在调整至编译单元的开头的主函数的函数入口插入第二预取指令,以及在编译单元的末尾插入第二预取指令对应的第二标签。基于第一预取指令、第一标签、第二预取指令和第二标签,分别确定不可内联的子函数和编译单元的指令数量,从而可以结合缓存对应的剩余存储空间,确定预取指令数量。如此,可以提高确定的预取指令数量的准确性,从而可以实现加载指令数量大小合适的函数存储至缓存,以减少指令缺失的情况。
The present application relates to the technical field of instruction processing, and in particular to an instruction processing method, a chip, an electronic device and a storage medium. In the method, a first prefetch instruction is inserted at the function entry of a sub-function that cannot be inlined, and a first label corresponding to the first prefetch instruction is inserted at the end of that sub-function; a second prefetch instruction is inserted at the function entry of a main function that has been moved to the beginning of the compilation unit, and a second label corresponding to the second prefetch instruction is inserted at the end of the compilation unit. Based on the first prefetch instruction, the first label, the second prefetch instruction and the second label, the instruction counts of the non-inlinable sub-function and of the compilation unit are determined respectively, so that the number of instructions to prefetch can be determined in combination with the remaining storage space of the cache. In this way, the accuracy of the determined prefetch count is improved, so that functions with a suitable number of instructions can be loaded into the cache, reducing instruction cache misses.
Description
技术领域Technical Field
本申请涉及指令处理技术领域,特别涉及一种指令处理方法、芯片、电子设备及存储介质。The present application relates to the technical field of instruction processing, and in particular to an instruction processing method, a chip, an electronic device and a storage medium.
背景技术Background technique
在中央处理器(central processing unit,CPU)执行指令的过程中,通常是先从内存中读取指令,并将指令存储至缓存(cache)中,当再次需要执行指令时,直接查找缓存中是否存在相关指令。如果缓存中存在相关指令,即指令缓存命中,则中央处理器可以直接从缓存中读取指令。如果缓存中不存在相关指令,即指令缓存缺失,则中央处理器需要再次从内存中读取指令,并将指令存储至缓存中。如此,在指令缓存缺失的情况下,将会导致中央处理器的处理效率降低。In the process of the central processing unit (CPU) executing instructions, it usually reads the instructions from the memory first and stores the instructions in the cache. When the instructions need to be executed again, it directly checks whether there are relevant instructions in the cache. If there are relevant instructions in the cache, that is, the instruction cache hits, the central processing unit can read the instructions directly from the cache. If there are no relevant instructions in the cache, that is, the instruction cache misses, the central processing unit needs to read the instructions from the memory again and store the instructions in the cache. In this way, in the case of an instruction cache miss, the processing efficiency of the central processing unit will be reduced.
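The fetch-on-miss flow described above can be sketched as follows. This is an illustrative Python model only, not the patent's implementation; the dictionary-based single-level cache and the address values are assumptions for demonstration.

```python
def fetch_instruction(addr, cache, memory):
    """Return (instruction, source): read from the cache on a hit,
    otherwise read from memory and fill the cache for next time."""
    if addr in cache:                # instruction cache hit
        return cache[addr], "cache"
    instr = memory[addr]             # instruction cache miss: go to memory
    cache[addr] = instr              # store into the cache for reuse
    return instr, "memory"

memory = {0x100: "add r1, r2", 0x104: "mul r3, r4"}
cache = {}
_, src1 = fetch_instruction(0x100, cache, memory)  # first access: miss
_, src2 = fetch_instruction(0x100, cache, memory)  # second access: hit
```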
发明内容Summary of the invention
为解决在指令缓存缺失的情况下,将会导致中央处理器的处理效率降低的问题,本申请实施例提供一种指令处理方法、芯片、电子设备及存储介质。In order to solve the problem that the processing efficiency of the central processing unit will be reduced when the instruction cache is missed, the embodiments of the present application provide an instruction processing method, a chip, an electronic device and a storage medium.
第一方面,本申请实施例提供一种指令处理方法,应用于电子设备,电子设备包括处理器和内存,并且处理器包括缓存,方法包括:检测到编译单元的执行指令,其中,编译单元包括目标指令;基于目标指令的数量和缓存的剩余存储空间,确定出对应编译单元的预取指令数量;从内存存储的对应编译单元的指令中,读取预取指令数量的目标指令,并存储至缓存。In a first aspect, an embodiment of the present application provides an instruction processing method, which is applied to an electronic device, wherein the electronic device includes a processor and a memory, and the processor includes a cache, the method comprising: detecting an execution instruction of a compilation unit, wherein the compilation unit includes a target instruction; based on the number of target instructions and the remaining storage space of the cache, determining the number of prefetch instructions of the corresponding compilation unit; reading the target instructions of the number of prefetch instructions from the instructions of the corresponding compilation unit stored in the memory, and storing them in the cache.
基于上述方案,可以提高确定的预取指令数量的准确性,从而可以实现加载指令数量大小合适的函数存储至缓存,以减少指令缺失的情况,进而可以提高处理器的处理效率。Based on the above scheme, the accuracy of the determined number of pre-fetch instructions can be improved, so that functions with an appropriate number of loading instructions can be stored in the cache to reduce instruction misses, thereby improving the processing efficiency of the processor.
可以理解,缓存可以是指下文中提及的目标缓存。不同的处理器具有不同的目标缓存。由于不同的处理器具有不同的分级存储架构,例如,通用存储器可以包括初级缓存(L0 I-cache)、一级缓存(L1 I-cache)、二级缓存(L2 I-cache)和三级缓存(L3 I-cache)。专用存储器(例如GPU、NPU等)可以包括初级缓存(L0 I-cache)和一级缓存(L1 I-cache)。因此,在通用存储器执行指令的过程中,目标缓存可以为初级缓存(L0 I-cache)、一级缓存(L1 I-cache)、二级缓存(L2 I-cache)和三级缓存(L3 I-cache)。在专用存储器执行指令的过程中,目标缓存可以为初级缓存(L0 I-cache)和一级缓存(L1 I-cache)。It can be understood that the cache may refer to the target cache mentioned below. Different processors have different target caches. Since different processors have different hierarchical storage architectures, for example, a general-purpose memory may include a primary cache (L0 I-cache), a first-level cache (L1 I-cache), a second-level cache (L2 I-cache), and a third-level cache (L3 I-cache). A dedicated memory (such as a GPU, an NPU, etc.) may include a primary cache (L0 I-cache) and a first-level cache (L1 I-cache). Therefore, in the process of executing instructions in a general-purpose memory, the target cache may be a primary cache (L0 I-cache), a first-level cache (L1 I-cache), a second-level cache (L2 I-cache), and a third-level cache (L3 I-cache). In the process of executing instructions in a dedicated memory, the target cache may be a primary cache (L0 I-cache) and a first-level cache (L1 I-cache).
在第一方面的一些可选的实现方式中,目标指令包括第一指令和第二指令,第一指令为第一子函数的指令,第二指令为第一函数以及第一子函数的指令,其中第一函数调用第一子函数,第一子函数为不可内联的子函数。In some optional implementations of the first aspect, the target instruction includes a first instruction and a second instruction, the first instruction is an instruction of a first sub-function, the second instruction is an instruction of a first function and the first sub-function, wherein the first function calls the first sub-function, and the first sub-function is a sub-function that cannot be inlined.
在第一方面的一些可选的实现方式中,通过以下方式确定目标指令的数量:在第一子函数中插入第一标签组,以及在编译单元中插入第二标签组,其中第一标签组和第二标签组不同;基于第一标签组,确定第一指令的第一数量;基于第二标签组,确定第二指令的第二数量;基于第一数量和第二数量,确定目标指令的数量。In some optional implementations of the first aspect, the number of target instructions is determined in the following manner: inserting a first label group in a first sub-function, and inserting a second label group in a compilation unit, wherein the first label group and the second label group are different; based on the first label group, determining a first number of first instructions; based on the second label group, determining a second number of second instructions; based on the first number and the second number, determining the number of target instructions.
在第一方面的一些可选的实现方式中,第一标签组包括第一预取指令和第一标签,在第一子函数中插入第一标签组,包括:在编译第一子函数时,在第一子函数的函数入口插入第一预取指令,以及在第一子函数的函数末尾插入第一标签。In some optional implementations of the first aspect, the first label group includes a first prefetch instruction and a first label, and inserting the first label group in the first sub-function includes: when compiling the first sub-function, inserting the first prefetch instruction at the function entry of the first sub-function, and inserting the first label at the function end of the first sub-function.
例如,在第一子函数的函数入口插入第一预取指令,例如“prefetch __func1_end”,以及在第一子函数的函数末尾插入第一标签,例如“__func1_end”。For example, a first prefetch instruction, such as "prefetch __func1_end", is inserted at the function entry of the first sub-function, and a first label, such as "__func1_end", is inserted at the end of the first sub-function.
在第一方面的一些可选的实现方式中,第二标签组包括第二预取指令和第二标签,在编译单元中插入第二标签组,包括:在编译编译单元时,将第一函数调整至编译单元的开头,在第一函数的函数入口插入第二预取指令,以及在编译单元的末尾插入第二标签。In some optional implementations of the first aspect, the second label group includes a second prefetch instruction and a second label, and inserting the second label group in the compilation unit includes: when compiling the compilation unit, adjusting the first function to the beginning of the compilation unit, inserting the second prefetch instruction at the function entry of the first function, and inserting the second label at the end of the compilation unit.
例如,在第一函数(即下文中提及的主函数)的函数入口插入第二预取指令,例如“prefetch __dummy”,以及在编译单元的末尾插入第二标签,例如“__dummy”。可以理解,“__dummy”可以为空函数,也可以为非空函数,该函数为不可内联的子函数。For example, a second prefetch instruction, such as "prefetch __dummy", is inserted at the function entry of the first function (i.e., the main function mentioned below), and a second label, such as "__dummy", is inserted at the end of the compilation unit. It can be understood that "__dummy" can be an empty function or a non-empty function, and that function is a sub-function that cannot be inlined.
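Under this labeling scheme, the instruction counts can be derived from the distance between a function's entry address and its end label. The following Python sketch is not the patent's toolchain; the fixed 4-byte instruction width and the addresses are illustrative assumptions.

```python
INSTR_SIZE = 4  # assumed fixed-width instruction encoding

def instruction_count(entry_addr, end_label_addr, instr_size=INSTR_SIZE):
    """Number of instructions between a function entry and its end label."""
    return (end_label_addr - entry_addr) // instr_size

# first sub-function: entry at 0x1000, "__func1_end" resolved to 0x1080
first_count = instruction_count(0x1000, 0x1080)   # 32 instructions

# compilation unit: main function moved to its start at 0x0800,
# "__dummy" resolved to 0x1100 at the end of the unit
second_count = instruction_count(0x0800, 0x1100)  # 576 instructions
```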
在第一方面的一些可选的实现方式中,预取指令数量为第一指令的第一数量、或者第二指令的第二数量中的任意一种。In some optional implementations of the first aspect, the number of prefetched instructions is any one of a first number of first instructions or a second number of second instructions.
在第一方面的一些可选的实现中,基于目标指令的数量和缓存的剩余存储空间,确定出对应编译单元的预取指令数量,包括:对应于缓存的剩余存储空间大于第二数量,根据第二数量,确定预取指令数量;或者对应于缓存的剩余存储空间小于或者等于第二数量,且大于第一数量,根据第一数量,确定预取指令数量;或者对应于缓存的剩余存储空间小于第一数量,根据缓存的剩余存储空间,确定预取指令数量。In some optional implementations of the first aspect, the number of prefetch instructions of the corresponding compilation unit is determined based on the number of target instructions and the remaining storage space of the cache, including: corresponding to the remaining storage space of the cache being greater than a second number, the number of prefetch instructions is determined according to the second number; or corresponding to the remaining storage space of the cache being less than or equal to the second number and greater than the first number, the number of prefetch instructions is determined according to the first number; or corresponding to the remaining storage space of the cache being less than the first number, the number of prefetch instructions is determined according to the remaining storage space of the cache.
在一些可选的实现方式中,若主函数的指令数量和不可内联的子函数的指令数量之和小于或等于目标缓存对应的剩余存储空间,则可以将主函数的指令数量和不可内联的子函数的指令数量之和,作为预取指令数量。若主函数的指令数量和不可内联的子函数的指令数量之和大于目标缓存对应的剩余存储空间,且不可内联的子函数的指令数量小于或者等于目标缓存对应的剩余存储空间,则可以将不可内联的子函数的指令数量,作为预取指令数量。若不可内联的子函数的指令数量大于目标缓存对应的剩余存储空间,则可以将目标缓存对应的存储空间的尺寸作为预取指令数量。In some optional implementations, if the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined is less than or equal to the remaining storage space corresponding to the target cache, the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined can be used as the number of pre-fetched instructions. If the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined is greater than the remaining storage space corresponding to the target cache, and the number of instructions of the sub-functions that cannot be inlined is less than or equal to the remaining storage space corresponding to the target cache, the number of instructions of the sub-functions that cannot be inlined can be used as the number of pre-fetched instructions. If the number of instructions of the sub-functions that cannot be inlined is greater than the remaining storage space corresponding to the target cache, the size of the storage space corresponding to the target cache can be used as the number of pre-fetched instructions.
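The three-way selection above can be written as a small helper. In this sketch the remaining cache space is assumed to be measured in instructions, a simplification of the byte or cache-line accounting a real implementation would use.

```python
def prefetch_count(first_count, second_count, cache_remaining):
    """first_count: instruction count of the non-inlinable sub-function;
    second_count: instruction count of the whole compilation unit
    (main function plus non-inlinable sub-function)."""
    if cache_remaining > second_count:
        return second_count      # the whole unit fits: prefetch all of it
    if cache_remaining > first_count:
        return first_count       # only the sub-function fits
    return cache_remaining       # prefetch as much as the cache can hold

# e.g. sub-function of 32 instructions inside a 576-instruction unit
full = prefetch_count(32, 576, 1000)   # whole unit fits
sub_only = prefetch_count(32, 576, 100)  # only the sub-function fits
capped = prefetch_count(32, 576, 16)   # capped by remaining space
```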
在第一方面的一些可选的实现中,存储到缓存的请求会被硬件根据总线处理数据大小拆分为N条缓存请求,N条缓存请求包括第一缓存请求和第二缓存请求;存储至缓存,包括:向缓存发送第一缓存请求;在接收到缓存的返回结果前,向缓存发送第二缓存请求,第一缓存请求所请求的指令和第二缓存请求所请求的指令不同。In some optional implementations of the first aspect, a request to store in the cache may be split by hardware into N cache requests according to the size of data processed by the bus, and the N cache requests may include a first cache request and a second cache request; storing in the cache may include: sending a first cache request to the cache; before receiving a return result from the cache, sending a second cache request to the cache, and the instruction requested by the first cache request may be different from the instruction requested by the second cache request.
可以理解的是,第一缓存请求与第二缓存请求,直至第N缓存请求可以形成流水。It can be understood that the first cache request and the second cache request, until the Nth cache request can form a pipeline.
可以理解,对于预取指令,硬件需要将预取操作形成流水,即硬件需要快速执行预取指令,如此才能够提高处理器处理效率。其中将预取操作形成流水可以是指,例如预取指令需要预取10条缓存行,在向目标缓存发送第一条缓存请求时,无需等待返回结果即可发送第二条缓存请求,以此类推,如此,相较于在发送第一条缓存请求并接收到返回结果后,再发送第二缓存请求,可以提高处理器处理效率。It can be understood that for prefetch instructions, the hardware needs to form a pipeline for prefetch operations, that is, the hardware needs to execute prefetch instructions quickly, so as to improve the processing efficiency of the processor. Forming a pipeline for prefetch operations can mean, for example, that a prefetch instruction needs to prefetch 10 cache lines. When sending the first cache request to the target cache, the second cache request can be sent without waiting for the return result, and so on. In this way, compared with sending the second cache request after sending the first cache request and receiving the return result, the processing efficiency of the processor can be improved.
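The benefit of pipelining the N cache requests can be seen with a toy timing model (illustrative numbers only: one request issued per cycle, each taking `latency` cycles to return):

```python
def serial_cycles(n_requests, latency):
    """Wait for each request's result before issuing the next one."""
    return n_requests * latency

def pipelined_cycles(n_requests, latency):
    """Issue one request per cycle without waiting for results; the last
    request is issued at cycle n_requests - 1 and returns `latency`
    cycles later."""
    return (n_requests - 1) + latency

serial = serial_cycles(10, 20)         # 10 cache lines, 20-cycle latency
pipelined = pipelined_cycles(10, 20)   # requests overlap in flight
```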
第二方面,本申请实施例提供一种芯片,包括处理器和内存,并且处理器包括缓存和处理单元,处理单元用于从内存存储的对应编译单元的指令中,读取预取指令数量的目标指令,并存储至缓存,其中预取指令数量是基于目标指令的数量和缓存的剩余存储空间确定的。In a second aspect, an embodiment of the present application provides a chip comprising a processor and a memory, and the processor comprises a cache and a processing unit, the processing unit being used to read a target instruction of a pre-fetch instruction quantity from the instructions of the corresponding compilation unit stored in the memory, and store them in the cache, wherein the number of pre-fetch instructions is determined based on the number of target instructions and the remaining storage space of the cache.
第三方面,本申请实施例提供一种电子设备,包括:上述芯片。In a third aspect, an embodiment of the present application provides an electronic device, including: the above-mentioned chip.
可以理解,芯片可以包括处理器和内存,并且处理器包括缓存和处理单元,处理单元用于从内存存储的对应编译单元的指令中,读取预取指令数量的目标指令,并存储至缓存,其中预取指令数量是基于目标指令的数量和缓存的剩余存储空间确定的。It can be understood that the chip may include a processor and a memory, and the processor includes a cache and a processing unit, the processing unit is used to read the target instructions of the pre-fetch instruction quantity from the instructions of the corresponding compilation unit stored in the memory, and store them in the cache, wherein the pre-fetch instruction quantity is determined based on the number of target instructions and the remaining storage space of the cache.
第四方面,本申请实施例提供一种可读存储介质,可读存储介质上存储有指令,指令在电子设备上执行时使得电子设备执行上述指令处理方法。In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which instructions are stored. When the instructions are executed on an electronic device, the electronic device executes the above-mentioned instruction processing method.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1根据本申请的一些实例,示出了一种计算机结构的示意图;FIG1 is a schematic diagram showing a computer structure according to some examples of the present application;
图2根据本申请的一些实例,示出了一种从内存中读取预取指令数量的函数存储至缓存的示意图;FIG2 is a schematic diagram showing a function of reading a number of prefetch instructions from a memory and storing it in a cache according to some examples of the present application;
图3根据本申请的一些实例,示出了一种指令处理方法的流程示意图;FIG3 is a flow chart showing a method for processing instructions according to some examples of the present application;
图4根据本申请的一些实例,示出了一种在函数入口插入主标签,以及在函数末尾插入主标签对应的副标签后的主函数的示意图;FIG4 is a schematic diagram showing a main function after inserting a main tag at the function entry and inserting a sub-tag corresponding to the main tag at the end of the function according to some examples of the present application;
图5根据本申请的一些实例,示出了一种在函数入口插入主标签,以及在函数末尾插入主标签对应的副标签后的不可内联的子函数的示意图;FIG5 is a schematic diagram showing a method of inserting a main label at a function entry and a sub-label corresponding to the main label at the end of a function, according to some examples of the present application;
图6根据本申请的一些实例,示出了一种电子设备的结构示意图。FIG6 shows a schematic structural diagram of an electronic device according to some examples of the present application.
具体实施方式Detailed ways
本申请的说明性实施例包括但不限于一种指令处理方法、芯片、电子设备及存储介质。The illustrative embodiments of the present application include but are not limited to an instruction processing method, a chip, an electronic device, and a storage medium.
可以理解,本申请实施例提及的方案可以适用于任意处理器(central processing unit,CPU),例如通用处理器、图形处理器(graphics processing unit,GPU)、神经网络计算处理器(neural network processing unit,NPU)等。It can be understood that the solutions mentioned in the embodiments of the present application are applicable to any processor (central processing unit, CPU), such as a general-purpose processor, a graphics processing unit (GPU), a neural network processing unit (NPU), etc.
下面在介绍本申请实施的技术方案之前,首先对本申请实施例涉及的专业术语进行解释。Before introducing the technical solution implemented in the present application, the professional terms involved in the embodiments of the present application are first explained.
(1)函数:一种通过多条指令组合来实现特定功能的代码段。函数可以包括主函数和子函数,其中子函数可以包含可内联(inline)和不可内联(noinline)等属性。一般而言,一个能运行在操作系统上的程序,都需要一个主函数,主函数意味着建立一个独立进程,且该进程是程序入口,对其他子函数进行调用。可内联的子函数通常采用修饰符“inline”进行定义。不可内联的子函数是指没有被频繁调用的子函数,即未采用修饰符“inline”进行定义的子函数。(1) Function: a code segment that implements a specific piece of functionality through a combination of multiple instructions. Functions can include a main function and sub-functions, where sub-functions may have attributes such as inlinable (inline) and non-inlinable (noinline). Generally speaking, a program that runs on an operating system needs a main function; the main function establishes an independent process, which is the program entry point and calls the other sub-functions. Inlinable sub-functions are usually defined with the modifier "inline". Non-inlinable sub-functions are sub-functions that are not called frequently, i.e., sub-functions that are not defined with the modifier "inline".
(2)多级存储架构:在计算机中,通常采用多级存储架构。参考如图1所示的计算机结构,计算机100可以包括处理器110和内存120,处理器110中可以设置缓存130和处理单元140。其中,缓存130可以为多级缓存,例如初级缓存(L0 I-cache)、一级缓存(L1 I-cache)、二级缓存(L2 I-cache)、三级缓存(L3 I-cache)…N级缓存(Ln I-cache)。(2) Multi-level storage architecture: computers usually adopt a multi-level storage architecture. Referring to the computer structure shown in FIG1, the computer 100 may include a processor 110 and a memory 120, and the processor 110 may be provided with a cache 130 and a processing unit 140. The cache 130 may be a multi-level cache, such as a primary cache (L0 I-cache), a first-level cache (L1 I-cache), a second-level cache (L2 I-cache), a third-level cache (L3 I-cache) ... an N-level cache (Ln I-cache).
可以理解,在一些实例中,缓存130对应的存储空间的尺寸小于内存120对应的存储空间的尺寸,初级缓存对应的存储空间的尺寸小于一级缓存对应的存储空间的尺寸,一级缓存对应的存储空间的尺寸小于二级缓存对应的存储空间的尺寸,以此类推。并且,由于缓存130在处理器110中,处理单元140与缓存130之间的距离小于处理单元140与内存120之间的距离,因此,处理单元140从缓存130中读取指令的速度高于从内存120中读取指令的速度。It can be understood that in some examples, the size of the storage space corresponding to the cache 130 is smaller than the size of the storage space corresponding to the memory 120, the size of the storage space corresponding to the primary cache is smaller than the size of the storage space corresponding to the first-level cache, the size of the storage space corresponding to the first-level cache is smaller than the size of the storage space corresponding to the second-level cache, and so on. In addition, since the cache 130 is in the processor 110, the distance between the processing unit 140 and the cache 130 is smaller than the distance between the processing unit 140 and the memory 120, and therefore, the speed at which the processing unit 140 reads instructions from the cache 130 is higher than the speed at which it reads instructions from the memory 120.
(3)缓存行(cache line):缓存130中的最小缓存单位称为缓存行,缓存行的尺寸可以基于缓存130的尺寸确定。例如,缓存130的尺寸为64字节(byte),则可以将缓存130平均分为8个尺寸为8字节的缓存行,也可以将缓存130平均为分为16个尺寸为4字节的缓存行。在实际应用中,缓存行的尺寸通常为4字节到128字节之间,例如4字节、8字节、16字节、64字节、128字节等。(3) Cache line: The smallest cache unit in cache 130 is called a cache line, and the size of a cache line can be determined based on the size of cache 130. For example, if the size of cache 130 is 64 bytes, cache 130 can be evenly divided into 8 cache lines of 8 bytes, or cache 130 can be evenly divided into 16 cache lines of 4 bytes. In practical applications, the size of a cache line is usually between 4 bytes and 128 bytes, such as 4 bytes, 8 bytes, 16 bytes, 64 bytes, 128 bytes, etc.
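The cache-line arithmetic above amounts to dividing the cache size by the line size. A minimal sketch, assuming the cache size divides evenly by the line size as in the 64-byte example:

```python
def num_cache_lines(cache_size_bytes, line_size_bytes):
    """Number of cache lines in a cache of the given size."""
    assert cache_size_bytes % line_size_bytes == 0
    return cache_size_bytes // line_size_bytes

lines_8b = num_cache_lines(64, 8)   # 64-byte cache, 8-byte lines -> 8 lines
lines_4b = num_cache_lines(64, 4)   # 64-byte cache, 4-byte lines -> 16 lines
```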
(4)可内联的子函数:通常,可内联的子函数可以采用修饰符“inline”进行定义。(4) Inlineable sub-functions: Usually, inlineable sub-functions can be defined using the modifier "inline".
在处理器110需要执行指令时,首先会查找缓存130中是否存在相关指令。例如,先查找初级缓存中是否存在相关指令,如果初级缓存中不存在相关指令,可以查找一级缓存中是否存在相关指令,如果一级缓存中不存在相关指令,则会查找二级缓存中是否存在相关指令,如果二级缓存中不存在相关指令,则会查找三级缓存中是否存在相关指令,以此类推。When the processor 110 needs to execute an instruction, it will first check whether there is a related instruction in the cache 130. For example, it will first check whether there is a related instruction in the primary cache. If there is no related instruction in the primary cache, it can check whether there is a related instruction in the first-level cache. If there is no related instruction in the first-level cache, it will check whether there is a related instruction in the second-level cache. If there is no related instruction in the second-level cache, it will check whether there is a related instruction in the third-level cache, and so on.
如果缓存130中存在相关指令,即指令缓存命中,则处理器110可以直接从缓存130中读取指令。如果缓存130中不存在相关指令,即指令缓存缺失,则处理器110可以从内存120中读取指令。由于缓存130与处理器110之间的距离小于内存120与处理器110之间的距离,处理器110访问缓存130的速度高于处理器110访问内存120的速度,如此,在指令缓存缺失的情况下,将会导致处理器110的处理效率降低。If there are relevant instructions in the cache 130, that is, an instruction cache hit, the processor 110 can directly read the instructions from the cache 130. If there are no relevant instructions in the cache 130, that is, an instruction cache miss, the processor 110 can read the instructions from the memory 120. Since the distance between the cache 130 and the processor 110 is smaller than the distance between the memory 120 and the processor 110, the speed at which the processor 110 accesses the cache 130 is higher than the speed at which the processor 110 accesses the memory 120. In this way, in the case of an instruction cache miss, the processing efficiency of the processor 110 will be reduced.
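The level-by-level lookup described above can be sketched as a simple search over the cache hierarchy. This is an illustrative model only; real lookups are performed by set/tag hardware, not dictionaries.

```python
def find_in_caches(addr, cache_levels):
    """Search L0, L1, L2, ... in order; return the index of the first
    level that holds the instruction, or None on a miss at every level."""
    for level, cache in enumerate(cache_levels):
        if addr in cache:
            return level
    return None

l0, l1, l2 = {}, {0x200: "sub"}, {0x200: "sub", 0x300: "jmp"}
hit_level = find_in_caches(0x200, [l0, l1, l2])   # found in L1
miss = find_in_caches(0x400, [l0, l1, l2])        # missing everywhere
```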
一般情况下,可以采用指令预取机制来缓解指令缺失。例如,通过分析处理器110执行指令的规律,提前将预设字节数或者预设缓存行的、可能需要再次执行的指令存储至缓存130中。然而,当预设字节数或者预设缓存行设置的较大时,由于缓存130对应的存储空间的尺寸较小,可能无法将预取的指令全部存储至缓存130。当预设字节数或者预设缓存行设置的较小时,可能导致获取的指令不足以缓解指令缺失的情况。因此,可以理解,现有指令预取机制无法准确确定预取的指令的指令数量。In general, an instruction prefetch mechanism can be used to alleviate instruction missing. For example, by analyzing the law of instruction execution by the processor 110, the instructions of the preset number of bytes or preset cache lines that may need to be executed again are stored in the cache 130 in advance. However, when the preset number of bytes or the preset cache lines are set to be large, due to the small size of the storage space corresponding to the cache 130, it may not be possible to store all the prefetched instructions in the cache 130. When the preset number of bytes or the preset cache lines are set to be small, the instructions obtained may not be enough to alleviate the situation of instruction missing. Therefore, it can be understood that the existing instruction prefetch mechanism cannot accurately determine the instruction number of the prefetched instructions.
为了解决上述问题,本申请实施例提供一种指令处理方法。在该方法中,由于在编译过程中,编译器会自动将频繁被调用的可内联的子函数嵌入主函数中,因此,可以在不可内联的子函数的函数入口插入第一预取指令,以及在不可内联的子函数的函数末尾插入第一预取指令对应的第一标签,在调整至编译单元的开头的主函数的函数入口插入第二预取指令,以及在编译单元的末尾插入第二预取指令对应的第二标签。基于第一预取指令、第一标签、第二预取指令和第二标签,分别确定不可内联的子函数和编译单元的指令数量,基于不可内联的子函数和编译单元的指令数量、以及多级缓存中目标缓存对应的存储空间的尺寸,确定预取指令数量。在运行过程中,基于函数的执行时间顺序和预取指令数量,从内存中读取预取指令数量的函数存储至缓存。如此,可以提高确定的预取指令数量的准确性,从而可以实现加载指令数量大小合适的函数存储至缓存,以减少指令缺失的情况,进而可以提高处理器的处理效率。In order to solve the above problems, an embodiment of the present application provides an instruction processing method. In this method, since the compiler automatically inlines frequently called inlinable sub-functions into the main function during compilation, a first prefetch instruction can be inserted at the function entry of a sub-function that cannot be inlined, with a first label corresponding to the first prefetch instruction inserted at the end of that sub-function, and a second prefetch instruction can be inserted at the function entry of the main function, which is moved to the beginning of the compilation unit, with a second label corresponding to the second prefetch instruction inserted at the end of the compilation unit. Based on the first prefetch instruction, the first label, the second prefetch instruction and the second label, the instruction counts of the non-inlinable sub-function and of the compilation unit are determined respectively; based on these instruction counts and the size of the storage space corresponding to the target cache in the multi-level cache, the number of instructions to prefetch is determined. At run time, based on the execution order of the functions and the prefetch count, that number of instructions is read from memory and stored into the cache. In this way, the accuracy of the determined prefetch count is improved, so that functions with a suitable number of instructions can be loaded into the cache, reducing instruction cache misses and thereby improving the processing efficiency of the processor.
在一些具体的实现中,主函数可以称为第一函数,不可内联的子函数可以称为第一子函数,主函数的指令和不可内联的子函数的指令可以统称为目标指令。In some specific implementations, the main function may be referred to as a first function, the sub-function that cannot be inlined may be referred to as a first sub-function, and the instructions of the main function and the instructions of the sub-function that cannot be inlined may be collectively referred to as target instructions.
在一些具体的实现中,在链接过程中,若主函数的指令数量和不可内联的子函数的指令数量之和小于或等于目标缓存对应的剩余存储空间,则可以将主函数的指令数量和不可内联的子函数的指令数量之和,作为预取指令数量。若主函数的指令数量和不可内联的子函数的指令数量之和大于目标缓存对应的剩余存储空间,则可以将目标缓存对应的存储空间的尺寸作为预取指令数量。In some specific implementations, during the linking process, if the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined is less than or equal to the remaining storage space corresponding to the target cache, the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined can be used as the number of pre-fetched instructions. If the sum of the number of instructions of the main function and the number of instructions of the sub-functions that cannot be inlined is greater than the remaining storage space corresponding to the target cache, the size of the storage space corresponding to the target cache can be used as the number of pre-fetched instructions.
若不可内联的子函数的指令数量小于或等于目标缓存对应的剩余存储空间,则可以将不可内联的子函数的指令数量作为预取指令数量。若不可内联的子函数的指令数量大于目标缓存对应的剩余存储空间,则可以将目标缓存对应的存储空间的尺寸作为预取指令数量。If the number of instructions of the sub-function that cannot be inlined is less than or equal to the remaining storage space corresponding to the target cache, the number of instructions of the sub-function that cannot be inlined can be used as the number of pre-fetched instructions. If the number of instructions of the sub-function that cannot be inlined is greater than the remaining storage space corresponding to the target cache, the size of the storage space corresponding to the target cache can be used as the number of pre-fetched instructions.
It will be understood that different processors have different target caches, because different processors have different hierarchical storage architectures. For example, a general-purpose processor may include a primary cache (L0 I-cache), a first-level cache (L1 I-cache), a second-level cache (L2 I-cache), and a third-level cache (L3 I-cache), while a dedicated processor (such as a GPU or NPU) may include only an L0 I-cache and an L1 I-cache. Therefore, when a general-purpose processor executes instructions, the target cache may be the L0, L1, L2, or L3 I-cache; when a dedicated processor executes instructions, the target cache may be the L0 or L1 I-cache.
In some implementations, FIG. 2 shows a schematic diagram of reading a prefetch-count's worth of function instructions from memory and storing them in the cache. Suppose the functions in memory 120 are executed in the order function 1, function 2, ..., function n, where function 1 executes before function 2, and function 2 consists of instruction 1, instruction 2, and instruction 3. Suppose further that the storage space of the L0 cache in cache 130 is larger than instruction 1 but smaller than the combined size of instructions 1, 2, and 3, and that the storage space of the L1 cache in cache 130 is larger than the combined size of instructions 1, 2, and 3.
When function 1 finishes and function 2 begins executing, instructions 1, 2, and 3 can be read from memory 120 and stored in the L1 cache of cache 130. In addition, instruction 1 can be read from memory 120 and stored in the L0 cache of cache 130.
It will be understood that for a frequently called sub-function marked as inlinable, the compiler automatically embeds the sub-function into the main function during compilation, which reduces function-call overhead and improves processor efficiency.
The instruction processing method of the embodiments of the present application is introduced below. FIG. 3 shows a flowchart of an instruction processing method, which may include:
301: When an execution instruction for a compilation unit is detected, determine the prefetch instruction count for the compilation unit according to the number of target instructions in the compilation unit and the remaining storage space of the cache.
It will be understood that the target instructions may include the instructions of the main function and the instructions of the non-inlinable sub-functions.
In some optional examples, since the compiler automatically embeds frequently called inlinable sub-functions into the main function during compilation, a first prefetch instruction (for example, "prefetch __func1_end") can be inserted at the entry of each non-inlinable sub-function, and a first label (for example, "__func1_end") can be inserted at its end. The number of instructions in the non-inlinable sub-function, i.e. the first number, can then be determined from the first prefetch instruction and the first label.
Furthermore, the compiler may first traverse all functions in the compilation unit, identify the main function by its features, and move it to the beginning of the compilation unit. It may then insert a second prefetch instruction (for example, "prefetch __dummy") at the entry of the main function, and a second label (for example, "__dummy") at the end of the compilation unit. The number of instructions in the compilation unit, i.e. the second number, can then be determined from the second prefetch instruction and the second label.
The number of target instructions can thus be determined from the first number and the second number.
302: Read the prefetch-count number of target instructions from the compilation unit's instructions stored in memory, and store them in the cache.
In some optional examples, when the remaining storage space of the cache is greater than the second number, the prefetch instruction count is determined from the second number.
When the remaining storage space of the cache is less than or equal to the second number and greater than the first number, the prefetch instruction count is determined from the first number.
When the remaining storage space of the cache is less than the first number, the prefetch instruction count is determined from the remaining storage space of the cache.
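The three cases above amount to a cascading comparison of the cache's remaining space against the second number (whole compilation unit) and the first number (non-inlinable sub-function). A hedged sketch with hypothetical names:

```python
def choose_prefetch_count(remaining_space: int, first_number: int,
                          second_number: int) -> int:
    """Pick the prefetch instruction count per the three cases above."""
    if remaining_space > second_number:
        return second_number            # whole compilation unit fits
    if remaining_space > first_number:  # first < remaining <= second
        return first_number             # fall back to the sub-function alone
    return remaining_space              # even that does not fit: fill what is free
```

This is only an illustration of the selection logic described in the text, not the claimed implementation.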
Based on the embodiments of the present application, the accuracy of the determined prefetch instruction count can be improved, so that a function with an appropriate number of instructions is loaded into the cache, reducing instruction misses and thereby improving the processing efficiency of the processor.
It will be understood that the process by which a compiler translates source files into an executable file includes a compilation phase and a linking phase. After linking is completed, the run phase can begin: the executable file is executed to realize the functionality of the software it contains.
During the compilation phase, the compiler can automatically insert a hardware-provided prefetch instruction at the entry of each non-inlinable sub-function, and insert a label at the end of each such sub-function. The prefetch instruction may encode information such as an opcode, an offset, a cache type, and a cache length.
For example, a 32-bit prefetch instruction may take the following form:
prefetch | opcode | offset | mode | size
Here, opcode is the operation code, offset is the offset, and mode is the cache type (i.e. the target cache mentioned above). For example, if mode is 2 bits wide, the four encodings b00 through b11 can represent the L0 I-cache, L1 I-cache, L2 I-cache, and L3 I-cache respectively: 00 for the L0 I-cache, 01 for the L1 I-cache, 10 for the L2 I-cache, and 11 for the L3 I-cache. The size field can represent the length of the prefetch, or equivalently the number of cache lines; this length or cache-line count is the prefetch instruction count mentioned above.
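As an illustration of such an encoding, the sketch below packs hypothetical field widths (8-bit opcode, 16-bit offset, 2-bit mode, 6-bit size; the patent does not fix these widths) into one 32-bit word:

```python
def encode_prefetch(opcode: int, offset: int, mode: int, size: int) -> int:
    """Pack prefetch fields into one 32-bit word (assumed field widths)."""
    assert 0 <= opcode < (1 << 8)
    assert 0 <= offset < (1 << 16)
    assert 0 <= mode < (1 << 2)    # b00=L0, b01=L1, b10=L2, b11=L3 I-cache
    assert 0 <= size < (1 << 6)    # prefetch length / number of cache lines
    return (opcode << 24) | (offset << 8) | (mode << 6) | size

def decode_prefetch(word: int):
    """Unpack a 32-bit prefetch word back into its four fields."""
    return ((word >> 24) & 0xFF, (word >> 8) & 0xFFFF,
            (word >> 6) & 0x3, word & 0x3F)
```

The round trip `decode_prefetch(encode_prefetch(...))` returns the original fields, which is a convenient sanity check for any chosen layout.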
During the compilation phase, the compiler first traverses all functions in the compilation unit, identifies the main function by its features, and moves it to the beginning of the compilation unit. Main-function features include, but are not limited to: being the main function, being a function modified by a kernel qualifier, and so on.
During the compilation phase, the compiler may insert a hardware-provided prefetch instruction, namely the first prefetch instruction, at the entry of every non-inlinable sub-function, and insert a label, namely the first label corresponding to the first prefetch instruction, at the end of each such sub-function.
During the compilation phase, the compiler may also automatically insert a second prefetch instruction, for example "prefetch __dummy", at the entry of the main function that has been moved to the beginning of the compilation unit, and insert a dummy function, for example "__dummy", at the end of the compilation unit (i.e. the unit formed by the main function and the non-inlinable sub-functions). It will be understood that this dummy function may be empty or non-empty, and is itself a non-inlinable sub-function.
During the linking phase, the compiler can determine the prefetch instruction count based on the instruction counts of the non-inlinable sub-functions and of the compilation unit, together with the size of the storage space of the target cache in the multi-level cache. Then, in the run phase, the executable file can be loaded into memory 120 and execution jumps to the first instruction.
In some optional examples, if the sum of the instruction counts of the main function and the non-inlinable sub-functions is less than or equal to the remaining storage space of the target cache, that sum may be used as the prefetch instruction count; if the sum exceeds the remaining storage space, the size of the target cache's storage space may be used instead. Similarly, if the instruction count of a non-inlinable sub-function is less than or equal to the remaining storage space of the target cache, that count may be used as the prefetch instruction count; otherwise, the size of the target cache's storage space may be used.
In the run phase, a prefetch-count's worth of function instructions can be read from memory 120 and stored in cache 130, based on the execution order of the functions and the prefetch instruction count. This improves the accuracy of the determined prefetch instruction count, so that a function with an appropriate number of instructions is loaded into the cache, reducing instruction misses and improving the processing efficiency of processor 110.
The automatic insertion of the second prefetch instruction at the entry of the compilation unit (i.e. the unit formed by the main function and the non-inlinable sub-functions), and of the second label at the end of the compilation unit, is described below.
In some implementations, FIG. 4 shows pseudocode for inserting a second prefetch instruction at the entry of the main function moved to the beginning of the compilation unit, and inserting a second label at the end of the compilation unit. The second prefetch instruction may be "prefetch __dummy" and the second label may be "__dummy", where "__dummy" may be an empty function. In this way, the functions of the entire compilation unit can be reordered, with the main function placed first and the compiler-inserted empty function placed last, producing the final function order main-func1-func2-dummy.
It will be understood that in the main function shown in FIG. 4, bl represents a function-call instruction; that is, the main function calls func1 and func2.
It will be understood that the main function may be a main function, a kernel function in OpenCL, a kernel function in CUDA, and so on; the embodiments of the present application impose no specific limitation.
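The reordering described for FIG. 4 can be sketched as a simple pass over the compilation unit's function list (a hypothetical list-of-names representation; a real compiler operates on its intermediate representation):

```python
def reorder_functions(functions, is_main):
    """Place the main function(s) first and append a compiler dummy last."""
    mains = [f for f in functions if is_main(f)]       # identified by features
    others = [f for f in functions if not is_main(f)]  # non-inlinable subs
    return mains + others + ["__dummy"]                # main, subs, dummy

order = reorder_functions(["func1", "main", "func2"],
                          lambda f: f == "main")
# order == ["main", "func1", "func2", "__dummy"]
```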
The insertion of the hardware-provided prefetch instruction at the entry of a non-inlinable sub-function, and of a label at its end, is described below.
In some implementations, FIG. 5 shows pseudocode for inserting a prefetch instruction at the entry of a non-inlinable sub-function and a label at its end, where the prefetch instruction may be "prefetch __func1_end" and the label may be "__func1_end".
It will be understood that during the linking phase, the length of the prefetch can be obtained from the distance between the prefetch instruction and its label, or converted into a number of cache lines; this length or cache-line count is the prefetch instruction count mentioned above.
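Converting the link-time distance between the prefetch instruction and its label into a cache-line count is a ceiling division (the 64-byte line size here is an assumption for illustration):

```python
def cache_lines_for_span(start_addr: int, label_addr: int,
                         line_size: int = 64) -> int:
    """Number of cache lines covering the bytes [start_addr, label_addr)."""
    span_bytes = label_addr - start_addr
    return -(-span_bytes // line_size)   # ceiling division

# A 1000-byte function body spans 16 cache lines of 64 bytes each.
```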
In some optional examples, as described above, the prefetch instruction count may be taken as the sum of the instruction counts of the main function and the non-inlinable sub-functions (or as the instruction count of a single non-inlinable sub-function) when that quantity is less than or equal to the remaining storage space of the target cache, and as the size of the target cache's storage space otherwise.
It will be understood that the prefetch instructions "prefetch __dummy" and "prefetch __func1_end" and the labels "__dummy" and "__func1_end" listed in the embodiments of the present application are only some examples, not an exhaustive list.
It will be understood that for prefetch instructions, the hardware needs to pipeline the prefetch operations, i.e. execute prefetch instructions quickly, in order to improve processor efficiency. Pipelining the prefetch operations means, for example, that if a prefetch instruction needs to prefetch 10 cache lines, the second cache request can be sent to the target cache without waiting for the result of the first, and so on, where the first and second cache requests request different instructions. Compared with sending the second request only after the first has been sent and its result returned, this improves processor efficiency.
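The benefit of pipelining the cache requests can be seen with simple latency arithmetic; the cycle counts below are made up purely for illustration:

```python
def serial_latency(n_requests: int, round_trip: int) -> int:
    """Each request waits for the previous result: n full round trips."""
    return n_requests * round_trip

def pipelined_latency(n_requests: int, round_trip: int, issue: int) -> int:
    """Requests issued back-to-back: only the first pays the full round
    trip; the rest overlap behind it, one per issue interval."""
    return round_trip + (n_requests - 1) * issue

# 10 cache lines, 100-cycle round trip, one new request issued per cycle:
# serial:    10 * 100    = 1000 cycles
# pipelined: 100 + 9 * 1 =  109 cycles
```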
In the embodiments of the present application, determining the prefetch instruction count from the prefetch instruction and its corresponding label improves the accuracy of that count, so that a function with an appropriate number of instructions can be loaded into the cache. For example, the entire function to be executed can be read into the cache in one pass, reducing instruction misses and improving processor efficiency. In addition, by setting an offset in the prefetch instruction, prefetching can start a certain distance forward or backward, reusing the hardware's general caching mechanism. Moreover, because the compiler completes instruction prefetch insertion during compilation, prefetching is transparent to the user, improving programming convenience.
It will be understood that the instruction processing method mentioned in the embodiments of the present application can be applied to an electronic device; the hardware structure of the electronic device is introduced below. FIG. 6 shows a schematic diagram of the hardware structure of an electronic device. The electronic device of the present application may be a desktop computer, a handheld computer, a notebook (laptop) computer, or another electronic device; the structure is introduced below using a generic electronic device as an example.
In one embodiment, the electronic device may include one or more processors 601, system control logic 602 connected to at least one of the processors 601, a system memory 603 connected to the system control logic 602, a non-volatile memory (NVM) 604 connected to the system control logic 602, and an input/output (I/O) device 605 and a network interface 606 connected to the system control logic 602.
In some embodiments, the processor 601 may include one or more single-core or multi-core processors. In some embodiments, the processor 601 may include any combination of general-purpose processors and dedicated processors (for example, graphics processors, application processors, baseband processors, etc.). In embodiments where the electronic device employs an eNB (Evolved Node B) or a RAN (Radio Access Network) controller, the processor 601 may be configured to execute the various described embodiments.
In some examples, a cache 607 may be provided in the processor 601. The cache 607 may be a multi-level cache, such as an L0 I-cache, an L1 I-cache, an L2 I-cache, an L3 I-cache, ... an Ln I-cache. For example, a general-purpose processor may be provided with L0, L1, L2, and L3 I-caches, and a dedicated processor with L0 and L1 I-caches.
In some embodiments, the system control logic 602 may include any suitable interface controller to provide any suitable interface to at least one of the processors 601 and/or to any suitable device or component in communication with the system control logic 602.
In some embodiments, the system control logic 602 may include one or more memory controllers to provide an interface to the system memory 603. The system memory 603 may be used to load and store data and/or instructions 6031. In some embodiments, the memory of the electronic device may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
The non-volatile memory (NVM) 604 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the NVM 604 may include any suitable non-volatile memory such as flash memory, and/or any suitable non-volatile storage device, for example at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, or a DVD (Digital Versatile Disc) drive.
The NVM 604 may comprise part of the storage resources of the apparatus on which the electronic device is installed, or it may be accessible by the device without necessarily being part of it. For example, the NVM 604 may be accessed over a network via the network interface 606.
In particular, the system memory 603 and the NVM 604 may each include a temporary copy and a permanent copy of instructions. The instructions may include instructions that, when executed by at least one of the processors 601, cause the electronic device to implement the instruction processing method mentioned in the embodiments of the present application. In some embodiments, the instructions, hardware, firmware, and/or software components thereof may additionally or alternatively reside in the system control logic 602, the network interface 606, and/or the processors 601.
The network interface 606 may include a transceiver for providing a radio interface for the electronic device so as to communicate with any other suitable devices (such as front-end modules, antennas, etc.) over one or more networks. In some embodiments, the network interface 606 may be integrated with other components of the electronic device. For example, the network interface 606 may be integrated with at least one of the processor 601, the system memory 603, the NVM 604, and a firmware device (not shown) holding instructions which, when executed by at least one of the processors 601, cause the electronic device to implement the instruction processing method mentioned in the embodiments of the present application.
The network interface 606 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, the network interface 606 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 601 may be packaged together with logic for one or more controllers of the system control logic 602 to form a system in package (SiP). In one embodiment, at least one of the processors 601 may be integrated on the same die with logic for one or more controllers of the system control logic 602 to form a system on chip (SoC).
The electronic device may further include an input/output (I/O) device 605. The I/O device 605 may include a user interface enabling a user to interact with the electronic device; peripheral component interfaces are designed so that peripheral components can also interact with the electronic device. In some embodiments, the electronic device further includes sensors for determining at least one of environmental conditions and location information related to the electronic device.
In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., a still camera and/or a video camera), a flashlight (e.g., an LED flash), and a keyboard.
In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of, or interact with, the network interface 606 to communicate with components of a positioning network (for example, Global Positioning System (GPS) satellites).
The hardware structure that the electronic device may have has been introduced above. It will be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device. In other embodiments of the present application, the electronic device may include more or fewer components than shown, or combine certain components, or split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
In the accompanying drawings, some structural or method features may be shown in specific arrangements and/or orders. However, it should be understood that such specific arrangements and/or orders may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of structural or method features in a particular figure does not imply that such features are required in all embodiments; in some embodiments, these features may be omitted or combined with other features.
The embodiments disclosed in the present application may be implemented in hardware, software, firmware, or a combination of these approaches. The embodiments of the present application may be implemented as computer programs or program code executed on a programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described in this application and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
Program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. Where desired, program code may also be implemented in assembly or machine language. In fact, the mechanisms described in this application are not limited in scope to any particular programming language. In either case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Accordingly, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
It should be noted that the units/modules mentioned in the device embodiments of the present application are all logical units/modules. Physically, a logical unit/module may be a physical unit/module, a part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not itself what matters most; rather, the combination of functions they implement is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed by the present application; this does not mean that the above device embodiments contain no other units/modules.
It should be noted that, in the examples and description of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the phrase "including a/an" does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
Although the present application has been illustrated and described with reference to certain embodiments thereof, those of ordinary skill in the art will understand that various changes in form and detail may be made therein without departing from the scope of the present application.
Claims (11)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410466327.1A CN118295709A (en) | 2024-04-17 | 2024-04-17 | Instruction processing method, chip, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118295709A true CN118295709A (en) | 2024-07-05 |
Family
ID=91685747
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410466327.1A Pending CN118295709A (en) | 2024-04-17 | 2024-04-17 | Instruction processing method, chip, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118295709A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||