Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of them. For example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.
Computer instructions are the instructions and commands that direct the operation of a machine; a computer operates by executing instructions arranged in a certain order. Instruction execution mainly involves memory access, data computation, and control-related operations. It has been found that the computer instructions of a program segment include control class instructions and calculation class instructions. If the number of control class instructions is large, the execution efficiency of the computer program is reduced, which in turn reduces the execution efficiency of the processor.
The computer instructions include core computing instructions and a large number of non-computing instructions, for example, the loop instructions include computing instructions and loop control instructions (i.e., the non-computing instructions), where the loop control instructions are used to control the execution process of the loop instructions of each layer in the loop instructions. In the execution process of the loop instruction, a large number of non-computing instructions occupy a large number of instruction cycles, so that the execution efficiency of the loop instruction is affected.
In a related technical solution, the instruction distribution unit may send a plurality of instructions to the computing unit of the processor in one instruction cycle. However, the hardware structure of the technical scheme is complex, and is not suitable for the AI processor scene with high energy efficiency ratio requirements.
Based on the above study, the present disclosure provides an instruction processing method, apparatus, chip, board card, device, and storage medium. In the embodiments of the disclosure, a target loop instruction is first acquired, and a plurality of loop instructions to be reconstructed in the target loop instruction are determined based on the instruction creation parameters of the target loop instruction. A first target instruction is then created based on the plurality of loop instructions to be reconstructed, and the instructions associated with the plurality of loop instructions to be reconstructed in the target loop instruction are modified based on the first target instruction, so that a second target instruction is obtained. By reconstructing the plurality of loop instructions to be reconstructed into the first target instruction, the instruction dimension of the target loop instruction is reduced and the number of control class instructions in the target loop instruction is reduced. As a result, the calculation class instructions account for a larger proportion of the instruction cycles of the target loop instruction, and the efficiency with which the processor executes the program is improved.
For the sake of understanding the present embodiment, first, an instruction processing method disclosed in the embodiments of the present disclosure will be described in detail, where an execution body of the instruction processing method provided in the embodiments of the present disclosure is generally a computer device with a certain computing capability. In some possible implementations, the instruction processing method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
Referring to FIG. 1, a flowchart of an instruction processing method according to an embodiment of the disclosure is shown. The method includes steps S101 to S107, where:
S101, acquiring a target loop instruction, wherein the target loop instruction comprises multi-layer nested loop instructions.
In the embodiment of the present disclosure, the target loop instruction may include a plurality of loop instructions, which may be the above-mentioned multi-layer nested loop instructions. The target loop instruction is described below by taking a 4-layer nested loop instruction (fdim0, fdim1, wdim0, wdim1) as an example, where the loop instruction wdim1 is nested inside wdim0, the loop instruction wdim0 is nested inside fdim1, and the loop instruction fdim1 is nested inside the loop instruction fdim0. The specific instruction contents of the 4-layer nested loop instruction may be as follows:
Here, the for loop instruction in line 7 of the program (denoted as for4) is the loop instruction wdim1, the for loop instruction in line 5 (denoted as for3) is the loop instruction wdim0, the for loop instruction in line 3 (denoted as for2) is the loop instruction fdim1, and the for loop instruction in line 1 (denoted as for1) is the loop instruction fdim0.
The above 4-layer nested loop instruction can be understood as a loop instruction for performing convolution calculation on the two-dimensional feature map and the two-dimensional convolution kernel.
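For illustration, the convolution performed by the four nested loops can be sketched in Python as follows. This is a minimal sketch, not the program listing referred to above: the loop variables i and j, the bounds out_h and out_w, the stride of 1 without padding, and the body of macc are assumptions; only the names macc, fmp, weight, res, kernel_h, kernel_w, l and m are taken from the description.

```python
def macc(fmp_value, weight_value, res):
    """Multiply-accumulate; assumed behaviour of the calculation class instruction."""
    return res + fmp_value * weight_value


def conv2d_nested(fmp, weight):
    """2-D convolution (stride 1, no padding) written as four nested loops."""
    kernel_h, kernel_w = len(weight), len(weight[0])
    out_h = len(fmp) - kernel_h + 1
    out_w = len(fmp[0]) - kernel_w + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):                  # for1: fdim0
        for j in range(out_w):              # for2: fdim1
            res = 0
            for l in range(kernel_h):       # for3: wdim0
                for m in range(kernel_w):   # for4: wdim1
                    res = macc(fmp[i + l][j + m], weight[l][m], res)
            out[i][j] = res
    return out


# Hypothetical input data, chosen only to make the sketch runnable.
fmp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weight = [[1, 0], [0, 1]]
result = conv2d_nested(fmp, weight)
```

In this sketch only the call to macc performs computation; every other line inside conv2d_nested exists to drive the four loops.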
Here, the specific instruction pattern for each loop instruction in for1 to for4 may be the following pattern:
for (initialization expression statement, condition judgment statement, control condition statement)
{
loop body statement;
}
Taking for4 as an example, in for (m=0, m < kernel_w, m++), "m=0" is the initialization expression statement, "m < kernel_w" is the condition judgment statement, "m++" is the control condition statement, and "macc (fmp, weight, res)" is the loop body statement. That is, the initialization expression statement, the condition judgment statement, and the control condition statement may be understood as the control class instructions described above, and the loop body statement may be understood as the calculation class instruction described above. When the number of control class instructions is large, they occupy more instruction cycles, so that the execution efficiency of the processor is reduced.
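The effect can be made concrete with a simple count. The following Python sketch assumes a cost model that is not part of the disclosure: each execution of an initialization, condition judgment, or control condition statement counts as one control class instruction, and each execution of the innermost loop body counts as one calculation class instruction. The loop bounds are hypothetical.

```python
from math import prod


def count_instructions(trip_counts):
    """Return (control, compute) statement counts for perfectly nested loops.

    trip_counts lists the number of iterations per level, outermost first.
    """
    control = 0
    entries = 1  # how many times the current loop level is entered
    for n in trip_counts:
        # per entry: 1 initialization + (n + 1) condition checks + n increments
        control += entries * (1 + (n + 1) + n)
        entries *= n
    compute = prod(trip_counts)  # executions of the innermost loop body
    return control, compute


# Hypothetical bounds: 2x2 output positions and a 3x3 kernel.
control, compute = count_instructions([2, 2, 3, 3])
```

Under these assumptions the four loops execute 146 control statements for 36 loop body statements, which illustrates why the control class instructions can dominate the instruction cycles.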
S103, determining a plurality of loop instructions to be reconstructed in the target loop instruction based on instruction creation parameters of the target loop instruction, wherein the instruction creation parameters comprise instruction requirement features in at least one dimension.
Here, the instruction creation parameter set by the user for the target loop instruction may be acquired, and then, among the plurality of loop instructions included in the target loop instruction, the loop instructions matching the instruction creation parameter are determined as the plurality of loop instructions to be reconstructed. Different instruction creation parameters may be set for different target loop instructions.
The instruction creation parameter may include at least one instruction requirement feature, and one or more instruction requirement features in the at least one instruction requirement feature may correspond to a dimension, where the dimension of the instruction requirement feature is used to indicate a feature type of the instruction requirement feature. Based on this, instruction demand characteristics belonging to a plurality of characteristic types may be contained in the instruction creation parameter.
In the embodiment of the disclosure, the feature type may be an identification class feature or a parameter class feature. For an identification class feature, the instruction creation parameter may be an instruction indication identifier; for a parameter class feature, the instruction creation parameter may be an instruction requirement parameter. The instruction indication identifier is used to indicate a loop instruction to be reconstructed in the target loop instruction, and the instruction requirement parameter is used to indicate an instruction index requirement of the reconstructed loop instruction, where the reconstructed loop instruction may be the second target instruction in the following steps.
In an alternative embodiment, the instruction indication identifier may be an identifier carried in an instruction content of a loop instruction, where the instruction indication identifier is used to indicate whether the loop instruction is a loop instruction to be rebuilt.
In an alternative embodiment, the instruction requirement parameter may include at least one of instruction complexity, instruction execution efficiency, and instruction execution repeatability, and may also include other requirement parameters that can be used to indicate the instruction index requirement of the reconstructed loop instruction. This is not specifically limited in this disclosure, as long as it can be implemented.
Here, the instruction complexity may be used to indicate an instruction dimension of the second target instruction. The instruction execution efficiency may be used to indicate the number of instruction cycles occupied by the second target instruction, or to indicate the number of instruction cycles occupied by the control class instruction (or the compute class instruction) in the second target instruction. Instruction execution repeatability may be used to indicate a maximum number of loops for each loop instruction in the second target instruction.
In the embodiment of the disclosure, the plurality of loop instructions to be reconstructed may be all of the loop instructions in the target loop instruction or only some of them, and the plurality of loop instructions to be reconstructed at least comprise multi-layer loop instructions that are consecutively nested in the target loop instruction.
S105, creating a first target instruction based on the plurality of loop instructions to be reconstructed, wherein the first target instruction is used for describing a loop calculation process of the plurality of loop instructions to be reconstructed.
Here, the first target instruction may be a single coarse-grained instruction, wherein the single coarse-grained instruction can be used to describe a loop calculation process of a plurality of loop instructions to be rebuilt. In the embodiment of the disclosure, by creating a plurality of loop instructions to be rebuilt as a single coarse-grained instruction, the number of control class instructions in the target loop instruction may be reduced, for example, the number of control class instructions in each loop instruction to be rebuilt may be reduced.
For the single coarse-grained instruction, the instruction distribution unit may send the single coarse-grained instruction to the computing unit of the processor through one instruction cycle, and after the computing unit of the processor acquires the single coarse-grained instruction, the computing unit of the processor may execute the loop computing process of the plurality of loop instructions to be rebuilt based on the single coarse-grained instruction.
With this processing mode, the proportion of instruction cycles occupied by the control class instructions in the target loop instruction can be effectively reduced, so that the proportion of instruction cycles occupied by the calculation class instructions is increased, and the efficiency with which the processor executes the computer program is further improved.
In an embodiment of the present disclosure, at least one first target instruction may be created based on the loop instructions to be reconstructed, where one first target instruction may be created based on a group of consecutively nested multi-layer loop instructions among the plurality of loop instructions to be reconstructed. The number of first target instructions created is not specifically limited in the present disclosure.
For example, assume that the target loop instruction comprises the multi-layer nested loop instructions for1-for2-for3-for4-for5-for6-for7-for8, and that the plurality of loop instructions to be reconstructed are for2-for3-for4 and for6-for7-for8. Then one first target instruction may be created based on for2-for3-for4, and another first target instruction may be created based on for6-for7-for8.
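The grouping in this example can be sketched in Python as follows. The sketch is illustrative only: the function name, the representation of loop levels as integers, and the "coarse(...)" string that stands in for a first target instruction are all assumptions.

```python
def group_consecutive_levels(levels):
    """Split loop levels into runs of consecutively nested levels."""
    groups = []
    for level in sorted(levels):
        if groups and level == groups[-1][-1] + 1:
            groups[-1].append(level)   # extends the current run
        else:
            groups.append([level])     # starts a new run
    return groups


# Levels of for1..for8 that are to be reconstructed in the example above.
to_reconstruct = [6, 7, 8, 2, 3, 4]
groups = group_consecutive_levels(to_reconstruct)

# One placeholder first target instruction per run of nested levels.
first_target_instructions = [
    "coarse(" + "-".join(f"for{lv}" for lv in group) + ")" for group in groups
]
```

Each run of consecutively nested levels yields one first target instruction, so the example produces two of them.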
S107, modifying the instructions associated with the plurality of loop instructions to be reconstructed in the target loop instruction based on the first target instruction to obtain a second target instruction.
In specific implementation, the instructions associated with the plurality of loop instructions to be rebuilt in the target loop instruction can be determined, and the associated instructions are replaced by the coarse-grained instructions, so that the rebuilt loop instruction, namely the second target instruction, is obtained.
Continuing with the above 4-layer nested loop instruction, assume that the loop instructions to be reconstructed are for3 and for4. The instructions associated with for3 and for4 in the 4-layer nested loop instruction are then the instructions corresponding to lines 4 to 8 of the computer program, and these instructions may be replaced by the first target instruction to obtain the second target instruction. If the first target instruction is represented as tmacc (instruction content of the first target instruction), then, based on this, the 4-layer nested loop instruction described above can be described as:
Here, the instruction content of the first target instruction is used to indicate the storage address of the data required for executing the coarse-grained instruction, and to indicate the loop control parameters of the plurality of loop instructions to be reconstructed (i.e., for3 and for4). A loop control parameter is a parameter for controlling the loop states of for3 and for4.
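A minimal Python sketch of such a second target instruction is given below. It is an assumption-laden illustration, not the disclosed instruction format: the signature of tmacc, the use of row and col as a stand-in for the storage address, and the stride of 1 without padding are all hypothetical.

```python
def tmacc(fmp, weight, row, col, kernel_h, kernel_w):
    """Hypothetical coarse-grained instruction replacing for3 and for4.

    row and col stand in for the storage address of the input window;
    kernel_h and kernel_w are the loop control parameters of for3 and for4.
    """
    res = 0
    for l in range(kernel_h):
        for m in range(kernel_w):
            res += fmp[row + l][col + m] * weight[l][m]
    return res


def conv2d_second_target(fmp, weight):
    """Second target instruction: two loop levels, one tmacc per position."""
    kernel_h, kernel_w = len(weight), len(weight[0])
    out_h = len(fmp) - kernel_h + 1
    out_w = len(fmp[0]) - kernel_w + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):          # for1
        for j in range(out_w):      # for2
            out[i][j] = tmacc(fmp, weight, i, j, kernel_h, kernel_w)
    return out


# Hypothetical input data, chosen only to make the sketch runnable.
fmp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weight = [[1, 0], [0, 1]]
result = conv2d_second_target(fmp, weight)
```

In software the two inner loops still appear inside tmacc; the point of the sketch is that the instruction stream seen by the instruction distribution unit contains only for1, for2 and one tmacc per output position, while the iteration over the kernel is carried out by the computing unit.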
In summary, in the embodiment of the disclosure, a target loop instruction is first acquired, and a plurality of loop instructions to be reconstructed in the target loop instruction are determined based on the instruction creation parameters of the target loop instruction. A first target instruction is then created based on the plurality of loop instructions to be reconstructed, and the instructions associated with the plurality of loop instructions to be reconstructed in the target loop instruction are modified based on the first target instruction, so that a second target instruction is obtained. By reconstructing the plurality of loop instructions to be reconstructed into the first target instruction, the instruction dimension of the target loop instruction and the number of control class instructions in it are both reduced, so that the calculation class instructions account for a larger proportion of the instruction cycles and the efficiency with which the processor executes the program is improved.
In an optional embodiment, in the case that the instruction creation parameter is the instruction indication identifier, step S103 described above determines, based on the instruction creation parameter of the target loop instruction, a plurality of loop instructions to be reconstructed in the target loop instruction, including the following steps:
S11, acquiring the instruction content of each layer of loop instruction in the target loop instruction;
S12, determining, based on the instruction content, a plurality of loop instructions carrying the instruction indication identifier among the loop instructions of each layer;
S13, determining the loop instructions that meet the creation requirement among the plurality of loop instructions as the plurality of loop instructions to be reconstructed.
In an embodiment of the present disclosure, the instruction creation parameter may be an instruction indication identifier that is set in advance by a program writer for a loop instruction of a specified level in the target loop instruction. Here, the instruction indication flag may be set in the instruction content of the corresponding loop instruction.
For example, for for3 in the 4-layer nested loop instruction, i.e., for (l=0, l < kernel_h, l++), an instruction indication identifier Q may be set in the instruction content, so that the instruction content of the loop instruction for3 may be described as for (l=0, l < kernel_h, l++, Q). When a loop instruction among the plurality of loop instructions carries the instruction indication identifier, that loop instruction may be determined as a loop instruction to be reconstructed.
In specific implementation, the instruction content of each loop instruction in the target loop instruction can be traversed, and the corresponding instruction indication identifier is searched from the instruction content of each loop instruction, so that the loop instruction carrying the instruction indication identifier is determined according to the search result.
Among the plurality of loop instructions carrying the instruction indication identifier, the loop instructions that meet the creation requirement are determined as the plurality of loop instructions to be reconstructed. The creation requirement is described below case by case.
Case one: the creation requirement is an instruction continuity requirement.
In this case, adjacent nested loop instructions of the plurality of loop instructions carrying the instruction indication identifier may be determined as a plurality of loop instructions to be rebuilt.
Case two: the creation requirement is determined based on the identification content of the instruction indication identifier.
Different identification contents may be set for the instruction indication identifier; for example, the identification content may be 1 or 0. In the above example, Q=1 may indicate that for3 is a loop instruction to be reconstructed, and Q=0 may indicate that for3 is not a loop instruction to be reconstructed.
In this case, the loop instructions whose instruction indication identifier has the identification content 1 may be determined as the plurality of loop instructions to be reconstructed. Further, among these, the adjacent nested loop instructions whose identification content is 1 may be determined as the plurality of loop instructions to be reconstructed.
In the above embodiment, an instruction indication identifier is set in the instruction content of a loop instruction, and the plurality of loop instructions to be reconstructed are identified according to that identifier. The loop instructions to be reconstructed can therefore be set flexibly at any layer of the target loop instruction, which meets the setting requirements of program writers.
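Steps S11 to S13 can be sketched in Python as follows. The sketch combines the two cases described above (identification content equal to 1, plus the continuity requirement); the tuple representation of a loop instruction and the function name are assumptions made for illustration.

```python
def select_loops_to_reconstruct(loops):
    """Select loops to reconstruct from (name, level, flag) tuples.

    level 1 is the outermost loop; flag is the identification content of
    the instruction indication identifier, or None if no identifier is set.
    """
    # S11/S12: traverse the instruction content and keep the loops whose
    # identifier has the identification content 1.
    marked = sorted((lp for lp in loops if lp[2] == 1), key=lambda lp: lp[1])
    levels = {lp[1] for lp in marked}
    # S13: continuity requirement; keep only marked loops that are directly
    # nested with another marked loop.
    return [
        name for name, level, _ in marked
        if level - 1 in levels or level + 1 in levels
    ]


loops = [
    ("for1", 1, None),
    ("for2", 2, 0),
    ("for3", 3, 1),
    ("for4", 4, 1),
]
selected = select_loops_to_reconstruct(loops)
```

A marked loop that has no marked neighbour is dropped by the continuity check, since a single loop level gives nothing to merge.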
In an alternative embodiment, the instruction creation parameter is an instruction requirement parameter. FIG. 2 shows a flowchart of step S103 for this case, in which determining, based on the instruction creation parameter of the target loop instruction, a plurality of loop instructions to be reconstructed in the target loop instruction specifically includes the following steps:
S21, acquiring instruction requirement parameters, wherein the instruction requirement parameters comprise at least one of instruction complexity, instruction execution efficiency and instruction execution repeatability.
Here, in the case where the feature type of the instruction requirement feature included in the instruction creation parameter is a parameter class feature, the instruction creation parameter may be an instruction requirement parameter. The instruction requirement parameter may be used to indicate an instruction index requirement of the second target instruction, where the instruction index requirement indicates the performance required of the second target instruction when it is executed in the corresponding instruction execution scenario. The instruction requirement parameter may include at least one of instruction complexity, instruction execution efficiency, and instruction execution repeatability, each of which is described below.
Parameter one, instruction complexity.
The instruction complexity may be used to indicate an instruction dimension of the second target instruction, where the instruction dimension may be used to indicate a number of levels of loop instructions contained in the second target instruction, or the instruction complexity may be used to indicate a number of instructions (e.g., a number of control class instructions) contained in the instruction content of the second target instruction.
For the above 4-layer nested loop instruction, if the instruction complexity indicates that the number of levels of loop instructions in the second target instruction is 2, then the second target instruction may be described as:
Parameter two, instruction execution efficiency.
The instruction execution efficiency may be used to indicate the number of instruction cycles occupied by the second target instruction, or to indicate the number of instruction cycles occupied by the control class instruction (or the compute class instruction) in the second target instruction.
Here, the number of occupied instruction cycles will affect the instruction execution efficiency of the instruction during execution, where the greater the number of occupied instruction cycles, the lower the instruction execution efficiency and the fewer the number of occupied instruction cycles, the higher the instruction execution efficiency.
In the embodiment of the present disclosure, a mapping relationship between instruction execution efficiency and the number of instruction cycles may be established in advance. The mapping relationship indicates that, when the instruction execution efficiency of the second target instruction is A, the number of instruction cycles occupied by the second target instruction is B. The instruction execution efficiency A may be a specific value or a value interval; similarly, the number B of instruction cycles may be a specific value or a value interval. This is not specifically limited in the present disclosure, as long as it can be implemented.
Parameter three, instruction execution repeatability.
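One possible form of such a mapping relationship is an interval lookup table, sketched below in Python. The table layout, the normalisation of efficiency to the range 0 to 1, and all numeric values are placeholders introduced for illustration; the disclosure does not specify them.

```python
# Hypothetical mapping table: each entry pairs an efficiency interval A
# (low, high) with an interval B of instruction cycles that the second
# target instruction may occupy. All numbers are placeholders.
EFFICIENCY_TO_CYCLES = [
    ((0.0, 0.5), (200, 400)),
    ((0.5, 0.8), (100, 200)),
    ((0.8, 1.0), (0, 100)),
]


def cycles_for_efficiency(efficiency):
    """Look up the instruction-cycle interval B for an efficiency value A.

    Interval bounds are inclusive; a value on a shared boundary resolves
    to the first matching entry.
    """
    for (low, high), cycles in EFFICIENCY_TO_CYCLES:
        if low <= efficiency <= high:
            return cycles
    raise ValueError(f"no mapping entry for efficiency {efficiency}")


cycle_interval = cycles_for_efficiency(0.6)
```

A single-valued mapping is the special case in which the lower and upper bounds of an interval coincide.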
Instruction execution repeatability may be used to indicate a maximum number of loops for each loop instruction in the second target instruction. The maximum number of loops may be understood as the upper limit on the number of iterations that each loop instruction performs each time it is executed.
Here, the number of execution times of each loop instruction during the execution of the instruction affects the instruction execution efficiency of the instruction, wherein the more the number of execution times of the loop instruction is, the lower the instruction execution efficiency is, and the less the number of execution times of the loop instruction is, the higher the instruction execution efficiency is.
In the embodiment of the disclosure, the loop instructions whose number of loops is greater than the maximum number of loops may first be determined among the multi-layer nested loop instructions, and the loop instructions to be reconstructed are then determined among the loop instructions so determined.
With this processing mode, the loop instructions with a larger number of loops among the multi-layer nested loop instructions can be reconstructed into a single coarse-grained instruction, so that a second target instruction whose number of loops meets the requirement is obtained and the instruction execution efficiency is improved.
S22, determining the plurality of loop instructions to be reconstructed based on a plurality of second loop instructions matched with the instruction requirement parameters in the target loop instruction.
In the disclosed embodiments, after the instruction demand parameter is obtained, a plurality of second loop instructions matching the instruction demand parameter may be determined in the target loop instructions. In implementation, the instruction dimension information corresponding to the instruction requirement parameter may be determined, so that the loop instruction with the corresponding dimension is determined to be a plurality of second loop instructions in the target loop instruction based on the instruction dimension information.
In an optional embodiment, the step S22, determining the plurality of loop instructions to be rebuilt based on the plurality of second loop instructions matched with the instruction requirement parameter in the target loop instruction, specifically includes the following steps:
(1) Determining instruction dimension information based on the instruction requirement parameters, wherein the instruction dimension information is used for indicating the number of loop levels of the loop instructions to be reconstructed in the target loop instruction;
(2) Determining the plurality of loop instructions to be reconstructed based on a plurality of second loop instructions matched with the instruction dimension information in the target loop instruction.
In the embodiment of the disclosure, each instruction requirement parameter can determine one instruction dimension information, or all instruction requirement parameters can determine one instruction dimension information, or part of instruction requirement parameters can determine one instruction dimension information. The determination of the instruction dimension information will be described in detail below.
Case one:
In this case, one piece of instruction dimension information may be determined based on all instruction requirement parameters, and the loop instructions to be reconstructed may be determined based on that instruction dimension information.
In specific implementation, the instruction parameters of the target loop instruction may be obtained, where the instruction parameters include at least one of: the number of loop instructions included in the target loop instruction, the number of loops of each loop instruction, the amount of calculation of each loop instruction, and the numbers of control class instructions and calculation class instructions in the target loop instruction. The instruction parameters and the instruction requirement parameters of the target loop instruction are then input into a deep neural network model for processing, so that the instruction dimension information is predicted by the deep neural network model.
In the embodiment of the disclosure, the deep neural network model may determine the matching instruction dimension information according to the instruction parameters and the instruction requirement parameters of the target loop instruction, so that the reconstructed second target instruction can meet the instruction requirement parameters.
Here, the training process of the deep neural network model may be described as follows:
Firstly, a training sample is generated, wherein the training sample comprises input data and a data tag, the input data can be training instruction parameters, and the data tag can be tag dimension information corresponding to the training instruction parameters.
The training samples are then input into the initial deep neural network model for training. In the training process, predicted data can be obtained based on input data, then, a loss function is determined based on the predicted data and a data label, and further, model parameters of an initial deep neural network model are adjusted based on the loss function, so that the deep neural network model meeting training requirements is obtained.
Case two:
In this case, one piece of instruction dimension information may be determined based on each instruction requirement parameter, and one piece of final instruction dimension information may be determined based on all of the instruction dimension information, so that the plurality of loop instructions to be reconstructed are determined according to the final instruction dimension information. For example, the largest of all the instruction dimension information may be determined as the final instruction dimension information.
First, the instruction requirement parameter is instruction complexity.
As described above, the instruction complexity may be used to indicate the number of levels M1 of loop instructions contained in the second target instruction. In this case, the number of levels M2 of loop instructions contained in the target loop instruction may be determined, and the instruction dimension information A1 may be determined based on the difference between the number of levels M2 and the number of levels M1.
Second, the instruction requirement parameter is instruction execution efficiency.
From the above description, it is known that the instruction execution efficiency may be used to indicate the number of instruction cycles occupied by the second target instruction, or to indicate the number of instruction cycles occupied by the control class instruction (or the calculation class instruction) in the second target instruction.
In specific implementation, the number of instruction cycles matching the instruction execution efficiency may be determined based on the mapping relationship, and denoted as B1. At this time, the number of instruction cycles occupied by the target loop instruction may be acquired and recorded as B2, and then the instruction dimension information A2 is determined based on the instruction cycle number B2 and the instruction cycle number B1.
Here, a difference between the instruction cycle number B2 and the instruction cycle number B1 may be determined, and the instruction dimension information A2 may be determined based on the difference.
In specific implementation, the number C of instruction periods occupied by each loop instruction in loop instructions nested in each layer can be firstly determined, then the loop layer number of a plurality of continuous loop instructions with the sum of the number greater than or equal to the difference value is determined in the plurality of numbers C, and therefore the instruction dimension information A2 is determined based on the loop layer number.
Third, the instruction requirement parameter is instruction execution repeatability.
In the case where the instruction execution repeatability is used to indicate the maximum number of loops of each loop instruction in the second target instruction, the loop instructions whose number of loops is greater than the maximum number of loops may be determined in the target loop instruction, and the number of loop instructions so determined may be taken as the instruction dimension information A3.
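A short sketch of this count is given below. The mapping from loop names to loop counts and the sample values are hypothetical and serve only to illustrate the comparison against the maximum number of loops.

```python
from typing import Mapping


def dimension_from_repeatability(loop_counts: Mapping[str, int], max_loops: int) -> int:
    """Return A3: the number of loop instructions whose loop count exceeds max_loops."""
    return sum(1 for count in loop_counts.values() if count > max_loops)


# Hypothetical loop counts for a 5-level nest; for2, for3 and for5 exceed the limit of 10.
counts = {"for1": 4, "for2": 16, "for3": 64, "for4": 3, "for5": 128}
print(dimension_from_repeatability(counts, max_loops=10))  # 3
```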
After determining the instruction dimension information A1, the instruction dimension information A2, and the instruction dimension information A3, the largest instruction dimension information among the instruction dimension information A1, the instruction dimension information A2, and the instruction dimension information A3 may be determined as final instruction dimension information.
After determining the instruction dimension information in the manner described above, a plurality of second loop instructions matching the instruction dimension information may be determined among the target loop instructions.
Here, starting from the loop instruction located at the innermost layer of the target loop instruction, the loop instructions of a plurality of consecutive levels may be determined as the plurality of second loop instructions, wherein the number of consecutive levels is given by the instruction dimension information described above. Thereafter, a plurality of loop instructions to be reconstructed may be determined from the plurality of second loop instructions; e.g., the plurality of second loop instructions may themselves be determined as the plurality of loop instructions to be reconstructed.
For example, assume that the target loop instruction includes the multi-level nested loop instruction for1-for2-for3-for4-for5-for6-for7-for8, and that the instruction dimension information A1=3, A2=2, and A3=3 has been determined. In this case, the maximum value, 3, is determined as the final instruction dimension information. Then, starting from the loop instruction located at the innermost layer of the target loop instruction (i.e., for8), the loop instructions of 3 consecutive levels may be determined as the plurality of second loop instructions; that is, for6-for7-for8 may be determined as the plurality of second loop instructions.
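The example above can be reproduced with a few lines of Python. The function name and the list representation of the loop nest (ordered from outermost to innermost) are illustrative assumptions.

```python
from typing import List, Sequence


def select_second_loops(target_loops: Sequence[str], a1: int, a2: int, a3: int) -> List[str]:
    """Pick the innermost consecutive loops, as many as the largest of A1, A2, A3."""
    final_dimension = max(a1, a2, a3)
    if final_dimension <= 0:
        return []  # guard: a slice starting at -0 would return the whole nest
    return list(target_loops[-final_dimension:])


nest = ["for1", "for2", "for3", "for4", "for5", "for6", "for7", "for8"]
print(select_second_loops(nest, a1=3, a2=2, a3=3))  # ['for6', 'for7', 'for8']
```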
In the above embodiment, determining the plurality of loop instructions to be reconstructed in the target loop instruction by means of the instruction requirement parameter makes it possible to automatically match, according to the user's requirements, the plurality of second loop instructions in the target loop instruction that satisfy those requirements and to take them as the loop instructions to be reconstructed. A variety of user instruction requirements can thus be met, which broadens the application scenarios of the technical solution of the present disclosure.
In an alternative implementation manner, based on the embodiment shown in fig. 2, the technical solution provided in the disclosure further includes the following steps:
A first calculation type is acquired, which is preset, and a second calculation type of each second loop instruction is acquired.
On this basis, step S22 of determining the plurality of loop instructions to be reconstructed based on the plurality of second loop instructions matching the instruction requirement parameter in the target loop instruction further includes the following step:
Based on the second calculation type, a third loop instruction matched with the first calculation type is determined in the plurality of second loop instructions, and the plurality of loop instructions to be rebuilt are determined based on the third loop instruction.
In the disclosed embodiments, each loop instruction in the target loop instruction includes a corresponding calculation type, where the calculation type is used to indicate a type of corresponding calculation performed by the corresponding loop instruction on the corresponding operand, for example, a calculation type such as a multiplication operation, an addition operation, a convolution operation, and the like. Here, the preset calculation type, that is, the first calculation type may be acquired, and the calculation type of each second loop instruction may be acquired, to obtain the second calculation type. Here, the first calculation type may be a calculation type preset by a programmer.
Next, the second calculation types may be matched against the first calculation type to find a second calculation type that is the same as the first calculation type, and the second loop instruction having that second calculation type is determined to be a third loop instruction.
In the case where there is one first calculation type, a second calculation type identical to that first calculation type may be determined.
In the case where there are a plurality of first calculation types, a second calculation type identical to any one of the first calculation types may be determined, and the second loop instruction having that second calculation type is determined to be a third loop instruction.
Here, the first calculation type may be understood as a type preset by a programmer, and the number of the first calculation types may be plural, and the first calculation type may indicate a loop instruction capable of instruction reconstruction from among the plural second loop instructions. Thus, by matching the second calculation type with the first calculation type, a loop instruction capable of instruction reconstruction can be determined from among the plurality of second loop instructions.
In the case where there are a plurality of first calculation types, it is also possible to determine a first calculation type designated by the user among the plurality of first calculation types, then determine a second calculation type identical to the designated first calculation type, and determine the second loop instruction having that second calculation type as a third loop instruction.
Then, a plurality of loop instructions to be rebuilt may be determined based on the third loop instruction. For example, in a case where the third loop instruction is plural, the third loop instruction of the adjacent hierarchy among the plural third loop instructions may be determined as the plural loop instructions to be reconstructed.
The example of the 4-level nested loop instruction (fdim0, fdim1, wdim0, wdim1) described above is continued here. The 4-level nested loop instruction is assumed to be a loop instruction for performing a convolution computation on a two-dimensional feature map and a two-dimensional convolution kernel.
Assume that the second loop instructions determined in the 4-level nested loop instruction (fdim0, fdim1, wdim0, wdim1) are wdim0 and wdim1. If the first calculation type is convolution computation and the second calculation type of each second loop instruction is also convolution computation, then the second loop instructions wdim0 and wdim1 may be determined to be third loop instructions, and the third loop instructions may be determined to be the plurality of loop instructions to be reconstructed.
In the above embodiment, after the plurality of second loop instructions are determined, a third loop instruction meeting the type requirement may be screened out from the plurality of second loop instructions based on the calculation type, and the plurality of loop instructions to be reconstructed may be determined based on the third loop instruction. This processing enables a finer screening of the target loop instruction, so that the loop instructions that better meet the user's requirements can be selected.
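The screening by calculation type can be sketched as follows. The function names, the string labels for calculation types, and the choice of the longest run of adjacent levels are assumptions made for illustration; the text only requires that third loop instructions of adjacent levels be selected.

```python
from typing import Dict, List, Sequence, Set


def select_third_loops(second_loops: Sequence[str],
                       second_types: Dict[str, str],
                       first_types: Set[str]) -> List[str]:
    """Keep the second loops whose calculation type equals any preset first type."""
    return [loop for loop in second_loops if second_types[loop] in first_types]


def longest_adjacent_run(nest_order: Sequence[str], third_loops: Sequence[str]) -> List[str]:
    """Return the longest run of third loops that occupy adjacent levels of the nest."""
    chosen = set(third_loops)
    best: List[str] = []
    current: List[str] = []
    for loop in nest_order:  # nest_order runs from outermost to innermost
        if loop in chosen:
            current.append(loop)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []  # a non-matching level breaks adjacency
    return best


nest_order = ["fdim0", "fdim1", "wdim0", "wdim1"]
second_types = {"wdim0": "convolution", "wdim1": "convolution"}
third = select_third_loops(["wdim0", "wdim1"], second_types, {"convolution"})
print(longest_adjacent_run(nest_order, third))  # ['wdim0', 'wdim1']
```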
In an alternative embodiment, as shown in fig. 3, step S105 of creating a first target instruction based on the plurality of loop instructions to be reconstructed specifically includes the following steps:
S1051, determining address information and loop information based on the plurality of loop instructions to be reconstructed, wherein the address information is used for indicating a storage address of the data for executing each loop instruction to be reconstructed, and the loop information is used for indicating the number of loop layers of the plurality of loop instructions to be reconstructed and/or the number of loops of each loop instruction to be reconstructed.
In the embodiment of the disclosure, the loop information comprises first loop information and/or second loop information, and the address information comprises a data start address and address step information. The first loop information is used for indicating the number of loop layers of the plurality of loop instructions to be reconstructed; the second loop information is used for indicating the number of loops of each loop instruction to be reconstructed; the data start address is used for indicating the storage address of the starting data for executing the first target instruction; and the address step information is used for indicating how the storage address of the data changes on each execution of each loop instruction to be reconstructed.
Specifically, the data start address may be denoted addr, the address step information may be denoted step, the first loop information may be denoted dim, and the second loop information may be denoted by the number of loops of each layer of the loop instructions described above. For example, if the plurality of loop instructions to be reconstructed are wdim0 and wdim1 of the 4-level nested loop instruction described above, the second loop information may be represented as kernel_h and kernel_w, where kernel_h represents the number of loops of the outer loop wdim0, and kernel_w represents the number of loops of the inner loop wdim1 during each execution of the outer loop wdim0.
S1052, creating the first target instruction based on the address information and the loop information.
Here, the instruction content of the first target instruction may be determined based on the address information and the loop information, wherein at least one instruction for indicating the address information and the loop information is included in the instruction content of the first target instruction.
Here, taking the above tmacc as an example of the first target instruction, the instruction content of the first target instruction may include the statement "addr, dim=n, size=a1×a2×…×an, step", and thus the first target instruction may be expressed as tmacc(addr, dim=n, size=a1×a2×…×an, step), where addr and step are used to indicate the above address information, and dim=n and size=a1×a2×…×an are used to indicate the above loop information.
If the target loop instruction is a 4-level nested loop instruction (fdim 0, fdim1, wdim0, wdim 1) and the plurality of loop instructions to be rebuilt are wdim0 and wdim1, then the first target instruction can be represented as:
tmacc(addr, dim=2, size=kernel_h×kernel_w, step).
Specifically, addr in tmacc is used to indicate the storage address of the starting data of executing the first target instruction, step is used to indicate the address change information of the storage address of the data of each loop instruction to be rebuilt in each execution process. dim may be used to indicate the number of loop layers for the plurality of loop instructions to be rebuilt, and size may be used to indicate the number of loops per loop instruction to be rebuilt.
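To make the roles of addr, dim, size and step concrete, the following sketch models the first target instruction as a small Python data structure. The class names, the per-loop step values, and the `addresses` helper that enumerates the visited storage addresses are hypothetical; the text does not specify how the hardware interprets step, so the linear address walk shown here is only one plausible reading.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LoopToReconstruct:
    name: str
    count: int  # number of loops of this loop instruction (second loop information)
    step: int   # assumed address change per iteration (address step information)


@dataclass(frozen=True)
class FirstTargetInstruction:
    addr: int               # data start address
    dim: int                # number of loop layers (first loop information)
    size: Tuple[int, ...]   # loop count of each layer, outermost first
    step: Tuple[int, ...]   # address step of each layer, outermost first

    def addresses(self) -> List[int]:
        """Enumerate the storage addresses visited, assuming a linear address walk."""
        return [
            self.addr + sum(i * s for i, s in zip(index, self.step))
            for index in product(*(range(n) for n in self.size))
        ]


def create_first_target_instruction(loops: Sequence[LoopToReconstruct],
                                    addr: int) -> FirstTargetInstruction:
    """Collect address information and loop information into one instruction."""
    return FirstTargetInstruction(
        addr=addr,
        dim=len(loops),
        size=tuple(loop.count for loop in loops),
        step=tuple(loop.step for loop in loops),
    )


# wdim0 walks 2 kernel rows (row stride 8), wdim1 walks 3 kernel columns (stride 1).
inst = create_first_target_instruction(
    [LoopToReconstruct("wdim0", count=2, step=8), LoopToReconstruct("wdim1", count=3, step=1)],
    addr=100,
)
print(inst.dim, inst.size)  # 2 (2, 3)
print(inst.addresses())     # [100, 101, 102, 108, 109, 110]
```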
In the above embodiment, reconstructing the plurality of loop instructions to be reconstructed into the first target instruction reduces the instruction dimension of the target loop instruction and the number of control class instructions in the target loop instruction, thereby increasing the proportion of instruction cycles occupied by the calculation class instructions in the target loop instruction and improving the efficiency with which the processor executes the program.
In an alternative embodiment, as shown in fig. 4, step S107 of modifying, based on the first target instruction, an instruction associated with the plurality of loop instructions to be reconstructed in the target loop instruction to obtain a second target instruction specifically includes the following steps:
S1071, determining a target loop body of a first loop instruction in the target loop instruction, wherein the first loop instruction is the loop instruction one layer above the highest-level loop instruction among the plurality of loop instructions to be reconstructed.
S1072, modifying the target loop body in the target loop instruction based on the first target instruction, and obtaining the second target instruction after the modification.
In the embodiment of the disclosure, since the loop instructions in the target loop instruction are nested in multiple layers, among the multiple loop instructions to be reconstructed in the target loop instruction, each loop instruction to be reconstructed is also nested in multiple layers.
Here, the highest-level loop instruction among the plurality of loop instructions to be reconstructed may be determined first. Then, a loop instruction of a previous layer of the loop instruction of the highest hierarchy can be determined, and the loop instruction of the previous layer can be determined as the first loop instruction.
In an embodiment of the present disclosure, the target loop body of the first loop instruction includes loop instructions of other levels (e.g., the loop instruction to be rebuilt) nested inside the first loop instruction.
Here, it is assumed that the target loop instruction is the 4-level nested loop instruction (fdim0, fdim1, wdim0, wdim1), wherein the target loop instruction includes:
Here, the for loop instruction in line 7 of the program (denoted as for4) is the loop instruction wdim1, the for loop instruction in line 5 (denoted as for3) is the loop instruction wdim0, the for loop instruction in line 3 (denoted as for2) is the loop instruction fdim1, and the for loop instruction in line 1 (denoted as for1) is the loop instruction fdim0.
Assuming that the plurality of loop instructions to be reconstructed are the loop instruction wdim0 and the loop instruction wdim1, the first loop instruction may be the loop instruction fdim1, and the target loop body of the first loop instruction may be the content corresponding to lines 4 to 8, namely the following content:
After the target loop body is determined, the target loop body in the target loop instruction can be modified based on the first target instruction, so that a second target instruction is obtained. Here, taking the above-mentioned first target instruction tmacc(addr, dim=2, size=kernel_h×kernel_w, step) as an example, the instructions of the target loop body in the target loop instruction may be replaced by the first target instruction, so as to obtain the second target instruction. For example, the instructions corresponding to lines 4 to 8 may be replaced with the first target instruction tmacc(addr, dim=2, size=kernel_h×kernel_w, step), thereby obtaining the following second target instruction.
In this embodiment, the instruction dimension of the target loop instruction and the number of control class instructions in the target loop instruction can be reduced, which increases the proportion of instruction cycles occupied by the calculation class instructions in the target loop instruction and improves the efficiency with which the processor executes the program.
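As an illustration of the replacement described above, the following Python sketch (not the listings of the disclosure) contrasts a 4-level loop nest for a two-dimensional convolution with a version in which the two inner loops are replaced by a single call. It assumes that tmacc performs a multiply-accumulate over the kernel window; the Python function `tmacc`, its parameters, and the use of two-dimensional indices in place of addr and step are stand-ins chosen for readability. In Python the helper still iterates internally, whereas on the processor the iteration would be carried out by the single instruction without separate control class instructions.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def conv_nested(feature: Matrix, kernel: Matrix) -> Matrix:
    """Target loop instruction: four nested loops (fdim0, fdim1, wdim0, wdim1)."""
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (fw - kw + 1) for _ in range(fh - kh + 1)]
    for i in range(fh - kh + 1):          # fdim0
        for j in range(fw - kw + 1):      # fdim1
            for p in range(kh):           # wdim0
                for q in range(kw):       # wdim1
                    out[i][j] += feature[i + p][j + q] * kernel[p][q]
    return out


def tmacc(feature: Matrix, kernel: Matrix, addr: Tuple[int, int], size: Tuple[int, int]) -> int:
    """Hypothetical stand-in for the first target instruction (multiply-accumulate)."""
    row, col = addr
    kh, kw = size
    return sum(feature[row + p][col + q] * kernel[p][q]
               for p in range(kh) for q in range(kw))


def conv_reconstructed(feature: Matrix, kernel: Matrix) -> Matrix:
    """Second target instruction: only fdim0 and fdim1 remain as explicit loops."""
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (fw - kw + 1) for _ in range(fh - kh + 1)]
    for i in range(fh - kh + 1):          # fdim0
        for j in range(fw - kw + 1):      # fdim1
            out[i][j] = tmacc(feature, kernel, addr=(i, j), size=(kh, kw))
    return out


feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv_nested(feature, kernel))         # [[6, 8], [12, 14]]
print(conv_reconstructed(feature, kernel))  # [[6, 8], [12, 14]]
```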
In summary, in the embodiment of the disclosure, a target loop instruction may be acquired first, and a plurality of loop instructions to be reconstructed in the target loop instruction may be determined based on an instruction creation parameter of the target loop instruction; then a first target instruction may be created based on the plurality of loop instructions to be reconstructed, and an instruction associated with the plurality of loop instructions to be reconstructed in the target loop instruction may be modified based on the first target instruction, so as to obtain a second target instruction. In the above embodiment, by reconstructing the plurality of loop instructions to be reconstructed into the first target instruction, the instruction dimension of the target loop instruction and the number of control class instructions in the target loop instruction can be reduced, which increases the proportion of instruction cycles occupied by the calculation class instructions in the target loop instruction and improves the efficiency with which the processor executes the program.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide an instruction processing apparatus corresponding to the instruction processing method, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to that of the instruction processing method in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 5, a schematic diagram of an instruction processing apparatus according to an embodiment of the present disclosure is provided, where the apparatus includes an obtaining unit 51, a determining unit 52, a creating unit 53, and a modifying unit 54:
an obtaining unit 51, configured to obtain a target loop instruction, where the target loop instruction includes multiple layers of nested loop instructions;
a determining unit 52, configured to determine a plurality of loop instructions to be reconstructed in the target loop instruction based on instruction creation parameters of the target loop instruction, where the instruction creation parameters include an instruction requirement feature in at least one dimension;
a creating unit 53, configured to create a first target instruction based on the plurality of loop instructions to be reconstructed, the first target instruction describing a loop calculation process of the plurality of loop instructions to be reconstructed;
and a modifying unit 54, configured to modify an instruction associated with the plurality of loop instructions to be reconstructed in the target loop instruction based on the first target instruction, so as to obtain a second target instruction.
In the embodiment of the disclosure, a target loop instruction in a program can be acquired first, and a plurality of loop instructions to be reconstructed in the target loop instruction are determined based on the instruction creation parameters of the target loop instruction. A first target instruction can then be created based on the plurality of loop instructions to be reconstructed, and an instruction associated with the plurality of loop instructions to be reconstructed in the target loop instruction is modified based on the first target instruction to obtain a second target instruction. The instructions associated with the plurality of loop instructions to be reconstructed are thereby simplified and the number of loop instructions, which are non-calculation instructions, is reduced; this is equivalent to increasing the proportion of core calculation instructions in the program, which in turn improves the efficiency with which the processor executes the program.
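The cooperation of the four units can be sketched as a simple pipeline of Python functions. All function names, the tuple representation of a loop (name, symbolic loop count), the symbolic counts out_h and out_w, and the use of a plain integer as the instruction creation parameter are hypothetical simplifications; the sketch only mirrors the order of operations described above.

```python
from typing import Dict, List, Sequence, Tuple

Loop = Tuple[str, str]  # (loop name, symbolic loop count), outermost first


def obtaining_unit(program: Sequence[Loop]) -> List[Loop]:
    """Obtain the target loop instruction (here: a copy of the loop nest)."""
    return list(program)


def determining_unit(loops: Sequence[Loop], dimension: int) -> List[Loop]:
    """Determine the innermost `dimension` loops as the loops to be reconstructed."""
    return list(loops[-dimension:]) if dimension > 0 else []


def creating_unit(to_reconstruct: Sequence[Loop]) -> str:
    """Create the first target instruction describing the selected loops."""
    size = "×".join(count for _, count in to_reconstruct)
    return f"tmacc(addr, dim={len(to_reconstruct)}, size={size}, step)"


def modifying_unit(loops: Sequence[Loop], to_reconstruct: Sequence[Loop],
                   instruction: str) -> Dict[str, object]:
    """Replace the selected loops with the first target instruction."""
    kept = [name for name, count in loops if (name, count) not in to_reconstruct]
    return {"loops": kept, "body": instruction}


program = [("fdim0", "out_h"), ("fdim1", "out_w"), ("wdim0", "kernel_h"), ("wdim1", "kernel_w")]
target = obtaining_unit(program)
selected = determining_unit(target, dimension=2)
first_target = creating_unit(selected)
second_target = modifying_unit(target, selected, first_target)
print(second_target)
# {'loops': ['fdim0', 'fdim1'], 'body': 'tmacc(addr, dim=2, size=kernel_h×kernel_w, step)'}
```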
In a possible implementation, the modifying unit 54 is further configured to:
Determining a target loop body of a first loop instruction in the target loop instructions, wherein the first loop instruction is a loop instruction of a previous layer of loop instructions with highest level in the plurality of loop instructions to be rebuilt;
And modifying a target loop body in the target loop instruction based on the first target instruction, and obtaining the second target instruction after modification.
In a possible implementation manner, the instruction creation parameter is an instruction indication identifier, and the determining unit 52 is further configured to:
Acquiring instruction content of each layer of loop instruction in the target loop instruction;
Determining, based on the instruction content, a plurality of loop instructions carrying the instruction indication identifier among the loop instructions of each layer;
And determining the loop instructions meeting the creation requirement among the plurality of loop instructions as the plurality of loop instructions to be reconstructed.
In a possible implementation manner, the instruction creation parameter is an instruction requirement parameter, and the determining unit 52 is further configured to:
Acquiring the instruction requirement parameter, wherein the instruction requirement parameter comprises at least one of instruction complexity, instruction execution efficiency and instruction execution repeatability;
And determining the plurality of loop instructions to be reconstructed based on a plurality of second loop instructions matching the instruction requirement parameter in the target loop instruction.
In a possible implementation manner, the instruction creation parameter is an instruction requirement parameter, and the determining unit 52 is further configured to:
Determining instruction dimension information based on the instruction requirement parameter, wherein the instruction dimension information is used for indicating the number of loop layers of the loop instructions to be reconstructed in the target loop instruction;
And determining the plurality of loop instructions to be reconstructed based on a plurality of second loop instructions matching the instruction dimension information in the target loop instruction.
In a possible implementation manner, the device is further used for acquiring a preset first calculation type and acquiring a second calculation type of each second loop instruction;
The determining unit 52 is further configured to determine a plurality of third loop instructions that match the first calculation type among the plurality of second loop instructions based on the second calculation type, and determine the plurality of loop instructions to be reconstructed based on the plurality of third loop instructions, where the plurality of third loop instructions are loop instructions of adjacent levels among the plurality of second loop instructions.
In a possible implementation, the creating unit 53 is further configured to:
Determining address information and loop information based on the plurality of loop instructions to be reconstructed, wherein the address information is used for indicating a storage address of the data for executing each loop instruction to be reconstructed, and the loop information is used for indicating the number of loop layers of the plurality of loop instructions to be reconstructed and/or the number of loops of each loop instruction to be reconstructed;
And creating the first target instruction based on the address information and the loop information.
In a possible implementation manner, the loop information comprises first loop information and/or second loop information, and the address information comprises a data start address and address step information. The first loop information is used for indicating the number of loop layers of the plurality of loop instructions to be reconstructed; the second loop information is used for indicating the number of loops of each loop instruction to be reconstructed; the data start address is used for indicating the storage address of the starting data for executing the first target instruction; and the address step information is used for indicating how the storage address of the data changes on each execution of each loop instruction to be reconstructed.
The process flow of each unit in the apparatus and the interaction flow between units may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Corresponding to the instruction processing method in fig. 1, the embodiment of the present disclosure further provides a chip 600, as shown in fig. 6, which is a schematic structural diagram of the chip 600 provided in the embodiment of the present disclosure, including:
A processor 61, a memory 62, and a bus 63, the memory 62 being for storing execution instructions, the processor 61 and the memory 62 being in communication via the bus 63 such that the processor 61 executes the following instructions:
Acquiring a target loop instruction, wherein the target loop instruction comprises multiple layers of nested loop instructions;
Determining a plurality of loop instructions to be reconstructed in the target loop instruction based on instruction creation parameters of the target loop instruction, wherein the instruction creation parameters comprise an instruction requirement feature in at least one dimension;
Creating a first target instruction based on the plurality of loop instructions to be reconstructed, wherein the first target instruction is used for describing a loop calculation process of the plurality of loop instructions to be reconstructed;
And modifying an instruction associated with the plurality of loop instructions to be reconstructed in the target loop instruction based on the first target instruction to obtain a second target instruction.
The present disclosure also provides a board card including a package structure packaged with at least one of the above chips. Referring to fig. 7, an exemplary board card is provided that includes the chip 600 described above and may also include other components including, but not limited to, a memory device 702 and an interface device 704.
The memory device is connected with the chip in the chip packaging structure through a bus and is used for storing data. The memory device may include multiple groups of storage units 706, such as DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory) or the like. Each group of storage units is connected with the chip through a bus.
The interface device is electrically connected with the chip in the chip packaging structure. The interface device is used to enable data transfer between the chip and an external device 708, e.g. a terminal, a server, a camera, etc. In one embodiment, the interface device may include a PCIe interface, a network interface, or other interfaces, which is not limited by the disclosure.
Embodiments of the present disclosure also provide a computer device including the chip 600 described above or the board card described above.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the instruction processing method described in the above method embodiments. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform steps of the instruction processing method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein in detail.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It should be noted that the foregoing embodiments are merely specific implementations of the disclosure, and are not intended to limit the scope of the disclosure, and although the disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that any modification, variation or substitution of some of the technical features described in the foregoing embodiments may be made or equivalents may be substituted for those within the scope of the disclosure without departing from the spirit and scope of the technical aspects of the embodiments of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.