CN114237711A - Vector instruction processing method and device - Google Patents
- Publication number
- CN114237711A (application number CN202111506929.8A)
- Authority
- CN
- China
- Prior art keywords
- processor
- code unit
- vector
- vector instruction
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention provides a vector instruction processing method and device. The method comprises: obtaining a first code unit operable on a first processor, the first processor being a processor of a first instruction set architecture, the first code unit including at least one first vector instruction and being a code unit that has been vector optimized; and generating, according to the at least one first vector instruction and at least one set of mapping relationships indicating the correspondence between first vector instructions of the first processor and second vector instructions of a second processor, a second code unit executable on the second processor, the second processor being a processor of a second instruction set architecture different from the first instruction set architecture, the second code unit having the same function as the first code unit and including at least one second vector instruction. The invention removes the need to analyze semantics manually, thereby improving the efficiency of vector optimization.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a vector instruction processing method and device.
Background
In the computer field, the data processing speed of a processor affects the operating speed of the entire apparatus. To improve the data processing efficiency of a processor, the processor needs to be optimized. For example, vector optimization is performed on functions running on the processor; the optimized functions execute more efficiently, which in turn improves the data processing efficiency of the processor.
Therefore, how to improve the efficiency of vector optimization is an urgent problem to be solved.
Disclosure of Invention
The invention provides a vector instruction processing method and device, which are used for improving the efficiency of vector optimization.
In a first aspect, the present invention provides a vector instruction processing method, including:
obtaining a first code unit operable on a first processor, the first processor being a processor of a first instruction set architecture, the first code unit including at least one first vector instruction, the first code unit being a code unit that has been vector optimized;
and generating a second code unit executable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, wherein the second processor is a processor of a second instruction set architecture, the second instruction set architecture is different from the first instruction set architecture, the second code unit and the first code unit are code units with the same function, the second code unit comprises at least one second vector instruction, and the mapping relationship indicates the correspondence between a first vector instruction of the first processor and a second vector instruction of the second processor.
Optionally, the method further comprises:
obtaining a first vector instruction set of the first processor and a second vector instruction set of the second processor;
establishing a corresponding relation between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, wherein one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and parameter information of the first vector instruction is matched with parameter information of the second vector instruction.
Optionally, the parameter information includes at least one of: the number of parameters, the type of the parameters and the value range of the parameters.
Optionally, the method further comprises:
replacing a third code unit running on the second processor with the second code unit, the description text of the third code unit matching the description text of the first code unit, the description text being used to describe the function of a code unit.
Optionally, the code unit comprises at least one of: function, class, object, interface.
Optionally, the first processor and the second processor are two different processors among the following: x86, ARM, RISC-V, LoongArch, POWER ISA, and MIPS (Microprocessor without Interlocked Pipeline Stages).
In a second aspect, the present invention provides a vector instruction processing apparatus, including:
a first code unit obtaining module, configured to obtain a first code unit that is executable on a first processor, where the first processor is a processor of a first instruction set architecture, the first code unit includes at least one first vector instruction, and the first code unit is a code unit that has been vector optimized;
a second code unit generating module, configured to generate a second code unit that is executable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, where the second processor is a processor of a second instruction set architecture, the second instruction set architecture is an instruction set architecture different from the first instruction set architecture, the second code unit and the first code unit are code units having the same function, the second code unit includes at least one second vector instruction, and the mapping relationships are used to indicate a correspondence relationship between the first vector instruction of the first processor and the second vector instruction of the second processor.
Optionally, the apparatus further comprises:
a vector instruction set fetch module to fetch a first vector instruction set of the first processor and a second vector instruction set of the second processor;
the correspondence relationship establishing module is configured to establish a correspondence relationship between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, where one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and parameter information of the first vector instruction is matched with parameter information of the second vector instruction.
Optionally, the parameter information includes at least one of: the number of parameters, the type of the parameters and the value range of the parameters.
Optionally, the apparatus further comprises:
a code unit replacing unit, configured to replace a third code unit running on the second processor with the second code unit, where a description text of the third code unit matches a description text of the first code unit, and the description text is used to describe a function of the code unit.
Optionally, the code unit comprises at least one of: function, class, object, interface.
Optionally, the first processor and the second processor are two different processors among the following: x86, ARM, RISC-V, LoongArch, POWER ISA, and MIPS (Microprocessor without Interlocked Pipeline Stages).
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one third processor and memory;
the memory stores computer-executable instructions;
the at least one third processor executing the computer-executable instructions stored by the memory causes the electronic device to implement the method of the first aspect.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by an electronic device, the method according to the first aspect is implemented.
In a fifth aspect, an embodiment of the present invention further provides a computer program, where the computer program is configured to implement the method in the first aspect.
The invention provides a vector instruction processing method and device. The method comprises: obtaining a first code unit operable on a first processor, the first processor being a processor of a first instruction set architecture, the first code unit including at least one first vector instruction and being a code unit that has been vector optimized; and generating a second code unit operable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, wherein the second processor is a processor of a second instruction set architecture, the second instruction set architecture is different from the first instruction set architecture, the second code unit and the first code unit are code units with the same function, the second code unit comprises at least one second vector instruction, and the mapping relationship indicates the correspondence between a first vector instruction of the first processor and a second vector instruction of the second processor. According to the embodiment of the invention, the second code unit for the second processor that is to be vector optimized can be generated from the correspondence between vector instructions of different instruction set architectures and the first code unit of the first processor that has already been vector optimized. Semantics therefore do not need to be analyzed manually, and optimization efficiency is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for processing vector instructions according to an embodiment of the present invention;
fig. 3 and 4 are schematic diagrams of the parameter relationships of two groups of second vector instructions provided by an embodiment of the present invention;
FIG. 5 is a block diagram of a vector instruction processing apparatus according to an embodiment of the present invention;
fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
The above figures illustrate certain embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
With the rapid development of the computer field, various electronic devices have appeared. Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Referring to fig. 1, an electronic device may include a processor and a memory. The memory is used to store computer programs, which may also be referred to as computer-executable instructions. The processor is used for reading the computer program from the memory and executing the computer program.
Processors come in various types, i.e., different architectures. Processors of different architectures use different vector instruction sets; that is, processors of different architectures recognize different vector instructions. To improve the running efficiency of a computer program, the characteristics of the processor architecture can be fully exploited to perform vector optimization on the computer program run by the processor.
In the prior art, when vector optimization is performed, the semantics of a function must first be analyzed; code logic with the same semantics is then implemented with the corresponding vector instructions.
However, the above process of analyzing semantics requires human intervention, resulting in low efficiency of vector optimization.
In order to solve the above technical problem, the embodiment of the present invention reuses the optimization result of a processor that has already undergone vector optimization. To reuse the optimization result, the correspondence between vector instructions of different instruction set architectures is used, so that the vector instructions in a code unit of the processor that has been vector optimized can be replaced with vector instructions supported by the processor that is to be vector optimized. Semantics therefore do not need to be analyzed manually, and optimization efficiency is improved.
The following describes the technical solution of the present invention and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
FIG. 2 is a flowchart illustrating a vector instruction processing method according to an embodiment of the present invention. Referring to fig. 2, the method includes S101 to S102:
s101: a first code unit is obtained that is executable on a first processor, the first processor being a processor of a first instruction set architecture, the first code unit including at least one first vector instruction, the first code unit being a code unit that has been vector optimized.
Wherein the first code unit is any unit in a computer program, including but not limited to: functions, classes, objects, interfaces, etc.
It should be noted that the first vector instruction is a vector instruction supported by the first processor, so that the first code unit formed by the first vector instruction is executable on the first processor.
A vector instruction is a computer instruction that operates on vector data.
S102: and generating a second code unit operable on a second processor according to the at least one first vector instruction and at least one set of mapping relation, wherein the second processor is a processor of a second instruction set architecture, the second instruction set architecture is different from the first instruction set architecture, the second code unit and the first code unit are code units with the same function, the second code unit comprises at least one second vector instruction, and the mapping relation is used for indicating the corresponding relation between the first vector instruction of the first processor and the second vector instruction of the second processor.
The first processor and the second processor use different instruction set architectures. In the embodiment of the present invention they may be any two different processors among the following: x86, ARM (Advanced RISC Machine), RISC-V (Reduced Instruction Set Computing V), LoongArch, POWER ISA (Performance Optimization With Enhanced RISC), and MIPS (Microprocessor without Interlocked Pipeline Stages).
For example, the first processor may be x86 and the second processor may be MIPS. On MIPS, the second code unit may be formed by multiple MSA (MIPS SIMD Architecture) instructions, where SIMD (single instruction, multiple data) is a form of vector processing. The second code unit for the MIPS architecture is thus vector optimized by means of SIMD.
Specifically, the first code unit is first copied to obtain a copy unit. Then, for one or several consecutive first vector instructions in the copy unit, the at least one second vector instruction corresponding to those first vector instructions is determined and replaces them in the copy unit. This continues until every first vector instruction in the copy unit has been replaced with second vector instructions, at which point the copy unit serves as the second code unit.
The following illustrates the process of converting a first code unit into a second code unit.
Assume that the first code unit includes 10 vector instructions, VI10 to VI19. The first code unit is first copied to obtain a copy unit, which also includes the 10 vector instructions VI10 to VI19.
For the first of the first vector instructions, VI10: since VI10 and VI20 have the same function, VI10 in the copy unit may be replaced with VI20, where VI20 is a second vector instruction.
For the second to fifth first vector instructions, VI11 to VI14: since VI11 to VI14 together have the same function as VI21, VI11 to VI14 in the copy unit may be replaced with VI21, where VI21 is a second vector instruction.
For the sixth first vector instruction, VI15: since VI15 has the same function as VI22 to VI23, VI15 in the copy unit may be replaced with VI22 to VI23, where VI22 and VI23 are second vector instructions.
For the seventh to tenth first vector instructions, VI16 to VI19: since VI16 to VI19 together have the same function as VI24 to VI26, VI16 to VI19 in the copy unit may be replaced with VI24 to VI26, where VI24 to VI26 are second vector instructions.
Thus, the second code unit obtained by the above procedure comprises the second vector instructions arranged in the following order: VI20, VI21, VI22, VI23, VI24, VI25, VI26.
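As a minimal illustration of this replacement procedure, the following Python sketch walks through the VI10 to VI19 example above. The mapping table, the instruction names, and the longest-match-first strategy are assumptions made for illustration only; they are not the claimed implementation.

```python
# Hypothetical sketch of the copy-and-replace procedure described above.
# Keys are tuples of consecutive first vector instructions; values are the
# corresponding second vector instructions.
MAPPING = {
    ("VI10",): ["VI20"],
    ("VI11", "VI12", "VI13", "VI14"): ["VI21"],
    ("VI15",): ["VI22", "VI23"],
    ("VI16", "VI17", "VI18", "VI19"): ["VI24", "VI25", "VI26"],
}

def translate(first_code_unit):
    """Replace the first vector instructions of a code unit with second vector instructions."""
    copy_unit = list(first_code_unit)          # copy of the first code unit
    second_code_unit, i = [], 0
    while i < len(copy_unit):
        # Try the longest run of consecutive instructions first.
        for length in range(len(copy_unit) - i, 0, -1):
            key = tuple(copy_unit[i:i + length])
            if key in MAPPING:
                second_code_unit.extend(MAPPING[key])
                i += length
                break
        else:
            raise ValueError(f"no mapping for {copy_unit[i]}")  # would need semantic analysis
    return second_code_unit

print(translate([f"VI{n}" for n in range(10, 20)]))
# ['VI20', 'VI21', 'VI22', 'VI23', 'VI24', 'VI25', 'VI26']
```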
Corresponding second code units are generated, following the above process, for all first code units in the first processor that have been vector optimized. Of course, for a first code unit that has not been vector optimized, there is no need to generate a corresponding second code unit.
It can be seen that, since one first vector instruction may correspond to at least one second vector instruction, and/or at least one first vector instruction may correspond to one second vector instruction, and/or at least two first vector instructions correspond to at least two second vector instructions, the number of first vector instructions comprised by the first code unit and the number of second vector instructions comprised by the second code unit may be the same or different.
It will be appreciated that the functionality of the first code unit and the second code unit is the same, since the first vector instruction and the second vector instruction are vector instructions having the same functionality running on different types of processors.
The mapping relationships are created in advance. In the embodiment of the present invention, they may be created through the following steps: first, a first vector instruction set of the first processor and a second vector instruction set of the second processor are obtained; then, a correspondence is established between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, wherein one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and the parameter information of the first vector instruction matches the parameter information of the second vector instruction.
In practical applications, each type of processor is provided with a corresponding vector instruction manual, and vector instructions which can be supported by the type of processor are recorded in the vector instruction manual, so that a vector instruction set of the type of processor can be obtained from the vector instruction manual. For example, a first vector instruction set is obtained from a vector instruction manual of a first processor, and a second vector instruction set is obtained from a vector instruction manual of a second processor.
Wherein the first vector instruction set includes all first vector instructions executable by the first processor and the second vector instruction set includes all second vector instructions executable by the second processor.
The parameter information includes at least one of: the number of parameters, the type of the parameters, and the value range of the parameters. Accordingly, matching of parameter information may include at least one of: parameter-number matching, parameter-type matching, and parameter-value-range matching.
Parameter-number matching means that the first vector instruction and the second vector instruction have the same number of parameters.
Parameter-type matching means that the parameter types of the first vector instruction are the same as those of the second vector instruction, for example, all character-type parameters or all numeric-type parameters.
Parameter-value-range matching means that the parameter value ranges of the first vector instruction are the same as those of the second vector instruction.
It can be understood that, in order to ensure that the functions of the code units before and after vector optimization are unchanged, it is necessary to ensure that the number, type, and value ranges of the parameters are all matched.
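The parameter-information check can be sketched as follows. The ParamInfo fields and the example instruction descriptors are hypothetical placeholders, since the embodiment does not fix a concrete data format; the sketch only shows that parameter number, type, and value range are compared.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParamInfo:
    count: int              # number of parameters
    types: tuple            # e.g. ("numeric", "numeric", "numeric")
    value_ranges: tuple     # e.g. (None, None, None) when unrestricted

def params_match(first_info: ParamInfo, second_info: ParamInfo) -> bool:
    """True if parameter number, parameter types and value ranges all match."""
    return (first_info.count == second_info.count
            and first_info.types == second_info.types
            and first_info.value_ranges == second_info.value_ranges)

def build_mapping(first_set: dict, second_set: dict) -> dict:
    """Pair instructions from the two instruction sets whose parameter information matches."""
    return {f_name: [s_name for s_name, s_info in second_set.items()
                     if params_match(f_info, s_info)]
            for f_name, f_info in first_set.items()}

# Hypothetical descriptors: one first vector instruction, one second vector instruction group.
first_set = {"pblendvb": ParamInfo(3, ("numeric",) * 3, (None, None, None))}
second_set = {"msa_blend_sequence": ParamInfo(3, ("numeric",) * 3, (None, None, None))}
print(build_mapping(first_set, second_set))   # {'pblendvb': ['msa_blend_sequence']}
```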
The correspondence between vector instructions is explained below by two examples.
In a first example, consider the x86 instruction pblendvb xmm1, xmm2, xmm0: according to the highest-order bit of each byte of the data provided in xmm0, bytes of xmm2 are selected and placed into the corresponding bytes of xmm1. It can be seen that the parameters of this first vector instruction are xmm1, xmm2, and xmm0; the input parameters are xmm2 and xmm0, and the output parameter is xmm1.
Assume that the three vector registers holding xmm1, xmm2, and xmm0 are $w2, $w3, and $w4, and define $w0 = 0 and $w1 = 0x80808080808080808080808080808080 (the byte 0x80 repeated across the 128-bit register). A second code unit that can be generated for MIPS then includes the following vector instructions:
the first second vector instruction: min _ u.b $ w7, $ w1, $ w 4.
A second vector instruction: ceq.b $ w5, $ w7, $ w 4.
Third vector instruction: ceq.b $ w6, $ w5, $ w 0.
Fourth vector instruction: and v $ w3, $ w3, $ w 5.
A fifth second vector instruction: and v $ w2, $ w2, $ w 6.
A sixth second vector instruction: or v $ w2, $ w2, $ w 3.
The instruction min_u.b $w7, $w1, $w4 compares w1 and w4 byte by byte as unsigned numbers and stores the smaller value of each byte in w7. It can be seen that its input parameters are $w1 and $w4, and its output parameter is $w7.
The instruction ceq.b $w5, $w7, $w4 compares w7 and w4 byte by byte, writing 0xff to w5 where the bytes are equal and 0 where they are not. It can be seen that its input parameters are $w7 and $w4, and its output parameter is $w5.
The instruction ceq.b $w6, $w5, $w0 compares the result w5 of the previous step with all zeros byte by byte, writing 0xff to w6 where the bytes are the same and 0 where they differ. It can be seen that its input parameters are $w5 and $w0, and its output parameter is $w6.
The instruction and.v $w3, $w3, $w5 ANDs the data in w3 and w5 and writes the result to w3 (bytes that are 0xff in w5 retain the corresponding data of w3). It can be seen that its input parameters are $w3 and $w5, and its output parameter is $w3. The instruction and.v $w2, $w2, $w6 ANDs the data in w2 and w6 and writes the result to w2 (bytes that are 0xff in w6 retain the corresponding data of w2, i.e., the bytes that are 0 in w5). It can be seen that its input parameters are $w2 and $w6, and its output parameter is $w2.
The instruction or.v $w2, $w2, $w3 ORs $w2 and $w3 to obtain the final result. It can be seen that its input parameters are $w2 and $w3, and its output parameter is $w2.
Combining the six second vector instructions in sequence yields the parameter relationship diagram shown in fig. 3. As can be seen from fig. 3, the input parameters of the second vector instruction group formed by the six second vector instructions are $w1, $w4, $w0, and $w3, the output parameter is $w2, and the remaining parameters are intermediate parameters. Since the input parameters $w1 and $w0 are two fixed values, the effective input parameters are $w4 and $w3. Since $w4 corresponds to xmm0, $w3 corresponds to xmm2, and $w2 corresponds to xmm1, the first vector instruction and the second vector instruction group have the same number of input parameters and the same number of output parameters. In addition, the types of the input parameters and output parameters are also the same, namely numeric values. In this example, the value ranges of the input parameters and output parameters are not limited.
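To make the byte-wise selection that the six MSA instructions are meant to reproduce concrete, the following Python sketch models the pblendvb behaviour described above on 16-byte register values. It is a behavioural model for illustration only, not MSA or x86 code, and the example register contents are arbitrary.

```python
def blendv_bytes(dst: bytes, src: bytes, mask: bytes) -> bytes:
    """Byte-wise model of pblendvb: where the highest-order bit of a mask byte
    is set, take the byte from src; otherwise keep the byte already in dst."""
    assert len(dst) == len(src) == len(mask) == 16   # 128-bit registers
    return bytes(s if m & 0x80 else d for d, s, m in zip(dst, src, mask))

# The MSA sequence quoted above derives a per-byte 0xff/0x00 mask with
# min_u.b/ceq.b and then merges the two operands with and.v/or.v; this sketch
# only models the pblendvb side of the mapping.
xmm1 = bytes(range(16))            # destination bytes
xmm2 = bytes(range(16, 32))        # source bytes
xmm0 = bytes([0x80, 0x00] * 8)     # alternating mask: take src, keep dst
print(blendv_bytes(xmm1, xmm2, xmm0).hex())
```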
In a second example, the x86 instruction pclmulqdq xmm3, xmm6, 0x01 performs a carry-less polynomial multiplication; the input parameters are xmm3 and xmm6, the output parameter is xmm3, and the immediate 0x01 indicates which 64-bit halves of the operands participate in the carry-less multiplication.
In the MIPS architecture, xmm3 may be represented by $w3 and xmm6 by $w31.
The second code unit may thus be generated to include the following second vector instructions:
the first second vector instruction: ilvev.d $ w29, $ w3, $ w 3.
A second vector instruction: ilvod.d $ w28, $ w3, $ w 3.
Third vector instruction: ilvl.d $ w27, $ w29, $ w 28.
Fourth vector instruction: vmuhp $ w29, $ w27, $ w 31.
A fifth second vector instruction: vmulp $ w28, $ w27, $ w 31.
A sixth second vector instruction: ilvev.d $ w3, $ w29, $ w 28.
The first three instructions, ilvev.d $w29, $w3, $w3, ilvod.d $w28, $w3, $w3, and ilvl.d $w27, $w29, $w28, exchange the positions of the two 64-bit elements of w3. The first of these has input parameter $w3 and output parameter $w29; the second has input parameter $w3 and output parameter $w28; the third has input parameters $w28 and $w29 and output parameter $w27. Taken together, the first three instructions have input parameter $w3 and output parameter $w27.
The instruction vmuhp $w29, $w27, $w31 performs the carry-less polynomial multiplication for the high-order bits. It can be seen that this fourth second vector instruction has input parameters $w31 and $w27 and output parameter $w29.
The instruction vmulp $w28, $w27, $w31 performs the carry-less polynomial multiplication for the low-order bits. It can be seen that this fifth second vector instruction has input parameters $w31 and $w27 and output parameter $w28.
The instruction ilvev.d $w3, $w29, $w28 takes the even results, corresponding to the trailing immediate 0x01; if the immediate were 0x10, ilvod.d would be used to take the odd results. It can be seen that this sixth second vector instruction has input parameters $w28 and $w29 and output parameter $w3.
Combining the six second vector instructions in sequence yields the parameter relationship diagram shown in fig. 4. As can be seen from fig. 4, the input parameters of the second vector instruction group formed by the six second vector instructions are $w3 and $w31, the output parameter is $w3, and the remaining parameters are intermediate parameters. As described above, $w3 corresponds to xmm3 and $w31 corresponds to xmm6, so the first vector instruction and the second vector instruction group have the same number of input parameters and the same number of output parameters. In addition, the types of the input parameters and output parameters are also the same, namely numeric values. In this example, the value ranges of the input parameters and output parameters are not limited.
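For reference, the carry-less (polynomial) multiplication that pclmulqdq performs on a pair of 64-bit operands can be modelled in a few lines of Python. This is a behavioural sketch only, not the vmulp/vmuhp implementation; per the description above, the low and high 64-bit halves of the 128-bit product correspond to what vmulp and vmuhp are said to compute.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less multiply of two 64-bit integers, i.e. polynomial
    multiplication over GF(2): partial products are XORed instead of added."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result   # up to 128 bits wide

product = clmul64(0x87, 0x10)
low_half = product & (2**64 - 1)    # low-order bits (cf. vmulp above)
high_half = product >> 64           # high-order bits (cf. vmuhp above)
print(hex(product), hex(low_half), hex(high_half))
```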
The embodiment of the invention does not require the mapping relationships to be established manually, which helps reduce labor cost and improves the efficiency with which the mapping relationships are established. In addition, manually created mappings are prone to human error, whereas machine-created mappings reduce the error rate, thereby improving the accuracy of the second code unit and, in turn, the running success rate of the vector-optimized computer program.
In practical applications, some first vector instructions in the first vector instruction set may not be mappable to second vector instructions in the second vector instruction set, and/or some second vector instructions in the second vector instruction set may not be mappable to first vector instructions in the first vector instruction set. For first vector instructions that cannot be mapped, the corresponding second vector instructions may be generated through semantic analysis; likewise, for second vector instructions that cannot be mapped, the corresponding first vector instructions may be generated through semantic analysis.
Of course, after the second code unit is generated, it may be deployed to the second processor so that the second processor can run it properly. This improves the execution efficiency of applications that use the second code unit on the second processor.
In the prior art, a third code unit on the second processor that corresponds to the second code unit but has not been vector optimized may be determined manually, and the third code unit running in the second processor is then replaced with the second code unit. However, this approach requires substantial labor cost, and finding the third code unit is inefficient, especially when the second processor contains many code units.
It should be noted that the third code unit that runs in the second processor here refers to a code unit that can run on the second processor, not a code unit that is currently running. A code unit that is currently running cannot be replaced; replacing it could prevent it from running and cause the computer program to report an error during execution.
In order to improve the efficiency of finding the third code unit, the embodiment of the present invention may find the third code unit automatically and replace the third code unit running on the second processor with the second code unit, where the description text of the third code unit matches the description text of the first code unit, the description text being used to describe the function of a code unit.
Specifically, first, the description texts of all code units operable in the second processor may be matched with the description text of the first code unit one by one to find the best matching code unit as the third code unit.
The matching of description texts can be implemented with a text-based matching-degree calculation, which yields a matching degree between the description text of each code unit and the description text of the first code unit; the code unit with the maximum matching degree is taken as the third code unit. The matching degree between two description texts may be based on at least one of the following: the number of identical characters and the proportion of identical characters. Any existing text matching algorithm may be used for the matching between description texts; this is not limited by the embodiment of the present invention.
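A simple sketch of the text-based matching degree described here might look as follows. The proportion-of-shared-characters formula and the example description texts are assumptions for illustration; any existing text matching algorithm could be substituted.

```python
from collections import Counter

def matching_degree(text_a: str, text_b: str) -> float:
    """Proportion of characters shared between two description texts."""
    common = sum((Counter(text_a) & Counter(text_b)).values())
    return common / max(len(text_a), len(text_b), 1)

def find_third_code_unit(candidates: dict, first_description: str) -> str:
    """Return the code unit whose description text best matches the
    description text of the first code unit."""
    return max(candidates,
               key=lambda name: matching_degree(candidates[name], first_description))

# Hypothetical description texts of code units runnable on the second processor.
units = {"crc32_update": "compute a CRC32 checksum over a byte buffer",
         "blend_bytes":  "select bytes from two vectors according to a mask"}
print(find_third_code_unit(units, "byte-wise blend of two vectors using a mask"))
```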
Of course, before the third code unit is replaced with the second code unit, it may also be determined whether the second code unit is correct, i.e., whether it can run on the second processor. If it can run, the third code unit is replaced with the second code unit; if not, a developer may be prompted to modify the second code unit, and the third code unit may be replaced with the second code unit once the modified version runs on the second processor. This avoids processor execution failures caused by directly replacing the third code unit with the second code unit, and improves the execution success rate of the processor.
It can be seen that even when the second code unit needs to be modified by a developer, usually only a small number of vector instructions need to be changed rather than all of them, so the embodiment of the present invention still improves optimization efficiency substantially compared with generating the second code unit by manually analyzing semantics.
FIG. 5 is a block diagram of a vector instruction processing apparatus according to an embodiment of the present invention. Referring to fig. 5, the vector instruction processing apparatus 200 includes: a first code unit acquisition module 201 and a second code unit generation module 202.
The first code unit obtaining module 201 is configured to obtain a first code unit that is executable on a first processor, where the first processor is a processor of a first instruction set architecture, the first code unit includes at least one first vector instruction, and the first code unit is a code unit that has been subjected to vector optimization.
A second code unit generating module 202, configured to generate a second code unit that is executable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, where the second processor is a processor of a second instruction set architecture, the second instruction set architecture is an instruction set architecture different from the first instruction set architecture, the second code unit and the first code unit are code units having the same function, the second code unit includes at least one second vector instruction, and the mapping relationships are used to indicate a correspondence relationship between the first vector instruction of the first processor and the second vector instruction of the second processor.
Optionally, the apparatus further comprises:
a vector instruction set fetch module to fetch a first vector instruction set of the first processor and a second vector instruction set of the second processor;
the correspondence relationship establishing module is configured to establish a correspondence relationship between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, where one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and parameter information of the first vector instruction is matched with parameter information of the second vector instruction.
Optionally, the parameter information includes at least one of: the number of parameters, the type of the parameters and the value range of the parameters.
Optionally, the apparatus further comprises:
a code unit replacing unit, configured to replace a third code unit running on the second processor with the second code unit, where a description text of the third code unit matches a description text of the first code unit, and the description text is used to describe a function of the code unit.
Optionally, the code unit comprises at least one of: function, class, object, interface.
Optionally, the first processor and the second processor are two different processors among the following: x86, ARM, RISC-V, LoongArch, POWER ISA, and MIPS.
The apparatus provided in the embodiment of the present invention may be used to implement the technical solution of the method embodiment shown in fig. 2, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the above units and modules are implemented by way of circuitry and integrated in the second processor.
Fig. 6 is a block diagram illustrating an electronic device 600 according to an exemplary embodiment of the invention. The electronic device 600 comprises a memory 602 and at least one third processor 601.
The memory 602 stores, among other things, computer-executable instructions.
The at least one third processor 601 executes computer-executable instructions stored by the memory 602 to cause the electronic device to implement the method of fig. 2 as previously described.
In addition, the electronic device may further include a receiver 603 and a transmitter 604: the receiver 603 is configured to receive information from other apparatuses or devices and forward it to the third processor 601, and the transmitter 604 is configured to transmit information to other apparatuses or devices.
It will be appreciated that the third processor here is a processor for performing the method of fig. 2 and may be any processor; for example, it may be the first processor, the second processor, or another processor.
The electronic device is an embodiment of an apparatus corresponding to the method shown in fig. 2, and specifically, reference may be made to detailed description of the embodiment of the method shown in fig. 2, which is not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by an electronic device, the method shown in fig. 2 is implemented.
An embodiment of the present invention further provides a computer program, where the computer program is configured to implement the method shown in fig. 2.
The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combinations of features described above, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the present invention.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the invention. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used to distinguish between similar objects and are not necessarily intended to describe a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Claims (12)
1. A method for vector instruction processing, comprising:
obtaining a first code unit operable on a first processor, the first processor being a processor of a first instruction set architecture, the first code unit including at least one first vector instruction, the first code unit being a code unit that has been vector optimized;
and generating a second code unit executable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, wherein the second processor is a processor of a second instruction set architecture, the second instruction set architecture is different from the first instruction set architecture, the second code unit and the first code unit are code units with the same function, the second code unit comprises at least one second vector instruction, and the mapping relationship indicates the correspondence between a first vector instruction of the first processor and a second vector instruction of the second processor.
2. The method of claim 1, further comprising:
obtaining a first vector instruction set of the first processor and a second vector instruction set of the second processor;
establishing a corresponding relation between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, wherein one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and parameter information of the first vector instruction is matched with parameter information of the second vector instruction.
3. The method of claim 2, wherein the parameter information comprises at least one of: the number of parameters, the type of the parameters and the value range of the parameters.
4. The method according to any one of claims 1 to 3, further comprising:
replacing a third code unit running on the second processor with the second code unit, the description text of the third code unit matching the description text of the first code unit, the description text being used to describe the function of a code unit.
5. The method of any of claims 1 to 3, wherein the code unit comprises at least one of: function, class, object, interface.
6. The method of any of claims 1 to 3, wherein the first processor and the second processor are two different processors among the following: x86, ARM, RISC-V, LoongArch, POWER ISA, and MIPS (Microprocessor without Interlocked Pipeline Stages).
7. A vector instruction processing apparatus, comprising:
a first code unit obtaining module, configured to obtain a first code unit that is executable on a first processor, where the first processor is a processor of a first instruction set architecture, the first code unit includes at least one first vector instruction, and the first code unit is a code unit that has been vector optimized;
a second code unit generating module, configured to generate a second code unit that is executable on a second processor according to the at least one first vector instruction and at least one set of mapping relationships, where the second processor is a processor of a second instruction set architecture, the second instruction set architecture is an instruction set architecture different from the first instruction set architecture, the second code unit and the first code unit are code units having the same function, the second code unit includes at least one second vector instruction, and the mapping relationships are used to indicate a correspondence relationship between the first vector instruction of the first processor and the second vector instruction of the second processor.
8. The apparatus of claim 7, further comprising:
a vector instruction set fetch module to fetch a first vector instruction set of the first processor and a second vector instruction set of the second processor;
the correspondence relationship establishing module is configured to establish a correspondence relationship between first vector instructions in the first vector instruction set and second vector instructions in the second vector instruction set, where one first vector instruction corresponds to at least one second vector instruction, or one second vector instruction corresponds to at least one first vector instruction, and parameter information of the first vector instruction is matched with parameter information of the second vector instruction.
9. The apparatus of claim 7 or 8, further comprising:
a code unit replacing unit, configured to replace a third code unit running on the second processor with the second code unit, where a description text of the third code unit matches a description text of the first code unit, and the description text is used to describe a function of the code unit.
10. An electronic device, comprising: at least one third processor and memory;
the memory stores computer-executable instructions;
the at least one third processor executing the computer-executable instructions stored by the memory causes the electronic device to implement the method of any of claims 1-6.
11. A computer-readable storage medium having computer-executable instructions stored thereon for implementing the method of any one of claims 1 to 6 when the computer-executable instructions are executed by an electronic device.
12. A computer program for implementing the method according to any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111506929.8A CN114237711A (en) | 2021-12-10 | 2021-12-10 | Vector instruction processing method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111506929.8A CN114237711A (en) | 2021-12-10 | 2021-12-10 | Vector instruction processing method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114237711A true CN114237711A (en) | 2022-03-25 |
Family
ID=80754857
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111506929.8A Pending CN114237711A (en) | 2021-12-10 | 2021-12-10 | Vector instruction processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114237711A (en) |
- 2021-12-10: CN application CN202111506929.8A filed; published as CN114237711A (status: active, pending)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106293631A (en) * | 2011-09-26 | 2017-01-04 | 英特尔公司 | For providing vector scatter operation and the instruction of aggregation operator function and logic |
| CN103946797A (en) * | 2011-12-06 | 2014-07-23 | 英特尔公司 | System, apparatus and method for translating vector instructions |
| US20130346781A1 (en) * | 2012-06-20 | 2013-12-26 | Jaewoong Chung | Power Gating Functional Units Of A Processor |
Non-Patent Citations (2)
| Title |
|---|
| Fu Yonghua, "C++ Advanced Language Programming" (《C++高级语言程序设计》), China Electric Power Press, 31 March 2007, page 10 * |
| Zhang Xiuhong, "WebAssembly Principles and Core Technologies" (《WebAssembly原理与核心技术》), China Machine Press (机械工业出版社), 31 October 2020, page 218 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109408528B (en) | Database script generation method and device, calculation device and storage medium | |
| US10579349B2 (en) | Verification of a dataflow representation of a program through static type-checking | |
| CN112148585A (en) | Method, system, article of manufacture, and apparatus for code review assistance for dynamic type languages | |
| CN109947431B (en) | Code generation method, device, equipment and storage medium | |
| CN110941655A (en) | A data format conversion method and device | |
| CN107515739B (en) | Method and device for improving code execution performance | |
| CN113157651B (en) | Method, system, equipment and medium for renaming resource files of android project in batches | |
| CN106407111A (en) | Terminal test apparatus, terminal test device and variable maintenance method | |
| CN117827523B (en) | Model exception handling method and device, electronic equipment and storage medium | |
| CN111596970B (en) | Method, device, equipment and storage medium for dynamic library delay loading | |
| CN112230995B (en) | Instruction generation method and device and electronic equipment | |
| US20240311686A1 (en) | Model compiling method and apparatus, and model running system | |
| JP5440287B2 (en) | Symbolic execution support program, method and apparatus | |
| CN114237711A (en) | Vector instruction processing method and device | |
| CN116893819A (en) | Program compiling method, device, chip, electronic device and storage medium | |
| CN116881133A (en) | Method and system for generating full-scene test case set based on message log | |
| CN110908896A (en) | Test method and device based on decision tree | |
| CN112306502B (en) | Code generation method and device | |
| US20090235223A1 (en) | Program generation apparatus and program generation method | |
| CN115951936B (en) | Chip adaptation method, device, equipment and medium of vectorization compiler | |
| KR102500395B1 (en) | Apparatus and method for repairing bug source code for program | |
| CN113031952A (en) | Method and device for determining execution code of deep learning model and storage medium | |
| CN112905181B (en) | Model compiling and running method and device | |
| CN118656082B (en) | A parallel computing program optimization method, device, equipment and storage medium | |
| CN113934639B (en) | Data processing method, device, readable medium and electronic device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |