[go: up one dir, main page]

CN109947427B - Method and apparatus for converting non-serial-parallel control flow graph into data flow - Google Patents

Method and apparatus for converting non-serial-parallel control flow graph into data flow

Info

Publication number
CN109947427B
CN109947427B CN201811344350.4A CN201811344350A CN109947427B CN 109947427 B CN109947427 B CN 109947427B CN 201811344350 A CN201811344350 A CN 201811344350A CN 109947427 B CN109947427 B CN 109947427B
Authority
CN
China
Prior art keywords
node
serial
instruction
parallel
operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811344350.4A
Other languages
Chinese (zh)
Other versions
CN109947427A (en
Inventor
Y·张
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of CN109947427A publication Critical patent/CN109947427A/en
Application granted granted Critical
Publication of CN109947427B publication Critical patent/CN109947427B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/43Checking; Contextual analysis
    • G06F8/433Dependency analysis; Data or control flow analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/453Data distribution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

本申请提供了用于将非串并行控制流图转换为数据流的方法和设备。用于将非串并行控制流图转换为数据流的方法和设备示例设备包括:节点分析器,该节点分析器用于检测序列化代码中的非串并行节点;以及指令生成器,该指令生成器用于:生成包括消耗操作数的、针对与检测到的非串并行节点相关联的在先节点的指令;生成用于组合针对在先节点生成的指令的结果的组合指令;并且输出组合指令和针对在先节点生成的指令,以便生成数据流代码。

The present application provides a method and apparatus for converting a non-serial parallel control flow graph into a data flow. An example apparatus for converting a non-serial parallel control flow graph into a data flow includes: a node analyzer for detecting non-serial parallel nodes in serialized code; and an instruction generator for: generating an instruction for a preceding node associated with the detected non-serial parallel node, including a consumed operand; generating a combined instruction for combining the results of the instructions generated for the preceding node; and outputting the combined instruction and the instruction generated for the preceding node to generate data flow code.

Description

Method and apparatus for converting non-serial-parallel control flow graph into data flow
Technical Field
The present disclosure relates generally to compilers in computing systems, and more particularly, to methods and apparatus for improving compiler efficiency in computing systems by converting non-serial-parallel control flow graphs into data streams.
Background
Many computing systems operate in accordance with a control flow architecture. In a control flow architecture, execution of instructions of a program is driven by a program counter that steps through the instructions of the program. In other words, the execution order of the instructions of the program is defined by the structure of the program itself. In some cases, the control flow architecture may operate improperly when attempting to implement parallel processing. For example, a program may declare an instruction to be executed even though its inputs (e.g., operands) have not been updated by a parallel operation instruction.
Some computing systems utilize a data flow architecture. The dataflow architecture is not program-defined instruction execution order driven. Alternatively, the dataflow architecture executes instructions according to the availability of inputs (e.g., operands) of the instructions. For example, if an instruction has three operands, a computing system utilizing a dataflow architecture will execute the instruction once the three operands are provided to the instruction by the other instruction(s) on which the instruction depends. Thus, the data flow architecture may be executed in a highly parallel environment without fear that an instruction will execute before the data dependencies of the instruction are updated/satisfied. For example, the data flow architecture may be used in a large-scale computing system that uses a large number of processing elements to highly parallelize processing.
Drawings
FIG. 1 is a block diagram of an example system for transcoding a control stream into a data stream.
Fig. 2 is a block diagram of an example implementation of the non-serial-to-parallel converter of fig. 1.
FIG. 3 illustrates an example control flow graph of serial-parallel.
FIG. 4 illustrates an example control flow graph including nodes that are not serially parallel.
Fig. 5 is a flowchart representative of example machine readable instructions for implementing the non-serial-to-parallel converter of fig. 3 and/or 4.
FIG. 6 is a block diagram of an example processing device that may execute the instructions of FIG. 5 to implement the non-serial-to-parallel converter of FIG. 3 and/or FIG. 4.
The figures are not drawn to scale. As used in this patent, any component (e.g., layer, film, region, or plate) that is positioned (e.g., positioned, located, disposed, or formed on, etc.) in any manner indicates that the referenced component is in contact with another component, or that the referenced component is above the other component and that one or more intervening components are located between the referenced component and the other component. The recitation that any component is in contact with another means that there are no intervening components between the two components.
Detailed Description
The serial-parallel graph is widely used for application engineering and algorithm research theory. If the graph is a serial-parallel graph, some of the graph problems, typically NP-Complete, can be resolved in linear time. Highly efficient algorithms have been developed to generate highly parallel data stream code for control flow graphs that are serial-parallel graphs.
The directed graph G is two-terminal serial-parallel and has endpoints s and t if it can be generated by the following sequence of operations:
1. A new graph is created that consists of a single edge pointing from s to t.
2. Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=p (X, Y) is formed by identifying s=sx=sy and t=tx=ty. This is called a parallel combination of X and Y.
3. Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=s (X, Y) is formed by identifying s=sx, tx=sy and t=ty. This is called a serial combination of X and Y.
An undirected graph is a two-terminal serial-parallel graph with endpoints s and t if the undirected graph forms a directed two-terminal serial-parallel graph with endpoints s and t for some orientations of the edges of the undirected graph. If for some two vertices s and t, the directed or undirected graph is serial-parallel with both ends of those two endpoints.
While the data flow architecture may be advantageously used in a computing system, not all programs are designed and/or capable of being written directly in data flow code. For example, existing programs may be implemented for control flow architecture. The methods and apparatus disclosed herein facilitate converting control flow sequences to data flow programs. In particular, the methods and apparatus disclosed herein identify a non-serial-parallel control stream (e.g., the example non-serial-parallel stream of fig. 4) in a program and convert the non-serial-parallel control stream to a data stream code. Thus, in some disclosed examples, legacy programs with complex control flows may run in a highly parallel data flow architecture.
FIG. 1 is a block diagram of an example system 100 that includes example input code 102 compiled by an example compiler 103 to generate example dataflow assembly code 112. For example, system 100 may be implemented within a compiler to convert control flow code into data flow code.
The input code 102 of the illustrated example is a serialization instruction software program that is converted and compiled into assembly code (e.g., for execution on a computing platform). The example input code 102 is written in the C programming language. Or the input code may be written in any other language such as c++, fortran, java, etc.
The example compiler 103 includes an example Intermediate Representation (IR) transformer 304, an example serialized instruction transformer 106, an example control flow to data flow transformer 108, and an example non-serial-to-parallel transformer 110. The compiler 103 of the illustrated example is a LLVM compiler that has been modified to include a non-serial-to-parallel converter 110. Or any other type of pre-existing or newly created compiler may be used.
The IR transformer 104 of the illustrated example transforms the example input code 102 into an intermediate representation code. The example IR transformer 104 transforms the input code 102 into an LLVM intermediate representation. Or the IR transformer 104 may transform the input code 102 into any type of intermediate representation (e.g., standard portable intermediate representation, java bytecode, public intermediate representation, etc.) that is recognized by a compiler used in the system 100.
The example serialized instruction converter 106 converts the intermediate representation from the IR converter 104 into serialized control flow machine instructions for the target machine. According to the illustrated example, the serialization instruction converter 104 converts the intermediate representation code, one portion at a time, in order to facilitate verification and conversion to dataflow instructions.
The example control flow to data flow converter 108 converts the control flow serialization instructions into data flow assembly code 112. The example control stream to data stream converter 108 converts serial-parallel control stream codes into data stream codes using well known techniques.
The example control flow to data flow converter 108 uses two basic data flow instructions SWITCH and PICK.
The PICK instruction PICKs values from two registers based on assertions (predictes). PICK has the format%r=pick%r0,%r1,%r2, where%r is the sign of the virtual register in LLVM. Such instructions may be written in the C programming language as%r= (%r0);
The SWITCH instruction sends an operand to one of the two identified registers based on the predicate. The SWITCH has the format%r1,%r2=switch%r0,%r3, where the format sends the value in% r3 to either% r1 or% r2 based on the assertion in% r 0. In the C language, this format is similar to if (%r0)%r2=%r3else% r1=%r3.
For example, FIG. 3 illustrates an example control flow graph 300 of serial-parallel. In the illustrated example, the first node 302 assigns a value of X to either X11 or X12 based on the value of C1. (e.g., if C1 is false, x11=x, and if C1 is true, x12=x.) similarly, when C2 is false, the second node 304 assigns the value X11 to X21, and when C2 is true, the second node 304 assigns the value X11 to X22. When C3 is false, the third node 306 assigns the value X12 to X31, and when C3 is true, the third node 306 assigns the value X12 to X32. Finally, fourth node 308 is a Phi operation operating on X21, X22, X31, and X32. The Phi function is a pseudo function in the form of a static single assignment. The Phi function will assign X4 a value (e.g., one of X21, X22, X31, or X32) that controls the fourth node 308.
According to the example control flow graph 300, the second node 304 is serial with the first node 302 and parallel with the third node 306. Thus, because control flow graph 300 is serial-parallel, control flow graph 300 may be easily converted to a data stream using well-known techniques.
For example, in transcoding a control stream represented by fig. 3 to a data stream code, the example control stream-to-data stream converter will generate the PICK tree x2=pickc2, X21, X22, x3=pickc3, X31, X32, and x4=pickc1, X2, X3.
In practice, many control flow graphs are serial-parallel or can be easily reduced to serial-parallel using well-known techniques. However, some control flow graphs are not easily convertible. For example, goto ("go"), indirect jumps, or optimizations such as inline function calls and expansion loops may create very complex control flow graphs that cannot be reduced to serial-parallel graphs. For example, fig. 4 illustrates an example control flow graph 400 generated using a goto command, the example control flow graph 400 including a first node 402, a second node 404, a third node 406, a fourth node 408, a fifth node 410, and a sixth node 412. The control flow graph 400 of fig. 4 is non-serial-parallel in that the example first node 402 and the example second node 404 are not added to the graph using serial or parallel combinations.
The example control flow to data flow converter 108 also includes an example non-serial-to-parallel converter 110 to detect non-serial-to-parallel code within the control flow serialization code from the serialization instruction converter 106 and convert the non-serial-to-parallel code into data flow code. For example, when applied to the instruction illustrated in FIG. 4, the non-serial-to-parallel converter 110 detects that the instruction includes non-serial-to-parallel code that cannot be converted to parallel optimized data stream code according to existing methods. For example, in calculating the output value (out-going value) of X from the example fourth node 408, evaluation of the predicate value C2 alone is insufficient. It is also necessary to evaluate C3. In addition, the order of evaluation is important because if a branch to C2 is taken, then C3 will have no value in the data flow machine. Complex logic combinations of C2 and C3 or evaluations of both branches are needed to calculate X4 in the fourth node 408 and also for X5 in the fifth node 410, both of which waste power.
To facilitate converting non-serial-parallel code into data stream code that may be used in, for example, a highly parallel environment, the example non-serial-parallel converter 110 utilizes a consumption operand, any value of which is consumed by the system. An example consuming operand is% ign, which may be referred to as a ignore operand. The value assigned to the consuming operand is not stored. According to the illustrated example, the non-serial-to-parallel converter 110 includes a consume operand in the PICK instruction to consider branches of the node tree that may be unrelated to determining the particular value.
For example, non-serial-to-parallel converter 110 may generate a PICK tree for determining the value of X4 in example fourth node 408, x2=pickc2, x21% ign, x3=pickc3, X31,% ign, and x4=pickc1, X2, X3.
Fig. 2 is a block diagram of an example implementation of the non-serial-to-parallel converter 100 of fig. 1. The example non-serial-to-parallel converter 110 of fig. 2 includes an example non-serial-to-parallel detector 202, an example node analyzer 204, and an example instruction generator 206.
The example non-serial-to-parallel detector 202 analyzes the control stream serialization code processed by the example control stream-to-data stream converter 108 to identify non-serial-to-parallel code. The example non-serial detector 202 uses a conceptual Control Dependency Graph (CDG) to detect non-serial-parallel graphs. In a conventional control flow graph, node B relies on node a for control if node B is on one path from node a to the egress (sink) node of the control flow graph and there are other paths from node a to the egress without going through node B. In the corresponding control dependency graph, there are edges from A to B. Node a is referred to as the parent node of node B. The example non-serial-parallel detector 202 builds a control dependency graph for a control flow graph of an input program. When a CDG has the property that each node has one and only one control depends on the parent node, the graph is serial-parallel. The example non-serial-parallel detector 202 identifies a non-serial-parallel graph when at least one node does not have one and only one control-dependent parent node (e.g., more than one parent node, zero parent nodes, etc.).
For example, the non-serial-parallel detector 202 may apply a set of rules to identify non-serial-parallel codes. In some examples, the non-serial-parallel detector 202 may determine that the code is not serial-parallel (e.g., does not satisfy the rules of serial-parallel code). In other examples, the non-serial-parallel detector 202 may determine that the code is non-serial-parallel (e.g., satisfies a rule of non-serial-parallel code). For example, when the code includes a node that is neither in serial nor parallel with other nodes, the non-serial-parallel detector 202 may determine that the code is non-serial-parallel. For example, codes that do not meet the following rules may be classified as non-serial-parallel:
Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=p (X, Y) is formed by identifying s=sx=sy and t=tx=ty. This is called a parallel combination of X and Y.
Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=s (X, Y) is formed by identifying s=sx, tx=sy and t=ty. This is called a serial combination of X and Y.
When the example non-serial-parallel detector 202 identifies non-serial-parallel code, the non-serial-parallel detector 202 triggers analysis by the example node analyzer 204.
The example node analyzer 204 of the illustrated example analyzes the code to identify non-serial-parallel node(s). For example, node analyzer 204 may identify nodes that are not serial and/or parallel. The example node analyzer 204 triggers the example instruction generator 206 to determine PICK instructions for the non-serial-parallel nodes.
The example instruction generator 206 generates PICK instructions for converting non-serial-parallel control stream serialization code into data stream code. The example instruction generator 206 generates a dataflow code by analyzing non-serial-parallel nodes and preceding nodes to generate a PICK instruction that includes a consume operand (e.g., an ignore operand such as%ign). Although the illustrated example uses PICK instructions, any instruction of a compiler that selects values from operands based on assertions may be utilized.
When analyzing a node and the preceding nodes of that node, the example instruction generator 206 generates a PICK instruction for each preceding node and sets a first operand to a value that may propagate to a non-serial-parallel node and sets a second operand to a consuming operand (e.g., because a second possible path from the preceding node does not propagate to the non-serial-parallel node). For example, when generating a PICK instruction for the fourth node 408 of fig. 4, for the second node 404 of fig. 4, the instruction generator 206 generates a PICK instruction x2=pickc2, X21,% ign. In this example, the second output from the second node 404 does not propagate to the fourth node 408 and is therefore set to consume operands in order to ensure that the PICK instruction is executed and does not propagate a value to X4 when C2 is true, thereby remaining consistent with the desired operation programmed in the input code. Thus, instruction generator 206 may generate a PICK tree for fourth node 408 of X2=PICK C2, X21% ign, X3=PICK C3, X31,% ign, and X4=PICK C1, X2, X3.
In some examples, the instruction generator 206 may utilize a white value such as zero (zero) where the processing architecture on which the code is to be executed does not propagate the consuming operands, and may include a SWITCH instruction to consume the zero if execution of the code branch is far from the node. For example, for the fourth node 408, the instruction generator may generate the following dataflow code:
X2=PICK C2,X21,0
X3=PICK C3,X31,0
X’4=PICK C1,X2,X3
e2 = | C1& & C2// take branch C2- > B5
E3 =c1 & & C3// take branch C2- > B5
E=E1||E2
X4,%ign=SWITCH E,X’4
In this example, & is a shorted logical AND and || is a shorted logical OR. Since only the second operand is evaluated if the first operand does not determine the output of the operator, the short-circuit nature of those operators saves power consumption. Furthermore, the white value (e.g., zero) never changes when propagating in the PICK tree, which also consumes less power during execution.
Although an example manner of implementing the non-serial-to-parallel converter 110 is illustrated in fig. 1, one or more of the elements, processes, and/or devices illustrated in fig. 1 may be combined, split, rearranged, omitted, eliminated, and/or implemented in any other way. Further, the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 may be implemented by hardware, software, firmware, and/or any combination of hardware, software, and/or firmware. Thus, for example, any of the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 may be implemented by one or more analog or digital circuits, logic circuits, programmable processor(s), application Specific Integrated Circuit (ASIC), programmable logic device(s) (PLD), and/or field programmable logic device(s) (FPLD). When reading any of the apparatus claims or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 is expressly defined herein to include a non-transitory computer-readable storage device or storage disk, such as a memory, digital Versatile Disk (DVD), compact Disk (CD), blu-ray disk, etc., that includes the software and/or firmware. Further, the example non-serial-to-parallel converter 110 may include one or more elements, processes, and/or devices in addition to, or in lieu of, those illustrated in fig. 2, and/or may include more than one of any or all of the illustrated elements, processes, and devices.
A flowchart representative of example machine readable instructions for implementing the non-serial-to-parallel converter 110 is shown in fig. 5. In an example, the machine readable instructions include a program for execution by a processor, such as the application processor 601 of fig. 6 and/or the processor 612 shown in the example processor platform 600 discussed below in connection with fig. 6. Although the program may be embodied in software stored on a non-transitory computer readable storage medium, such as a CD-ROM, floppy disk, hard drive, digital Versatile Disk (DVD), blu-ray disk, or memory associated with the processor 612, the entire program and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware. Further, while the example program is described with reference to the flowchart illustrated in fig. 5, many other methods of implementing the example non-serial-to-parallel converter 110 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuits, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), comparators, operational amplifiers (op-amps), logic circuits, etc.) structured to perform the corresponding operations without executing software or firmware.
As mentioned above, the example process of fig. 5 may be implemented using encoded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium, such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random access memory, and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, during brief instances, during temporary buffering, and/or during caching of the information). As used herein, the term "non-transitory computer-readable storage medium" is expressly defined to include any type of computer-readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. "comprising" and "including" (and all forms and tenses thereof) are used herein as open-ended terms. Thus, whenever a claim lists any content followed by any form of "comprising" or "including" (e.g., including, containing, etc.), it is to be understood that additional elements, items, etc. may be present without exceeding the scope of the corresponding claim. As used herein, when the phrase "at least" is used as a transitional term in the preamble of a claim, it is open-ended in the same manner as the terms "comprising" and "including.
The process of FIG. 5 begins when the non-serial-to-parallel detector 202 detects a non-serial-to-parallel flow graph being processed by the example control flow to data flow converter 108 (block 502). For example, the non-serial-parallel detector 202 may apply one or more rules to the control flow serialization code to test the non-serial-parallel nodes. In response to detecting the non-serial-parallel flow graph, the example node analyzer 204 detects a non-serial-parallel node (block 504). For example, when processing the code illustrated in fig. 4, the example node analyzer 204 may detect that the example fourth node 408 is a non-serial-parallel node.
In response to detecting the non-serial-parallel node, the example node analyzer 204 selects a node in the serialization code that precedes the non-serial-parallel node (block 506). For example, when the fourth node 408 of FIG. 4 is processed as a non-serial-parallel node, the node analyzer 204 selects the example second node 404 as the prior node. The example instruction generator 206 generates a PICK instruction with an ignore operand for the selected prior node (block 508). For example, when the second node 404 of fig. 4 is processed as the previous node of the fourth node 408, the instruction generator 206 generates x2=pickc2, X21,% ign. Although the example instruction generator 206 generates PICK instructions, any similar instructions for a particular compiler environment employing consuming operands may be utilized. For example, any instruction that ensures evaluation of assertions with consuming operands (e.g., ignore operands, white values, etc.) may be utilized.
The example node analyzer 204 determines if another prior node exists (block 510). When there is another prior node, control returns to block 506 to generate a PICK instruction for the prior node. For example, after processing the second node 404, a third node 406 may be processed.
When there is no other prior node (block 510), the example instruction generator 206 generates an instruction to combine the results of the PICK instruction from the prior node. According to the illustrated example, instruction generator 206 generates a PICK instruction to combine the results to a prior node. For example, when generating instructions for the fourth node 408, the example instruction generator 206 generates x4=pickc1, X2, X3 after generating PICK instructions for the second node 404 and the third node 406.
The example node analyzer 204 determines if another non-serial-parallel node is present in the code being analyzed (block 514). When there is another non-serial-parallel node, control returns to block 502 to process the next node to generate the data stream assembly code 112. When there is no other non-serial-parallel node, the example instruction generator 206 outputs the generated dataflow code (block 516), and the routine of FIG. 5 ends.
While the foregoing examples generate instructions for an environment in which consuming operands are propagated, in some environments, such consuming operands are not propagated. In an environment that does not support propagation of the consuming operands, the consuming operands may be replaced with a white value such as zero (e.g., x2=pickc2, X21,0 and x3=pickc3, X31, 0). In addition, the instructions for combining the elements (block 512) may include several instructions for selecting the appropriate value. For example, the combined instruction may store values in temporary registers (X '4 = PICK C1, X2, X3), and may select final values using switches to ensure that white values are consumed (e2= | c1& & C2; e3 = c1& & C3; E = e1|e2; and X4,% ign = SWITCH, X' 4).
Fig. 6 is a block diagram of an example processor platform 600 capable of executing the instructions of fig. 5 to implement the non-serial-to-parallel converter 110 of fig. 1. The processor platform 600 may be, for example, a server, a personal computer, a mobile device (e.g., a cellular telephone, a smart phone, a tablet device such as an iPad TM), a Personal Digital Assistant (PDA), an internet appliance, or any other type of computing device.
The processor platform 600 of the illustrated example includes a processor 612. The processor 612 of the illustrated example is hardware. For example, the processor 612 may be implemented by one or more integrated circuits, logic circuits, microprocessors or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor 612 implements the non-serial-parallel detector 202, the node analyzer 204, and the instruction generator 206.
The processor 612 of the illustrated example includes a local memory 613 (e.g., cache). The processor 612 of the illustrated example communicates with a main memory including a volatile memory 614 and a non-volatile memory 616 via a bus 618. The volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.
The processor platform 600 of the illustrated example also includes interface circuitry 620. Interface circuit 620 may be implemented by any type of interface standard, such as an ethernet interface, a Universal Serial Bus (USB), and/or a PCI express interface.
In the illustrated example, one or more input devices 622 are connected to the interface circuit 620. Input device(s) 622 permit a user to input data and/or commands into processor 612. The input device(s) may be implemented by, for example, an audio sensor, microphone, camera (stationary or video), keyboard, buttons, mouse, touch screen, trackpad, trackball, isocentre mouse, and/or voice recognition system.
One or more output devices 624 are also connected to the interface circuit 620 of the illustrated example. The output device 624 may be implemented, for example, by a display device (e.g., a Light Emitting Diode (LED), an Organic Light Emitting Diode (OLED), a liquid crystal display, a cathode ray tube display (CRT), a touch screen, a haptic output device, a printer, and/or speakers). Thus, the interface circuit 620 of the illustrated example typically includes a graphics driver card, a graphics driver chip, and/or a graphics driver processor.
The interface circuit 620 of the illustrated example also includes communication devices such as a transmitter, receiver, transceiver, modem, and/or network interface card to facilitate exchange of data with external machines (e.g., any kind of computing device) via a network 626 (e.g., an ethernet connection, digital Subscriber Line (DSL), telephone line, coaxial cable, cellular telephone system, etc.).
The processor platform 600 of the illustrated example also includes one or more mass storage devices 628 for storing software and/or data. Examples of such mass storage devices 628 include floppy disk drives, hard disk drives, compact disk drives, blu-ray disc drives, RAID systems, and Digital Versatile Disk (DVD) drives.
The encoded instructions 632 of fig. 5 may be stored in the mass storage device 628, in the volatile memory 614, in the non-volatile memory 616, and/or on a removable tangible computer-readable storage medium, such as a CD or DVD.
Example methods, apparatus, systems, and articles of manufacture to convert non-serial-parallel control flow graphs to data flows are disclosed herein. Further examples and combinations thereof include the following.
Example 1 includes an apparatus for converting control flow code into data flow code, the apparatus comprising a node analyzer to detect non-serial-parallel nodes in a serialized code, and an instruction generator to generate an instruction for a preceding node including a consuming operand, generate a combined instruction to combine results of the instructions, and output the combined instruction and the instruction to generate the data flow code.
Example 2 includes the apparatus as defined in example 1, wherein the consumption operand is a ignore operand.
Example 3 includes the apparatus as defined in example 1 or example 2, further comprising a non-serial-parallel detector to detect a control flow graph of the serialized code is a non-serial-parallel flow graph.
Example 4 includes the apparatus as defined in example 1 or example 2, wherein the instruction is a PICK instruction.
Example 5 includes the apparatus as defined in example 1, wherein the consumption operand is a white value.
Example 6 includes the apparatus as defined in example 5, wherein the instruction generator is further to generate an instruction to consume the white value.
Example 7 includes the apparatus as defined in example 1 or example 2, wherein the node analyzer is to detect the non-serial-parallel node by analyzing a control dependency graph.
Example 8 includes the apparatus as defined in example 7, wherein the node analyzer is to identify the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.
Example 9 includes a non-transitory computer-readable medium comprising instructions that, when executed, cause a machine to at least detect a non-serial-parallel node in a serialization code, generate an instruction for a prior node that includes a consuming operand, generate a combined instruction to combine results of the instructions, and output the combined instruction and the instructions to generate a dataflow code.
Example 10 includes the non-transitory computer-readable medium as defined in example 9, wherein the consumption operand is a ignore operand.
Example 11 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions, when executed, cause the machine to detect a control flow graph of the serialization code is a non-serial-parallel flow graph.
Example 12 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions are PICK instructions.
Example 13 includes the non-transitory computer-readable medium as defined in example 9, wherein the consumption operand is a white value.
Example 14 includes the non-transitory computer-readable medium as defined in example 13, wherein the instructions, when executed, cause the machine to generate instructions to consume the white value.
Example 15 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions, when executed, cause the machine to detect the non-serial-parallel node by analyzing a control dependency graph.
Example 16 includes the non-transitory computer-readable medium as defined in example 15, wherein the instructions, when executed, cause the machine to identify the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.
Example 17 includes a method for compiling code, the method comprising detecting non-serial-parallel nodes in a serialized code, generating an instruction for a prior node comprising consuming operands, generating a combined instruction for combining results of the instructions, and outputting the combined instruction and the instructions to generate a dataflow code.
Example 18 includes the method as defined in example 17, wherein the consumption operand is a ignore operand.
Example 19 includes the method as defined in example 17 or example 18, further comprising detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph.
Example 20 includes the method as defined in example 17 or example 18, wherein the instruction is a PICK instruction.
Example 21 includes the method as defined in example 17, wherein the consumption operand is a white value.
Example 22 includes the method as defined in example 21, further comprising generating an instruction to consume the white value.
Example 23 includes the method as defined in example 17 or example 18, further comprising detecting the non-serial-parallel node by analyzing a control dependency graph.
Example 24 includes the method as defined in example 23, further comprising identifying the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.
Example 25 includes an apparatus for converting control flow code into data flow code, the apparatus comprising means for detecting non-serial-parallel nodes in a serialized code, and means for generating instructions to generate instructions for a prior node comprising consuming operands, generating instructions to combine results of the instructions, and outputting the combined instructions and the instructions to generate data flow code.
Example 26 includes the apparatus as defined in example 25, wherein the consumption operand is a ignore operand.
Example 27 includes the method as defined in example 25 or example 26, further comprising means for detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph.
Example 28 includes the apparatus as defined in example 25 or example 26, wherein the instruction is a PICK instruction.
Example 29 includes the apparatus as defined in example 25, wherein the consumption operand is a white value.
Example 30 includes the apparatus as defined in example 29, wherein the means for generating instructions is further to generate means for consuming the white value.
Example 31 includes the apparatus as defined in example 25 or example 26, wherein the means for detecting is to detect the non-serial-parallel node by analyzing a control dependency graph.
Example 32 includes the apparatus as defined in example 31, wherein the means for detecting is to identify the non-serial-parallel node when the control dependency graph has more than parent nodes for the non-serial-parallel node.
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that are capable of efficiently converting non-serial-parallel serialized code into data stream code. In some examples, the use of the consumption operand in the converted instruction allows the initial non-serial-parallel code to be executed on a parallel computing system (such as a high performance computing system operating a dataflow architecture).
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.

Claims (15)

1.一种用于将控制流代码转换为数据流代码的设备,所述设备包括:1. A device for converting a control flow code into a data flow code, the device comprising: 逻辑电路;Logic circuits; 节点分析器,所述节点分析器用于检测序列化代码中的非串并行节点;以及A node analyzer for detecting non-serial parallel nodes in serialized code; and 指令生成器,所述指令生成器用于:An instruction generator, the instruction generator being configured to: 生成针对与检测到的非串并行节点相关联的在先节点的指令,所述指令中的至少一者包括消耗操作数,所述消耗操作数被标识为忽略操作数,所述忽略操作数用于使得被赋给所述忽略操作数的任何值被消耗而不被存储;generating instructions for a preceding node associated with the detected non-serial parallel node, at least one of the instructions including a consume operand, the consume operand being identified as an ignore operand, the ignore operand being operand for causing any value assigned to the ignore operand to be consumed and not stored; 生成用于组合针对所述在先节点生成的指令的结果的组合指令;以及generating a combining instruction for combining results of the instructions generated for the preceding node; and 输出所述组合指令和针对所述在先节点生成的指令,以便生成数据流代码,其中,所述节点分析器或所述指令生成器中的至少一者由所述逻辑电路实现。The combined instruction and the instruction generated for the previous node are output to generate a data flow code, wherein at least one of the node analyzer or the instruction generator is implemented by the logic circuit. 2.如权利要求1所述的设备,还包括非串并行检测器,所述非串并行检测器用于检测所述序列化代码的控制流图是非串并行流图。2 . The apparatus according to claim 1 , further comprising a non-serial-parallel detector, configured to detect that the control flow graph of the serialized code is a non-serial-parallel flow graph. 3.如权利要求1所述的设备,其特征在于,所述指令是PICK指令。3. The device of claim 1, wherein the instruction is a PICK instruction. 4.如权利要求1所述的设备,其特征在于,所述消耗操作数包括白值。4. The apparatus of claim 1, wherein the consumption operand comprises a white value. 5.如权利要求4所述的设备,其特征在于,所述指令生成器还用于生成用于消耗所述白值的指令。5 . The device according to claim 4 , wherein the instruction generator is further configured to generate an instruction for consuming the white value. 6.如权利要求1所述的设备,其特征在于,所述节点分析器用于通过分析控制依赖图来检测所述非串并行节点。6. The apparatus of claim 1, wherein the node analyzer is configured to detect the non-serial parallel nodes by analyzing a control dependency graph. 7.如权利要求6所述的设备,其特征在于,所述节点分析器用于:当所述控制依赖图具有针对所述非串并行节点的多于一个的父节点时,标识所述非串并行节点。7. The apparatus of claim 6, wherein the node analyzer is configured to identify the non-serial parallel node when the control dependency graph has more than one parent node for the non-serial parallel node. 8.一种用于编译代码的方法,所述方法包括:8. A method for compiling code, the method comprising: 检测序列化代码中的非串并行节点;Detect non-serial parallel nodes in serialized code; 生成针对与检测到的非串并行节点相关联的在先节点的指令,所述指令中的至少一者包括消耗操作数,所述消耗操作数被标识为忽略操作数,所述忽略操作数用于使得被赋给所述忽略操作数的任何值被消耗而不被存储;generating instructions for a preceding node associated with the detected non-serial parallel node, at least one of the instructions including a consume operand, the consume operand being identified as an ignore operand, the ignore operand being operand for causing any value assigned to the ignore operand to be consumed and not stored; 生成用于组合针对所述在先节点生成的指令的结果的组合指令;以及generating a combining instruction for combining results of the instructions generated for the preceding node; and 输出针对所述在先节点生成的所述组合指令和所述指令以便生成数据流代码。The combined instruction and the instruction generated for the preceding node are output to generate a data flow code. 9.如权利要求8所述的方法,还包括:检测所述序列化代码的控制流图是非串并行流图。9 . The method of claim 8 , further comprising: detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph. 10.如权利要求8所述的方法,其特征在于,所述指令是PICK指令。10. The method of claim 8, wherein the instruction is a PICK instruction. 11.如权利要求8所述的方法,其特征在于,所述消耗操作数包括白值。11. The method of claim 8, wherein the consumption operand comprises a white value. 12.如权利要求11所述的方法,还包括:生成用于消耗所述白值的指令。The method of claim 11 , further comprising generating an instruction for consuming the white value. 13.如权利要求8所述的方法,还包括:通过分析控制依赖图来检测所述非串并行节点。13. The method of claim 8, further comprising detecting the non-serial parallel nodes by analyzing a control dependency graph. 14.如权利要求13所述的方法,还包括:当所述控制依赖图具有针对所述非串并行节点的多于一个的父节点时,标识所述非串并行节点。14. The method of claim 13, further comprising identifying the non-serial parallel node when the control dependency graph has more than one parent node for the non-serial parallel node. 15.一种计算机可读介质,包括指令,所述指令在被执行时使得机器执行如权利要求8-14中的任一项所述的方法。15. A computer-readable medium comprising instructions which, when executed, cause a machine to perform the method of any one of claims 8 to 14.
CN201811344350.4A 2017-12-20 2018-11-13 Method and apparatus for converting non-serial-parallel control flow graph into data flow Active CN109947427B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/849,307 US10496383B2 (en) 2017-12-20 2017-12-20 Methods and apparatus to convert a non-series-parallel control flow graph to data flow
US15/849,307 2017-12-20

Publications (2)

Publication Number Publication Date
CN109947427A CN109947427A (en) 2019-06-28
CN109947427B true CN109947427B (en) 2025-07-29

Family

ID=65231075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811344350.4A Active CN109947427B (en) 2017-12-20 2018-11-13 Method and apparatus for converting non-serial-parallel control flow graph into data flow

Country Status (3)

Country Link
US (1) US10496383B2 (en)
CN (1) CN109947427B (en)
DE (1) DE102018128776A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200026745A1 (en) * 2019-09-27 2020-01-23 Intel Corporation Apparatuses, methods, and systems for instructions of a matrix operations accelerator
US11922148B2 (en) * 2021-12-20 2024-03-05 Tactical Computing Laboratories, LLC Systems and methods for application performance profiling and improvement

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6763515B1 (en) * 2000-06-05 2004-07-13 National Instruments Corporation System and method for automatically generating a graphical program to perform an image processing algorithm

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5251322A (en) * 1987-08-13 1993-10-05 Digital Equipment Corporation Method of operating a computer graphics system including asynchronously traversing its nodes
US20040243529A1 (en) * 1996-03-25 2004-12-02 Stoneman Martin L. Machine computational-processing systems for simulated-humanoid autonomous decision systems
US6463579B1 (en) * 1999-02-17 2002-10-08 Intel Corporation System and method for generating recovery code
US7702499B1 (en) * 2000-05-02 2010-04-20 Cadence Design Systems, Inc. Systems and methods for performing software performance estimations
US7559066B2 (en) * 2000-08-08 2009-07-07 International Business Machines Corporation CICS BMS (basic message service) meta model
US7275079B2 (en) * 2000-08-08 2007-09-25 International Business Machines Corporation Common application metamodel including C/C++ metamodel
US6934938B2 (en) * 2002-06-28 2005-08-23 Motorola, Inc. Method of programming linear graphs for streaming vector computation
US7774769B2 (en) * 2005-09-22 2010-08-10 Intel Corporation Transmitting trace-specific information in a transformed application
US7954059B2 (en) * 2006-07-24 2011-05-31 National Instruments Corporation Automatic conversion of text-based code having function overloading and dynamic types into a graphical program for compiled execution
US8039718B2 (en) * 2009-07-22 2011-10-18 Limagrain Europe S.A. Inbred corn line BB33
US8417961B2 (en) * 2010-03-16 2013-04-09 Oracle International Corporation Apparatus and method for implementing instruction support for performing a cyclic redundancy check (CRC)
CN102360306A (en) * 2011-10-19 2012-02-22 上海交通大学 Method for extracting and optimizing information of cyclic data flow charts in high-level language codes
US8806464B2 (en) * 2012-04-26 2014-08-12 Hewlett-Packard Development Company, L.P. Process flow optimized directed graph traversal
US9182957B2 (en) * 2012-07-10 2015-11-10 Loring Craymer Method and system for automated improvement of parallelism in program compilation
US9081583B2 (en) * 2012-08-23 2015-07-14 National Instruments Corporation Compile time execution
US20140075423A1 (en) * 2012-09-13 2014-03-13 International Business Machines Corporation Efficiently solving the "use-def" problem involving label variables
US9189318B2 (en) * 2013-05-15 2015-11-17 Oracle International Corporation Path-sensitive analysis framework for bug checking
CN103870340B (en) * 2014-03-06 2017-11-07 华为技术有限公司 Data processing method, control node and stream computing system in stream computing system
EP2958044B1 (en) * 2014-06-20 2019-09-18 Secure-IC SAS A computer implemented method and a system for controlling dynamically the execution of a code
US9946520B1 (en) * 2015-02-26 2018-04-17 MathNimbus Inc. Conversion of interpretive language functions into web applications or services
US9772925B2 (en) * 2015-10-22 2017-09-26 Microsoft Technology Licensing, Llc Storage access debugging with disassembly and symbol entries
CN105589728B (en) * 2015-12-16 2019-03-29 西安文理学院 A kind of instruction idiom recognition methods based on subgraph semanteme isomorphism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6763515B1 (en) * 2000-06-05 2004-07-13 National Instruments Corporation System and method for automatically generating a graphical program to perform an image processing algorithm

Also Published As

Publication number Publication date
US10496383B2 (en) 2019-12-03
US20190042216A1 (en) 2019-02-07
DE102018128776A1 (en) 2019-06-27
CN109947427A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
US10402177B2 (en) Methods and systems to vectorize scalar computer program loops having loop-carried dependences
EP3588285A1 (en) Sequence optimizations in a high-performance computing environment
US10402176B2 (en) Methods and apparatus to compile code to generate data flow code
KR101385420B1 (en) Execution of dynamic languages via metadata extraction
US20110078424A1 (en) Optimizing program code using branch elimination
US8726255B2 (en) Recompiling with generic to specific replacement
CN112100072B (en) Static detection method, device, equipment and medium for application program code
JP4745341B2 (en) System, method and apparatus for dependency chain processing
CN112148294A (en) Method and apparatus for intentional programming for heterogeneous systems
US9176760B2 (en) Fast, combined forwards-backwards pass global optimization framework for dynamic compilers
US9817643B2 (en) Incremental interprocedural dataflow analysis during compilation
US20240143296A1 (en) METHODS AND APPARATUS FOR COMBINING CODE LARGE LANGUAGE MODELS (LLMs) WITH COMPILERS
US20130139137A1 (en) Systems and Methods for Customizing Optimization/Transformation/ Processing Strategies
US20160357529A1 (en) Parallel computing apparatus and parallel processing method
US9256409B2 (en) Building reusable function summaries for frequently visited methods to optimize data-flow analysis
CN113885877B (en) Compiling method, device, equipment and medium
KR20190015285A (en) Query optimizer for CPU utilization and code refactoring
CN116266119A (en) Methods, apparatus and articles of manufacture for generating code embedding dependent on use
US8935512B2 (en) Instruction operation code generation system
CN109947427B (en) Method and apparatus for converting non-serial-parallel control flow graph into data flow
US8117604B2 (en) Architecture cloning for power PC processors
CN109582368B (en) Method and apparatus for mapping a single statically dispatched instruction onto a dataflow graph in a dataflow architecture
US10599406B2 (en) Generating executable files through compiler optimization
US20090228869A1 (en) Annotating exception information in a computer program
US10108405B2 (en) Compiling apparatus and compiling method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant