CN109947427B

CN109947427B - Method and apparatus for converting non-serial-parallel control flow graph into data flow

Info

Publication number: CN109947427B
Application number: CN201811344350.4A
Authority: CN
Inventors: Y·张
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2017-12-20
Filing date: 2018-11-13
Publication date: 2025-07-29
Anticipated expiration: 2038-11-13
Also published as: US10496383B2; US20190042216A1; DE102018128776A1; CN109947427A

Abstract

The present application provides a method and apparatus for converting a non-serial parallel control flow graph into a data flow. An example apparatus for converting a non-serial parallel control flow graph into a data flow includes: a node analyzer for detecting non-serial parallel nodes in serialized code; and an instruction generator for: generating an instruction for a preceding node associated with the detected non-serial parallel node, including a consumed operand; generating a combined instruction for combining the results of the instructions generated for the preceding node; and outputting the combined instruction and the instruction generated for the preceding node to generate data flow code.

Description

Method and apparatus for converting non-serial-parallel control flow graph into data flow

Technical Field

The present disclosure relates generally to compilers in computing systems, and more particularly, to methods and apparatus for improving compiler efficiency in computing systems by converting non-serial-parallel control flow graphs into data streams.

Background

Many computing systems operate in accordance with a control flow architecture. In a control flow architecture, execution of instructions of a program is driven by a program counter that steps through the instructions of the program. In other words, the execution order of the instructions of the program is defined by the structure of the program itself. In some cases, the control flow architecture may operate improperly when attempting to implement parallel processing. For example, a program may declare an instruction to be executed even though its inputs (e.g., operands) have not been updated by a parallel operation instruction.

Some computing systems utilize a data flow architecture. The dataflow architecture is not program-defined instruction execution order driven. Alternatively, the dataflow architecture executes instructions according to the availability of inputs (e.g., operands) of the instructions. For example, if an instruction has three operands, a computing system utilizing a dataflow architecture will execute the instruction once the three operands are provided to the instruction by the other instruction(s) on which the instruction depends. Thus, the data flow architecture may be executed in a highly parallel environment without fear that an instruction will execute before the data dependencies of the instruction are updated/satisfied. For example, the data flow architecture may be used in a large-scale computing system that uses a large number of processing elements to highly parallelize processing.

Drawings

FIG. 1 is a block diagram of an example system for transcoding a control stream into a data stream.

Fig. 2 is a block diagram of an example implementation of the non-serial-to-parallel converter of fig. 1.

FIG. 3 illustrates an example control flow graph of serial-parallel.

FIG. 4 illustrates an example control flow graph including nodes that are not serially parallel.

Fig. 5 is a flowchart representative of example machine readable instructions for implementing the non-serial-to-parallel converter of fig. 3 and/or 4.

FIG. 6 is a block diagram of an example processing device that may execute the instructions of FIG. 5 to implement the non-serial-to-parallel converter of FIG. 3 and/or FIG. 4.

The figures are not drawn to scale. As used in this patent, any component (e.g., layer, film, region, or plate) that is positioned (e.g., positioned, located, disposed, or formed on, etc.) in any manner indicates that the referenced component is in contact with another component, or that the referenced component is above the other component and that one or more intervening components are located between the referenced component and the other component. The recitation that any component is in contact with another means that there are no intervening components between the two components.

Detailed Description

The serial-parallel graph is widely used for application engineering and algorithm research theory. If the graph is a serial-parallel graph, some of the graph problems, typically NP-Complete, can be resolved in linear time. Highly efficient algorithms have been developed to generate highly parallel data stream code for control flow graphs that are serial-parallel graphs.

The directed graph G is two-terminal serial-parallel and has endpoints s and t if it can be generated by the following sequence of operations:

1. A new graph is created that consists of a single edge pointing from s to t.

2. Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=p (X, Y) is formed by identifying s=sx=sy and t=tx=ty. This is called a parallel combination of X and Y.

3. Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=s (X, Y) is formed by identifying s=sx, tx=sy and t=ty. This is called a serial combination of X and Y.

An undirected graph is a two-terminal serial-parallel graph with endpoints s and t if the undirected graph forms a directed two-terminal serial-parallel graph with endpoints s and t for some orientations of the edges of the undirected graph. If for some two vertices s and t, the directed or undirected graph is serial-parallel with both ends of those two endpoints.

While the data flow architecture may be advantageously used in a computing system, not all programs are designed and/or capable of being written directly in data flow code. For example, existing programs may be implemented for control flow architecture. The methods and apparatus disclosed herein facilitate converting control flow sequences to data flow programs. In particular, the methods and apparatus disclosed herein identify a non-serial-parallel control stream (e.g., the example non-serial-parallel stream of fig. 4) in a program and convert the non-serial-parallel control stream to a data stream code. Thus, in some disclosed examples, legacy programs with complex control flows may run in a highly parallel data flow architecture.

FIG. 1 is a block diagram of an example system 100 that includes example input code 102 compiled by an example compiler 103 to generate example dataflow assembly code 112. For example, system 100 may be implemented within a compiler to convert control flow code into data flow code.

The input code 102 of the illustrated example is a serialization instruction software program that is converted and compiled into assembly code (e.g., for execution on a computing platform). The example input code 102 is written in the C programming language. Or the input code may be written in any other language such as c++, fortran, java, etc.

The example compiler 103 includes an example Intermediate Representation (IR) transformer 304, an example serialized instruction transformer 106, an example control flow to data flow transformer 108, and an example non-serial-to-parallel transformer 110. The compiler 103 of the illustrated example is a LLVM compiler that has been modified to include a non-serial-to-parallel converter 110. Or any other type of pre-existing or newly created compiler may be used.

The IR transformer 104 of the illustrated example transforms the example input code 102 into an intermediate representation code. The example IR transformer 104 transforms the input code 102 into an LLVM intermediate representation. Or the IR transformer 104 may transform the input code 102 into any type of intermediate representation (e.g., standard portable intermediate representation, java bytecode, public intermediate representation, etc.) that is recognized by a compiler used in the system 100.

The example serialized instruction converter 106 converts the intermediate representation from the IR converter 104 into serialized control flow machine instructions for the target machine. According to the illustrated example, the serialization instruction converter 104 converts the intermediate representation code, one portion at a time, in order to facilitate verification and conversion to dataflow instructions.

The example control flow to data flow converter 108 converts the control flow serialization instructions into data flow assembly code 112. The example control stream to data stream converter 108 converts serial-parallel control stream codes into data stream codes using well known techniques.

The example control flow to data flow converter 108 uses two basic data flow instructions SWITCH and PICK.

The PICK instruction PICKs values from two registers based on assertions (predictes). PICK has the format%r=pick%r0,%r1,%r2, where%r is the sign of the virtual register in LLVM. Such instructions may be written in the C programming language as%r= (%r0);

The SWITCH instruction sends an operand to one of the two identified registers based on the predicate. The SWITCH has the format%r1,%r2=switch%r0,%r3, where the format sends the value in% r3 to either% r1 or% r2 based on the assertion in% r 0. In the C language, this format is similar to if (%r0)%r2=%r3else% r1=%r3.

For example, FIG. 3 illustrates an example control flow graph 300 of serial-parallel. In the illustrated example, the first node 302 assigns a value of X to either X11 or X12 based on the value of C1. (e.g., if C1 is false, x11=x, and if C1 is true, x12=x.) similarly, when C2 is false, the second node 304 assigns the value X11 to X21, and when C2 is true, the second node 304 assigns the value X11 to X22. When C3 is false, the third node 306 assigns the value X12 to X31, and when C3 is true, the third node 306 assigns the value X12 to X32. Finally, fourth node 308 is a Phi operation operating on X21, X22, X31, and X32. The Phi function is a pseudo function in the form of a static single assignment. The Phi function will assign X4 a value (e.g., one of X21, X22, X31, or X32) that controls the fourth node 308.

According to the example control flow graph 300, the second node 304 is serial with the first node 302 and parallel with the third node 306. Thus, because control flow graph 300 is serial-parallel, control flow graph 300 may be easily converted to a data stream using well-known techniques.

For example, in transcoding a control stream represented by fig. 3 to a data stream code, the example control stream-to-data stream converter will generate the PICK tree x2=pickc2, X21, X22, x3=pickc3, X31, X32, and x4=pickc1, X2, X3.

In practice, many control flow graphs are serial-parallel or can be easily reduced to serial-parallel using well-known techniques. However, some control flow graphs are not easily convertible. For example, goto ("go"), indirect jumps, or optimizations such as inline function calls and expansion loops may create very complex control flow graphs that cannot be reduced to serial-parallel graphs. For example, fig. 4 illustrates an example control flow graph 400 generated using a goto command, the example control flow graph 400 including a first node 402, a second node 404, a third node 406, a fourth node 408, a fifth node 410, and a sixth node 412. The control flow graph 400 of fig. 4 is non-serial-parallel in that the example first node 402 and the example second node 404 are not added to the graph using serial or parallel combinations.

The example control flow to data flow converter 108 also includes an example non-serial-to-parallel converter 110 to detect non-serial-to-parallel code within the control flow serialization code from the serialization instruction converter 106 and convert the non-serial-to-parallel code into data flow code. For example, when applied to the instruction illustrated in FIG. 4, the non-serial-to-parallel converter 110 detects that the instruction includes non-serial-to-parallel code that cannot be converted to parallel optimized data stream code according to existing methods. For example, in calculating the output value (out-going value) of X from the example fourth node 408, evaluation of the predicate value C2 alone is insufficient. It is also necessary to evaluate C3. In addition, the order of evaluation is important because if a branch to C2 is taken, then C3 will have no value in the data flow machine. Complex logic combinations of C2 and C3 or evaluations of both branches are needed to calculate X4 in the fourth node 408 and also for X5 in the fifth node 410, both of which waste power.

To facilitate converting non-serial-parallel code into data stream code that may be used in, for example, a highly parallel environment, the example non-serial-parallel converter 110 utilizes a consumption operand, any value of which is consumed by the system. An example consuming operand is% ign, which may be referred to as a ignore operand. The value assigned to the consuming operand is not stored. According to the illustrated example, the non-serial-to-parallel converter 110 includes a consume operand in the PICK instruction to consider branches of the node tree that may be unrelated to determining the particular value.

For example, non-serial-to-parallel converter 110 may generate a PICK tree for determining the value of X4 in example fourth node 408, x2=pickc2, x21% ign, x3=pickc3, X31,% ign, and x4=pickc1, X2, X3.

Fig. 2 is a block diagram of an example implementation of the non-serial-to-parallel converter 100 of fig. 1. The example non-serial-to-parallel converter 110 of fig. 2 includes an example non-serial-to-parallel detector 202, an example node analyzer 204, and an example instruction generator 206.

The example non-serial-to-parallel detector 202 analyzes the control stream serialization code processed by the example control stream-to-data stream converter 108 to identify non-serial-to-parallel code. The example non-serial detector 202 uses a conceptual Control Dependency Graph (CDG) to detect non-serial-parallel graphs. In a conventional control flow graph, node B relies on node a for control if node B is on one path from node a to the egress (sink) node of the control flow graph and there are other paths from node a to the egress without going through node B. In the corresponding control dependency graph, there are edges from A to B. Node a is referred to as the parent node of node B. The example non-serial-parallel detector 202 builds a control dependency graph for a control flow graph of an input program. When a CDG has the property that each node has one and only one control depends on the parent node, the graph is serial-parallel. The example non-serial-parallel detector 202 identifies a non-serial-parallel graph when at least one node does not have one and only one control-dependent parent node (e.g., more than one parent node, zero parent nodes, etc.).

For example, the non-serial-parallel detector 202 may apply a set of rules to identify non-serial-parallel codes. In some examples, the non-serial-parallel detector 202 may determine that the code is not serial-parallel (e.g., does not satisfy the rules of serial-parallel code). In other examples, the non-serial-parallel detector 202 may determine that the code is non-serial-parallel (e.g., satisfies a rule of non-serial-parallel code). For example, when the code includes a node that is neither in serial nor parallel with other nodes, the non-serial-parallel detector 202 may determine that the code is non-serial-parallel. For example, codes that do not meet the following rules may be classified as non-serial-parallel:

Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=p (X, Y) is formed by identifying s=sx=sy and t=tx=ty. This is called a parallel combination of X and Y.

Given two-terminal serial-parallel graphs X and Y, with endpoints sX, tX, sY and tY, a new graph g=s (X, Y) is formed by identifying s=sx, tx=sy and t=ty. This is called a serial combination of X and Y.

When the example non-serial-parallel detector 202 identifies non-serial-parallel code, the non-serial-parallel detector 202 triggers analysis by the example node analyzer 204.

The example node analyzer 204 of the illustrated example analyzes the code to identify non-serial-parallel node(s). For example, node analyzer 204 may identify nodes that are not serial and/or parallel. The example node analyzer 204 triggers the example instruction generator 206 to determine PICK instructions for the non-serial-parallel nodes.

The example instruction generator 206 generates PICK instructions for converting non-serial-parallel control stream serialization code into data stream code. The example instruction generator 206 generates a dataflow code by analyzing non-serial-parallel nodes and preceding nodes to generate a PICK instruction that includes a consume operand (e.g., an ignore operand such as%ign). Although the illustrated example uses PICK instructions, any instruction of a compiler that selects values from operands based on assertions may be utilized.

When analyzing a node and the preceding nodes of that node, the example instruction generator 206 generates a PICK instruction for each preceding node and sets a first operand to a value that may propagate to a non-serial-parallel node and sets a second operand to a consuming operand (e.g., because a second possible path from the preceding node does not propagate to the non-serial-parallel node). For example, when generating a PICK instruction for the fourth node 408 of fig. 4, for the second node 404 of fig. 4, the instruction generator 206 generates a PICK instruction x2=pickc2, X21,% ign. In this example, the second output from the second node 404 does not propagate to the fourth node 408 and is therefore set to consume operands in order to ensure that the PICK instruction is executed and does not propagate a value to X4 when C2 is true, thereby remaining consistent with the desired operation programmed in the input code. Thus, instruction generator 206 may generate a PICK tree for fourth node 408 of X2=PICK C2, X21% ign, X3=PICK C3, X31,% ign, and X4=PICK C1, X2, X3.

In some examples, the instruction generator 206 may utilize a white value such as zero (zero) where the processing architecture on which the code is to be executed does not propagate the consuming operands, and may include a SWITCH instruction to consume the zero if execution of the code branch is far from the node. For example, for the fourth node 408, the instruction generator may generate the following dataflow code:

X2=PICK C2,X21,0

X3=PICK C3,X31,0

X’4=PICK C1,X2,X3

e2 = | C1& & C2// take branch C2- > B5

E3 =c1 & & C3// take branch C2- > B5

E=E1||E2

X4,%ign=SWITCH E,X’4

In this example, & is a shorted logical AND and || is a shorted logical OR. Since only the second operand is evaluated if the first operand does not determine the output of the operator, the short-circuit nature of those operators saves power consumption. Furthermore, the white value (e.g., zero) never changes when propagating in the PICK tree, which also consumes less power during execution.

Although an example manner of implementing the non-serial-to-parallel converter 110 is illustrated in fig. 1, one or more of the elements, processes, and/or devices illustrated in fig. 1 may be combined, split, rearranged, omitted, eliminated, and/or implemented in any other way. Further, the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 may be implemented by hardware, software, firmware, and/or any combination of hardware, software, and/or firmware. Thus, for example, any of the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 may be implemented by one or more analog or digital circuits, logic circuits, programmable processor(s), application Specific Integrated Circuit (ASIC), programmable logic device(s) (PLD), and/or field programmable logic device(s) (FPLD). When reading any of the apparatus claims or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example non-serial-to-parallel detector 202, the example node analyzer 204, the example instruction generator 206, and/or, more generally, the example non-serial-to-parallel converter 110 is expressly defined herein to include a non-transitory computer-readable storage device or storage disk, such as a memory, digital Versatile Disk (DVD), compact Disk (CD), blu-ray disk, etc., that includes the software and/or firmware. Further, the example non-serial-to-parallel converter 110 may include one or more elements, processes, and/or devices in addition to, or in lieu of, those illustrated in fig. 2, and/or may include more than one of any or all of the illustrated elements, processes, and devices.

A flowchart representative of example machine readable instructions for implementing the non-serial-to-parallel converter 110 is shown in fig. 5. In an example, the machine readable instructions include a program for execution by a processor, such as the application processor 601 of fig. 6 and/or the processor 612 shown in the example processor platform 600 discussed below in connection with fig. 6. Although the program may be embodied in software stored on a non-transitory computer readable storage medium, such as a CD-ROM, floppy disk, hard drive, digital Versatile Disk (DVD), blu-ray disk, or memory associated with the processor 612, the entire program and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware. Further, while the example program is described with reference to the flowchart illustrated in fig. 5, many other methods of implementing the example non-serial-to-parallel converter 110 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuits, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), comparators, operational amplifiers (op-amps), logic circuits, etc.) structured to perform the corresponding operations without executing software or firmware.

As mentioned above, the example process of fig. 5 may be implemented using encoded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium, such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random access memory, and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, during brief instances, during temporary buffering, and/or during caching of the information). As used herein, the term "non-transitory computer-readable storage medium" is expressly defined to include any type of computer-readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. "comprising" and "including" (and all forms and tenses thereof) are used herein as open-ended terms. Thus, whenever a claim lists any content followed by any form of "comprising" or "including" (e.g., including, containing, etc.), it is to be understood that additional elements, items, etc. may be present without exceeding the scope of the corresponding claim. As used herein, when the phrase "at least" is used as a transitional term in the preamble of a claim, it is open-ended in the same manner as the terms "comprising" and "including.

The process of FIG. 5 begins when the non-serial-to-parallel detector 202 detects a non-serial-to-parallel flow graph being processed by the example control flow to data flow converter 108 (block 502). For example, the non-serial-parallel detector 202 may apply one or more rules to the control flow serialization code to test the non-serial-parallel nodes. In response to detecting the non-serial-parallel flow graph, the example node analyzer 204 detects a non-serial-parallel node (block 504). For example, when processing the code illustrated in fig. 4, the example node analyzer 204 may detect that the example fourth node 408 is a non-serial-parallel node.

In response to detecting the non-serial-parallel node, the example node analyzer 204 selects a node in the serialization code that precedes the non-serial-parallel node (block 506). For example, when the fourth node 408 of FIG. 4 is processed as a non-serial-parallel node, the node analyzer 204 selects the example second node 404 as the prior node. The example instruction generator 206 generates a PICK instruction with an ignore operand for the selected prior node (block 508). For example, when the second node 404 of fig. 4 is processed as the previous node of the fourth node 408, the instruction generator 206 generates x2=pickc2, X21,% ign. Although the example instruction generator 206 generates PICK instructions, any similar instructions for a particular compiler environment employing consuming operands may be utilized. For example, any instruction that ensures evaluation of assertions with consuming operands (e.g., ignore operands, white values, etc.) may be utilized.

The example node analyzer 204 determines if another prior node exists (block 510). When there is another prior node, control returns to block 506 to generate a PICK instruction for the prior node. For example, after processing the second node 404, a third node 406 may be processed.

When there is no other prior node (block 510), the example instruction generator 206 generates an instruction to combine the results of the PICK instruction from the prior node. According to the illustrated example, instruction generator 206 generates a PICK instruction to combine the results to a prior node. For example, when generating instructions for the fourth node 408, the example instruction generator 206 generates x4=pickc1, X2, X3 after generating PICK instructions for the second node 404 and the third node 406.

The example node analyzer 204 determines if another non-serial-parallel node is present in the code being analyzed (block 514). When there is another non-serial-parallel node, control returns to block 502 to process the next node to generate the data stream assembly code 112. When there is no other non-serial-parallel node, the example instruction generator 206 outputs the generated dataflow code (block 516), and the routine of FIG. 5 ends.

While the foregoing examples generate instructions for an environment in which consuming operands are propagated, in some environments, such consuming operands are not propagated. In an environment that does not support propagation of the consuming operands, the consuming operands may be replaced with a white value such as zero (e.g., x2=pickc2, X21,0 and x3=pickc3, X31, 0). In addition, the instructions for combining the elements (block 512) may include several instructions for selecting the appropriate value. For example, the combined instruction may store values in temporary registers (X '4 = PICK C1, X2, X3), and may select final values using switches to ensure that white values are consumed (e2= | c1& & C2; e3 = c1& & C3; E = e1|e2; and X4,% ign = SWITCH, X' 4).

Fig. 6 is a block diagram of an example processor platform 600 capable of executing the instructions of fig. 5 to implement the non-serial-to-parallel converter 110 of fig. 1. The processor platform 600 may be, for example, a server, a personal computer, a mobile device (e.g., a cellular telephone, a smart phone, a tablet device such as an iPad ^TM), a Personal Digital Assistant (PDA), an internet appliance, or any other type of computing device.

The processor platform 600 of the illustrated example includes a processor 612. The processor 612 of the illustrated example is hardware. For example, the processor 612 may be implemented by one or more integrated circuits, logic circuits, microprocessors or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor 612 implements the non-serial-parallel detector 202, the node analyzer 204, and the instruction generator 206.

The processor 612 of the illustrated example includes a local memory 613 (e.g., cache). The processor 612 of the illustrated example communicates with a main memory including a volatile memory 614 and a non-volatile memory 616 via a bus 618. The volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.

The processor platform 600 of the illustrated example also includes interface circuitry 620. Interface circuit 620 may be implemented by any type of interface standard, such as an ethernet interface, a Universal Serial Bus (USB), and/or a PCI express interface.

In the illustrated example, one or more input devices 622 are connected to the interface circuit 620. Input device(s) 622 permit a user to input data and/or commands into processor 612. The input device(s) may be implemented by, for example, an audio sensor, microphone, camera (stationary or video), keyboard, buttons, mouse, touch screen, trackpad, trackball, isocentre mouse, and/or voice recognition system.

One or more output devices 624 are also connected to the interface circuit 620 of the illustrated example. The output device 624 may be implemented, for example, by a display device (e.g., a Light Emitting Diode (LED), an Organic Light Emitting Diode (OLED), a liquid crystal display, a cathode ray tube display (CRT), a touch screen, a haptic output device, a printer, and/or speakers). Thus, the interface circuit 620 of the illustrated example typically includes a graphics driver card, a graphics driver chip, and/or a graphics driver processor.

The interface circuit 620 of the illustrated example also includes communication devices such as a transmitter, receiver, transceiver, modem, and/or network interface card to facilitate exchange of data with external machines (e.g., any kind of computing device) via a network 626 (e.g., an ethernet connection, digital Subscriber Line (DSL), telephone line, coaxial cable, cellular telephone system, etc.).

The processor platform 600 of the illustrated example also includes one or more mass storage devices 628 for storing software and/or data. Examples of such mass storage devices 628 include floppy disk drives, hard disk drives, compact disk drives, blu-ray disc drives, RAID systems, and Digital Versatile Disk (DVD) drives.

The encoded instructions 632 of fig. 5 may be stored in the mass storage device 628, in the volatile memory 614, in the non-volatile memory 616, and/or on a removable tangible computer-readable storage medium, such as a CD or DVD.

Example methods, apparatus, systems, and articles of manufacture to convert non-serial-parallel control flow graphs to data flows are disclosed herein. Further examples and combinations thereof include the following.

Example 1 includes an apparatus for converting control flow code into data flow code, the apparatus comprising a node analyzer to detect non-serial-parallel nodes in a serialized code, and an instruction generator to generate an instruction for a preceding node including a consuming operand, generate a combined instruction to combine results of the instructions, and output the combined instruction and the instruction to generate the data flow code.

Example 2 includes the apparatus as defined in example 1, wherein the consumption operand is a ignore operand.

Example 3 includes the apparatus as defined in example 1 or example 2, further comprising a non-serial-parallel detector to detect a control flow graph of the serialized code is a non-serial-parallel flow graph.

Example 4 includes the apparatus as defined in example 1 or example 2, wherein the instruction is a PICK instruction.

Example 5 includes the apparatus as defined in example 1, wherein the consumption operand is a white value.

Example 6 includes the apparatus as defined in example 5, wherein the instruction generator is further to generate an instruction to consume the white value.

Example 7 includes the apparatus as defined in example 1 or example 2, wherein the node analyzer is to detect the non-serial-parallel node by analyzing a control dependency graph.

Example 8 includes the apparatus as defined in example 7, wherein the node analyzer is to identify the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.

Example 9 includes a non-transitory computer-readable medium comprising instructions that, when executed, cause a machine to at least detect a non-serial-parallel node in a serialization code, generate an instruction for a prior node that includes a consuming operand, generate a combined instruction to combine results of the instructions, and output the combined instruction and the instructions to generate a dataflow code.

Example 10 includes the non-transitory computer-readable medium as defined in example 9, wherein the consumption operand is a ignore operand.

Example 11 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions, when executed, cause the machine to detect a control flow graph of the serialization code is a non-serial-parallel flow graph.

Example 12 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions are PICK instructions.

Example 13 includes the non-transitory computer-readable medium as defined in example 9, wherein the consumption operand is a white value.

Example 14 includes the non-transitory computer-readable medium as defined in example 13, wherein the instructions, when executed, cause the machine to generate instructions to consume the white value.

Example 15 includes the non-transitory computer-readable medium as defined in example 9 or example 10, wherein the instructions, when executed, cause the machine to detect the non-serial-parallel node by analyzing a control dependency graph.

Example 16 includes the non-transitory computer-readable medium as defined in example 15, wherein the instructions, when executed, cause the machine to identify the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.

Example 17 includes a method for compiling code, the method comprising detecting non-serial-parallel nodes in a serialized code, generating an instruction for a prior node comprising consuming operands, generating a combined instruction for combining results of the instructions, and outputting the combined instruction and the instructions to generate a dataflow code.

Example 18 includes the method as defined in example 17, wherein the consumption operand is a ignore operand.

Example 19 includes the method as defined in example 17 or example 18, further comprising detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph.

Example 20 includes the method as defined in example 17 or example 18, wherein the instruction is a PICK instruction.

Example 21 includes the method as defined in example 17, wherein the consumption operand is a white value.

Example 22 includes the method as defined in example 21, further comprising generating an instruction to consume the white value.

Example 23 includes the method as defined in example 17 or example 18, further comprising detecting the non-serial-parallel node by analyzing a control dependency graph.

Example 24 includes the method as defined in example 23, further comprising identifying the non-serial-parallel node when the control dependency graph has more than one parent node for the non-serial-parallel node.

Example 25 includes an apparatus for converting control flow code into data flow code, the apparatus comprising means for detecting non-serial-parallel nodes in a serialized code, and means for generating instructions to generate instructions for a prior node comprising consuming operands, generating instructions to combine results of the instructions, and outputting the combined instructions and the instructions to generate data flow code.

Example 26 includes the apparatus as defined in example 25, wherein the consumption operand is a ignore operand.

Example 27 includes the method as defined in example 25 or example 26, further comprising means for detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph.

Example 28 includes the apparatus as defined in example 25 or example 26, wherein the instruction is a PICK instruction.

Example 29 includes the apparatus as defined in example 25, wherein the consumption operand is a white value.

Example 30 includes the apparatus as defined in example 29, wherein the means for generating instructions is further to generate means for consuming the white value.

Example 31 includes the apparatus as defined in example 25 or example 26, wherein the means for detecting is to detect the non-serial-parallel node by analyzing a control dependency graph.

Example 32 includes the apparatus as defined in example 31, wherein the means for detecting is to identify the non-serial-parallel node when the control dependency graph has more than parent nodes for the non-serial-parallel node.

From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that are capable of efficiently converting non-serial-parallel serialized code into data stream code. In some examples, the use of the consumption operand in the converted instruction allows the initial non-serial-parallel code to be executed on a parallel computing system (such as a high performance computing system operating a dataflow architecture).

Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.

Claims

1. A device for converting a control flow code into a data flow code, the device comprising:

Logic circuits;

A node analyzer for detecting non-serial parallel nodes in serialized code; and

An instruction generator, the instruction generator being configured to:

generating instructions for a preceding node associated with the detected non-serial parallel node, at least one of the instructions including a consume operand, the consume operand being identified as an ignore operand, the ignore operand being operand for causing any value assigned to the ignore operand to be consumed and not stored;

generating a combining instruction for combining results of the instructions generated for the preceding node; and

The combined instruction and the instruction generated for the previous node are output to generate a data flow code, wherein at least one of the node analyzer or the instruction generator is implemented by the logic circuit.

2 . The apparatus according to claim 1 , further comprising a non-serial-parallel detector, configured to detect that the control flow graph of the serialized code is a non-serial-parallel flow graph.

3. The device of claim 1, wherein the instruction is a PICK instruction.

4. The apparatus of claim 1, wherein the consumption operand comprises a white value.

5 . The device according to claim 4 , wherein the instruction generator is further configured to generate an instruction for consuming the white value.

6. The apparatus of claim 1, wherein the node analyzer is configured to detect the non-serial parallel nodes by analyzing a control dependency graph.

7. The apparatus of claim 6, wherein the node analyzer is configured to identify the non-serial parallel node when the control dependency graph has more than one parent node for the non-serial parallel node.

8. A method for compiling code, the method comprising:

Detect non-serial parallel nodes in serialized code;

The combined instruction and the instruction generated for the preceding node are output to generate a data flow code.

9 . The method of claim 8 , further comprising: detecting that the control flow graph of the serialized code is a non-serial-parallel flow graph.

10. The method of claim 8, wherein the instruction is a PICK instruction.

11. The method of claim 8, wherein the consumption operand comprises a white value.

The method of claim 11 , further comprising generating an instruction for consuming the white value.

13. The method of claim 8, further comprising detecting the non-serial parallel nodes by analyzing a control dependency graph.

14. The method of claim 13, further comprising identifying the non-serial parallel node when the control dependency graph has more than one parent node for the non-serial parallel node.

15. A computer-readable medium comprising instructions which, when executed, cause a machine to perform the method of any one of claims 8 to 14.