CN113128152A

CN113128152A - Netlist partitioning method of multi-die FPGA (field programmable Gate array) based on time sequence

Info

Publication number: CN113128152A
Application number: CN202110429301.6A
Authority: CN
Inventors: 杜学军; 惠锋; 虞健; 刘佩; 董志丹
Original assignee: Wuxi Zhongwei Yixin Co Ltd
Current assignee: Wuxi Zhongwei Yixin Co Ltd
Priority date: 2021-04-21
Filing date: 2021-04-21
Publication date: 2021-07-16

Abstract

The invention discloses a netlist partitioning method of a multi-die FPGA based on time sequence, relating to the technical field of FPGA, the method comprises the steps of sequentially traversing from large to small according to the time sequence cost value of each node to be optimized under the current distribution result after the initial distribution result is obtained by segmenting the input netlist of a user, and determining the time sequence cost value of the nodes to be optimized after being redistributed to other sub netlists according to the relationship between the distribution results of the adjacent nodes with direct connection relationship, and adjusting the distribution result of each node to be optimized to distribute the node to the sub-netlist with the minimum time sequence cost value, circularly updating to obtain each sub-netlist, the method can reduce the sub-netlist crossing times among the sub-netlists obtained by segmentation, particularly reduce the sub-netlist crossing times of the critical path, further reduce the time delay of the critical path, optimize the time sequence of the design and improve the speed of the overall design.

Description

Netlist partitioning method of multi-die FPGA (field programmable Gate array) based on time sequence

Technical Field

The invention relates to the technical field of FPGA (field programmable gate array), in particular to a sequential-based netlist partitioning method for a multi-die FPGA.

Background

An FPGA (Field Programmable Gate Array) is a Programmable logic device of hardware, and is widely applied to prototype verification in integrated circuit design besides the fields of mobile communication, data center, etc., so as to effectively verify the correctness of circuit functions and accelerate the circuit design speed.

With the increasing scale of integrated circuits and the realization of complex functions, the demand for the number of programmable logic resources of FPGAs is increasing, and in order to avoid the increase of processing difficulty and the reduction of production yield caused by the increase of Chip area, the interconnection design of a plurality of FPGA dies is realized by using Silicon Stack Interconnection (SSI), cogos (Chip on Wafer on Substrate) or other technologies, so as to form a multi-die FPGA, and the required circuit structure is realized by using the logic resources on the plurality of FPGA dies.

However, this poses a challenge to the layout and routing of the multi-die FPGA, and thus how to reasonably arrange the complex circuits on multiple chips to obtain better performance is a key issue in the design flow of the multi-die FPGA. Before layout and wiring of the multi-die FPGA are carried out, firstly, a user netlist corresponding to the multi-die FPGA is required to be divided into a plurality of connected sub-netlists, each sub-netlist corresponds to one FPGA die, and then layout and wiring are carried out on the corresponding FPGA die according to each sub-netlist. Therefore, the partitioning method for the user netlist directly affects the layout and routing process of the multi-die FPGA and also affects the final performance of the multi-die FPGA. At present, when a user netlist is divided, only the matching degree between the logic resource quantity contained in an FPGA bare chip is considered, that is, only the corresponding FPGA bare chip is required to meet the logic resource requirement of a sub-netlist, but the performance of a multi-die FPGA is often difficult to guarantee by the method.

Disclosure of Invention

The invention provides a netlist partitioning method of a multi-die FPGA based on time sequence aiming at the problems and the technical requirements, and the technical scheme of the invention is as follows:

a netlist partitioning method of a multi-die FPGA based on time sequence comprises the following steps:

acquiring a user input netlist, and distributing each instance module in the user input netlist to a sub-netlist corresponding to each FPGA bare chip in the multi-bare-chip FPGA according to a predetermined algorithm to obtain an initial distribution result;

determining the time sequence cost value of each node to be optimized in the user input netlist according to the initial distribution result;

sequentially traversing each node to be optimized according to the sequence of the time sequence cost values of the nodes to be optimized from large to small, determining the time sequence cost value of the nodes to be optimized after the nodes to be optimized are redistributed to other sub netlists according to the relationship between the distribution results of the nodes to be optimized and the adjacent nodes with direct connection relationship, wherein the nodes to be optimized and the adjacent nodes with direct connection relationship are respectively example modules input into the netlists by users;

adjusting the distribution result of each node to be optimized to distribute the node to the sub netlist with the minimum time sequence cost value, wherein the smaller the time sequence cost value is, the smaller the number of times of spanning the sub netlist of the path where the corresponding node to be optimized is located is, and the number of the spanning sub netlists is the number of two example modules with direct connection relation on the path in two different sub netlists respectively;

updating the time sequence cost value of each node to be optimized according to the adjusted distribution result, and re-executing the step of sequentially traversing each node to be optimized from large to small according to the time sequence cost value of each node to be optimized until a preset cycle condition is reached, and segmenting to obtain a sub netlist corresponding to each FPGA bare chip, wherein each sub netlist comprises all distributed example modules and a netlist net among the example modules;

the logic resource quantity on each FPGA bare chip meets the logic resource requirement of the corresponding sub-netlist obtained through segmentation, the input signal connection point leading-out end on each FPGA bare chip meets the input signal quantity of the corresponding sub-netlist, the output signal connection point leading-out end on each FPGA bare chip meets the output signal quantity of the corresponding sub-netlist, and the FPGA bare chip connected with the IO pin of the multi-die FPGA meets the IO port requirement of the corresponding sub-netlist.

According to the further technical scheme, the time sequence cost value of each node to be optimized is updated according to the auxiliary time sequence cost of each reachable register node, and the relationship between the distribution results of each node to be optimized and the adjacent nodes with the direct connection relationship is used for updating the auxiliary time sequence cost of each reachable register node;

the reachable register node of each node to be optimized comprises an example module which is located on the same path with the node to be optimized and located at the downstream of the node to be optimized in the signal transmission direction and is of a register type.

According to a further technical scheme, the updating of the time sequence cost value of each node to be optimized according to the auxiliary time sequence cost of each reachable register node comprises the following steps:

updating the time sequence cost value of the node to be optimized to be the maximum value between the current time sequence cost value and the auxiliary time sequence cost of each reachable register node: c1 ═ max (C1, C2_1, C2_2 … C2_ k), where C1 is the timing cost value of the node to be optimized, and C2_1, C2_2 … C2_ k is the auxiliary timing cost of each reachable register node of the node to be optimized.

and taking the time sequence cost value of the node to be optimized and the auxiliary time sequence cost of each reachable register node corresponding to the time sequence cost value as the updated time sequence cost value of the node to be optimized according to the maximum value weighted by the corresponding weight: max (C1 × w) of C1_C1,C2_1×w_{C2_1},C2_2×w_{C2_2}…C2_k×w_{C2_k}) Wherein C1 is the timing cost value of the node to be optimized, C2_1, C2_2 … C2_ k are the auxiliary timing costs of each reachable register node of the node to be optimized, w_CAnd the weight corresponding to the node C is shown, and the weights corresponding to different nodes are the same or different.

According to the further technical scheme, the weight of each node is determined according to the estimated time sequence allowance of the path where the node is located, and the smaller the estimated time sequence allowance of the path is, the higher the corresponding time sequence tension degree is, and the larger the weight of the node is.

The further technical scheme is that the time sequence cost value of the nodes to be optimized after being redistributed to other sub netlists is determined according to the relationship between the distribution results of the nodes to be optimized and the adjacent nodes with direct connection relationship, and the time sequence cost value comprises the following steps:

for any one reallocated child netlist to be reallocated of the node to be optimized, sequentially traversing all adjacent nodes which are in direct connection with the node to be optimized, updating the time sequence cost value of the node to be optimized in each traversal, determining the updated time sequence cost value of the node to be optimized after all the adjacent nodes are traversed as the time sequence cost value of the node to be optimized when the node to be optimized is reallocated to the child netlist to be reallocated, wherein each traversal time is as follows:

if the node to be optimized and the adjacent node with the direct connection relation are in the same sub-netlist under the current distribution result, the node to be optimized is redistributed to the sub-netlist to be redistributed and then is in different sub-netlists with the adjacent node, the auxiliary time sequence cost of each reachable register node of the node to be optimized is increased, and the time sequence cost of the node to be optimized is updated to be the maximum value between the current time sequence cost of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

if the node to be optimized and the adjacent node with the direct connection relation are in different sub netlists under the current distribution result, the node to be optimized is redistributed to the sub netlists to be redistributed and then is in the same sub netlist with the adjacent node, the auxiliary time sequence cost of each reachable register node of the node to be optimized is reduced, and the time sequence cost of the node to be optimized is updated to be the maximum value between the current time sequence cost of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

otherwise, keeping the time sequence cost value of the node to be optimized under the current distribution result unchanged.

According to the further technical scheme, the auxiliary time sequence cost of each reachable register node of the node to be optimized is changed according to the variable quantity corresponding to the path where the node to be optimized is located, and the variable quantities corresponding to different paths are the same or different.

According to the further technical scheme, the auxiliary time sequence cost of each reachable register node of the node to be optimized is changed according to the variation corresponding to the signal segment of the node to be optimized in the path, and the variation corresponding to different signal segments in the same path is the same or different.

According to the further technical scheme, the time sequence cost value of each node to be optimized is updated according to the adjusted distribution result, and the method comprises the following steps of:

traversing all father nodes of the nodes to be optimized, and for each father node of the nodes to be optimized, if the nodes to be optimized and the father nodes of the nodes to be optimized are not in the same child netlist under the current distribution result, increasing the auxiliary time sequence cost of each reachable register node of the nodes to be optimized, and updating the time sequence cost value of the nodes to be optimized to the maximum value between the current time sequence cost value of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

and/or traversing all child nodes of the node to be optimized, and for each child node of the node to be optimized, if the node to be optimized and the child node thereof are not in the same child netlist under the current distribution result, increasing the auxiliary time sequence cost of each reachable register node of the child node of the node to be optimized, and updating the time sequence cost value of the node to be optimized to the maximum value between the current time sequence cost value of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

the parent node of the node to be optimized is an example module which has a direct connection relationship with the node to be optimized and is located at the upstream of the signal transmission direction, and the child node of the node to be optimized is an example module which has a direct connection relationship with the node to be optimized and is located at the downstream of the signal transmission direction.

According to the further technical scheme, when the time sequence cost value of the node to be optimized is updated by the aid of the auxiliary time sequence cost of each reachable register node of the node to be optimized, the time sequence cost value of the node to be optimized is synchronously updated when the auxiliary time sequence cost of each reachable register node is updated.

According to the further technical scheme, the reachable register node of each node to be optimized comprises all example modules which are located on the same path with the node to be optimized, are located at the downstream of the signal transmission direction of the node to be optimized and are of the type of a register;

or the reachable register node of each node to be optimized comprises a part of example modules which are positioned on the same path with the node to be optimized and are positioned at the downstream of the node to be optimized in the signal transmission direction, are of the type of register and meet the preset condition.

According to the further technical scheme, when the instance module and the node to be optimized are in the same clock domain, the instance module is determined to meet the preset condition.

According to the further technical scheme, when the instance module and the node to be optimized are not on the same false path, the instance module is determined to meet the preset condition, and the false path is a path through which a transmission signal cannot pass.

According to the further technical scheme, the node to be optimized comprises all the example modules in the user input netlist, or comprises part of the example modules which are positioned on the preset path in the user input netlist.

According to the further technical scheme, when the nodes to be optimized are traversed in sequence according to the sequence of the time sequence cost values of the nodes to be optimized from large to small, if the time sequence cost values of at least two nodes to be optimized are equal, the nodes to be optimized with the equal time sequence cost values are traversed in sequence according to the topological order of the nodes to be optimized from first to last.

According to a further technical scheme, each instance module in a netlist input by a user is distributed to a sub-netlist corresponding to each FPGA die in a multi-die FPGA according to a predetermined algorithm to obtain an initial distribution result, and the method comprises the following steps:

calculating scores of all the example modules distributed to the sub netlists capable of containing the example modules according to the target function, and distributing all the example modules to the sub netlists with the highest scores corresponding to all the example modules;

the target function is constructed based on at least one chip performance parameter corresponding to the layout target, and the more excellent the chip performance parameter realized by allocating the instance module to the sub-netlist, the higher the corresponding score; the capability of each sub-netlist to accommodate is determined by the logic resources contained in the corresponding FPGA die, and includes a range of types and numbers of instance modules that the sub-netlist can accommodate.

According to the further technical scheme, the layout target corresponds to a plurality of chip performance parameters, the target function comprises a plurality of sub-target functions, and each sub-target function is constructed based on one or more chip performance parameters;

when each instance module is distributed, the scores for distributing each instance module to the sub netlists capable of containing the instance module are calculated in stages according to each sub-target function and distributed.

According to a further technical scheme, when the instance modules are distributed by stages by utilizing all sub-target functions, the holding capacity of the same sub-netlist at different stages is the same or different.

According to a further technical scheme, the accommodating capacities of two sub netlists with the same logic resources contained in corresponding FPGA bare chips are the same or different.

The beneficial technical effects of the invention are as follows:

the method adjusts the distribution result based on the time sequence cost value of the node to be optimized after obtaining the initial distribution result, so as to reduce the sub-netlist crossing times of the predetermined path (usually referred to as a critical path), because the sub-netlist crossing times are realized by the cross-bare chip connecting line in the silicon connecting layer of the multi-bare chip FPGA, the time delay of the cross-bare chip connecting line is much larger than that of other connecting lines, the time delay of the cross-bare chip connecting line is usually 1.0ns, and the time delay of other connecting lines is about 0.2 ns-0.3 ns, therefore, the sub-netlist crossing times can influence the speed of realizing the multi-bare chip FPGA. The netlist segmentation method can reduce the times of sub-netlist crossing, particularly the times of sub-netlist crossing of a critical path, further reduce the time delay of the critical path, optimize the time sequence of design and improve the speed of overall design.

And when the distribution result is adjusted based on the time sequence cost value of the node to be optimized, the time sequence cost value of the node to be optimized is updated by the auxiliary time sequence cost of the accessible register node, so that the netlist graph does not need to be tracked every time during updating, the updating method is simplified, and the optimization efficiency is improved.

Although some multi-FPGA systems (composed of multiple FPGA chips) also involve netlist division, different FPGA chips in the multi-FPGA systems are connected through IO of the FPGA chips, so that the number of cross-division signals is limited by the number of IO of a single FPGA chip (about 2000 IO at most), and when the netlist is divided, the constraint condition of the number of cross-division signals is difficult to achieve. Different FPGA bare chips in the multi-bare-chip FPGA are connected through silicon stacking connection points, so that the number of cross-division signals is limited by the number of the silicon stacking connection points of a single FPGA bare chip, the number of the cross-division signals is far greater than the number of IO (input/output) of an FPGA chip, generally about 40000, the difficulty in achieving the constraint condition of the number of the cross-division signals is low when a netlist is divided, and the rest constraint conditions are considered in the remaining capacity, so that the method disclosed by the application can be realized and practically applied in the scene of the multi-bare-chip FPGA.

Drawings

FIG. 1 is a method flow diagram of one embodiment of a netlist partitioning method as disclosed herein.

Fig. 2 is a schematic diagram of a partial structure of a multi-die FPGA to which the present application is directed.

FIG. 3 is a method flow diagram of another embodiment of a netlist partitioning method as disclosed herein.

FIG. 4 is a method flow diagram of yet another embodiment of a netlist partitioning method as disclosed herein.

Detailed Description

The following further describes the embodiments of the present invention with reference to the drawings.

The application discloses a netlist partitioning method of a multi-die FPGA based on time sequence, which comprises the following steps, please refer to FIG. 1:

and step S1, obtaining a user input netlist, and distributing each instance module in the user input netlist to a sub-netlist corresponding to each FPGA die in the multi-die FPGA according to a predetermined algorithm to obtain an initial distribution result. Referring to fig. 2, a partial schematic structural diagram of a multi-die FPGA mainly includes a substrate 1, a silicon connection layer 2, and a plurality of FPGA dies (fig. 2 includes the FPGA die 1 and the FPGA die 2) stacked in sequence from bottom to top, where the silicon connection layer 2 covers all the FPGA dies. Each FPGA die contains a plurality of logic resources with different types and quantities, including at least one of IOBs, LUTs, REGs, DSPs, BRAMs, Clocks, CMTs and GTP, and the types and/or quantities of the logic resources contained in the different FPGA dies are the same or different. In the present application, at least one FPGA die in the multi-die FPGA has a different number of logic resources than other FPGA dies, and/or at least one FPGA die has a different type of logic resources than other FPGA dies.

Each FPGA bare chip also comprises a plurality of input signal connection points for inputting signals and a plurality of output signal connection points for outputting signals, the input signal connection points and the output signal connection points are realized by silicon stacking connection points 3 arranged on the FPGA bare chip, and logic resources inside the FPGA bare chip can be directly and indirectly connected to the input signal connection points and/or the output signal connection points for inputting and outputting signals. Different FPGA bare chips are connected through an input signal connection point and/or an output signal connection point by using a cross-bare chip connection line 4 in a silicon connection layer 2 to form a signal cascade structure between the FPGA bare chips. Silicon connecting layer 2 is range upon range of the setting on base plate 1, and is concrete, and silicon connecting layer 2 keeps away from the one side growth of FPGA bare chip and has had little protruding ball, and silicon connecting layer 2 passes through little protruding ball and connects base plate 1. The silicon connection layer 2 is further provided with a silicon through hole 5, and corresponding input signal connection points and/or output signal connection points on the FPGA bare chip are connected to the substrate 1 through the silicon through hole 5 on the silicon connection layer 2 and then connected to pins connected with the substrate and used for signal extraction.

The obtained user input netlist obtained by the method is specific to the whole multi-die FPGA, the total logic resource requirement of the obtained user input netlist exceeds the number of logic resources on any one FPGA die, and the total logic resource requirement is less than or equal to the sum of the number of logic resources of all FPGA dies. The netlist dividing method provided by the application is characterized in that a user input netlist aiming at the whole multi-die FPGA is divided into a plurality of sub netlists, each sub netlist corresponds to one FPGA die, so that layout and wiring can be respectively carried out on the corresponding FPGA die according to the sub netlist corresponding to each FPGA die, and wiring of the whole multi-die FPGA is completed. The user input netlist comprises a plurality of instance modules and netlist nets among the different instance modules, common instance modules comprise GTP, a look-up table, a register, PCIE, EMAC, CMT, BRAM, DSP, IOB and the like, and the types of the instance modules contained in the user input netlist are not more than the types of logic resources in the multi-die FPGA. Each sub-netlist obtained by partitioning also comprises a plurality of example modules and netlist nets among different example modules, and when the corresponding FPGA bare chip is laid out and wired according to the sub-netlist, the FPGA bare chip is required to be used for realizing the function of the corresponding sub-netlist, so that the most basic requirement during partitioning of the netlist is that the FPGA bare chip can meet the logic resource requirement of the corresponding sub-netlist obtained by partitioning. The method mainly comprises the following two aspects: an instance module of the sub-netlist needs to be implemented using logic resources within the FPGA die, and a signal connection function of the sub-netlist needs to be implemented using input signal connection points and/or output signal connection points on the FPGA die. In addition to this, there is also one aspect as follows: the IO port of the sub netlist needs to be realized by using pins connected by the FPGA bare chip.

Therefore, the present application first determines a constraint condition of a sub netlist corresponding to each FPGA die, and must be performed on the basis of meeting the constraint condition in the following segmentation process, including: the logic resource quantity on each FPGA bare chip meets the logic resource requirement of the corresponding sub-netlist obtained through segmentation, the input signal connection point leading-out end on each FPGA bare chip meets the input signal quantity of the corresponding sub-netlist, the output signal connection point leading-out end on each FPGA bare chip meets the output signal quantity of the corresponding sub-netlist, and the FPGA bare chip connected with the IO pin of the multi-die FPGA meets the IO port requirement of the corresponding sub-netlist. The requirement that the number of logic resources on the FPGA die meets the logic resource requirement of the corresponding sub-netlist obtained by segmentation mainly comprises the following steps: the method comprises the steps of determining the accommodation capacity of a corresponding sub-netlist according to logic resources contained in each FPGA die in the multi-die FPGA, wherein the accommodation capacity of the sub-netlist comprises the type and number range of example modules which can be accommodated by the sub-netlist, the number range comprises the maximum number and/or the minimum number, the type of the example modules which can be contained by the sub-netlist is not more than the type of the logic resources contained in the corresponding FPGA die, and the number of the example modules of each type is within the range of the number of the logic resources of the type contained in the corresponding FPGA die. For example, the containment capability of a child netlist can be determined as: including at most one GTP, at most 2 PCIE, and at least one, and at most 3 EMACs. In actual application, the number range of each type of instance module may be determined according to the set module usage rate, for example, the module usage rate is set to be 0-80%, and then the maximum number of each type of instance module is 80% of the number of the type of logic resources included in the corresponding FPGA die.

The example modules can be sequentially distributed according to any one of the following sequences: (1) and sequentially distributing all the example modules according to the reading sequence of each example module when the user inputs the netlist, namely distributing each time when the example modules are read when the example modules are sequentially read. (2) After all the example modules in the user input netlist are read in, all the example modules are sequenced according to a preset rule and are sequentially distributed according to the sequencing sequence, the preset rule is set according to actual needs, and for example, the preset rule can be sequentially distributed after sequencing of a certain chip performance parameter of all the example modules.

Optionally, before the user input netlist is divided, the IO ports in the user input netlist are allocated to corresponding pins on the multi-die FPGA to serve as the IO pins. The method for allocating the IO port comprises the following steps: and allocating at least one IO port to a corresponding pin of the multi-die FPGA according to a user instruction, or allocating at least one IO port to a corresponding pin of the multi-die FPGA according to any sequence arrangement, or allocating at least one IO port to a corresponding pin of the multi-die FPGA according to an IO automatic arrangement algorithm.

The method for distributing each instance module in the user input netlist to the sub-netlist corresponding to each FPGA die in the multi-die FPGA according to the predetermined algorithm to obtain the initial distribution result comprises the following steps: and calculating the scores of the example modules distributed to the sub netlists capable of accommodating the example modules according to the objective function, and distributing the example modules to the sub netlists with the highest scores. The target function is constructed based on at least one chip performance parameter corresponding to the layout target, and the more excellent the chip performance parameter realized by allocating the instance module to the sub-netlist, the higher the corresponding score. The chip performance parameters include at least one of power consumption of a single FPGA die, power consumption of a multi-die FPGA, timing margins, number of signals across the FPGA die, and clock parameters. For example, if the objective function is constructed based on the power consumption of a single FPGA die, the example module is assigned to the sub-netlist so that the lower the power consumption of the single FPGA die is, the higher the corresponding score is, and therefore, the assignment effect is to assign the example module to the sub-netlist which minimizes the power consumption of the single FPGA die.

As described above, the accommodation capacity of each sub-netlist is determined by the logic resources contained in the corresponding FPGA die, and in the present application, the accommodation capacities of two sub-netlists with the same logic resources contained in the corresponding FPGA die are the same or different, for example, for sub-netlist 1 and sub-netlist 2 corresponding to two FPGA dies respectively containing 10 GTP in total, the accommodation capacity of sub-netlist 1 is 8 GTP at most, and the accommodation capacity of sub-netlist 2 is 6 GTP at most; or, for example, sub-netlist 1 can contain 3-8 GTP, and sub-netlist 2 can contain 5-8 GTP.

Optionally, when the layout target corresponds to a plurality of chip performance parameters, the target function constructed based on the plurality of chip performance parameters corresponding to the layout target includes a plurality of sub-target functions, each sub-target function is constructed based on one or more chip performance parameters, and may be represented as: the objective function is sub-objective function 1+ sub-objective function 2+ sub-objective function 3 … …, and the method of allocating the instance module based on the objective function includes the following two ways:

(1) and directly utilizing the target function to realize the constraint conditions corresponding to each sub-target function at the same stage, namely directly calculating and distributing the scores of each instance module to each sub-netlist capable of accommodating the instance module according to the whole target function. In this case, each instance module only needs to calculate once to obtain a corresponding score and perform once allocation, and the result obtained by allocation can satisfy all chip performance parameters corresponding to the layout target.

(2) And realizing constraint conditions corresponding to different sub-target functions at different stages by using the target function, namely calculating scores for distributing each instance module to the sub-netlists capable of containing the instance modules in stages according to each sub-target function and distributing the scores. In this case, each instance module is calculated to obtain a corresponding score and is allocated once in each stage, and the result obtained by the allocation of the stage meets all chip performance parameters corresponding to the sub-target functions of the stage on the basis of the result obtained by the allocation of the previous stage. For example, in the first stage, the scores corresponding to the example modules are calculated according to the sub-targeting function 1 for distribution, and the distribution result meets the chip performance parameters corresponding to the sub-targeting function 1. And then, calculating the corresponding scores of each instance module according to the sub-target function 2 in the second stage, and redistributing, wherein the result obtained by the distribution meets the chip performance parameters corresponding to the sub-target function 2 on the basis of the distribution result in the first stage, and so on.

In this case, the plurality of sub-targeting functions constructed based on the plurality of chip performance parameters have a plurality of different combinations and arrangements, including (1) a plurality of sub-targeting functions, i.e. for example, an objective function constructed based on 4 chip performance parameters can be implemented based on 2 sub-targeting functions, or can be implemented based on 3 or more sub-targeting functions, for example when an objective function needs to be built based on the number of signals across the FPGA die, timing margins, power consumption and clock parameters of a single FPGA die, an objective function, namely sub-target function 1 (number of signals across FPGA dies) + sub-target function 2 (timing margin + power consumption of a single FPGA die + clock parameter) can be constructed, alternatively, the objective function (sub-target function 1 (number of signals across FPGA dies) + sub-target function 2 (timing margin + power consumption of a single FPGA die) + sub-target function 3 (clock parameter) may be constructed. (2) Each sub-target function is constructed based on a plurality of different combinations of chip performance parameters, for example, in the above example, when the sub-target function 1 and the sub-target function 2 are implemented as the same, the target function (sub-target function 1 (number of signals across FPGA dies) + sub-target function 2 (timing margin + power consumption of a single FPGA die + clock parameter) may be constructed, or the target function (sub-target function 1 (number of signals across FPGA dies + timing margin) + sub-target function 2 (power consumption of a single FPGA die + clock parameter) may be constructed. (3) Different sub-target functions have different sequential execution orders, for example, when the target function is implemented by the sub-target function 1 (number of signals across the FPGA die) and the sub-target function 2 (timing margin + power consumption of a single FPGA die + clock parameter), the target function may be constructed as the sub-target function 1+ the sub-target function 2 (firstly, the distribution is performed based on the sub-target function 1), or the target function may be constructed as the sub-target function 2+ the sub-target function 1 (firstly, the distribution is performed based on the sub-target function 2). The three cases can be combined with each other, so that the constructed sub-target functions have different combination modes and arrangement sequences, and thus have different distribution processes.

In the above process, no matter the specific form of the sub-target function used in each stage, when the instance module is distributed in stages by using each sub-target function, the instance module needs to be distributed into each sub-netlist capable of accommodating the instance module, and optionally, the accommodating capacities of the same sub-netlist at different stages are the same or different. For example, for the case of different accommodating capacities, for a sub-netlist corresponding to an FPGA die that contains 10 GTP in total, the accommodating capacity of the sub-netlist is 8 GTP at most when the sub-target function 1 is used for allocation in the first stage, and the accommodating capacity of the sub-netlist is 6 GTP at most when the sub-target function 2 is used for allocation in the second stage.

Step S2, determining the timing cost value C1 of each node to be optimized in the user input netlist according to the initial allocation result, please refer to the flowchart shown in fig. 3.

Optionally, in this application, the node to be optimized includes all the instance modules in the user-input netlist, or includes some instance modules in the user-input netlist that are located on a predetermined path, where the predetermined path is usually a critical path preset by a user.

In this step, the timing cost value C1 of each instance module is first initialized, typically to 0. In addition to this, a secondary timing cost C2 for each instance module is also initialized, again typically to 0. For each node V to be optimized_iIt also has one or more reachable register nodes, each node V to be optimized_iIs accessible to register node V_iThe REG comprises the node V to be optimized_iOn the same path and at the node V to be optimized_iDownstream of the signal transmission direction, is an example module of the type register.

Optionally, each node to be optimized V_iIs accessible to register node V_iThe REG comprises the node V to be optimized_iOn the same path and at the node V to be optimized_iIs of the type of all instance modules of the register downstream in the direction of signal transmission. Or, each node V to be optimized_iIs accessible to register node V_iThe REG comprises a node V to be optimized_iOn the same path and at the node V to be optimized_iIs a register and satisfies a predetermined condition. Wherein, the current instance module and the node V to be optimized_iWhen the same clock domain is used, the example module is determined to meet the preset condition, and the operation can be used for screening and removing the node V to be optimized_iInstance modules not in the same clock domain, not as V_iThe reachable register nodes of (a) are taken into account, thereby not considering the influence of the cross-clock domain signal. And/or, when the instance module and the node V to be optimized_iWhen the virtual path is not on the same false path, determining that the instance module meets the preset condition, wherein the false path is a path which cannot be passed by the transmission signal and is caused by logic or signal collision and the like, and the operation can screen and remove the instance module on the virtual path and does not serve as V_iThe reachable register nodes are considered, thereby avoiding the influence caused by logic or signal collision and the like. Finally, a section to be optimizedPoint V_iCorresponding all reachable register nodes V_iThe reachable register set rrs (readable Registers set) of REG can be expressed as:

RRS(V_i)＝{V_i_REG1(C2_1)，V_i_REG2(C2_2)，…V_i_REGk(C2_k)}；

wherein V_iREG1(C2_1) represents the node V to be optimized_iReachable register node V with an auxiliary timing cost of C2_1_i_REG1，V_iREG2(C2_2) represents the node V to be optimized_iReachable register node V with an auxiliary timing cost of C2_2_iREG2, and so on. Each instance module of type register may be a reachable register node of one or more nodes to be optimized.

According to the initial distribution result, the initialization value based on the time sequence cost value of each instance module and the initialization value of the auxiliary time sequence cost, and combining each node V to be optimized_iCorresponding reachable register node V_iREG, the timing cost value C1 of each node to be optimized under the current allocation result can be determined as follows:

traversing the node V to be optimized_iAll father nodes V of_PFor node V to be optimized_iEach parent node V of_PIf node V is to be optimized_iWith its parent node V_PIf the current distribution result is not in the same sub-netlist, increasing the node V to be optimized_iEach reachable register node V of_iSecondary timing cost of REG C2, and node V to be optimized_iThe time sequence cost value C1 is updated to the current time sequence cost value C1 and each reachable register node V_iMaximum value between secondary timing cost C2 of REG. If node V is to be optimized_iWith its parent node V_PIf the current distribution result is in the same child netlist, the parent node V is finished_PAnd (4) continuing the next traversal operation.

And/or traversing the node V to be optimized_iAll child nodes V of_SFor node V to be optimized_iEach child node V of_SIf node V is to be optimized_iWith its child node V_SIf the current distribution result is not in the same sub-netlist, increasing the node V to be optimized_iChild node V of_SEach reachable register node V of_SThe auxiliary time sequence cost of the REG is to optimize the node V_iThe time sequence cost value C1 is updated to the current time sequence cost value C1 and each reachable register node V_iMaximum value between secondary timing cost C2 of REG. If node V is to be optimized_iWith its child node V_SUnder the current distribution result, in the same sub-netlist, ending the sub-node V_SAnd (4) continuing the next traversal operation.

Wherein, the node V to be optimized_iParent node V of_PIs with the node V to be optimized_iInstance module with direct connection and upstream in signal transmission direction, node V to be optimized_iChild node V of_SIs with the node V to be optimized_iEach node to be optimized may include a plurality of parent nodes and/or a plurality of child nodes. In the process, the optimization node V is treated by the method_iEach parent node V of_PAnd child node V_SThe order of the passes is not limited.

At the increase of the node V to be optimized_iEach reachable register node V of_iWhen the supplementary timing cost C2 of _ REG is needed to increase C2 by the corresponding variation, the present application provides the following two changing principles:

(a) according to the node V to be optimized_iThe variable quantity corresponding to the path changes the node V to be optimized_iEach reachable register node V of_iThe secondary timing cost C2 of REG has the same or different variance for different paths. In this case, all the nodes V to be optimized on the same path_iHave the same variable quantity, and the nodes V to be optimized on different paths_iMay be the same or different. For example, if the node 1 to be optimized and the node 2 to be optimized are both located on the path 1 and the corresponding variation is 1, and the node 3 to be optimized is located on the path 2 and the corresponding variation is 2, then each node 1 to be optimized may be changed to the corresponding variation of 1The auxiliary time sequence cost of each reachable register node of the node to be optimized 2 is increased by 1, and the auxiliary time sequence cost of each reachable register node of the node to be optimized 3 is increased by 2. The variation corresponding to different paths is configured in advance, and can be configured into different variations according to the criticality of different paths.

(b) According to the node V to be optimized_iChanging the node V to be optimized according to the variation corresponding to the signal segment in the path_iEach reachable register node V of_iThe secondary timing cost C2 of REG is the same or different in the variation corresponding to different signal segments in the same path. This situation can be regarded as a further extension of the above-mentioned situation (a), except that different paths may be configured with different variation amounts, different nodes to be optimized on the same path may also be configured with different variation amounts, for example, if the node to be optimized 1 and the node to be optimized 2 are located on the signal segment 1 of the path 1 and the corresponding variation amount is 1, and the node to be optimized 3 is located on the signal segment 2 of the path 1 and the corresponding variation amount is 2, the auxiliary timing cost of each reachable register node of the node to be optimized 1 is increased by 1, the auxiliary timing cost of each reachable register node of the node to be optimized 2 is increased by 1, and the auxiliary timing cost of each reachable register node of the node to be optimized 3 is increased by 2. The division of the signal segments on the same path and the respective corresponding variation are configured in advance, the division conditions of the signal segments of different paths can be the same or different, and the variation of each signal segment of different paths can be the same or different. In practical application, the user may further divide the signal segment and configure corresponding variation only by a part of paths in the input netlist, and the rest of paths do not divide the signal segment, or may divide the signal segment and configure corresponding variation for all the paths.

Similarly, in increasing the node V to be optimized_iChild node V of_SEach reachable register node V of_SThe secondary timing cost C2 of REG can be determined according to the node V to be optimized_iThe variation corresponding to the path changes the child node V_SEach reachable register node V of_SSecondary timing cost of _ REG, orAccording to the node V to be optimized_iThe variation quantity corresponding to the signal segment located in the path changes the child node V_SEach reachable register node V of_SSecondary timing cost of _ REG C2, or according to child node V_SThe variation corresponding to the path changes the child node V_SEach reachable register node V of_SSecondary timing cost of _ REG C2, or according to child node V_SThe variation quantity corresponding to the signal segment located in the path changes the child node V_SEach reachable register node V of_SSecondary timing cost of REG C2.

At node V to be optimized_iWhen the time sequence cost value C1 is updated, the time sequence cost value C1 is always updated to be the node V to be optimized_iCurrent time sequence cost value C1 and node V to be optimized_iEach reachable register node V of_iMaximum between the secondary timing cost C2 of REG itself provides two update mechanisms:

update mechanism one, non-weighted update

Node V to be optimized_iThe updated time-series cost value C1 of (a) is expressed as: c1 ═ max (C1, C2_1, C2_2 … C2_ k), C2_1, C2_2 … C2_ k are the auxiliary timing costs of the respective reachable register nodes of the node to be optimized, respectively.

Update mechanism two, update by weight

Node V to be optimized_iThe updated time-series cost value C1 of (a) is expressed as: max (C1 × w) of C1_C1,C2_1×w_{C2_1},C2_2×w_{C2_2}…C2_k×w_{C2_k}) C2_1, C2_2 … C2_ k are auxiliary timing costs of each reachable register node of the node to be optimized, w_CRepresenting the weight corresponding to node C, with the weights corresponding to different nodes being the same or different, e.g. w_C1Representing a node V to be optimized_iCorresponding weight, w_{C2_1}～w_{C2_k}In turn for the node V to be optimized_iEach reachable register node V of_iThe weight corresponding to _ REG. The weight of each node is determined according to the estimated time sequence allowance of the path where the node is located, the estimated time sequence allowance of the path is smaller, the corresponding time sequence tension degree is higher, and the weight of the node is larger. The estimated time sequence margin of the path can be determined according to the corresponding clock period of the register, the number of logic levels of the critical path, the fan-out number of each line network on the path and the like.

Optionally, whichever of the above update mechanisms is adopted to update the node V to be optimized_iThe updated time sequence cost value C1 synchronously updates the time sequence cost value of the node to be optimized when the auxiliary time sequence cost of each reachable register node is updated, namely, the time sequence cost value C2 of each reachable register node is updated and compared with the current time sequence cost value of the node to be optimized, so that the time sequence cost value of the node to be optimized is updated at the maximum value without determining after the auxiliary time sequence costs of all reachable register nodes are updated, the time sequence cost value of the node to be optimized and the auxiliary time sequence costs of the reachable register nodes can be updated synchronously, and the updating speed is increased.

For each node V to be optimized_iRespectively traversing corresponding father node and/or child node according to the method, thereby obtaining the node V to be optimized_iTiming cost value C1, auxiliary timing cost C2 and corresponding reachable register nodes V_iSecondary timing cost of REG C2. Traversing all the nodes V to be optimized in sequence_iThe above operation is performed. For all nodes V to be optimized_iThe traversal sequence can be configured in advance, and the nodes V to be optimized can be traversed in sequence according to the topology sequence_i。

And step S3, traversing each node to be optimized in turn according to the time sequence cost value C1 of each node to be optimized from big to small. When the nodes to be optimized are traversed sequentially according to the sequence of the time sequence cost values of the nodes to be optimized from large to small, if the time sequence cost values of at least two nodes to be optimized are equal, the nodes to be optimized with the equal time sequence cost values are traversed sequentially according to the sequence of the topology sequence of the nodes to be optimized from first to last.

For each traversed node V to be optimized_iAccording to the node V to be optimized_iAnd the relationship between the distribution results of the adjacent nodes with the direct connection relationship determines the node to be optimizedV_iThe timing cost value C1 after reassignment to other sub netlists. Each node V to be optimized_iThe time sequence cost value of the node to be optimized is updated according to the auxiliary time sequence cost C2 of each reachable register node, and the relationship between the distribution results of each node to be optimized and the adjacent nodes with direct connection relationship is used for updating the auxiliary time sequence cost C2 of each reachable register node.

Taking the sub netlist capable of containing the node to be optimized from all other sub netlists as the sub netlist to be reallocated of the node to be optimized, and determining the timing cost value C1 after the node to be optimized is reallocated to the sub netlist to be reallocated for any one of the reallocateble sub netlists of the node to be optimized, please refer to fig. 4: sequentially traversing all adjacent nodes which are in direct connection with the node to be optimized, updating the time sequence cost value of the node to be optimized during each traversal, determining the time sequence cost value of the node to be optimized, which is obtained by updating after all the adjacent nodes are traversed, as the time sequence cost value of the node to be optimized when the node to be optimized is redistributed to the sub netlist to be redistributed, wherein each traversal:

(1) and if the node to be optimized and the adjacent node thereof with the direct connection relation are in the same sub-netlist under the current distribution result, the node to be optimized is redistributed to the sub-netlist to be redistributed and then is in different sub-netlists with the adjacent node, the auxiliary time sequence cost C2 of each reachable register node of the node to be optimized is increased, and the time sequence cost C1 of the node to be optimized is updated to be the maximum value between the current time sequence cost C1 and the auxiliary time sequence cost C2 of each reachable register node.

(2) And if the node to be optimized and the adjacent node thereof with the direct connection relation are in different sub netlists under the current distribution result, the node to be optimized is redistributed to the sub netlists to be redistributed and then is in the same sub netlist with the adjacent node thereof, the auxiliary time sequence cost C2 of each reachable register node of the node to be optimized is reduced, and the time sequence cost C1 of the node to be optimized is updated to be the maximum value between the current time sequence cost C1 and the auxiliary time sequence cost C2 of each reachable register node.

(3) And if the node to be optimized and the adjacent node thereof with the direct connection relation are in the same sub-netlist or not in the same sub-netlist after the node to be optimized is redistributed to the sub-netlist under the current distribution result, keeping the time sequence cost value C1 of the node to be optimized unchanged.

In both the above cases (1) and (2), the auxiliary timing cost C2 of each reachable register node of the node to be optimized needs to be changed, specifically, in case (1), it is increased, and in case (2), it is decreased, and in any case, when the auxiliary timing cost C2 of the reachable register node is changed, it needs to be changed according to the corresponding variation amount, and the following two changing principles are provided in the present application: (a) and changing the auxiliary time sequence cost C2 of each reachable register node of the node to be optimized according to the variable quantity corresponding to the path of the node to be optimized, wherein the variable quantities corresponding to different paths are the same or different. (b) And changing the auxiliary time sequence cost of each reachable register node of the node to be optimized according to the variation corresponding to the signal segment of the node to be optimized in the path, wherein the variation corresponding to different signal segments in the same path is the same or different. The specific content is similar to that in the step S2, and is not described in detail herein.

In this step, the node to be optimized and its adjacent node having a direct connection relationship are respectively instance modules input by the user into the netlist, the adjacent node having a direct connection relationship with the node to be optimized includes a parent node and/or a child node of the node to be optimized, and the definition of the parent node and the child node of the node to be optimized is as described above. Therefore, the neighboring node in the step having a direct connection relationship with the node to be optimized includes one of all the parent nodes and the child nodes of the node to be optimized, and the step is actually implemented as follows: and traversing all father nodes and each child node of the nodes to be optimized in sequence to serve as adjacent nodes which are in direct connection with the nodes to be optimized, updating the time sequence cost value of the nodes to be optimized in sequence by the method in each traversal, obtaining the time sequence cost value C1 when the nodes to be optimized are redistributed to the child netlist to be redistributed after all father nodes and the child nodes of the nodes to be optimized are traversed, and presetting the traversal sequence of all the adjacent nodes. And updating the time sequence cost value of the node to be optimized according to any one of the situations (1), (2) and (3) in each pass, wherein the updating situations of the time sequence cost values of the nodes to be optimized are the same or different for different adjacent nodes, for example, the parent node 1 is updated according to the situation (1), and the parent node 2 is updated … … according to the situation (2).

During each pass, if the time sequence cost value of the node to be optimized is updated according to the condition (1) or (2), there are two update mechanisms, namely non-weight update and weight update, which are similar to the specific operation of the step S2, and are not described in detail herein.

Step S4, adjusting the distribution result of each node to be optimized to distribute the node to the sub netlist with the minimum timing cost value, where the smaller the timing cost value is, the smaller the number of times of spanning the sub netlist of the path where the corresponding node to be optimized is located is, and the number of spanning sub netlists is the number of two instance modules having a direct connection relationship on the path in two different sub netlists respectively.

Step S5, updating the time sequence cost value of each node to be optimized according to the adjusted distribution result, and for each node V to be optimized_iUpdating the node V to be optimized according to the current distribution result_iThe method of the timing cost value C1 and the step S2 determine each node V to be optimized according to the initial distribution result_iThe sequential cost value C1 method under the initial allocation result is similar and will not be described in detail herein.

Each node V to be optimized is obtained by updating_iAnd after the time sequence cost value C1 under the current distribution result, re-executing the steps S3-S5 to carry out next cycle updating until a sub netlist corresponding to each FPGA die is obtained by division when a preset cycle condition is reached. Wherein the predetermined cycling conditions include: and/or the sub netlist crossing times of all the preset paths are smaller than the corresponding preset values, and/or the sub netlist crossing times of each preset path are smaller than the corresponding preset values, and/or the sub netlist crossing times of all the preset paths are smaller than the corresponding preset values, and/or the preset loop times are reached. Ideally all reservations would be madeThe times of crossing the sub netlist of the path are all 0, that is, all the predetermined paths do not cross the sub netlist, and at this time, the C1 and C2 of all the nodes to be optimized on the predetermined paths are all 0.

Each sub netlist obtained through division comprises all the example modules and netlist nets among the example modules, and based on the netlist division method provided by the application, the number of logic resources on each FPGA bare chip meets the logic resource requirement of the corresponding sub netlist obtained through division, the leading-out end of an input signal connection point on each FPGA bare chip meets the number of input signals of the corresponding sub netlist, the leading-out end of an output signal connection point on each FPGA bare chip meets the number of output signals of the corresponding sub netlist, and the IO port of the multi-die FPGA meets the IO port requirement of the corresponding sub netlist.

The above is only a preferred embodiment of the present application, and the present invention is not limited to the above embodiments. It is to be understood that other modifications and variations directly derivable or suggested by those skilled in the art without departing from the spirit and concept of the present invention are to be considered as included within the scope of the present invention.

Claims

1. A netlist partitioning method for a multi-die FPGA based on timing, the method comprising:

acquiring a user input netlist, and distributing each instance module in the user input netlist to a sub-netlist corresponding to each FPGA bare chip in a multi-bare-chip FPGA according to a predetermined algorithm to obtain an initial distribution result;

adjusting the distribution result of each node to be optimized to distribute the node to the sub netlist with the minimum time sequence cost value, wherein the smaller the time sequence cost value is, the smaller the number of times of spanning sub netlists of the path where the corresponding node to be optimized is located is, and the number of the spanning sub netlists is the number of two example modules with direct connection relation on the path in two different sub netlists respectively;

the number of logic resources on each FPGA bare chip meets the logic resource requirement of the corresponding sub-netlist obtained through segmentation, the leading-out end of the input signal connection point on each FPGA bare chip meets the number of input signals of the corresponding sub-netlist, the leading-out end of the output signal connection point on each FPGA bare chip meets the number of output signals of the corresponding sub-netlist, and the FPGA bare chip connected with the IO pin of the multi-die FPGA meets the IO port requirement of the corresponding sub-netlist.

2. The method of claim 1,

updating the time sequence cost value of each node to be optimized according to the auxiliary time sequence cost of each reachable register node, wherein the relationship between the distribution results of each node to be optimized and the adjacent nodes with the direct connection relationship is used for updating the auxiliary time sequence cost of each reachable register node;

3. The method of claim 2, wherein updating the timing cost value of each node to be optimized according to the auxiliary timing cost of its respective reachable register node comprises:

4. The method of claim 2, wherein updating the timing cost value of each node to be optimized according to the auxiliary timing cost of its respective reachable register node comprises:

and taking the time sequence cost value of the node to be optimized and the auxiliary time sequence cost of each reachable register node corresponding to the time sequence cost value as the updated time sequence cost value of the node to be optimized according to the maximum value weighted by the corresponding weight: max (C1 × w) of C1_C1,C2_1×w_{C2_1},C2_2×w_{C2_2}…C2_k×w_{C2_k}) Wherein C1 is the timing cost value of the node to be optimized, C2_1, C2_2 … C2_ k are auxiliary timing costs of each reachable register node of the node to be optimized, w_CAnd the weight corresponding to the node C is shown, and the weights corresponding to different nodes are the same or different.

5. The method of claim 4,

the weight of each node is determined according to the estimated time sequence allowance of the path where the node is located, and the smaller the estimated time sequence allowance of the path is, the higher the corresponding time sequence tension degree is, and the larger the weight of the node is.

6. The method according to claim 2, wherein the determining the timing cost value after the node to be optimized is redistributed to other sub netlists according to the relationship between the distribution results of the node to be optimized and the adjacent nodes thereof having direct connection relationship comprises:

for any one reallocated child netlist to be reallocated of the node to be optimized, sequentially traversing all adjacent nodes which are in direct connection with the node to be optimized, updating the time sequence cost value of the node to be optimized in each traversal, determining that the updated time sequence cost value of the node to be optimized after all the adjacent nodes are traversed is the time sequence cost value of the node to be optimized when the node to be optimized is reallocated to the child netlist to be reallocated, wherein each traversal is as follows:

if the node to be optimized and the adjacent node thereof with the direct connection relation are in the same sub-netlist under the current distribution result, the node to be optimized is redistributed to the sub-netlist to be redistributed and then is in different sub-netlists with the adjacent node, the auxiliary time sequence cost of each reachable register node of the node to be optimized is increased, and the time sequence cost of the node to be optimized is updated to be the maximum value between the current time sequence cost of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

if the node to be optimized and the adjacent node thereof with the direct connection relation are in different sub netlists under the current distribution result, the node to be optimized is redistributed to the sub netlist to be redistributed and then is in the same sub netlist with the adjacent node thereof, the auxiliary time sequence cost of each reachable register node of the node to be optimized is reduced, and the time sequence cost of the node to be optimized is updated to be the maximum value between the current time sequence cost of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

7. The method of claim 6,

and changing the auxiliary time sequence cost of each reachable register node of the node to be optimized according to the variable quantity corresponding to the path of the node to be optimized, wherein the variable quantities corresponding to different paths are the same or different.

8. The method of claim 6,

and changing the auxiliary time sequence cost of each reachable register node of the node to be optimized according to the variation corresponding to the signal segment of the node to be optimized in the path, wherein the variation corresponding to different signal segments in the same path is the same or different.

9. The method of claim 2, wherein updating the timing cost value of each node to be optimized according to the adjusted distribution result comprises, for each node to be optimized:

traversing all father nodes of the node to be optimized, and for each father node of the node to be optimized, if the node to be optimized and the father node of the node to be optimized are not in the same child netlist under the current distribution result, increasing the auxiliary time sequence cost of each reachable register node of the node to be optimized, and updating the time sequence cost value of the node to be optimized to the maximum value between the current time sequence cost value of the node to be optimized and the auxiliary time sequence cost of each reachable register node;

the parent node of the node to be optimized is an example module which is in direct connection with the node to be optimized and is located upstream in the signal transmission direction, and the child node of the node to be optimized is an example module which is in direct connection with the node to be optimized and is located downstream in the signal transmission direction.

10. The method according to claim 6 or 9,

and when the auxiliary time sequence cost of each reachable register node of the node to be optimized is used for updating the time sequence cost value of the node to be optimized, synchronously updating the time sequence cost value of the node to be optimized when the auxiliary time sequence cost of each reachable register node is updated.

11. The method of claim 2,

the reachable register node of each node to be optimized comprises all example modules which are located on the same path with the node to be optimized, located at the downstream of the node to be optimized in the signal transmission direction and are of the type of a register;

12. The method of claim 11,

and when the instance module and the node to be optimized are in the same clock domain, determining that the instance module meets the preset condition.

13. The method of claim 11,

and when the instance module and the node to be optimized are not on the same false path, determining that the instance module meets the preset condition, wherein the false path is a path through which a transmission signal cannot pass.

14. The method of claim 1,

and the node to be optimized comprises all the example modules in the user input netlist or comprises part of the example modules on the preset path in the user input netlist.

15. The method of claim 1,

when the nodes to be optimized are traversed sequentially according to the sequence of the time sequence cost values of the nodes to be optimized from large to small, if the time sequence cost values of at least two nodes to be optimized are equal, the nodes to be optimized with the equal time sequence cost values are traversed sequentially according to the sequence of the topology sequence of the nodes to be optimized from first to last.

16. The method of claim 1, wherein the assigning each instance module in the user-input netlist to a corresponding sub-netlist of each FPGA die in the multi-die FPGA according to a predetermined algorithm to obtain an initial assignment result comprises:

calculating scores of all the example modules distributed to all the sub netlists capable of accommodating the example modules according to an objective function, and distributing all the example modules to the sub netlists with the highest scores corresponding to all the example modules;

the target function is constructed based on at least one chip performance parameter corresponding to the layout target, and the more excellent the chip performance parameter realized by allocating the instance module to the sub-netlist, the higher the corresponding score; the containment capability of each sub-netlist, including the range of types and numbers of instance modules that the sub-netlist can contain, is determined by the logic resources contained in the corresponding FPGA die.

17. The method of claim 16, wherein the layout target corresponds to a plurality of chip performance parameters, and the target function comprises a plurality of sub-target functions, each sub-target function being constructed based on one or more chip performance parameters;

when each instance module is distributed, the scores for distributing each instance module to each sub-netlist capable of accommodating the instance module are calculated in stages according to each sub-target function and distributed.

18. The method of claim 17,

when the instance modules are distributed by stages by using the sub-goal functions, the accommodating capacity of the same sub-netlist at different stages is the same or different.

19. The method of claim 16,

the corresponding FPGA dies contain two sub-netlists with the same logic resources that have the same or different holding capabilities.