Disclosure of Invention
The invention aims to provide a power consumption optimization method, an integrated circuit and a chip that use the ICG replication function of an EDA tool to replicate and place ICGs for the critical registers that limit the overall performance of a circuit. Without increasing flow complexity and while preserving the power savings of the integrated circuit, the timing of paths between load registers can be optimized to the greatest extent, thereby improving the performance of the chip.
The technical scheme provided by the invention is as follows:
The invention provides a power consumption optimization method for the timing control of load registers in a submodule circuit of an integrated circuit. The method comprises the following steps:
acquiring, based on a timing report of the submodule circuit generated by an EDA tool, first endpoint registers having timing violations among the load registers;
acquiring all first load registers, i.e. the load registers driven by an ICG;
acquiring, according to the first load registers and the first endpoint registers, second endpoint registers, i.e. the load registers that are driven by an ICG and limit the overall performance of the circuit;
obtaining, for each second endpoint register, the load registers connected to it through combinational logic, and dividing these load registers according to the combinational logic into a plurality of logically independent register files; and
replicating, by the EDA tool, one ICG for each register file to drive the CP terminals of all load registers in that register file.
In this scheme, the timing report of the submodule circuit generated by the EDA tool identifies the first endpoint registers with timing violations among the load registers. Determining all first load registers driven by an ICG then yields the second endpoint registers, which are driven by an ICG and limit the overall performance of the circuit. The load registers connected to each second endpoint register through combinational logic are identified from the logical relationships, and the load registers are divided into a plurality of logically independent register files. Replicating one ICG for each register file to drive the CP terminals of its load registers allows the start and end registers of a logically connected critical path to share one ICG wherever possible. This reduces the extra pessimism that must otherwise be applied on the clock tree when the clock branches too early because the ICG is not shared, which improves the timing of critical paths and the performance of the whole chip.
In some embodiments, obtaining the load registers having combinational logic with each second endpoint register further includes:
acquiring, among the load registers having combinational logic with each second endpoint register, the start register located at the logical start point.
In some embodiments, replicating one ICG for each register file by the EDA tool further includes:
calculating the physical center of each register file according to the physical distribution range from the start register to the second endpoint register in that register file; and
placing the ICG at the physical center.
In some embodiments, the CP terminal of each ICG is connected to a clock source, the Q terminal of each ICG is connected to the CP terminals of the load registers in its register file, and the E terminal of each ICG receives a gating signal.
In some embodiments, the method further includes replicating ICGs, by the EDA tool according to a preset condition, for the remaining first load registers that do not belong to any register file, so that each such ICG drives a plurality of the remaining load registers.
In some embodiments, the preset condition comprises the physical distribution of the remaining load registers and the number of loads driven by each ICG.
In some embodiments, the second endpoint registers, which are driven by an ICG and limit the overall performance of the circuit, are obtained as the intersection of the first load registers and the first endpoint registers.
In some embodiments, after the load registers are divided into a plurality of logically independent register files according to the combinational logic, the method further includes:
re-acquiring the timing report of the submodule circuit, determining whether the timing of the load registers having combinational logic in each register file is correct, and re-dividing the register files if it is not.
In a second aspect, the present application provides an integrated circuit, where the integrated circuit uses the power consumption optimization method described in the first aspect to perform timing control of a load register in a sub-module circuit.
In a third aspect, the application provides a chip comprising the integrated circuit of the second aspect.
With the power consumption optimization method, integrated circuit and chip of the present application, the ICG replication function of the EDA tool is used to replicate and place ICGs for the critical registers that affect the overall performance of the circuit. The timing of paths between load registers can thus be optimized to the greatest extent without increasing flow complexity and while preserving the power savings of the integrated circuit, which improves the performance of the chip.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
For simplicity, the figures show only the parts relevant to the present invention, schematically, and do not represent the actual structure of a product. In addition, to simplify the drawings and ease understanding, where several components in a drawing have the same structure or function, only one of them is shown schematically or labeled. Herein, "a" or "one" covers both the case of a single item and the case of more than one.
With the continuous development of integrated circuits, requirements on circuit performance keep rising while power constraints become increasingly strict. In very large scale integrated circuits, dynamic power, i.e. the power caused by toggling of data and clock signals, accounts for 95% of the overall power consumption, and the dynamic power of the clock tree accounts for about 35% of the overall dynamic power. If part of the functional circuitry in a design does not need to work for a period of time, the data signals at the data ports (D terminals) of its registers do not toggle, so the clock signals at the clock ports (CP terminals) of those registers need not toggle either. Stopping the clock in this way avoids the power consumed by toggling on the clock tree.
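The two percentages above can be combined into a rough estimate of what clock gating can save. The sketch below is a back-of-the-envelope model only: it assumes that gating a given share of the clock tree for a given share of the time removes exactly that share of clock-tree switching power, and it ignores the power of the ICGs themselves. The 95% and 35% figures come from the text; `gated_fraction` and `idle_fraction` are hypothetical inputs.

```python
from fractions import Fraction

# Shares quoted in the text.
DYNAMIC_SHARE = Fraction(95, 100)      # dynamic power / total power
CLOCK_TREE_SHARE = Fraction(35, 100)   # clock-tree dynamic power / total dynamic power


def clock_gating_saving(gated_fraction: Fraction, idle_fraction: Fraction) -> Fraction:
    """Rough fraction of total power saved by clock gating.

    gated_fraction: share of clock-tree switching located behind ICGs (hypothetical input)
    idle_fraction:  share of time the ICG enable is 0 (hypothetical input)
    Assumes clock-tree power scales linearly with the gated share and ignores
    the ICGs' own power and any data-path effects.
    """
    return DYNAMIC_SHARE * CLOCK_TREE_SHARE * gated_fraction * idle_fraction


# Example: half of the clock tree sits behind ICGs, and those submodules idle 40% of the time.
saving = clock_gating_saving(Fraction(1, 2), Fraction(2, 5))
print(saving)  # 133/2000, i.e. 6.65% of total power
```

Exact fractions are used so the arithmetic is not blurred by floating-point rounding.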
Referring to Fig. 1, the gating control logic generates a gating control signal that drives the E terminal of the clock gating unit (ICG). When this signal is deasserted, the clock from the clock source (clock port) is turned off, and the turned-off clock serves as the clock input (clock pin) of the submodule, so that the clock signals at the CP terminals of the submodule's registers no longer toggle.
Turning off means that the clock signal, which normally toggles every cycle, is held at a constant 0, as shown in Fig. 2, so that the clock-tree buffers (Buffer) and the CP terminals of the registers (REGS) stop toggling and power consumption is reduced. The gating control logic is built from registers: when it detects that the submodule does not need to work, it sets the gating control signal to 0, which turns off the clock downstream of the ICG.
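The text does not specify the internal structure of the ICG. A common implementation is a latch followed by an AND gate, and the cycle-level sketch below models that assumed structure: the latch is transparent while CP is low, so E is captured before the rising edge and held while CP is high, and the gated output Q follows CP only while the latched enable is 1. Counting rising edges stands in, as a simplification, for clock-tree switching activity.

```python
from typing import List


def simulate_icg(cp: List[int], e: List[int]) -> List[int]:
    """Gated clock Q of an assumed latch + AND ICG, sampled once per clock phase."""
    latch_q = 0
    q = []
    for c, en in zip(cp, e):
        if c == 0:              # latch is transparent during the low phase of CP
            latch_q = en
        q.append(c & latch_q)   # AND gate: Q follows CP only while enabled
    return q


def count_rising_edges(wave: List[int]) -> int:
    """Number of 0 -> 1 transitions, a proxy for switching activity."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# Four clock cycles sampled twice per cycle (low phase, high phase).
cp = [0, 1] * 4
# The gating control logic keeps E at 1 for two cycles, then drives it to 0.
e = [1, 1, 1, 1, 0, 0, 0, 0]
q = simulate_icg(cp, e)
print(q)                                            # [0, 1, 0, 1, 0, 0, 0, 0]
print(count_rising_edges(cp), count_rising_edges(q))  # 4 2
```

Once E is 0, Q stays at a constant 0, which is the turned-off clock of Fig. 2; everything downstream of Q stops toggling.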
Fig. 3 shows the structure of Fig. 1 in more detail. Registers in the submodule circuit are referred to as load registers, and registers that drive the logic of the ICG/E port are referred to as gating control registers.
However, as designs grow, submodule functions become more complex and the number of load registers in a submodule increases. Building a clock tree from ICG/Q to the CP terminals of all load registers then requires a large clock tree delay, so the delay from the clock source to ICG/CP must be small. The gating control registers, on the other hand, may be distributed among the load registers, so the delay from the clock source to the CP terminals of the gating control registers is large, and so is the data delay from the gating control registers to ICG/E. As shown in Fig. 4, the transmit clock path delay and the transmit data path delay are both long, while the capture clock path delay is short. By the definition of setup time, the setup check from the gating control register to the ICG is therefore easily violated, and the circuit fails to meet its performance requirements.
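The setup check described above can be written out as simple arithmetic. The sketch below uses the standard single-cycle setup relation (data must arrive at ICG/E no later than one clock period plus the capture clock delay, minus the setup requirement); all delay values are hypothetical and chosen only to reproduce the situation of Fig. 4.

```python
def icg_enable_setup_slack(period_ps: int,
                           launch_clock_ps: int,
                           launch_data_ps: int,
                           capture_clock_ps: int,
                           setup_ps: int) -> int:
    """Setup slack (ps) of the path gating control register -> ICG/E.

    launch_clock_ps:  clock source -> CP of the gating control register (transmit clock path)
    launch_data_ps:   gating control register CP -> ICG/E (clock-to-Q, logic, wires)
    capture_clock_ps: clock source -> ICG/CP (capture clock path)
    setup_ps:         setup requirement of the ICG enable input
    A negative result is a setup violation.
    """
    arrival = launch_clock_ps + launch_data_ps
    required = period_ps + capture_clock_ps - setup_ps
    return required - arrival


# Hypothetical numbers: long transmit clock and data paths, short capture clock path.
print(icg_enable_setup_slack(1000, 700, 450, 150, 50))  # -50
# Same path if ICG/CP were reached 150 ps later in the clock tree.
print(icg_enable_setup_slack(1000, 700, 450, 300, 50))  # 100
```

The second call shows why a longer capture clock path, i.e. placing the ICG deeper in the clock tree, relieves this check.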
In the prior art, a common way to handle ICG distribution when there are many load registers is to replicate ICGs automatically with an EDA tool. For example, as shown in Fig. 5, the ICG is replicated into icg_cp1 and icg_cp2, and the E terminals of the replicas are all connected to the original gating control signal; the conditions under which the ICGs turn off are therefore identical, and the function of the circuit is unchanged. icg_cp1 and icg_cp2 each control part of the load registers, shown in the figure as load register group A and load register group B. Because each replicated ICG drives fewer load registers, the clock delay from ICG/Q to the load registers decreases, which allows the clock delay from the clock source to the ICG/CP port to increase. This corresponds to a longer capture clock path delay in Fig. 4, makes the setup time easier to meet, and helps improve circuit performance.
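Why fewer loads per ICG shortens the ICG/Q-to-register delay can be illustrated with an idealized buffer tree. The sketch below assumes a balanced tree in which each buffer drives at most `max_fanout` sinks and every buffer level adds the same delay; real clock tree synthesis is far more detailed, and the numbers in the example are hypothetical.

```python
def buffer_levels(num_loads: int, max_fanout: int) -> int:
    """Smallest number of buffer levels d such that max_fanout ** d >= num_loads."""
    if num_loads < 1 or max_fanout < 2:
        raise ValueError("need at least one load and a max fanout of at least 2")
    levels, reach = 0, 1
    while reach < num_loads:
        reach *= max_fanout
        levels += 1
    return levels


def icg_to_load_delay(num_loads: int, num_icgs: int, max_fanout: int, buffer_delay_ps: int) -> int:
    """ICG/Q -> register CP delay when num_loads registers are split evenly over num_icgs ICGs."""
    loads_per_icg = -(-num_loads // num_icgs)  # ceiling division
    return buffer_levels(loads_per_icg, max_fanout) * buffer_delay_ps


# Hypothetical: 2048 load registers, max fanout 16 per buffer, 40 ps per buffer level.
for icgs in (1, 8):
    print(icgs, icg_to_load_delay(2048, icgs, 16, 40))  # 1 120, then 8 80
```

If the total insertion delay from the clock source to the registers is held constant, the 40 ps removed below the ICG reappears above ICG/CP, lengthening the capture clock path of the enable check by the same amount.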
However, the tool's automatic ICG replication in the prior art is based mainly on the physical locations of the load registers, and because the EDA tool uses a heuristic algorithm, the resulting replication scheme is somewhat random. As shown in Fig. 6, a random replication scheme may place registers that are physically close but have no logical interaction under the same ICG: for example, the registers within load register group A, and those within load register group B, do not interact logically with each other, whereas group B does interact logically with group A. In the timing check of a digital integrated circuit, the clock path from icg_cp2/CP to load register group B (the transmit clock path) and the clock path from icg_cp1/CP to load register group A (the capture clock path) lie in different physical locations and therefore see different voltage, temperature and process conditions. In the worst case, every device on the transmit clock path is at a slow process corner, low voltage and high temperature, while every device on the capture clock path is at a fast process corner, high voltage and normal temperature, so the transmit clock path delay is large and the capture clock path delay is small. To cover this scenario, the transmit clock path delay must additionally be multiplied by a factor greater than 1 and the capture clock path delay by a factor less than 1. The transmit clock delay is thus increased and the capture clock delay decreased, which degrades the timing from register group B to register group A. If the paths from register group B to register group A are themselves critical paths that struggle to meet timing, they become the bottleneck for improving the performance of the entire chip.
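The effect of these derating factors, and of moving the point where the transmit and capture clock paths diverge, can be sketched numerically. The example assumes the usual treatment in static timing analysis: the clock paths are multiplied by a late (greater than 1) and an early (less than 1) factor, and the pessimism on the segment shared by both paths is credited back, since one physical segment cannot be slow and fast at once. The 10% derates and all delays are hypothetical, and the data path is not derated to keep the example small.

```python
from fractions import Fraction


def derated_setup_slack(period: int, common: int, launch_branch: int, capture_branch: int,
                        data: int, setup: int, late_pct: int, early_pct: int) -> Fraction:
    """Register-to-register setup slack in ps under clock-path derating.

    common:         clock delay shared by both paths (clock source -> divergence point)
    launch_branch:  divergence point -> CP of the start register (transmit clock path)
    capture_branch: divergence point -> CP of the end register (capture clock path)
    late_pct, early_pct: derates in percent applied to the clock paths only
    """
    # Work in hundredths of a ps so percentages stay exact integers.
    arrival = (common + launch_branch) * late_pct + data * 100
    required = period * 100 + (common + capture_branch) * early_pct - setup * 100
    # Credit back the pessimism applied to the shared segment.
    required += common * (late_pct - early_pct)
    return Fraction(required - arrival, 100)


# Hypothetical: 500 ps insertion delay to both registers, 900 ps data path, 50 ps setup.
separate_icgs = derated_setup_slack(1000, 100, 400, 400, 900, 50, 110, 90)  # paths diverge before the ICGs
shared_icg = derated_setup_slack(1000, 450, 50, 50, 900, 50, 110, 90)       # paths diverge after one ICG
print(separate_icgs, shared_icg)  # -30 40
```

Without any derating both cases would have 50 ps of slack. Only the unshared branches carry pessimism, so shrinking them from 400 ps to 50 ps turns a 30 ps violation into 40 ps of positive slack.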
To solve this problem, the present application makes the start and end registers of logically connected critical paths share one ICG wherever possible. Their clock paths then share a common segment through the ICG, are not subject to different voltage, temperature or process conditions on that segment, and need no additional pessimistic factors to cover such a scenario, which helps improve the overall timing. The present application is described in detail below with reference to the accompanying drawings:
In one embodiment, referring to Fig. 7, the present invention provides a power consumption optimization method for the timing control of load registers in a submodule circuit of an integrated circuit, comprising the following steps:
S100: acquiring, based on a timing report of the submodule circuit generated by an EDA tool, the first endpoint registers having timing violations among the load registers.
A timing violation is a situation in a digital circuit design where a signal fails to meet its timing constraints, typically because the clock frequency is too high or the data path delay is too long. The main types are: setup time violations, where data is not stable before the clock edge arrives and may be captured incorrectly; hold time violations, where data changes too soon after the clock edge and may be captured incorrectly; multi-cycle path violations, where the delay of a path exceeds the number of clock cycles allotted to it; and clock domain violations, where signal transfers between clock domains of different frequencies or phases do not meet timing requirements. In the current design, before the clock tree is built, the timing report of the submodule circuit can be generated automatically by the EDA tool and used to find the endpoint registers with timing violations among the load registers. These are the most critical endpoint registers limiting the overall performance (and may be denoted VIO_END_REGS).
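Report formats differ between EDA tools, so the sketch below starts from an already-parsed, tool-independent list of `(startpoint, endpoint, slack_ps)` records; that record format, the function name and the register names are hypothetical. It keeps the worst slack per endpoint and returns the endpoints below a threshold, most critical first.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical parsed form of a timing report: one record per reported path.
TimingPath = Tuple[str, str, float]


def violating_endpoints(paths: Iterable[TimingPath], threshold_ps: float = 0.0) -> List[str]:
    """Endpoints whose worst slack is below threshold_ps, most negative first."""
    worst: Dict[str, float] = {}
    for _start, end, slack in paths:
        worst[end] = min(slack, worst.get(end, slack))
    vio = [end for end, slack in worst.items() if slack < threshold_ps]
    return sorted(vio, key=lambda end: worst[end])


report = [
    ("regB0", "regA0", -35.0),
    ("regB1", "regA0", -10.0),
    ("regC0", "regD0", 12.0),
    ("regE0", "regA1", -5.0),
]
VIO_END_REGS = violating_endpoints(report)
print(VIO_END_REGS)  # ['regA0', 'regA1']
```

A negative `threshold_ps` would restrict the list to the endpoints that violate by more than a chosen margin.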
S200: acquiring all first load registers driven by an ICG among the load registers. Specifically, all load registers driven by an ICG (which may be denoted icg_load_regs) can be found with the EDA tool's built-in commands.
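In practice this query is made with the EDA tool's own commands, which are not reproduced here. To show what the query computes, the sketch below uses a hypothetical netlist representation (a clock-net fanout map plus a cell-type map with the made-up types `ICG`, `BUF` and `REG`) and traces forward from each ICG output, through any clock buffers, to the register clock pins it reaches.

```python
from collections import deque
from typing import Dict, List, Set


def icg_driven_loads(fanout: Dict[str, List[str]], cell_type: Dict[str, str]) -> Set[str]:
    """Registers whose clock pin is reached from an ICG output (hypothetical netlist model)."""
    loads: Set[str] = set()
    for icg in (cell for cell, kind in cell_type.items() if kind == "ICG"):
        queue = deque(fanout.get(icg, []))
        seen: Set[str] = set()
        while queue:
            cell = queue.popleft()
            if cell in seen:
                continue
            seen.add(cell)
            kind = cell_type.get(cell)
            if kind == "REG":
                loads.add(cell)                      # register clock pin reached
            elif kind == "BUF":
                queue.extend(fanout.get(cell, []))   # keep tracing through clock buffers
    return loads


cell_type = {"icg0": "ICG", "buf0": "BUF", "regA0": "REG", "regA1": "REG",
             "regB0": "REG", "regX": "REG"}
fanout = {"icg0": ["buf0", "regB0"], "buf0": ["regA0", "regA1"]}
print(sorted(icg_driven_loads(fanout, cell_type)))  # ['regA0', 'regA1', 'regB0']
```

Here `regX` is clocked directly and is therefore not a first load register.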
S300: acquiring, according to the first load registers and the first endpoint registers, the second endpoint registers, i.e. the load registers that are driven by an ICG and limit the overall performance of the circuit.
In one specific implementation, the second endpoint registers (which may be denoted vio_icg_load_end_regs) are obtained by intersecting the set of first load registers with the set of first endpoint registers.
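The intersection itself is a one-line set operation. The minimal sketch below keeps the order of the violation list, which is assumed to be sorted most critical first, so later steps can process the worst endpoints first; the helper name and register names are hypothetical.

```python
from typing import Iterable, List


def intersect_keep_order(vio_end_regs: Iterable[str], icg_load_regs: Iterable[str]) -> List[str]:
    """Violating endpoints that are also ICG-driven loads, in the order of vio_end_regs."""
    icg_loads = set(icg_load_regs)
    return [reg for reg in vio_end_regs if reg in icg_loads]


VIO_END_REGS = ["regA0", "regA1", "regD0"]            # most critical first
icg_load_regs = ["regA0", "regA1", "regB0", "regB1"]
vio_icg_load_end_regs = intersect_keep_order(VIO_END_REGS, icg_load_regs)
print(vio_icg_load_end_regs)  # ['regA0', 'regA1']
```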
S400: obtaining, for each second endpoint register, the load registers connected to it through combinational logic, and dividing these load registers according to the combinational logic into a plurality of logically independent register files.
In one embodiment, obtaining the load registers having combinational logic with each second endpoint register further comprises acquiring the start registers (which may be denoted vio_icg_load_start_regs) located at the logical start points of the combinational logic that drives each second endpoint register vio_icg_load_end_regs. Fig. 8 shows several logically independent register files obtained by dividing the load registers according to the combinational logic.
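One way to form logically independent register files is to treat every register-to-register path ending at a second endpoint register as an edge and take connected components, so that two files never share a register. The sketch below does this with a union-find structure. It is an interpretation under stated assumptions, not the application's script: paths are given as hypothetical `(start, end)` pairs, and start registers that are not ICG-driven loads are left out because they cannot share the gated ICG.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_register_files(paths: Iterable[Tuple[str, str]],
                         end_regs: Iterable[str],
                         icg_loads: Set[str]) -> List[Set[str]]:
    """Group second endpoint registers with their ICG-driven start registers."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    ends = set(end_regs)
    for end in ends:
        find(end)                          # every endpoint gets a file, even with no starts
    for start, end in paths:
        if end in ends and start in icg_loads:
            union(start, end)              # start register joins its endpoint's file

    groups: Dict[str, Set[str]] = {}
    for reg in list(parent):
        groups.setdefault(find(reg), set()).add(reg)
    return sorted(groups.values(), key=sorted)


paths = [("regB0", "regA0"), ("regB1", "regA0"), ("regB1", "regA1"),
         ("regC0", "regD0"), ("regE0", "regF0")]
icg_loads = {"regA0", "regA1", "regB0", "regB1", "regE0", "regF0"}
files = build_register_files(paths, ["regA0", "regA1", "regF0"], icg_loads)
print([sorted(f) for f in files])  # [['regA0', 'regA1', 'regB0', 'regB1'], ['regE0', 'regF0']]
```

Because `regB1` starts paths to both `regA0` and `regA1`, those two endpoints end up in the same file, which keeps the files logically independent of each other.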
S500: replicating, by the EDA tool, one ICG for each register file to drive the CP terminals of the load registers in that register file.
As shown in Fig. 9, load registers connected by combinational logic are driven by one shared ICG. This reduces the extra pessimism that must otherwise be considered on the clock tree when the clock branches too early because the ICG is not shared, which improves the timing of critical paths and the performance of the whole chip. The CP terminal of each ICG is connected to the clock source, the Q terminal of each ICG is connected to the CP terminals of the load registers in the corresponding register file, and the E terminal of each ICG receives the gating signal.
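The connectivity described above can be expressed as a small netlist edit. The sketch below uses a hypothetical in-memory representation of an ICG (not any EDA tool's data model) and creates one copy per register file: each copy keeps the original CP (clock source) and E (gating signal) connections, so the gating function is unchanged, and its Q drives only that file's registers. The original ICG keeps the registers that belong to no file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IcgInstance:
    """Hypothetical netlist view of one ICG: what its terminals connect to."""
    name: str
    cp: str                                            # CP: clock input, from the clock source
    e: str                                             # E: gating (enable) signal
    q_loads: List[str] = field(default_factory=list)  # Q: CP pins of the driven load registers


def replicate_icg(original: IcgInstance, register_files: List[Set[str]]) -> Dict[str, IcgInstance]:
    """Create one copy of `original` per register file; the original keeps the leftovers."""
    driven = set(original.q_loads)
    clones: Dict[str, IcgInstance] = {}
    for k, regs in enumerate(register_files):
        if not regs <= driven:
            raise ValueError(f"register file {k} has registers not driven by {original.name}")
        clone = IcgInstance(name=f"{original.name}_{k}",
                            cp=original.cp,   # same clock source
                            e=original.e,     # same gating signal, so the function is unchanged
                            q_loads=sorted(regs))
        clones[clone.name] = clone
    original.q_loads = sorted(driven.difference(*register_files))
    return clones


icg = IcgInstance("icg", "clk_src", "gate_en",
                  ["regA0", "regA1", "regB0", "regB1", "regE0", "regF0", "regZ0"])
clones = replicate_icg(icg, [{"regA0", "regA1", "regB0", "regB1"}, {"regE0", "regF0"}])
print(sorted(clones), icg.q_loads)  # ['icg_0', 'icg_1'] ['regZ0']
```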
In one embodiment, based on the previous embodiment, replicating one ICG for each register file by the EDA tool further comprises:
calculating the physical center of each register file according to the physical distribution range from the start register to the second endpoint register in that register file, and placing the ICG at the physical center.
Specifically, for each register file k, the EDA tool replicates one ICG, icg_k, to drive the CP terminals of the load registers from the start registers vio_icg_load_start_regs_k to the second endpoint registers vio_icg_load_end_regs_k. The physical center of the register file is calculated from the physical distribution of vio_icg_load_start_regs_k through vio_icg_load_end_regs_k, and icg_k is placed at this center so that all registers in the register file share the longest possible common clock path.
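The text does not define "physical center" precisely. Reading "physical distribution range" as the bounding box of the registers' placement coordinates, the sketch below returns the center of that box; a centroid (mean position) would be an equally plausible reading. Coordinates and register names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # (x, y) placement coordinates, e.g. in micrometres


def register_file_center(regs: Iterable[str], location: Dict[str, Point]) -> Point:
    """Center of the bounding box spanned by the registers of one register file."""
    points = [location[r] for r in regs]
    if not points:
        raise ValueError("empty register file")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


location = {"regB0": (10.0, 20.0), "regB1": (30.0, 20.0),
            "regA0": (50.0, 60.0), "regA1": (70.0, 40.0)}
print(register_file_center(["regB0", "regB1", "regA0", "regA1"], location))  # (40.0, 40.0)
```

The bounding-box center minimizes the largest distance along each axis from the ICG to the file's registers, which suits the goal of a balanced shared clock path.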
By using the ICG replication function of the EDA tool to replicate and place ICGs for the critical registers that affect the overall performance of the circuit, the timing of paths between load registers can be optimized to the greatest extent without increasing flow complexity and while preserving the power savings of the integrated circuit, which further improves the performance of the chip.
In one embodiment, based on the previous embodiment, the method further comprises replicating ICGs, by the EDA tool according to a preset condition, for the first load registers that do not belong to any register file, so that each such ICG drives a plurality of these remaining load registers. The preset condition comprises the physical distribution of the remaining load registers and the number of loads driven by each ICG; the latter may be specified by the user and entered into the EDA tool before replication.
Not all load registers in the submodule circuit are connected through combinational logic to the second endpoint registers; many are independent. For the ICG-driven load registers without such combinational logic, ICGs can be replicated according to physical distribution using the conventional EDA replication method, so that timing control is achieved for all ICG-driven load registers in the submodule circuit.
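The EDA tool uses its own heuristic for this conventional replication. As a simple stand-in that honours the two preset conditions (physical distribution and number of loads per ICG), the sketch below recursively bisects the remaining registers along their wider spread until each group fits under one ICG. It is an illustrative substitute, not the tool's algorithm, and all names and coordinates are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def cluster_by_location(regs: Iterable[str], location: Dict[str, Point],
                        max_fanout: int) -> List[List[str]]:
    """Bisect registers along their wider axis until each group has at most max_fanout members."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    regs = list(regs)
    if len(regs) <= max_fanout:
        return [regs] if regs else []
    xs = [location[r][0] for r in regs]
    ys = [location[r][1] for r in regs]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    regs.sort(key=lambda r: (location[r][axis], r))  # register name breaks ties deterministically
    mid = len(regs) // 2
    return (cluster_by_location(regs[:mid], location, max_fanout)
            + cluster_by_location(regs[mid:], location, max_fanout))


location = {"regP0": (0.0, 0.0), "regP1": (2.0, 1.0),
            "regQ0": (100.0, 0.0), "regQ1": (102.0, 1.0)}
print(cluster_by_location(location, location, 2))  # [['regP0', 'regP1'], ['regQ0', 'regQ1']]
```

Each returned group would then be driven by one replicated ICG placed near it.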
In one embodiment, based on the previous embodiment, after the load registers are divided into a plurality of logically independent register files according to the combinational logic, the method further comprises re-acquiring the timing report of the submodule circuit, determining whether the timing of the load registers having combinational logic in each register file is correct, and re-dividing the register files if it is not. This ensures that the combinational logic grouping of the load registers in each register file is correct and avoids ICG control errors that would affect circuit performance.
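The text leaves the correctness criterion open. One possible check, assumed here for illustration, is that no register-to-register path in the refreshed timing report starts in one register file and ends in another, since such a path would again have its clock split across two ICGs. The sketch returns the offending paths; if the list is not empty, the files can be rebuilt from the new paths and the check repeated. Path pairs and register names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def crossing_paths(paths: Iterable[Tuple[str, str]],
                   register_files: List[Set[str]]) -> List[Tuple[str, str]]:
    """Register-to-register paths whose start and end lie in different register files."""
    owner: Dict[str, int] = {}
    for idx, regs in enumerate(register_files):
        for reg in regs:
            owner[reg] = idx
    return [(s, e) for s, e in paths
            if s in owner and e in owner and owner[s] != owner[e]]


files = [{"regA0", "regB0"}, {"regE0", "regF0"}]
new_paths = [("regB0", "regA0"), ("regE0", "regA0"), ("regX", "regF0")]
print(crossing_paths(new_paths, files))  # [('regE0', 'regA0')]
```

Paths touching registers outside every file (such as `regX` here) are ignored by this particular check.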
The power consumption optimization method of the present application can be implemented as a script for use with existing EDA tools. In addition, all replicated icg_k instances may be set to dont_touch so that their connections are not changed by the tool's later automatic ICG replication, and may be set to soft_fixed so that their physical locations are not moved by the tool.
In one embodiment, the present application provides an integrated circuit that uses the power consumption optimization method of the foregoing embodiment to perform timing control of a load register in a sub-module circuit.
In one embodiment, the present application provides a chip comprising the integrated circuit of the previous embodiment.
It should be noted that the above embodiments can be freely combined as needed. The foregoing is merely a preferred embodiment of the present invention. Those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of the present invention.