Disclosure of Invention
In view of this, the present invention provides a low power consumption parallel multiplier, which has a simple circuit structure, a fast calculation speed and low power consumption.
In order to achieve the purpose, the invention provides the following technical scheme:
a low power consumption parallel multiplier comprising: a partial product generation module, a partial product compression module and a carry skip adder,
the partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into a target parameter, and the decoding circuit decodes bit values of a second multiplier and the target parameter into a partial product;
the partial product compression module comprises a one-bit full adder and a summing circuit, wherein the one-bit full adder outputs an inverted value of a carry bit according to the partial products, the summing circuit adds the partial products to generate two target partial products with different weights, and the generated target partial products are output to a lower-level compression module;
the carry skip adder comprises a plurality of CSA modules, and each CSA module comprises a plurality of the one-bit full adders for obtaining the target product.
Preferably, the Booth encoding circuit includes: a first exclusive-OR gate, a first exclusive-NOR gate and a second exclusive-NOR gate,
a first bit value of the first multiplier and a second bit value of the first multiplier are respectively used as input ends of the first exclusive-OR gate, and an output end of the first exclusive-OR gate is used for outputting a first target parameter;
the second bit value of the first multiplier and a third bit value of the first multiplier are respectively used as input ends of the first exclusive-NOR gate, and an output end of the first exclusive-NOR gate is used for outputting a second target parameter;
the first bit value of the first multiplier and the second bit value of the first multiplier are respectively used as input ends of the second exclusive-NOR gate, and an output end of the second exclusive-NOR gate is used for outputting a third target parameter;
a third bit value of the first multiplier is taken as a fourth target parameter.
Preferably, the decoding circuit includes: a third exclusive-OR gate, a fourth exclusive-OR gate, a first NAND gate, a second NAND gate and a third NAND gate,
a first bit value of the second multiplier and the fourth target parameter are used as input ends of the third exclusive-OR gate;
a second bit value of the second multiplier and the fourth target parameter are used as input ends of the fourth exclusive-OR gate;
the output end of the third exclusive-or gate, the second target parameter and the first target parameter are used as the input end of the first nand gate;
the output end of the fourth exclusive-or gate and the third target parameter are used as the input end of the second nand gate;
and the output end of the first NAND gate and the output end of the second NAND gate are used as the input ends of the third NAND gate, and the output end of the third NAND gate is used for outputting the partial product.
Preferably, the logic output of the one-bit full adder is:
$$S = A \oplus B \oplus C_i$$
$$C_o = AB + C_i(A+B)$$
wherein S is a target partial product, A is a first partial product, B is a second partial product, $C_i$ is a third partial product, and $C_o$ is the carry value of the partial products.
Preferably, the one-bit full adder includes: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, and a tenth transistor,
the source electrode of the first transistor, the source electrode of the second transistor and the source electrode of the seventh transistor are connected and are connected with Vcc;
the drain electrode of the first transistor, the drain electrode of the second transistor and the source electrode of the third transistor are connected;
the drain electrode of the seventh transistor is connected with the source electrode of the eighth transistor;
the drain electrode of the third transistor, the drain electrode of the eighth transistor, the drain electrode of the fourth transistor and the drain electrode of the ninth transistor are connected, and the common connection end is used for outputting an inverted value of the carry bit;
the source electrode of the fourth transistor, the drain electrode of the fifth transistor and the drain electrode of the sixth transistor are connected;
a source of the ninth transistor is connected to a drain of the tenth transistor;
the source electrode of the fifth transistor, the source electrode of the sixth transistor and the source electrode of the tenth transistor are all grounded;
the gate electrode of the third transistor is connected with the gate electrode of the fourth transistor and is used as the input end of the third partial product;
the gate electrode of the first transistor, the gate electrode of the seventh transistor, the gate electrode of the fifth transistor and the gate electrode of the ninth transistor are used as input ends of the first partial product;
and the gate electrode of the second transistor, the gate electrode of the eighth transistor, the gate electrode of the sixth transistor and the gate electrode of the tenth transistor are used as input ends of the second partial product.
Preferably, the summing circuit includes: a fifth xor gate and a sixth xor gate,
the first partial product and the second partial product are used as input terminals of the fifth exclusive-or gate, the output terminal of the fifth exclusive-or gate and the third partial product are used as input terminals of the sixth exclusive-or gate, and the output terminal of the sixth exclusive-or gate is used for outputting the target partial product.
Preferably, said carry skip adder comprises 4 8-bit modules, said 8-bit module comprises two 4-bit CSA modules, said CSA modules comprise 4 of said one-bit full adders and a 2-input data selector.
Preferably, four of the one-bit full adders are cascaded, and if the input of the previous full adder is not inverted, the input of the current full adder is inverted.
Preferably, each one-bit full adder in the 4-bit CSA module generates a carry-propagate signal, the four carry-propagate signals are subjected to an AND operation, and the result is used as the control end of the 2-input data selector; the carry input of the lowest-order full adder and the carry output of the highest-order full adder in the 4-bit CSA module are used as the input ends of the 2-input data selector, and the output end of the 2-input data selector is used as the carry output of the 4-bit CSA module.
Compared with the prior art, the technical scheme provided by the invention has the following advantages:
the invention provides a low-power-consumption parallel multiplier comprising a partial product generation module, a partial product compression module and a carry skip adder. The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. The partial product generation module reduces the number of partial products by half, which saves multiplier circuit area, improves the operation speed of the multiplier circuit and eases VLSI implementation. The partial product compression module comprises a one-bit full adder and a summing circuit, wherein the one-bit full adder outputs an inverted value of a carry according to the partial products, and the summing circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module; this increases the speed of partial product compression. The carry skip adder comprises a plurality of CSA modules, each CSA module comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module uses a one-bit full adder, the circuit structure remains simple while fast calculation is achieved, and power consumption is low.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a low power consumption parallel multiplier according to the present embodiment. The parallel multiplier comprises: a partial product generation module 10, a partial product compression module 20, and a carry skip adder 30.
The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into a target parameter, and the decoding circuit decodes bit values of a second multiplier and the target parameter into a partial product; the partial product generating module reduces the number of partial products by half, greatly saves the area of a multiplier circuit, improves the operation speed of the multiplier circuit and is easy to realize VLSI.
The partial product compression module comprises a one-bit full adder and a summing circuit, wherein the one-bit full adder outputs an inverted value of a carry bit according to the partial products, the summing circuit adds the partial products to generate two target partial products with different weights, and the generated target partial products are output to a lower-level compression module; the partial product compression module increases the speed of compressing the partial products.
The carry skip adder comprises a plurality of CSA modules, each CSA module comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module uses the one-bit full adder, fast calculation is achieved while the circuit structure remains simple and power consumption is low.
On the basis of the foregoing embodiments, the present embodiment provides a specific structure of the Booth encoding circuit, as shown in fig. 3, including: a first exclusive-OR gate I1, a first exclusive-NOR gate I2, and a second exclusive-NOR gate I3,
a first bit value $B_{2i-1}$ of the first multiplier and a second bit value $B_{2i}$ of the first multiplier are respectively used as input ends of the first exclusive-OR gate I1, and the output end of the first exclusive-OR gate I1 is used for outputting a first target parameter X2;
the second bit value $B_{2i}$ of the first multiplier and a third bit value $B_{2i+1}$ of the first multiplier are respectively used as input ends of the first exclusive-NOR gate I2, and the output end of the first exclusive-NOR gate I2 is used for outputting a second target parameter Z;
the first bit value $B_{2i-1}$ of the first multiplier and the second bit value $B_{2i}$ of the first multiplier are respectively used as input ends of the second exclusive-NOR gate I3, and the output end of the second exclusive-NOR gate I3 is used for outputting a third target parameter X1;
the third bit value $B_{2i+1}$ of the first multiplier is taken as a fourth target parameter Neg.
In addition, the present embodiment provides a decoding circuit, as shown in fig. 4, including: a third exclusive-OR gate I4, a fourth exclusive-OR gate I5, a first NAND gate I6, a second NAND gate I7 and a third NAND gate I8,
a first bit value $A_{j-1}$ of the second multiplier and the fourth target parameter Neg are used as input ends of the third exclusive-OR gate I4;
a second bit value $A_j$ of the second multiplier and the fourth target parameter Neg are used as input ends of the fourth exclusive-OR gate I5;
the output end of the third exclusive-OR gate I4, the second target parameter Z and the first target parameter X2 are used as input ends of the first NAND gate I6;
the output end of the fourth exclusive-OR gate I5 and the third target parameter X1 are used as input ends of the second NAND gate I7;
the output end of the first NAND gate I6 and the output end of the second NAND gate I7 are used as input ends of the third NAND gate I8, and the output end of the third NAND gate I8 is used for outputting the partial product $PP_{ij}$.
With reference to fig. 3 and 4, the truth table of the above circuit diagram is as follows:
TABLE 1

| $B_{2i+1}$ | $B_{2i}$ | $B_{2i-1}$ | Func | Neg | Z | X1 | X2 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | +A | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | +A | 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | +2A | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | -2A | 1 | 0 | 1 | 0 |
| 1 | 0 | 1 | -A | 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | -A | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
As is well known, the size of the multiplier is determined by both the Booth encoding unit and the partial product compression unit. The number of transistors required by the Booth coding unit has a great influence on the total area of the multiplier. With the coding method above, the number of partial products is reduced by half while the occupied area and the added delay remain small, and the circuit structure is regular and simple, which makes it easier to implement in VLSI.
It should be noted that the truth table is exemplified by multiplying two N-bit numbers, wherein the multiplicand and the multiplier are respectively represented by Ai and Bi.
In the above table, Neg, Z, X1, and X2 are encoded from 3 adjacent bits of the multiplier B, and Func is the partial product selected by those bits. The encoded result is then decoded to obtain the partial product $PP_{ij}$.
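The relationship between the three adjacent multiplier bits and the columns of Table 1 can be sketched in Python. This is only a behavioural model: the function names `booth_encode`, `booth_digit` and `booth_multiply` are hypothetical, the gate expressions are read off the truth table rather than from fig. 3, and the decoding circuit of fig. 4 is not modelled.

```python
from itertools import product

def booth_encode(b_hi, b_mid, b_lo):
    """Encode bits (B[2i+1], B[2i], B[2i-1]) into (Neg, Z, X1, X2) as listed in Table 1."""
    neg = b_hi                   # sign of the selected multiple
    z = 1 - (b_hi ^ b_mid)       # XNOR of the two upper bits
    x1 = 1 - (b_mid ^ b_lo)      # XNOR of the two lower bits
    x2 = b_mid ^ b_lo            # XOR of the two lower bits
    return neg, z, x1, x2

def booth_digit(b_hi, b_mid, b_lo):
    """Func column as a signed multiple of A: one of -2, -1, 0, +1, +2."""
    return -2 * b_hi + b_mid + b_lo

def booth_multiply(a, b, n=16):
    """Multiply a by a signed n-bit b (n even) by summing n/2 Booth partial products."""
    total, prev = 0, 0           # prev plays the role of B[-1] = 0
    for i in range(n // 2):
        mid = (b >> (2 * i)) & 1
        hi = (b >> (2 * i + 1)) & 1
        total += booth_digit(hi, mid, prev) * a * 4 ** i
        prev = hi
    return total

# Print one line per row of Table 1: bits, Func multiple, (Neg, Z, X1, X2).
for bits in product((0, 1), repeat=3):
    print(bits, booth_digit(*bits), booth_encode(*bits))
```

With `n=16` the loop adds eight partial products instead of sixteen, which is the halving referred to above.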
In addition, the partial product compression module accumulates all the partial products. A widely used compression method is the Wallace tree structure, which groups the partial products by rows; each group corresponds to a set of adders, the additions of all groups are performed simultaneously, and the carries generated by one row are passed to the next row to form new partial products. The new partial products are reduced in the same way until only two rows remain, which are then added by a fast adder to obtain the product. The Wallace tree structure has the advantage of high operation speed and is suitable for multiplications of more than 16 bits.
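The row-wise reduction described above can be illustrated with a short word-level model. It is a sketch under simplifying assumptions: rows are non-negative Python integers, every group of three rows is compressed with a 3:2 step, and the final two rows are added with the built-in `+` in place of a fast adder. The names `compress_rows` and `wallace_sum` are hypothetical.

```python
def compress_rows(x, y, z):
    """3:2 step on whole rows: three rows in, a sum row and a shifted carry row out."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (z & (x | y))) << 1   # carries move one bit position up
    return sum_row, carry_row

def wallace_sum(rows):
    """Reduce non-negative rows three at a time until two remain, then add them."""
    rows = list(rows)
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        reduced = []
        for k in range(0, grouped, 3):
            reduced.extend(compress_rows(*rows[k:k + 3]))
        reduced.extend(rows[grouped:])           # leftover rows pass through unchanged
        rows = reduced
    return sum(rows)                             # stands in for the final fast adder

print(wallace_sum([3, 5, 7, 9, 11, 13, 15, 17]))
```

Each pass preserves the total of all rows, so the final two-row addition yields the same result as adding the rows directly.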
In the compressor array structure, the commonly used compressor types are 3:2 compressors, 4:2 compressors and higher-order compressors. Of these, the 3:2 compressor is the most basic, while the 4:2 compressor is the most widely used because its structure is regular and its 2:1 compression ratio is favourable. Higher-order compressors have a higher compression ratio, but their structure is complex and their wiring is irregular.
The 3:2 compressor works by adding 3 partial products of the same bit by an adder to generate 2 partial products with different weights, and then outputting the partial products to the next compressor, wherein the compression ratio is 3: 2. The 3:2 compressor in the invention adopts a 1-bit full adder structure, and the logic output of the 1-bit full adder is as follows:
$$S = A \oplus B \oplus C_i$$
$$C_o = AB + C_i(A+B)$$
wherein S is a target partial product, A is a first partial product, B is a second partial product, $C_i$ is a third partial product, and $C_o$ is the carry value of the partial products.
As shown in fig. 5, the present embodiment provides a circuit structure of a 1-bit full adder, where the one-bit full adder includes: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, and a tenth transistor,
the source electrode of the first transistor, the source electrode of the second transistor and the source electrode of the seventh transistor are connected and are connected with Vcc;
the drain electrode of the first transistor, the drain electrode of the second transistor and the source electrode of the third transistor are connected;
the drain electrode of the seventh transistor is connected with the source electrode of the eighth transistor;
the drain electrode of the third transistor, the drain electrode of the eighth transistor, the drain electrode of the fourth transistor and the drain electrode of the ninth transistor are connected, and the common connection end is used for outputting an inverted value of the carry bit;
the source electrode of the fourth transistor, the drain electrode of the fifth transistor and the drain electrode of the sixth transistor are connected;
a source of the ninth transistor is connected to a drain of the tenth transistor;
the source electrode of the fifth transistor, the source electrode of the sixth transistor and the source electrode of the tenth transistor are all grounded;
the gate electrode of the third transistor is connected with the gate electrode of the fourth transistor and is used as the input end of the third partial product;
the gate electrode of the first transistor, the gate electrode of the seventh transistor, the gate electrode of the fifth transistor and the gate electrode of the ninth transistor are used as input ends of the first partial product;
and the gate electrode of the second transistor, the gate electrode of the eighth transistor, the gate electrode of the sixth transistor and the gate electrode of the tenth transistor are used as input ends of the second partial product.
As can be seen from the figure, the one-bit full adder provided by the scheme adopts a mirror circuit structure to obtain the inverted value of the carry, $\overline{C_o}$. As soon as the inputs arrive, $\overline{C_o}$ can be obtained immediately, and the delay is the same as that of a minimum-sized inverter. In addition, the pull-up and pull-down networks each have only two transistors, which provides a good $I_{on}/I_{off}$ ratio.
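The connection list above can be checked with a small switch-level model. The sketch below assumes that the first, second, third, seventh and eighth transistors are PMOS devices (conducting when the gate is 0) and that the fourth, fifth, sixth, ninth and tenth are NMOS devices (conducting when the gate is 1); the text does not state the device types, so this is inferred from which sources connect to Vcc and which to ground. The name `mirror_carry_bar` is hypothetical.

```python
from itertools import product

def mirror_carry_bar(a, b, ci):
    """Return the inverted carry produced by the ten-transistor stage (switch-level model)."""
    # Pull-up network (assumed PMOS): T1 and T2 in parallel, in series with T3; T7 in series with T8.
    pull_up = (((not a) or (not b)) and (not ci)) or ((not a) and (not b))
    # Pull-down network (assumed NMOS): T4 in series with T5 parallel T6; T9 in series with T10.
    pull_down = (ci and (a or b)) or (a and b)
    # Exactly one network conducts for every input, so the output node is never floating or shorted.
    assert bool(pull_up) != bool(pull_down)
    return 1 if pull_up else 0

for a, b, ci in product((0, 1), repeat=3):
    carry = (a & b) | (ci & (a | b))             # Co = AB + Ci(A + B)
    print(a, b, ci, "Co =", carry, "output =", mirror_carry_bar(a, b, ci))
```

For every input combination the modelled output is the complement of $C_o$, which is consistent with the common node being described as the inverted carry.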
On the basis of the above, the present embodiment further provides a specific structure of the summing circuit, as shown in fig. 6, and the summing circuit includes: a fifth exclusive-OR gate and a sixth exclusive-OR gate.
Specifically, the first partial product and the second partial product are used as input terminals of the fifth exclusive or gate, the output terminal of the fifth exclusive or gate and the third partial product are used as input terminals of the sixth exclusive or gate, and the output terminal of the sixth exclusive or gate is used for outputting the target partial product.
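As a quick check that the two exclusive-OR gates together with the carry logic behave as a 3:2 compressor, the following sketch models one bit position. The function name `full_adder` is hypothetical, and the carry is computed from the logic expression rather than from the transistor circuit.

```python
from itertools import product

def full_adder(a, b, ci):
    """One-bit 3:2 compressor: returns (S, Co) for three input bits of equal weight."""
    p = a ^ b                          # output of the fifth exclusive-OR gate
    s = p ^ ci                         # output of the sixth exclusive-OR gate
    co = (a & b) | (ci & (a | b))      # Co = AB + Ci(A + B)
    return s, co

for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    print(a, b, ci, "->", "S =", s, "Co =", co)
```

S keeps the weight of the inputs and Co has twice that weight, so `a + b + ci == s + 2 * co` holds for all eight input combinations.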
The inventor considers that:
in use, the 3:2 compressor is typically combined with other types of compressors, such as a 4:2 compressor. The 5 inputs of the 4:2 compressor comprise 4 partial product signals and 1 carry signal from the previous stage to the current stage, and the 3 outputs comprise 2 output signals of the current stage and 1 carry output signal passed to the next-stage compressor structure. Thus, a 4:2 compressor is also referred to as a 5:3 compressor. The 4:2 compressor has the advantages of a good compression ratio and a regular circuit structure. The 4:2 compressor employed in this embodiment is a structure that combines selectors and exclusive-OR gates, as shown in fig. 7.
As can be seen from fig. 7, the compressor with this structure has substantially uniform delays in three paths, and is well balanced in delay and area, and is a very ideal 4:2 compressor structure.
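For illustration, a commonly used 4:2 compressor built from exclusive-OR gates and 2-input selectors can be modelled as follows. This is an assumed textbook arrangement and is not taken from fig. 7; the names `mux` and `compressor_4_2` and the signal names are hypothetical.

```python
from itertools import product

def mux(sel, d1, d0):
    """2-input selector: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def compressor_4_2(x1, x2, x3, x4, cin):
    """Return (s, carry, cout): s has the input weight, carry and cout have twice that weight."""
    p = x1 ^ x2
    q = p ^ x3 ^ x4
    cout = mux(p, x3, x1)      # independent of cin, so carries do not ripple along a row
    carry = mux(q, cin, x4)
    s = q ^ cin
    return s, carry, cout

for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

The loop checks all 32 input combinations against the arithmetic identity that defines a 4:2 (5:3) compressor.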
As shown in fig. 2, the second-order Booth encoding algorithm generates 8 partial products, and the number of values to be added in each column varies from 2 to 8 during accumulation. Using only 4:2 compressors would waste resources, so a mixed array combining 3:2 compressors and 4:2 compressors is used in the present invention. When the number of values to be added is 8, the compressor configuration is as shown in fig. 8, and when the number is 6 to 7, the compressor configuration is as shown in fig. 9. When the number is smaller, the compressor unit needs only one 4:2 compressor or only one 3:2 compressor.
On the basis of the above embodiments, the carry skip adder employed in this embodiment includes 4 8-bit modules, where the 8-bit module includes two 4-bit CSA modules, and the CSA module includes 4 one-bit full adders and a 2-input data selector. Preferably, four of the one-bit full adders are cascaded, and if the input of the previous full adder is not inverted, the input of the current full adder is inverted.
Specifically, the carry skip adder may be divided into a number of small blocks, each block being a 4-bit modified adder structure, and the blocks being connected to form a 32-bit adder. The adder adopts modules with the same size, so that the complexity, the non-modularity and the high energy consumption caused by different sizes in low voltage are avoided. Connecting 4 1-bit full adders in series can obtain a 4-bit CSA module.
Each one-bit full adder in the 4-bit CSA module generates a transmission carry, and the four transmission carries are subjected to AND operation, the obtained output is used as the control end of the 2-input data selector, the carry input of the lowest-bit full adder and the carry output of the highest-bit full adder in the 4-bit CSA module are used as the input end of the 2-input data selector, and the output end of the 2-input data selector is used as the carry output of the 4-bit CSA module.
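A behavioural sketch of the 4-bit CSA module and of the 32-bit adder assembled from eight such modules is given below. It models only the logic function: the alternating inversion of full-adder inputs and the output buffering are not modelled, and the names `csa_block` and `carry_skip_add32` are hypothetical.

```python
def csa_block(a, b, cin):
    """4-bit carry-skip block: returns (4-bit sum, carry out)."""
    carry, total, skip = cin, 0, 1
    for k in range(4):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        p = ak ^ bk                                  # carry-propagate signal of this full adder
        total |= (p ^ carry) << k                    # sum bit
        carry = (ak & bk) | (carry & (ak | bk))      # ripple carry to the next full adder
        skip &= p                                    # AND of the four propagate signals
    cout = cin if skip else carry                    # 2-input data selector
    return total, cout

def carry_skip_add32(a, b, cin=0):
    """32-bit adder built from eight 4-bit blocks (four 8-bit modules of two blocks each)."""
    result, carry = 0, cin
    for blk in range(8):
        nibble_a = (a >> (4 * blk)) & 0xF
        nibble_b = (b >> (4 * blk)) & 0xF
        s, carry = csa_block(nibble_a, nibble_b, carry)
        result |= s << (4 * blk)
    return result, carry

print(carry_skip_add32(0xFFFFFFFF, 1))               # all blocks above the first propagate
```

When all four propagate signals are 1, the ripple carry out of the block equals its carry input, so selecting the carry input directly changes only the delay, not the result.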
It should be noted that the 1-bit full adder used in this embodiment additionally provides a carry-propagate output P, and its logic outputs are:
$$P = A \oplus B$$
$$S = P \oplus C_i$$
The logic circuit that generates the outputs P and S is shown in fig. 10.
Specifically, the exclusive-OR gate uses pass-transistor logic to generate P and S. Note that all exclusive-OR gate outputs are buffered, because it is important to maintain a high switching current ratio in the subthreshold region. Without buffering, $V_{OH}$ of the output S would be less than $0.9\,V_{DD}$ and the drive current would be small, resulting in a slow circuit. The circuit schematic of the exclusive-OR gate is shown in fig. 11.
Further, the 4-bit CSA module is formed by cascading 4 one-bit full adders FA. Inverting the inverted carry output $\overline{C_o}$ of the previous stage again would reduce the speed of the whole circuit. To avoid this, in each FA stage whose carry input is $\overline{C_i}$, the inputs A and B are also inverted, so that the P value is unchanged and only the S value is inverted.
In this module, $P^* = P_3 P_2 P_1 P_0$. If this value is 1, the carry is skipped. To avoid the low switching current ratio associated with high fan-in at sub-threshold voltages, a combination of NAND gates and NOR gates is used instead of a single four-input AND gate, as shown in fig. 12.
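One way to form the four-input AND from two-input gates is sketched below. The exact gate arrangement of fig. 12 is not reproduced here; a NOR of two NANDs is assumed as one possible combination, and the names `nand2`, `nor2` and `skip_signal` are hypothetical.

```python
from itertools import product

def nand2(x, y):
    """Two-input NAND gate on 0/1 values."""
    return 1 - (x & y)

def nor2(x, y):
    """Two-input NOR gate on 0/1 values."""
    return 1 - (x | y)

def skip_signal(p0, p1, p2, p3):
    """P* = P3 P2 P1 P0 built from two NAND gates feeding one NOR gate."""
    return nor2(nand2(p0, p1), nand2(p2, p3))

for p in product((0, 1), repeat=4):
    assert skip_signal(*p) == (p[0] & p[1] & p[2] & p[3])
```

No gate in this arrangement has more than two inputs, which matches the stated aim of avoiding high fan-in.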
The carry output of each 4-bit CSA module is obtained from a 2-input MUX, as shown in FIG. 13. The inverter serves as an output buffer and has two functions. First, it strengthens the carry output signal. Second, because the carry input C0 can sometimes be passed directly to the carry output, without any intermediate buffering C0 could pass straight through all eight 4-bit blocks; the inverter avoids this. Without buffering, the drive current would be small, which would greatly weaken the final output signal and reduce the overall speed of the adder. Because the carry output is inverted again, part of the inputs of the next-stage 4-bit CSA module also need to be inverted. Two such 4-bit CSA modules constitute one 8-bit module, and the final 32-bit CSA is composed of four 8-bit modules connected in series.
In particular, FIG. 14 shows a circuit schematic of a first 4-bit CSA module in an 8-bit module. Fig. 15 is a circuit schematic of an 8-bit module. Similarly, the final 32-bit CSA circuit schematic is shown in fig. 16.
In summary, the low-power-consumption parallel multiplier provided by the present invention comprises a partial product generation module, a partial product compression module and a carry skip adder. The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. The partial product generation module reduces the number of partial products by half, which saves multiplier circuit area, improves the operation speed of the multiplier circuit and eases VLSI implementation. The partial product compression module comprises a one-bit full adder and a summing circuit, wherein the one-bit full adder outputs an inverted value of a carry according to the partial products, and the summing circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module; this increases the speed of partial product compression. The carry skip adder comprises a plurality of CSA modules, each CSA module comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module uses a one-bit full adder, the circuit structure remains simple and power consumption is low while fast calculation is achieved.
Compared with a multiplier of conventional structure, the multiplier of the invention reduces the complexity of the circuit implementation and the difficulty of the layout implementation, improves the running speed of the multiplier and reduces the power consumption.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.