CN117149129B - Special large integer multiplication microcontroller

Info

Publication number: CN117149129B
Application number: CN202311427949.5A
Authority: CN
Inventors: 朱柯嘉; 何捷; 胡伟
Original assignee: Common Mode Semiconductor Technology Suzhou Co ltd
Current assignee: Common Mode Semiconductor Technology Suzhou Co ltd
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-01-26
Anticipated expiration: 2043-10-31
Also published as: CN117149129A

Abstract

The invention provides a dedicated large integer multiplication microcontroller comprising M×N multiplication modules arranged in M rows and N columns, the first to Nth columns being arranged in sequence from right to left and the first to Mth rows in sequence from top to bottom. The controller performs the multiplication in M rounds, one per row, and within each row performs N 256×256-bit unit multiplication operations in column order. The invention decomposes large integer multiplication into unit module multiplications, each of which is implemented in hardware. A microcontroller and an instruction set are also defined to control the operation, so that a variable-length large-number multiplier is realized at the firmware level. This preserves flexibility across different application scenarios while avoiding the resource consumption and security vulnerabilities of a software scheme.

Description

Special large integer multiplication microcontroller

Technical Field

The present invention relates to the field of information security, and more particularly to a dedicated large integer multiplication microcontroller.

Background

With the development of information technology, information security is increasingly important. A security chip protects information at the physical layer and offers very high reliability and security. Among the algorithms used in an information security chip, large integer multiplication is one of the most common operations. For example, the well-known RSA algorithm is based on large integer modular multiplication. Efficient and reliable large integer multiplication is therefore of great importance to the design of information security systems.

Currently, large integer multiplication is implemented by either software schemes or hardware schemes. A software scheme is flexible and can run directly on the host processor. A hardware scheme offers better security and higher speed and is generally used in dedicated systems.

However, a software scheme requires a system with a powerful processor, which is unsuitable for embedded applications and certain low-cost applications, and a software scheme is also vulnerable to attack. A hardware scheme is efficient, fast and secure, but it can only handle data of a specific bit width and lacks flexibility. If a relatively short width such as 256 bits is chosen, high-strength application scenarios cannot be served. If a long width such as 4096 bits is chosen, resources sit idle in ordinary applications. Purely hardware products therefore struggle to support different application scenarios on the same platform, which often leads to repeated development and wasted resources.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing software and hardware schemes for large integer multiplication and provides a dedicated large integer multiplication microcontroller. Large integer multiplication is decomposed into unit module multiplications, each implemented in circuit hardware, and the large integer product is assembled from combinations of these unit multiplications under the control of a microcontroller and a microcode algorithm written for its instruction set. A variable-length large integer multiplier is thereby realized, preserving flexibility across different application scenarios while avoiding the resource consumption and security vulnerabilities of a pure software scheme.

The technical scheme of the invention is as follows:

A dedicated large integer multiplication microcontroller comprises M×N multiplication modules arranged in M rows and N columns, the first to Nth columns being arranged in sequence from right to left and the first to Mth rows in sequence from top to bottom;

the controller performs the multiplication in M rounds, one per row; within a row, N 256×256-bit unit multiplications are performed in column order, generating the 256-bit final product and the partial product of that row;

when the operation of a row is finished, the 256-bit final product is stored as the lower part of that row's result, and the 256×N-bit partial product serves as the initial accumulation value for the first multiplication module of the next row;

the 256-bit final products of the first to Mth rows are arranged from low to high to form the low-order 256×M bits of the final product, and the 256×N-bit partial product of the Mth row forms the high-order 256×N bits of the final product.

Further, the intra-row operation includes the steps of:

in each row, multiplication modules 1 to N are arranged in sequence from right to left; multiplication module 1 performs one 256-bit shift-accumulate operation to generate a low 256-bit final product 1 and a high 256-bit temporary product 1; the final product of multiplication module 1 is stored as the lowest 256 bits of the row's multiplication, and temporary product 1 is used as the initial shift-accumulate value of multiplication module 2;

multiplication module 2 performs one 256-bit shift-accumulate operation to generate a low 256-bit partial product 1 and a high 256-bit temporary product 2; the 256-bit partial product 1 of multiplication module 2 is stored, and temporary product 2 is used as the initial shift-accumulate value of multiplication module 3;

the operation of multiplication module 2 is repeated in sequence for each subsequent module up to multiplication module N-1;

multiplication module N performs one 256-bit shift-accumulate operation to generate a low 256-bit partial product N-1 and a high 256-bit partial product N, and both are stored;

following these steps, each row generates a 256-bit final product and N groups of 256-bit partial products through N unit multiplication operations; the high (N-1) groups of 256 bits of the previous row's partial product are then superimposed (added) to obtain the final N groups of 256-bit partial products of the row; for the first row, the previous-row partial product is initialized to all zeros.
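The intra-row steps above can be modelled in software. The following is a minimal sketch, not the patented circuit: the names `row_operation`, `to_blocks` and `from_blocks` are hypothetical, Python's built-in integer multiplication stands in for one hardware 256×256-bit unit multiplication, and the operand split into `col_blocks` (N blocks) and a single `row_block` is an assumption about which operand feeds the rows.

```python
BLOCK = 256
MASK = (1 << BLOCK) - 1

def to_blocks(value, count):
    """Split an integer into `count` 256-bit blocks, lowest block first."""
    return [(value >> (BLOCK * i)) & MASK for i in range(count)]

def from_blocks(blocks):
    """Join 256-bit blocks (lowest first) back into one integer."""
    return sum(blk << (BLOCK * i) for i, blk in enumerate(blocks))

def row_operation(col_blocks, row_block, prev_partial):
    """Model one row: N unit multiplications plus the previous-row overlay.

    Returns (final_product, partial_blocks), where final_product is the
    row's low 256 bits and partial_blocks holds N blocks, lowest first.
    """
    n = len(col_blocks)
    acc = prev_partial[0]          # module 1 starts from the low 256 bits of the previous row
    outputs = []
    for blk in col_blocks:         # modules 1..N, i.e. right to left
        t = blk * row_block + acc  # stand-in for one 256x256 unit multiplication
        outputs.append(t & MASK)   # low 256 bits are stored
        acc = t >> BLOCK           # high 256 bits seed the next module
    outputs.append(acc)            # module N also stores its high 256 bits
    final_product = outputs[0]
    # Superimpose the high (N-1) blocks of the previous row's partial product.
    partial = from_blocks(outputs[1:]) + from_blocks(prev_partial[1:])
    return final_product, to_blocks(partial, n)
```

In this model the row's full value is `final_product + (partial << 256)`, which equals the column operand times the row block plus the previous partial product; the sum always fits in N+1 blocks, so the overlay cannot overflow.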

Further, the microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller performs an M-bit by N-bit multiplication A × B, an (M+N)-bit result C is obtained.

Further, the accumulator ACC is configured to store a result generated by each 256-bit shift accumulation operation.

Further, the shift register M1 holds the multiplier at the beginning of the operation; during the operation, each cycle shifts out the lowest bit of the multiplier and shifts a product bit in at the highest position; after 256 operation cycles the original multiplier has been completely shifted out, the lower 256 bits of the product have been shifted into the shift register M1, and the upper 256 bits are held in the accumulator ACC;

the multiplicand register M2 stores the multiplicand and remains unchanged during the operation;

the multiplicand controller is controlled by the bit output from the shift register M1: if the output bit is 1, the multiplicand is passed to the adder, and if it is 0, all zeros are passed to the adder, thereby implementing the multiply-by-1 or multiply-by-0 step of binary multiplication.
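The register behaviour described above corresponds to a classic shift-and-add multiplier. The sketch below is a software model under stated assumptions: the function name `unit_multiply` is hypothetical, the 256-bit addition is done with one Python integer addition rather than the patent's segmented adder, and the initial ACC value is treated as a 256-bit addend to the product.

```python
BLOCK = 256
MASK = (1 << BLOCK) - 1

def unit_multiply(multiplier, multiplicand, acc=0):
    """Model one 256-cycle shift-accumulate operation.

    Returns (acc, m1): the high and low 256 bits of
    multiplier * multiplicand + initial acc.
    """
    m1 = multiplier & MASK      # shift register M1 holds the multiplier
    m2 = multiplicand & MASK    # multiplicand register M2 stays unchanged
    for _ in range(BLOCK):
        # Multiplicand controller: pass M2 or all zeros to the adder.
        addend = m2 if (m1 & 1) else 0
        total = acc + addend    # may be 257 bits wide before the shift
        # Shift {total, M1} right by one: the multiplier's lowest bit leaves,
        # the lowest bit of the sum enters M1 at the highest position.
        m1 = (m1 >> 1) | ((total & 1) << (BLOCK - 1))
        acc = total >> 1
    return acc, m1
```

For example, `unit_multiply(3, 5)` leaves 15 in M1 and 0 in ACC. Seeding `acc` with the high half of a previous unit multiplication is how the temporary product is chained from one column to the next in this model.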

Further, the addition module comprises eight 32-bit unit adders, each comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, a carry-lookahead chain mode and a cascade mode, and the carry selector selects the carry input for the mode in use.

Further, the addition module specifically executes the following operation steps:

carry-lookahead step: the control module drives the addition module to work in the carry-lookahead chain mode; the eight 32-bit unit adders operate in parallel to produce eight 32-bit results and eight carry bits; the carry of each stage is saved and added in the next carry-lookahead step, so that one bit step of the 256-bit multiplication is completed in 1 cycle;

the carry-lookahead step is performed 256 times;

cascade step: the control module drives the addition module to work in the cascade mode with the addend set to 0; all carries still saved after the 256 accumulations are resolved by one further addition, producing a 256-bit output that is stored in the shift register M1.

Further, the 32-bit full adder inputs two 32-bit data and a 1-bit low-order carry, and generates a 32-bit sum output and a 1-bit high-order carry.

Further, the 1-bit carry device has a carry storage function; in the carry-lookahead chain mode, the carry device saves the carry of the current cycle and adds it to the lowest bit of the adder of the same stage during the shift-and-add of the next calculation cycle; in the cascade mode, the carry device saves the carry of the current cycle and feeds it to the carry input of the next-stage adder in the next calculation cycle.
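The idea of holding carries over between accumulations and resolving them once at the end can be illustrated with a simplified model. This sketch is an assumption-heavy illustration, not the patented datapath: it omits the per-cycle shift, it routes each saved carry into the next lane's carry input (the text above describes the carry being added within the same stage after the shift), and the class name `DeferredCarryAdder` and its `overflow` counter are hypothetical.

```python
LANES = 8
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1

def split_lanes(value):
    """Split a 256-bit value into eight 32-bit lanes, lowest lane first."""
    return [(value >> (LANE_BITS * k)) & LANE_MASK for k in range(LANES)]

class DeferredCarryAdder:
    """Eight 32-bit adders with one saved carry bit each (simplified model)."""

    def __init__(self):
        self.lanes = [0] * LANES
        self.carry = [0] * LANES   # carry-out saved by each lane's carry device
        self.overflow = 0          # counts carries out of the top lane

    def accumulate(self, addend):
        """Parallel step: every lane adds independently, carries are saved."""
        x = split_lanes(addend)
        new_carry = [0] * LANES
        for k in range(LANES):
            carry_in = self.carry[k - 1] if k > 0 else 0  # saved last cycle
            s = self.lanes[k] + x[k] + carry_in
            self.lanes[k] = s & LANE_MASK
            new_carry[k] = s >> LANE_BITS                 # always 0 or 1
        self.overflow += self.carry[LANES - 1]
        self.carry = new_carry

    def resolve(self):
        """Cascade step: addend 0, carries ripple lane by lane."""
        ripple = 0
        for k in range(LANES):
            pending = self.carry[k - 1] if k > 0 else 0
            s = self.lanes[k] + pending + ripple
            self.lanes[k] = s & LANE_MASK
            ripple = s >> LANE_BITS
        self.overflow += self.carry[LANES - 1] + ripple
        self.carry = [0] * LANES
        low = sum(v << (LANE_BITS * k) for k, v in enumerate(self.lanes))
        return low, self.overflow
```

The parallel step never waits for a carry to cross a lane boundary, so its delay is that of a 32-bit adder; only the final `resolve` pass walks through all eight lanes in sequence.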

A data processing method of the dedicated large integer multiplication microcontroller performs the following operations:

S1, starting the multiplication operation;

S2, setting the current row number to 1 and the column number to 1;

S3, loading data A into the multiplicand register M2;

S4, loading data B into the shift register M1;

S5, identifying the column number;

when column number = 1, the low 256 bits of the previous row's partial product are loaded into the accumulator ACC (for the first row, the previous-row partial product is initialized to all zeros); a unit multiplication operation is performed to obtain the 256-bit final product of the current row, which is held in the shift register M1; the column number is incremented by 1 and the flow returns to S5;

when N > column number > 1, a unit multiplication operation is performed to obtain the 256-bit temporary product and the 256-bit partial product of the current column; the 256-bit temporary product is held in the accumulator ACC as input to the multiplication of the next column, and the 256-bit partial product is held in the shift register M1; the column number is incremented by 1 and the flow returns to S5;

when column number = N, a unit multiplication operation is performed to obtain two groups of 256-bit outputs of the current column, which are stored as two groups of 256-bit partial products of the current column in the shift register M1; the flow then proceeds to S6;

S6, identifying the row number;

when row number ≠ M, the current row operation is completed: the N groups of 256-bit partial products are superimposed with the high (N-1) groups of 256 bits of the previous row's partial product to obtain the final N groups of 256-bit partial products of the current row; the row number is incremented by 1, the column number is reset to 1, and the flow returns to S5;

when row number = M, the N groups of 256-bit partial products are superimposed with the high (N-1) groups of 256 bits of the previous row's partial product to obtain the final N groups of 256-bit partial products of the current row;

at this point all row operations are complete; the N groups of 256-bit partial products of the Mth row are copied to the RH area as the high-order part of the final product, the 256-bit final products of the first to Mth rows are concatenated and copied to the RL area as the low-order part of the final product, and the high-order and low-order parts are concatenated to give the operation result.
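The S1 to S6 flow can be traced with a short software model. This is a sketch under assumptions, not firmware for the device: `multiply_flow`, `split` and `join` are hypothetical names, operand `a` is assumed to supply one 256-bit block per row and operand `b` one block per column, and Python's `*` stands in for each hardware unit multiplication.

```python
BLOCK = 256
MASK = (1 << BLOCK) - 1

def split(value, count):
    """Split an integer into `count` 256-bit blocks, lowest first."""
    return [(value >> (BLOCK * i)) & MASK for i in range(count)]

def join(blocks):
    """Concatenate 256-bit blocks (lowest first) into one integer."""
    return sum(blk << (BLOCK * i) for i, blk in enumerate(blocks))

def multiply_flow(a, b, m, n):
    """Multiply a (256*m bits) by b (256*n bits) with row/column counters."""
    row_blocks = split(a, m)
    col_blocks = split(b, n)
    prev = [0] * n        # previous-row partial product, all zeros for row 1
    rl = []               # RL area: one 256-bit final product per row
    row = 1               # S2: row number = 1
    while True:
        col, cur = 1, []  # column number = 1
        acc = prev[0]     # S5, column 1: load low 256 bits of previous row
        while True:
            t = col_blocks[col - 1] * row_blocks[row - 1] + acc
            low, acc = t & MASK, t >> BLOCK
            if col == 1:
                rl.append(low)     # 256-bit final product of this row
            else:
                cur.append(low)    # 256-bit partial product
            if col == n:
                cur.append(acc)    # last column keeps both halves
                break              # proceed to S6
            col += 1
        # S6: superimpose the high (N-1) blocks of the previous row.
        prev = split(join(cur) + join(prev[1:]), n)
        if row == m:
            break
        row += 1
    rh = prev             # RH area: partial product of row M
    return join(rl) | (join(rh) << (BLOCK * m))
```

With `m=4, n=4` this performs 16 unit multiplications for a 1024×1024-bit product; changing only `m` and `n` changes the operand widths, which is the flexibility the method is aiming for.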

The invention has the beneficial effects that:

The invention decomposes large integer multiplication into unit module multiplications, each of which is implemented in hardware. A microcontroller and an instruction set are also defined to control the operation, so that a variable-length large-number multiplier is realized at the firmware level. This preserves flexibility across different application scenarios while avoiding the resource consumption and security vulnerabilities of a software scheme.

Additional features and advantages of the invention will be set forth in the detailed description which follows.

Drawings

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the invention.

FIG. 1 shows a schematic diagram of four rows and four columns of 16 multiplication modules operation according to one embodiment of the invention.

FIG. 2 illustrates a schematic of an intra-row operation according to one embodiment of the invention.

FIG. 3 shows a schematic diagram of superimposing a previous row partial product in an intra-row operation according to one embodiment of the invention.

FIG. 4 illustrates an operational flow diagram according to one embodiment of the invention.

FIG. 5 shows a unit multiplication data path schematic according to one embodiment of the invention.

FIG. 6 shows a schematic diagram of a unit multiplication microcontroller according to one embodiment of the invention.

FIG. 7 illustrates a storage allocation schematic according to one embodiment of the present invention.

Detailed Description

Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.

Example 1:

The invention provides a dedicated large integer multiplication microcontroller comprising M×N multiplication modules arranged in M rows and N columns, the first to Nth columns being arranged in sequence from right to left and the first to Mth rows in sequence from top to bottom;

In this embodiment, the dedicated large integer multiplication microcontroller can perform (256×M)×(256×N)-bit large integer multiplication; the number of block operations required by the algorithm is M×N. For example, when M=4 and N=4, a 1024×1024-bit large-number multiplication is completed through 16 block operations; FIG. 1 shows a schematic diagram of the four rows and four columns of 16 multiplication modules.

When M=4 and N=8, a 1024×2048-bit large integer multiplication is completed through 32 block operations. Large-number multiplications of different lengths can be achieved by selecting the values of M and N.
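The sizing arithmetic in this embodiment can be checked with a small helper. This is only an illustrative sketch; the function name `plan` and the dictionary keys are hypothetical, and the result width follows from the statement elsewhere in the description that an M-bit by N-bit multiplication yields an (M+N)-bit result.

```python
def plan(m, n, block_bits=256):
    """Return operand widths, block-operation count and result width."""
    return {
        "operand_bits": (block_bits * m, block_bits * n),
        "block_ops": m * n,                  # one unit multiplication per module
        "result_bits": block_bits * (m + n),
    }

# The two configurations named in the text.
print(plan(4, 4))
print(plan(4, 8))
```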

The intra-row operation of this embodiment is shown schematically in FIG. 2 and FIG. 3.

FIG. 3 shows a schematic diagram of superimposing a previous row partial product in an intra-row operation according to one embodiment of the invention; the method comprises the following steps:

Example 2:

FIG. 5 shows a schematic diagram of the unit multiplication data path. The microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller performs an M-bit by N-bit multiplication A × B, an (M+N)-bit result C is obtained;

the accumulator ACC stores the result generated by each 256-bit shift-accumulate operation.

The shift register M1 holds the multiplier at the beginning of the operation; during the operation, each cycle shifts out the lowest bit of the multiplier and shifts a product bit in at the highest position; after 256 operation cycles the original multiplier has been completely shifted out, the lower 256 bits of the product have been shifted into the shift register M1, and the upper 256 bits are held in the accumulator ACC;

The addition module performs the 256-bit addition and comprises eight 32-bit unit adders, each comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, a carry-lookahead chain mode and a cascade mode, and the carry selector selects the carry input for the mode in use;

in this embodiment, the addition module specifically executes the following steps:

the carry-lookahead step is performed 256 times;

Wherein: the 32-bit full adder inputs two 32-bit data and a 1-bit low-order carry, and generates a 32-bit sum output and a 1-bit high-order carry; the 1-bit carry device has a carry storage function; in the carry-lookahead chain mode, the carry device saves the carry of the current period and adds the carry to the lowest bit of the adder of the stage when the next calculation period is shifted and added; in the cascade mode, the carry device saves the current period carry and inputs the current period carry to the carry input end of the adder of the next stage in the next calculation period.

In this embodiment, in the carry-lookahead chain mode the 8 adders operate in parallel and the carry of each stage is held over to the next accumulation, so that one bit step of the 256-bit multiplication completes in 1 cycle. Because multiplication can be regarded as a sequence of shifted additions, deferring the current carry to the next accumulation does not affect the result. After 256 accumulations the adder switches to the cascade mode and all saved carries are resolved by one further accumulation; in this mode a 256-bit addition takes 8 cycles. During a multiplication only about 1/256 of the accumulation steps run in the cascade mode and the rest run in the carry-lookahead chain mode, so short 32-bit adders can implement the 256-bit addition, greatly saving hardware resources.

Example 3:

In the present invention, as shown in FIG. 4, the data processing of the microcontroller performs the following steps:

S1, starting the multiplication operation;

S2, setting the current row number to 1 and the column number to 1;

S3, loading data A into the multiplicand register M2;

S4, loading data B into the shift register M1;

S5, identifying the column number;

S6, identifying the row number;

Example 4:

The whole multiplication operation is implemented with a microcontroller architecture comprising a program counter, an instruction memory, an instruction decoder and instruction state machines.

When a program is executed, an instruction is fetched from the instruction memory and decoded by the instruction decoder, and the corresponding instruction state machine is started. When the state machine finishes, the program counter is incremented and the next instruction is read and executed, except in the case of a jump instruction, which transfers control to the specified address. The functions of the modules in the architecture are as follows:

The program counter (instruction counter) counts and addresses program addresses. It has two modes: in normal operation, after each instruction is executed the counter increments by one and the next instruction is fetched from the new address; when a jump instruction is executed, the counter value is overwritten with the destination provided by the jump instruction, and the program performs the jump.

The instruction memory stores the microcode. When the chip is used standalone, the instruction memory must be a non-volatile memory (such as flash memory, read-only memory or electrically erasable programmable read-only memory). After system reset and power-on, the microcontroller loads instruction code from the instruction memory and executes it. If the chip is integrated as part of a system on a chip, the instruction memory may be random access memory; after system reset and power-on, the host is responsible for loading the microcode into the random access memory and notifying the microcontroller to execute it.

The instruction decoder decodes instructions according to the defined instruction set and starts the state machine corresponding to the decoded instruction.

The instruction state machines: each instruction has its own state machine, which is responsible for carrying out that instruction.

The state machines control the storage module to load and store data, and control the data path module to perform reset, addition, multiplication, shift and jump operations.

The memory is a random access memory. Memory access instructions read and write it to fetch the multiplier and multiplicand and to store intermediate results and the final product.

The addition, multiplication and shift operations are completed under the control of the instruction state machines.
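The fetch, decode and execute cycle described in this example can be sketched as a tiny interpreter. Everything concrete here is hypothetical: the source does not disclose the instruction set, so the opcodes `LOAD`, `STORE`, `CLR`, `MUL`, `JMP` and `HALT`, the handler functions, and the memory layout are invented purely for illustration, with one handler per instruction standing in for one state machine per instruction.

```python
BLOCK = 256
MASK = (1 << BLOCK) - 1

def op_load(regs, mem, reg, addr):   # memory -> register
    regs[reg] = mem[addr]

def op_store(regs, mem, reg, addr):  # register -> memory
    mem[addr] = regs[reg]

def op_clr(regs, mem, reg):          # reset a register
    regs[reg] = 0

def op_mul(regs, mem):               # one unit multiplication: M1*M2 + ACC
    t = regs["M1"] * regs["M2"] + regs["ACC"]
    regs["M1"], regs["ACC"] = t & MASK, t >> BLOCK

def op_jmp(regs, mem, target):       # jump: return the new counter value
    return target

# Instruction decoder: opcode -> handler ("state machine").
DECODER = {"LOAD": op_load, "STORE": op_store, "CLR": op_clr,
           "MUL": op_mul, "JMP": op_jmp}

def run(program, memory):
    """Execute `program` until HALT and return the (mutated) memory."""
    regs = {"ACC": 0, "M1": 0, "M2": 0}
    pc = 0                                   # program counter
    while program[pc][0] != "HALT":
        opcode, *operands = program[pc]      # fetch
        target = DECODER[opcode](regs, memory, *operands)  # decode + execute
        pc = pc + 1 if target is None else target          # increment or jump
    return memory

PROGRAM = [
    ("LOAD", "M2", 0),    # multiplicand from memory[0]
    ("LOAD", "M1", 1),    # multiplier from memory[1]
    ("CLR", "ACC"),
    ("MUL",),
    ("STORE", "M1", 2),   # low 256 bits of the product
    ("STORE", "ACC", 3),  # high 256 bits of the product
    ("JMP", 8),           # skip the next instruction
    ("STORE", "M2", 2),   # never executed
    ("HALT",),
]
```

Running `run(PROGRAM, [x, y, 0, 0])` leaves the low and high halves of `x * y` in memory slots 2 and 3. A variable-length multiplier would replace this straight-line program with loops over rows and columns, which is where the jump instruction becomes essential.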

The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.

Claims

1. A dedicated large integer multiplication microcontroller, characterized by comprising M×N multiplication modules arranged in M rows and N columns, the first to Nth columns being arranged in sequence from right to left and the first to Mth rows in sequence from top to bottom;

M rounds of row operations are performed in sequence; the 256-bit final products of the first to Mth rows are arranged from low to high to form the low-order 256×M bits of the final product, and the 256×N-bit partial product of the Mth row forms the high-order 256×N bits of the final product;

the microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller performs an M-bit by N-bit multiplication A × B, an (M+N)-bit result C is obtained;

the addition module comprises eight 32-bit unit adders, each comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, a carry-lookahead chain mode and a cascade mode, and the carry selector selects the carry input for the mode in use;

the addition module specifically executes the following operation steps:

the carry-lookahead step is performed 256 times;

2. The special purpose large integer multiplication microcontroller of claim 1 wherein the intra-row operation comprises the steps of:

3. The special purpose large integer multiplication microcontroller of claim 1 wherein the accumulator ACC is configured to hold a result of each 256-bit shift accumulation operation.

4. The special purpose large integer multiplication microcontroller of claim 1 wherein:

5. The special purpose large integer multiplication micro-controller of claim 1, wherein said 32 bit full adder inputs two 32 bit data and a 1 bit low order carry, producing a 32 bit sum output and a 1 bit high order carry.

6. The special purpose large integer multiplication microcontroller of claim 1, wherein said 1-bit carry device has a carry storage function; in the carry-lookahead chain mode, the carry device saves the carry of the current cycle and adds it to the lowest bit of the adder of the same stage during the shift-and-add of the next calculation cycle; in the cascade mode, the carry device saves the carry of the current cycle and feeds it to the carry input of the next-stage adder in the next calculation cycle.

7. A data processing method of the special purpose large integer multiplication microcontroller according to any one of claims 1 to 6, characterized in that the following operations are performed:

S1, starting the multiplication operation;

S2, setting the current row number to 1 and the column number to 1;

S3, loading data A into the multiplicand register M2;

S4, loading data B into the shift register M1;

S5, identifying the column number;

S6, identifying the row number;