CN117149129B - Special large integer multiplication microcontroller - Google Patents

Special large integer multiplication microcontroller

Info

Publication number
CN117149129B
Authority
CN
China
Prior art keywords
bit
multiplication
carry
row
bits
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311427949.5A
Other languages
Chinese (zh)
Other versions
CN117149129A (en
Inventor
朱柯嘉
何捷
胡伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Common Mode Semiconductor Technology Suzhou Co ltd
Original Assignee
Common Mode Semiconductor Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Common Mode Semiconductor Technology Suzhou Co ltd
Priority to CN202311427949.5A
Publication of CN117149129A
Application granted
Publication of CN117149129B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483: Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487: Multiplying; Dividing
    • G06F7/4876: Multiplying
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Nonlinear Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a dedicated large integer multiplication microcontroller comprising M×N multiplication modules arranged in M rows and N columns, with the first to N-th columns ordered from right to left and the first to M-th rows ordered from top to bottom. The controller performs the multiplication in M rounds, one per row, and within each row performs N unit multiplications of 256×256 bits in column order. The invention decomposes large integer multiplication into unit-module multiplications, each implemented in hardware. A microcontroller and an instruction set are defined at the same time, so that a variable-length large-number multiplier is realized at the firmware level under program control. This preserves flexibility across application scenarios while avoiding the resource consumption and security vulnerabilities of a software scheme.

Description

Special large integer multiplication microcontroller
Technical Field
The present invention relates to the field of information security, and more particularly to a dedicated large integer multiplication microcontroller.
Background
With the development of information technology, information security has become increasingly important. A security chip protects information at the physical layer and offers very high reliability and security. Among the algorithms implemented in an information security chip, large integer multiplication is one of the most commonly used operations. The well-known RSA algorithm, for example, is based on large integer modular multiplication. Efficient and reliable computation of large integer multiplication is therefore of great importance to the design of information security systems.
Currently, large integer multiplication is implemented either in software or in hardware. A software scheme is flexible and can run directly on the host processor. A hardware scheme offers better security and higher speed and is generally used in dedicated systems.
However, a software scheme requires a system with a powerful processor, which is unsuitable for embedded applications and certain low-cost applications, and it is vulnerable to attack and cracking. A hardware scheme is efficient, fast and secure, but it handles only data of a fixed bit width and lacks flexibility. If a relatively short width such as 256 bits is chosen, high-density application scenarios cannot be served. If a long width such as 4096 bits is chosen, resources sit idle in general applications. Pure hardware products therefore struggle to support different application scenarios on the same platform, which often leads to repeated development and wasted resources.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing software and hardware schemes for large integer multiplication by providing a dedicated large integer multiplication microcontroller. Large integer multiplication is decomposed into unit-module multiplications, each implemented in circuit hardware, and the large integer product is assembled from these unit multiplications. The assembly is carried out by a microcontroller running a microcode algorithm built on its instruction set, so that a variable-length large integer multiplier is realized. This preserves flexibility across application scenarios while avoiding the resource consumption and security vulnerabilities of a pure software scheme.
The technical scheme of the invention is as follows:
a dedicated large integer multiplication microcontroller comprises M×N multiplication modules arranged in M rows and N columns, the first to N-th columns being ordered from right to left and the first to M-th rows from top to bottom;
the controller performs the multiplication in M rounds, one per row, and within each row performs N unit multiplications of 256×256 bits in column order, generating the 256-bit final product and the partial product of that row;
when the operation of a row is finished, the 256-bit final product is stored as the lower part of that row's result, and the 256×N-bit partial product serves as the initial accumulation value for the next row, beginning at its first multiplication module;
the 256-bit final products of the first to M-th rows are arranged from low to high to form the low 256×M bits of the final product, and the 256×N-bit partial product of the M-th row forms the high 256×N bits of the final product.
Further, the intra-row operation includes the steps of:
in each row, multiplication modules 1 to N are arranged in sequence from right to left; multiplication module 1 performs one 256-bit shift accumulation operation to generate the low 256-bit final product 1 and the high 256-bit temporary product 1; the final product of multiplication module 1 is stored as the lowest 256 bits of the row's multiplication, and temporary product 1 is used as the initial shift-accumulation value of multiplication module 2;
multiplication module 2 performs one 256-bit shift accumulation operation to generate the low 256-bit partial product 1 and the high 256-bit temporary product 2; partial product 1 is stored, and temporary product 2 is used as the initial shift-accumulation value of multiplication module 3;
the subsequent multiplication modules, up to multiplication module N-1, are executed in sequence in the same manner as multiplication module 2;
multiplication module N performs one 256-bit shift accumulation operation to generate the low 256-bit partial product N-1 and the high 256-bit partial product N, both of which are stored;
following these steps, each row generates a 256-bit final product and N groups of 256-bit partial products through N unit multiplications; the high (N-1) groups of 256-bit data of the previous row's partial product are then superimposed to obtain the final N groups of 256-bit partial products of the row; for the first row, the previous row's partial product is initialized to all zeros.
Further, the microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller executes an M×N-bit multiplication A×B, an (M+N)-bit result C is obtained.
Further, the accumulator ACC is configured to store a result generated by each 256-bit shift accumulation operation.
Further, the shift register M1 holds the multiplier at the start of the operation; in each operation cycle the lowest bit of the multiplier is shifted out and one bit of the product is shifted in at the highest position; after 256 operation cycles the original multiplier has been shifted out completely, the low 256 bits of the product are held in the shift register M1, and the high 256 bits are held in the accumulator ACC;
the multiplicand register M2 stores the multiplicand, which remains unchanged during the operation;
the multiplicand controller is controlled by the output of the shift register M1: if the output is 1, the multiplicand is passed to the adder, and if the output is 0, all zeros are passed to the adder, which implements the multiply-by-1 or multiply-by-0 step of binary multiplication.
Further, the addition module comprises eight 32-bit unit adders, each comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, a carry-lookahead chain mode and a cascade mode, and the carry selector selects the carry input for the current mode.
Further, the addition module specifically executes the following operation steps:
carry-lookahead step: the control module drives the addition module in the carry-lookahead chain mode; the eight 32-bit unit adders operate in parallel to produce eight 32-bit results and eight carry bits; the carry of each stage is retained and added in the next carry-lookahead step, so that one bit of a 256-bit multiplication is completed in one cycle;
the carry-lookahead step is executed 256 times;
cascade step: the control module drives the addition module in the cascade mode with the addend set to 0; all carries saved after the 256 accumulations are resolved by one further addition, producing a 256-bit output that is stored in the shift register M1.
Further, the 32-bit full adder inputs two 32-bit data and a 1-bit low-order carry, and generates a 32-bit sum output and a 1-bit high-order carry.
Further, the 1-bit carry device stores a carry; in the carry-lookahead chain mode, it saves the carry of the current cycle and adds it to the lowest bit of the same stage's adder during the shift-and-add of the next calculation cycle; in the cascade mode, it saves the carry of the current cycle and feeds it to the carry input of the next stage's adder in the next calculation cycle.
A data processing method for the dedicated large integer multiplication microcontroller performs the following operations:
S1, start the multiplication operation;
S2, set the current row number to 1 and the column number to 1;
S3, load data A into the multiplicand register M2;
S4, load data B into the shift register M1;
S5, identify the column number;
when column number = 1, load the low 256 bits of the previous row's partial product into the accumulator ACC (for the first row, the previous row's partial product is initialized to all zeros); perform the unit multiplication to obtain the 256-bit final product of the current row, which is held in the shift register M1; add 1 to the column number and return to S5;
when N > column number > 1, perform the unit multiplication to obtain the 256-bit temporary product and the 256-bit partial product of the current column; the temporary product is stored in the accumulator ACC as input to the multiplication of the next column, and the partial product is stored in the shift register M1; add 1 to the column number and return to S5;
when column number = N, perform the unit multiplication to obtain two groups of 256-bit outputs of the current column, which are stored, as two groups of 256-bit partial products of the current row, in the shift register M1; go to S6;
S6, identify the row number;
when the row number is not equal to M, complete the current row operation by superimposing the high (N-1) groups of 256 bits of the previous row's partial product on the N groups of 256-bit partial products, which gives the final N groups of 256-bit partial products of the current row; add 1 to the row number, reset the column number to 1, and return to S5;
when row number = M, superimpose the high (N-1) groups of 256 bits of the previous row's partial product on the N groups of 256-bit partial products to obtain the final N groups of 256-bit partial products of the current row;
at this point all row operations are complete; the N groups of 256-bit partial products of the M-th row are copied to the RH area as the high-order part of the final product, the 256-bit final products of rows 1 to M are concatenated and copied to the RL area as the low-order part, and the high-order and low-order parts are concatenated as the operation result.
The invention has the beneficial effects that:
The invention decomposes large integer multiplication into unit-module multiplications, each implemented in hardware. A microcontroller and an instruction set are defined at the same time, so that a variable-length large-number multiplier is realized at the firmware level under program control; this preserves flexibility across application scenarios while avoiding the resource consumption and security vulnerabilities of a software scheme.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the invention.
FIG. 1 shows a schematic diagram of the operation of 16 multiplication modules arranged in four rows and four columns according to one embodiment of the invention.
FIG. 2 illustrates a schematic of an intra-row operation according to one embodiment of the invention.
FIG. 3 shows a schematic diagram of superimposing a previous row partial product in an intra-row operation according to one embodiment of the invention.
FIG. 4 illustrates an operational flow diagram according to one embodiment of the invention.
FIG. 5 shows a unit multiplication data path schematic according to one embodiment of the invention.
FIG. 6 shows a schematic diagram of a unit multiplication microcontroller according to one embodiment of the invention.
FIG. 7 illustrates a storage allocation schematic according to one embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
Example 1:
The invention provides a dedicated large integer multiplication microcontroller comprising M×N multiplication modules arranged in M rows and N columns, the first to N-th columns being ordered from right to left and the first to M-th rows from top to bottom;
the controller performs the multiplication in M rounds, one per row, and within each row performs N unit multiplications of 256×256 bits in column order, generating the 256-bit final product and the partial product of that row;
when the operation of a row is finished, the 256-bit final product is stored as the lower part of that row's result, and the 256×N-bit partial product serves as the initial accumulation value for the next row, beginning at its first multiplication module;
the 256-bit final products of the first to M-th rows are arranged from low to high to form the low 256×M bits of the final product, and the 256×N-bit partial product of the M-th row forms the high 256×N bits of the final product.
In this embodiment, the dedicated large integer multiplication microcontroller can perform (256×M)×(256×N)-bit large integer multiplication; the number of block operations required by the algorithm is M×N. For example, when M=4 and N=4, a 1024×1024-bit large-number multiplication is completed in 16 block operations; FIG. 1 shows a schematic diagram of the 16 multiplication modules in four rows and four columns.
When M=4 and N=8, a 1024×2048-bit large integer multiplication is completed in 32 block operations. Large-number multiplications of different lengths are obtained by choosing the values of M and N.
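As a rough illustration of the sizing rule above, the following Python sketch splits an operand into 256-bit blocks and counts the unit multiplications for a given M and N. The helper names `split_blocks` and `block_ops` are hypothetical and are not taken from the patent.

```python
WORD_BITS = 256  # operand width of one unit multiplication, per the text


def split_blocks(value: int, count: int) -> list[int]:
    """Split value into `count` 256-bit blocks, least significant block first."""
    mask = (1 << WORD_BITS) - 1
    return [(value >> (WORD_BITS * i)) & mask for i in range(count)]


def block_ops(m_rows: int, n_cols: int) -> int:
    """Number of 256x256 unit multiplications for M rows and N columns."""
    return m_rows * n_cols


# M = 4, N = 4 covers 1024 x 1024 bits; M = 4, N = 8 covers 1024 x 2048 bits.
```
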
The intra-row operation of this embodiment is illustrated schematically in FIG. 2 and FIG. 3.
FIG. 3 shows a schematic diagram of superimposing the previous row's partial product in an intra-row operation according to one embodiment of the invention; the intra-row operation comprises the following steps:
in each row, multiplication modules 1 to N are arranged in sequence from right to left; multiplication module 1 performs one 256-bit shift accumulation operation to generate the low 256-bit final product 1 and the high 256-bit temporary product 1; the final product of multiplication module 1 is stored as the lowest 256 bits of the row's multiplication, and temporary product 1 is used as the initial shift-accumulation value of multiplication module 2;
multiplication module 2 performs one 256-bit shift accumulation operation to generate the low 256-bit partial product 1 and the high 256-bit temporary product 2; partial product 1 is stored, and temporary product 2 is used as the initial shift-accumulation value of multiplication module 3;
the subsequent multiplication modules, up to multiplication module N-1, are executed in sequence in the same manner as multiplication module 2;
multiplication module N performs one 256-bit shift accumulation operation to generate the low 256-bit partial product N-1 and the high 256-bit partial product N, both of which are stored;
following these steps, each row generates a 256-bit final product and N groups of 256-bit partial products through N unit multiplications; the high (N-1) groups of 256-bit data of the previous row's partial product are then superimposed to obtain the final N groups of 256-bit partial products of the row; for the first row, the previous row's partial product is initialized to all zeros.
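The intra-row steps can be sketched in Python roughly as follows. This is a software model under stated assumptions, not the hardware itself: each unit multiplication is replaced by Python integer arithmetic, and the function name `row_operation` and its argument layout are hypothetical.

```python
WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def row_operation(a_blocks, b_word, prev_partial):
    """One row: N unit multiplications, then superposition of the previous row.

    a_blocks     -- N 256-bit blocks of the multiplicand, least significant first
    b_word       -- the 256-bit multiplier block handled by this row
    prev_partial -- N 256-bit groups left by the previous row (zeros for row 1)
    Returns (final_product, partial), where partial holds N 256-bit groups.
    """
    n = len(a_blocks)
    acc = prev_partial[0]              # module 1 starts from the low 256 bits
    outputs = []
    for a_j in a_blocks:               # modules 1..N, right to left
        t = a_j * b_word + acc         # unit multiply plus initial value
        outputs.append(t & MASK)       # low 256 bits are stored
        acc = t >> WORD_BITS           # high 256 bits seed the next module
    outputs.append(acc)                # module N also stores its high half
    final_product, partial = outputs[0], outputs[1:]
    carry = 0
    for k in range(n - 1):             # add the high (N-1) groups of the previous row
        s = partial[k] + prev_partial[k + 1] + carry
        partial[k] = s & MASK
        carry = s >> WORD_BITS
    partial[n - 1] += carry            # cannot overflow 256 bits for valid inputs
    return final_product, partial
```

For the first row, `prev_partial` would be a list of N zeros; the returned `partial` is then passed as `prev_partial` to the next row.
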
Example 2:
FIG. 5 shows a schematic diagram of the unit multiplication data path; the microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller executes an M×N-bit multiplication A×B, an (M+N)-bit result C is obtained;
the accumulator ACC stores the result generated by each 256-bit shift accumulation operation.
The shift register M1 holds the multiplier at the start of the operation; in each operation cycle the lowest bit of the multiplier is shifted out and one bit of the product is shifted in at the highest position; after 256 operation cycles the original multiplier has been shifted out completely, the low 256 bits of the product are held in the shift register M1, and the high 256 bits are held in the accumulator ACC;
the multiplicand register M2 stores the multiplicand, which remains unchanged during the operation;
the multiplicand controller is controlled by the output of the shift register M1: if the output is 1, the multiplicand is passed to the adder, and if the output is 0, all zeros are passed to the adder, which implements the multiply-by-1 or multiply-by-0 step of binary multiplication.
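The interplay of ACC, M1, M2 and the multiplicand controller can be modelled with a conventional shift-and-add loop. The sketch below is an assumption-laden software model: the function name `unit_multiply` is hypothetical, the adder is an ordinary Python addition rather than the segmented adder, and the handling of the 257th sum bit (shifted back into ACC) is an assumption the text does not spell out.

```python
WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def unit_multiply(multiplicand, multiplier, acc_init=0):
    """Shift-and-add model of one 256x256 unit multiplication.

    Returns (acc, m1): the high and low 256 bits of
    multiplicand * multiplier + acc_init.
    """
    m2 = multiplicand                  # multiplicand register, never modified
    m1 = multiplier                    # shift register, initially the multiplier
    acc = acc_init                     # accumulator, optionally preloaded
    for _ in range(WORD_BITS):
        addend = m2 if (m1 & 1) else 0     # multiplicand controller: x1 or x0
        total = acc + addend               # may be 257 bits wide
        # Shift {total, m1} right by one bit: the multiplier bit just used
        # leaves M1 and the lowest bit of the sum enters M1 at the top.
        m1 = (m1 >> 1) | ((total & 1) << (WORD_BITS - 1))
        acc = total >> 1
    return acc, m1
```

After the 256 cycles, `acc` corresponds to the high 256 bits and `m1` to the low 256 bits of the result, matching the register roles described above.
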
The addition module performs 256-bit addition and comprises eight 32-bit unit adders, each comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, a carry-lookahead chain mode and a cascade mode, and the carry selector selects the carry input for the current mode;
in this embodiment, the addition module specifically executes the following steps:
carry-lookahead step: the control module drives the addition module in the carry-lookahead chain mode; the eight 32-bit unit adders operate in parallel to produce eight 32-bit results and eight carry bits; the carry of each stage is retained and added in the next carry-lookahead step, so that one bit of a 256-bit multiplication is completed in one cycle;
the carry-lookahead step is executed 256 times;
cascade step: the control module drives the addition module in the cascade mode with the addend set to 0; all carries saved after the 256 accumulations are resolved by one further addition, producing a 256-bit output that is stored in the shift register M1.
Wherein: the 32-bit full adder takes two 32-bit inputs and a 1-bit low-order carry, and generates a 32-bit sum and a 1-bit high-order carry; the 1-bit carry device stores a carry; in the carry-lookahead chain mode, it saves the carry of the current cycle and adds it to the lowest bit of the same stage's adder during the shift-and-add of the next calculation cycle; in the cascade mode, it saves the carry of the current cycle and feeds it to the carry input of the next stage's adder in the next calculation cycle.
In this embodiment, in the carry-lookahead chain mode the eight adders operate in parallel, and each retains its carry for the next accumulation, so that one bit step of a 256-bit multiplication is completed in one cycle. Because multiplication can be regarded as a sequence of shifted additions, deferring the current carry to the next accumulation does not affect the result. After 256 accumulations the adder switches to the cascade mode, and all saved carries are accumulated and resolved. In this mode a 256-bit addition takes 8 cycles. During a multiplication only 1/256 of the cycles run in the cascade mode, and the rest run in the carry-lookahead chain mode, so 256-bit addition can be realized with short 32-bit adders, which greatly saves hardware resources.
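The claim that deferring carries does not change the result can be checked with a small Python model. It is a simplified sketch, not the patent's exact wiring: it omits the one-bit shift of the multiplier loop, routes each saved carry into the next higher 32-bit lane on the following cycle, and works modulo 2**256. All function names are hypothetical.

```python
LANES = 8
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def to_lanes(value):
    """Split a 256-bit value into eight 32-bit lanes, lowest lane first."""
    return [(value >> (LANE_BITS * k)) & LANE_MASK for k in range(LANES)]


def from_lanes(lanes):
    """Reassemble eight 32-bit lanes into one integer."""
    return sum(x << (LANE_BITS * k) for k, x in enumerate(lanes))


def lane_add(lanes, carries, addend):
    """One cycle: all lanes add in parallel, each using the carry that its
    lower neighbour saved in the previous cycle."""
    new_lanes, new_carries = [], []
    for k in range(LANES):
        carry_in = carries[k - 1] if k > 0 else 0
        total = lanes[k] + addend[k] + carry_in
        new_lanes.append(total & LANE_MASK)
        new_carries.append(total >> LANE_BITS)   # saved for the next cycle
    return new_lanes, new_carries


def accumulate(values):
    """Sum values modulo 2**256 with deferred carries, then flush them."""
    lanes, carries = [0] * LANES, [0] * LANES
    for v in values:
        lanes, carries = lane_add(lanes, carries, to_lanes(v))
    for _ in range(LANES):                       # flush cycles with addend 0
        lanes, carries = lane_add(lanes, carries, [0] * LANES)
    return from_lanes(lanes)
```

The eight flush cycles with a zero addend play the role of the cascade step: after them no saved carry remains, so the lanes alone hold the sum.
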
Example 3:
In the present invention, as shown in FIG. 4, the data processing of the microcontroller performs the following operation steps:
S1, start the multiplication operation;
S2, set the current row number to 1 and the column number to 1;
S3, load data A into the multiplicand register M2;
S4, load data B into the shift register M1;
S5, identify the column number;
when column number = 1, load the low 256 bits of the previous row's partial product into the accumulator ACC (for the first row, the previous row's partial product is initialized to all zeros); perform the unit multiplication to obtain the 256-bit final product of the current row, which is held in the shift register M1; add 1 to the column number and return to S5;
when N > column number > 1, perform the unit multiplication to obtain the 256-bit temporary product and the 256-bit partial product of the current column; the temporary product is stored in the accumulator ACC as input to the multiplication of the next column, and the partial product is stored in the shift register M1; add 1 to the column number and return to S5;
when column number = N, perform the unit multiplication to obtain two groups of 256-bit outputs of the current column, which are stored, as two groups of 256-bit partial products of the current row, in the shift register M1; go to S6;
S6, identify the row number;
when the row number is not equal to M, complete the current row operation by superimposing the high (N-1) groups of 256 bits of the previous row's partial product on the N groups of 256-bit partial products, which gives the final N groups of 256-bit partial products of the current row; add 1 to the row number, reset the column number to 1, and return to S5;
when row number = M, superimpose the high (N-1) groups of 256 bits of the previous row's partial product on the N groups of 256-bit partial products to obtain the final N groups of 256-bit partial products of the current row;
at this point all row operations are complete; the N groups of 256-bit partial products of the M-th row are copied to the RH area as the high-order part of the final product, the 256-bit final products of rows 1 to M are concatenated and copied to the RL area as the low-order part, and the high-order and low-order parts are concatenated as the operation result.
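The S1 to S6 flow can be traced with a short Python model driven by explicit row and column counters. This is a sketch under assumptions: `multiply_flow` is a hypothetical name, operand blocks are re-selected for every unit multiplication (the text only says A and B are loaded into M2 and M1), the unit multiplication is plain integer arithmetic, and the RL and RH areas are modelled as Python integers.

```python
WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def multiply_flow(a, b, m_rows, n_cols):
    """Multiply a (N blocks of 256 bits) by b (M blocks of 256 bits)."""
    rl = 0                         # RL area: low-order part of the product
    partial = 0                    # previous-row partial product, zero for row 1
    row, col = 1, 1                # S2
    acc, row_high = 0, 0
    while True:
        a_blk = (a >> (WORD_BITS * (col - 1))) & MASK   # S3 (assumed per block)
        b_blk = (b >> (WORD_BITS * (row - 1))) & MASK   # S4 (assumed per block)
        if col == 1:                                    # S5, column 1
            acc = partial & MASK   # preload low 256 bits of the previous row
            row_high = 0
        t = a_blk * b_blk + acc                         # unit multiplication
        low, acc = t & MASK, t >> WORD_BITS
        if col == 1:
            rl |= low << (WORD_BITS * (row - 1))        # final product of the row
        else:
            row_high |= low << (WORD_BITS * (col - 2))  # partial product group
        if col < n_cols:
            col += 1
            continue                                    # back to S5
        row_high |= acc << (WORD_BITS * (n_cols - 1))   # column N keeps both halves
        partial = row_high + (partial >> WORD_BITS)     # S6: superimpose previous row
        if row == m_rows:
            break
        row, col = row + 1, 1
    rh = partial                   # RH area: high-order part of the product
    return (rh << (WORD_BITS * m_rows)) | rl
```

With M=4 and N=8 the loop body runs 32 times, once per block operation, and the return value is the concatenation of the RH and RL areas.
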
Example 4:
The whole multiplication operation is implemented with a microcontroller architecture comprising a program counter, an instruction memory, an instruction decoder and instruction state machines.
When a program is executed, an instruction is fetched from the instruction memory and decoded by the instruction decoder, and the corresponding instruction state machine is started. When the state machine finishes, the program counter is incremented and the next instruction is read and executed, except for a jump instruction, which jumps to the designated address. The functions of the modules in the architecture are as follows:
The program counter counts and addresses program locations. It has two modes: in normal operation, each time an instruction is executed the program counter is incremented by one and the next instruction is fetched from the new address; when a jump instruction is executed, the value of the program counter is overwritten with the destination value supplied by the jump instruction, and the program jumps.
The instruction memory stores the microcode. When the chip is used alone, the instruction memory must be non-volatile (such as flash memory, read-only memory or electrically erasable programmable read-only memory); after the system is reset and powered on, the microcontroller loads instruction codes from the instruction memory and executes them. If the chip is integrated as part of a system on a chip, the instruction memory may be random access memory; after the system is reset and powered on, the host is responsible for loading the microcode into the random access memory and notifying the microcontroller to execute it.
The instruction decoder decodes instructions according to the defined instruction set and starts the state machine corresponding to each instruction according to the decoding result.
Each instruction has its own instruction state machine, which is responsible for the specific execution of that instruction.
The state machine controls the storage module to load and store data, and controls the data path module to perform reset, add, multiply, shift and jump operations.
The memory is a random access memory. Memory access instructions read and write the memory to fetch the multiplier and the multiplicand and to store intermediate results and the final product.
The addition, multiplication and shift operations are completed under the control of the instruction state machine.
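The fetch, decode and execute cycle described above can be illustrated with a toy interpreter. The text does not list the instruction set, so the opcodes `LOAD`, `STORE`, `MUL`, `JMP` and `HALT`, the register names as dictionary keys, and the function `run` are all hypothetical; each handler function stands in for one instruction state machine.

```python
WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def run(program, memory):
    """Fetch, decode and execute until HALT; returns the data memory."""
    regs = {"ACC": 0, "M1": 0, "M2": 0}
    pc = 0                                   # program counter

    def op_load(reg, addr):                  # data memory -> register
        regs[reg] = memory[addr]

    def op_store(reg, addr):                 # register -> data memory
        memory[addr] = regs[reg]

    def op_mul():                            # unit multiply: {ACC, M1} = M2 * M1 + ACC
        t = regs["M2"] * regs["M1"] + regs["ACC"]
        regs["M1"], regs["ACC"] = t & MASK, t >> WORD_BITS

    handlers = {"LOAD": op_load, "STORE": op_store, "MUL": op_mul}
    while True:
        opcode, *operands = program[pc]      # fetch from instruction memory
        if opcode == "HALT":
            return memory
        if opcode == "JMP":                  # a jump overwrites the program counter
            pc = operands[0]
            continue
        handlers[opcode](*operands)          # decode and start the handler
        pc += 1                              # every other instruction increments it
```

A program such as `[("LOAD", "M2", 0), ("LOAD", "M1", 1), ("MUL",), ("STORE", "M1", 2), ("STORE", "ACC", 3), ("HALT",)]` would multiply the values at memory addresses 0 and 1 and leave the low and high halves of the product at addresses 2 and 3.
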
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.

Claims (7)

1. A dedicated large integer multiplication microcontroller, characterized by comprising M×N multiplication modules arranged in M rows and N columns, the first to N-th columns being ordered from right to left and the first to M-th rows from top to bottom;
the controller performs the multiplication in M rounds, one per row, and within each row performs N unit multiplications of 256×256 bits in column order, generating the 256-bit final product and the partial product of that row;
when the operation of a row is finished, the 256-bit final product is stored as the lower part of that row's result, and the 256×N-bit partial product serves as the initial accumulation value for the next row, beginning at its first multiplication module;
the M rounds of row operations are performed in sequence; the 256-bit final products of the first row to the M-th row are arranged from low to high to form the low-order 256×M bits of the final product, and the 256×N-bit partial product of the M-th row is taken as the high-order 256×N bits of the final product;
the microcontroller comprises an accumulator ACC, a shift register M1, a multiplicand register M2, an addition module, a multiplicand controller and a control module; when the microcontroller performs the M×N-bit multiplication A×B, an (M+N)-bit result C is obtained;
the addition module comprises eight 32-bit unit adders, each unit adder comprising a 32-bit full adder, a 1-bit carry device and a carry selector; the unit adder has two working modes, namely a carry-lookahead chain mode and a cascade mode, and the carry input for the corresponding mode is selected by the carry selector;
the addition module specifically executes the following operation steps:
carry-lookahead step: the control module drives the addition module to work in the carry-lookahead chain mode; the eight 32-bit unit adders operate in parallel to produce eight 32-bit results and eight carry bits; the carry bits of this step are saved and added in the next carry-lookahead step, so that one bit of a 256-bit multiplication is completed in one cycle;
the carry-lookahead step is performed 256 times;
cascade step: the control module drives the addition module to work in the cascade mode with the addend set to 0; all carries saved after the 256 accumulations are resolved by a single addition, generating a 256-bit output that is stored in the shift register M1.
2. The special purpose large integer multiplication microcontroller of claim 1 wherein the intra-row operation comprises the steps of:
in each row, multiplication modules 1 to N are arranged in sequence from right to left; multiplication module 1 performs one 256-bit shift accumulation operation to generate a low 256-bit final product 1 and a high 256-bit temporary product 1; the final product of multiplication module 1 is stored as the lowest 256 bits of the product, and temporary product 1 of multiplication module 1 is used as the initial value of the shift accumulation of multiplication module 2;
multiplication module 2 performs one 256-bit shift accumulation operation to generate a low 256-bit partial product 1 and a high temporary product 2; the 256-bit partial product 1 of multiplication module 2 is stored, and temporary product 2 of multiplication module 2 is used as the initial value of the shift accumulation of multiplication module 3;
the multiplication modules from multiplication module 2 to multiplication module N-1 are executed in sequence in the same manner;
multiplication module N performs one 256-bit shift accumulation operation to generate a low 256-bit partial product N-1 and a high 256-bit partial product N, and the 256-bit partial product N-1 and the 256-bit partial product N of multiplication module N are stored;
following these steps, each row generates a 256-bit final product and N groups of 256-bit partial products through N unit multiplications; the upper (N-1) groups of 256-bit data of the previous row's partial product are then superimposed on them to obtain the final N groups of 256-bit partial products of the row operation; for the first row, the initial value of the previous row's partial product is all zeros.
3. The special purpose large integer multiplication microcontroller of claim 1 wherein the accumulator ACC is configured to hold a result of each 256-bit shift accumulation operation.
4. The special purpose large integer multiplication microcontroller of claim 1 wherein:
the shift register M1 holds the multiplier at the beginning of the operation; during the operation, in each cycle the lowest bit of the multiplier is shifted out and one bit of the product is shifted in at the highest bit; after 256 operation cycles, the original multiplier has been completely shifted out, the lower 256 bits of the product have been shifted into the shift register M1, and the upper 256 bits are held in the accumulator ACC;
the multiplicand register M2 stores the multiplicand, which remains unchanged during the operation;
the multiplicand controller is controlled by the output of the shift register M1: if the output is 1, the multiplicand is output to the adder, and if the output is 0, all zeros are output to the adder, thereby implementing the multiply-by-1 or multiply-by-0 step of binary multiplication.
5. The special purpose large integer multiplication microcontroller of claim 1, wherein said 32-bit full adder takes two 32-bit operands and a 1-bit low-order carry as inputs, and produces a 32-bit sum output and a 1-bit high-order carry.
6. The special large integer multiplication microcontroller according to claim 1, wherein said 1-bit carry device has a carry save function; in the carry-lookahead chain mode, the carry device saves the carry of the current cycle and adds it to the lowest bit of the adder of this stage during the shift-and-add of the next calculation cycle; in the cascade mode, the carry device saves the carry of the current cycle and feeds it to the carry input of the adder of the next stage in the next calculation cycle.
7. A data processing method of the special large integer multiplication microcontroller according to any one of claims 1 to 6, characterized in that the following operations are performed:
S1, starting the multiplication operation;
S2, setting the current row number to 1 and the column number to 1;
S3, loading data A into the multiplicand register M2;
S4, loading data B into the shift register M1;
S5, identifying the column number;
when the column number = 1, loading the low 256 bits of the previous row's partial product into the accumulator ACC, the initial value of the previous row's partial product being all zeros for the first row; performing the unit multiplication to obtain the 256-bit final product of the current row, which is stored in the shift register M1; adding 1 to the column number and returning to S5;
when N > column number > 1, performing the unit multiplication to obtain a 256-bit temporary product and a 256-bit partial product of the current column, wherein the 256-bit temporary product is stored in the accumulator ACC as the input of the multiplication of the next column, and the 256-bit partial product is stored in the shift register M1; adding 1 to the column number and returning to S5;
when the column number = N, performing the unit multiplication to obtain two groups of 256-bit outputs of the current column, taking them as two groups of 256-bit partial products of the current column and storing them to the shift register M1, then proceeding to S6;
S6, identifying the row number;
when the row number is not equal to M, the current row operation is completed, and the N groups of 256-bit partial products are superimposed with the upper (N-1) groups of 256 bits of the previous row's partial product to obtain the final N groups of 256-bit partial products of the current row operation; the row number is incremented by 1, the column number is reset to 1, and the flow returns to S5;
when the row number = M, the N groups of 256-bit partial products are superimposed with the upper (N-1) groups of 256 bits of the previous row's partial product to obtain the final N groups of 256-bit partial products of the current row operation;
at this point all row operations are completed; the N groups of 256-bit partial products of the M-th row are copied to the RH area as the high-order part of the final product, the 256-bit final products of the M rows are concatenated and copied to the RL area as the low-order part of the final product, and the high-order and low-order parts are concatenated as the operation result.
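The row and column schedule of claims 1, 2 and 7 can be sketched in Python as follows. This is a minimal model under assumptions, not the claimed hardware: it assumes that A consists of N 256-bit words and B of M 256-bit words, it replaces the hardware unit multiplication by Python integer arithmetic in the hypothetical helper `unit_mac`, and the names `multiply`, `to_words` and `from_words` are likewise illustrative.

```python
W = 256
MASK = (1 << W) - 1


def unit_mac(a_word, b_word, acc):
    """Stand-in for one 256x256 unit multiplication with initial accumulator acc.

    Returns (high 256 bits, low 256 bits) of a_word * b_word + acc.
    """
    total = a_word * b_word + acc
    return total >> W, total & MASK


def to_words(x, count):
    """Split x into `count` 256-bit words, lowest word first."""
    return [(x >> (W * i)) & MASK for i in range(count)]


def from_words(words):
    """Inverse of to_words()."""
    return sum(w << (W * i) for i, w in enumerate(words))


def multiply(a, b, n_cols, m_rows):
    """Multiply an n_cols-word value a by an m_rows-word value b, row by row."""
    a_words = to_words(a, n_cols)
    b_words = to_words(b, m_rows)
    partial = 0                 # N-word partial product of the previous row (zeros at first)
    final_words = []            # one 256-bit final product per row (RL area)

    for r in range(m_rows):
        acc = partial & MASK    # column 1: ACC starts with the low word of the previous partial
        row_words = []
        for c in range(n_cols):
            # The high half becomes the temporary product for the next column.
            acc, low = unit_mac(a_words[c], b_words[r], acc)
            if c == 0:
                final_words.append(low)   # 256-bit final product of this row
            else:
                row_words.append(low)     # 256-bit partial product of this column
        row_words.append(acc)             # last column also stores its high half
        # Superimpose the upper N-1 words of the previous row's partial product.
        partial = from_words(row_words) + (partial >> W)
        assert partial < (1 << (W * n_cols))

    rl = from_words(final_words)          # low-order part of the product
    return (partial << (W * m_rows)) | rl # RH area concatenated above RL
```

For example, `multiply(a, b, n_cols=3, m_rows=2)` processes a 768-bit `a` and a 512-bit `b` in two rows of three unit multiplications each, and the returned integer can be compared with `a * b` to check the schedule.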
CN202311427949.5A 2023-10-31 2023-10-31 Special large integer multiplication microcontroller Active CN117149129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311427949.5A CN117149129B (en) 2023-10-31 2023-10-31 Special large integer multiplication microcontroller

Publications (2)

Publication Number Publication Date
CN117149129A CN117149129A (en) 2023-12-01
CN117149129B true CN117149129B (en) 2024-01-26

Family

ID=88910543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311427949.5A Active CN117149129B (en) 2023-10-31 2023-10-31 Special large integer multiplication microcontroller

Country Status (1)

Country Link
CN (1) CN117149129B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042639A (en) * 2005-12-30 2007-09-26 英特尔公司 Multiplier
CN101790718A (en) * 2007-08-10 2010-07-28 爱特梅尔公司 Method and system for large number multiplication
CN103942028A (en) * 2014-04-15 2014-07-23 中国科学院数据与通信保护研究教育中心 Large integer multiplication method and device applied to password technology
CN107797962A (en) * 2017-10-17 2018-03-13 清华大学 Computing array based on neutral net

Also Published As

Publication number Publication date
CN117149129A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
EP2202635B1 (en) System and method for a multi-schema branch predictor
US5583804A (en) Data processing using multiply-accumulate instructions
US6038652A (en) Exception reporting on function generation in an SIMD processor
US11681497B2 (en) Concurrent multi-bit adder
JPH10187438A (en) How to reduce transitions for multiplier inputs.
US20230084523A1 (en) Data Processing Method and Device, and Storage Medium
US20130151821A1 (en) Method and instruction set including register shifts and rotates for data processing
JPH06202850A (en) Data processor
US7013321B2 (en) Methods and apparatus for performing parallel integer multiply accumulate operations
US6560624B1 (en) Method of executing each of division and remainder instructions and data processing device using the method
CN117149129B (en) Special large integer multiplication microcontroller
US7590235B2 (en) Reduction calculations in elliptic curve cryptography
CN117762492A (en) Data processing method, device, computer equipment and readable storage medium
JPH08221257A (en) Divider for data processor
EP1785862A2 (en) Method and apparatus for pipeline processing
JP2000039995A (en) Flexible accumulate register file to be used in high performance microprocessor
JP2000081966A (en) Arithmetic unit
JP2001142695A (en) Loading a constant into a storage location, loading a constant into a destination storage location, loading a constant into a register, determining the number of sign bits, normalizing a binary number, and instructions in a computer system
US8001358B2 (en) Microprocessor and method of processing data including peak value candidate selecting part and peak value calculating part
JP3837386B2 (en) Information processing device
JPH09223009A (en) Device and method for processing data
KR100315303B1 (en) Digital signal processor
JP2006072961A (en) Memory circuit of arithmetic processing unit
KR20080052194A (en) Reconfigurable Processor Operation Method and Apparatus
JP3894135B2 (en) Information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant