[go: up one dir, main page]

CN115344236B - Polynomial multiplication method, polynomial multiplier, device and medium - Google Patents

Polynomial multiplication method, polynomial multiplier, device and medium Download PDF

Info

Publication number
CN115344236B
CN115344236B CN202211276167.1A CN202211276167A CN115344236B CN 115344236 B CN115344236 B CN 115344236B CN 202211276167 A CN202211276167 A CN 202211276167A CN 115344236 B CN115344236 B CN 115344236B
Authority
CN
China
Prior art keywords
polynomial
multiplier
multiplication
unit
modulo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211276167.1A
Other languages
Chinese (zh)
Other versions
CN115344236A (en
Inventor
朱敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Muchuang Integrated Circuit Design Co ltd
Original Assignee
Wuxi Muchuang Integrated Circuit Design Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Muchuang Integrated Circuit Design Co ltd filed Critical Wuxi Muchuang Integrated Circuit Design Co ltd
Priority to CN202211276167.1A priority Critical patent/CN115344236B/en
Publication of CN115344236A publication Critical patent/CN115344236A/en
Application granted granted Critical
Publication of CN115344236B publication Critical patent/CN115344236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/724Finite field arithmetic
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/728Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic using Montgomery reduction

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a polynomial multiplication operation method, a polynomial multiplier, equipment and a medium, which are applied to the technical field of computers and comprise the following steps: respectively carrying out merging NTT transformation on input polynomials a (x) and B (x) to obtain a polynomial A (x) with a coefficient a and a polynomial B (x) with a coefficient B, carrying out point multiplication operation on the coefficient a and the coefficient B to obtain a point multiplication operation result C (x), and carrying out merging INTT transformation on the C (x) to obtain C (x), wherein the operation positions of the coefficient a and the coefficient B are exchanged in the process of carrying out the modulo reduction operation of the merging INTT transformation on the C (x). The invention solves the technical problem that the storage capacity of the constant of the existing polynomial multiplication is too large, and can realize the technical effect of reducing half of the storage capacity and storage overhead of the constant.

Description

Polynomial multiplication method, polynomial multiplier, device, and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a polynomial multiplication method, a polynomial multiplier, a device, and a medium.
Background
Polynomial multiplication in a finite field is a main bottleneck of hardware implementation performance and efficiency of a lattice-based cryptographic scheme, and Number Theoretical Transformation (NTT) can effectively reduce the operation complexity of the process. The parameter diversity of NTT presents many practical challenges to a unified and efficient hardware implementation.
The prior art mainly has the following two defects: on one hand, the flexibility of circuit configuration is lost, the dynamic adjustment of algorithm parameters cannot be adapted, and only partial parameters or limited calculation modes can be supported. On the other hand, the utilization efficiency of hardware is low, and in order to support all algorithms, the calculation bit width of a data path needs to be compatible with the maximum value of the modulus q, so that the resource utilization rate of the hardware is too low when the public key encryption algorithm is executed, and the data storage capacity is too large.
Disclosure of Invention
The invention mainly aims to provide a polynomial multiplication method, a polynomial multiplier, equipment and a medium, which can solve the technical problem that the storage capacity of the existing polynomial multiplication constant is too large.
To achieve the above object, a first aspect of the embodiments of the present invention provides a polynomial multiplication method, including:
combining the pretreatment step with NTT transformation to obtain combined NTT transformation, and respectively carrying out the combined NTT transformation on input polynomials a (x) and B (x) to obtain polynomials A (x) and B (x), wherein the coefficient of the polynomial A (x) is a, and the coefficient of the polynomial B (x) is B;
performing a point multiplication operation on the coefficient a of the polynomial A (x) and the coefficient B of the polynomial B (x) to obtain a point multiplication operation result C (x);
combining the post-processing step with INTT transformation to obtain combined INTT transformation, and performing the combined INTT transformation on the point multiplication operation result C (x) to obtain C (x);
wherein, in the process of performing the modulo reduction operation of the merged INTT transform on the dot product operation result C (x), the operation positions of the coefficient a and the coefficient b are exchanged.
In an embodiment of the present invention, the dot product operation uses Montgomery algorithm.
In an embodiment of the present invention, a modulus of the dot product result C (x) is q, and the method further includes:
acquiring the bit width of the modulus q;
under the condition that the bit width of the modulus q is within a preset first range value, performing polynomial multiplication operation in a dual-channel mode;
under the condition that the bit width of the modulus q is within a preset second range value, executing polynomial multiplication operation in a single-channel mode;
wherein the first range of values and the second range of values do not coincide.
In an embodiment of the present invention, the first range value is equal to or less than gbits, the second range value is greater than gbits and less than 2 gbits, and G is a positive integer.
A second aspect of the embodiments of the present invention provides a polynomial multiplier for performing the polynomial multiplication method according to the first aspect, including a modulo operation unit, where the modulo operation unit includes a plurality of data selectors, a first modulo reduction unit, a modulo multiplication unit, a second modulo reduction unit, and a modulo addition unit;
the operation types which can be configured to be executed by the modular operation unit comprise CT-BFU, GS-BFU, modular addition operation, modular subtraction operation and modular multiplication operation;
wherein, in the case of the operation type GS-BFU that the modulo operation unit is configured to perform, the first modulo reduction unit reads the input coefficient a of polynomial a (x) and the input coefficient B of polynomial B (x) in reverse order, performing a modulo reduction operation.
In an embodiment of the present invention, the modular multiplication unit employs a Montgomery algorithm.
In an embodiment of the present invention, the data path modes of the modulo operation unit include a single path mode and a multi-path mode;
in the case that the bit width of the modulus q is within a preset first range value, the data path mode of the modulo operation unit is configured as a two-channel mode;
in case the bit width of the modulus q is within a preset second range of values, the data path mode of the modulo operation unit is configured as single channel mode.
In an embodiment of the present invention, the modular multiplication unit includes a first multiplier, a second multiplier, a third multiplier, a zero-judging unit, and a modular adder;
the first multiplier, the second multiplier and the third multiplier are cascaded, an output end of the first multiplier is connected with an input end of the zero judging unit and a first input end of the modulo adder, an output end of the zero judging unit is connected with a second input end of the modulo adder, and an output end of the third multiplier is connected with a third input end of the modulo adder.
In an embodiment of the present invention, the first multiplier, the second multiplier and the third multiplier each include: the system comprises a KA multiplier, a first data selector and a second data selector;
the first output end of the KA multiplier is connected with the second data input end of the first data selector and one data input end of the second data selector, and the second output end of the KA multiplier is connected with the first data input end of the first data selector and the second data input end of the second data selector.
A third aspect of an embodiment of the present invention provides an electronic device, including:
the present invention also provides a method of polynomial multiplication provided by the first aspect of embodiments of the present invention, wherein the polynomial multiplication is performed by a processor.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the polynomial multiplication method provided in the first aspect of the embodiments of the present invention.
As can be seen from the foregoing embodiments of the present invention, the polynomial multiplication method, the polynomial multiplier, the device, and the medium provided in the present invention combine the pre-processing step with the NTT transform to obtain a combined NTT transform, and respectively perform the combined NTT transform on the input polynomials a (x) and B (x) to obtain the polynomials a (x) and B (x), where the coefficient of the polynomial a (x) is a, the coefficient of the polynomial B (x) is B, perform a point multiplication operation on the coefficient a of the polynomial a (x) and the coefficient B of the polynomial B (x) to obtain a point multiplication result C (x), combine the post-processing step with the INTT transform to obtain a combined INTT transform, and perform the combined INTT transform on the point multiplication result C (x) to obtain C (x), where the operation positions of the coefficient a and the coefficient B are exchanged in a modulo reduction operation process of the combined t transform on the point multiplication result C (intx). Compared with the prior art that 2M constants need to be stored, the method and the device can only store M constants, and reduce the storage cost of the constants.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 schematically shows a flow chart of a method of classical polynomial multiplication.
FIG. 2 schematically shows a flow diagram of a method of polynomial multiplication according to an embodiment of the invention.
FIG. 3 schematically illustrates a data flow diagram using a method of polynomial multiplication in accordance with an embodiment of the present invention.
Fig. 4 schematically shows a data flow diagram of an improved GS-BFU according to an embodiment of the present invention.
Fig. 5 schematically shows a structural diagram of a polynomial multiplier according to an embodiment of the invention.
Fig. 6 schematically shows a structural diagram of a modulo operation unit according to an embodiment of the present invention.
Fig. 7 schematically shows a structural diagram of a modular multiplication unit according to an embodiment of the present invention.
Fig. 8 schematically shows a structural diagram of a first multiplier according to an embodiment of the present invention.
Fig. 9 schematically shows a block diagram of an electronic device adapted to implement a method of polynomial multiplication according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that this description is made only by way of example and not as a limitation on the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the technical scheme of the invention, the data acquisition, collection, storage, use, processing, transmission, provision, invention, application and other processing are all in accordance with the regulations of relevant laws and regulations, necessary security measures are taken, and the public order and good custom are not violated.
Fig. 1 schematically shows a flow chart of a method of classical polynomial multiplication.
As shown in fig. 1, the calculation flow of the classical polynomial multiplication method is as follows: the input polynomials a (x) and b (x) are multiplied to obtain the resultant polynomial c (x). Wherein a (x), b (x), c (x) are all
Figure 38098DEST_PATH_IMAGE001
Polynomials on rings, each element in a ring being represented in a form of a polynomial of order N, each coefficient of the polynomial being in the field
Figure 361763DEST_PATH_IMAGE002
And (4) inside.
The specific flow of the existing polynomial multiplication method sequentially comprises the following steps: (1) Pretreatment of polynomials
Figure 13936DEST_PATH_IMAGE003
Wherein
Figure 940304DEST_PATH_IMAGE004
Is a 2N-order primitive unit root of the domain; (2) NTT (Number therapeutic Transform) transformation; (3) multiplying the coefficients of the frequency domain correspondingly; (4) INTT (Inverse Number therapeutic Transform) Transform; (5) Polynomial post-processing
Figure 364463DEST_PATH_IMAGE005
. The algorithm flow needs to carry out pre-processing and post-processing on the coefficients of the polynomial, and has extra multiplication operation cost and storage cost of constant coefficients.
The embodiment of the invention provides a polynomial multiplication operation method, which can be used in the processes of key generation, encryption and decryption of a lattice cipher algorithm, is used for completing the operation acceleration of polynomial multiplication and unloading the performance bottleneck of a system, and comprises the following steps: combining the preprocessing step with NTT transformation to obtain combined NTT transformation, and respectively carrying out the combined NTT transformation on input polynomials a (x) and B (x) to obtain polynomials A (x) and B (x), wherein the coefficient of the polynomial A (x) is a, the coefficient of the polynomial B (x) is B, carrying out point multiplication operation on the coefficient a of the polynomial A (x) and the coefficient B of the polynomial B (x) to obtain a point multiplication operation result C (x), combining the post-processing step with INTT transformation to obtain combined INTT transformation, carrying out the combined INTT transformation on the point multiplication operation result C (x) to obtain C (x), wherein the operation positions of the coefficient a and the coefficient B are exchanged in the process of carrying out modular subtraction operation on the point multiplication operation result C (x) on the combined INTT transformation. Compared with the prior art that 2M constants need to be stored, the method and the device can only store M constants, and reduce the storage cost of the constants.
FIG. 2 schematically shows a flow diagram of a method of polynomial multiplication according to an embodiment of the invention.
As shown in FIG. 2, the polynomial multiplication method of the embodiment includes operations S210 to S230.
In operation S210, the preprocessing step is combined with the NTT transform to obtain a combined NTT transform, and the combined NTT transform is performed on the input polynomials a (x) and B (x), respectively, to obtain polynomials a (x) and B (x), where the coefficient of the polynomial a (x) is a and the coefficient of the polynomial B (x) is B.
In operation S220, a dot product operation is performed on the coefficients a of the polynomial a (x) and the coefficients B of the polynomial B (x) to obtain a dot product operation result C (x).
In operation S230, the post-processing step and the INTT transform are combined to obtain a combined INTT transform, and the combined INTT transform is performed on the dot product result C (x) to obtain C (x).
And exchanging the operation positions of the coefficient a and the coefficient b in the process of performing the modulo reduction operation of the merged INTT transform on the dot product operation result C (x).
The NTT algorithm implementation includes two algorithms, namely, cooley Tukey and Gentleman Sande, which respectively correspond to two structures, namely, CT-BFU (butterfly unit) in a time domain and GS-BFU in a frequency domain.
Referring to FIG. 3, FIG. 3 shows the polynomial dimension equal to 8 (the number of NTT stages is log) 2 8= 3), a data flow diagram using a polynomial multiplication method according to the present invention is given, in which the NTT transform uses CT-BFU and the INTT transform uses GS-BFU. As shown in fig. 3, the pre-processing of the polynomial coefficients is implicitly incorporated into the NTT transform and the post-processing into the INTT transform. For a polynomial coefficient with the dimension of M, 2M modular multiplication operations are reduced, and storage of M constants is reduced.
In the invention, the data flow diagram of the existing GS-BFU is improved, and the specific change is shown in FIG. 4, wherein the value of one output port is changed from (a-b) w to (b-a) w. This modification, which utilizes the semi-nature of the twiddle factor w, can be deduced from the elimination theorem: twiddle factors for NTT and INTT stages
Figure 452505DEST_PATH_IMAGE006
The corresponding sets are essentially the same number of sets.
Figure 1429DEST_PATH_IMAGE007
Formula (1)
Therefore, only any part of constants need to be stored, and the other part of constants pass through a finite field
Figure 528225DEST_PATH_IMAGE008
The inversion is achieved only by storing M constants, more inversion operation is achieved by exchanging the input sequence of the modular subtraction operation, namely the original input sequence is a-b, and the input sequence of the modular subtraction operation is b-a, so that extra computing resources are prevented from being introduced.
In an embodiment of the present invention, the point multiplication operation uses a Montgomery algorithm, which can reduce the number of critical paths of polynomial multiplication operation.
In an embodiment of the present invention, the modulus of the dot product result C (x) is q, and the method further includes: acquiring the bit width of the modulus q, executing polynomial multiplication operation by adopting a double-path mode under the condition that the bit width of the modulus q is within a preset first range value, and executing polynomial multiplication operation by adopting a single-path mode under the condition that the bit width of the modulus q is within a preset second range value, wherein the first range value is not overlapped with the second range value.
In an embodiment of the present invention, the first range value is equal to or less than gbits, the second range value is greater than gbits and less than 2 gbits, and G is a positive integer.
In one example, G =15, when the bit width of the modulus q is within a preset first range of values, i.e., q ≦ 2 15 (that is, the bit width of the modulus q does not exceed 15 bits), the method can be configured into a dual-path mode, a data path of 30 bits inside the method can be divided into an upper part and a lower part, a plurality of coefficients of a polynomial are processed or NTT conversion of two polynomials is performed in parallel, and when the bit width of the modulus q is within a preset second range value, that is, 2 bits 15 <q<2 30 (i.e., the bit width of the modulus q exceeds 15 bits), the data path inside the accelerator will be configured in single path mode to support the current NTT transform parameters.
Fig. 5 schematically shows a structural diagram of a polynomial multiplier according to an embodiment of the invention.
As shown in fig. 5, the polynomial multiplier includes a modulo operation unit, a coefficient storage unit, a constant storage unit, a read/write reordering unit, a control unit, and an address generation unit. The modular arithmetic unit is used for realizing operations such as CT-BFU, GS-BFU, modular addition, modular subtraction, modular multiplication and the like according to different configuration information and supporting two working modes of a single channel and a double channel. The coefficient storage unit is used for storing polynomial coefficients and intermediate results participating in polynomial multiplication. The constant storage unit is used for mainly storing parameters configuring functions, including pre-calculated constants and the like. The read/write sorting unit is used for corresponding two independent coefficients on each data bus in the dual-channel working mode provided by the invention, and the read/write rearrangement unit sends the corresponding coefficients to the input ports of the corresponding operation units or stores the corresponding coefficients back into the storage unit in a special hardware routing mode under the condition that 4 coefficients simultaneously participate in the operation in the calculation, thereby reducing the cost and the conflict of accessing the memory. The control unit is used for outputting control signals and is responsible for signal interaction with peripheral circuits. The address generating unit is used for realizing conflict-free access to the coefficient storage unit in cooperation with the distribution condition of the coefficients in storage, namely ensuring that two coefficients are stored on the same storage address at the same time in different ways.
The working flow of the polynomial multiplier of the embodiment of the present invention is described by taking the multiplication of the polynomial a (x) and the polynomial b (x) as an example:
s1, static configuration information loading configuration: receiving global configuration under the cooperation of the control unit, storing a pre-calculation constant of the twiddle factor into a constant storage unit, and completing static configuration of a calculation path according to NTT parameters;
s2, reading the coefficient a of the polynomial a (x) into a coefficient storage unit in the polynomial multiplier, executing NTT calculation after the control unit completes the configuration of the modular arithmetic unit, and organizing data into an access mode required by the modular arithmetic unit in the current or next round of conversion by the reading rearrangement unit and the writing rearrangement unit in each round of NTT conversion process;
s3, repeating S1-S2 to complete NTT transformation of the polynomial b (x);
s4, completing the one-to-one corresponding multiplication of the frequency domain coefficients of the polynomials a (x) and b (x) under the scheduling of the control unit, and storing an operation result C (x) into a coefficient storage unit;
s5, finishing INTT transformation under the condition that the coefficient of the operation result C (x) is configured to be GS-BFU by the modular operation unit to obtain C (x);
the coefficients of S6, c (x) are stored in the coefficient storage unit of the polynomial multiplier, and are prepared for subsequent continuous operation, and can also be sent out through the output port.
Fig. 6 schematically shows a structural diagram of a modulo operation unit according to an embodiment of the present invention.
As shown in fig. 6, the modulo operation unit of this embodiment includes a plurality of data selectors, a first modulo reduction unit, a modulo multiplication unit, a second modulo reduction unit, and a modulo addition unit, and the types of operation that the modulo operation unit can be configured to perform include CT-BFU, GS-BFU, modulo addition operation, modulo reduction operation, and modulo multiplication operation, wherein in the case of the type of operation GS-BFU that the modulo operation unit is configured to perform, the first modulo reduction unit reads the input coefficient a of polynomial a (x) and the coefficient B of polynomial B (x) in reverse order to exchange the positions of the coefficient a and the coefficient B to perform the modulo reduction operation. 2M modular multiplication operations can be reduced, and the number of storage constants is reduced from 2M to M.
In an embodiment of the present invention, the data path mode of the modulo operation unit includes a single path mode and a multi-path mode, where the data path mode of the modulo operation unit is the single path mode when the type of operation performed by the modulo operation unit is CT-BFU, and the data path mode of the modulo operation unit is the dual path mode when the type of operation performed by the modulo operation unit is GS-BFU.
In the present invention, the control signals include three control signals, which are respectively used to control the data selector in the control arithmetic unit to implement different functions, wherein the first two control signals configure the calculation function, and the last control signal configures the data path mode, as shown in table 1. When the calculation function control signal =00, the operation type executed by the modulo operation unit is configured to be CT-BFU; when the calculation function control signal =01, the type of the operation executed by the modulo operation unit is GS-BFU; when the calculation function control signal =10, the operation type executed by the modular operation unit is configured to be modular multiplication operation; when the calculation function control signal =01, the operation types configured to be executed by the modulo operation unit are modulo addition operation and modulo subtraction operation; when the path mode control signal =0, the data path mode configured to be executed by the modulo operation unit is a single path mode; when the lane mode control signal =1, the data lane mode that the modulo operation unit is configured to execute is a dual lane mode.
TABLE 1
Figure 462683DEST_PATH_IMAGE009
In an embodiment of the present invention, the modular multiplication unit employs a Montgomery algorithm.
Fig. 7 schematically shows a block diagram of a modular multiplication unit according to an embodiment of the present invention.
As shown in fig. 7, the modulo multiplication unit includes a first multiplier, a second multiplier, a third multiplier, a zero-judging unit, and a modulo adder. The first multiplier, the second multiplier and the third multiplier are cascaded, the output end of the first multiplier is connected with the input end of the zero judging unit and the first input end of the modular adder, the output end of the zero judging unit is connected with the second input end of the modular adder, and the output end of the third multiplier is connected with the third input end of the modular adder. The modular multiplication unit can reduce the critical path and realize the maximized hardware utilization rate under different parameter configurations.
The zero judging unit is used for outputting the carry data to be 1 when the input data is not equal to 0 and outputting the carry data to be 0 when the input data is equal to 0.
In an example, taking the input operands x and y with 30 bits as an example, the multiplication result z under modulo q is calculated for the two 30-bit operands x and y. Since the second multiplier is modulo the intermediate result, half of the data in the upper half is discarded, thus reducing the number of half-order multipliers by half, and the first and second multipliers are full-order multipliers. As can be seen from the figure, the critical path affecting the final result comes from three cascaded multipliers. In the Montgomery algorithm, the lower half of the final multiplication result is discarded by right shifting. Therefore, the low-bit information of the operation result can be obtained in advance through the pre-operation, namely, the shift truncation is performed first and then the operation is performed, instead of the accurate calculation and then the shift truncation, and the critical path is effectively reduced.
According to the invention, after a first multiplier, the low 30-bit data (namely Temp _ L and Temp _ H in the figure are high 30 bits) in the 60-bit result output by the first multiplier is judged, namely a zero judgment unit is adopted, and when Temp _ L =0, the carry of the low 30-bit data in the final modular adder is zero; when Temp _ L ≠ 0, the carry is 1. Therefore, the design can finally reduce the size of the modular adder by half, and the critical path of the whole modular multiplier can be reduced by 15%.
Fig. 8 schematically shows a block diagram of a first multiplier according to an embodiment of the invention.
As shown in fig. 8, the first multiplier, the second multiplier and the third multiplier in fig. 7 each include: a KA multiplier (Karatsuba multiplier), a first data selector and a second data selector, wherein the first output end of the KA multiplier is connected with the second data input end of the first data selector and one data input end of the second data selector, and the second output end of the KA multiplier is connected with the first data input end of the first data selector and the second data input end of the second data selector. The modular multiplication unit may support a dual-pass mode.
In the present invention, the Karatsuba multiplier is used to efficiently obtain the partial product and the full product of two operands to implement a two-pass configuration of a polynomial multiplier, the specific circuit implementation of which is shown in fig. 8. The main working principle is as follows: taking the multiplication of two 30-bit data, M [29 ] and N [29 ] 0, according to the Karatsuba multiplication principle, they can be expressed as:
Figure 737938DEST_PATH_IMAGE010
formula (2)
Wherein M is 1 A high of M15bit, N 1 A high of N15bit 0 A low of M15bit, N 0 Then it is a low 15bit of N. The product P of the two can be written as:
Figure 351322DEST_PATH_IMAGE011
formula (3)
If M in the above formula is to be 1 N 1 By P 1 Denotes that M 0 N 0 With P 0 Is represented by (M) 0 +M 1 )(N 0 +N 1 ) By P 01 If expressed, the above formula can be finally rewritten as:
Figure 107575DEST_PATH_IMAGE012
formula (4)
In fig. 8, the rightmost second data selector gates the correct result to the corresponding high-order output and low-order output according to the path configuration information, so as to support the dual-path mode.
According to the embodiment of the present invention, any of the coefficient storage unit, the constant storage unit, and the read/write rearrangement module may be combined into one module to be implemented, or any one of the modules may be divided into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to the embodiment of the present invention, at least one of the coefficient storage unit, the constant storage unit, and the read/write rearrangement module may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of or a suitable combination of software, hardware, and firmware. Alternatively, at least one of the coefficient storage unit, the constant storage unit, the read/write reordering module may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 9 schematically shows a block diagram of an electronic device adapted to implement a method of polynomial multiplication according to an embodiment of the invention.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present invention includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 902 and/or the RAM 903. Note that the program may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
Electronic device 900 may also include input/output (I/O) interface 905, input/output (I/O) interface 905 also connected to bus 904, according to an embodiment of the invention. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the invention also include a computer program product comprising a computer program comprising program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the item recommendation method provided by the embodiment of the invention.
Which when executed by the processor 901 performs the above-described functions defined in the system/apparatus of an embodiment of the present invention. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present invention. The above described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules according to embodiments of the present invention.
According to embodiments of the present invention, program code for executing a computer program provided by embodiments of the present invention may be written in any combination of one or more programming languages, and in particular, the computer program may be implemented using a high level procedural and/or object oriented programming language, and/or an assembly/machine language. The programming language includes, but is not limited to, programming languages such as Java, C + +, python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that features described in the various embodiments of the invention may be combined in various ways and/or combinations, even if such combinations or combinations are not explicitly described in the invention. In particular, various combinations and/or subcombinations of the features described in various embodiments of the invention may be made without departing from the spirit and teachings of the invention. All such combinations and/or associations fall within the scope of the present invention.
The embodiments of the present invention have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the invention, and these alternatives and modifications are intended to fall within the scope of the invention.

Claims (11)

1. A method of polynomial multiplication, comprising:
combining the pretreatment step with NTT transformation to obtain combined NTT transformation, and respectively carrying out the combined NTT transformation on input polynomials a (x) and B (x) to obtain polynomials A (x) and B (x), wherein the coefficient of the polynomial A (x) is a, and the coefficient of the polynomial B (x) is B;
performing a point multiplication operation on the coefficient a of the polynomial A (x) and the coefficient B of the polynomial B (x) to obtain a point multiplication operation result C (x);
combining the post-processing step with INTT transformation to obtain combined INTT transformation, and performing the combined INTT transformation on the point multiplication operation result C (x) to obtain C (x);
wherein, in the process of performing the modulo reduction operation of the merged INTT transform on the dot product operation result C (x), the operation positions of the coefficient a and the coefficient b are exchanged.
2. The polynomial multiplication method of claim 1 wherein the point multiplication uses Montgomery's algorithm.
3. The polynomial multiplication method of claim 1 wherein the modulus of the result of the dot product operation C (x) is q, the method further comprising:
acquiring the bit width of the modulus q;
under the condition that the bit width of the modulus q is within a preset first range value, performing polynomial multiplication operation in a dual-channel mode;
under the condition that the bit width of the modulus q is within a preset second range value, executing polynomial multiplication operation in a single-channel mode;
wherein the first range of values is not coincident with the second range of values.
4. The polynomial multiplication method of claim 3 wherein the first range of values is equal to or less than G bits, the second range of values is greater than G bits and less than 2G bits, and G is a positive integer.
5. A polynomial multiplier for performing the polynomial multiplication method according to any one of claims 1 to 4, comprising a modulo unit comprising a plurality of data selectors, a first modulo reduction unit, a modulo multiplication unit, a second modulo reduction unit and a modulo addition unit;
the operation types which can be configured to be executed by the modular operation unit comprise CT-BFU, GS-BFU, modular addition operation, modular subtraction operation and modular multiplication operation;
wherein, in the case that the type of the operation performed by the modular operation unit is GS-BFU, the first modular reduction unit reads the coefficient a of the input polynomial A (x) and the coefficient B of the polynomial B (x) in reverse order to perform the modular reduction operation.
6. The polynomial multiplier of claim 5 wherein said modular multiplication element employs the Montgomery algorithm.
7. The polynomial multiplier of claim 5 wherein the data path modes of the modulo unit comprise a single path mode and a multiple path mode;
in the case that the bit width of the modulus q is within a preset first range value, the data path mode of the modulo operation unit is configured as a two-channel mode;
in case the bit width of the modulus q is within a preset second range of values, the data path mode of the modulo operation unit is configured as single channel mode.
8. The polynomial multiplier of claim 5 wherein the modular multiplication unit comprises a first multiplier, a second multiplier, a third multiplier, a zero-evaluation unit and a modular adder;
the first multiplier, the second multiplier and the third multiplier are cascaded, an output end of the first multiplier is connected with an input end of the zero judging unit and a first input end of the modulo adder, an output end of the zero judging unit is connected with a second input end of the modulo adder, and an output end of the third multiplier is connected with a third input end of the modulo adder.
9. The polynomial multiplier of claim 8 wherein the first multiplier, the second multiplier and the third multiplier each comprise: the system comprises a KA multiplier, a first data selector and a second data selector;
the first output end of the KA multiplier is connected with the second data input end of the first data selector and one data input end of the second data selector, and the second output end of the KA multiplier is connected with the first data input end of the first data selector and the second data input end of the second data selector.
10. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the polynomial multiplication method according to any of claims 1 to 4 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the polynomial multiplication method according to any one of claims 1 to 4.
CN202211276167.1A 2022-10-19 2022-10-19 Polynomial multiplication method, polynomial multiplier, device and medium Active CN115344236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211276167.1A CN115344236B (en) 2022-10-19 2022-10-19 Polynomial multiplication method, polynomial multiplier, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211276167.1A CN115344236B (en) 2022-10-19 2022-10-19 Polynomial multiplication method, polynomial multiplier, device and medium

Publications (2)

Publication Number Publication Date
CN115344236A CN115344236A (en) 2022-11-15
CN115344236B true CN115344236B (en) 2023-03-24

Family

ID=83957604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211276167.1A Active CN115344236B (en) 2022-10-19 2022-10-19 Polynomial multiplication method, polynomial multiplier, device and medium

Country Status (1)

Country Link
CN (1) CN115344236B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196052B (en) * 2023-09-28 2025-08-12 本源量子计算科技(合肥)股份有限公司 Polynomial modular multiplication operator, operation method and related device
CN117240601B (en) * 2023-11-09 2024-03-26 深圳大普微电子股份有限公司 Encryption processing method, encryption processing circuit, processing terminal, and storage medium
CN117714054B (en) * 2024-02-01 2024-04-23 山东大学 Key encapsulation light-weight method, system, medium and equipment based on number theory transformation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588234B (en) * 2008-05-19 2013-10-02 北京大学深圳研究生院 Encryption and decryption multiplexing method of row mixing conversion module in AES
CN106873942B (en) * 2017-01-05 2019-03-15 阜阳师范学院 The method that the MSD multiplication of structure amount computer calculates
US11995184B2 (en) * 2021-09-24 2024-05-28 Intel Corporation Low-latency digital signature processing with side-channel security
CN114968173A (en) * 2022-05-25 2022-08-30 中国人民武装警察部队工程大学 Polynomial Multiplication Operation Method and Polynomial Multiplier Based on NTT and INTT Structure

Also Published As

Publication number Publication date
CN115344236A (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN115344236B (en) Polynomial multiplication method, polynomial multiplier, device and medium
Roy et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data
US11693662B2 (en) Method and apparatus for configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm
CN112200713B (en) Business data processing method, device and equipment in federal learning
CN105912501B (en) A kind of SM4-128 Encryption Algorithm realization method and systems based on extensive coarseness reconfigurable processor
US9767074B2 (en) Method and device for fast fourier transform
Li et al. Scalable and parallel optimization of the number theoretic transform based on FPGA
CN114968173A (en) Polynomial Multiplication Operation Method and Polynomial Multiplier Based on NTT and INTT Structure
US10771947B2 (en) Methods and apparatus for twiddle factor generation for use with a programmable mixed-radix DFT/IDFT processor
CN101847137B (en) FFT processor for realizing 2FFT-based calculation
US11764942B2 (en) Hardware architecture for memory organization for fully homomorphic encryption
CN117971163B (en) Butterfly unit-based computing architecture, computing method and device
JP2012502379A (en) Method and apparatus for computing a matrix for discrete Fourier transform (DFT) coefficients
US11546161B2 (en) Zero knowledge proof hardware accelerator and the method thereof
US11740869B2 (en) Scheduling atomic field operations in jacobian coordinates used in elliptic curve cryptography scalar multiplications
US11829322B2 (en) Methods and apparatus for a vector memory subsystem for use with a programmable mixed-radix DFT/IDFT processor
US20240411833A1 (en) Architecture for number theoretic transform and inverse number theoretic transform
CN102611667A (en) Random access detection FFT/IFFT (Fast Fourier Transform Algorithm/Inverse Fast Fourier Transform) processing method and device
Duan et al. SMBHA: A System-Level Multicore BGV Hardware Accelerator Based on FPGA
More et al. FPGA implementation of FFT processor using vedic algorithm
CN115765990B (en) NTRU security co-processor of post quantum cryptography algorithm
HK40044438B (en) Business data processing method, device and apparatus in federated learning
CN120045513B (en) A computing unit and FFT processor based on coarse-grained reconfigurable architecture
US20220350570A1 (en) Pipelined hardware to accelerate modular arithmetic operations
Li et al. A CPU-GPU-based parallel search algorithm for the best differential characteristics of block ciphers

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant