CN115062768B - Softmax hardware implementation method and system of logic resource limited platform - Google Patents

Info

Publication number
CN115062768B
Authority
CN
China
Prior art keywords
function
constant
adder
accumulation
shifter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210790639.9A
Other languages
Chinese (zh)
Other versions
CN115062768A (en)
Inventor
葛伟
刘殊赫
许艳鸿
韦社年
王一飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210790639.9A
Publication of CN115062768A
Application granted
Publication of CN115062768B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention discloses a Softmax hardware implementation method and system for a logic-resource-constrained platform, which converts any n scalar inputs x_1, x_2, ..., x_n into probabilities. Through equivalent function transformation, replacement of the exponential base and the logarithm base, function fitting, serial accumulation, and reuse of the exponential operation unit, the invention realizes a complex function with only a limited set of basic arithmetic logic units. The combination of exponentiation and division in the original function is transformed into a combination of an exponential function and a logarithmic function. Function fitting with controllable accuracy is performed according to the operation characteristics and the data range, which saves considerable calculation time and avoids iterative processes, and serial accumulation together with function-unit reuse effectively reduces the hardware area and power consumption.

Description

Softmax hardware implementation method and system of logic resource limited platform
Technical Field
The invention relates to a Softmax hardware implementation method and system for a logic-resource-limited platform, and belongs to the technical field of hardware implementation of neural network activation functions.
Background
In the field of big data, deep neural networks (DNNs) have achieved great success, and efficient hardware architectures have long been a goal of academia and industry. The Softmax layer is widely used across different DNNs. The Softmax function is typically used as the activation function of the output layer in classification tasks, where it maps the outputs of multiple neurons into the interval (0, 1), and it is very widely applied in machine learning and deep learning. In particular, for multi-class problems (C > 2), the final output unit of the classifier requires a Softmax function for numerical processing. For n inputs x_1, x_2, ..., x_n, the expression is as follows:
$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
The exponentiation and division in the Softmax function are expensive to compute, especially in embedded systems, and many prior-art implementations based on look-up tables are complex and consume excessive resources. A reasonable way to deploy the function on hardware, with low resource consumption and efficient implementation, is therefore needed.
Disclosure of Invention
To address the above problems, the invention provides a Softmax hardware implementation for a logic-resource-limited platform, which uses function fitting and serial accumulation to overcome the high cost and low efficiency of hardware implementations of the Softmax function. The technical solution is as follows:
A Softmax hardware implementation method for a logic-resource-limited platform comprises the following steps:
1) Transforming the original Softmax function expression:
$$R(i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = e^{\,x_i - \ln\left(\sum_{j=1}^{n} e^{x_j}\right)}$$
where n is the total number of inputs to Softmax and i is the index of the corresponding input and output, i = 1, 2, ..., n.
2) Calculating the exponential function of each gated input x_1, x_2, ..., x_n, where base e is replaced with base 2:
$$e^{x_i} = 2^{\,x_i \cdot \log_2 e}$$
3) Calculating the accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step 2):
$$f = \sum_{i=1}^{n} e^{x_i}$$
4) Calculating the natural logarithm of the accumulated result of step 3):
f_ln = ln(f)
5) Calculating the exponential of each gated input x_1, x_2, ..., x_n minus the result f_ln of step 4), where base e is replaced with base 2:
$$R(i) = e^{\,x_i - \mathrm{f\_ln}} = 2^{\,(x_i - \mathrm{f\_ln}) \cdot \log_2 e}$$
6) Storing each result R(i) calculated in step 5) in a register to obtain the final total output R.
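The six steps can be checked numerically. The following is a minimal floating-point sketch, not the patented circuit: it uses Python's exact `math` functions in place of the fitted hardware units, and the function names `softmax_direct` and `softmax_log_domain` are hypothetical. It only illustrates that the division-free form gives the same values as the textbook form.

```python
import math

def softmax_direct(xs):
    """Textbook form: exp(x_i) / sum_j exp(x_j)."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_log_domain(xs):
    """Division-free form following steps 2) to 6)."""
    log2e = math.log2(math.e)
    # Steps 2) and 3): base-2 exponentials, accumulated one input at a time.
    f = 0.0
    for x in xs:
        f += 2.0 ** (x * log2e)
    # Step 4): natural logarithm of the accumulated sum.
    f_ln = math.log(f)
    # Steps 5) and 6): R(i) = 2^((x_i - f_ln) * log2(e)), collected into R.
    return [2.0 ** ((x - f_ln) * log2e) for x in xs]

xs = [0.5, 1.0, 2.0, 3.5]
direct = softmax_direct(xs)
log_domain = softmax_log_domain(xs)
```

Both lists agree to within floating-point rounding, and the entries of `log_domain` sum to 1 without any division being performed.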
Further, steps 2) and 5) are calculated with the same exponential operation module, in which base e is replaced with base 2. The exponential operation module consists of two adders, a constant multiplier and a shifter:
$$e^{y} = 2^{\,y \cdot \log_2 e}, \qquad |y \cdot \log_2 e| = u + v$$
where u is the integer part of |y·log_2 e|, i.e., the integer part of the truncated fixed-point number, and v is the fractional part of |y·log_2 e|, i.e., the fractional part of the truncated fixed-point number. In fixed-point hardware the absolute value is obtained by checking the sign bit: a positive number is left unchanged, and a negative number is bitwise inverted and incremented by one. For example, to calculate e^(x_i) in step 2) or e^(x_i - f_ln) in step 5), the first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates x_i·log_2 e or (x_i - f_ln)·log_2 e; the integer part of the absolute value is truncated to obtain u; the second adder implements the fitting function 2^v ≈ v + b_1, where b_1 is a constant; and the final result is obtained by shifting 2^v left or right by u bits.
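A floating-point behavioral sketch of such an exponential module is given below. It is an illustration under stated assumptions rather than the patented fixed-point design: the function name `exp_unit` is hypothetical, `b_pos = 0.9375` is assumed (the value of 6'b00_1111 read with four fractional bits), and negative arguments use the piecewise fit a·v + b with a = 0.4966 and b = 0.9711 quoted in the detailed description.

```python
import math

LOG2_E = math.log2(math.e)  # constant fed to the constant multiplier

def exp_unit(x, f_ln=0.0, b_pos=0.9375, a_neg=0.4966, b_neg=0.9711):
    """Approximate e^(x - f_ln) with one multiply, two adds and a shift.

    b_pos, a_neg and b_neg are illustrative fit constants (assumptions).
    """
    y = x - f_ln                 # first adder: x_i - 0 or x_i - f_ln
    p = y * LOG2_E               # constant multiplier
    u = int(abs(p))              # integer part of |p| (truncation)
    v = abs(p) - u               # fractional part of |p|, in [0, 1)
    if p >= 0:
        frac_pow = v + b_pos             # second adder: 2^v ~= v + b
        return frac_pow * (1 << u)       # left shift by u
    frac_pow = a_neg * (-v) + b_neg      # 2^(-v) ~= a*(-v) + b on (-1, 0]
    return frac_pow / (1 << u)           # right shift by u

approx = exp_unit(1.0)
exact = math.exp(1.0)
```

With these assumed constants the relative error stays within a few percent over moderate inputs; the largest error on the positive branch occurs at v = 0, where the fit returns 0.9375 instead of 1.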
Further, in step 3), a serial accumulation module is used for calculation, and an accumulation enable signal counter is used for control.
Further, step 4) is calculated with a logarithmic operation module, in which base e is replaced with base 2. The logarithmic operation module consists of a leading-one detector (LOD), a decoder, a right shifter, a constant adder and an adder:
$$\ln(f) = \ln 2 \cdot \log_2 f = \ln 2 \cdot (w + \log_2 k)$$
where t is the intermediate value that keeps only the most significant set bit of f and clears all other bits, w is the index of the most significant set bit of f, and k is the remainder of f after scaling, with k ∈ [1, 2). For example, for 16-bit fixed-point numbers:
If f = 16'b0000_1011_1111.0011,
then t = 16'b0000_1000_0000.0000, w = 4, k = 16'b0000_1.011_1111_0011;
if f = 16'b0000_0011_1111.0011,
then t = 16'b0000_0010_0000.0000, w = 6, k = 16'b0000_001.1_1111_0011.
Further, to calculate ln(f) in step 4), the LOD produces the intermediate value t in which only the most significant set bit of f is 1 and all other bits are 0; the decoder takes t as input and produces the index w of the most significant set bit of f; the right shifter shifts f right by w to obtain k; the constant adder implements the fitting function log_2 k ≈ k + b_2, where b_2 is a constant; and finally the constant multiplier and the adder calculate ln2 · (w + k + b_2).
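The logarithm path can be sketched in the same behavioral style. This is a hedged illustration, not the patented circuit: the names `leading_one_index` and `ln_unit` are hypothetical, `b2 = -0.9485` is the fit constant quoted in the detailed description, and w is taken here as the power-of-two exponent such that f = 2^w · k, which is an assumption about the indexing (the worked examples in the text appear to count the bit position differently).

```python
import math

def leading_one_index(f_bits):
    """Bit position of the most significant 1 (models the LOD plus decoder)."""
    return f_bits.bit_length() - 1

def ln_unit(f_bits, frac_bits=4, b2=-0.9485):
    """Approximate ln(f) for an unsigned fixed-point word.

    f_bits holds f scaled by 2**frac_bits; b2 is the linear-fit constant.
    """
    if f_bits <= 0:
        raise ValueError("f must be positive")
    msb = leading_one_index(f_bits)
    w = msb - frac_bits              # assumed convention: f = 2^w * k
    k = f_bits / (1 << msb)          # normalizing shift, k in [1, 2)
    log2_k = k + b2                  # constant adder: log2(k) ~= k + b2
    return math.log(2) * (w + log2_k)  # constant multiplier and adder

f_bits = 0b0000_1011_1111_0011      # 191.1875 with four fractional bits
approx = ln_unit(f_bits)
exact = math.log(f_bits / 16)
```

For this input the leading one sits at bit 11 of the 16-bit word, so w = 7 and k ≈ 1.49 under the convention assumed above, and the approximation differs from `math.log` by a few hundredths.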
The beneficial effects of the Softmax hardware implementation method and system for a logic-resource-limited platform are as follows. The complexity of the Softmax hardware implementation is effectively reduced. Through equivalent function transformation, replacement of the exponential base and the logarithm base, function fitting, serial accumulation, and reuse of the exponential operation unit, a complex function is realized with only a limited set of basic arithmetic logic units, and the hardware area and power consumption are effectively reduced.
Compared with the prior art, the method greatly reduces circuit complexity and lowers power and area cost. It has neither the parameter storage requirement of LUT (look-up table) based methods nor the iterative processing time of CORDIC-based methods, which increases throughput and addresses the difficulty of implementing Softmax in hardware.
Drawings
FIG. 1 is a block diagram of a Softmax hardware implementation of a logical resource constrained platform of the present invention;
FIG. 2 is a schematic diagram of an exponential function circuit implementation;
FIG. 3 is a schematic diagram of a logarithmic function circuit implementation.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
As shown in FIG. 1, the Softmax hardware implementation system of the logic-resource-limited platform according to the present invention takes x_1, x_2, ..., x_n as input and produces r_1, r_2, ..., r_n as output. The method comprises the following steps:
(1) After the operands x_1, x_2, ..., x_n are input, LnF_en is set to 0 and the counter Cnt[$clog2(n)-1:0] is enabled. Using Cnt[$clog2(n)-1:0] as the gating signal, x_1, x_2, ..., x_n are gated in sequence, and the exponential function of each gated input is calculated, where base e is replaced with base 2:
$$e^{x_i} = 2^{\,x_i \cdot \log_2 e}$$
(2) The accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n from step (1) is calculated. Acc_en is set to 1, and when Cnt[$clog2(n)-1:0] equals the number of inputs n, Acc_en is set to 0, which completes the accumulation of the exponentials of all of x_1, x_2, ..., x_n.
(3) The natural logarithm f_ln = ln(f) of the accumulated result of step (2) is calculated, where base e is replaced with base 2.
(4) The counter Cnt[$clog2(n)-1:0] is reset and re-enabled, LnF_en is set to 1, and the exponential of each gated input x_1, x_2, ..., x_n minus the result f_ln of step (3) is calculated, where base e is replaced with base 2:
$$R(i) = e^{\,x_i - \mathrm{f\_ln}} = 2^{\,(x_i - \mathrm{f\_ln}) \cdot \log_2 e}$$
(5) Each result R(i) calculated in step (4) is stored in a register to obtain the final total output R.
With this design, the Softmax hardware circuit meets the needs of practical application scenarios that require high efficiency, low complexity, small area and low energy consumption. In theory, one Softmax calculation over n inputs completes in 2n clock cycles, where n is the number of inputs, while the advantages in area and power consumption are retained.
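The 2n-cycle figure follows from the two counter-driven passes. The sketch below is a simplified schedule model, not a timing-accurate simulation: the function name `softmax_serial` is hypothetical, exact `math` functions stand in for the fitted units, one gated input is assumed per clock cycle, and the logarithm is assumed to add no cycle of its own, in line with the stated 2n total.

```python
import math

def softmax_serial(xs):
    """Two-pass schedule: n cycles to accumulate, n cycles to emit results."""
    n = len(xs)
    cycles = 0
    f = 0.0
    # Pass 1 (LnF_en = 0, Acc_en = 1): the counter gates one input per cycle.
    for cnt in range(n):
        f += math.exp(xs[cnt] - 0.0)   # exponent unit computes x_i - 0
        cycles += 1
    f_ln = math.log(f)                 # logarithm unit, assumed no extra cycle
    # Pass 2 (LnF_en = 1): the same exponent unit is reused with x_i - f_ln.
    results = []
    for cnt in range(n):
        results.append(math.exp(xs[cnt] - f_ln))
        cycles += 1
    return results, cycles

results, cycles = softmax_serial([0.0, 1.0, 2.0])
```

For three inputs the model reports six cycles, and the results sum to 1. Reusing a single exponent unit across both passes is what trades throughput for area in this schedule.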
As shown in FIG. 2, the calculation process of the exponential function circuit is as follows:
(1) The input x_i or x_i - f_ln is multiplied by the fixed-point constant log_2 e to obtain a fixed-point output {u_i, v_i}, where u_i is the integer part and v_i is the fractional part, with -1 < v_i < 1.
(2) 2^v is calculated by function fitting with f(v_i) = v_i + b. Although x_i ≥ 0, x_i - f_ln can be positive or negative, so a piecewise fit is needed: the fitting parameter b_1 or b_2 is gated according to the sign of u_i, and the piecewise intervals are (-1, 0) and [0, 1). For example, if the input is a 16-bit fixed-point number with a fractional width of 4, b_1 = 6'b01_0100 and b_2 = 6'b00_1111 are taken, and the coefficient of determination of the fit on [0, 1) is 0.9919. The fit is slightly worse on (-1, 0), where f(v_i) = a·v_i + b with a = 0.4966 and b = 0.9711 can be used to improve accuracy. For example, for a 16-bit fixed-point input with a fractional width of 4, a is realized by shifting v_i right by one bit at no additional hardware cost, b_2 = 6'b00_1111 is taken, and the coefficient of determination of the fit is 0.9909.
(3) The sign of u_i is determined from its most significant (sign) bit; if it is negative, the value is inverted and incremented by one to obtain the absolute value. The 2^v calculated in step (2) is then shifted left or right, gated by the sign of u_i, to obtain the final exponential function result:
$$2^{\,u_i + v_i} = \begin{cases} 2^{v_i} \ll |u_i|, & u_i \ge 0 \\ 2^{v_i} \gg |u_i|, & u_i < 0 \end{cases}$$
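The quality of the two linear fits can be checked with a short script. This is a rough numerical sketch under assumptions: the helper name `r_squared` and the uniform 1000-point sampling are hypothetical choices, the intercept 0.9375 on [0, 1) is assumed (6'b00_1111 read with four fractional bits), and the coefficients 0.4966 and 0.9711 on (-1, 0) are those quoted above. The values it produces depend on the sampling and need not match the quoted figures exactly.

```python
def r_squared(fit, lo, hi, samples=1000):
    """Coefficient of determination of fit(v) against 2**v on [lo, hi)."""
    vs = [lo + (hi - lo) * i / samples for i in range(samples)]
    ys = [2.0 ** v for v in vs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - fit(v)) ** 2 for v, y in zip(vs, ys))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)              # total sum of squares
    return 1.0 - ss_res / ss_tot

# Slope-1 fit on [0, 1) with an assumed intercept of 0.9375.
r2_pos = r_squared(lambda v: v + 0.9375, 0.0, 1.0)
# Two-parameter fit on (-1, 0) with the coefficients quoted in the text.
r2_neg = r_squared(lambda v: 0.4966 * v + 0.9711, -1.0, 0.0)
```

Both values come out close to 1, which is consistent with the claim that a single add (plus a one-bit shift on the negative interval) is enough to approximate 2^v over one unit interval.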
As shown in FIG. 3, the calculation process of the logarithmic function circuit is as follows:
(1) The LOD is used to obtain the intermediate value t, in which only the most significant set bit of the input f is 1 and all other bits are 0.
(2) The decoder decodes the input t to obtain the index w of the most significant set bit of f.
(3) The shifter shifts the input f right by w bits to obtain k, where k ∈ [1, 2).
(4) The adder is used to calculate log_2 k with the fit log_2 k ≈ k + b_2, where b_2 is a constant. For example, in a fixed-point system in which the output f_ln of the logarithmic function circuit has a total bit width of 6 and a fractional width of 4, b_2 = -0.9485 is taken, which becomes 6'b11.0001 after conversion to fixed point, and the coefficient of determination of the fit is 0.9906.
(5) w is added to the log_2 k obtained by the function fitting in step (4), and the constant multiplier multiplies the sum by ln2 to obtain the final logarithmic function result:
$$\ln(f) = \ln 2 \cdot \log_2 f = \ln 2 \cdot (w + \log_2 k)$$
Through equivalent function transformation, replacement of the exponential base and the logarithm base, function fitting, serial accumulation, and reuse of the exponential operation unit, the invention realizes a complex function with only a limited set of basic arithmetic logic units. It converts the combination of exponentiation and division in the original function into a combination of an exponential function and a logarithmic function, performs function fitting with controllable accuracy according to the operation characteristics and the data range, saves considerable calculation time and avoids iterative processes, and effectively reduces the hardware area and power consumption through serial accumulation and function-unit reuse.

Claims (1)

1. A Softmax hardware implementation system for a logic-resource-constrained platform, characterized by comprising the following units:
Exponential operation unit: realizes the exponential operation through base transformation and linear fitting;
Serial accumulation unit: calculates the accumulated sum of the exponentials of the inputs x_1, x_2, ..., x_n:
$$f = \sum_{i=1}^{n} e^{x_i}$$
Logarithmic operation unit: realizes the logarithmic operation through base transformation and linear fitting, and calculates the natural logarithm of the accumulated result: f_ln = ln(f);
The exponential operation unit includes two adders, a constant multiplier and a shifter, wherein the shifter supports left shift and right shift; the first adder calculates x_i - 0 or x_i - f_ln; the constant multiplier calculates (x_i - f_ln) × log_2 e or x_i × log_2 e; the integer part of the absolute value is truncated to obtain u; the second adder implements the fitting function 2^v ≈ v + b_1, wherein b_1 is a constant; finally, the calculation result is obtained by shifting 2^v to the left or right by u;
The logarithmic operation unit includes a leading-one detector, a decoder, a shifter and a constant adder, wherein the shifter only supports left shifting; it is used to calculate ln(f): the leading-one detector is used to obtain an intermediate value t in which only the highest set bit is 1 and the other bits are 0; the decoder takes t as input to obtain the index w of the highest set bit of f; the right shifter shifts f right by w to obtain k, and the constant adder is used to implement the fitting function log_2 k ≈ k + b_2, wherein b_2 is a constant; finally, the constant multiplier and the adder calculate the result of the logarithmic operation unit: ln(f) = ln2 · (w + k + b_2);
The serial accumulation unit can accept the accumulation of any n inputs; the accumulation enable is controlled by a counter, and the accumulation stops when the counter value is equal to n.
CN202210790639.9A 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform Active CN115062768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210790639.9A CN115062768B (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210790639.9A CN115062768B (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Publications (2)

Publication Number Publication Date
CN115062768A (en) 2022-09-16
CN115062768B (en) 2025-06-10

Family

ID=83203697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210790639.9A Active CN115062768B (en) 2022-07-05 2022-07-05 Softmax hardware implementation method and system of logic resource limited platform

Country Status (1)

Country Link
CN (1) CN115062768B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118133915B (en) * 2024-04-29 2024-08-13 深圳市九天睿芯科技有限公司 Circuit, neural network processing device, chip and equipment for realizing Softmax function calculation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021537A (en) * 2018-01-05 2018-05-11 南京大学 A kind of softmax implementations based on hardware platform
CN110135086A (en) * 2019-05-20 2019-08-16 合肥工业大学 Hardware circuit of softmax function with variable calculation precision and its realization method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9753695B2 (en) * 2012-09-04 2017-09-05 Analog Devices Global Datapath circuit for digital signal processors
US11836629B2 (en) * 2020-01-15 2023-12-05 SambaNova Systems, Inc. Computationally efficient softmax loss gradient backpropagation
CN112685693B (en) * 2020-12-31 2022-08-02 南方电网科学研究院有限责任公司 A device that implements the Softmax function
CN113485673A (en) * 2021-07-05 2021-10-08 上海西井信息科技有限公司 Circuit device for realizing softmax and method for generating softmax code

Also Published As

Publication number Publication date
CN115062768A (en) 2022-09-16

Similar Documents

Publication Publication Date Title
CN108021537A (en) A kind of softmax implementations based on hardware platform
CN107305484B (en) Nonlinear function operation device and method
CN111984227B (en) Approximation calculation device and method for complex square root
EP0416309B1 (en) Method and apparatus for performing the square root function using a rectangular aspect ratio multiplier
CN109165006B (en) Design optimization and hardware implementation method and system of Softmax function
EP4285215A1 (en) Digital circuitry for normalization functions
JP2025010412A (en) Signed Multiword Multiplier
CN111930342A (en) Error unbiased approximate multiplier aiming at normalized floating point number and implementation method thereof
CN115062768B (en) Softmax hardware implementation method and system of logic resource limited platform
CN110879697B (en) Device for approximately calculating tanh function
CN114860193B (en) A hardware operation circuit and data processing method for calculating Power function
CN110837624A (en) An approximate computing device for sigmoid function
CN119536684A (en) A floating-point multiplier, calculation method and device based on FPGA
CN113837365A (en) Model for realizing sigmoid function approximation, FPGA circuit and working method
WO2022170811A1 (en) Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network
CN111860792B (en) A hardware implementation device and method for activation function
CN109298848A (en) Circuit for double-mode floating point division square root
CN119249050A (en) Device and method for fast calculation of nonlinear activation function based on coefficient lookup table
CN117008872A (en) Multi-precision fusion multiply-accumulate operation device and method compatible with multiple formats
CN115222033A (en) A method and device for approximate calculation of softmax function
CN109416757B (en) Method, apparatus and computer-readable storage medium for processing numerical data
CN113515259B (en) A circuit and method for realizing approximate modulo of complex numbers in floating-point format
CN118378000B (en) Configurable transcendental function vector computing device
CN117270811B (en) Nonlinear operator approximation calculation method, device and neural network processor
CN116933840B (en) Multi-precision Posit encoding and decoding operation device and method supporting variable exponent bit width

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant