Disclosure of Invention
To solve the above technical problems, the application provides an activation function processing method, an activation function processing apparatus, and an integrated circuit based on a hardware chip, which can obtain an accurate result for the activation function at low complexity and low cost.
The embodiment of the application discloses the following technical scheme:
In a first aspect, the present application provides a method for processing an activation function based on a hardware chip, where the method includes:
Initializing the register configuration, which comprises: performing a Taylor expansion of the activation function at a target value; acquiring the coefficient value of each of the first M terms of the Taylor expansion and a compensation value for approximating the remainder; and storing the coefficient values and the compensation value in a register group corresponding to the target value;
acquiring a first argument within a preset domain of the activation function;
determining a first difference between the first argument and an initial value, and determining the address of the register group and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference;
reading the coefficient values and the compensation value from the register group according to the address;
determining the sum of the first M polynomial terms from the first argument, the target value, and the coefficient values; determining an approximation of the remainder from the first argument, the target value, and the compensation value;
and determining the function value of the activation function at the first argument from the sum of the first M polynomial terms and the approximation of the remainder.
Optionally, a compensation coefficient is determined according to the first argument and the target value, and the approximation of the remainder is determined from the compensation value and the compensation coefficient;
where the compensation coefficient is not positively correlated with M.
Optionally, the compensation coefficient is as follows:
where N is a positive integer, θ is a non-negative integer not greater than $2^N-1$, and θ is positively correlated with the multiple.
Optionally, the approximation of the remainder is obtained by the following formula:
where $\Delta(x)$ is the approximation of the remainder and $\Delta_{\max}(x)$ is the compensation value stored in the single compensation register.
Optionally, a second difference between the first argument and the target value is determined, and the sum of the first M polynomial terms is determined from the second difference and the coefficient values.
Optionally, the sum of the first M polynomial terms is obtained by the following formula:
$$f_{lut}(x)=\sum_{m=0}^{M-1}\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m$$
where $f_{lut}(x)$ is the sum of the first M polynomial terms, $x_0$ is the target value, $(x-x_0)$ is the second difference, and $f^{(m)}(x_0)/m!$, $m=0,\dots,M-1$, are the M coefficient values.
Optionally, the function value of the activation function at the first argument is determined from the sum of the first M polynomial terms and the approximation of the remainder, specifically by the following formula:
$$f(x)=f_{lut}(x)+\Delta(x)$$
where $f(x)$ is the function value of the activation function at the first argument.
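The steps of the first aspect can be strung together in a short Python sketch for the Tanh embodiment (preset domain [0.0625, 5.0625], K = 3, M = 3). It is an illustration under stated assumptions, not the patented implementation. The exact compensation-coefficient and remainder formulas are not reproduced in this text, so the sketch makes two assumptions. First, the compensation value D of each group is the remainder at the right end of its interval, $x_0+2^{-K}$. Second, $\Delta(x)$ scales D by $((x-x_0)/2^{-K})^3$. The names `build_register_groups` and `evaluate` are hypothetical, and double-precision floats stand in for the hardware number formats.

```python
import math

# Tanh embodiment parameters: preset domain [X_START, X_END], spacing 2**-K, M = 3 terms.
X_START, X_END, K = 0.0625, 5.0625, 3
H = 2.0 ** -K  # spacing between adjacent target values (0.125)

def taylor_coefficients(x0):
    """First M = 3 Taylor coefficients of tanh at x0: tanh, tanh', tanh''/2."""
    t = math.tanh(x0)
    return [t, 1.0 - t * t, -t * (1.0 - t * t)]

def poly(coeffs, d):
    """Sum of coeffs[m] * d**m, evaluated with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * d + c
    return acc

def build_register_groups():
    """Register initialization: one group [A, B, C, D] per target value."""
    groups = []
    for q in range(math.ceil((X_END - X_START) / H)):
        x0 = X_START + q * H
        coeffs = taylor_coefficients(x0)
        # ASSUMPTION: D = remainder at the right end of the interval, x0 + H.
        d_max = math.tanh(x0 + H) - poly(coeffs, H)
        groups.append(coeffs + [d_max])
    return groups

def evaluate(x, groups):
    """f(x) = f_lut(x) + Delta(x) for X_START <= x <= X_END."""
    # Floor of the first difference over 2**-K; x = X_END is clamped to the last group.
    multiple = min(int((x - X_START) / H), len(groups) - 1)
    x0 = X_START + multiple * H
    a, b, c, d_max = groups[multiple]
    d = x - x0                    # second difference
    f_lut = poly([a, b, c], d)    # sum of the first M = 3 terms
    # ASSUMPTION: Delta(x) = D * (d / H)**3, a stand-in for the patent's compensation coefficient.
    delta = d_max * (d / H) ** 3
    return f_lut + delta
```

Under these assumptions, `evaluate(1.0, build_register_groups())` agrees with `math.tanh(1.0)` to within $10^{-4}$. The cubic scaling was picked because, for M = 3, the leading part of the remainder grows with $(x-x_0)^3$.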
In a second aspect, the present application provides an activation function processing apparatus based on a hardware chip, comprising a configuration module, a determining module, a reading module, and a calculation module;
The configuration module is configured to initialize the register configuration: performing a Taylor expansion of the activation function at a target value; acquiring the coefficient value of each of the first M terms of the Taylor expansion and a compensation value for approximating the remainder; and storing the coefficient values and the compensation value in a register group corresponding to the target value;
The determining module is configured to acquire a first argument within a preset domain of the activation function, determine a first difference between the first argument and an initial value, and determine the address of the register group and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference;
The reading module is configured to read the coefficient values and the compensation value from the register group according to the address;
The calculation module is configured to determine the sum of the first M polynomial terms from the first argument, the target value, and the coefficient values; determine an approximation of the remainder from the first argument, the target value, and the compensation value; and determine the function value of the activation function at the first argument from the sum of the first M polynomial terms and the approximation of the remainder.
Optionally, the calculation module is specifically configured to determine a compensation coefficient according to the first argument and the target value, and to determine the approximation of the remainder from the compensation value and the compensation coefficient;
where the compensation coefficient is not positively correlated with M.
Optionally, the compensation coefficient is as follows:
where N is a positive integer, θ is a non-negative integer not greater than $2^N-1$, and θ is positively correlated with the multiple.
Optionally, the calculation module is specifically configured to obtain the approximation of the remainder through the following formula:
where $\Delta(x)$ is the approximation of the remainder and $\Delta_{\max}(x)$ is the compensation value stored in the single compensation register.
Optionally, the calculation module is specifically configured to determine a second difference between the first argument and the target value, and to determine the sum of the first M polynomial terms from the second difference and the coefficient values.
Optionally, the calculation module is specifically configured to obtain the sum of the first M polynomial terms by the following formula:
$$f_{lut}(x)=\sum_{m=0}^{M-1}\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m$$
where $f_{lut}(x)$ is the sum of the first M polynomial terms, $x_0$ is the target value, $(x-x_0)$ is the second difference, and $f^{(m)}(x_0)/m!$, $m=0,\dots,M-1$, are the M coefficient values.
Optionally, the calculation module is specifically configured to determine the function value of the activation function at the first argument from the sum of the first M polynomial terms and the approximation of the remainder, specifically by the following formula:
$$f(x)=f_{lut}(x)+\Delta(x)$$
where $f(x)$ is the function value of the activation function at the first argument.
In a third aspect, the application provides an integrated circuit comprising any of the apparatus described in the second aspect above.
As can be seen from the technical scheme, the application has the following advantages:
The application provides a method, an apparatus, and an integrated circuit for processing an activation function based on a hardware chip. The method comprises:
- Acquiring a first argument within a preset domain of the activation function.
- Determining a first difference between the first argument and an initial value.
- Determining the address of a register group and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference.

The Taylor expansion of the activation function at the target value consists of the first M polynomial terms and a remainder. Each register group comprises M+1 registers: M of them store the coefficient values of the first M polynomial terms, and 1 stores the compensation value used to approximate the remainder. Because the first argument is taken from a bounded preset domain rather than the whole interval from minus infinity to plus infinity, fewer registers are required and the cost is reduced. The target values of adjacent register groups differ by $2^{-K}$, where K is an integer and M is a positive integer. Consequently, in binary arithmetic the multiple of $2^{-K}$ contained in the first difference can be obtained directly from its bits, without repeated addition and subtraction or if-branch comparisons, which simplifies the calculation.

The coefficient values and the compensation value are then read from the register group according to the address. The sum of the first M polynomial terms is determined from the first argument, the target value, and the coefficient values. An approximation of the remainder is determined from the first argument, the target value, and the compensation value. Finally, the function value of the activation function at the first argument is determined from the sum of the first M polynomial terms and the approximation of the remainder. In the technical solution provided by the application, the function value at the first argument is obtained with a nonlinear method, using the pre-stored function value at the target value, the higher-order derivative values of the function, and the remainder compensation value. Because each target value serves a whole interval of arguments, fewer target values need to be stored, and the total number of registers is greatly reduced compared with the conventional method. Put differently, for the same register budget, the accuracy of the computed function value exceeds that of conventional methods using several times as many registers. The technical solution provided by the application can therefore obtain an accurate result for the activation function at low cost and low complexity.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
When product features are developed on a field programmable gate array (FPGA) or a system on chip (SoC), special function operations other than addition, subtraction, and multiplication are encountered at run time, such as $e^x$, trigonometric functions, and hyperbolic functions. With the recent development of artificial intelligence (AI) technology, hyperbolic functions such as the Tanh function are frequently used as activation functions in neural network models, where their nonlinear mapping enhances the expressive capability of the model. Other activation functions include the Sigmoid function and the ReLU function.
The accuracy of the activation function result affects the final result of a hardware accelerator for a large-scale neural network model: the more accurate the activation function result, the more accurate the accelerator's result. At present, activation functions are calculated in hardware with one of the following two schemes:
First scheme: the function value of the activation function is approximated by a high-order polynomial. To improve the accuracy of the result, the order of the polynomial must be increased. At high orders, however, the calculation becomes complicated, and the area, power consumption, and latency of the hardware circuit increase.
Second scheme: the function value of the activation function at each argument is stored in registers and read directly when needed. This simplifies the calculation, but reaching an error level of one ten-thousandth requires thousands of registers, which is costly. Some products use linear interpolation to reduce the number of registers somewhat. However, a linear function has limited expressive power and introduces its own interpolation error, so the savings in registers or the gain in accuracy are not significant.
In short, under low-complexity and low-cost constraints, it is difficult to directly obtain or calculate the accurate function value of an activation function in hardware.
To solve the problem of calculating activation functions in hardware, the application provides a method for processing an activation function based on a hardware chip. The method comprises the following steps.
- Initializing the register configuration. This comprises performing a Taylor expansion of the activation function at a target value, acquiring the coefficient value of each of the first M terms of the Taylor expansion and a compensation value for approximating the remainder, and storing the coefficient values and the compensation value in a register group corresponding to the target value.
- Acquiring a first argument within a preset domain of the activation function.
- Determining a first difference between the first argument and an initial value, and determining the address of the register group and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference.

Because the first argument comes from a preset domain that is narrower than the interval from minus infinity to plus infinity, fewer registers are required and the cost is reduced. The target values of adjacent register groups differ by $2^{-K}$, where K is an integer and M is a positive integer. Consequently, in binary arithmetic the multiple of $2^{-K}$ contained in the first difference can be obtained directly, without repeated addition and subtraction or if-branch comparisons, which simplifies the calculation.

The coefficient values and the compensation value are read from the register group according to the address. The sum of the first M polynomial terms is determined from the first argument, the target value, and the coefficient values, and an approximation of the remainder is determined from the first argument, the target value, and the compensation value. Finally, the function value of the activation function at the first argument is determined from the sum of the first M polynomial terms and the approximation of the remainder.

The technical solution provided by the application both limits the number of registers and keeps the operation logic simple, so that an error on the order of one ten-thousandth can be reached with a small number of registers. Specifically, the function value at the input first argument is obtained with a nonlinear method, using the pre-stored function value at the target value, the higher-order derivative values of the function, and the remainder compensation value. Because each target value serves a whole interval of arguments, fewer target values need to be stored, and the total number of registers is greatly reduced compared with the conventional method. Put differently, for the same register budget, the accuracy of the computed function value exceeds that of conventional methods using several times as many registers. The technical solution provided by the application can therefore obtain an accurate result for the activation function at low cost and low complexity.
Embodiment one:
The first embodiment of the application provides an activation function processing method based on a hardware chip, which is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the activation function processing method based on a hardware chip provided by the present application.
The method comprises the following steps:
Step 101: initialize the register configuration.
Specifically:
- The activation function is expanded as a Taylor series at a target value.
- The coefficient value of each of the first M terms of the Taylor expansion, and the compensation value for approximating the remainder, are obtained.
- The coefficient values and the compensation value are stored in the register group corresponding to the target value.

Because these values are stored in advance, the corresponding coefficient values and compensation value can be read from the register group quickly once the target value has been determined.
It should be noted that, for the preset domain [0.0625, 5.0625], the smallest target value may equal the boundary value, that is, the target value equals the initial value (for example, both are 0.0625). Adjacent target values are spaced $2^{-K}$ apart, so the number of target values follows from the ratio of the preset domain length to this spacing:
$$q=\frac{5.0625-0.0625}{2^{-K}},\qquad Q=\lceil q\rceil$$
Here q is the ratio of the preset domain length to the spacing between adjacent target values, Q is the number of target values, and $\lceil q\rceil$ denotes the smallest integer not smaller than q. In this embodiment K = 3, so Q = 40.
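The count Q and the resulting register budget can be checked in a few lines of Python. The function names are hypothetical, and `register_count` assumes the M + 1 registers per group described for the register groups in step 103.

```python
import math

def target_value_count(start, end, k):
    """Q = ceil(q), with q = (end - start) / 2**-k (domain length over target spacing)."""
    q = (end - start) / 2.0 ** -k
    return math.ceil(q)

def register_count(start, end, k, m):
    """Q register groups, each with M coefficient registers plus 1 compensation register."""
    return target_value_count(start, end, k) * (m + 1)
```

`target_value_count(0.0625, 5.0625, 3)` gives 40, and `register_count(0.0625, 5.0625, 3, 3)` gives 160. With K = 2 or K = 4, the same domain needs 20 or 80 groups respectively.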
Step 102: acquire a first argument within the preset domain of the activation function.
The preset domain is a bounded interval within $(-\infty,+\infty)$. For example, if the preset domain is [−4, 5], only values between −4 and 5 can be taken, and not −5 or 6. The range of this interval can be determined according to the specific activation function. For instance, if for every argument less than or equal to −4 the value of the activation function is approximately equal to the argument itself, −4 can be used as the lower boundary of the preset domain. The upper boundary can be determined in the same way, which yields the preset domain of the activation function.
The first argument is one value within the preset domain, and the calculation result of the activation function is its value at the first argument. For example, the first argument may be any value between −4 and 5, such as 0 or 3. When the first argument is 0, the calculation result is the value of the activation function at 0.
This embodiment does not limit the type of activation function. Typical activation functions that need a lookup table to obtain function values in a hardware operator include the Tanh function and the Sigmoid function.
For ease of understanding, and for brevity, this embodiment takes only the Tanh function as an example. Those skilled in the art can process the Sigmoid function in a similar way.
When the activation function is the Tanh function, it has the following property:
$$\tanh(-x)=-\tanh(x)\qquad(1)$$
The Tanh function satisfies formulas (1) to (3). By formula (1), only the values of the Tanh function on $x\in[0,+\infty)$ need to be known, because the values on $x\in(-\infty,0]$ then follow from formula (1).

In addition, computer hardware is limited by the number of binary digits, so the decimal value one intends to express generally differs from the decimal equivalent of the binary value actually stored. For example, the smallest positive number a normalized 16-bit floating-point number can represent is $2^{-14}$. In other words, a normalized 16-bit floating-point number cannot represent a positive real number smaller than $2^{-14}$, and a gap smaller than or approximately equal to $2^{-14}$ between two such numbers may be ignored in the calculation.

According to formula (2), when $x\le x'$, Tanh(x) is taken to equal x. In this embodiment $x'=0.0625$ is chosen so that $|\tanh(x)-x|$ is on the order of $2^{-14}$; that is, if $x\in[0,0.0625]$, x is returned directly as the function value of Tanh(x). According to formula (3), when $x\ge x''$, Tanh(x) is taken to equal 1, so 1 is returned directly as the function value. In this embodiment $x''=5.0625$ is chosen so that $1-\tanh(x)$ is on the order of $2^{-14}$.
The preset domain of the Tanh function is therefore [0.0625, 5.0625], and the first argument takes values within it. For arguments outside this domain, the function value of the Tanh function is obtained through formulas (1) to (3). As a result, only the parameters for the target values within the preset domain need to be stored in registers, which reduces the number of registers required and the cost. For example, to obtain Tanh(0.01), 0.01 is returned directly according to formula (2) as an approximation of Tanh(0.01). Likewise, to obtain Tanh(−4), the function value at the first argument 4 within the preset domain is computed, and by formula (1) inverting its sign bit gives Tanh(−4).
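The range reduction described by formulas (1) to (3) can be sketched in Python as a wrapper around the table-based evaluation. Here `core` is a hypothetical stand-in for the lookup on the preset domain (the checks below simply pass `math.tanh`). The thresholds are the embodiment's x′ = 0.0625 and x″ = 5.0625. At both thresholds the error of the shortcut is about $8\times10^{-5}$, the same order as $2^{-14}\approx6.1\times10^{-5}$.

```python
import math

X_SMALL = 0.0625  # x': at or below this, tanh(x) is returned as x (formula (2))
X_LARGE = 5.0625  # x'': at or above this, tanh(x) is returned as 1 (formula (3))

def tanh_with_range_reduction(x, core):
    """Map any real x onto the preset domain [x', x''] before calling `core`.

    `core` is a hypothetical placeholder for the register-table evaluation.
    """
    a = abs(x)                 # formula (1): evaluate on the non-negative half-line
    if a <= X_SMALL:
        y = a                  # tanh(a) is approximately a near zero
    elif a >= X_LARGE:
        y = 1.0                # tanh(a) is approximately 1 for large a
    else:
        y = core(a)            # table lookup on the preset domain
    return math.copysign(y, x) # restore the sign (a sign-bit flip in hardware)
```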
Step 103: determine a first difference between the first argument and an initial value, and determine the address of the register group and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference.
The first argument has been described in detail in step 102 and is not repeated here. The initial value is a boundary value of the preset domain. For ease of understanding, this embodiment uses the smaller boundary value, i.e., 0.0625; in other embodiments, the initial value may be 5.0625.
A target value is selected from the preset domain: among the target values not greater than the first argument, the one closest to it. The target value is associated with the coefficient values stored in the registers. Once it has been determined from the first argument and the initial value, the registers can be accessed to obtain their contents; the detailed implementation is described later.
The Taylor expansion of the activation function at the target value consists of the first M polynomial terms and a remainder. To facilitate calculation of the Tanh function, the function is expanded at the target value, giving the following Taylor expansion:
$$\tanh(x)=\tanh(x_0)+\tanh'(x_0)(x-x_0)+\frac{\tanh''(x_0)}{2!}(x-x_0)^2+\cdots+\frac{\tanh^{(n)}(x_0)}{n!}(x-x_0)^n+R_n(x)$$
where $x_0$ is the target value, $\tanh'(x_0)$ is the value of the first derivative of the Tanh function at $x_0$, x is the first argument, and $R_n(x)$ is the Taylor remainder of order n. In this embodiment, $\tanh(x_0)$, $\tanh'(x_0)$, $\tanh''(x_0)/2!$, …, $\tanh^{(n)}(x_0)/n!$ are defined as the coefficients of the polynomial.
For ease of calculation, the Taylor expansion is split into two parts: the first part is the first M polynomial terms, and the second part is the remainder. For a function to be approximated, provided the higher-order derivatives exist at the target point, using more terms of the Taylor series gives a more accurate approximation. However, the more terms are expanded, the more complex the computation and the greater the latency and power consumption. Embodiments of the present application therefore approximate the target function with as few expansion terms as possible (the first part), and use the second part to approximate the error caused by truncating the expansion after so few terms. Splitting the approximation into these two sub-problems simplifies the calculation while preserving the accuracy of the result.
This embodiment does not restrict M to a specific value; M may be 1, 2, 3, or larger. For ease of understanding, M = 3 is used below, so in this embodiment the first part consists of the first 3 polynomial terms:
$$f_{lut}(x)=\tanh(x_0)+\tanh'(x_0)(x-x_0)+\frac{\tanh''(x_0)}{2}(x-x_0)^2$$
The remainder is what is left of the Taylor expansion after the first M terms. In this embodiment it is everything after the first 3 terms, i.e.
$$\sum_{k=3}^{n}\frac{\tanh^{(k)}(x_0)}{k!}(x-x_0)^k+R_n(x)$$
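For the Tanh function, the three stored coefficients have closed forms in terms of $t=\tanh(x_0)$: $\tanh'(x_0)=1-t^2$ and $\tanh''(x_0)/2=-t(1-t^2)$. The Python sketch below uses them to evaluate the first part in Horner form and to measure the remainder that the second part must make up. The function names are hypothetical, and `math.tanh` serves as the reference value.

```python
import math

def tanh_coefficients(x0):
    """Registers A, B, C for target value x0: tanh(x0), tanh'(x0), tanh''(x0)/2."""
    t = math.tanh(x0)
    return t, 1.0 - t * t, -t * (1.0 - t * t)

def first_three_terms(x, x0):
    """First part: sum of the first M = 3 terms of the Taylor expansion at x0."""
    a, b, c = tanh_coefficients(x0)
    d = x - x0
    return a + d * (b + d * c)  # Horner form of a + b*d + c*d**2

def remainder(x, x0):
    """Second part: what the first three terms leave out."""
    return math.tanh(x) - first_three_terms(x, x0)
```

Halving $x-x_0$ shrinks the remainder by roughly a factor of 8, as expected when the leading remainder term is cubic. Over intervals of width 0.125, the remainder stays below about $6.5\times10^{-4}$, because $|\tanh'''|\le 2$ and $2/3!\cdot0.125^3\approx6.5\times10^{-4}$.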
The first difference is obtained by subtracting the initial value from the first argument; in this embodiment it is (x − 0.0625). In conventional techniques, obtaining the multiple of $2^{-K}$ contained in the first difference requires repeated addition and subtraction with IF statements, or a more complex division. Unlike the conventional technique, this embodiment sets the spacing between adjacent target values to $2^{-K}$, so the multiple can be obtained quickly.
For ease of understanding, K is illustrated with negative values below. In the present application K is a positive integer, but the principle is the same.
- K = −2: to find how many times the decimal number 4 goes into the decimal number 10, note that 10 is 1010B in binary and 4 is 100B. Since 100B ends in two 0 bits, the last two bits of 1010B are simply removed, leaving 10B (decimal 2). The hardware can therefore determine immediately that 10 contains 4 twice.
- K = −3: to find how many times 8 (1000B, ending in three 0 bits) goes into 10, the last three bits of 1010B are removed, leaving 1B (decimal 1). So 10 contains 8 once.

The principle is the same when K is a positive integer and is not repeated here.
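The shift rule in the example above can be written as a small Python function. The name and the unsigned fixed-point convention (a value stored as an integer `value_bits` meaning `value_bits / 2**frac_bits`) are assumptions for illustration. Dividing by $2^{-K}$ and taking the integer part reduces to a right shift by `frac_bits - K` bits.

```python
def multiple_by_shift(value_bits, frac_bits, k):
    """Integer part of value / 2**-k, for value = value_bits / 2**frac_bits (unsigned fixed point).

    value / 2**-k = value_bits * 2**(k - frac_bits), so taking the integer part is a
    right shift by (frac_bits - k) bits: no subtraction loop, IF comparisons, or divider.
    """
    shift = frac_bits - k
    if shift < 0:
        raise ValueError("the format needs at least k fractional bits")
    return value_bits >> shift
```

`multiple_by_shift(10, 0, -2)` returns 2 and `multiple_by_shift(10, 0, -3)` returns 1, matching the worked example. `multiple_by_shift(4608, 14, 3)` returns 2, corresponding to a first difference of 0.28125 (that is, 4608/2^14) with K = 3.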
The address of a register group is used to access the data stored in each register of that group. Once the multiple is determined, the corresponding register address can be jumped to directly.
Each register group includes M+1 registers: M registers storing the coefficient values of the first M polynomial terms, and 1 register storing the compensation value used to calculate the approximation of the remainder. The target values of adjacent register groups differ by $2^{-K}$.
In this embodiment M = 3, so each register group includes 4 registers. To balance the number of registers required on the hardware device against the accuracy of the activation function result, K = 3 is selected in this embodiment. The application is not limited to K = 3, and K may also be 2 or 4; in this embodiment, K = 3 gives the better trade-off between cost and accuracy.
When K = 3, $2^{-K}=0.125$, so the target values of adjacent register groups differ by 0.125, and each register group includes 4 registers. With the preset domain [0.0625, 5.0625], the domain length is 5, which is 40 times 0.125. This embodiment therefore needs only 40 × 4 = 160 registers, greatly reducing the number of registers required.
Fig. 2 is a schematic diagram of the registers according to the present application.
Only the first 3 register groups are shown in detail; the subsequent groups are similar. Each group includes 4 registers. Taking the first group as an example, its M coefficient registers are register A, register B, and register C, and the remaining register is register D. Register A stores $\tanh(0.0625)$, register B stores $\tanh'(0.0625)$, register C stores $\tanh''(0.0625)/2$, and register D stores $\Delta_{\max}(0.0625)$.
The first register group has the target value 0.0625, the second has $0.0625+2^{-3}$, and the third has $0.0625+2\times2^{-3}$. The data in every register group is computed in advance, so the register values for any target value are known, for example $\tanh''(0.0625+2^{-3})/2$ and $\Delta_{\max}(0.0625+2\times2^{-3})$.
In each register group, the first three registers (e.g., registers A, B, and C) store the coefficients of the first three terms of the Taylor expansion, and the last register (e.g., register D) stores the compensation value for the approximation of the remainder.
The address of the register group is determined from the multiple as follows. Both multiples and register group numbers start from 0, so a multiple of 0 selects the first register group, a multiple of 1 selects the second group, and so on. The position of the register group is thus determined directly by Look-Up Table (LUT) indexing, without traversing all register groups, which reduces lookup time and improves lookup efficiency. When the multiple obtained is not an integer, the largest integer not greater than it is used as the multiple.
For the convenience of understanding by those skilled in the art, the computer determining the address of the register will be described in detail with reference to the accompanying drawings.
Referring to fig. 3, a logic diagram of computer computation is provided in the present application.
In the figure, sign is a symbol part, exponent is an index part, and fraction is a fraction part.
301: The first difference between the first argument and the initial value is determined, as shown in part 301, where the first argument is represented as a 16-bit binary half-precision floating-point number.
302: The binary first difference is aligned to a 17-bit fixed-point number, as shown in part 302: the first 3 bits are the integer part and the remaining 14 bits are the fractional part.
303: The first 6 bits of the fixed-point number are truncated, and two '0's are appended at the end to form an 8-bit binary number, as shown in part 303. This 8-bit number is used as the first address of the register set in the LUT, from which the values in the registers are obtained. The 6-bit value is the multiple of $2^{-K}$ contained in the first difference; appending two '0's multiplies it by 4, so that consecutive multiples address consecutive groups of four registers.
304: The product of the 6-bit integer truncated in 303 and $2^{-3}$ is determined and subtracted from the first difference to obtain the second difference, from which the sum of the first M terms is then calculated.
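Steps 301 to 304 can be mirrored with integer bit operations. The sketch below is illustrative only: it assumes the first difference is truncated (floored) into the 17-bit fixed-point format with 14 fractional bits, takes the top 6 bits as the multiple, appends two '0' bits to form the address, and derives the second difference. The helper names are hypothetical.

```python
import math

X_INIT = 0.0625
K = 3
FRAC_BITS = 14      # 17-bit fixed point: 3 integer bits + 14 fractional bits
TOTAL_BITS = 17


def align_17bit(x: float) -> int:
    """Step 302: first difference as an unsigned 17-bit fixed-point integer (floored)."""
    q = math.floor((x - X_INIT) * (1 << FRAC_BITS))
    if not 0 <= q < (1 << TOTAL_BITS):
        raise ValueError("first difference outside the 17-bit fixed-point range")
    return q


def lut_first_address(q: int) -> int:
    """Step 303: top 6 bits (the multiple of 2**-K) followed by two '0' bits."""
    multiple = q >> (TOTAL_BITS - 6)
    return multiple << 2


def second_difference(x: float, q: int) -> float:
    """Step 304: first difference minus multiple * 2**-K."""
    multiple = q >> (TOTAL_BITS - 6)
    return (x - X_INIT) - multiple * 2.0 ** -K


q = align_17bit(0.325)
print(q, lut_first_address(q))  # 4300 8
```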
In addition, in this embodiment, the first difference between the first argument and the initial value is determined, and the corresponding target value is determined according to the multiple of $2^{-K}$ contained in the first difference. Specifically, the multiple of $2^{-K}$ in the first difference is obtained first, and the product of the multiple and $2^{-K}$ is added to the initial value to obtain the target value, that is, $x_0$ = initial value + multiple $\cdot 2^{-K}$. When the multiple obtained is not an integer, the largest integer not greater than it is taken as the multiple.
Step 104: Read the coefficient values and the compensation value in the register set according to the address.
Once the address of the register set is determined, the register set can be accessed directly to obtain the data stored in its 4 registers, namely the coefficient values and the compensation value.
For example, when the address indicates the second register set, the values stored in its 4 registers are read in sequence, namely $\tanh(0.0625+2^{-3})$, $\tanh'(0.0625+2^{-3})$, $\tanh''(0.0625+2^{-3})/2$ and $\Delta_{max}(0.0625+2^{-3})$. The first three are coefficient values, and $\Delta_{max}(0.0625+2^{-3})$ is the compensation value.
Step 105: Determine the sum of the first M terms from the first argument, the target value and the coefficient values; and determine an approximation of the remainder from the first argument, the target value and the compensation value.
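For the Tanh function, the three coefficient registers of a set can be filled using the standard identities $\tanh' = 1-\tanh^2$ and $\tanh'' = -2\tanh\,(1-\tanh^2)$. The following is a hedged sketch of that initialization for one register set (the compensation register is left out here); the function name is hypothetical.

```python
import math

X_INIT = 0.0625
K = 3


def tanh_coefficients(index: int) -> tuple[float, float, float]:
    """Registers A, B, C of set `index`: tanh(x0), tanh'(x0), tanh''(x0)/2."""
    x0 = X_INIT + index * 2.0 ** -K
    t = math.tanh(x0)
    d1 = 1.0 - t * t          # tanh'(x0) = 1 - tanh(x0)^2
    half_d2 = -t * d1         # tanh''(x0)/2 = -tanh(x0) * (1 - tanh(x0)^2)
    return t, d1, half_d2


# Second register set (index 1): target value 0.0625 + 2**-3 = 0.1875
print(tanh_coefficients(1))
```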
For ease of understanding, determining the sum of the first M terms from the first argument, the target value and the coefficient values is described first.
As can be seen from the Taylor expansion, its terms contain the powers $(x-x_0)^0$, $(x-x_0)$ and $(x-x_0)^2$.
Therefore, the second difference between the first argument and the target value, i.e., $(x-x_0)$, needs to be determined, and the sum of the first M terms is determined from the second difference and the coefficient values.
Specifically, the sum of the first M terms is obtained by the following formula:
$$f_{lut}(x)=\sum_{m=0}^{M-1}\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m$$
where $f_{lut}(x)$ is the sum of the first M terms, $x_0$ is the target value, $(x-x_0)$ is the second difference, and $\frac{f^{(m)}(x_0)}{m!}$ is the m-th of the M coefficient values. In this embodiment, taking the Tanh activation function as an example with M = 3, the above formula becomes:
$$\tanh_{lut}(x)=\tanh(x_0)+\tanh'(x_0)(x-x_0)+\frac{\tanh''(x_0)}{2}(x-x_0)^2$$
from which the sum of the first M terms of the Taylor expansion can be determined.
Further, $(x-x_0)$ is the second difference. As shown in step 103, $x_0$ = initial value + multiple $\cdot 2^{-K}$, so the second difference = $x$ - (initial value + multiple $\cdot 2^{-K}$) = first difference - multiple $\cdot 2^{-K}$. Because the first difference has already been obtained, the second difference can be computed directly by subtracting the product of the multiple and $2^{-K}$ from it, which simplifies the calculation.
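A minimal sketch of the summation for Tanh with M = 3, computing the second difference as first difference minus multiple $\cdot 2^{-K}$ as described above. It evaluates the terms directly rather than modelling any particular hardware ordering, and the names are illustrative assumptions.

```python
import math

X_INIT = 0.0625
K = 3


def taylor_prefix(coefficients, second_difference):
    """f_lut(x) = sum over m of coefficients[m] * (x - x0)**m, for m = 0 .. M-1."""
    return sum(c * second_difference ** m for m, c in enumerate(coefficients))


def tanh_lut(x: float) -> float:
    first_difference = x - X_INIT
    multiple = math.floor(first_difference / 2.0 ** -K)
    second = first_difference - multiple * 2.0 ** -K    # no need to form x0 explicitly
    x0 = X_INIT + multiple * 2.0 ** -K                  # only used to fill the coefficients
    t = math.tanh(x0)
    coefficients = (t, 1.0 - t * t, -t * (1.0 - t * t))  # tanh, tanh', tanh''/2 (M = 3)
    return taylor_prefix(coefficients, second)


print(tanh_lut(0.325), math.tanh(0.325))
```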
The determination of the approximation of the remainder from the first argument, the target value and the compensation value is described below.
A compensation coefficient is determined from the first argument and the target value, and the approximation of the remainder is determined from the compensation value and the compensation coefficient,
where the compensation coefficient is not positively correlated with M.
Specifically, the compensation coefficient is as follows:
where N is a positive integer, θ is a non-negative integer with $\theta \le 2^N-1$, and θ is positively correlated with the multiple.
This embodiment does not limit the specific value of N; N = 5 is used as an example for ease of understanding. When N = 5 and M = 3, the compensation coefficient takes a finite number of values, namely 32. When 32 values are used, the 32nd value is 1; in another case, 31 values may be used. The compensation coefficients may also be stored in registers: either one register is added to each register set, so that each set comprises 5 registers, the additional one storing a compensation coefficient; or a new register table is created and the compensation coefficients are stored separately in it for look-up. In other embodiments, the compensation coefficient may be calculated in real time once the 17-bit fixed-point number has been obtained, without being stored in a register. This embodiment is not limited thereto; the above are only alternatives.
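The exact formula for the compensation coefficient is not reproduced in this text. The sketch below assumes, purely for illustration, the form $c(\theta) = (\theta/(2^N-1))^M$: it satisfies $0 \le \theta \le 2^N-1$, equals 1 at the 32nd value when N = 5, does not grow with M, and follows the roughly $(x-x_0)^M$ growth of the remainder after M terms. Treat both the form and the function name as hypothetical.

```python
N = 5
M = 3


def compensation_table(n: int = N, m: int = M) -> list[float]:
    """Hypothetical compensation coefficients c(theta) = (theta / (2**n - 1)) ** m."""
    top = 2 ** n - 1
    return [(theta / top) ** m for theta in range(2 ** n)]


table = compensation_table()
print(len(table), table[0], table[-1])  # 32 0.0 1.0
```

Storing this table once (32 registers) or computing it on the fly from θ are both consistent with the alternatives listed above.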
After the compensation coefficients are obtained, an approximation of the remainder can be determined from the compensation coefficients.
The approximation of the remainder is obtained by the following formula:
where Δ(x) is the approximation of the remainder and $\Delta_{max}(x)$ is the compensation value stored in the single compensation register of the set.
To calculate the remainder of the Taylor expansion, this embodiment defines:
where $\Delta_{max}(x)$ is the compensation value for the remainder approximation, stored in a register; $x_i$ is the target value corresponding to the first argument x; and $\Delta x_{max}$ is the maximum distance between the argument x and its target value, which can be approximated by $2^{-K}-2^{-K-N}$. In this embodiment, K = 3 and N = 5, so $\Delta x_{max}=2^{-3}-2^{-8}$.
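The defining formula for $\Delta_{max}$ is also missing from this text. A natural reading, used here only as an assumption, is that $\Delta_{max}(x_i)$ is the exact remainder at the farthest point of the interval, $f(x_i+\Delta x_{max}) - f_{lut}(x_i+\Delta x_{max})$. The sketch computes $\Delta x_{max}$ for K = 3, N = 5 and that assumed compensation value for Tanh with M = 3; names are hypothetical.

```python
import math

X_INIT = 0.0625
K = 3
N = 5
DX_MAX = 2.0 ** -K - 2.0 ** -(K + N)   # 2**-3 - 2**-8 = 31/256


def tanh_prefix(x0: float, dx: float) -> float:
    """First M = 3 Taylor terms of tanh about x0, evaluated at x0 + dx."""
    t = math.tanh(x0)
    d1 = 1.0 - t * t
    return t + d1 * dx - t * d1 * dx * dx


def delta_max(index: int) -> float:
    """Assumed compensation value: exact remainder at x0 + DX_MAX."""
    x0 = X_INIT + index * 2.0 ** -K
    return math.tanh(x0 + DX_MAX) - tanh_prefix(x0, DX_MAX)


print(DX_MAX, delta_max(0))
```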
Referring to fig. 4, a logic diagram of a further computation performed by the computer according to the present application is shown.
In the figure, the last 11 bits are selected from the 17-bit number shown in 302, and the first N of these 11 bits are defined as θ. For example, when N = 5, the first 5 bits of the last 11 bits of the 17-bit fixed-point number form θ, as shown in part 401. Once the 17-bit number is determined, the sum of the first M terms and the approximation of the remainder are uniquely determined.
In the figure, the compensation coefficients are stored separately in one register table, so only 32 additional registers are needed.
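The field split described for fig. 4 can be sketched as follows, assuming the same 17-bit layout (floored first difference, 14 fractional bits) and N = 5. The function name is hypothetical; it returns the 6-bit multiple, θ (the first N of the last 11 bits) and the last 11 bits themselves, which encode $(x-x_0)$ in units of $2^{-14}$.

```python
import math

X_INIT = 0.0625
FRAC_BITS = 14
N = 5


def split_fields(x: float) -> tuple[int, int, int]:
    """Split the 17-bit fixed-point first difference into (multiple, theta, low11)."""
    q = math.floor((x - X_INIT) * (1 << FRAC_BITS))
    multiple = q >> 11            # first 6 bits
    low11 = q & 0x7FF             # last 11 bits: (x - x0) in units of 2**-14
    theta = low11 >> (11 - N)     # first N bits of the last 11 bits
    return multiple, theta, low11


print(split_fields(0.325))  # (2, 3, 204)
```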
Step 106: Determine the function value of the activation function at the first argument from the sum of the first M terms and the approximation of the remainder.
After the sum of the first M terms and the approximation of the remainder are obtained, the function value of the activation function at the first argument is obtained by adding the two.
Specifically, the function value of the activation function at the first argument is obtained from the sum of the first M terms and the approximation by the following formula:
$$f(x)=f_{lut}(x)+\Delta(x)$$
where f(x) is the function value of the activation function at the first argument.
When the activation function is the Tanh function, the above formula becomes:
$$\tanh(x)=\tanh_{lut}(x)+\Delta(x)$$
where tanh(x) is the function value of the activation function at the first argument.
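Putting steps 104 to 106 together, the sketch below approximates tanh(x) for M = 3, K = 3, N = 5 and compares the result with `math.tanh`. Several pieces are assumptions, not taken from the text: the preset domain [0.0625, 5.0625) (hence 40 register sets), the compensation coefficient form $(\theta/(2^N-1))^M$ and the definition of $\Delta_{max}$ as the exact remainder at $x_0+\Delta x_{max}$. Names are hypothetical.

```python
import math

X_INIT, K, N, M = 0.0625, 3, 5, 3
FRAC_BITS = 14
DX_MAX = 2.0 ** -K - 2.0 ** -(K + N)


def coefficients(x0):
    t = math.tanh(x0)
    d1 = 1.0 - t * t
    return (t, d1, -t * d1)                      # tanh, tanh', tanh''/2


def prefix(coeffs, dx):
    return sum(c * dx ** m for m, c in enumerate(coeffs))


# Initialization: 40 register sets (A, B, C, D) covering the assumed domain
LUT = []
for i in range(40):
    x0 = X_INIT + i * 2.0 ** -K
    coeffs = coefficients(x0)
    d = math.tanh(x0 + DX_MAX) - prefix(coeffs, DX_MAX)   # assumed Δmax
    LUT.append(coeffs + (d,))
COMP = [(theta / (2 ** N - 1)) ** M for theta in range(2 ** N)]  # assumed form


def tanh_approx(x, compensate=True):
    q = math.floor((x - X_INIT) * (1 << FRAC_BITS))   # 17-bit first difference
    multiple, low11 = q >> 11, q & 0x7FF
    theta = low11 >> (11 - N)
    a, b, c, d = LUT[multiple]
    dx = low11 * 2.0 ** -FRAC_BITS                    # second difference x - x0
    f_lut = a + b * dx + c * dx * dx
    return f_lut + (d * COMP[theta] if compensate else 0.0)   # f(x) = f_lut(x) + Δ(x)


print(tanh_approx(0.325), math.tanh(0.325))
```

Calling `tanh_approx(x, compensate=False)` drops Δ(x), which makes the effect of the remainder compensation easy to compare.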
The solution described in the above embodiment takes M = 3 as an example; M may also be 2, which is described below.
When M = 2, the quadratic term $\frac{\tanh''(x_0)}{2}(x-x_0)^2$ of the first M terms is moved into the remainder, so each register set uses one register fewer, saving 40 registers in total.
Because the approximation of the remainder is corrected using the compensation value and the compensation coefficient, 40 registers are saved while the error still meets the requirement, and the hardware area is reduced by approximately 20%. In addition, when M = 2, the two multiplications of the quadratic term and one addition in the summation are eliminated, reducing the calculation delay by about 1/3, which is a clear benefit.
The specific calculation process is similar to that for M = 3 and is not repeated here.
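The register saving follows from simple counting. This sketch assumes the preset domain is [0.0625, 5.0625), taken from the thresholds of steps 503 and 505, which gives 40 register sets; it counts only the LUT registers, since the quoted area figure also depends on hardware not modelled here.

```python
DOMAIN_START, DOMAIN_END = 0.0625, 5.0625   # assumed preset domain (steps 503 and 505)
K = 3
SETS = int((DOMAIN_END - DOMAIN_START) / 2.0 ** -K)   # 40 register sets


def lut_registers(m: int) -> int:
    """M coefficient registers plus one compensation-value register per set."""
    return SETS * (m + 1)


print(SETS, lut_registers(3), lut_registers(2), lut_registers(3) - lut_registers(2))  # 40 160 120 40
```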
The Taylor polynomial also contains $(x-x_0)^0$, $(x-x_0)$ and $(x-x_0)^2$; in some embodiments, $(x-x_0)$ is calculated by shifting.
Specifically, the last 11 bits of the 17-bit fixed-point number are determined as described above. If these 11 bits begin with one or more '0's, they are shifted left by n bits, where n is the number of leading '0's, and n '0's are appended at the end. For example, when the 11-bit binary number is 00110011001B, it begins with two consecutive '0's, so it is shifted left by 2 bits (n = 2) to give 110011001B, and two '0's are appended to obtain 11001100100B. Removing the leading bit on the left leaves the last 10 bits, 1001100100B, which form the fraction (the fractional part of the 16-bit half-precision floating-point number). Then 14 - n - K is calculated as the exponent (the exponent part of the 16-bit half-precision floating-point number); here n = 2 and K = 3, so 14 - n - K = 9, whose binary form is 1001B. Since 1001B has fewer than 5 bits, it is padded on the left with '0' to 5 bits, giving the exponent 01001B. Because the target value corresponding to the first argument in this embodiment is not greater than the first argument, $(x-x_0)$ is never negative; the sign part of the 16-bit half-precision floating-point number can therefore be ignored in the calculation and defaults to 0B. Finally, the three parts are concatenated to obtain the 16-bit value of $(x-x_0)$, namely 0 01001 1001100100B.
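The shift-based conversion can be checked against Python's built-in IEEE half-precision support (`struct` format `'e'`). This sketch assumes the 17-bit layout above (14 fractional bits, K = 3) and the standard half-precision exponent bias of 15; treating an all-zero field as +0.0 is an added assumption, since the text does not discuss that case. Names are hypothetical.

```python
import struct

K = 3


def low11_to_half_bits(r: int) -> int:
    """Convert the last 11 bits (x - x0 in units of 2**-14) to half-precision bits."""
    if not 0 <= r < (1 << 11):
        raise ValueError("expected an 11-bit value")
    if r == 0:
        return 0                          # assumption: x - x0 == 0 encodes as +0.0
    n = 11 - r.bit_length()               # number of leading '0's in the 11-bit field
    shifted = (r << n) & 0x7FF            # shift left by n, padding n '0's on the right
    fraction = shifted & 0x3FF            # drop the leading '1': last 10 bits
    exponent = 14 - n - K                 # biased exponent; sign bit stays 0
    return (exponent << 10) | fraction


def half_bits_to_float(bits: int) -> float:
    return struct.unpack("<e", struct.pack("<H", bits))[0]


bits = low11_to_half_bits(0b00110011001)
print(f"{bits:016b}")  # 0010011001100100
```

Because every 11-bit value has at most 11 significant bits, it fits exactly into the 11-bit half-precision significand, so the conversion is lossless.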
In addition, for M = 2, an embodiment of the present application further provides another computation logic diagram, shown in fig. 5A, which, compared with fig. 4, needs only one register table, including for determining the compensation coefficient. In the figure, the last 11 bits are selected from the 17-bit number obtained after fixed-point alignment, as shown in 302, and the first 5 bits of these 11 bits are directly followed by '11' to form the address of the compensation coefficient in the register table, as shown in 701. It should be noted that, because θ cannot exceed 31, addresses formed in this way cover only the first 32 register sets; the fourth slot of any register set whose index is greater than 31 is left empty. Compared with using two register tables, merging them into one reduces the number of register tables and avoids wasting the register addresses that would otherwise be empty when M = 2. The sum of the first M terms is determined in a manner similar to 301 to 304, except that, when M = 2, only 3 values of the register set are read consecutively, for example tanh(0.0625), tanh'(0.0625) and $\Delta_{max}(0.0625)$ in fig. 5A, while the fourth value, '0', is read separately when the compensation coefficient is determined.
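A sketch of the merged table for M = 2: each set keeps tanh, tanh' and $\Delta_{max}$ in its first three slots, and the fourth slot of set θ holds the compensation coefficient for θ, addressed by θ followed by '11'. It assumes 40 sets, the coefficient form $(\theta/(2^N-1))^M$ and $\Delta_{max}$ as the exact remainder at $x_0+\Delta x_{max}$; all names are hypothetical.

```python
import math

X_INIT, K, N, M = 0.0625, 3, 5, 2
SETS = 40
DX_MAX = 2.0 ** -K - 2.0 ** -(K + N)


def build_combined_table() -> list:
    """One 160-slot table: per set [tanh, tanh', Δmax, compensation coefficient]."""
    table = [None] * (SETS * 4)
    for i in range(SETS):
        x0 = X_INIT + i * 2.0 ** -K
        t = math.tanh(x0)
        d1 = 1.0 - t * t
        table[4 * i] = t
        table[4 * i + 1] = d1
        table[4 * i + 2] = math.tanh(x0 + DX_MAX) - (t + d1 * DX_MAX)   # assumed Δmax, M = 2
    for theta in range(2 ** N):
        table[(theta << 2) | 0b11] = (theta / (2 ** N - 1)) ** M        # assumed c(θ)
    return table


def coefficient_address(multiple: int) -> int:
    return multiple << 2            # first 6 bits followed by "00"


def compensation_address(theta: int) -> int:
    return (theta << 2) | 0b11      # first 5 bits of the last 11 bits followed by "11"


table = build_combined_table()
print(table[coefficient_address(0):coefficient_address(0) + 4])
```

The printed slice corresponds to the fig. 5A example: tanh(0.0625), tanh'(0.0625), $\Delta_{max}(0.0625)$ and the coefficient 0 for θ = 0.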
The hardware-chip-based method for processing the activation function has been described above; its internal processing logic is described below from the perspective of the integrated circuit.
Referring to fig. 5B, a processing logic diagram of an integrated circuit according to the present application is shown.
The processing logic may be executed by a processor of the integrated circuit, such as a CPU, and comprises the following steps:
Step 501: Obtain the first argument x.
For example, suppose the value of the Tanh function is to be calculated at x = 0.325. Inside the processor, x is a binary number represented as a 16-bit half-precision floating-point number.
Step 502: Store the leftmost bit of x in a variable, denoted S for ease of understanding, and then set this bit to 0B.
The leftmost bit is the sign bit. Because tanh is an odd function, tanh(x) has the same sign as x; the sign bit can therefore default to 0B during the intermediate calculation of tanh(x), and the sign bit of x is finally assigned to tanh(x) to obtain the final result.
Step 503: Determine whether x ≤ 0.0625. If so, go to step 504; if not, go to step 505.
Step 504: When x ≤ 0.0625, the Tanh function takes the value x, so x is returned.
Step 505: Determine whether x ≥ 5.0625. If so, go to step 506; if not, go to step 507.
Step 506: When x ≥ 5.0625, the Tanh function takes the value 1, so 1 is returned.
Step 507: Calculate x - 0.0625. For the specific calculation, refer to the manner described in the above embodiments; it is not repeated here.
Step 508: Perform 17-bit fixed-point alignment.
Step 509: Take the first 6 bits and append 00B on the right to form the first address of the register set, from which the coefficient values and the compensation value are obtained.
Step 510: Take bits 7 to 11 to determine the compensation coefficient.
Step 511: Use the last 11 bits to determine $x-x_0$.
For the specific implementation of steps 508 to 511, refer to the description in the above embodiments; it is not repeated here. Steps 509, 510 and 511 may be performed in any order.
Step 512: Calculate the 16-bit half-precision floating-point value of the Tanh function.
Step 513: Set the leftmost bit to S.
Step 514: Output the value of the Tanh function.
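Steps 501 to 514 can be sketched end to end on 16-bit half-precision bit patterns, using Python's `struct` format `'e'` for the half-precision encoding. As in the earlier sketches, the 40-set domain, the compensation coefficient form $(\theta/(2^N-1))^M$ and the definition of $\Delta_{max}$ are assumptions, and the names are hypothetical; the step numbers in the comments map each line back to fig. 5B.

```python
import math
import struct

X_INIT, K, N, M = 0.0625, 3, 5, 3
FRAC_BITS = 14
DX_MAX = 2.0 ** -K - 2.0 ** -(K + N)


def to_half_bits(v: float) -> int:
    return struct.unpack("<H", struct.pack("<e", v))[0]


def from_half_bits(b: int) -> float:
    return struct.unpack("<e", struct.pack("<H", b))[0]


def _register_set(i):
    """Hypothetical initialization of set i: tanh, tanh', tanh''/2, assumed Δmax."""
    x0 = X_INIT + i * 2.0 ** -K
    t = math.tanh(x0)
    d1 = 1.0 - t * t
    dx = DX_MAX
    return t, d1, -t * d1, math.tanh(x0 + dx) - (t + d1 * dx - t * d1 * dx * dx)


LUT = [_register_set(i) for i in range(40)]
COMP = [(th / (2 ** N - 1)) ** M for th in range(2 ** N)]   # assumed coefficient form


def tanh_half(x_bits: int) -> int:
    s = x_bits >> 15                       # step 502: store the sign bit S ...
    x = from_half_bits(x_bits & 0x7FFF)    # ... and continue with sign bit 0B
    if x <= 0.0625:                        # steps 503 and 504
        y = x
    elif x >= 5.0625:                      # steps 505 and 506
        y = 1.0
    else:
        q = math.floor((x - X_INIT) * (1 << FRAC_BITS))   # steps 507 and 508
        a, b, c, d = LUT[q >> 11]                          # step 509: first 6 bits
        theta = (q >> 6) & 0x1F                            # step 510: bits 7 to 11
        dx = (q & 0x7FF) * 2.0 ** -FRAC_BITS               # step 511: last 11 bits
        y = a + b * dx + c * dx * dx + d * COMP[theta]     # step 512
    return (s << 15) | to_half_bits(y)     # step 513: restore the sign bit S


print(from_half_bits(tanh_half(to_half_bits(0.325))))      # step 514
```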
In this embodiment, the activation function processing method includes: performing an initialization configuration of the registers, comprising: performing a Taylor expansion of the activation function at a target value; obtaining the coefficient value of each of the first M terms of the Taylor expansion and the compensation value for the approximation of the remainder; and storing the coefficient values and the compensation value in the register set corresponding to the target value; obtaining a first argument in a preset domain of the activation function; and determining a first difference between the first argument and an initial value, and determining the address of the register set and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference. Each register set includes M + 1 registers: M registers that respectively store the coefficient values of the first M terms, and 1 register that stores the compensation value used to calculate the approximation of the remainder. Because the first argument comes from a preset domain interval that is smaller than the interval from minus infinity to plus infinity, fewer registers are required and the cost is reduced. The target values in adjacent register sets differ by $2^{-K}$, where K and M are positive integers; therefore, when the computer performs binary operations, the multiple of $2^{-K}$ can be read directly from the bits of the first difference, without additional addition, subtraction or conditional (if) checks, which simplifies the calculation.
The coefficient values and the compensation value in the register set are then read according to the address; the sum of the first M terms is determined from the first argument, the target value and the coefficient values; the approximation of the remainder is determined from the first argument, the target value and the compensation value; and the function value of the activation function at the first argument is determined from the sum of the first M terms and the approximation of the remainder. Therefore, the technical solution provided by the present application can obtain an accurate calculation result of the activation function at low cost and low complexity.
Embodiment two:
The second embodiment of the present application provides a hardware-chip-based activation function processing apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 6, a schematic diagram of a hardware-chip-based activation function processing apparatus according to the present application is shown.
The apparatus comprises: a configuration module 601, a determination module 602, a reading module 603 and a calculation module 604.
The configuration module 601 is configured to perform an initialization configuration of the registers: perform a Taylor expansion of the activation function at a target value; obtain the coefficient value of each of the first M terms of the Taylor expansion and the compensation value for the approximation of the remainder; and store the coefficient values and the compensation value in the register set corresponding to the target value.
The determination module 602 is configured to obtain a first argument in a preset domain of the activation function, determine a first difference between the first argument and the initial value, and determine the address of the register set and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference.
The reading module 603 is configured to read the coefficient values and the compensation value in the register set according to the address.
The calculation module 604 is configured to determine the sum of the first M terms from the first argument, the target value and the coefficient values; determine an approximation of the remainder from the first argument, the target value and the compensation value; and determine the function value of the activation function at the first argument from the sum of the first M terms and the approximation of the remainder.
Optionally, the calculation module 604 is specifically configured to determine a compensation coefficient from the first argument and the target value, and determine the approximation of the remainder from the compensation value and the compensation coefficient,
where the compensation coefficient is not positively correlated with M.
Optionally, the compensation coefficient is as follows:
where N is a positive integer, θ is a non-negative integer with $\theta \le 2^N-1$, and θ is positively correlated with the multiple.
Optionally, the calculation module 604 is specifically configured to obtain the approximation of the remainder by the following formula:
where Δ(x) is the approximation of the remainder and $\Delta_{max}(x)$ is the compensation value stored in the single compensation register of the set.
Optionally, the calculation module 604 is specifically configured to determine a second difference between the first argument and the target value, and determine the sum of the first M terms from the second difference and the coefficient values.
Optionally, the calculation module 604 is specifically configured to obtain the sum of the first M terms by the following formula:
$$f_{lut}(x)=\sum_{m=0}^{M-1}\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m$$
where $f_{lut}(x)$ is the sum of the first M terms, $x_0$ is the target value, $(x-x_0)$ is the second difference, and $\frac{f^{(m)}(x_0)}{m!}$ is the m-th of the M coefficient values.
Optionally, the calculation module 604 is specifically configured to determine the function value of the activation function at the first argument from the sum of the first M terms and the approximation, by the following formula:
$$f(x)=f_{lut}(x)+\Delta(x)$$
where f(x) is the function value of the activation function at the first argument.
In this embodiment, the activation function processing apparatus comprises a configuration module, a determination module, a reading module and a calculation module. The configuration module is configured to perform an initialization configuration of the registers: perform a Taylor expansion of the activation function at a target value; obtain the coefficient value of each of the first M terms of the Taylor expansion and the compensation value for the approximation of the remainder; and store the coefficient values and the compensation value in the register set corresponding to the target value. The determination module is configured to obtain a first argument in a preset domain of the activation function, determine a first difference between the first argument and an initial value, and determine the address of the register set and the corresponding target value according to the multiple of $2^{-K}$ contained in the first difference. The Taylor expansion of the activation function at the target value comprises the first M terms and a remainder. Each register set comprises M + 1 registers: M registers that respectively store the coefficient values of the first M terms, and 1 register that stores the compensation value used to calculate the approximation of the remainder. Because the first argument comes from a preset domain interval that is smaller than the interval from minus infinity to plus infinity, fewer registers are required and the cost is reduced. The target values in adjacent register sets differ by $2^{-K}$, where K is an integer and M is a positive integer; therefore, when the computer performs binary operations, the multiple of $2^{-K}$ can be read directly from the bits of the first difference, without additional addition, subtraction or conditional (if) checks, which simplifies the calculation.
The reading module is configured to read the coefficient values and the compensation value in the register set according to the address; the calculation module is configured to determine the sum of the first M terms from the first argument, the target value and the coefficient values, determine the approximation of the remainder from the first argument, the target value and the compensation value, and determine the function value of the activation function at the first argument from the sum of the first M terms and the approximation of the remainder. Therefore, the technical solution provided by the present application can obtain an accurate calculation result of the activation function at low cost and low complexity.
Embodiment three:
The third embodiment of the present application further provides an integrated circuit, which comprises the apparatus described in the above embodiment.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and integrated circuit embodiments are substantially similar to the method embodiment and are therefore described briefly; for relevant details, refer to the description of the method embodiment. The apparatus and integrated circuit embodiments described above are merely illustrative: units and modules described as separate components may or may not be physically separate. Some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be singular or plural.
The above are merely preferred embodiments of the present application and are not intended to limit the present application in any form. Although the present application has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present application, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change or modification made to the above embodiments according to the technical substance of the present application, without departing from the content of its technical solution, still falls within the scope of the technical solution of the present application.