WO2018196750A1 - Device for processing multiply-add operations and method for processing multiply-add operations - Google Patents
- Publication number
- WO2018196750A1 (application PCT/CN2018/084275)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- log
- data
- adder
- value
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
Definitions
- The present application relates to the field of computers, and more particularly to an apparatus and a method for processing multiply-add operations.
- A computer often uses multiply-add operations when processing input data.
- When a computer performs a multiply-add operation, it first multiplies the input data and then adds the resulting products. Since the input data is generally in the linear domain, where data occupies a relatively large bit width (for example, 32 bits), the computer needs considerable resources to perform multiply-add operations.
- In addition, a multiply-add operation includes a large number of multiplications. Multiplication is computationally expensive and relatively slow, so the computer's efficiency on multiply-add operations is low.
- The prior art proposes a scheme for processing multiply-add operations that converts input data in the linear domain into data in the logarithmic domain, thereby converting multiplications in the linear domain into additions in the logarithmic domain.
- Taking the logarithm reduces the bit width occupied by the data (for example, 32-bit original data may occupy only 5 bits after the logarithm is taken). Converting multiplications in the linear domain into additions in the logarithmic domain also increases computational efficiency.
- However, the above scheme still needs to convert the data in the logarithmic domain back into the linear domain and add the resulting linear-domain data to obtain the final multiply-add result.
- As a result, the computer still needs considerable resources when performing the addition.
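- The prior-art scheme described above can be sketched as follows. This is a minimal floating-point illustration, not the circuit itself; the function name `mac_via_log_domain` and the choice of base 2 are assumptions made for the example.

```python
import math

def mac_via_log_domain(pairs, base=2):
    """Prior-art style multiply-add: products via log-domain addition,
    followed by accumulation in the linear domain."""
    total = 0.0
    for x, y in pairs:
        # Multiplication in the linear domain becomes addition of logarithms.
        log_product = math.log(x, base) + math.log(y, base)
        # Each product is converted back to the linear domain before it is
        # accumulated, so the additions still operate on wide values.
        total += base ** log_product
    return total

result = mac_via_log_domain([(3.0, 4.0), (5.0, 6.0)])  # approximates 3*4 + 5*6
```

- The accumulation line is the step the present application targets: it is the only place where wide linear-domain values are still added.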
- The present application provides an apparatus and a method for processing multiply-add operations that reduce computational power consumption.
- An apparatus for processing a multiply-add operation is provided, comprising: a first adder configured to add input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a(A) and log_a(B), respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithm of first original data A and second original data B, respectively, of a plurality of original data; a second adder configured to add input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a(C) and log_a(D), respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithm of third original data C and fourth original data D, respectively, of the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and a logarithmic adder whose input port is connected to the output ports of the first adder and the second adder, configured to obtain a^(n-m) from the m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D); wherein the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.
- The first adder, the second adder, and the logarithmic adder may be implemented using various hardware circuits such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
- In the present application, operations on high-bit-width data are converted into operations on low-bit-width data, which reduces resource usage during computation.
- Specifically, the addition of the higher-bit-width data a^m and a^n is converted into the addition of the lower-bit-width data m and a^(n-m). This avoids the use of a high-bit-width adder, which reduces the area of the computing chip and the computational power consumption. It should also be understood that A, B, C, and D above are all real numbers greater than zero.
- Approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) may mean taking the sum of m and a^(n-m) as an approximate value of (log_e a)*log_a(A*B+C*D).
- The logarithmic adder may be further configured to obtain a^(n-m) from the m and n input by the first adder and the second adder, and to approximately determine the sum of m and -a^(n-m) as the value of (log_e a)*log_a(A*B-C*D).
- The above multiply-add operation is a generalized multiply-add operation, which may include addition between products or subtraction between products.
- For example, the multiply-add operation may include A*B+C*D or A*B-C*D.
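- The approximation above rests on ln(1 + x) being close to x for small x, with x = a^(n-m). The sketch below compares the exact value with that first-order form. Note that the exact identity is ln(a^m + a^n) = m*ln(a) + ln(1 + a^(n-m)); the sketch keeps the ln(a) factor on m, whereas the text writes the sum simply as m + a^(n-m). That scaling and the function names are assumptions of this example.

```python
import math

def ln_sum_exact(m, n, a=2, subtract=False):
    """ln(a^m + a^n), or ln(a^m - a^n) when subtract is True."""
    second = -(a ** n) if subtract else a ** n
    return math.log(a ** m + second)

def ln_sum_first_order(m, n, a=2, subtract=False):
    """First-order form using ln(1 + x) ~ x with x = +/- a^(n-m), m >= n."""
    if m < n:
        raise ValueError("expects m >= n")
    x = a ** (n - m)
    # Only the narrow values m and a^(n-m) are added; a^m and a^n are never formed.
    return m * math.log(a) + (-x if subtract else x)

error_add = abs(ln_sum_exact(10, 5) - ln_sum_first_order(10, 5))
error_sub = abs(ln_sum_exact(10, 5, subtract=True)
                - ln_sum_first_order(10, 5, subtract=True))
```

- For m = 10 and n = 5 both errors are on the order of x^2/2, which is the leading term that an error compensation value would later correct.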
- The logarithmic adder being configured to obtain a^(n-m) from the m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- The first precision may be preset. When the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered low.
- By comparing the target accuracy with the first precision, the accuracy requirement for processing the original data can be determined.
- When the accuracy requirement is low, m+a^(n-m) can be directly and approximately determined as the value of (log_e a)*log_a(A*B+C*D). The value of (log_e a)*log_a(A*B+C*D) can thus be determined flexibly according to the accuracy requirement for processing the original data, which meets that requirement while improving computing efficiency.
- The logarithmic adder is specifically configured to: determine an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, K and L being integers greater than 1 and L governing the number of terms in the error compensation term; and approximately determine the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- Taking the error compensation value of a^(n-m) into account when determining the value of (log_e a)*log_a(A*B+C*D) further improves the calculation accuracy.
- The logarithmic adder approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- When the target accuracy is higher than the second precision, the accuracy required for processing the original data may be considered high. In this case, the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value.
- The second precision may be the same as the first precision, or the second precision may be greater than the first precision.
- K is determined based on the target accuracy.
- When the target accuracy is high, K can be a large value, and when the target accuracy is low, K can be a small value.
- L is determined based on the target accuracy.
- The larger the value of L, the more terms the error compensation term contains and the more accurate the error compensation value obtained from it. Therefore, when the target accuracy is high, L can be a large value, and when the target accuracy is low, L can be a smaller value.
- The logarithmic adder specifically includes: a shift circuit configured to perform a shift operation on a according to n-m to obtain a^(n-m); and a sub-addition circuit configured to add m and a^(n-m) to obtain m+a^(n-m).
- The logarithmic adder further includes: a subtraction circuit configured to subtract m and n to obtain m-n or n-m; a comparison circuit configured to compare m-n or n-m with zero; and a selection circuit configured to select m and n-m when m-n is greater than or equal to zero, or to select m and n-m when n-m is less than or equal to zero.
- The apparatus further comprises: a converter configured to approximately determine the value of A*B+C*D according to the approximate value of (log_e a)*log_a(A*B+C*D), wherein the converter is implemented by a hardware circuit.
- The apparatus further includes: a quantizer configured to quantize the value of A*B+C*D to a preset data bit width.
- A method for processing a multiply-add operation is provided, comprising: adding input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a(A) and log_a(B), respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithm of first original data A and second original data B, respectively, of a plurality of original data; adding third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a(C) and log_a(D), respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithm of third original data C and fourth original data D, respectively, of the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and obtaining a^(n-m) from the m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- Obtaining a^(n-m) from the m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- The method further comprises: determining an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, K and L being integers greater than 1 and L governing the number of terms in the error compensation term; and approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- Approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- K is determined based on the target accuracy.
- Obtaining a^(n-m) from the m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: performing a shift operation on a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m+a^(n-m).
- Obtaining a^(n-m) from the m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), further includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and selecting m and n-m when m-n is greater than or equal to zero, or selecting m and n-m when n-m is less than or equal to zero.
- The method further comprises: approximately determining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D), wherein the conversion is performed by a converter implemented by a hardware circuit.
- The method further comprises: quantizing the value of A*B+C*D to a preset data bit width.
- FIG. 1 is a schematic flowchart of a method for processing a multiply-add operation in the prior art.
- FIG. 2 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application.
- FIG. 3 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application.
- FIG. 4 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application.
- FIG. 5 is a schematic flowchart of a method for processing a multiply-add operation according to an embodiment of the present application.
- FIG. 6 is a schematic flowchart of a method for processing a multiply-add operation according to an embodiment of the present application.
- FIG. 1 shows a schematic flowchart of a method for processing a multiply-add operation in the prior art.
- As shown in FIG. 1, four multipliers (a first multiplier, a second multiplier, a third multiplier, and a fourth multiplier) each multiply one of four pairs of data to obtain four 32-bit data. The first adder and the second adder then add the four 32-bit data output by the four multipliers to obtain two 32-bit data. The third adder adds the two 32-bit data output by the first adder and the second adder to obtain one 32-bit datum, which is finally quantized to obtain 16-bit data.
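- The FIG. 1 data flow can be sketched as follows. The function name `mac_linear_domain` is hypothetical, and quantization is modeled here as saturation to a signed 16-bit range, which is an assumption; the text only states that the 32-bit result is quantized to 16 bits.

```python
def mac_linear_domain(pairs):
    """Linear-domain multiply-add over exactly four operand pairs (FIG. 1 style)."""
    if len(pairs) != 4:
        raise ValueError("expects four operand pairs")
    products = [x * y for x, y in pairs]   # four multipliers
    first_sum = products[0] + products[1]  # first adder
    second_sum = products[2] + products[3] # second adder
    total = first_sum + second_sum         # third adder
    # Quantization to 16 bits, modeled as signed saturation (assumption).
    lo, hi = -(1 << 15), (1 << 15) - 1
    return max(lo, min(hi, total))

small = mac_linear_domain([(1, 2), (3, 4), (5, 6), (7, 8)])
clipped = mac_linear_domain([(1000, 1000)] * 4)
```

- Every intermediate value in this flow is a wide linear-domain number, which is why the multipliers and adders dominate area and power.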
- The prior art proposes a scheme for processing multiply-add operations. This scheme converts data in the linear domain into data in the logarithmic domain, thereby transforming multiplications in the linear domain into additions in the logarithmic domain.
- However, data in the linear domain occupies more bits (for example, 2^(x+y) and 2^(z+w) occupy a 32-bit data width). Therefore, after the data in the logarithmic domain is converted back into the linear domain, a high-bit-width adder is still needed to perform the addition, so the computer still occupies considerable resources when performing the addition.
- The embodiment of the present application therefore proposes an apparatus for processing multiply-add operations that converts an addition between higher-bit-width data in exponential form into an addition of lower-bit-width data, reducing resource usage during computation and thereby reducing computational power consumption.
- FIG. 2 is a schematic block diagram of an apparatus for processing data according to an embodiment of the present application.
- The apparatus 200 of FIG. 2 includes:
- A first adder 210, configured to add input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a(A) and log_a(B), respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithm of first original data A and second original data B, respectively, of a plurality of original data;
- A second adder 220, configured to add input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a(C) and log_a(D), respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithm of third original data C and fourth original data D, respectively, of the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n.
- The above original data may be RGB pixel data in image processing.
- The value of a above may be 2.
- The product operations between the original data may first be converted into additions in the logarithmic domain, yielding a plurality of intermediate data in exponential form.
- A logarithmic adder 230, whose input port is connected to the output ports of the first adder 210 and the second adder 220, configured to obtain a^(n-m) from the m and n input by the first adder 210 and the second adder 220, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- The first adder 210, the second adder 220, and the logarithmic adder 230 described above may be implemented by hardware circuits. Specifically, they may be implemented based on various hardware circuits such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
- In the present application, operations on high-bit-width data are converted into operations on low-bit-width data, which reduces resource usage during computation.
- Specifically, the high-bit-width addition of a^m and a^n is converted into the low-bit-width addition of m and a^(n-m), which reduces the occupation of system resources during computation and improves computational efficiency.
- The logarithmic adder 230 may approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), or approximately determine the sum of m and -a^(n-m) as the value of (log_e a)*log_a(A*B-C*D).
- The above multiply-add operation is a generalized multiply-add operation, and may include addition between products or subtraction between products.
- For example, the multiply-add operation may include A*B+C*D or A*B-C*D.
- The logarithmic adder 230 obtaining a^(n-m) from the m and n input by the first adder 210 and the second adder 220, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), specifically includes: determining the target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- The first precision may be preset. When the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered low.
- By comparing the target accuracy with the first precision, the accuracy requirement for processing the original data can be determined.
- When the accuracy requirement is low, m+a^(n-m) can be directly and approximately determined as the value of (log_e a)*log_a(A*B+C*D). The present application can therefore determine the value of (log_e a)*log_a(A*B+C*D) flexibly according to the accuracy requirement for processing the original data, meeting that requirement while improving operation efficiency.
- The logarithmic adder 230 is specifically configured to: determine an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, K and L being integers greater than 1 and L governing the number of terms in the error compensation term; and approximately determine the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- Taking the error compensation value of a^(n-m) into account when determining the value of (log_e a)*log_a(A*B+C*D) further improves the calculation accuracy.
- Alternatively, the K values may be obtained by dividing [0, 1] into K parts.
- Alternatively, the K values may be obtained by dividing [-1, 0] into K parts.
- Determining the error compensation value of a^(n-m) according to the error compensation table may mean determining it by querying the table. Specifically, the value closest to a^(n-m) among the K values is first looked up in the error compensation table, and the error compensation value of that value is then taken as the error compensation value of a^(n-m).
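- The table construction and nearest-value query described above can be sketched as follows. The exact error compensation term is not reproduced in the text, so this example assumes it is the Taylor series of ln(1 + x) beyond the linear term, truncated at order L; sampling at the midpoints of the K parts and the names `error_term`, `build_error_table`, and `lookup` are also assumptions.

```python
import math

def error_term(x, L=4):
    """Assumed compensation term: sum over k = 2..L of (-1)^(k+1) * x^k / k."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(2, L + 1))

def build_error_table(K=64, L=4, lo=-1.0, hi=1.0):
    """Divide [lo, hi] into K equal parts and store (value, compensation) pairs."""
    step = (hi - lo) / K
    points = [lo + (i + 0.5) * step for i in range(K)]  # midpoint of each part
    return [(p, error_term(p, L)) for p in points]

def lookup(table, x):
    """Return the compensation stored for the table value closest to x."""
    return min(table, key=lambda entry: abs(entry[0] - x))[1]

table = build_error_table(K=64, L=4)
x = 2 ** -2                       # a^(n-m) with a = 2 and n - m = -2
exact = math.log1p(x)             # reference value of ln(1 + x)
uncompensated = x
compensated = x + lookup(table, x)
```

- A larger K shrinks the distance between x and the nearest table value, and a larger L keeps more series terms, which matches the way the text ties both parameters to the target accuracy.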
- The logarithmic adder 230 approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) specifically includes: determining the target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- When the target accuracy is higher than the second precision, the accuracy required for processing the original data may be considered high. In this case, the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value.
- The second precision described above may be the same as the first precision.
- When determining the value of (log_e a)*log_a(A*B+C*D), the logarithmic adder 230 may determine the magnitude relationship between the absolute value of n-m and a first threshold. If the absolute value of n-m is greater than or equal to the first threshold, the logarithmic adder 230 may directly determine m as the value of (log_e a)*log_a(A*B+C*D).
- For example, suppose the first threshold is 5, m is 10, and n-m is -8, so that the absolute value of n-m is greater than the first threshold. Then a^-8 is much smaller than 10, so a^-8 may be ignored and 10 is directly determined as the value of (log_e a)*log_a(A*B+C*D).
- When the absolute value of n-m is less than the first threshold, the logarithmic adder 230 still approximately determines the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
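- The threshold test above can be sketched as follows, using the example values from the text (first threshold 5, m = 10, n - m = -8). As an assumption of this sketch, m is scaled by ln(a) so that the result can be compared directly with ln(a^m + a^n); the function name is hypothetical.

```python
import math

def ln_sum_with_threshold(m, n, a=2, threshold=5):
    """Skip the a^(n-m) term when |n - m| reaches the threshold."""
    if m < n:
        m, n = n, m                      # keep the larger exponent in m
    if abs(n - m) >= threshold:
        return m * math.log(a)           # a^(n-m) is negligible and is dropped
    return m * math.log(a) + a ** (n - m)

skipped = ln_sum_with_threshold(10, 2)   # n - m = -8, term dropped
kept = ln_sum_with_threshold(10, 8)      # n - m = -2, term kept
skip_error = abs(skipped - math.log(2 ** 10 + 2 ** 2))
```

- With a = 2 the dropped term is at most 2^-5 when the threshold is 5, so skipping the shift and the sub-addition costs little accuracy for widely separated exponents.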
- K is determined based on the target accuracy. Specifically, K may be a larger value when the target accuracy is higher, and a smaller value when the target accuracy is lower.
- L is determined based on the target accuracy.
- The larger the value of L, the more terms the error compensation term contains and the more accurate the error compensation value obtained from it; the smaller the value of L, the fewer terms the error compensation term contains.
- The processing of the original data can be adjusted flexibly by setting the values of K and L.
- The logarithmic adder 230 specifically includes:
- A shift circuit 2301, configured to perform a shift operation on a according to n-m to obtain a^(n-m);
- A sub-addition circuit 2302, configured to add m and a^(n-m) to obtain m+a^(n-m).
- The logarithmic adder 230 further includes:
- A subtraction circuit 2303, configured to perform subtraction on m and n to obtain m-n or n-m;
- A comparison circuit 2304, configured to compare m-n or n-m with zero;
- A selection circuit 2305, configured to select m and n-m if m-n is greater than or equal to zero, or to select m and n-m if n-m is less than or equal to zero.
- The shift circuit 2301 may first acquire n-m from the selection circuit 2305 before performing the shift operation on a according to n-m, and the sub-addition circuit 2302 may first acquire m from the selection circuit 2305 before adding m and a^(n-m).
- When the subtraction circuit 2303 performs subtraction on m and n, either one may be the minuend and the other the subtrahend, yielding m-n or n-m.
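- The data path through circuits 2301 to 2305 can be sketched with integer arithmetic for a = 2, where a^(n-m) is a right shift of a fixed-point one. The fixed-point format (`FRAC_BITS`), the integer inputs, and the function names are assumptions of this example and do not come from the text.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits in the fixed-point format

def subtract_compare_select(m, n):
    """Return (larger exponent, smaller minus larger), mirroring circuits 2303-2305."""
    diff = m - n                 # subtraction circuit
    if diff >= 0:                # comparison circuit: m - n >= 0
        return m, -diff          # selection circuit keeps m and n - m
    return n, diff               # n is larger here, and m - n is already negative

def shift_power_of_two(neg_diff):
    """2^(neg_diff) as a fixed-point integer via a right shift (neg_diff <= 0)."""
    return (1 << FRAC_BITS) >> (-neg_diff)

def sub_add(larger, neg_diff):
    """larger + 2^(neg_diff) in fixed point, as formed by the sub-addition circuit."""
    return (larger << FRAC_BITS) + shift_power_of_two(neg_diff)

larger, neg_diff = subtract_compare_select(3, 7)
fixed_point_sum = sub_add(larger, neg_diff)
```

- Dividing `fixed_point_sum` by `1 << FRAC_BITS` gives 7.0625, that is 7 + 2^-4, obtained with one subtraction, one comparison, one shift, and one narrow addition.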
- The apparatus 200 further includes: a converter 240, configured to approximately determine the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D).
- The apparatus 200 further includes: a quantizer 250, configured to quantize the value of A*B+C*D to a preset data bit width.
- The converter 240 and the quantizer 250 can be implemented by hardware circuits. Specifically, they can be implemented based on hardware circuits such as an ASIC or an FPGA.
- Quantization refers to matching data of different bit widths.
- For example, if the bit width of the data obtained in a first step is 8 bits and the bit width required by the second-step operation is 5 bits, the 8-bit data is truncated into 5-bit data to meet the bit-width requirement of the second step.
- Specifically, values of the 8-bit data greater than the 5-bit maximum are adjusted to the 5-bit maximum, values less than the 5-bit minimum are adjusted to the 5-bit minimum, and other values are left unchanged.
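- The saturating quantization described above can be sketched as follows. The text does not say whether the data is signed, so the two's-complement range used by default here is an assumption, as is the function name `saturate`.

```python
def saturate(value, bits=5, signed=True):
    """Clamp an integer to the range representable in `bits` bits."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # -16..15 for 5 bits
    else:
        lo, hi = 0, (1 << bits) - 1                          # 0..31 for 5 bits
    # Values above the maximum or below the minimum are pinned to the limit;
    # values already in range pass through unchanged.
    return max(lo, min(hi, value))

quantized = [saturate(v) for v in (100, -100, 7)]
```

- Saturation keeps out-of-range values at the nearest representable limit instead of letting them wrap around, at the cost of clipping large magnitudes.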
- FIG. 3 is a schematic block diagram of a logarithmic adder 300 for processing a multiply-add operation in an embodiment of the present application.
- The logarithmic adder 300 specifically includes a subtraction circuit 310, a comparison circuit 320, a selection circuit 330, a shift circuit 340, an error compensation circuit 350, and an addition circuit 360.
- n and m are the input 5-bit data (assuming m>n), and sign indicates whether the sign bits of n and m are the same. For example, when sign is 1, a^m and a^n have the same sign, and when sign is 0, a^m and a^n have different signs.
- Taking the case where sign is 1 as an example, the specific steps by which the device 300 calculates a^m + a^n are as follows:
- The subtraction circuit 310 takes the difference between n and m to obtain n-m or m-n;
- The comparison circuit 320 obtains the result n-m or m-n calculated by the subtraction circuit 310 and compares it with zero;
- The selection circuit 330 selects the larger number m from n and m, together with n-m, according to the magnitude relationship between n-m or m-n and zero;
- The shift circuit 340 performs a shift operation on a according to n-m to obtain a^(n-m);
- The error compensation circuit 350 calculates an error compensation value of a^(n-m).
- The error compensation circuit 350 may specifically be a combinational circuit built from multiple selectors.
- The error compensation circuit 350 may also be referred to as an error compensation table, that is, the dotted-line portion in the figure.
- error(x) represents the sum of the quadratic term and the higher-order terms in the expansion. As long as sufficiently many terms are retained, sufficiently high precision can be ensured.
- error(a^(n-m)) is expanded as a Taylor series. According to the accuracy requirement, terms up to the third order, the fourth order, or higher are retained; the value range [-1, 1] of x is divided into K equal parts (K is a positive integer); and the results are recorded in a K-to-1 selector combinational circuit, which is called the error compensation table. For scenarios with high computational accuracy requirements, the error compensation value is added to the results of the other parts of the logarithmic addition circuit. For scenarios with low computational accuracy requirements, all circuits related to the error compensation table can be turned off and this function is not used.
- The addition circuit 360 adds m, a^(n-m), and the error compensation value of a^(n-m) to obtain the value of (log_e a)*log_a(a^m + a^n).
- The logarithmic adder 300 may further determine the value of a^m + a^n from the value of (log_e a)*log_a(a^m + a^n), or may leave a^m + a^n uncalculated and instead pass the value of (log_e a)*log_a(a^m + a^n) to other arithmetic circuits for calculation.
- The device 400 of FIG. 4 is composed of a central processing unit (CPU), a double data rate synchronous dynamic random access memory (DDR), an AXI bus, and a computing chip.
- The computing chip includes an input buffer module, a calculation engine module, an output control module, and the like.
- The input buffer module is configured to store the input original data.
- The calculation engine module is used to perform calculations on the original data.
- The output control module controls the output of the calculation results produced by the calculation engine module.
- The apparatus 200 shown in FIG. 2 and the apparatus 300 shown in FIG. 3 may correspond to the computing chip in FIG. 4, which is capable of implementing the data processing performed by the apparatus 200 and the apparatus 300 above.
- The apparatus 200 and the apparatus 300 may also correspond directly to the calculation engine module in FIG. 4, which is capable of implementing the data processing performed by the apparatus 200 and the apparatus 300 above.
- The above calculation engine module may also be implemented based on a hardware circuit.
- FIG. 5 is a schematic flowchart of a multiplication and addition operation performed by the apparatus for processing multiplication and addition operations in the embodiment of the present application. Specifically, FIG. 5 may specifically represent a schematic flowchart of the above-described multiplication and addition operation of the device 400. It should be understood that FIG. 5 may represent a calculation process of multiplying and accumulating a plurality of data.
- the input buffer module converts the buffered image data from the linear domain into data in the logarithmic domain;
- the calculation engine module adds the values in the logarithmic domain to calculate the result of multiplying the corresponding values in the linear domain;
- the calculation engine module accumulates the products of the linear-domain data, performing the addition on the exponents through the comparison circuit, the shift circuit, and the error compensation circuit to obtain a processing result;
- the output control module quantizes the data output by the calculation engine module, aligns it to the data bit width of the next-level operation, and outputs the data.
- steps 502 to 504 may be repeated in the actual calculation process.
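The flow above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of the application: it takes a = 2, handles only positive inputs, uses `math.log2` in place of the hardware conversion, and computes the correction term exactly instead of reading it from a table. All function names and the 8-bit default are hypothetical.

```python
import math

def to_log2(x):
    """Convert a positive linear-domain value to the log2 domain."""
    return math.log2(x)

def log2_add(m, n):
    """Return log2(2**m + 2**n) from the two exponents m and n."""
    large, small = (m, n) if m >= n else (n, m)   # comparison: keep the larger exponent
    ratio = 2.0 ** (small - large)                # plays the role of the shift; 0 < ratio <= 1
    return large + math.log2(1.0 + ratio)         # exact correction (assumption: no table)

def multiply_accumulate(pairs):
    """Accumulate sum(x * y) over the pairs using only log-domain values."""
    acc = None
    for x, y in pairs:
        product_log = to_log2(x) + to_log2(y)     # adding logs multiplies in the linear domain
        acc = product_log if acc is None else log2_add(acc, product_log)
    return acc

def quantize(log_value, bit_width=8):
    """Return to the linear domain, round, and clamp to an unsigned bit width."""
    linear = round(2.0 ** log_value)
    return max(0, min(linear, (1 << bit_width) - 1))
```

For example, `quantize(multiply_accumulate([(3, 4), (5, 6)]))` evaluates 3*4 + 5*6 without any linear-domain multiplication, and a result above the 8-bit range is clamped to 255.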
- the apparatus for processing the multiply-and-accumulate operation of the embodiment of the present application is described in detail above with reference to FIG. 2 to FIG. 4.
- the method for processing the multiplication and addition operation of the embodiment of the present application will be described below with reference to FIG. 6.
- the apparatus for processing multiply-add operations in FIG. 2 to FIG. 4 can implement the method for processing multiply-add operations in FIG. 6, and the method in FIG. 6 corresponds to the apparatus described with reference to FIG. 2 to FIG. 5.
- for the sake of brevity, repeated descriptions are omitted below where appropriate.
- FIG. 6 is a schematic flowchart of a method for processing data according to an embodiment of the present application.
- the method of FIG. 6 can be performed by the apparatus 200, the apparatus 300, or the apparatus 400 that processes the data described above.
- the method 600 of Figure 6 includes:
- in this way, a high-bit-width data operation is converted into a low-bit-width data operation, which can reduce power consumption in the calculation process.
- specifically, the addition of the high-bit-width data a^m and a^n is computed by adding low-bit-width data.
- the use of a high-bit-width adder can thus be avoided, which can reduce the area of the computing chip and reduce calculation power consumption.
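A small arithmetic check makes the bit-width argument concrete. The sketch below assumes a = 2 and integer exponents; the helper name and the sample exponents are hypothetical and chosen only for illustration.

```python
def bits_needed(value):
    """Number of bits required to store a non-negative integer."""
    return max(1, value.bit_length())

def compare_widths(m, n):
    """Return (bits for 2**m + 2**n, bits for the larger exponent)."""
    linear_sum = (1 << m) + (1 << n)      # linear-domain operand of a wide adder
    return bits_needed(linear_sum), bits_needed(max(m, n))
```

With m = 30 and n = 28, the linear-domain sum occupies 31 bits, while the exponents themselves fit in 5 bits, which is the kind of gap the text refers to.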
- the above a may specifically be 2.
- obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is lower than the first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- the first precision described above may be preset, and when the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered to be low.
- in this case, m + a^(n-m) can be directly and approximately determined as the value of (log_e a)*log_a(A*B+C*D).
- determining the value of (log_e a)*log_a(A*B+C*D) flexibly according to the accuracy requirement for processing the original data meets that requirement while improving computing efficiency.
- the method 600 further includes: determining an error compensation value of a^(n-m) according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K shares, the K error compensation values are obtained by substituting the K values into an error compensation term having L terms, and K and L are integers greater than 1; and approximately determining the sum of m + a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- the error compensation value of a^(n-m) can be taken into account in determining the value of (log_e a)*log_a(A*B+C*D), which can further improve the calculation accuracy.
- approximately determining the sum of m + a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, if the target accuracy is higher than the second precision, determining the sum of m + a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
- when the target precision is higher than the second precision, the precision required for processing the original data can be considered high; in this case, the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value.
- the second precision described above may be the same as the first precision.
- the K is determined according to the target accuracy.
- the L is determined according to the target accuracy.
- when the target precision is high, K can be a large value, and when the target precision is low, K can be a small value.
- the larger the value of L, the more terms the error compensation term has, and the more accurate the error compensation value obtained from it. Therefore, when the target precision is high, L can be a large value, and when the target accuracy is low, L can be a smaller value.
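The roles of K and L can be illustrated with a small table builder. The error compensation term itself is not reproduced in this text, so the sketch assumes it is the Taylor series of ln(1 + x) with the first-order term x removed; the even spacing of the K points (endpoints included), the nearest-point lookup, and all function names are likewise assumptions.

```python
import math

def compensation_term(x, num_terms):
    """Assumed term: -x**2/2 + x**3/3 - ... with num_terms terms (the L of the text)."""
    return math.fsum((-1) ** (k + 1) * x ** k / k for k in range(2, num_terms + 2))

def build_table(num_entries, num_terms):
    """Sample [-1, 1] at num_entries (K) evenly spaced points, endpoints included."""
    step = 2.0 / (num_entries - 1)
    points = [-1.0 + i * step for i in range(num_entries)]
    return [(x, compensation_term(x, num_terms)) for x in points]

def lookup(table, x):
    """Return the compensation value stored for the table point nearest to x."""
    return min(table, key=lambda entry: abs(entry[0] - x))[1]

def approx_ln1p(x, table=None):
    """First-order estimate x, refined by the table when one is supplied."""
    return x if table is None else x + lookup(table, x)
```

Calling `approx_ln1p(x)` without a table corresponds to the lower-accuracy path, and passing a table built with larger K and L corresponds to the higher-accuracy path: a denser table reduces the lookup error, and more series terms reduce the truncation error.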
- obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: shifting a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m + a^(n-m).
- obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and, if m-n is greater than or equal to zero, or equivalently if n-m is less than or equal to zero, selecting m and n-m.
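The subtract, compare, select, and shift sequence can be sketched as below. The sketch assumes a = 2 and integer exponents, so that a^(n-m) is a pure right shift; the 16-bit fractional format, the function names, and the symmetric handling of the case m < n are assumptions not taken from the source.

```python
FRAC_BITS = 16  # hypothetical fixed-point format: 16 fractional bits

def select_and_shift(m, n):
    """Return (larger exponent, 2**(smaller - larger) as a fixed-point integer)."""
    diff = m - n                          # subtraction
    if diff >= 0:                         # comparison with zero
        base, distance = m, diff          # keep m; the shift amount is m - n
    else:
        base, distance = n, -diff         # assumed symmetric case: keep n
    ratio_fixed = (1 << FRAC_BITS) >> distance   # right shift gives 2**(-distance)
    return base, ratio_fixed

def first_order_sum(m, n):
    """Return the uncompensated sum base + 2**(-distance) as a float."""
    base, ratio_fixed = select_and_shift(m, n)
    return base + ratio_fixed / (1 << FRAC_BITS)
```

Because the larger exponent is always kept, the shifted term never exceeds 1, which keeps the second operand of the final addition narrow.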
- the method 600 further includes: obtaining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D).
- the foregoing method 600 further includes: quantizing the value of A*B+C*D to reach a preset data bit width.
- the disclosed systems, devices, and methods may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the units is only a logical function division.
- there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions.
- the instructions are used to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application.
- the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to a device and a method for processing multiply-add operations. The device comprises: a first adder, configured to add input first data and second data to obtain first intermediate data, the values of the first data and the second data being log_a A and log_a B respectively; a second adder, configured to add input third data and fourth data to obtain second intermediate data, the values of the third data and the fourth data being log_a C and log_a D respectively, and the value of the second intermediate data being n; and a logarithmic adder, configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a) * log_a(A * B + C * D), the first adder, the second adder, and the logarithmic adder being implemented by hardware circuits. According to the present invention, calculation power consumption can be reduced during the calculation process.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710269126.2 | 2017-04-24 | ||
CN201710269126.2A CN107220025B (zh) | 2017-04-24 | 2017-04-24 | 处理乘加运算的装置和处理乘加运算的方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018196750A1 (fr) | 2018-11-01 |
Family
ID=59945435
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
PCT/CN2018/084275 | Ceased | WO2018196750A1 (fr) | 2017-04-24 | 2018-04-24 | Dispositif permettant de traiter des opérations de multiplication et d'addition et procédé permettant de traiter des opérations de multiplication et d'addition |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107220025B (fr) |
WO (1) | WO2018196750A1 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220025B (zh) * | 2017-04-24 | 2020-04-21 | 华为机器有限公司 | 处理乘加运算的装置和处理乘加运算的方法 |
CN110337636A (zh) * | 2018-02-28 | 2019-10-15 | 深圳市大疆创新科技有限公司 | 数据转换方法和装置 |
GB2577132B (en) * | 2018-09-17 | 2021-05-26 | Apical Ltd | Arithmetic logic unit, data processing system, method and module |
US20200125991A1 (en) * | 2018-10-18 | 2020-04-23 | Facebook, Inc. | Optimization of neural networks using hardware calculation efficiency |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996027839A1 (fr) * | 1995-03-03 | 1996-09-12 | Motorola Inc. | Circuit de calcul multicellulaire effectuant des multiplications en parallele |
US5956264A (en) * | 1992-02-29 | 1999-09-21 | Hoefflinger; Bernd | Circuit arrangement for digital multiplication of integers |
US20060101243A1 (en) * | 2004-11-10 | 2006-05-11 | Nvidia Corporation | Multipurpose functional unit with multiply-add and logical test pipeline |
CN105867876A (zh) * | 2016-03-28 | 2016-08-17 | 武汉芯泰科技有限公司 | 一种乘加器、乘加器阵列、数字滤波器及乘加计算方法 |
CN107220025A (zh) * | 2017-04-24 | 2017-09-29 | 华为机器有限公司 | 处理乘加运算的装置和处理乘加运算的方法 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100340972C (zh) * | 2005-06-07 | 2007-10-03 | 北京北方烽火科技有限公司 | 数字自动增益控制中利用现场可编程门阵列实现对数计算的方法 |
JP2008257407A (ja) * | 2007-04-04 | 2008-10-23 | Fujitsu Microelectronics Ltd | 対数演算器及び対数演算方法 |
GB2554167B (en) * | 2014-05-01 | 2019-06-26 | Imagination Tech Ltd | Approximating functions |
CN106528046B (zh) * | 2016-11-02 | 2019-06-07 | 上海集成电路研发中心有限公司 | 长位宽时序累加乘法器 |
- 2017-04-24: CN application CN201710269126.2A, patent CN107220025B (zh), status Active
- 2018-04-24: WO application PCT/CN2018/084275, patent WO2018196750A1 (fr), status Ceased
Also Published As
Publication number | Publication date |
---|---|
CN107220025B (zh) | 2020-04-21 |
CN107220025A (zh) | 2017-09-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18792008 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18792008 Country of ref document: EP Kind code of ref document: A1 |