CN115189675B

CN115189675B - A high-order FIR digital filter based on reduced calculation

Info

Publication number: CN115189675B
Application number: CN202210873320.2A
Authority: CN
Inventors: 宋宇鲲; 刘文娟; 蔡阳; 曾树铭; 张多利; 倪伟
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2025-09-02
Anticipated expiration: 2042-07-22
Also published as: CN115189675A

Abstract

The present invention proposes a high-order FIR digital filter based on reduced calculation. The filter taps are n single-bit binary random coefficients. During each sampling period, the filter inputs a sampling point and calculates the product of the n sampling points and the previous n-1 sampling points, which are composed of n sampling points and n binary random coefficients. The multiplication result of the sampling point and the binary random coefficient is specified: if the coefficient is 1, the multiplication result is the value of the original sampling point; otherwise, the multiplication result is the inverse of the sampling point. Based on this reduced calculation, the present invention designs a high-order FIR digital filter based on a data broadcast structure. The sampling point and the binary random coefficient are combined into n items. The partial sum of the combined items is a finite number of items. A lookup table with a small depth is used to compress the large number of repeated additions required to calculate the partial sum at each addition node. Each addition node of the filter obtains the corresponding partial sum by matching the address of each table entry in the lookup table, which greatly reduces the use of adders.

Description

High-order FIR digital filter based on protocol calculation

Technical Field

The invention relates to the technical field of time domain capturing, in particular to a high-order FIR digital filter based on protocol calculation.

Background

At present, the FIR digital filter is widely applied in the fields of signal processing and the like, and can ensure strict phase characteristics when the amplitude characteristics are met in the design. Along with the increasing requirements of the system on speed and real-time performance, the research of the high-performance filter has important theoretical significance and engineering value.

The time domain expression of the FIR digital filter is:

h (N) is a coefficient of the filter, x (N-i) is an input sampling point, y (N) is a multiplication and accumulation result of the input sampling point and the coefficient of the filter, and N is an order of the filter. The FIR digital filter consists of a limited number of sampling points, and convolution sum operation in the left formula is simplified into multiplication and accumulation results of N sampling points and N coefficients in the right formula under each sampling period.

In practice, the larger the order of the FIR, the larger the coefficient length corresponding to h (n), and the higher the hardware complexity of the final implementation. The FPGA implementation of the prior high-order FIR digital filter mainly comprises an FIR digital filter based on a MAC structure, an FIR digital filter based on a distributed algorithm and an FIR digital filter based on FFT. The limitations of FPGA implementation of these three FIR digital filters were analyzed separately as follows:

First, for a FIR digital filter based on a distributed algorithm, the input N-bit data is disassembled into N-bit binary numbers, and a lookup table with a depth of 2 ^N is built based on all multiplication results of the N filter coefficients and the N-bit binary numbers. While this approach saves multiplier resource consumption and can be optimized by partitioning the look-up table to reduce the look-up table resource consumption and increasing the number of look-up tables to increase the computation speed, the look-up table resource consumption is still significant for higher order FIR digital filters.

Second, for the FIR digital filter based on FFT, the convolution operation of the sampling point and coefficient input by the filter is transformed into multiplication operation of frequency domain by FFT, and finally the multiplication result is restored into time domain sequence by IFFT. While the resources of the FPGA occupied are somewhat reduced, its system performance is very limited by its data operation delay.

Thirdly, for the FIR digital filter based on the multiplier structure, particularly for the high-order FIR digital filter based on the data broadcasting structure, firstly, the input data is broadcasted to each multiplier and then sent to each adding node for delay accumulation, so that the real-time performance and the high efficiency of data processing are well ensured. FPGAs present significant advantages in terms of processing speed. However, for the high-order FIR digital filter, the number of adders in the hardware implementation is huge in order to ensure the real-time performance and high efficiency of the processing speed. Therefore, it is necessary to save hardware resource consumption for implementing the higher order FIR digital filter by compressing the adder.

Disclosure of Invention

The invention provides a high-order FIR digital filter based on protocol calculation, which is characterized by comprising a combination module, a lookup table, a broadcasting module, a Gate node, a Pipeline module, a hard-wired module, an addition node and a buffer module, wherein the lookup table is used for storing the combination of the high-order FIR digital filter and the hard-wired module;

the combination module carries out multiple combinations on the sampling points;

the lookup table stores partial sums formed by the multiple combined sampling points and corresponding addresses through protocol calculation according to the order of small addresses and large addresses, and the addresses are binary numbers after multiple combination.

The broadcasting module is used for transmitting the part in the lookup table and the address to the corresponding Gate node;

The Gate node stores the partial sum and adds a flag bit to the partial sum;

The Pipeline module is connected with the Gate node, and outputs the data output by the Gate node in the period after synchronizing;

the hard-wired module is connected with the Pipeline module, and simultaneously outputs the part of the sum or the opposite number of the part of the sum to the addition node according to the zone bit for the value of each addition node which is input in a time-sharing way;

The addition node adopts a time-sharing multiplexing strategy and is used for accumulating the input data from the hard-wired module and the input data of the cache module and registering the accumulated result in the cache module;

The buffer module is arranged between the addition points, and the buffer module buffers the data with preset period and inputs the data into the adjacent addition nodes.

Furthermore, the combination module firstly combines n items of new sampling points in the current sampling period with n sampling points in the previous n-1 sampling periods, n items of combination are also needed for the binary random coefficients of the filter with the original length of m, and the partial sum formed by the sampling points and the binary random coefficients after the n items of combination through protocol calculation is calculated.

Further, the address of the lookup table is fully listed by 2 ^n-1 possible sequences of n-bit binary numbers with the highest address of 0, and each address corresponds to a partial sum formed by the n-bit binary numbers and the current n sampling points through reduction calculation.

Furthermore, the number of the Gate nodes is m/n, the Gate nodes are provided with node addresses, and the node addresses are determined by binary random coefficients corresponding to the positions of the Gate nodes after n combinations;

If the main working frequency of the system is k times of the sampling frequency, each Gate node of the broadcasting module compares the n-bit node address of the Gate node with the n-bit lookup table address of the current broadcasting in every 1/2 ^n-1 cycles if k is more than 2 ^n-1, otherwise, each Gate node of the broadcasting module compares the n-bit node address of the Gate node with the 2 ^n-1/k n-bit lookup table addresses of the current broadcasting in every 1/k cycles if k is less than 2 ^n-1;

If the n-bit Gate node address is the opposite number of the n-bit lookup table address currently broadcast, the Gate node saves the partial sum corresponding to the lookup table address and adds a bit flag bit 0 at the highest bit during output, and if the n-bit Gate node address is the opposite number of the n-bit lookup table address currently broadcast, the Gate node saves the partial sum corresponding to the lookup table address and adds a bit flag bit 1 at the highest bit during output.

Further, the hard-wired module comprises a shift register set, an inverting unit and a one-out-of-two multiplexer, wherein the shift register set is used for inputting values from each node of the Pipeline module; the inverting unit is used for generating the opposite numbers of the input parts and the data output by the shift register set are respectively and directly input into the one-out-of-two multiplexer and the one-out-of-two multiplexer after passing through the inverting unit;

The flag bit output by the shift register set is used as a flag signal of the alternative multiplexer, if the flag signal is 0, the alternative multiplexer output part and the opposite number of the alternative multiplexer output part and the entering addition node enter the addition node, and if the flag signal is 1.

Furthermore, the buffer module is arranged between two addition nodes and is composed of a C1-Cn+1 common n+1 level shift register group, and 1 sampling period is divided into k system working periods;

In the 1 st to the k-1 st system working periods, each Cn+1 system working period receives the result of the previous adder through the last register in the stage, and shifts upwards in the stage, the result is wholly shifted into Cn until the kth system working period, and the data input in the kth system working period directly enters the last register of Cn;

After 1 sampling period, C2-Cn moves data to the next stage register, and each system working period of C1 outputs the result to the adder through the first register of the stage and shifts upwards in the stage.

The beneficial effects of the invention are as follows:

The invention realizes a high-order FIR digital filter based on protocol calculation, n-term combination is carried out on sampling points and binary random coefficients, the rule that the combined partial sum is a limited term is utilized, a lookup table with smaller depth is used for compressing the calculated part at each addition node and a large number of repeated addition needed, so that each addition node of the filter is matched with the address of each table item of the lookup table to obtain the corresponding partial sum, and the adder is greatly compressed for use. For higher order FIR digital filters with larger orders, the present invention saves more significant advantages of the adder.

Drawings

FIG. 1 is a block diagram of a high-order FIR digital filter based on protocol computation according to an embodiment of the present invention;

Fig. 2 is a block diagram of a hard-wired module in a high-order FIR digital filter based on protocol computation according to an embodiment of the present invention.

Fig. 3 is a block diagram of a single summing node input/output within a higher order FIR digital filter based on protocol computation according to an embodiment of the present invention.

Detailed Description

The technical scheme of the present invention will be described in more detail with reference to the accompanying drawings, and the present invention includes, but is not limited to, the following examples.

The invention aims to provide a high-order FIR digital filter based on protocol calculation, which comprises a combination module, a lookup table, a broadcasting module, a Gate node, a pipeline module, a hard-wired module, an addition node and a buffer module.

The combination module combines the sampling point X _i+n under the new sampling period with the sampling point X _i+n-1,...,X_i+2,X_i+1 of the previous n-1 sampling periods by n items, the operation is equivalent to a shift register group with the length of n, the sampling point X _i+n under the new sampling period enters from the lower side of the shift register group, the sampling point of the corresponding previous n-1 sampling periods slides upwards by 1 length, X ₀ slides upwards out of the shift register group, and for the binary random coefficient of the filter, the binary coefficient of the filter with the original length of m is grouped into n items according to the sequence from left to right.

The lookup table combines n terms of the sampling point X _i+n in the new sampling period with the sampling point X _i+n-1,...,X_i+2,X_i+1 in the first n-1 sampling periods, and then calculates 2 ⁿ possible partial sums formed by n-bit binary numbers through reduction.

To further compress the resource, the sum of the parts corresponding to the address 0 and the address 2 ^n-1 -1 is the opposite number, and the sum of the parts corresponding to the address 1 and the address 2 ^n-1 -2 is also the opposite number, so that the lookup table entry to be established is only the sum of the parts with the highest address of 0, and the number of the corresponding lookup table entry can be reduced from 2 ⁿ to 2 ^n-1.

A lookup table with depth of 2 ^n-1 is built, wherein, the lookup table is fully enumerated by 2 ^n-1 possible sequences with n-bit binary numbers and the address most significant bit of 0, and each table entry address corresponds to the partial sum formed by the n-bit binary numbers and the current n sampling points through reduction calculation.

Specifically, according to the protocol calculation feature, if the binary coefficient is 0000, the sum of the binary coefficient and the sampling point ABCD is accumulated into a partial sum y1= - (ABCD), if the binary coefficient is 1111, the sum of the binary coefficient and the sampling point ABCD is accumulated into a partial sum y2=abcd, it can be seen that Y1 and Y2 are in an opposite number relationship, so if the broadcast address of the current period is 0000, the address of the gate node is 1111, the gate node stores the partial sum corresponding to the address 0000, and a flag bit is output at the highest bit, which indicates that the partial sum actually needs to be transferred by the gate node is the opposite number of the currently stored partial sum. In summary, only the partial sums corresponding to the 0xxx addresses need to be established.

When broadcasting, the broadcasting module matches the address corresponding to the table item with the address of each adding node of the Gate node, and if the matching is successful, the corresponding Gate node adding node obtains the corresponding partial sum.

The number of Gate nodes is m/n, and the node addresses of the Gate nodes are respectively determined by binary random coefficients corresponding to the positions of the Gate nodes after n combinations.

Specifically, assuming that the system operating main frequency is k times of the sampling frequency, if k is greater than 2 ^n-1, each Gate node of the broadcasting module compares the n-bit node address with the n-bit lookup table address currently broadcast in every 1/2 ^n-1 cycles, otherwise if k is less than 2 ^n-1, each Gate node of the broadcasting module compares the n-bit node address with the 2 ^n-1/k n-bit lookup table addresses currently broadcast in every 1/k cycles;

If the n-bit Gate node address is the opposite number of the currently broadcast n-bit lookup table address, the Gate node saves the corresponding partial sum of the lookup table address and adds a bit flag bit 0 to the highest bit during output, and if the n-bit Gate node address is the opposite number of the currently broadcast n-bit lookup table address, the Gate node saves the corresponding partial sum of the lookup table address and adds a bit flag bit 1 to the highest bit during output.

Up to this point, 2 ^n-1 parts and the parts received between the addition nodes are fully enumerated by a lookup table and broadcast to the addition nodes, a large number of repeated items are needed in the part and the parts received between the addition nodes, and the lookup table with the depth of 2 ^n-1 is used for compressing a large number of repeated addition after n items of combination of each addition node and a binary random coefficient are respectively carried out.

The number of nodes in the Pipeline module is the same as the number of Gate nodes, and the nodes are m/n and are in one-to-one correspondence. The Pipeline module is used for synchronizing the output value of each Gate node to the hard-wired module due to the time difference of the output part of each Gate node.

The hard-wired module comprises a shift register group, an inverting unit and an alternative multiplexer. If the system working main frequency is k times of the sampling frequency, a time-sharing multiplexing strategy is adopted for the addition nodes, wherein m/n nodes from the Pipeline module are divided into k groups, and the output of (m/n)/k nodes from the Pipeline module is calculated under 1/k system working periods, so that the total consumption of the number of the addition nodes is only (m/n)/k.

The shift register group is a first-in first-out memory used for acquiring the value output by the Pipeline module from the input end according to the time-sharing calculation sequence of the addition node, and the marker bit data of the output end is directly connected with the one-out-of-two multiplexer and is partially and respectively input into the one-out-of-two multiplexer and the inverting unit.

The inverting unit is used for generating the opposite number of the input part sum, and the opposite number is input into the alternative multiplexer.

The flag bit output by the shift register group is used as a flag signal of the alternative multiplexer, if the flag signal is 0, the alternative multiplexer output part and the opposite number of the alternative multiplexer output part and the entering addition node enter the addition node, and if the flag signal is 1.

Thus, by the device, the length of the original filter is changed from m to (m/n)/k, and the number of addition nodes of the corresponding filter is (m/n)/k.

The buffer module is arranged between the addition nodes, realizes the concatenation of the addition nodes, buffers the data of n sampling periods and inputs the data into the next addition node.

Specifically, n-term combination is performed on the sampled data and the binary random coefficient of the filter, and according to the register retiming principle, 1 sampling period delay between addition nodes in the original filter needs to be changed into n sampling period delay, so that C1-Cn total n-level register is needed, and because k time division multiplexing processing of system working periods is adopted on the addition nodes, cn+1-level register is needed to be additionally added. Thus, C1-Cn-1 respectively form the delay of one sampling period, and Cn and Cn+1 form the delay of one sampling period.

The 1 sampling period is now divided into k system operating periods, and the number of shift registers in each stage of registers corresponds to a single system operating period. Then in the 1 st to k-1 st system working period, C2-Cn is kept, and the whole shift is carried out to the next stage register in the k system working period, and meanwhile the data of the previous stage whole is received.

For Cn+1, in the 1 st to k-1 st system working periods, each Cn+1 system working period receives the result of the previous adder through the last register in the stage, and shifts upwards in the stage, the whole is moved into Cn until the k system working period, and the data input in the k system working period directly enter the last register of Cn.

It is noted that since Cn+1 requires one system duty cycle to write the result of the previous stage to the last register in the present stage, cn+1 only writes k-1 number during the kth cycle, and then Cn+1 is shifted to Cn as a whole during the kth cycle.

In one embodiment, the length of the original FIR digital filter is 4096, the system operating frequency is 96MHz, and each 12MHz sampling frequency inputs a sample data with a bit width of 16 bits, which requires a "multiplication" accumulation result of 4096 16 bit sampling points and 4096 1 bit binary random coefficients to be calculated at each 12MHz sampling frequency. The binary random coefficient corresponds to the coefficient of the FIR filter, the value is 0 or 1, and the input sampling point corresponds to the data input of the FIR filter. The result of the 'multiplication' of the 1 sampling point and the 1-bit binary coefficient is that the multiplication result is the value of the sampling point when the coefficient is 1, otherwise, the multiplication result is that the value of the sampling point takes the opposite number.

After new sampling data is input in each sampling period by the FIR filter based on the data broadcasting structure before combination, the new sampling data is broadcast to 4096 addition nodes, and the value received by each addition node is determined to be the value of the current sampling point or the value obtained by taking the opposite number from the current sampling point according to the single-bit binary random coefficient 0 or 1 at the corresponding position. At this time, the number of adders to be used in each sampling period is 4096, so that the consumption of adder resources is excessive in this case.

Thus, in the present embodiment, the operation of combining the sampling point and the binary random coefficient of the filter by 4 items respectively is selected, namely, the sampling point X _i+4 under the new sampling period and the sampling point X _i+3,X_i+2,X_i+1 of the first 3 sampling periods form 4 sampling point combinations, which is equivalent to a shift register group with the length of 4, the sampling point X _i+4 under the new sampling period enters from the lower side of the shift register group, the sampling point of the corresponding first 3 periods slides upwards by 1 length, and X ₀ slides upwards out of the shift register group, and meanwhile, the binary coefficient of the filter with the length of 4096 is grouped by 4 items in sequence from left to right. After the 4-item combination operation, the length of the filter is reduced to 1024 from 4096.

Second, all possible results of the reduction calculation using a combination of 4 sampling points per sampling period and a 4-bit binary random number have a rule of at most 16 possible results, so that the sum of the calculated parts between the addition nodes must have a large number of repeated terms. In order to further save adder resources, only the table entry of the corresponding parts of the addresses 0000 to 0111 is required to be established, and the corresponding parts of the addresses 1000 to 1111 can be directly obtained by directly reversing the corresponding address parts. In sum, 24 adders consumed by each table item in the lookup table with depth of 8 are used for further compressing the repeated addition after 4 items of combination of each addition node and the binary random coefficient respectively, so that each addition node of the filter obtains corresponding partial sum through matching with the address of each table item of the lookup table.

As shown in figure 1, in view of the 8-frequency multiplication relation between the sampling frequency 12MHz and the system operating frequency 96MHz, partial sums in the compressed lookup table entries can be generated in 8 working periods and are broadcast to 1024 Gate nodes in a time-sharing manner according to the sequence of addresses 0000 to 0111, and the broadcast content is divided into two parts, wherein one part is a 4-bit broadcast coefficient and corresponds to the address corresponding to each table entry of the lookup table, and the other part is an 18-bit partial sum corresponding to each table entry address of the lookup table. Specifically, when each table entry of the lookup table is divided into periods to broadcast the address and the corresponding partial sums, each Gate node compares the 4-bit address with the 4-bit broadcast coefficient in each period, if the two are equal, the Gate node outputs the corresponding partial sums, and adds a bit of flag bit 0 at the highest bit during output for marking the received weighted sums, and then the weighted sums can be sent to an addition node for direct accumulation, otherwise, if the 4-bit local coefficient is equal to the 4-bit broadcast coefficient and inverted according to the bit, the Gate node outputs the corresponding partial sums, and adds a bit of flag bit 1 at the highest bit during output for marking the received weighted sums to be sent to the addition node for accumulation after the opposite number is needed to be obtained. The data format output by each Gate node is {1 bit flag bit, 18 bit part and }. In view of the fact that the partial sums output by all Gate nodes have time differences, in order to ensure that all addition nodes can receive corresponding partial sums and sign signals at the same time, when all Gate nodes receive the sign signals and the corresponding partial sums and output, a level-one Pipeline module is added between the Gate nodes and the addition node level to perform level-one buffering, all Gate node output signals are synchronized, and the number of nodes in the Pipeline module is 1024 identical to that of Gate nodes and corresponds to that of Gate nodes one by one.

As shown in FIG. 2, the hardwired module comprises a shift register set, an inverting unit and an alternative multiplexer. Because the system working main frequency 96MHz is 8 times of the sampling frequency 12MHz, 1 sampling period is divided into 8 system working periods, a time-sharing multiplexing strategy is adopted for the adding nodes, 128 adding nodes calculate the output of 1024 nodes from the Pipeline module in 8 periods, the number of registers in the shift register set is 8, the values from all nodes of the Pipeline module enter the shift register set from the right in a time-sharing manner according to the sequence of the calculating periods, and the values entering from adjacent periods move leftwards. The inverting unit is used for generating the opposite number of the input part sum, wherein the flag bit output by the shift register set is used as a flag signal of the alternative multiplexer, if the flag signal is 0, the alternative multiplexer output part sum enters the adding node, and if the flag signal is 1, the opposite number of the alternative multiplexer output part sum enters the adding node.

As shown in fig. 3, the buffer module is disposed between the addition nodes, and realizes the concatenation of the addition nodes, and the buffer module buffers the data of 4 periods and inputs the data into the next addition node.

Specifically, because 4 combinations are respectively performed on the sampled data and the binary random coefficients of the filter, according to the register retiming principle, the delay of 1 sampling period between the addition nodes in the original filter needs to be changed into the delay of 4 sampling periods, so that four-stage registers of C1, C2, C3 and C4 are needed, and because 8-period time division multiplexing processing is adopted for the addition nodes, a C5-stage register is needed to be additionally added. Thus, C1, C2, C3 each constitute a delay of one sample period, and C4 and C5 constitute a delay of one sample period.

The 1 sampling period is now divided into 8 system operating periods, and the number of shift registers in each stage of registers corresponds to a single system operating period. Then C2, C3, C4 remain in the 1 st through 7 th system cycles, shifting to the next stage register global by the 8 th system cycle while receiving the data of the previous stage global.

For C5, during the 1 st to 7 th system cycles, each C5 system cycle receives the result of the previous adder through the last register in the stage and shifts up in the stage, shifting in its entirety into C4 by the 8 th system cycle.

For C1, each system duty cycle outputs the result to the adder through the first register of the stage and shifts up in the stage.

It is noted that since C5 requires a system duty cycle to write the result of the previous stage to the last register in this stage, C5 only writes 7 numbers during the 8 th cycle, and when the 8 th cycle shifts C5 overall to C4,

The present invention is not limited to the above embodiments, and those skilled in the art can implement the present invention in various other embodiments according to the examples and the disclosure of the drawings, so that the design of the present invention is simply changed or modified while adopting the design structure and concept of the present invention, and the present invention falls within the scope of protection.

Claims

1. A high-order FIR digital filter based on reduced calculation, characterized in that the high-order FIR digital filter includes a combination module, a lookup table, a broadcast module, a gate node, a pipeline module, a hard-wired module, an addition node and a cache module;

The combination module performs multiple combinations of sampling points;

The lookup table stores the partial sums formed by reducing the multiple combination sampling points and the corresponding addresses in order of addresses from small to large, wherein the addresses are binary numbers after the multiple combinations;

The broadcast module is used to transmit the portion in the lookup table and the address to the corresponding Gate node;

The Gate node saves the partial sum and adds a flag bit to the partial sum;

The Pipeline module is connected to the Gate node, and synchronizes and outputs the data output by the Gate node within a cycle;

The hard-wired module is connected to the Pipeline module, and simultaneously outputs the partial sum or the inverse of the partial sum to the addition node according to the flag bit for the time-sharing input value of each addition node;

The adding node adopts a time-division multiplexing strategy to accumulate the input data from the hard-wired module and the input data from the cache module, and stores the accumulated result in the cache module;

The cache module is arranged between each addition node, and the cache module caches data of a preset period and then inputs the data into the adjacent addition node.

2. The high-order FIR digital filter according to claim 1 is characterized in that the combination module first performs n-item combinations on the new sampling points in the current sampling period and the n sampling points in the previous n-1 sampling periods, a total of n items; the binary random coefficients of the filter with an original length of m also need to be combined in n items, and the partial sum formed by the reduced calculation of the sampling points and the binary random coefficients after the n-item combination is calculated.

3. The high-order FIR digital filter according to claim 2, wherein the addresses of the lookup table are a complete list of 2n ^-1 possible orderings of n-bit binary numbers with the most significant bit of the address being 0; each of the addresses corresponds to a partial sum formed by the reduced calculation of the n-bit binary number and the current n sampling points.

4. The high-order FIR digital filter according to claim 3, wherein the number of the gate nodes is m/n, the gate nodes are provided with node addresses, and the node addresses are determined by the binary random coefficients corresponding to the locations thereof after n combinations;

Assuming that the system operating main frequency is k times the sampling frequency, if k> ^2n-1 , the broadcast module compares each Gate node's n-bit node address with the currently broadcast n-bit lookup table address every 1/ ^2n-1 cycle; otherwise, if k<2n ^-1 , the broadcast module compares each Gate node's n-bit node address with the currently broadcast 2n ^-1 /k n-bit lookup table addresses every 1/k cycle;

If the n-bit Gate node address is equal to the currently broadcast n-bit lookup table address, this Gate node saves the partial sum corresponding to the lookup table address and adds a flag bit 0 to the highest bit when outputting; if the n-bit Gate node address is the opposite of the currently broadcast n-bit lookup table address, this Gate node saves the partial sum corresponding to the lookup table address and adds a flag bit 1 to the highest bit when outputting.

5. The high-order FIR digital filter according to claim 4, wherein the hard-wired module comprises a shift register group, an inverting unit, and a two-to-one multiplexer: the shift register group is used to input values from each node of the pipeline module; the inverting unit is used to generate the inverse of the input partial sum; the partial sum data output by the shift register group are respectively input directly into the two-to-one multiplexer and then input into the two-to-one multiplexer after passing through the inverting unit;

The flag bit output by the shift register group serves as a flag signal of a two-to-one multiplexer. If the flag signal is 0, the two-to-one multiplexer outputs a partial sum that enters the addition node; if the flag signal is 1, the opposite of the two-to-one multiplexer outputs a partial sum that enters the addition node.

6. The high-order FIR digital filter according to claim 4, wherein the buffer module is arranged between two addition nodes and is composed of a shift register group of n+1 stages from C1 to Cn+1, and one sampling period is divided into k system working periods;

During the 1st to k-1th system working cycles, Cn+1 receives the result of the previous adder through the last register in this stage in each system working cycle, and shifts it upward within the stage. When it reaches the kth system working cycle, it is shifted into Cn as a whole, and the data input during the kth system working cycle directly enters the last register of Cn;

After one sampling cycle, C2~Cn move data to the next level register. C1 outputs the result to the adder through the first register of this level in each system working cycle and shifts it upward within the level.