CN109871511B

CN109871511B - Rapid calculation method for realizing correlation algorithm based on FPGA

Info

Publication number: CN109871511B
Application number: CN201910152685.4A
Authority: CN
Inventors: 付小伟
Original assignee: Xi'an Standard Information Technology Co ltd
Current assignee: Xi'an Standard Information Technology Co ltd
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2023-08-01
Anticipated expiration: 2039-02-28
Also published as: CN109871511A

Abstract

The invention provides a fast calculation method for realizing a correlation algorithm based on an FPGA (field programmable gate array), which is used for reading search window data required by m times of matching by one-time matching calculation, wherein the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2. According to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).

Description

Rapid calculation method for realizing correlation algorithm based on FPGA

Technical Field

The invention relates to correlation tracking, in particular to a rapid calculation method for realizing a correlation algorithm based on an FPGA.

Background

The correlation tracking is to determine the target position by using the correlation function of the scene image and the template image selected at the previous moment to determine the best matching position of the two images after digitizing the scene image.

The related tracking based on image matching has the defects of large data volume, complex operation and difficult realization of the requirement of large window real-time processing, for example, when the template size is M×M pixels and the search window size is N×N pixels, the (N-M+1) times of matching calculation needs to be completed in one field of time, and the calculated amount can be seen to be increased in square along with the increase of the search window. The size of the search window will directly affect the performance of the system, such as the environment and the type of target being tracked, to which the image tracker can adapt, so when the tracking window is relatively large, the requirements of the system on hardware are relatively high.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a rapid calculation method for realizing a related algorithm based on an FPGA, which greatly improves the calculation speed.

The invention is realized by the following technical scheme:

a fast calculation method for realizing a correlation algorithm based on FPGA (field programmable gate array) reads search window data required by m times of matching by one-time matching calculation, and the matching calculation of the read search window data is performed in parallel according to m paths, wherein m is a multiple of 2.

Preferably, m is 2, 4 or 8.

Preferably, the method specifically comprises the following steps:

step 1, firstly, inputting external search window data into a search window buffer BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;

the external template window data are input into a template window buffer area BRAM, the template window n multiplied by n matrix data are controlled and read, then the average value calculation is carried out, and the result is stored as mb_aver;

step 2, reading n× (n+1) matrix data of a search window and dividing the matrix data into n×n matrix data of m paths of search windows, subtracting respective average values of n×n matrix data of the m paths of search windows to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_reduce_aver in parallel to obtain m paths of data sub_sqrt; reading the n multiplied by n matrix data of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation in parallel to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;

step 3, comparing and assigning operations are carried out on m paths of data in parallel;

and 4, finishing all matching calculation for k times in the step 1-3 to obtain a matching result, wherein k= (n multiplied by n)/m.

Further, in step 3, the comparison and assignment operation of m paths of data in parallel is specifically: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0; and step 4, obtaining the maximum value of the matching calculation result after all the matching calculation is completed, namely the matching result.

Further, in step 1, the search window n× (n+1) matrix data is transferred to the search window FIFO, and the template window n×n matrix data is transferred to the template window FIFO; in step 2, the search window n× (n+1) matrix data is read from the search window FIFO, and the template window n×n matrix data is read from the template window FIFO.

Further, n is 64, m is 8, and k is 512.

Compared with the prior art, the invention has the following beneficial technical effects:

according to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention is applicable to scenes with template sizes greater than or equal to (m x 2) x (m x 2). The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).

Drawings

FIG. 1 is a flow chart of a related algorithm of the present invention;

FIG. 2 is a flowchart of the related algorithm step 1 of the present invention;

FIG. 3 is a flowchart of the related algorithm step 2 of the present invention;

fig. 4 is a flowchart of the correlation algorithm step 3 of the present invention.

Detailed Description

The invention will now be described in further detail with reference to specific examples, which are intended to illustrate, but not to limit, the invention.

The quick calculation method comprises 2 parts, wherein the 1 st part is to complete m times of matching by using 1 time of matching calculation, the 1 time of matching calculation reads search window data (n x (n+1)) required by m times of matching, wherein n is a multiple of 2, and the subsequent matching calculation is performed in parallel according to m paths. The 2 nd part is that the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, so the invention expands the data bit width, and the movement of 1 data at one time is changed into the movement of m data at one time. The invention is suitable for scenes with the template size of (m multiplied by 2) multiplied by (m multiplied by 2), for example, when the template size is 16 multiplied by 16, m can be taken as 8, and if the template size is smaller than 16 multiplied by 16, the number of data to be moved at one time needs to be reduced.

Referring to the related algorithm flowchart shown in fig. 1, the related algorithm is started first, then a search window required for matching calculation and a read address of a template BRAM are generated once, and the matching calculation is waited for to be completed. And (3) carrying out all matching calculation for k times according to the steps, and finally solving the maximum value in a matching calculation result matrix. The method specifically comprises the following steps:

step 1, as shown in fig. 2 (a), first, external search window data is input into a search window buffer area BRAM, reading of search window n× (n+1) matrix data is controlled according to a related algorithm flow, n× (n+1) matrix data is divided into m paths of n×n matrix data, then, the m paths of n×n matrix data are respectively subjected to averaging operation and the result is stored as sub_aver, and at the same time, n× (n+1) matrix data is transferred into a search window FIFO.

As shown in fig. 2 (b), external template window data is input into a template window buffer BRAM, the template window n×n matrix data is read according to the related algorithm flow control, then the average value calculation is performed, the result is stored as mb_aver, and at the same time, the n×n matrix data is transferred into a template window FIFO.

Step 2, as shown in fig. 3, n× (n+1) matrix data are read from the search window FIFO and divided into m paths of n×n matrix data, and then m paths of n×n matrix data are subtracted by respective average values to obtain m paths of data sub_reduce_aver, and the m paths of data sub_reduce_aver are subjected to square and root number opening operation in parallel to obtain m paths of data sub_sqrt. And reading n multiplied by n matrix data from the template window FIFO, subtracting the average value to obtain data mb_reduce_aver, and performing square sum open root number operation in parallel to obtain data mb_sqrt. And multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver respectively to obtain a result m paths of data molecular.

Step 3, as shown in fig. 4, m paths of data are subjected to comparison and assignment operation in parallel, and when sub_sqrt is equal to 0 and mb_sqrt is equal to 0, a matching calculation result is 1; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the other case matches a result of 0.

And 4, finishing all matching calculation for k times in the step 1-3, and finally solving the maximum value, wherein k= (n multiplied by n)/m.

In the embodiment of the invention, m is 8, n is 64, and k is 512.

The invention relates to a fast calculation method for realizing a correlation algorithm based on an FPGA, which is mainly used in the fields of image tracking, image matching and the like. The data moving time and the circulation times are reduced through the optimization algorithm framework, so that the matching calculation time is shortened. Assuming that the template size is 64×64, the search window size is 128×128, and the clock frequency of the fpga is 100MHz (10 ns), one field of matching of (128-64+1) × (128-64+1) =4225 times is required to be completed according to the conventional calculation, the data shift length is 64×64=4096, and the time taken to be 4225×4096×10ns= 173056000 ns=173 ms. If the fast calculation according to the present invention is to complete (128-64) × (128-64)/8=512 matches for one field, the data move length is 64×8=4512, and the time taken is 512×512×10 ns= 2621440 ns=2.6 ms.

Claims

1. A fast calculation method for realizing a correlation algorithm based on an FPGA is characterized in that search window data required by m times of matching are read by one-time matching calculation, the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2;

the method specifically comprises the following steps:

step 1, firstly, inputting external search window data into a search window buffer area BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;

inputting external template window data into a template window buffer BRAM, controlling and reading template window n multiplied by n matrix data, transferring the template window n multiplied by n matrix data into a template window FIFO, then carrying out average calculation and saving the result as mb_aver;

step 2, reading search window n× (n+1) matrix data from a search window FIFO and dividing the search window n× (n+1) matrix data into m paths of search window n×n matrix data, then subtracting respective average values of the m paths of search window n×n matrix data to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_sqrt in parallel; reading the n multiplied by n matrix data of the template window from the FIFO of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;

step 3, comparing and assigning operations are carried out on m paths of data in parallel; the method specifically comprises the following steps: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0;

and 4, finishing all matching calculation for k times in the step 1-3, and solving the maximum value of the matching calculation result to obtain the matching result, wherein k= (n multiplied by n)/m.

2. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein m is 2, 4 or 8.

3. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein n is 64, m is 8, and k is 512.