CN109871511B - Rapid calculation method for realizing correlation algorithm based on FPGA - Google Patents
Rapid calculation method for realizing correlation algorithm based on FPGA Download PDFInfo
- Publication number
- CN109871511B CN109871511B CN201910152685.4A CN201910152685A CN109871511B CN 109871511 B CN109871511 B CN 109871511B CN 201910152685 A CN201910152685 A CN 201910152685A CN 109871511 B CN109871511 B CN 109871511B
- Authority
- CN
- China
- Prior art keywords
- data
- search window
- sqrt
- paths
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000004364 calculation method Methods 0.000 title claims abstract description 60
- 239000011159 matrix material Substances 0.000 claims description 37
- 238000000034 method Methods 0.000 claims description 6
- 238000012935 Averaging Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000005314 correlation function Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a fast calculation method for realizing a correlation algorithm based on an FPGA (field programmable gate array), which is used for reading search window data required by m times of matching by one-time matching calculation, wherein the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2. According to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).
Description
Technical Field
The invention relates to correlation tracking, in particular to a rapid calculation method for realizing a correlation algorithm based on an FPGA.
Background
The correlation tracking is to determine the target position by using the correlation function of the scene image and the template image selected at the previous moment to determine the best matching position of the two images after digitizing the scene image.
The related tracking based on image matching has the defects of large data volume, complex operation and difficult realization of the requirement of large window real-time processing, for example, when the template size is M×M pixels and the search window size is N×N pixels, the (N-M+1) times of matching calculation needs to be completed in one field of time, and the calculated amount can be seen to be increased in square along with the increase of the search window. The size of the search window will directly affect the performance of the system, such as the environment and the type of target being tracked, to which the image tracker can adapt, so when the tracking window is relatively large, the requirements of the system on hardware are relatively high.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a rapid calculation method for realizing a related algorithm based on an FPGA, which greatly improves the calculation speed.
The invention is realized by the following technical scheme:
a fast calculation method for realizing a correlation algorithm based on FPGA (field programmable gate array) reads search window data required by m times of matching by one-time matching calculation, and the matching calculation of the read search window data is performed in parallel according to m paths, wherein m is a multiple of 2.
Preferably, m is 2, 4 or 8.
Preferably, the method specifically comprises the following steps:
step 1, firstly, inputting external search window data into a search window buffer BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;
the external template window data are input into a template window buffer area BRAM, the template window n multiplied by n matrix data are controlled and read, then the average value calculation is carried out, and the result is stored as mb_aver;
step 2, reading n× (n+1) matrix data of a search window and dividing the matrix data into n×n matrix data of m paths of search windows, subtracting respective average values of n×n matrix data of the m paths of search windows to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_reduce_aver in parallel to obtain m paths of data sub_sqrt; reading the n multiplied by n matrix data of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation in parallel to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;
step 3, comparing and assigning operations are carried out on m paths of data in parallel;
and 4, finishing all matching calculation for k times in the step 1-3 to obtain a matching result, wherein k= (n multiplied by n)/m.
Further, in step 3, the comparison and assignment operation of m paths of data in parallel is specifically: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0; and step 4, obtaining the maximum value of the matching calculation result after all the matching calculation is completed, namely the matching result.
Further, in step 1, the search window n× (n+1) matrix data is transferred to the search window FIFO, and the template window n×n matrix data is transferred to the template window FIFO; in step 2, the search window n× (n+1) matrix data is read from the search window FIFO, and the template window n×n matrix data is read from the template window FIFO.
Further, n is 64, m is 8, and k is 512.
Compared with the prior art, the invention has the following beneficial technical effects:
according to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention is applicable to scenes with template sizes greater than or equal to (m x 2) x (m x 2). The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).
Drawings
FIG. 1 is a flow chart of a related algorithm of the present invention;
FIG. 2 is a flowchart of the related algorithm step 1 of the present invention;
FIG. 3 is a flowchart of the related algorithm step 2 of the present invention;
fig. 4 is a flowchart of the correlation algorithm step 3 of the present invention.
Detailed Description
The invention will now be described in further detail with reference to specific examples, which are intended to illustrate, but not to limit, the invention.
The quick calculation method comprises 2 parts, wherein the 1 st part is to complete m times of matching by using 1 time of matching calculation, the 1 time of matching calculation reads search window data (n x (n+1)) required by m times of matching, wherein n is a multiple of 2, and the subsequent matching calculation is performed in parallel according to m paths. The 2 nd part is that the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, so the invention expands the data bit width, and the movement of 1 data at one time is changed into the movement of m data at one time. The invention is suitable for scenes with the template size of (m multiplied by 2) multiplied by (m multiplied by 2), for example, when the template size is 16 multiplied by 16, m can be taken as 8, and if the template size is smaller than 16 multiplied by 16, the number of data to be moved at one time needs to be reduced.
Referring to the related algorithm flowchart shown in fig. 1, the related algorithm is started first, then a search window required for matching calculation and a read address of a template BRAM are generated once, and the matching calculation is waited for to be completed. And (3) carrying out all matching calculation for k times according to the steps, and finally solving the maximum value in a matching calculation result matrix. The method specifically comprises the following steps:
step 1, as shown in fig. 2 (a), first, external search window data is input into a search window buffer area BRAM, reading of search window n× (n+1) matrix data is controlled according to a related algorithm flow, n× (n+1) matrix data is divided into m paths of n×n matrix data, then, the m paths of n×n matrix data are respectively subjected to averaging operation and the result is stored as sub_aver, and at the same time, n× (n+1) matrix data is transferred into a search window FIFO.
As shown in fig. 2 (b), external template window data is input into a template window buffer BRAM, the template window n×n matrix data is read according to the related algorithm flow control, then the average value calculation is performed, the result is stored as mb_aver, and at the same time, the n×n matrix data is transferred into a template window FIFO.
Step 2, as shown in fig. 3, n× (n+1) matrix data are read from the search window FIFO and divided into m paths of n×n matrix data, and then m paths of n×n matrix data are subtracted by respective average values to obtain m paths of data sub_reduce_aver, and the m paths of data sub_reduce_aver are subjected to square and root number opening operation in parallel to obtain m paths of data sub_sqrt. And reading n multiplied by n matrix data from the template window FIFO, subtracting the average value to obtain data mb_reduce_aver, and performing square sum open root number operation in parallel to obtain data mb_sqrt. And multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver respectively to obtain a result m paths of data molecular.
Step 3, as shown in fig. 4, m paths of data are subjected to comparison and assignment operation in parallel, and when sub_sqrt is equal to 0 and mb_sqrt is equal to 0, a matching calculation result is 1; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the other case matches a result of 0.
And 4, finishing all matching calculation for k times in the step 1-3, and finally solving the maximum value, wherein k= (n multiplied by n)/m.
In the embodiment of the invention, m is 8, n is 64, and k is 512.
The invention relates to a fast calculation method for realizing a correlation algorithm based on an FPGA, which is mainly used in the fields of image tracking, image matching and the like. The data moving time and the circulation times are reduced through the optimization algorithm framework, so that the matching calculation time is shortened. Assuming that the template size is 64×64, the search window size is 128×128, and the clock frequency of the fpga is 100MHz (10 ns), one field of matching of (128-64+1) × (128-64+1) =4225 times is required to be completed according to the conventional calculation, the data shift length is 64×64=4096, and the time taken to be 4225×4096×10ns= 173056000 ns=173 ms. If the fast calculation according to the present invention is to complete (128-64) × (128-64)/8=512 matches for one field, the data move length is 64×8=4512, and the time taken is 512×512×10 ns= 2621440 ns=2.6 ms.
Claims (3)
1. A fast calculation method for realizing a correlation algorithm based on an FPGA is characterized in that search window data required by m times of matching are read by one-time matching calculation, the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2;
the method specifically comprises the following steps:
step 1, firstly, inputting external search window data into a search window buffer area BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;
inputting external template window data into a template window buffer BRAM, controlling and reading template window n multiplied by n matrix data, transferring the template window n multiplied by n matrix data into a template window FIFO, then carrying out average calculation and saving the result as mb_aver;
step 2, reading search window n× (n+1) matrix data from a search window FIFO and dividing the search window n× (n+1) matrix data into m paths of search window n×n matrix data, then subtracting respective average values of the m paths of search window n×n matrix data to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_sqrt in parallel; reading the n multiplied by n matrix data of the template window from the FIFO of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;
step 3, comparing and assigning operations are carried out on m paths of data in parallel; the method specifically comprises the following steps: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0;
and 4, finishing all matching calculation for k times in the step 1-3, and solving the maximum value of the matching calculation result to obtain the matching result, wherein k= (n multiplied by n)/m.
2. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein m is 2, 4 or 8.
3. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein n is 64, m is 8, and k is 512.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910152685.4A CN109871511B (en) | 2019-02-28 | 2019-02-28 | Rapid calculation method for realizing correlation algorithm based on FPGA |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910152685.4A CN109871511B (en) | 2019-02-28 | 2019-02-28 | Rapid calculation method for realizing correlation algorithm based on FPGA |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109871511A CN109871511A (en) | 2019-06-11 |
| CN109871511B true CN109871511B (en) | 2023-08-01 |
Family
ID=66919507
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910152685.4A Active CN109871511B (en) | 2019-02-28 | 2019-02-28 | Rapid calculation method for realizing correlation algorithm based on FPGA |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109871511B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004145476A (en) * | 2002-10-22 | 2004-05-20 | Hiroshima Industrial Promotion Organization | Matching arithmetic circuit |
| CN104537657A (en) * | 2014-12-23 | 2015-04-22 | 西安交通大学 | Laser speckle image depth perception method implemented through parallel search GPU acceleration |
| CN107292291A (en) * | 2017-07-19 | 2017-10-24 | 北京智芯原动科技有限公司 | A kind of vehicle identification method and system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6970509B2 (en) * | 2001-07-31 | 2005-11-29 | Wis Technologies, Inc. | Cell array and method of multiresolution motion estimation and compensation |
-
2019
- 2019-02-28 CN CN201910152685.4A patent/CN109871511B/en active Active
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004145476A (en) * | 2002-10-22 | 2004-05-20 | Hiroshima Industrial Promotion Organization | Matching arithmetic circuit |
| CN104537657A (en) * | 2014-12-23 | 2015-04-22 | 西安交通大学 | Laser speckle image depth perception method implemented through parallel search GPU acceleration |
| CN107292291A (en) * | 2017-07-19 | 2017-10-24 | 北京智芯原动科技有限公司 | A kind of vehicle identification method and system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109871511A (en) | 2019-06-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20210073569A1 (en) | Pooling device and pooling method | |
| CN108573305B (en) | Data processing method, equipment and device | |
| CN106204660A (en) | A kind of Ground Target Tracking device of feature based coupling | |
| CN110321888B (en) | FPGA-based satellite-borne infrared small target detection method | |
| JP7163372B2 (en) | Target tracking method and device, electronic device and storage medium | |
| JP2011229030A (en) | System and method for image processing, recording medium, and program | |
| KR102199912B1 (en) | Data Augmentation based Robust Object Recognition Method and System | |
| CN117726921B (en) | ORB feature extraction accelerator based on stream processing and non-blocking | |
| CN109871511B (en) | Rapid calculation method for realizing correlation algorithm based on FPGA | |
| CN109215001B (en) | High temperature difference self-adaptive platform histogram equalization implementation method based on FPGA | |
| CN108282597A (en) | A kind of real-time target tracing system and method based on FPGA | |
| CN114254740B (en) | Convolution neural network accelerated calculation method, calculation system, chip and receiver | |
| CN110991609A (en) | Line buffer for improved data transfer efficiency | |
| US11580617B2 (en) | Method of matching images to be merged and data processing device performing the same | |
| CN109035178A (en) | A kind of multi-parameter value tuning method applied to image denoising | |
| Yin et al. | A low-hardware-overhead, high-energy-efficiency, and end-to-end CNN-based feature extraction accelerator for mobile visual SLAM | |
| CN110322388B (en) | Pooling method and apparatus, pooling system, and computer-readable storage medium | |
| CN115952844A (en) | System and method for improving hardware accelerator performance in neural network applications | |
| Zhang et al. | Bucket-FEM: A bucket-based architecture of real-time ORB feature extraction and matching for embedded SLAM applications | |
| US10983978B2 (en) | Method for updating relational index, storage medium and electronic device | |
| Ding et al. | Multi-scale FAST feature extraction heterogeneous design based on FPGA | |
| US7453761B2 (en) | Method and system for low cost line buffer system design | |
| CN105827550A (en) | Method and device for determining target parameter through use of sliding window | |
| Xin et al. | Design and Implementation of Target Tracking System in Low Illumination Environment Based on FPGA | |
| CN113688703B (en) | Low-latency non-maximum suppression method and device based on FPGA |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |