[go: up one dir, main page]

CN109871511B - Rapid calculation method for realizing correlation algorithm based on FPGA - Google Patents

Rapid calculation method for realizing correlation algorithm based on FPGA Download PDF

Info

Publication number
CN109871511B
CN109871511B CN201910152685.4A CN201910152685A CN109871511B CN 109871511 B CN109871511 B CN 109871511B CN 201910152685 A CN201910152685 A CN 201910152685A CN 109871511 B CN109871511 B CN 109871511B
Authority
CN
China
Prior art keywords
data
search window
sqrt
paths
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910152685.4A
Other languages
Chinese (zh)
Other versions
CN109871511A (en
Inventor
付小伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Standard Information Technology Co ltd
Original Assignee
Xi'an Standard Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Standard Information Technology Co ltd filed Critical Xi'an Standard Information Technology Co ltd
Priority to CN201910152685.4A priority Critical patent/CN109871511B/en
Publication of CN109871511A publication Critical patent/CN109871511A/en
Application granted granted Critical
Publication of CN109871511B publication Critical patent/CN109871511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a fast calculation method for realizing a correlation algorithm based on an FPGA (field programmable gate array), which is used for reading search window data required by m times of matching by one-time matching calculation, wherein the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2. According to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).

Description

Rapid calculation method for realizing correlation algorithm based on FPGA
Technical Field
The invention relates to correlation tracking, in particular to a rapid calculation method for realizing a correlation algorithm based on an FPGA.
Background
The correlation tracking is to determine the target position by using the correlation function of the scene image and the template image selected at the previous moment to determine the best matching position of the two images after digitizing the scene image.
The related tracking based on image matching has the defects of large data volume, complex operation and difficult realization of the requirement of large window real-time processing, for example, when the template size is M×M pixels and the search window size is N×N pixels, the (N-M+1) times of matching calculation needs to be completed in one field of time, and the calculated amount can be seen to be increased in square along with the increase of the search window. The size of the search window will directly affect the performance of the system, such as the environment and the type of target being tracked, to which the image tracker can adapt, so when the tracking window is relatively large, the requirements of the system on hardware are relatively high.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a rapid calculation method for realizing a related algorithm based on an FPGA, which greatly improves the calculation speed.
The invention is realized by the following technical scheme:
a fast calculation method for realizing a correlation algorithm based on FPGA (field programmable gate array) reads search window data required by m times of matching by one-time matching calculation, and the matching calculation of the read search window data is performed in parallel according to m paths, wherein m is a multiple of 2.
Preferably, m is 2, 4 or 8.
Preferably, the method specifically comprises the following steps:
step 1, firstly, inputting external search window data into a search window buffer BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;
the external template window data are input into a template window buffer area BRAM, the template window n multiplied by n matrix data are controlled and read, then the average value calculation is carried out, and the result is stored as mb_aver;
step 2, reading n× (n+1) matrix data of a search window and dividing the matrix data into n×n matrix data of m paths of search windows, subtracting respective average values of n×n matrix data of the m paths of search windows to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_reduce_aver in parallel to obtain m paths of data sub_sqrt; reading the n multiplied by n matrix data of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation in parallel to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;
step 3, comparing and assigning operations are carried out on m paths of data in parallel;
and 4, finishing all matching calculation for k times in the step 1-3 to obtain a matching result, wherein k= (n multiplied by n)/m.
Further, in step 3, the comparison and assignment operation of m paths of data in parallel is specifically: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0; and step 4, obtaining the maximum value of the matching calculation result after all the matching calculation is completed, namely the matching result.
Further, in step 1, the search window n× (n+1) matrix data is transferred to the search window FIFO, and the template window n×n matrix data is transferred to the template window FIFO; in step 2, the search window n× (n+1) matrix data is read from the search window FIFO, and the template window n×n matrix data is read from the template window FIFO.
Further, n is 64, m is 8, and k is 512.
Compared with the prior art, the invention has the following beneficial technical effects:
according to the invention, the search window data required by m times of matching is read through 1 time of matching calculation, and the subsequent matching calculation is performed in parallel according to m paths, so that m times of matching are completed through 1 time of matching calculation; and secondly, because the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, the invention expands the data bit width, changes the movement of 1 data at one time into the movement of m data at one time, and shortens the movement time. The invention is applicable to scenes with template sizes greater than or equal to (m x 2) x (m x 2). The invention realizes the rapid calculation of the correlation algorithm, and compared with the traditional calculation speed, the invention improves the (m multiplied by m).
Drawings
FIG. 1 is a flow chart of a related algorithm of the present invention;
FIG. 2 is a flowchart of the related algorithm step 1 of the present invention;
FIG. 3 is a flowchart of the related algorithm step 2 of the present invention;
fig. 4 is a flowchart of the correlation algorithm step 3 of the present invention.
Detailed Description
The invention will now be described in further detail with reference to specific examples, which are intended to illustrate, but not to limit, the invention.
The quick calculation method comprises 2 parts, wherein the 1 st part is to complete m times of matching by using 1 time of matching calculation, the 1 time of matching calculation reads search window data (n x (n+1)) required by m times of matching, wherein n is a multiple of 2, and the subsequent matching calculation is performed in parallel according to m paths. The 2 nd part is that the matching calculation cycle times of the related calculation are large, and much time is wasted on data movement, so the invention expands the data bit width, and the movement of 1 data at one time is changed into the movement of m data at one time. The invention is suitable for scenes with the template size of (m multiplied by 2) multiplied by (m multiplied by 2), for example, when the template size is 16 multiplied by 16, m can be taken as 8, and if the template size is smaller than 16 multiplied by 16, the number of data to be moved at one time needs to be reduced.
Referring to the related algorithm flowchart shown in fig. 1, the related algorithm is started first, then a search window required for matching calculation and a read address of a template BRAM are generated once, and the matching calculation is waited for to be completed. And (3) carrying out all matching calculation for k times according to the steps, and finally solving the maximum value in a matching calculation result matrix. The method specifically comprises the following steps:
step 1, as shown in fig. 2 (a), first, external search window data is input into a search window buffer area BRAM, reading of search window n× (n+1) matrix data is controlled according to a related algorithm flow, n× (n+1) matrix data is divided into m paths of n×n matrix data, then, the m paths of n×n matrix data are respectively subjected to averaging operation and the result is stored as sub_aver, and at the same time, n× (n+1) matrix data is transferred into a search window FIFO.
As shown in fig. 2 (b), external template window data is input into a template window buffer BRAM, the template window n×n matrix data is read according to the related algorithm flow control, then the average value calculation is performed, the result is stored as mb_aver, and at the same time, the n×n matrix data is transferred into a template window FIFO.
Step 2, as shown in fig. 3, n× (n+1) matrix data are read from the search window FIFO and divided into m paths of n×n matrix data, and then m paths of n×n matrix data are subtracted by respective average values to obtain m paths of data sub_reduce_aver, and the m paths of data sub_reduce_aver are subjected to square and root number opening operation in parallel to obtain m paths of data sub_sqrt. And reading n multiplied by n matrix data from the template window FIFO, subtracting the average value to obtain data mb_reduce_aver, and performing square sum open root number operation in parallel to obtain data mb_sqrt. And multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver respectively to obtain a result m paths of data molecular.
Step 3, as shown in fig. 4, m paths of data are subjected to comparison and assignment operation in parallel, and when sub_sqrt is equal to 0 and mb_sqrt is equal to 0, a matching calculation result is 1; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the other case matches a result of 0.
And 4, finishing all matching calculation for k times in the step 1-3, and finally solving the maximum value, wherein k= (n multiplied by n)/m.
In the embodiment of the invention, m is 8, n is 64, and k is 512.
The invention relates to a fast calculation method for realizing a correlation algorithm based on an FPGA, which is mainly used in the fields of image tracking, image matching and the like. The data moving time and the circulation times are reduced through the optimization algorithm framework, so that the matching calculation time is shortened. Assuming that the template size is 64×64, the search window size is 128×128, and the clock frequency of the fpga is 100MHz (10 ns), one field of matching of (128-64+1) × (128-64+1) =4225 times is required to be completed according to the conventional calculation, the data shift length is 64×64=4096, and the time taken to be 4225×4096×10ns= 173056000 ns=173 ms. If the fast calculation according to the present invention is to complete (128-64) × (128-64)/8=512 matches for one field, the data move length is 64×8=4512, and the time taken is 512×512×10 ns= 2621440 ns=2.6 ms.

Claims (3)

1. A fast calculation method for realizing a correlation algorithm based on an FPGA is characterized in that search window data required by m times of matching are read by one-time matching calculation, the matching calculation of the read search window data is performed in parallel according to m paths, and m is a multiple of 2;
the method specifically comprises the following steps:
step 1, firstly, inputting external search window data into a search window buffer area BRAM, and controlling to read search window n× (n+1) matrix data, wherein n is a multiple of 2; dividing the n multiplied (n+1) matrix data of the search window into n multiplied n matrix data of m paths of search window, respectively carrying out average calculation on the n multiplied n matrix data of the m paths of search window, and storing the result as sub_aver;
inputting external template window data into a template window buffer BRAM, controlling and reading template window n multiplied by n matrix data, transferring the template window n multiplied by n matrix data into a template window FIFO, then carrying out average calculation and saving the result as mb_aver;
step 2, reading search window n× (n+1) matrix data from a search window FIFO and dividing the search window n× (n+1) matrix data into m paths of search window n×n matrix data, then subtracting respective average values of the m paths of search window n×n matrix data to obtain m paths of data sub_reduce_aver, and performing square and root number opening operation on the m paths of data sub_sqrt in parallel; reading the n multiplied by n matrix data of the template window from the FIFO of the template window, subtracting the average value mb_aver to obtain data mb_reduce_aver, and performing square sum root number opening operation to obtain data mb_sqrt; multiplying and summing the m paths of data sub_reduce_aver and the data mb_reduce_aver to obtain m paths of data molecular;
step 3, comparing and assigning operations are carried out on m paths of data in parallel; the method specifically comprises the following steps: the matching calculation result is 1 when sub_sqrt is equal to 0 and mb_sqrt is equal to 0; the match calculation result is molecular/(sub_sqrt×mb_sqrt) when sub_sqrt is not equal to 0 and mb_sqrt is not equal to 0; the matching results are 0 when sub_sqrt is equal to 0 and mb_sqrt is not equal to 0 or sub_sqrt is not equal to 0 and mb_sqrt is equal to 0;
and 4, finishing all matching calculation for k times in the step 1-3, and solving the maximum value of the matching calculation result to obtain the matching result, wherein k= (n multiplied by n)/m.
2. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein m is 2, 4 or 8.
3. The method for fast computing a correlation algorithm based on an FPGA of claim 1, wherein n is 64, m is 8, and k is 512.
CN201910152685.4A 2019-02-28 2019-02-28 Rapid calculation method for realizing correlation algorithm based on FPGA Active CN109871511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910152685.4A CN109871511B (en) 2019-02-28 2019-02-28 Rapid calculation method for realizing correlation algorithm based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910152685.4A CN109871511B (en) 2019-02-28 2019-02-28 Rapid calculation method for realizing correlation algorithm based on FPGA

Publications (2)

Publication Number Publication Date
CN109871511A CN109871511A (en) 2019-06-11
CN109871511B true CN109871511B (en) 2023-08-01

Family

ID=66919507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910152685.4A Active CN109871511B (en) 2019-02-28 2019-02-28 Rapid calculation method for realizing correlation algorithm based on FPGA

Country Status (1)

Country Link
CN (1) CN109871511B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004145476A (en) * 2002-10-22 2004-05-20 Hiroshima Industrial Promotion Organization Matching arithmetic circuit
CN104537657A (en) * 2014-12-23 2015-04-22 西安交通大学 Laser speckle image depth perception method implemented through parallel search GPU acceleration
CN107292291A (en) * 2017-07-19 2017-10-24 北京智芯原动科技有限公司 A kind of vehicle identification method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970509B2 (en) * 2001-07-31 2005-11-29 Wis Technologies, Inc. Cell array and method of multiresolution motion estimation and compensation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004145476A (en) * 2002-10-22 2004-05-20 Hiroshima Industrial Promotion Organization Matching arithmetic circuit
CN104537657A (en) * 2014-12-23 2015-04-22 西安交通大学 Laser speckle image depth perception method implemented through parallel search GPU acceleration
CN107292291A (en) * 2017-07-19 2017-10-24 北京智芯原动科技有限公司 A kind of vehicle identification method and system

Also Published As

Publication number Publication date
CN109871511A (en) 2019-06-11

Similar Documents

Publication Publication Date Title
US20210073569A1 (en) Pooling device and pooling method
CN108573305B (en) Data processing method, equipment and device
CN106204660A (en) A kind of Ground Target Tracking device of feature based coupling
CN110321888B (en) FPGA-based satellite-borne infrared small target detection method
JP7163372B2 (en) Target tracking method and device, electronic device and storage medium
JP2011229030A (en) System and method for image processing, recording medium, and program
KR102199912B1 (en) Data Augmentation based Robust Object Recognition Method and System
CN117726921B (en) ORB feature extraction accelerator based on stream processing and non-blocking
CN109871511B (en) Rapid calculation method for realizing correlation algorithm based on FPGA
CN109215001B (en) High temperature difference self-adaptive platform histogram equalization implementation method based on FPGA
CN108282597A (en) A kind of real-time target tracing system and method based on FPGA
CN114254740B (en) Convolution neural network accelerated calculation method, calculation system, chip and receiver
CN110991609A (en) Line buffer for improved data transfer efficiency
US11580617B2 (en) Method of matching images to be merged and data processing device performing the same
CN109035178A (en) A kind of multi-parameter value tuning method applied to image denoising
Yin et al. A low-hardware-overhead, high-energy-efficiency, and end-to-end CNN-based feature extraction accelerator for mobile visual SLAM
CN110322388B (en) Pooling method and apparatus, pooling system, and computer-readable storage medium
CN115952844A (en) System and method for improving hardware accelerator performance in neural network applications
Zhang et al. Bucket-FEM: A bucket-based architecture of real-time ORB feature extraction and matching for embedded SLAM applications
US10983978B2 (en) Method for updating relational index, storage medium and electronic device
Ding et al. Multi-scale FAST feature extraction heterogeneous design based on FPGA
US7453761B2 (en) Method and system for low cost line buffer system design
CN105827550A (en) Method and device for determining target parameter through use of sliding window
Xin et al. Design and Implementation of Target Tracking System in Low Illumination Environment Based on FPGA
CN113688703B (en) Low-latency non-maximum suppression method and device based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant