[go: up one dir, main page]

CN109446478B - Complex covariance matrix calculation system based on iteration and reconfigurable mode - Google Patents

Complex covariance matrix calculation system based on iteration and reconfigurable mode Download PDF

Info

Publication number
CN109446478B
CN109446478B CN201811284263.4A CN201811284263A CN109446478B CN 109446478 B CN109446478 B CN 109446478B CN 201811284263 A CN201811284263 A CN 201811284263A CN 109446478 B CN109446478 B CN 109446478B
Authority
CN
China
Prior art keywords
complex
covariance matrix
matrix
data
bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811284263.4A
Other languages
Chinese (zh)
Other versions
CN109446478A (en
Inventor
李丽
陈辉
傅玉祥
陈沁雨
何国强
何书专
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201811284263.4A priority Critical patent/CN109446478B/en
Publication of CN109446478A publication Critical patent/CN109446478A/en
Application granted granted Critical
Publication of CN109446478B publication Critical patent/CN109446478B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

本发明涉及基于迭代和可重构方式的复协方差矩阵计算系统,包括片上SRAM存储器、片外DDR存储器、可重构单元、DMA控制器以及加速核,所述加速核包括:矩阵协方差运算模块,通过迭代计算方式轮询片上SRAM存储器的各区域源数据,并计算出下三角协方差矩阵;共轭对称模块,根据协方差矩阵的共轭对称性质,将下三角协方差矩阵通过地址映射和重构存储的方式得出完整的复协方差矩阵,形成最终的运算结果;DMA接口函数模块,将通过DMA方式从片外DDR存储器读入的数据按分区方式存入片上SRAM存储器。有益效果:本发明支持任意列数的复矩阵进行协方差运算,降低了传统硬件实现方式的源数据计算量以及多次将结果数据写回DDR的时间。

Figure 201811284263

The invention relates to a complex covariance matrix computing system based on an iterative and reconfigurable manner, including on-chip SRAM memory, off-chip DDR memory, reconfigurable unit, DMA controller and acceleration core, the acceleration core includes: matrix covariance calculation The module polls the source data of each area of the on-chip SRAM memory through iterative calculation, and calculates the lower triangular covariance matrix; the conjugate symmetric module, according to the conjugate symmetry property of the covariance matrix, maps the lower triangular covariance matrix through the address mapping The complete complex covariance matrix is obtained by reconstructing the storage method to form the final operation result; the DMA interface function module stores the data read from the off-chip DDR memory by DMA into the on-chip SRAM memory by partition. Beneficial effects: the present invention supports the covariance operation of complex matrices with any number of columns, which reduces the calculation amount of the source data in the traditional hardware implementation and the time for writing the result data back to the DDR for many times.

Figure 201811284263

Description

Complex covariance matrix calculation system based on iteration and reconfigurable mode
Technical Field
The invention relates to the technical field of computers, in particular to a complex covariance matrix calculation system based on iteration and reconfigurable modes.
Background
The calculation of the covariance matrix is typical operation in the field of signal processing, is a key part for realizing multi-level nested wiener filters, spatial spectrum estimation, coherent source number estimation and affine invariant pattern recognition, and is widely applied to the fields of radar, sonar, digital image processing and the like. In addition, the covariance matrix is widely applied to the fields of image matching, image pattern recognition and the like, but the calculation process of the covariance matrix is complex, and if the covariance matrix of a multidimensional random variable composed of a plurality of eigenvectors in each region of an image needs to be calculated, the time consumption is long, and the calculation of the covariance matrix of the image on a personal universal PC platform in real time is also a great obstacle.
With the rapid development of the integrated circuit industry, high performance and real time are constantly pursued in the embedded field. Currently, most of the hardware implementation for the complex covariance matrix is designed based on platforms such as a DSP, a GPU, and an FPGA. The invention provides a hardware implementation method for computing a complex covariance matrix based on an iteration and reconfigurable mode based on a reconfigurable intelligent acceleration core, and compared with the traditional hardware implementation mode, the hardware implementation method has the advantages of high resource utilization rate and high hardware implementation speed. As typical operation in the field of signal processing, the hardware implementation method has good reference significance and wide application prospect.
In statistics and probability theory, the covariance matrix is a matrix. Each element of which is the covariance between the elements of the respective vector. Let X ═ X1,X2,X3,...,XN)TFor n-dimensional random variables, called matrices
Figure BDA0001847970200000011
Covariance matrix, denoted as d (X), for n-dimensional random variable X. Wherein, Cij=Cov(Xi,Xj) I, j is 1, …, n is the component X of XiAnd XjThe covariance of (a). Due to Cij=CjiTherefore, the covariance matrix is a symmetric matrix.
As shown in fig. 1, the conventional hardware implementation is as follows: for an mxn complex matrix a ═ aij]Its complex covariance matrix B may be composed ofThe following formula gives: b ═ AAH. Namely, firstly, the matrix A is transposed, and then conjugate is solved to obtain AHFinally, each element of the matrix B is obtained as follows:
Figure BDA0001847970200000012
because the method is realized according to matrix multiplication, when the covariance of a complex matrix with a large number of points is calculated, data is carried in and carried out for multiple times, which can cause the access and storage time to be overlong, and the realization method does not reduce the operation amount according to the conjugate symmetry property of the covariance matrix, which are reasons of long calculation time of the covariance matrix. In many practical application scenarios, the low-efficiency calculation of the covariance matrix can become a great obstacle, and the method has certain reference significance and application prospect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, provides a complex covariance matrix calculation system based on iteration and reconfigurable modes, which effectively reduces the calculation amount of source data, fully utilizes storage resources, accelerates the calculation speed, and further integrally improves the performance of an application algorithm, and is specifically realized by the following technical scheme:
the complex covariance matrix computing system based on iteration and reconfigurable mode comprises an on-chip SRAM (static random access memory), an off-chip DDR (double data rate) memory, a reconfigurable unit, a DMA (direct memory access) controller and an acceleration core, wherein the acceleration core is respectively in communication connection with the on-chip SRAM memory and the reconfigurable unit, the DMA controller is in communication connection with the reconfigurable unit, the off-chip DDR memory is in communication connection with the DMA controller through a bus, and the acceleration core comprises:
the matrix covariance operation module polls the source data of each area of the on-chip SRAM memory in an iterative computation mode and computes a lower triangular covariance matrix;
the conjugate symmetry module obtains a complete complex covariance matrix from the lower triangular covariance matrix in an address mapping and reconstruction storage mode according to the conjugate symmetry property of the covariance matrix to form a final operation result;
the DMA interface function module is used for storing data read from the off-chip DDR memory in a DMA mode into the on-chip SRAM memory in a partition mode; and writing the operation result back to the off-chip DDR memory in a DMA mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the on-chip SRAM memory sets storage resources to be divided into k banks, the depth of each bank is d, if M banks are allocated for storing a lower triangular covariance matrix, covariance operation is carried out on a complex matrix with the size of M multiplied by N, and the requirement of M multiplied by N is met2Md is less than or equal to md, and N is an arbitrary value; if the calculation parallelism is b, the condition of defining the complex matrix to be solved as the decimal point is as follows: m2Bd is less than or equal to, if the condition is not met, the complex matrix to be solved is judged to be a large point number.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that if the complex matrix to be solved is a small point number, a one-dimensional data transmission mode of DMA is adopted; and if the complex matrix to be solved is a large number of points, adopting a DMA two-dimensional data transmission mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module stores original data according to a rule that one column of a complex matrix to be solved is stored in a single bank, all banks in a single time are stored in one column, the current column is divided into one zone, all banks in a single time are divided into one section, and the total size of a partition in a single time, the total times of the partition in a single time, the number of the zones in the last section and the number of the columns in the last zone are calculated.
The complex covariance matrix computing system based on the iteration and reconfigurable mode is further designed in that a zero filling mechanism is adopted if any area column number is not filled; if the source data storage exceeds the single maximum storage point number, ping-pong operation segmented processing is adopted, and if the source data storage exceeds the single maximum storage point number, batch processing is adopted.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module adopts the reconfigurable mode to construct a complex multiplication accumulation calculation unit, and iterative calculation is carried out on each segment, each partition and each bank temporary storage result data to obtain a lower triangular covariance matrix.
The complex covariance matrix calculation system based on the iteration and the reconfigurable mode is further designed in that the using number of complex multipliers and complex adders in the complex multiplication accumulation calculation unit is equal to the bank number of source data storage, b is set, and the number of input data of each calculation unit is as follows: 2b + 1; reading the Mth address of all the source data banks as the input A of a complex multiplier; and respectively reading the conjugate of the data of the 1 st to M th addresses of all the source data banks as the input B of the complex multiplier, wherein the other input data is the value of the corresponding storage address in the last storage result bank, and the calculation result needs to be written back to the same address in the same bank again every time.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the conjugate symmetry module reads data from a bank where the lower triangular covariance matrix is located in sequence, analyzes rows and columns of the data in the lower triangular covariance matrix, and stores the data and conjugate symmetry data thereof into a multiplexed source data bank according to a rule that the rows and the columns of the matrix are stored in one bank until a complete complex covariance matrix is obtained.
The complex covariance matrix computing system based on the iteration and the reconfigurable mode is further designed in that when the acceleration core distributes data read from the DDR, the rows and the columns of the incoming data in the complex matrix to be solved are analyzed according to the addresses incoming by the DMA, and then the incoming data are stored in the corresponding source data bank according to the column distribution principle.
The complex covariance matrix calculation system based on iteration and reconfigurable mode is further designed in that the address mapping adopted by the conjugate symmetry module is as follows: and sequentially reading the data of the lower triangular covariance matrix from the result bank, resolving the row and the column of the original lower triangular covariance matrix according to the bank number of the data and the address in the bank, and storing the current data and the conjugate symmetric data thereof into the reconstructed bank according to the distribution principle that one column of the complete covariance matrix is stored in one bank.
The invention has the following advantages:
the complex covariance matrix calculation system based on the iteration and reconfigurable mode is designed in a parameterization mode, and a scheme with the maximum resource utilization rate is reconfigured for different storage and calculation resources to carry out complex covariance matrix operation. The method reduces the source data calculation amount, improves the resource utilization rate, accelerates the calculation speed of hardware realization, and provides a good reference for the design realization of the calculation covariance matrix in the field of signal processing.
Drawings
FIG. 1 is a hardware implementation architecture diagram of the computational complex covariance matrix of the present invention.
FIG. 2 is a diagram of a hardware implementation architecture for computing a complex covariance matrix in a conventional manner.
FIG. 3 is a schematic diagram of the arrangement of source data in the present invention.
FIG. 4 is a diagram illustrating a source data transport address mapping mechanism according to the present invention.
FIG. 5 is a flow chart of the storage of the dot source data in the present invention.
FIG. 6 is a schematic diagram of a computing unit design in the present invention.
FIG. 7 is a schematic diagram of a source data read of the computing process of the present invention.
FIG. 8 is a schematic diagram of a lower triangular matrix conjugate symmetric address mapping mechanism according to the present invention.
FIG. 9 is a comparison of performance evaluation for the computation of a complex covariance matrix according to the present invention and the conventional method.
Detailed Description
The following describes the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, the complex covariance matrix calculation system based on iterative and reconfigurable mode in this example mainly includes main modules such as DMA interface function, matrix covariance operation, and conjugate symmetry. The DMA interface function module is mainly responsible for: firstly, data read from an off-chip DDR memory (DDR) by a DMA mode is stored into an on-chip SRAM memory (SRAM) in a partitioning mode, and secondly, the final operation result is written back to the DDR by the DMA mode. The matrix covariance operation module is mainly responsible for polling source data of each region of the SRAM in an iterative computation mode to calculate a lower triangular covariance matrix. The conjugate symmetry module is mainly responsible for obtaining a complete complex covariance matrix by the lower triangular covariance matrix through address mapping and a reconstruction storage mode according to the conjugate symmetry property of the covariance matrix. The method supports the covariance operation of the complex matrix with any column number, reduces the source data calculation amount of the traditional hardware implementation mode and the time for writing back result data to DDR for many times, balances calculation and storage resources to realize maximized multipath parallelism, constructs the calculation unit by using a reconfigurable mode, and sets a specific address mapping rule to obtain the complex covariance matrix.
The on-chip SRAM memory sets storage resources to be divided into k banks, the depth of each bank is d, if M banks are allocated for storing a lower triangular covariance matrix, covariance operation is carried out on a complex matrix with the size of M multiplied by N, and the requirement of M is met2Md is less than or equal to md, and N is an arbitrary value; if the calculation parallelism is b, the condition of defining the complex matrix to be solved as the decimal point is as follows: m2Bd is less than or equal to, if the condition is not met, the complex matrix to be solved is judged to be a large point number.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that if the complex matrix to be solved is a small point number, a one-dimensional data transmission mode of DMA is adopted; and if the complex matrix to be solved is a large number of points, adopting a DMA two-dimensional data transmission mode.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module stores original data according to a rule that one column of a complex matrix to be solved is stored in a single bank, all banks in a single time are stored in one column, the current column is divided into one zone, all banks in a single time are divided into one section, and the total size of a partition in a single time, the total times of the partition in a single time, the number of the zones in the last section and the number of the columns in the last zone are calculated.
The complex covariance matrix computing system based on the iteration and reconfigurable mode is further designed in that a zero filling mechanism is adopted if any area column number is not filled; if the source data storage exceeds the single maximum storage point number, ping-pong operation segmented processing is adopted, and if the source data storage exceeds the single maximum storage point number, batch processing is adopted.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the matrix covariance operation module adopts the reconfigurable mode to construct a complex multiplication accumulation calculation unit, and iterative calculation is carried out on each segment, each partition and each bank temporary storage result data to obtain a lower triangular covariance matrix.
The complex covariance matrix calculation system based on the iteration and the reconfigurable mode is further designed in that the using number of complex multipliers and complex adders in the complex multiplication accumulation calculation unit is equal to the bank number of source data storage, b is set, and the number of input data of each calculation unit is as follows: 2b + 1; reading the Mth address of all the source data banks as the input A of a complex multiplier; and respectively reading the conjugate of the data of the 1 st to M th addresses of all the source data banks as the input B of the complex multiplier, wherein the other input data is the value of the corresponding storage address in the last storage result bank, and the calculation result needs to be written back to the same address in the same bank again every time.
The complex covariance matrix calculation system based on the iteration and reconfigurable mode is further designed in that the conjugate symmetry module reads data from a bank where the lower triangular covariance matrix is located in sequence, analyzes rows and columns of the data in the lower triangular covariance matrix, and stores the data and conjugate symmetry data thereof into a multiplexed source data bank according to a rule that the rows and the columns of the matrix are stored in one bank until a complete complex covariance matrix is obtained.
The complex covariance matrix computing system based on the iteration and the reconfigurable mode is further designed in that when the acceleration core distributes data read from the DDR, the rows and the columns of the incoming data in the complex matrix to be solved are analyzed according to the addresses incoming by the DMA, and then the incoming data are stored in the corresponding source data bank according to the column distribution principle.
The complex covariance matrix calculation system based on iteration and reconfigurable mode is further designed in that the address mapping adopted by the conjugate symmetry module is as follows: and sequentially reading the data of the lower triangular covariance matrix from the result bank, resolving the row and the column of the original lower triangular covariance matrix according to the bank number of the data and the address in the bank, and storing the current data and the conjugate symmetric data thereof into the reconstructed bank according to the distribution principle that one column of the complete covariance matrix is stored in one bank.
The following is a detailed description of an example implementation of the invention, and a cycle accurate system level simulation model based on SystemC language is set up for verification.
The invention calculates a complex covariance matrix based on the hardware implementation architecture shown in FIG. 2, in the example, it is assumed that the matrix X is M × N (M is less than or equal to 256, N is less than or equal to 8K), and the generalized covariance result Y is M × M: (hereinafter, "%" indicates a modulo operation)
Figure BDA0001847970200000061
If the memory size is 2MB of SRAM as an example (divided into 32 banks, each bank depth is 8k), the data processing is carried out in a partitioning mode (large number of points and also segmentation) during hardware implementation, and the source data area is filled in a partitioning mode for multiple times according to columns. Considering supporting the ping-pong operation with large dot count, the generalized covariance result Y is 256 × 256 dot counts at most, i.e. the lower triangular matrix is 256 × 256 (256+1)/2 — 32896 dot counts at most, so at least 5 bank storages are required. Therefore, the available 32 banks remove 5 banks storing the calculation results, and all the remaining banks store the source data through the ping-pong operation, so that the number of banks storing the source data at a time is a floor [ (32-5)/2] ═ 13. (floor function: return the largest integer smaller than the parameter).
Since each bank is 8k deep and data is stored in each bank in a single row of M (M is 256 at most), B ═ floor [8k/M ] zones are defined, i.e. a banks of each zone can store a row data, and all banks can store C ═ B row data at most. (definition of large dot number: data size of complex matrix to be solved exceeds C x M)
Taking an M N matrix as an example (N ≦ C), the source data is arranged as shown in FIG. 3. If N > C, then according to the cycle, D ═ ceil (N/C) times are transmitted in total, and the ceil function: the smallest integer greater than or equal to the specified expression is returned. The number of banks occupied by the last zone is (N-1)% A +1, and the remaining banks of the zone are filled with 0.
The AXI bus data bit width adopted in the implementation of this example is 256 bits, and the DMA splits the AXI bus data into 4 data of 64 bits and transmits the data to the DMA interface function of the matrix covariance operation, and the function stores the data into the corresponding bank according to the address mapping mechanism, as shown in fig. 4. Therefore, when the large dot count is operated, the partition size of each section is preferably multiple of 4, so that the number of columns transmitted each time is just multiple of 4, 2 data cannot be written into the same bank at the same time, and the influence of delay memory on the overall operation speed is avoided; for the small dot count operation, since the DMA is transferred once and the number of columns is not necessarily exactly a multiple of 4, 2 data are written to the same bank at the same time, and at this time, the data need to be written in turn with a delay, as shown in fig. 5. For the last section of last area data transmission of large dot number operation, because the DMA adopts a two-dimensional data transmission mode, if the number of the remaining columns is not a multiple of 4, zero padding can be automatically carried out to be a multiple of 4, although zero padding columns are written into the banks, zero padding is just needed to be carried out on the last area, so that the method is not only irrelevant, but also shortens the number of zero padding columns and shortens the storage time of source data.
The reconfigurable computing unit needs to be reconfigured into a 27 complex data input and 5-level full-flow water complex multiplication accumulation unit, as shown in fig. 6, 13 complex multipliers and 13 complex adders are used in total.
According to the conjugate symmetry characteristic of the generalized covariance result, in order to shorten the operation time, only the lower triangle of the complex covariance matrix is calculated, and the upper triangle is expanded by adopting conjugate symmetry. The source data reading form in the calculation process is shown in fig. 7, and the specific operation steps are as follows:
1) the matrix is divided into D sections according to columns, 1 section of data is imported each time, and the source data bank is placed according to the form of a graph 3;
2) the following operation is performed for each area occupied by the current segment: (input Pre _ Region Data is Data of the corresponding position in the same area)
Reading the 1 st number from the A banks simultaneously as an input I1, and taking the conjugate of the 1 st number as an input I2;
reading the 2 nd number from the A banks simultaneously as an input I1, and sequentially reading the 1 st and 2 nd numbers to take a conjugate as an input I2;
reading the 3 rd number from the A banks simultaneously as an input I1, and sequentially reading the 1 st, 2 nd and 3 rd numbers to take conjugation as an input I2;
……;
reading the Mth number from the A banks simultaneously as an input I1, and sequentially reading the 1 st, the 2 nd and the … …, wherein the Mth number takes a conjugate as an input I2;
3) repeating the step 2) to finish the operation of all the areas of the current section;
4) repeating the step 2) and the step 3) to complete calculation of all the sections to obtain M (M +1)/2 results;
5) and expanding the M (M +1)/2 results into M results through conjugate symmetry and storing the M results into a new storage array again.
Since the source data storage area can be reused after the calculation is completed, a new storage array is constructed to store the covariance calculation results of M matrix/M matrix, in order to correspond to that the DMA transfers data with 256 bits each time, and in consideration of the convenience of taking out 4 data each time, the new storage array is planned as bank0-bank15, the calculated M matrix/M complex covariance matrix is stored in the new storage array in sequence according to columns, and a specific lower triangular matrix conjugate symmetric address mapping mechanism is shown in FIG. 8.
The performance of this example was evaluated as follows: 1) the number of the sections occupying the B areas is floor (N/C); 2) the number of zones occupied by the last stage is ceil ((N% C)/A); 3) taking the number of cycles of the input I1 as M during operation of each Region, and taking the number of cycles of the input I2 and the input Pre _ Region Data as M x (M +1)/2 in parallel; 4) the time of conjugate symmetric expansion of the lower triangular matrix is M (M + 1)/2; 5) the complex multiply accumulate unit calculates time as T1, and reads and stores data from the bank as T2 and T3, respectively. The total number of operation cycles of the complex covariance matrix is as follows: floor (N/C) [ (M +1)/2+ M) × B ] + ceil [ (N% C)/a ] × [ M × (M +1)/2+ M ] + M × (M +1)/2+ T1+ T2+ T3. While the conventional way calculates the number of cycles of the complex covariance matrix as follows: m N (M + 4)/4.
T1, T2, T3 are negligible because they are small relative to the total number of cycles. The performance pairs for the above 2 implementations are shown in fig. 9 when covariance operations are performed on complex matrices of different sizes. As is apparent from the figure, the cycle number of calculating the complex covariance matrix based on the iteration and the reconfigurable mode is greatly reduced compared with the traditional mode. The complex covariance matrix calculation system based on the iteration and reconfigurable mode supports complex matrixes with any column number to perform covariance calculation, reduces the source data calculation amount of the traditional hardware implementation mode and the time of writing back result data to DDR for multiple times, balances calculation and storage resources to realize maximized multipath parallelism, constructs a calculation unit by using the reconfigurable mode, sets a specific address mapping rule to obtain the complex covariance matrix, and greatly improves the resource utilization rate and the calculation speed of hardware implementation. As typical operation in the field of signal processing, the hardware implementation method has good reference significance and wide application prospect.

Claims (9)

1.一种基于迭代和可重构方式的复协方差矩阵计算系统,包括片上SRAM存储器、片外DDR存储器、可重构单元、DMA控制器以及加速核,所述加速核分别与片上SRAM存储器、可重构单元通信连接,DMA控制器与可重构单元通信连接,片外DDR存储器通过总线与DMA控制器通信连接,其特征在于所述加速核包括:1. A complex covariance matrix computing system based on iterative and reconfigurable mode, comprising on-chip SRAM memory, off-chip DDR memory, reconfigurable unit, DMA controller and acceleration core, the acceleration core and on-chip SRAM memory are respectively , the reconfigurable unit is communicated and connected, the DMA controller is communicated with the reconfigurable unit, and the off-chip DDR memory is communicated with the DMA controller through a bus, and is characterized in that the acceleration core includes: 矩阵协方差运算模块,通过迭代计算方式轮询片上SRAM存储器的各区域源数据,并计算出下三角协方差矩阵;The matrix covariance operation module polls the source data of each area of the on-chip SRAM memory by iterative calculation, and calculates the lower triangular covariance matrix; 共轭对称模块,根据协方差矩阵的共轭对称性质,将下三角协方差矩阵通过地址映射和重构存储的方式得出完整的复协方差矩阵,形成最终的运算结果;The conjugate symmetry module, according to the conjugate symmetry property of the covariance matrix, obtains a complete complex covariance matrix by means of address mapping and reconstruction storage of the lower triangular covariance matrix, and forms the final operation result; DMA接口函数模块,将通过DMA方式从片外DDR存储器读入的数据按分区方式存入片上SRAM存储器;并将所述运算结果通过DMA方式写回片外DDR存储器;The DMA interface function module stores the data read in from the off-chip DDR memory by DMA into the on-chip SRAM memory by partition; and writes the operation result back to the off-chip DDR memory by DMA; 所述矩阵协方差运算模块按照将待求复矩阵的一列存于单个bank、单次所有bank存入一列并将当前列划为一个区、单次所有bank划为一个段的规则将原数据进行存储,计算出单次分区总大小、分段总次数、最后一段的区数以及最后一个区的列数。The matrix covariance computing module processes the original data according to the rules of storing one column of the complex matrix to be obtained in a single bank, storing all banks in one column at a time, dividing the current column into one area, and dividing all banks into one segment at a time. Store, calculate the total size of a single partition, the total number of segmentations, the number of zones in the last segment, and the number of columns in the last zone. 2.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于所述片上SRAM存储器设定存储资源分为k个bank,每个bank的深度为d,若分配m个bank用于存储下三角协方差矩阵,针对大小为M×N的复矩阵进行协方差运算,满足M2≤md,N为任意值;若计算并行度为b,则划定待求复矩阵为小点数的条件为:M2≤bd,不满足该条件即判定待求复矩阵为大点数。2. the complex covariance matrix computing system based on iteration and reconfigurable mode according to claim 1, is characterized in that described on-chip SRAM memory setting storage resource is divided into k banks, and the depth of each bank is d, If m banks are allocated to store the lower triangular covariance matrix, the covariance operation is performed on a complex matrix of size M×N, satisfying M 2 ≤md, and N is an arbitrary value; if the calculation parallelism is b, the The condition for finding a complex matrix to be a small number of points is: M 2 ≤bd, if this condition is not satisfied, it is determined that the complex matrix to be found is a large number of points. 3.根据权利要求2所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于若待求复矩阵为小点数,则采用DMA的一维数据传输模式;若待求复矩阵为大点数则采用DMA的二维数据传输模式。3. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 2 is characterized in that if the complex matrix to be sought is a small number of points, then the one-dimensional data transmission mode of DMA is adopted; If the matrix is large, the two-dimensional data transfer mode of DMA is adopted. 4.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于若任意区列数未填满,则采用补零机制;若源数据存储超过单次最大存储点数,则采用乒乓操作分段处理,若源数据存储超过单次最大存储点数,则采用批处理。4. The complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 1, is characterized in that if the number of columns in any region is not filled up, then a zero-filling mechanism is adopted; if the source data storage exceeds a single maximum The number of storage points is processed in sections by ping-pong operation. If the source data storage exceeds the maximum number of storage points at a time, batch processing is used. 5.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于矩阵协方差运算模块采用可重构方式构建复乘累加计算单元,将各分段、各分区、各bank暂存结果数据进行迭代计算,得到下三角协方差矩阵。5. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 1, is characterized in that the matrix covariance calculation module adopts reconfigurable mode to construct complex multiply-accumulate calculation unit, and each segment, each Partition and each bank temporarily store the result data for iterative calculation to obtain the lower triangular covariance matrix. 6.根据权利要求5所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于复乘累加计算单元中复乘法器和复加法器的使用个数等于源数据存储的bank数量,设定为b,每次计算单元输入数据的个数为:2b+1;读取所有源数据bank第M个地址作为复乘法器输入A;分别读取所有源数据bank第1~M个地址的数据的共轭作为复乘法器输入B,另一个输入数据为上一次存储结果bank中对应的存储地址的值,每次计算结果又需重新写回同一bank中的同一地址。6. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 5 is characterized in that in the complex multiplying and accumulating computing unit, the number of use of complex multiplier and complex adder is equal to the bank of source data storage The number is set to b, and the number of input data of each calculation unit is: 2b+1; read the Mth address of all source data banks as input A of the complex multiplier; read all source data banks 1st to Mth respectively The conjugate of the data of one address is input to B as a complex multiplier, and the other input data is the value of the corresponding storage address in the bank of the previous storage result, and each calculation result needs to be rewritten back to the same address in the same bank. 7.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于所述共轭对称模块依次从下三角协方差矩阵所在bank读取数据,解析出数据在下三角矩阵中的行和列,再按照矩阵一列存于一个bank的规则,将该数据及其共轭对称数据存入复用的源数据bank中,直到得出完整的复协方差矩阵。7. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 1, is characterized in that described conjugate symmetry module reads data successively from the bank where the lower triangular covariance matrix is located, and parses out that data is in the following For the rows and columns in the triangular matrix, according to the rule that one column of the matrix is stored in a bank, the data and its conjugate symmetric data are stored in the multiplexed source data bank until a complete complex covariance matrix is obtained. 8.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于所述加速核分配从DDR中读取的数据时,根据DMA传入的地址解析出传入的数据在待求复矩阵中的行和列,再根据按列分配原则存入相应源数据bank中。8. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 1, is characterized in that when the described acceleration core distributes the data read from DDR, according to the address that DMA passes in, resolves outgoing transmission. The input data is stored in the corresponding source data bank according to the row and column of the complex matrix to be calculated. 9.根据权利要求1所述的基于迭代和可重构方式的复协方差矩阵计算系统,其特征在于共轭对称模块在采用的地址映射为:从结果bank中依次读取下三角协方差矩阵的数据,根据该数据所在的bank编号和bank中的地址解析出所在原下三角矩阵的行和列,再按完整协方差矩阵的一列存于一个bank的分配原则将当前数据和其共轭对称数据存入重构的bank中。9. the complex covariance matrix computing system based on iterative and reconfigurable mode according to claim 1, it is characterized in that the address mapping that the conjugated symmetric module adopts is: read lower triangular covariance matrix successively from the result bank According to the bank number of the data and the address in the bank, the row and column of the original lower triangular matrix are parsed out, and then the current data and its conjugate are symmetric according to the allocation principle that one column of the complete covariance matrix is stored in a bank The data is stored in the reconstructed bank.
CN201811284263.4A 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode Active CN109446478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811284263.4A CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811284263.4A CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Publications (2)

Publication Number Publication Date
CN109446478A CN109446478A (en) 2019-03-08
CN109446478B true CN109446478B (en) 2021-09-28

Family

ID=65550425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811284263.4A Active CN109446478B (en) 2018-10-30 2018-10-30 Complex covariance matrix calculation system based on iteration and reconfigurable mode

Country Status (1)

Country Link
CN (1) CN109446478B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614582B (en) * 2018-11-06 2020-08-11 海南大学 Lower triangular partial storage device and parallel reading method of self-conjugate matrix
CN111045965B (en) * 2019-10-25 2021-06-04 南京大学 A kind of hardware implementation method of multi-channel conflict-free splitting and computer device and readable storage medium for running the method
CN111723336B (en) * 2020-06-01 2023-01-24 南京大学 A Cholesky Decomposition-Based Arbitrary Order Matrix Inversion Hardware Acceleration System Using Loop Iteration
CN119441089A (en) * 2024-11-04 2025-02-14 南京大学 A data transmission path suitable for FFT algorithm
CN119441131A (en) * 2024-11-04 2025-02-14 南京大学 A dynamic reconstruction method for two-dimensional memory access controller

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002037259A1 (en) * 2000-11-01 2002-05-10 Bops, Inc. Methods and apparatus for efficient complex long multiplication and covariance matrix implementation
EP1215507A2 (en) * 2000-12-12 2002-06-19 Matsushita Electric Industrial Co., Ltd. Radio-wave arrival-direction estimating apparatus and directional variable transceiver
CN101211333A (en) * 2006-12-30 2008-07-02 北京邮电大学 A signal processing method, device and system
CN103685110A (en) * 2013-12-17 2014-03-26 京信通信系统(中国)有限公司 Predistortion processing method and system and predistortion factor arithmetic unit
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN105893333A (en) * 2016-03-25 2016-08-24 合肥工业大学 Hardware circuit for calculating covariance matrix in MUSIC algorithm
US9686069B2 (en) * 2015-05-22 2017-06-20 ZTE Canada Inc. Adaptive MIMO signal demodulation using determinant of covariance matrix

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002037259A1 (en) * 2000-11-01 2002-05-10 Bops, Inc. Methods and apparatus for efficient complex long multiplication and covariance matrix implementation
EP1215507A2 (en) * 2000-12-12 2002-06-19 Matsushita Electric Industrial Co., Ltd. Radio-wave arrival-direction estimating apparatus and directional variable transceiver
CN101211333A (en) * 2006-12-30 2008-07-02 北京邮电大学 A signal processing method, device and system
CN103685110A (en) * 2013-12-17 2014-03-26 京信通信系统(中国)有限公司 Predistortion processing method and system and predistortion factor arithmetic unit
US9686069B2 (en) * 2015-05-22 2017-06-20 ZTE Canada Inc. Adaptive MIMO signal demodulation using determinant of covariance matrix
CN105426345A (en) * 2015-12-25 2016-03-23 南京大学 Matrix inverse operation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN105893333A (en) * 2016-03-25 2016-08-24 合肥工业大学 Hardware circuit for calculating covariance matrix in MUSIC algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
一种高精度的大点数二维FFT处理器设计;于东等;《现代雷达》;20160531;第38卷(第5期);第16-21页 *
基于数据阵共辘重构的MUSIC角估计算法;何子述等;《电子科技大学学报》;19990430;第28卷(第2期);第111-115页 *

Also Published As

Publication number Publication date
CN109446478A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109446478B (en) Complex covariance matrix calculation system based on iteration and reconfigurable mode
CN106875011B (en) Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
Yuan et al. High performance CNN accelerators based on hardware and algorithm co-optimization
US11645529B2 (en) Sparsifying neural network models
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
JP7007488B2 (en) Hardware-based pooling system and method
CN111667051A (en) Neural network accelerator suitable for edge equipment and neural network acceleration calculation method
CN108416434A (en) The circuit structure accelerated with full articulamentum for the convolutional layer of neural network
US12056459B2 (en) Compute in memory architecture and dataflows for depth-wise separable convolution
CN110766128A (en) Convolution calculation unit, calculation method and neural network calculation platform
US20220188613A1 (en) Sgcnax: a scalable graph convolutional neural network accelerator with workload balancing
CN112306555B (en) Method, device, apparatus and computer-readable storage medium for extracting image data from multiple convolution windows in parallel
US20230025068A1 (en) Hybrid machine learning architecture with neural processing unit and compute-in-memory processing elements
US12340304B2 (en) Partial sum management and reconfigurable systolic flow architectures for in-memory computation
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
US20230065725A1 (en) Parallel depth-wise processing architectures for neural networks
Niu et al. SPEC2: Spectral sparse CNN accelerator on FPGAs
JP2024532682A (en) In-memory digital computation
US11580191B1 (en) Method and system for convolution
CN111610963A (en) Chip structure and multiply-add calculation engine thereof
JP7642919B2 (en) An Activation Buffer Architecture for Data Reuse in Neural Network Accelerators
Wang et al. An FPGA-based reconfigurable CNN training accelerator using decomposable Winograd
US12423377B2 (en) Computation in memory architecture for phased depth-wise convolutional
CN115033843A (en) Circuit implementation method for covariance matrix calculation based on triangular pulse array
Chung et al. Energy efficient CNN inference accelerator using fast fourier transform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant