
CN120602650A - JPEG image compression system and method based on FPGA double-matrix sharing assembly line - Google Patents

Info

Publication number
CN120602650A
CN120602650A CN202511109247.1A CN202511109247A CN120602650A CN 120602650 A CN120602650 A CN 120602650A CN 202511109247 A CN202511109247 A CN 202511109247A CN 120602650 A CN120602650 A CN 120602650A
Authority
CN
China
Prior art keywords
data
matrix
row
component
dual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202511109247.1A
Other languages
Chinese (zh)
Other versions
CN120602650B (en)
Inventor
欧洋
李洪威
张俊佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongxing Times Technology Co ltd
Original Assignee
Beijing Zhongxing Times Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongxing Times Technology Co ltd filed Critical Beijing Zhongxing Times Technology Co ltd
Priority to CN202511109247.1A priority Critical patent/CN120602650B/en
Priority claimed from CN202511109247.1A external-priority patent/CN120602650B/en
Publication of CN120602650A publication Critical patent/CN120602650A/en
Application granted granted Critical
Publication of CN120602650B publication Critical patent/CN120602650B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention discloses a JPEG image compression system and method based on an FPGA dual-matrix shared pipeline. The system includes a YUV420 preprocessing module, a dual 8-row matrix segmentation storage module, a shared pipeline timing scheduling module, a shared processing module, and a merge encoding module. Departing from the traditional architecture of independent per-component modules, the design uses dual 8-row matrix segmentation and a timing-aligned shared pipeline so that the three YUV components time-multiplex the core modules for DCT, quantization, ZigZag scanning, and Huffman encoding. By optimizing data organization, processing flow, and timing control, it forms a complete and efficient JPEG encoding system that significantly reduces computational complexity and data-movement overhead, and it is particularly suitable for resource-constrained embedded image compression applications.

Description

JPEG image compression system and method based on an FPGA dual-matrix shared pipeline
Technical Field
The invention relates to the technical field of image processing and data compression, and in particular to a JPEG image compression system and method based on an FPGA dual-matrix shared pipeline.
Background
Existing image compression algorithms fall into two main categories: lossless compression and lossy compression. Lossless compression relies purely on the statistical properties of the data stream and encodes it without losing any image information, but it often cannot meet high compression-ratio requirements. Lossy compression discards certain redundant or perceptually insensitive information during compression; subsequent image processing is not significantly affected, and a high compression ratio can be obtained. JPEG is the most widely used lossy compression algorithm. It offers high compression efficiency and is simple to implement, and it is widely used in digital photography, network transmission, remote sensing communication, and other fields.
At present, existing FPGA-based JPEG image compression schemes generally process the YUV color space with a component-independent architecture, in which the luminance component (Y) and the chrominance components (U, V) each pass through their own DCT module, Zig-Zag scanning module, quantization module, and Huffman coding module. This architecture has the following significant drawbacks:
Hardware resource redundancy: each component needs a complete set of processing modules. Taking 8-bit image data as an example, a single DCT module needs 64 multipliers and 192 adders, so configuring the three components independently triples the resource usage and seriously increases the consumption of FPGA logic elements (LE), storage units (RAM), and multipliers (DSP).
Isolated coding modules: Huffman coding is the core entropy-coding stage, but existing schemes do not consider the statistical correlation among the YUV component code streams. Independent coding duplicates the code-table cache (3 sets of independent code tables must be stored) and the coding control logic, wasting further resources.
Disclosure of Invention
The main object of the invention is to provide a JPEG image compression system and method based on an FPGA dual-matrix shared pipeline. For compression of image data in the YUV color space, a dual 8-row matrix segmentation architecture and a timing-aligned shared pipeline design allow the Y, U, and V components to time-multiplex core modules such as the discrete cosine transform (DCT), Zig-Zag scanning, quantization, and Huffman coding, solving the technical problem of hardware resource redundancy in traditional schemes.
In order to achieve the above object, the JPEG image compression system based on the FPGA double-matrix shared pipeline according to the present invention includes:
a YUV420 preprocessing module, used for converting YUV444 format data into YUV420 format data, fully retaining the Y component and performing 2:1 horizontal/vertical downsampling on the UV components, to generate a Y component data stream and a UV component data stream subjected to 2:1 horizontal/vertical downsampling;
a dual 8-row matrix segmentation storage module, used for storing the Y component data stream alternately by rows into the Y1 and Y2 buffer areas to form a dual 8-row matrix structure, and synchronously storing the UV component data stream to obtain the Y1, Y2, U, and V data blocks, realizing Y1/Y2 dual-matrix segmentation of the Y component and 8-row aligned storage of the UV components;
a shared pipeline timing scheduling module, used for reading the Y1, Y2, U, and V data blocks from the buffer areas in the order Y1→Y2→U→V, generating a timing control signal for the shared processing pipeline, realizing pipelined continuous processing, and converting the parallel multi-component data into a serial data stream;
a shared processing module, used for performing DCT, quantization, ZigZag scanning, run-length coding, and Huffman coding on the Y1, Y2, U, and V data blocks in a time-multiplexed manner according to the timing control signal;
a merge encoding module, used for splicing the encoded output into a continuous bit stream, generating byte-aligned output, integrating the encoded data into the required format, and generating a compressed data stream conforming to the JPEG standard.
Optionally, through row-count control and a split counter, the dual 8-row matrix segmentation storage module alternately generates two 8×8 data blocks each for Y1 and Y2 every time 16 rows of Y component data are received; the UV component data stream is buffered to a depth of 8 rows and, together with the Y1 and Y2 data blocks, forms a standard input group of six 8×8 data blocks, providing standardized 8×8 data blocks for the shared pipeline.
Optionally, the shared pipeline timing scheduling module uses a three-state state machine comprising:
an IDLE state: when the amount of data in the UV component FIFO is detected to be greater than or equal to 8, jump to the read-enable generation state;
a GEN_RD_EN state: generate read-enable signals in the order Y1→Y2→U→V using a 48-cycle counter;
a WAIT_BACK state: monitor the back-end FIFO empty flag to ensure that the shared pipeline has no backlog.
Optionally, the sharing processing module includes:
a shared two-dimensional DCT module, used for decomposing the two-dimensional DCT into row transforms and column transforms through a row-column separation algorithm, computing the 8-point DCT through a 3-level butterfly network using the Loeffler algorithm, and converting the floating-point coefficients of the cosine basis functions into 16-bit fixed-point numbers;
a dynamic quantization module, used for performing non-uniform quantization on the DCT coefficients based on the timing control signal, dynamically switching between the luminance and chrominance quantization tables, and retaining the low-frequency information to which the human eye is sensitive;
a ZigZag pipeline scanning module, used for rearranging the quantized 8×8 coefficient matrix into a one-dimensional sequence along the ZigZag path and performing DPCM differential encoding on the DC coefficient;
a shared run-length coding module, used for compressing the ZigZag-scanned one-dimensional sequence by zero run-length coding to reduce the amount of data;
a four-table shared Huffman coding module, used for performing variable-length coding on the run-length-coded Y, U, and V component data using four independent code tables (DC and AC tables for the Y component and for the UV components), assigning codes of different lengths according to the occurrence probability of the data to achieve lossless compression.
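As a software illustration of the ZigZag scanning, DC DPCM, and zero run-length stages listed above, the following is a minimal Python sketch under stated assumptions, not the hardware design. The function names are hypothetical, and the (15, 0) long-run marker and (0, 0) end-of-block pair follow common baseline JPEG practice rather than anything stated in this text.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in ZigZag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block into a one-dimensional ZigZag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def dpcm_dc(dc, prev_dc):
    """DPCM: code the DC coefficient as the difference from the previous block's DC."""
    return dc - prev_dc


def run_length_ac(ac):
    """Encode AC coefficients as (zero_run, value) pairs.

    Assumption: (15, 0) stands for 16 consecutive zeros and (0, 0) marks
    end-of-block when only zeros remain.
    """
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
            continue
        while run > 15:
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, v))
        run = 0
    if run > 0:
        pairs.append((0, 0))
    return pairs


# Example: a sparse quantized block.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, 3, -2
seq = zigzag_scan(block)
print(dpcm_dc(seq[0], 45), run_length_ac(seq[1:]))
```

In a shared pipeline the previous DC value would have to be tracked separately for each component, since Y, U, and V blocks are interleaved in the serial stream.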
Optionally, the dynamic quantization module dynamically switches between the luminance and chrominance quantization tables according to a 48-cycle counter, wherein:
during counter cycles 0-31, the Y1 and Y2 data blocks are processed using the luminance quantization table;
during counter cycles 32-47, the U and V data blocks are processed using the chrominance quantization table.
In another aspect, the invention also provides a JPEG image compression method based on an FPGA dual-matrix shared pipeline, which is carried out using the above JPEG image compression system and comprises the following steps:
converting YUV444 format data into YUV420 format, and generating a Y component data stream and a UV component data stream subjected to 2:1 horizontal/vertical downsampling;
storing the Y component data stream alternately by rows into the Y1 and Y2 buffer areas to form a dual 8-row matrix structure, and synchronously storing the UV component data stream, realizing Y1/Y2 dual-matrix segmentation of the Y component and 8-row aligned storage of the UV components, and providing standardized 8×8 data blocks for the shared pipeline;
reading data blocks from the buffer areas in the order Y1, Y2, U, V, generating a timing control signal for the shared processing pipeline, and outputting in the order Y1, Y2, U, V to realize pipelined continuous processing;
based on the timing control signal, performing DCT, quantization, ZigZag scanning, run-length coding, and Huffman coding of the Y1, Y2, U, and V data blocks in a time-multiplexed manner;
splicing the encoded outputs into a continuous bit stream to generate a compressed data stream conforming to the JPEG standard.
Optionally, converting the YUV444 format data into YUV420 format and generating the Y component data stream and the 2:1 horizontally/vertically downsampled UV component data stream comprises the following steps:
constructing a UV component buffer with a depth of 2 lines to form a 2×2 pixel window matrix;
mean calculation: summing the 4 UV values in the window and computing the mean efficiently with a right shift by 2 bits;
aligned data output: generating a YUV420 format data stream in which the Y resolution is 2× that of UV, matching the requirements of the dual 8-row matrix segmentation.
Optionally, storing the Y component data stream alternately by rows into the Y1 and Y2 buffer areas to form the dual 8-row matrix structure comprises the following steps:
through row-count control, alternately generating two 8×8 data blocks each for Y1 and Y2 every time 16 rows of Y component data are received;
buffering the UV component data stream to a depth of 8 rows and forming, together with the Y1/Y2 data blocks, a standard input group of six 8×8 data blocks.
Optionally, performing the DCT, quantization, ZigZag scanning, run-length coding, and Huffman coding of the Y1, Y2, U, and V data blocks in a time-multiplexed manner based on the timing control signal comprises the following steps:
based on the value of the 48-cycle counter, using the luminance quantization table while the Y1 and Y2 data blocks are processed in cycles 0-31, and the chrominance quantization table while the U and V data blocks are processed in cycles 32-47;
performing Huffman coding on the data of the different components based on four independent code tables: DC and AC tables for the Y component and for the UV components.
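The four-table selection can be pictured with a small Python sketch. It is illustrative only: the table labels (`Y_DC`, `Y_AC`, `UV_DC`, `UV_AC`) and function names are hypothetical, and the size-category helper reflects the usual baseline JPEG convention of coding a value by the number of bits in its magnitude, which this text does not spell out.

```python
def select_huffman_table(component, is_dc):
    """Pick one of four code tables from the component label and coefficient type.

    Assumption: Y1 and Y2 share the luminance tables, U and V share the
    chrominance tables. Labels are hypothetical.
    """
    if component not in ("Y1", "Y2", "U", "V"):
        raise ValueError(f"unknown component: {component}")
    group = "Y" if component in ("Y1", "Y2") else "UV"
    return f"{group}_{'DC' if is_dc else 'AC'}"


def magnitude_category(value):
    """Number of bits needed to represent |value| (0 for a zero value)."""
    return abs(value).bit_length()


for comp in ("Y1", "Y2", "U", "V"):
    print(comp, select_huffman_table(comp, True), select_huffman_table(comp, False))
```

Because the component order in the serial stream is fixed, the same counter that drives quantization-table switching could in principle drive this selection as well.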
Optionally, performing the DCT of the Y1, Y2, U, and V data blocks in a time-multiplexed manner based on the timing control signal comprises the following steps:
row-column separation: decomposing the two-dimensional DCT into 8 one-dimensional row transforms and 8 one-dimensional column transforms, namely:
$$F(i,j) = \sum_{x=0}^{7} C_i(x) \left[ \sum_{y=0}^{7} f(x,y)\, C_j(y) \right]$$
where $F(i,j)$ is the frequency-domain coefficient, $f(x,y)$ is the spatial-domain 8×8 pixel value, and $C_i(x)$, $C_j(y)$ are the cosine basis functions; a one-dimensional DCT unit is invoked for each row of the 8×8 image block to produce an intermediate frequency-domain matrix, the row-transform result is transposed (rows and columns interchanged) through a dual-port BRAM into column data format, and the one-dimensional DCT unit is then invoked on the transposed column data to output the complete 8×8 frequency-domain coefficient matrix;
butterfly operation: using the Loeffler algorithm, the 8-point DCT is completed through a 3-level butterfly network in which each level needs only 4 multiplications and 8 additions, and the symmetry of the cosine function is used to break the matrix multiplication down into iterated additions, subtractions, and a small number of multiplications;
fixed-point implementation: converting the floating-point coefficients of the cosine basis functions into 16-bit fixed-point numbers.
The technical solution of the invention has the following advantages. It departs from the traditional architecture of independent modules: through dual 8-row matrix segmentation and a timing-aligned shared pipeline, the three YUV components efficiently time-multiplex core modules such as the DCT, quantization, ZigZag scanning, and Huffman coding. Compared with the traditional three-channel architecture, the computing circuit uses more than 50% fewer logic units and DSP resources. The storage system does not need to buffer the intermediate results of all three components at the same time, which reduces the on-chip SRAM requirement by 30%. For power control, the time-multiplexing mechanism reduces average power consumption by 40% and the data bus bandwidth requirement by 30%. For implementation complexity, the simplified data path reduces FPGA/ASIC routing difficulty and shortens the development cycle by about 25%. By optimizing data organization, processing flow, and timing control, the invention forms a complete and efficient JPEG encoding system that significantly reduces computational complexity and data-movement overhead, and it is particularly suitable for resource-constrained embedded image compression applications.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an overall module frame structure of a JPEG image compression system based on an FPGA double-matrix sharing pipeline according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a partial module frame structure of a JPEG image compression system based on an FPGA double-matrix sharing pipeline according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the operation of JPEG image compression in a JPEG image compression system based on FPGA double matrix shared pipeline according to an embodiment of the present invention;
FIG. 4 is a flowchart of a YUV data storage module of a JPEG image compression system based on an FPGA double matrix shared pipeline according to an embodiment of the present invention;
FIG. 5 is a flow chart of the state machine of the shared pipeline timing scheduling module of a JPEG image compression system based on an FPGA double matrix shared pipeline according to an embodiment of the invention;
FIG. 6 is a ZigZag path diagram of a JPEG image compression system based on an FPGA double matrix shared pipeline according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of a JPEG image compression method based on an FPGA dual matrix shared pipeline according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear are used in the embodiments of the present invention) are merely for explaining the relative positional relationship, movement conditions, and the like between the components in a certain specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicators are changed accordingly.
In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
The invention provides a JPEG image compression system and method based on an FPGA double-matrix sharing pipeline.
As shown in FIGS. 1 to 6, in an embodiment of the present invention, the JPEG image compression system based on the FPGA dual-matrix shared pipeline includes:
the YUV420 preprocessing module 101 is configured to convert YUV444 format data into YUV420 format data, completely reserve a Y component, perform 2:1 horizontal/vertical downsampling on a UV component, and generate a Y component data stream and a UV component data stream subjected to 2:1 horizontal/vertical downsampling;
The dual 8-row matrix segmentation storage module 102 is used for alternately storing Y component data streams into Y1 and Y2 cache areas according to rows to form a dual 8-row matrix structure, synchronously storing UV component data streams to obtain Y1, Y2 and U, V data blocks, and realizing Y1/Y2 dual matrix segmentation of Y components and 8-row aligned storage of UV components;
The shared pipeline time sequence scheduling module 103 is used for reading Y1, Y2 and U, V data blocks from the Y1 and Y2 buffer areas according to the sequence of Y1, Y2, U and V, generating a time sequence control signal of a shared processing pipeline, realizing pipeline type continuous processing and converting parallel multicomponent data into serial data streams;
The sharing processing module 104 is configured to perform DCT transform, quantization, zigZag scanning, run-length encoding, and Huffman encoding processing on the Y1, Y2, U, V data blocks in a time-division multiplexing manner according to the timing control signal;
And the merging and encoding module 105 is used for splicing the encoded output into a continuous bit stream, generating output according to byte alignment, realizing the formatting integration of encoded data and generating a compressed data stream conforming to the JPEG standard.
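To make the byte-aligned splicing of variable-length codes concrete, here is a minimal Python sketch of a bit packer. It is an illustration under assumptions, not the module's actual logic: the class and method names are hypothetical, and the two JPEG conventions it applies (inserting a 0x00 byte after any 0xFF data byte, and padding the final partial byte with 1 bits) come from the baseline JPEG standard rather than from this text.

```python
class BitPacker:
    """Concatenate variable-length codes MSB-first into a byte stream."""

    def __init__(self):
        self.out = bytearray()
        self.acc = 0     # bits not yet emitted
        self.nbits = 0   # number of valid bits in acc

    def put(self, code, length):
        """Append the low `length` bits of `code` to the stream."""
        self.acc = (self.acc << length) | (code & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:
            self.nbits -= 8
            byte = (self.acc >> self.nbits) & 0xFF
            self.out.append(byte)
            if byte == 0xFF:
                self.out.append(0x00)  # byte stuffing so data is not read as a marker
            self.acc &= (1 << self.nbits) - 1

    def flush(self):
        """Pad the last partial byte with 1 bits and return the packed bytes."""
        if self.nbits:
            pad = 8 - self.nbits
            self.put((1 << pad) - 1, pad)
        return bytes(self.out)


packer = BitPacker()
packer.put(0b101, 3)
packer.put(0b11111, 5)
packer.put(0xFF, 8)
packer.put(0b0, 1)
print(packer.flush().hex())
```

A complete JPEG file would also need the marker segments (SOI, quantization and Huffman tables, frame and scan headers, EOI) around this entropy-coded data; those are outside the scope of the sketch.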
The JPEG image compression system based on the FPGA dual-matrix shared pipeline builds a JPEG compression architecture around dual 8-row matrix segmentation and a timing-aligned shared pipeline, and achieves efficient compression of the three YUV components through the cooperation of 9 major modules; the JPEG image compression flow is shown in FIG. 3.
Specifically, the YUV420 preprocessing module 101 (dual-matrix preparation) converts the format from YUV444 to YUV420: it fully retains the Y component, performs 2:1 horizontal/vertical downsampling on the UV components, and uses 2×2 block mean filtering to keep the chroma smooth, providing preprocessed data for the dual 8-row matrix segmentation. The implementation comprises the following steps:
(1) Construct a UV component buffer with a depth of 2 lines to form a 2×2 pixel window matrix;
(2) Mean calculation: sum the 4 UV values in the window and compute the mean efficiently with a right shift by 2 bits (fixed-point division by 4);
(3) Aligned data output: generate a YUV420 format data stream in which the Y resolution is 2× that of UV, matching the requirements of the dual 8-row matrix segmentation.
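The three steps above can be sketched in software as follows. This is a minimal Python illustration, not the hardware implementation; the function name `downsample_chroma_2x2` and the list-of-rows plane layout are assumptions made for the example, and the sketch assumes even plane dimensions.

```python
def downsample_chroma_2x2(plane):
    """Average each 2x2 window of a chroma plane using a sum and a right shift by 2."""
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("sketch assumes even plane dimensions")
    out = []
    for r in range(0, height, 2):          # two buffered lines form the window rows
        row = []
        for c in range(0, width, 2):
            total = (plane[r][c] + plane[r][c + 1]
                     + plane[r + 1][c] + plane[r + 1][c + 1])
            row.append(total >> 2)          # truncating divide by 4
        out.append(row)
    return out


u_plane = [
    [10, 20, 100, 104],
    [30, 40, 108, 112],
]
print(downsample_chroma_2x2(u_plane))  # [[25, 106]]
```

The right shift truncates rather than rounds, so the result can be up to one level below the exact mean; adding 2 to the sum before shifting would round to nearest, at the cost of one more adder.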
Specifically, the dual 8-row matrix segmentation storage module 102 implements the Y1/Y2 dual-matrix segmentation of the Y component and the 8-row aligned storage of the UV components through independent buffering, row-count evaluation, write-enable triggering, and FIFO write control. It provides standardized 8×8 data blocks for the shared pipeline and supports efficient operation of the subsequent encoding and algorithm-processing stages. The flowchart is shown in FIG. 4, and the implementation comprises the following steps:
(1) Parallel channel input: Y, U, and V component data are input continuously through independent channels. When i_de_Y is valid, i_data_Y is written row by row into the 8×8 sliding window of the Y buffer; similarly, data are written into the 8×8 sliding windows of the U/V buffers through i_de_U/i_data_U and i_de_V/i_data_V;
(2) Row count and write-enable control: for the Y component, the rising edge of i_de_Y triggers the row count, and the 3-bit counter row_cnt_Y cycles through 0-7. When row_cnt_Y = 7, o_de_Y = i_de_Y is output, and the counter automatically resets to zero to start the next row-count period. The UV components work the same way: when the row count reaches 7, o_de_U = i_de_U and o_de_V = i_de_V are output;
(3) Y component double-buffer selection: the falling edges of o_de_Y are counted, and a 1-bit counter row_cnt_Y (cycling 0/1) switches between Y1 and Y2. When row_cnt_Y = 0, o_de_Y1 = o_de_Y; when row_cnt_Y = 1, o_de_Y2 = o_de_Y. The first 8 rows and the last 8 rows of Y are thus written into their respective FIFOs, implementing the dual 8-row matrix splitting strategy;
(4) Data bit width and transfer period: each pixel is 8 bits wide, conforming to the YUV420 standard; the bus is 64 bits wide (8 pixels × 8 bits/pixel); one row of data is transferred per clock cycle, and writing one 8×8 data block completes in 8 cycles.
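A behavioral Python sketch of the row counting and Y1/Y2 switching described above may help. It models only the data routing, not the enable signals or FIFOs, and the names `split_luma_rows`, `blocks_from_8_rows`, and `bank_sel` are hypothetical (the text reuses the name row_cnt_Y for the 1-bit selector; a separate name is used here for clarity).

```python
def split_luma_rows(y_rows):
    """Route incoming luma rows to Y1 (first 8 of every 16) or Y2 (last 8)."""
    y1, y2 = [], []
    row_cnt = 0    # 3-bit row counter, cycles 0..7
    bank_sel = 0   # 1-bit selector: 0 -> Y1, 1 -> Y2
    for row in y_rows:
        (y1 if bank_sel == 0 else y2).append(row)
        if row_cnt == 7:
            bank_sel ^= 1              # toggle after each group of 8 rows
        row_cnt = (row_cnt + 1) & 0x7  # wrap like a 3-bit counter
    return y1, y2


def blocks_from_8_rows(rows):
    """Cut an 8-row strip into consecutive 8x8 blocks, left to right."""
    width = len(rows[0])
    return [[r[c:c + 8] for r in rows] for c in range(0, width, 8)]


# A 16x16 luma macroblock whose pixel value encodes its position.
luma = [[r * 16 + c for c in range(16)] for r in range(16)]
y1, y2 = split_luma_rows(luma)
group = blocks_from_8_rows(y1) + blocks_from_8_rows(y2)
print(len(group), "luma blocks; with one U and one V block the input group has", len(group) + 2)
```

For a 16-pixel-wide strip this yields two blocks from Y1 and two from Y2, which together with one U and one V block gives the six-block input group.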
Specifically, the shared pipeline timing scheduling module 103 is a dedicated timing scheduling module designed for the YUV420 16×16 matrix operations of the JPEG compression process. It converts the parallel multi-component data (Y1/Y2/U/V matrices) into a serial data stream and outputs it in the order Y1→Y1→Y2→Y2→U→V, realizing pipelined continuous processing, eliminating the data-waiting bottleneck, and ensuring that the data of each component are processed in an orderly and efficient manner.
The function is implemented with a three-state state machine (IDLE→GEN_RD_EN→WAIT_BACK); the state machine flow is shown in FIG. 5, and the specific steps are as follows:
(1) Initial state (IDLE)
Function: wait for the data to be ready, ensuring that 8×8 data are stored in both the UV component and the Y component FIFOs.
Trigger condition: because of the YUV420 sampling characteristic, the amount of Y component data is 4 times that of U or V, so the UV components fill their FIFOs more slowly than the Y component. Only the amount of data in the U component FIFO therefore needs to be checked; when it is greater than or equal to 8, the state machine jumps to the read-enable generation state (GEN_RD_EN).
(2) Read-enable generation state (GEN_RD_EN)
Function: generate the read-enable signal of each component in the order Y1→Y2→U→V to control the data order, strictly matching the 16×16 matrix processing flow of JPEG compression, ensuring that every 8×8 data block is read completely and avoiding data misalignment.
Trigger condition: a single read must complete six 8×8 blocks (Y1×2 + Y2×2 + U×1 + V×1), 384 pixels in total. With a 64-bit bus and 8 cycles per block, the 6 blocks take 48 cycles, so the counter cnt runs from 0 to 47. When cnt = 47 (the 48-cycle read is complete), the state machine jumps to WAIT_BACK to wait for the back-end FIFO to empty. The specific cnt range and read enable control relationship is as follows:
(3) Wait-for-back-end-FIFO-empty state (WAIT_BACK)
Function: a flow-control mechanism that prevents the back-end FIFO from overflowing, avoids data backlog caused by slow back-end processing, and keeps the pipeline running continuously and smoothly.
Trigger condition: monitor the empty flag (empty_back) of the back-end FIFO; only when the back end has enough space does the state machine return to the initial state to prepare for the next round of reading.
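The three states can be modeled with a short cycle-by-cycle Python sketch. It is a behavioral illustration only. The class name, the `step` interface, and in particular the mapping from cnt to the buffer being read (8 cycles per block in the order Y1, Y1, Y2, Y2, U, V) are assumptions inferred from the 48-cycle description, since the cnt-to-read-enable table itself is not reproduced in this text.

```python
IDLE, GEN_RD_EN, WAIT_BACK = "IDLE", "GEN_RD_EN", "WAIT_BACK"

# Assumed read order: one entry per 8-cycle block slot.
READ_ORDER = ["Y1", "Y1", "Y2", "Y2", "U", "V"]


class Scheduler:
    """Behavioral model of the IDLE -> GEN_RD_EN -> WAIT_BACK state machine."""

    def __init__(self):
        self.state = IDLE
        self.cnt = 0

    def step(self, u_fifo_level, empty_back):
        """Advance one clock; return the buffer read this cycle, or None."""
        if self.state == IDLE:
            if u_fifo_level >= 8:          # U FIFO holds a full 8-row block
                self.state = GEN_RD_EN
                self.cnt = 0
            return None
        if self.state == GEN_RD_EN:
            source = READ_ORDER[self.cnt // 8]
            if self.cnt == 47:             # 48-cycle read finished
                self.state = WAIT_BACK
            else:
                self.cnt += 1
            return source
        # WAIT_BACK: hold until the back-end FIFO reports empty.
        if empty_back:
            self.state = IDLE
        return None


sched = Scheduler()
sched.step(u_fifo_level=8, empty_back=False)          # IDLE -> GEN_RD_EN
reads = [sched.step(8, False) for _ in range(48)]
print(reads[0], reads[16], reads[32], reads[40], sched.state)
```

Checking only the U FIFO level in IDLE mirrors the reasoning above: the chroma FIFOs fill most slowly, so when U is ready the luma data is already available.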
Specifically, the shared processing module 104 includes a shared two-dimensional DCT transformation module 1041, a dynamic quantization module 1042, a ZigZag pipeline scanning module 1043, a shared run-length encoding module 1044, and a four-table shared Huffman encoding module 1045.
Specifically, the shared two-dimensional DCT module 1041 implements the two-dimensional discrete cosine transform (2D-DCT) of an 8×8 image block, converting spatial-domain pixels into frequency-domain coefficients, and achieves energy compaction, parallel computation, and resource optimization through the row-column separation algorithm, the Loeffler algorithm, and fixed-point processing. The implementation comprises the following steps:
(1) Row-column separation algorithm: decompose the two-dimensional DCT into 8 one-dimensional row transforms and 8 one-dimensional column transforms, namely:
$$F(i,j) = \sum_{x=0}^{7} C_i(x) \left[ \sum_{y=0}^{7} f(x,y)\, C_j(y) \right]$$
where $F(i,j)$ is the frequency-domain coefficient, $f(x,y)$ is the spatial-domain 8×8 pixel value, and $C_i(x)$, $C_j(y)$ are the cosine basis functions.
Each row of the 8×8 image block is passed to the one-dimensional DCT unit to produce an intermediate frequency-domain matrix; the row-transform result is transposed (rows and columns interchanged) through a dual-port BRAM into column data format; the transposed column data are then passed to the one-dimensional DCT unit, which outputs the complete 8×8 frequency-domain coefficient matrix;
(2) Butterfly operation: the Loeffler algorithm completes the 8-point DCT through a 3-level butterfly network. Each level needs only 4 multiplications and 8 additions; the symmetry of the cosine function is used to break the matrix multiplication down into iterated additions, subtractions, and a small number of multiplications, which reduces the amount of computation, matches the parallel pipeline characteristics of the FPGA, and improves processing throughput;
(3) Fixed-point implementation: to avoid the high resource cost of floating-point arithmetic, the floating-point coefficients of the cosine basis functions are converted into 16-bit fixed-point numbers. For example, the floating-point coefficient C(0) = 0.3536 is approximated as 0.3536 × 2^9 = 181.0432, giving the fixed-point number 0xB5 (decimal 181).
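The row-column separation and the fixed-point conversion can be checked numerically with a small Python sketch. It is an illustration under assumptions: it uses the orthonormal DCT-II basis, whose DC scale of sqrt(1/8) ≈ 0.3536 matches the C(0) value quoted above, it uses a plain matrix-style 1D transform rather than the Loeffler butterfly network, and all function names are hypothetical.

```python
import math

N = 8


def basis(i, x):
    """Orthonormal 8-point DCT-II basis value C_i(x)."""
    scale = math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * x + 1) * i * math.pi / (2 * N))


C = [[basis(i, x) for x in range(N)] for i in range(N)]


def dct_1d(vec):
    """One-dimensional 8-point DCT as a direct sum (no butterfly optimization)."""
    return [sum(C[i][x] * vec[x] for x in range(N)) for i in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_separable(block):
    """8 row transforms, a transpose, 8 column transforms, and a transpose back."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


def dct_2d_direct(block):
    """Reference: the full double sum over x and y for every (i, j)."""
    return [[sum(block[x][y] * C[i][x] * C[j][y]
                 for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]


def to_fixed(value, frac_bits=9):
    """Scale a coefficient by 2^frac_bits and round to the nearest integer."""
    return round(value * (1 << frac_bits))


block = [[(3 * x + 5 * y) % 17 for y in range(N)] for x in range(N)]
sep, ref = dct_2d_separable(block), dct_2d_direct(block)
max_err = max(abs(sep[i][j] - ref[i][j]) for i in range(N) for j in range(N))
print("max difference:", max_err, "| C(0) fixed-point:", hex(to_fixed(C[0][0])))
```

The separable form needs 16 one-dimensional transforms per block instead of a 64-term sum for each of the 64 outputs, which is why a single shared 1D unit plus a transpose memory is sufficient.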
Specifically, the dynamic quantization module 1042 (shared table switching) performs non-uniform quantization on the DCT coefficients, dynamically switches between the luminance and chrominance quantization tables according to the 48-cycle counter, and retains the low-frequency information to which the human eye is sensitive. The specific implementation steps are as follows:
(1) Quantization table storage and reading: each quantization table, one for the luminance component (Y) and one for the chrominance components (U/V), is held in an 8×8 data store. A column-first parallel processing architecture is adopted: in each clock cycle one column (8 elements) of the quantization table is read, and an element-wise fixed-point division (equivalent to multiplication by the reciprocal of the quantization table entry) is applied to the corresponding column of the DCT coefficient matrix, so that quantization of the 8×8 matrix completes in 8 cycles;
(2) Quantization table switching mechanism: the system uses a 0-47 cycle counter as the timing reference, synchronized with the processing rhythm of the DCT module. One 8×8 block is processed every 8 counts, and the luminance table (Y) or the chrominance table (UV) is selected dynamically according to the counter value:
Counter = 0-7: the first 8×8 block of the Y1 component is processed, using the luminance table.
Counter = 8-15: the second 8×8 block of the Y1 component is processed, using the luminance table.
Counter = 16-23: the first 8×8 block of the Y2 component is processed, using the luminance table.
Counter = 24-31: the second 8×8 block of the Y2 component is processed, using the luminance table.
Counter = 32-39: the 8×8 block of the U component is processed, using the chrominance table.
Counter = 40-47: the 8×8 block of the V component is processed, using the chrominance table.
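The counter-driven table switching and the reciprocal-multiply quantization described above might look as follows in a software model. The names table_for and quantize_column, the 16 fractional bits of the reciprocal, and round-to-nearest behaviour are all assumptions for illustration; the text does not specify the reciprocal precision or the rounding mode.

```python
def table_for(counter: int) -> str:
    """Select the quantization table from the 0-47 cycle counter."""
    if not 0 <= counter <= 47:
        raise ValueError("counter must be in 0..47")
    return "luma" if counter < 32 else "chroma"


def quantize_column(coeffs, q_column, frac_bits: int = 16):
    """Quantize one column of DCT coefficients by multiplying with fixed-point reciprocals.

    Each divisor q is replaced by round(2**frac_bits / q); adding half an LSB
    before the shift gives round-to-nearest (an assumed rounding mode).
    """
    half = 1 << (frac_bits - 1)
    out = []
    for c, q in zip(coeffs, q_column):
        recip = round((1 << frac_bits) / q)
        out.append((c * recip + half) >> frac_bits)
    return out


print(table_for(31), table_for(32))
print(quantize_column([100, -100, 50], [16, 16, 3]))
```

In hardware the reciprocals would be precomputed and stored in place of the divisors, so that each clock cycle needs only eight multipliers for the column being processed.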
Specifically, the ZigZag pipeline scanning module 1043 implements ZigZag rearrangement, DPCM differential encoding and pipelined data processing of the 8×8 quantized coefficient matrix, converting the two-dimensional matrix into a one-dimensional sequence, reducing inter-block redundancy, and ensuring continuous data output. The specific implementation steps are as follows:
(1) Matrix cache architecture: an 8-stage row shift register group is adopted. 8 pixels are received at a time and stored in the first row, and each subsequent row is updated one cycle later through the register chain, so that a complete 8×8 matrix is formed after 8 cycles and data timing alignment during ZigZag scanning is ensured;
(2) ZigZag path decomposition: the matrix is split into 8 subsequences along the ZigZag path of Fig. 4, each subsequence containing 8 data items;
(3) Pipeline buffering and output:
Multistage buffering: each subsequence is delayed through registers to ensure timing alignment;
Timing control: the subsequences are gated cyclically by a counter, and a complete sequence is output in 8 cycles;
Enable delay: the input enable signal is delayed through a 9-stage register to keep it synchronized with the data;
Data output: the processed data is written into the FIFO in sequence;
(4) FIFO control:
Read enable trigger: read enable is asserted when the amount of data in the FIFO reaches a threshold;
Rhythm control: read enable is controlled by a counter, reading 1 pixel per cycle to ensure continuous data output;
(5) DPCM processing:
DC latching: when the start position of a block is detected, the DC value of the current block is latched into the register of the corresponding component;
Difference calculation: the difference between the current DC value and the DC value of the previous block is calculated;
Output control: the DC difference is output at the start position of the block, and the AC components are output at the remaining positions;
Component switching: the block type is tracked by a counter, and 4 Y blocks, 1 U block and 1 V block are processed in sequence;
The DC value of each component is managed independently.
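A behavioural sketch of the ZigZag reordering and the per-component DPCM step is given below. It assumes that the path of Fig. 4 is the conventional JPEG zigzag order and that each DC predictor starts at 0; neither is stated in this passage. The names zigzag_order, zigzag_scan and DcPredictor are hypothetical.

```python
def zigzag_order(n: int = 8):
    """Return (row, col) pairs in conventional JPEG zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()  # even anti-diagonals are walked from bottom-left to top-right
        order.extend(diag)
    return order


def zigzag_scan(block):
    """Flatten a square block into a one-dimensional zigzag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


class DcPredictor:
    """Keeps one previous-DC register per component, as in the DPCM step."""

    def __init__(self):
        self.prev = {"Y": 0, "U": 0, "V": 0}  # assumed initial predictor value

    def diff(self, component: str, dc: int) -> int:
        d = dc - self.prev[component]
        self.prev[component] = dc  # latch the current DC for the next block
        return d


block = [[r * 8 + c for c in range(8)] for r in range(8)]
print(zigzag_scan(block)[:10])
```

The hardware described in the text produces the same ordering with shift registers and gated subsequences rather than an index table; the sketch only fixes the expected input/output relationship.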
Specifically, the shared run-length encoding module 1044 compresses the one-dimensional sequence produced by the ZigZag scan, representing consecutive zeros as "zero run length + non-zero value" to reduce the data volume. It is particularly effective for the AC components, whose zero values are concentrated in the high-frequency region. The specific implementation steps are as follows:
(1) Data input: the one-dimensional sequence processed by DPCM (DC difference plus AC components) is received;
(2) Zero-run counting: the counter is incremented by 1 when a 0 is encountered; when a non-zero value is encountered, the current count and the non-zero value are output and the counter is reset;
(3) Code output: (run length, non-zero value) pairs are generated, and (0, 0) is output at the end of the block as the end-of-block symbol (EOB);
(4) Special handling: an all-zero block outputs the EOB directly; the DC difference is output separately and does not take part in zero-run counting.
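The four steps above can be modelled by the following short sketch. The function name run_length_encode is hypothetical, and the model follows only what the text states: it does not cap the run length or emit a separate long-run symbol as baseline JPEG does for runs of 16 zeros, because the passage does not describe that case.

```python
def run_length_encode(sequence):
    """Encode one zigzag-ordered block as (dc_diff, [(run, value), ..., (0, 0)]).

    sequence[0] is the DC difference and is passed through untouched;
    the remaining entries are AC coefficients.
    """
    dc_diff, ac = sequence[0], sequence[1:]
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1          # extend the current zero run
        else:
            pairs.append((run, value))
            run = 0           # reset after emitting a non-zero value
    pairs.append((0, 0))      # end-of-block symbol; trailing zeros are implied
    return dc_diff, pairs


print(run_length_encode([5, 0, 0, 3, 0, -1, 0, 0]))
```

An all-zero AC sequence falls out of the same loop: no pair is emitted before the EOB, which matches the special handling in step (4).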
Specifically, the four-table shared Huffman coding module 1045 performs variable-length coding on the run-length-coded Y, U and V component data, allocating codes of different lengths according to the probability of occurrence of the data. The Y component DC/AC and the UV component DC/AC each use an independent Huffman table, and the code length (size) and binary code (code) corresponding to each symbol are output, thereby achieving lossless compression. The specific implementation steps are as follows:
(1) Loading of the four Huffman tables: the Y-DC, Y-AC, UV-DC and UV-AC tables are read from ROM; each table stores the mapping from a symbol to (code length, code);
(2) Table lookup coding: the table is selected according to the component (Y/UV) and the type (DC/AC) of the input data. The DC difference is looked up by its amplitude, and AC data is looked up by the combined (run length, non-zero value) key, yielding the corresponding code length and binary code;
(3) Output control: one (size, code) pair is output per cycle, where the code length indicates the number of code bits and the binary code is output in left-aligned format;
(4) Special symbol handling: when an end of block (EOB) is encountered, a fixed code length and code are output (e.g., size=2, code=00);
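The table selection in step (2) can be illustrated with a small lookup model. Everything in the ROM dictionary below is a hypothetical placeholder, not the actual table contents, which the text does not list; only the EOB entry (size=2, code=00) is taken from the example in step (4). The names table_key and encode_symbol are likewise assumptions.

```python
def table_key(component: str, kind: str) -> str:
    """Map a block component and coefficient type to one of the four table names."""
    group = "Y" if component in ("Y", "Y1", "Y2") else "UV"
    return f"{group}-{kind}"


# Placeholder ROM contents: symbol -> (size, code). Real entries would be loaded
# from the design's ROM; only the EOB example comes from the text.
ROM = {
    "Y-DC": {0: (2, 0b00)},             # hypothetical entry
    "Y-AC": {(0, 0): (2, 0b00)},        # EOB example from the text
    "UV-DC": {0: (2, 0b00)},            # hypothetical entry
    "UV-AC": {(0, 0): (2, 0b00)},       # EOB example from the text
}


def encode_symbol(component: str, kind: str, symbol):
    """Return the (size, code) pair for a symbol from the selected table."""
    return ROM[table_key(component, kind)][symbol]


print(table_key("Y2", "AC"), encode_symbol("Y2", "AC", (0, 0)))
```

Because Y1 and Y2 are both luminance data, they share the Y tables, which is why four tables suffice for the six blocks of one input group.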
Specifically, the merging and encoding module 105 splices the code length (size) and binary code (code) output by the Huffman coding module into a continuous bitstream and generates byte-aligned output, realizing formatted integration of the encoded data. The specific implementation steps are as follows:
(1) Input buffer architecture: a 32-bit buffer register (able to hold two spliced codes of the maximum 16-bit length) is used together with a 5-bit counter (recording the number of bits currently in the buffer). On initialization the buffer and the counter are both set to 0, ensuring the continuity of codes that cross byte boundaries;
(2) Dynamic bitstream splicing: codes are spliced into the buffer according to their code length; the new code is combined with the buffer after a left shift by the current number of buffered bits, and the buffer and bit count are updated. Codes of up to 16 bits are supported (compatible with the longest Huffman code), and the shift operation ensures the correct code order;
(3) Byte-aligned output control: when the number of bits in the buffer is greater than or equal to 8, the upper 8 bits are taken as the output byte, the buffer is updated to the remaining bits, and the counter is decreased by 8. Multistage register delays keep the output data synchronized with the enable signal and avoid timing conflicts;
(4) Cross-byte code handling: when the code length exceeds 8 bits, the code is split dynamically by a state machine with conditional checks: a complete byte is output first, and the remaining bits are kept for the next splice (for example, with 3 bits in the buffer and a new 7-bit code, the upper 8 bits are output after splicing and the remaining 2 bits are kept). Predefined case statements handle the combinations of code length and residual bits, so that the splicing logic covers all scenarios;
(5) End of block and data statistics: when the end-of-block flag is detected, the residual bits in the buffer are output, padded to a byte boundary, and an end-of-block signal is generated. The total code length is accumulated for subsequent compression ratio analysis or status monitoring.
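The splicing and byte-alignment behaviour described above can be checked with a small software model. The class name BitPacker and its methods are hypothetical; the sketch appends new codes at the least-significant end of an unbounded Python integer rather than modelling the 32-bit register and case statements, and the value of the padding bits is an assumption (the text only says the residual bits are padded to a byte boundary). The 0xFF byte stuffing required in a complete JPEG file is not modelled.

```python
class BitPacker:
    """Splices (code, size) pairs into a byte stream, most significant bit first."""

    def __init__(self):
        self.buffer = 0            # pending bits, right-aligned
        self.count = 0             # number of valid pending bits
        self.out = bytearray()

    def push(self, code: int, size: int) -> None:
        self.buffer = (self.buffer << size) | (code & ((1 << size) - 1))
        self.count += size
        while self.count >= 8:     # emit the upper 8 bits while a full byte is available
            self.count -= 8
            self.out.append((self.buffer >> self.count) & 0xFF)
            self.buffer &= (1 << self.count) - 1

    def flush(self, pad_bit: int = 1) -> None:
        """Pad residual bits to a byte boundary (pad value is an assumption)."""
        if self.count:
            pad = 8 - self.count
            self.push(((1 << pad) - 1) if pad_bit else 0, pad)


packer = BitPacker()
packer.push(0b101, 3)        # 3 bits already in the buffer
packer.push(0b1100110, 7)    # new 7-bit code: 10 bits total, 1 byte out, 2 bits kept
print(packer.out.hex(), packer.count)
```

The usage lines reproduce the example in step (4): after splicing a 7-bit code onto 3 buffered bits, one byte is emitted and 2 bits remain.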
On the other hand, as shown in Fig. 7, the invention also provides a JPEG image compression method based on an FPGA dual-matrix shared pipeline, which is carried out using the JPEG image compression system based on the FPGA dual-matrix shared pipeline, and the JPEG image compression method comprises the following steps:
S100, converting YUV444 format data into YUV420 format, and generating a Y component data stream and a UV component data stream subjected to 2:1 horizontal/vertical downsampling;
S200, alternately storing the Y component data stream by rows into the Y1 and Y2 buffer areas to form a dual 8-row matrix structure, and synchronously storing the UV component data stream, realizing Y1/Y2 dual-matrix segmentation of the Y component and 8-row aligned storage of the UV component, and providing standardized 8×8 data blocks for the shared pipeline;
S300, reading data blocks from the Y1 and Y2 buffer areas in the order Y1→Y1→Y2→Y2→U→V, generating the timing control signals of the shared processing pipeline, and outputting in the order Y1→Y1→Y2→Y2→U→V to realize pipelined continuous processing;
S400, based on the timing control signals, performing DCT transform, quantization, ZigZag scanning, run-length coding and Huffman coding on the Y1, Y2, U and V data blocks in a time-division multiplexing manner;
S500, splicing the coded output into a continuous bitstream to generate a compressed data stream conforming to the JPEG standard.
Specifically, converting the YUV444 format data into YUV420 format and generating the Y component data stream and the 2:1 horizontally/vertically downsampled UV component data stream comprises the following steps:
Data caching: a UV component cache 2 rows deep is built to form a 2×2 pixel window matrix;
Mean calculation: the 4 UV values in the window are summed, and an efficient average is obtained by a right shift of 2 bits (fixed-point division);
Data-aligned output: a YUV420 format data stream is generated in which the Y resolution is 2×2 times that of UV, matching the dual 8-row matrix segmentation requirement.
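The 2×2 mean filter above reduces to a sum followed by a 2-bit right shift, as the following sketch shows. The function name downsample_420 is hypothetical, and the sketch assumes a chroma plane with even width and height given as a list of rows; the hardware works on a 2-row line buffer instead of a whole plane.

```python
def downsample_420(plane):
    """Average each non-overlapping 2x2 window of a chroma plane (sum >> 2)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) >> 2
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


u_plane = [
    [1, 3, 10, 10],
    [5, 7, 10, 12],
    [0, 0, 255, 255],
    [0, 1, 255, 255],
]
print(downsample_420(u_plane))
```

The right shift truncates rather than rounds, so a window summing to 1 yields 0; whether the design adds a rounding offset before the shift is not stated in the text.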
Specifically, alternately storing the Y component data stream by rows into the Y1 and Y2 buffer areas to form the dual 8-row matrix structure comprises the following steps:
By row count control, for every 16 rows of Y component data received, 2 8×8 Y1 and Y2 data blocks are alternately generated;
The UV component data stream is buffered at a depth of 8 rows and, together with the Y1/Y2 data blocks, forms a standard input group of 6 8×8 data blocks.
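One way to picture how the dual 8-row split yields the six-block input group is the sketch below. It assumes a 16-pixel-wide strip of Y data with 8-pixel-wide U and V strips, so that each 8-row Y matrix cuts into two 8×8 blocks; that width, and the name build_input_group, are assumptions for illustration.

```python
def build_input_group(y_rows, u_rows, v_rows):
    """Split 16 Y rows into Y1/Y2 and return 8x8 blocks in Y1, Y1, Y2, Y2, U, V order."""
    if len(y_rows) != 16 or len(u_rows) != 8 or len(v_rows) != 8:
        raise ValueError("expected 16 Y rows and 8 U/V rows")

    def blocks(rows):
        # Cut an 8-row matrix into consecutive 8-column blocks.
        return [[row[c:c + 8] for row in rows] for c in range(0, len(rows[0]), 8)]

    y1, y2 = y_rows[:8], y_rows[8:]   # first 8 rows -> Y1, last 8 rows -> Y2
    return blocks(y1) + blocks(y2) + blocks(u_rows) + blocks(v_rows)


y = [[r * 16 + c for c in range(16)] for r in range(16)]
u = [[100] * 8 for _ in range(8)]
v = [[200] * 8 for _ in range(8)]
group = build_input_group(y, u, v)
print(len(group))
```

Under these assumptions the returned list already has the order in which the scheduling module reads the blocks.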
Specifically, performing the DCT transform, quantization, ZigZag scanning, run-length coding and Huffman coding on the Y1, Y2, U and V data blocks in a time-division multiplexing manner based on the timing control signals comprises the following steps:
Based on the 48-cycle counter value, the luminance quantization table is used when the Y1 and Y2 data blocks are processed in cycles 0-31, and the chrominance quantization table is used when the U and V data blocks are processed in cycles 32-47;
Huffman coding is performed on the data of the different components based on the four independent code tables for the Y/UV component DC/AC.
Specifically, performing the DCT transform of the Y1, Y2, U and V data blocks in a time-division multiplexing manner based on the timing control signals comprises the following steps:
Row-column separation: the two-dimensional DCT is decomposed into 8 one-dimensional row transforms and 8 one-dimensional column transforms, namely:
$$F(i,j)=\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,C_i(x)\,C_j(y)$$
where f(x, y) is the 8×8 spatial-domain pixel value and C_i(x), C_j(y) are the cosine basis functions. The one-dimensional DCT unit is called for each row of the 8×8 image block to generate an intermediate frequency-domain matrix; the row transform result has its row and column dimensions interchanged through a dual-port BRAM, converting it into column data format; the one-dimensional DCT unit is then called on the transposed column data to output the complete 8×8 frequency-domain coefficient matrix;
Butterfly operation: the Loeffler algorithm is adopted, and the 8-point DCT is computed through a 3-stage butterfly network. Each stage needs only 4 multiplications and 8 additions. Using the symmetry of the cosine function, the matrix multiplication is decomposed into iterated additions, subtractions and a small number of multiplications, which reduces the amount of computation, suits the parallel pipelined nature of the FPGA, and improves processing throughput;
Fixed-point implementation: to avoid the high resource cost of floating-point arithmetic, the floating-point coefficients of the cosine basis functions are converted into 16-bit fixed-point numbers.
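The row-column separation can be verified against a direct two-dimensional sum with a short floating-point reference model. This sketch is not the Loeffler butterfly network and not fixed-point: it uses a plain 8-point transform with orthonormal scaling, which is an assumption chosen because it gives a DC basis value of about 0.3536, the C(0) figure quoted elsewhere in the description. The function names are hypothetical.

```python
import math

N = 8


def basis(i: int, x: int) -> float:
    """Cosine basis value C_i(x) with orthonormal scaling (assumed)."""
    scale = math.sqrt(1 / N) if i == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * x + 1) * i * math.pi / (2 * N))


def dct_1d(vector):
    """Direct 8-point one-dimensional DCT."""
    return [sum(vector[x] * basis(i, x) for x in range(N)) for i in range(N)]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def dct_2d(block):
    """Row transform, transpose, column transform, transpose back."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


flat = [[10] * N for _ in range(N)]
print(round(dct_2d(flat)[0][0], 6))
```

The transpose between the two passes plays the role of the dual-port BRAM in the hardware description: it lets the same one-dimensional unit serve both the row pass and the column pass.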
Specifically, the basic principle and process of the technical solution of the invention are as follows:
For the characteristics of YUV420 data, a vertical matrix segmentation strategy is proposed:
(1) 16 rows of Y component data are divided into a Y1 matrix (the first 8 rows) and a Y2 matrix (the last 8 rows), and 8 rows of U/V components are extracted synchronously to form independent matrices;
(2) The Y1/Y2 write enable is switched cyclically by a 1-bit row counter; for every 16 rows of Y data received, 2 8×8 Y matrices, 1 8×8 U matrix and 1 8×8 V matrix are generated;
(3) Beneficial effects: the 8×8 DCT operation is matched directly, the conventional row-column conversion overhead is eliminated, and the data caching requirement is reduced by 30%.
2. Timing-aligned shared pipeline
A dedicated timing control module is designed to realize ordered scheduling of multi-component data:
(1) A 48-cycle state machine is adopted to output 6 8×8 data blocks (Y1×2, Y2×2, U×1 and V×1) in the order Y1→Y1→Y2→Y2→U→V, and the DCT transform is performed on each block;
(2) Data reading is triggered based on the UV component FIFO depth threshold, ensuring synchronization of the YUV component data;
(3) Shared processing unit: only one set of DCT transform, quantization, ZigZag scanning, RLE coding and Huffman coding modules is needed, processing the data blocks of the different components by time-division multiplexing.
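The scheduling described above, combined with the three states named in claim 3 (IDLE, GEN_RD_EN, WAIT_BACK), can be modelled as a simple step function. The transition out of GEN_RD_EN when the counter reaches 47 and the return to IDLE once the back-end FIFO is empty are assumptions about details the text leaves open; the names step and read_target are hypothetical.

```python
IDLE, GEN_RD_EN, WAIT_BACK = "IDLE", "GEN_RD_EN", "WAIT_BACK"
READ_ORDER = ("Y1", "Y1", "Y2", "Y2", "U", "V")


def read_target(counter: int) -> str:
    """Which buffer the read enable addresses for a given 0-47 counter value."""
    return READ_ORDER[counter // 8]


def step(state: str, counter: int, uv_fifo_level: int, backend_empty: bool):
    """Advance the scheduler by one clock; returns (next_state, next_counter)."""
    if state == IDLE:
        # Start a 48-cycle burst once the UV FIFO holds at least 8 entries.
        return (GEN_RD_EN, 0) if uv_fifo_level >= 8 else (IDLE, 0)
    if state == GEN_RD_EN:
        # Assumed: leave the burst after the last of the 48 cycles.
        return (WAIT_BACK, 0) if counter == 47 else (GEN_RD_EN, counter + 1)
    # WAIT_BACK: assumed to hold until the back-end FIFO has drained.
    return (IDLE, 0) if backend_empty else (WAIT_BACK, 0)


state, counter = step(IDLE, 0, uv_fifo_level=8, backend_empty=True)
targets = []
while state == GEN_RD_EN:
    targets.append(read_target(counter))
    state, counter = step(state, counter, uv_fifo_level=8, backend_empty=False)
print(len(targets), targets[0], targets[-1], state)
```

Waiting for the back end before the next burst is what keeps a single shared pipeline from being overrun when the entropy coder momentarily produces output more slowly than the front end can supply blocks.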
3. Hardware resource optimization mechanism
Efficient resource multiplexing is achieved through architectural innovation:
(1) Computing circuit: compared with the conventional three-channel architecture, logic units and DSP resources are reduced by more than 50%;
(2) Memory system: the intermediate results of the three components do not need to be buffered at the same time, and the on-chip SRAM demand is reduced by 30%;
(3) Power consumption control: the time-division multiplexing mechanism reduces the average power consumption by 40% and the data bus bandwidth requirement by 30%.
4. Complete coding flow optimization
(1) YUV444 to YUV420: 2×2 block mean-filter downsampling, preserving the Y component and compressing the UV components;
(2) Two-dimensional DCT transform: the row-column separation algorithm is combined with the Loeffler butterfly network, with a 16-bit fixed-point implementation;
(3) Quantization and Huffman coding: the luminance/chrominance quantization tables are switched dynamically, and entropy coding is realized with four independent code tables.
Specifically, by optimizing data organization, processing flow and timing control, the invention forms a complete and efficient JPEG coding system, significantly reduces the computational complexity and the data movement overhead, and is particularly suitable for resource-constrained embedded image compression applications.
Specifically, compared with the prior art, the invention has the following advantages:
The invention achieves efficient resource multiplexing by using the serial processing order Y1, Y2, U, V:
(1) Hardware area saving: only one DCT/quantization/ZigZag/Huffman coding processing unit is needed (the conventional scheme needs 3 parallel units), the computing circuit area is reduced by more than 50%, the control logic is simplified, and the timing synchronization complexity is reduced.
(2) Power consumption reduction: time-division multiplexing of the computing unit reduces the average power consumption by about 40%, lowers the data bus bandwidth requirement, and reduces memory access power consumption.
(3) Memory requirement reduction: the intermediate results of the three components Y, U and V do not need to be cached at the same time, and the memory requirement is reduced by about 30%.
(4) Complexity optimization: the data path is simplified, the routing difficulty on the FPGA/ASIC is reduced, and the development cycle is shortened by about 25%.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (10)

1. A JPEG image compression system based on an FPGA dual-matrix shared pipeline, characterized by comprising:
a YUV420 preprocessing module, configured to convert YUV444 format data into YUV420 format data, retain the Y component in full, perform 2:1 horizontal/vertical downsampling on the UV components, and generate a Y component data stream and a 2:1 horizontally/vertically downsampled UV component data stream;
a dual 8-row matrix segmentation storage module, configured to alternately store the Y component data stream by rows into the Y1 and Y2 buffer areas to form a dual 8-row matrix structure, and synchronously store the UV component data stream to obtain Y1, Y2, U and V data blocks, thereby realizing Y1/Y2 dual-matrix segmentation of the Y component and 8-row aligned storage of the UV component;
a shared pipeline timing scheduling module, configured to read the Y1, Y2, U and V data blocks from the Y1 and Y2 buffer areas in the order Y1→Y1→Y2→Y2→U→V, generate timing control signals for the shared processing pipeline, realize pipelined continuous processing, and convert parallel multi-component data into a serial data stream;
a shared processing module, configured to perform DCT transform, quantization, ZigZag scanning, run-length coding and Huffman coding on the Y1, Y2, U and V data blocks in a time-division multiplexing manner according to the timing control signals; and
a merging and encoding module, configured to splice the coded output into a continuous bitstream and generate byte-aligned output, realizing formatted integration of the encoded data and generating a compressed data stream conforming to the JPEG standard.
2. The JPEG image compression system based on an FPGA dual-matrix shared pipeline according to claim 1, characterized in that the dual 8-row matrix segmentation storage module, through row count control and a segmentation counter, alternately generates 2 8×8 Y1 and Y2 data blocks for every 16 rows of Y component data received; the UV component data stream is buffered at a depth of 8 rows and, together with the Y1 and Y2 data blocks, forms a standard input group of 6 8×8 data blocks, providing standardized 8×8 data blocks for the shared pipeline.
3. The JPEG image compression system based on an FPGA dual-matrix shared pipeline according to claim 1, characterized in that the shared pipeline timing scheduling module adopts a three-stage state machine, the three-stage state machine comprising:
IDLE state: when the UV component FIFO data amount is detected to be ≥ 8, jumping to the read enable generation state;
GEN_RD_EN state: generating read enable signals in the order Y1→Y1→Y2→Y2→U→V through a 48-cycle counter;
WAIT_BACK state: monitoring the back-end FIFO empty flag to ensure that there is no data backlog in the shared pipeline.
4. The JPEG image compression system based on an FPGA dual-matrix shared pipeline according to claim 1, characterized in that the shared processing module comprises:
a shared two-dimensional DCT transform module, configured to decompose the two-dimensional DCT into row transforms and column transforms through a row-column separation algorithm, implement the 8-point DCT calculation through a 3-stage butterfly network using the Loeffler algorithm, and convert the floating-point coefficients of the cosine basis functions into 16-bit fixed-point numbers;
a dynamic quantization module, configured to perform non-uniform quantization on the DCT coefficients based on the timing control signals, dynamically switch the luminance/chrominance quantization tables, and retain the low-frequency information to which the human eye is sensitive;
a ZigZag pipeline scanning module, configured to rearrange the quantized 8×8 coefficient matrix into a one-dimensional sequence along the ZigZag path, and perform DPCM differential coding on the DC coefficients;
a shared run-length encoding module, configured to compress the one-dimensional sequence after ZigZag scanning by performing zero run-length coding, reducing the data volume; and
a four-table shared Huffman coding module, configured to perform variable-length coding on the run-length-coded Y, U and V component data using four independent DC/AC code tables, allocating codes of different lengths according to the probability of occurrence of the data, to achieve lossless compression.
5. The JPEG image compression system based on an FPGA dual-matrix shared pipeline according to claim 4, characterized in that the dynamic quantization module dynamically switches the luminance/chrominance quantization tables according to a 48-cycle counter, wherein:
counter in cycles 0-31: the Y1 and Y2 data blocks are processed, using the luminance quantization table;
counter in cycles 32-47: the U and V data blocks are processed, using the chrominance quantization table.
6. A JPEG image compression method based on an FPGA dual-matrix shared pipeline, carried out using the JPEG image compression system based on an FPGA dual-matrix shared pipeline according to any one of claims 1 to 5, characterized in that the JPEG image compression method comprises the following steps:
converting YUV444 format data into YUV420 format, and generating a Y component data stream and a 2:1 horizontally/vertically downsampled UV component data stream;
alternately storing the Y component data stream by rows into the Y1 and Y2 buffer areas to form a dual 8-row matrix structure, and synchronously storing the UV component data stream, realizing Y1/Y2 dual-matrix segmentation of the Y component and 8-row aligned storage of the UV component, and providing standardized 8×8 data blocks for the shared pipeline;
reading data blocks from the Y1 and Y2 buffer areas in the order Y1→Y1→Y2→Y2→U→V, generating the timing control signals of the shared processing pipeline, and outputting in the order Y1→Y1→Y2→Y2→U→V to realize pipelined continuous processing;
based on the timing control signals, performing DCT transform, quantization, ZigZag scanning, run-length coding and Huffman coding on the Y1, Y2, U and V data blocks in a time-division multiplexing manner;
splicing the coded output into a continuous bitstream to generate a compressed data stream conforming to the JPEG standard.
7. The JPEG image compression method based on an FPGA dual-matrix shared pipeline according to claim 6, characterized in that converting the YUV444 format data into YUV420 format and generating the Y component data stream and the 2:1 horizontally/vertically downsampled UV component data stream comprises the following steps:
data caching: building a UV component cache 2 rows deep to form a 2×2 pixel window matrix;
mean calculation: summing the 4 UV values in the window, and obtaining an efficient average by a right shift of 2 bits;
data-aligned output: generating a YUV420 format data stream in which the Y resolution is 2×2 times that of UV, matching the dual 8-row matrix segmentation requirement.
8. The JPEG image compression method based on an FPGA dual-matrix shared pipeline according to claim 6, characterized in that alternately storing the Y component data stream by rows into the Y1 and Y2 buffer areas to form the dual 8-row matrix structure comprises the following steps:
by row count control, for every 16 rows of Y component data received, alternately generating 2 8×8 Y1 and Y2 data blocks;
buffering the UV component data stream at a depth of 8 rows to form, together with the Y1/Y2 data blocks, a standard input group of 6 8×8 data blocks.
9. The JPEG image compression method based on an FPGA dual-matrix shared pipeline according to claim 6, characterized in that performing the DCT transform, quantization, ZigZag scanning, run-length coding and Huffman coding on the Y1, Y2, U and V data blocks in a time-division multiplexing manner based on the timing control signals comprises the following steps:
based on the 48-cycle counter value, using the luminance quantization table when processing the Y1 and Y2 data blocks in cycles 0-31, and using the chrominance quantization table when processing the U and V data blocks in cycles 32-47;
performing Huffman coding on the data of the different components based on the four independent code tables for the Y/UV component DC/AC.
10. The JPEG image compression method based on an FPGA dual-matrix shared pipeline according to claim 6, characterized in that performing the DCT transform of the Y1, Y2, U and V data blocks in a time-division multiplexing manner based on the timing control signals comprises the following steps:
row-column separation: decomposing the two-dimensional DCT into 8 one-dimensional row transforms and 8 one-dimensional column transforms, namely:
$$F(i,j)=\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,C_i(x)\,C_j(y)$$
where f(x, y) is the 8×8 spatial-domain pixel value and C_i(x), C_j(y) are the cosine basis functions; the one-dimensional DCT unit is called for each row of the 8×8 image block to generate an intermediate frequency-domain matrix; the row transform result has its row and column dimensions interchanged through a dual-port BRAM, converting it into column data format; the one-dimensional DCT unit is then called on the transposed column data to output the complete 8×8 frequency-domain coefficient matrix;
butterfly operation: adopting the Loeffler algorithm, the 8-point DCT calculation is completed through a 3-stage butterfly network, each stage needing only 4 multiplications and 8 additions; using the symmetry of the cosine function, the matrix multiplication is decomposed into iterated additions, subtractions and a small number of multiplications;
fixed-point implementation: converting the floating-point coefficients of the cosine basis functions into 16-bit fixed-point numbers.
CN202511109247.1A 2025-08-08 JPEG image compression system and method based on FPGA double-matrix sharing assembly line Active CN120602650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511109247.1A CN120602650B (en) 2025-08-08 JPEG image compression system and method based on FPGA double-matrix sharing assembly line

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511109247.1A CN120602650B (en) 2025-08-08 JPEG image compression system and method based on FPGA double-matrix sharing assembly line

Publications (2)

Publication Number Publication Date
CN120602650A true CN120602650A (en) 2025-09-05
CN120602650B CN120602650B (en) 2025-10-17

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040109610A1 (en) * 2002-08-26 2004-06-10 Taku Kodama Image processing apparatus for compositing images
CN101951524A (en) * 2009-07-10 2011-01-19 比亚迪股份有限公司 JPEG (Joint Photographic Experts Group) compression method and device of color digital image
CN103491375A (en) * 2013-05-29 2014-01-01 东南大学 JPEG compression system based on bin DCT algorithm
CN113301344A (en) * 2021-05-22 2021-08-24 兰州大学 Image compression and decompression method based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant