
CN114663536B - Image compression method and device - Google Patents

Info

Publication number: CN114663536B
Application number: CN202210118720.2A
Authority: CN (China)
Other versions: CN114663536A (Chinese)
Prior art keywords: image, compressed, hidden variable, module, target
Inventors: 张兆翔, 宋纯锋, 邹仁杰
Current assignee: Institute of Automation of Chinese Academy of Science (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Filing history: application filed by Institute of Automation of Chinese Academy of Science with priority to CN202210118720.2A; published as CN114663536A, granted and published as CN114663536B.

Classifications

    • G06T 9/002: Image coding using neural networks
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06T 7/11: Region-based segmentation
    • G06T 2207/10024: Color image
    • G06T 2207/20021: Dividing image into blocks, subimages or windows


Abstract

The present invention provides an image compression method and device. The method includes: acquiring an image to be compressed; dividing the image to be compressed into multiple image blocks according to preprocessing rules, and feeding all of the image blocks into a pre-stored target encoder to obtain a first latent variable; feeding the first latent variable into a pre-stored entropy model to obtain a second latent variable; and feeding the second latent variable into a pre-stored target decoder to obtain compressed image blocks, from which the compressed image is assembled. By introducing a Transformer module into the image compression task and adopting a symmetric processing architecture for encoding and decoding, the method improves image compression efficiency.

Description

An image compression method and device

Technical Field

The invention belongs to the field of computer vision, and in particular relates to an image compression method and device.

Background Art

Image compression is the application of data compression to digital images. Its purpose is to reduce the redundant information in image data so that the data can be stored and transmitted efficiently, that is, to obtain the best possible image quality at a given bit rate or compression ratio.

Existing approaches usually design the decoder and encoder for image compression based on convolutional neural networks, but a CNN-based compression pipeline cannot capture the semantic information of the image, while the spatial redundancy of images makes global attention mechanisms perform poorly on compression tasks, leading to low compression efficiency.

Summary of the Invention

The image compression method and device provided by the present invention are used to overcome the defect of poor rate-distortion performance in the prior art, where encoders and decoders designed on the basis of convolutional neural networks cannot capture the semantic information of images, and thereby improve image compression efficiency.

The present invention provides an image compression method, the method comprising:

acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks according to preprocessing rules, and feeding all of the image blocks into a pre-stored target encoder to obtain a first latent variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module, and a block merging module; feeding the first latent variable into a pre-stored entropy model to obtain a second latent variable; and feeding the second latent variable into a pre-stored target decoder to obtain compressed image blocks and assembling the compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module, and a block splitting module.

According to an image compression method provided by the present invention, the method further comprises:

feeding the first latent variable into the entropy model, obtaining the mean and variance of each element of the first latent variable, and fitting a normal distribution for the first latent variable from the mean and variance of each element to obtain a probability distribution function; arithmetically encoding the first latent variable based on the probability distribution function to obtain a target bit stream; arithmetically decoding the target bit stream based on the probability distribution function to obtain a third latent variable; and obtaining the quantization residual loss of the third latent variable through the entropy model, and obtaining the second latent variable from the third latent variable and the quantization residual loss.

The global loss L is calculated with the following formula:

L = R + λD

where λ is a hyperparameter, R is the size of the compressed bit stream, and D is the distortion term; the target image compression model is obtained according to the global loss.

The image compression model is trained with the backpropagation (BP) algorithm, and the bit-stream size R and the distortion term D are adjusted to reduce the global loss L so as to obtain the target hyperparameter; the image compression model is then trained with the target hyperparameter to obtain the final image compression model.

The image to be compressed is normalized, and the processed image is divided evenly into a plurality of image blocks of a fixed area.

The Transformer module comprises a window-based attention layer, a multilayer perceptron, and a normalization layer.

The present invention also provides an image compression device, the device comprising:

an image acquisition module, configured to acquire an image to be compressed; an encoding module, configured to divide the image to be compressed into a plurality of image blocks according to preprocessing rules and feed all of the image blocks into a pre-stored target encoder to obtain a first latent variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module, and a block merging module; a conversion module, configured to feed the first latent variable into a pre-stored entropy model to obtain a second latent variable; and a decoding module, configured to feed the second latent variable into a pre-stored target decoder to obtain compressed image blocks and assemble the compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module, and a block splitting module.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any one of the image compression methods described above.

The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the image compression methods described above.

The image compression method and device provided by the present invention first acquire the image to be compressed; the image to be compressed is then divided into a plurality of image blocks according to preprocessing rules, and all of the image blocks are fed into a pre-stored target encoder to obtain a first latent variable; the first latent variable is fed into a pre-stored entropy model to obtain a second latent variable; finally, the second latent variable is fed into a pre-stored target decoder to obtain compressed image blocks, from which the compressed image is assembled. By introducing a Transformer module into the image compression task and adopting a symmetric processing architecture for encoding and decoding, the method improves image compression efficiency.

Brief Description of the Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the image compression method provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the target encoder provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the target decoder provided by an embodiment of the present invention;

Fig. 4 is a schematic flowchart of obtaining the second latent variable provided by another embodiment of the present invention;

Fig. 5 is a schematic structural diagram of the Transformer module provided by yet another embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the image compression device provided by an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The image compression method provided by an embodiment of the present invention is described below with reference to Fig. 1, and comprises:

Step 101: acquire the image to be compressed.

It can be understood that image compression is the application of data compression to digital images; its purpose is to reduce redundant information in image data so that the data can be stored and transmitted in a more efficient format. When the amount of image data is too large, storing, transmitting, and processing the image information becomes very difficult; therefore, the image to be processed needs to be compressed so that the compressed image can be used effectively. In this embodiment, one million RGB color images are randomly drawn from an open-source database as the images to be compressed; the required image data to be compressed may also be obtained from other data acquisition platforms.

Step 102: divide the image to be compressed into a plurality of image blocks according to preprocessing rules, and feed all of the image blocks into a pre-stored target encoder to obtain a first latent variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module, and a block merging module.

It can be understood that, since the redundant information in the image to be compressed must be removed to obtain the compressed image, this embodiment first normalizes the acquired RGB color images to be compressed with image processing software to unify their sizes; each image is then divided, in a fixed arrangement order, into a number of fixed-size image blocks to obtain a block sequence; finally, the block sequence is fed into the target encoder for encoding. During training, the encoder maps the image blocks to the parameters of the probability distribution obeyed by the latent variable, and sampling from that distribution yields the first latent variable, which is obtained by quantizing and rounding the features output by the trained target encoder for the image blocks. As shown in Fig. 2, the main structure of the target encoder consists of a linear embedding layer module, a Transformer module, and a block merging module. The linear embedding layer is composed of a multilayer perceptron (MLP) and applies a linear transformation to the channel data of each pixel in the image; the Transformer module is composed of a window-based attention layer, a multilayer perceptron, and a normalization layer, with 2, 2, 6, and 2 Transformer blocks in the four stages respectively; the block merging module is composed of a multilayer perceptron and a normalization layer and is used to downsample the image features.
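The two linear stages above can be sketched as follows. This is an illustrative toy, not the patented implementation: all shapes, weights, and function names are assumptions. The linear embedding is a per-token linear map over channels, and patch merging concatenates each 2×2 neighborhood of tokens along the channel axis before projecting, which halves the spatial resolution:

```python
import numpy as np

def linear_embed(tokens, W, b):
    """Per-token linear transform over the channel dimension.

    tokens: (N, C_in) flattened image blocks; W: (C_in, C_out); b: (C_out,).
    """
    return tokens @ W + b

def patch_merge(x, W):
    """2x downsampling: concatenate each 2x2 neighborhood of tokens along
    channels, then project with a linear layer (patch-merging style).

    x: (H, W_, C) token grid; W: (4*C, C_out).
    """
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )  # shape (H//2, W//2, 4*C)
    return merged @ W

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 8, 16))               # 8x8 grid of 16-dim tokens
out = patch_merge(tokens, rng.normal(size=(64, 32)))
print(out.shape)  # spatial resolution halved, channels projected to 32
```

The same two building blocks, run in reverse order with an upsampling split instead of a merge, mirror the decoder side of the symmetric architecture.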

Step 103: feed the first latent variable into a pre-stored entropy model to obtain a second latent variable.

It can be understood that the first latent variable is obtained by quantizing, through rounding to the nearest integer, the features output by the trained target encoder for the image blocks, and this rounding introduces a quantization loss. To compensate for the quantization loss incurred as the image blocks pass through the target encoder, this embodiment feeds the obtained first latent variable into a pre-stored entropy model to obtain the second latent variable, which compensates for the quantization loss of the first latent variable. Specifically, this embodiment builds a channel-wise autoregressive entropy model based on a convolutional neural network, feeds the first latent variable into the entropy model to obtain the probability distribution function of each element of the first latent variable, and then obtains the second latent variable through arithmetic encoding and decoding based on that probability distribution function.
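The rounding step and the residual it discards can be illustrated with a few toy values (the feature values and variable names here are assumptions, not the patent's data):

```python
import numpy as np

y = np.array([0.4, 1.7, -2.3, 3.1])   # encoder output features (toy values)
y_hat = np.round(y)                    # quantized first latent variable
r = y - y_hat                          # quantization residual lost by rounding
y_recovered = y_hat + r                # adding the residual back restores y

print(y_hat)                           # rounded values: 0, 2, -2, 3
print(np.allclose(y_recovered, y))     # True
```

The entropy model's job, in this framing, is to predict a residual like `r` so that the decoder can compensate for what rounding threw away.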

Step 104: feed the second latent variable into a pre-stored target decoder to obtain compressed image blocks, and obtain the compressed image from the compressed image blocks, wherein the target decoder comprises a de-embedding layer module, the Transformer module, and a block splitting module.

It can be understood that decoding is the inverse process of encoding. The obtained second latent variable is fed into the target decoder and parsed to obtain the compressed image blocks of the corresponding RGB image, and the compressed image blocks are then stitched back into a complete image according to the block ordering information from the above embodiment. As shown in Fig. 3, the main structure of the target decoder consists of a de-embedding layer module, a Transformer module, and a block splitting module. The de-embedding layer is composed of a multilayer perceptron (MLP); the Transformer module is composed of a window-based attention layer, a multilayer perceptron, and a normalization layer; the block splitting module is composed of a multilayer perceptron and a normalization layer and is used to upsample the image features.

The method of the present invention introduces a Transformer module into the image compression task and adopts a symmetric processing architecture for encoding and decoding images, which improves image compression efficiency.

Optionally, the first latent variable is fed into the entropy model, the mean and variance of each element of the first latent variable are obtained, and a normal distribution for the first latent variable is fitted from the mean and variance of each element to obtain a probability distribution function; the first latent variable is arithmetically encoded based on the probability distribution function to obtain a target bit stream; the target bit stream is arithmetically decoded based on the probability distribution function to obtain a third latent variable; and the quantization residual loss of the third latent variable is obtained through the entropy model, and the second latent variable is obtained from the third latent variable and the quantization residual loss.

Specifically, as shown in Fig. 4, this embodiment first normalizes the images to be compressed and divides each image into a plurality of image blocks; these blocks are arranged in a fixed order into a block sequence, which is fed into the Transformer-based target encoder for training to obtain the first latent variable ŷ. The first latent variable ŷ is then fed into the channel-wise autoregressive entropy model built on a convolutional neural network to obtain the mean μ and variance σ corresponding to each element of ŷ; a Gaussian distribution parameterized by μ and σ is fitted for each element to obtain its probability distribution function. According to this probability distribution function, arithmetic coding losslessly compresses the quantized first latent variable ŷ into a bit stream, where the bit stream is a binary string. Finally, arithmetic decoding parses the bit stream back into the quantized third latent variable ȳ according to the same probability distribution function, while the channel-wise autoregressive entropy model predicts the quantization residual loss r of the latent variable; the second latent variable ŷ₂ is then obtained from the loss formula ŷ₂ = ȳ + r.

It should be noted that the third latent variable is obtained from the first latent variable through lossless arithmetic encoding and decoding, so the two have the same value.

This embodiment provides a method of feeding the latent variable output by the encoder into the entropy model and obtaining a new latent variable through arithmetic encoding and decoding, which compensates for the quantization residual lost by the quantized latent variable and can reduce image distortion.
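The role of the per-element Gaussian can be sketched as follows: the probability mass the fitted distribution assigns to each quantized value's integer bin determines how many bits the arithmetic coder spends on it. This is a simplified stand-in using a closed-form Gaussian CDF; the actual μ and σ in the method are predicted by the learned entropy model:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bits_for_element(y_hat, mu, sigma):
    """Probability mass of the integer bin [y_hat - 0.5, y_hat + 0.5]
    under N(mu, sigma^2); its -log2 is the coding cost in bits."""
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return -math.log2(p)

# An element close to the predicted mean is cheap to code...
cheap = bits_for_element(0.0, mu=0.1, sigma=1.0)
# ...while an element far from the prediction is expensive.
costly = bits_for_element(4.0, mu=0.1, sigma=1.0)
print(cheap < costly)  # True
```

This is why a good entropy model directly lowers the rate term R: the better its μ and σ match the actual latents, the fewer bits the arithmetic coder needs.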

Optionally, the global loss L is calculated with the following formula:

L = R + λD

where λ is a hyperparameter, R is the size of the compressed bit stream, and D is the distortion term; the target image compression model is obtained according to the global loss.

具体的,λ为超参数,用于控制压缩的比特率大小以及压缩质量,以生成率失真曲线,R的计算公式为:

Figure BDA0003497610690000073
Figure BDA0003497610690000074
其中
Figure BDA0003497610690000075
是熵模型中的超隐变量,是一种先验信息,用于求取
Figure BDA0003497610690000076
的均值和方差;
Figure BDA0003497610690000077
表示在先验信息
Figure BDA0003497610690000078
条件下,
Figure BDA0003497610690000079
的正态分布概率值,
Figure BDA00034976106900000710
表示
Figure BDA00034976106900000711
的条件熵,
Figure BDA00034976106900000712
表示先验信息
Figure BDA00034976106900000713
的正态分布概率值,
Figure BDA00034976106900000714
表示先验信息
Figure BDA00034976106900000715
的信息熵,Ex~px[·]表示图像x在其正态分布px下的表达式内的期望值,x表示待压缩图像,
Figure BDA00034976106900000716
为所述压缩后的图像;D为失真项,表示所述压缩后的图像和所述待压缩图像之间的差异大小,
Figure BDA00034976106900000717
Figure BDA00034976106900000718
表示原图像 x和重建图像
Figure BDA00034976106900000719
之间的失真,常用评估标准为均方误差MSE;本实施例通过计算图像压缩模型的全局损失L来确定合适的超参数λ,并利用目标超参数λ来获取目标图像压缩模型。Specifically, λ is a hyperparameter, which is used to control the compressed bit rate and compression quality to generate a rate-distortion curve. The calculation formula of R is:
Figure BDA0003497610690000073
Figure BDA0003497610690000074
in
Figure BDA0003497610690000075
It is an ultra-hidden variable in the entropy model, and it is a kind of prior information, which is used to obtain
Figure BDA0003497610690000076
The mean and variance of ;
Figure BDA0003497610690000077
represented in the prior information
Figure BDA0003497610690000078
condition,
Figure BDA0003497610690000079
The normal distribution probability value of ,
Figure BDA00034976106900000710
express
Figure BDA00034976106900000711
The conditional entropy of
Figure BDA00034976106900000712
represent prior information
Figure BDA00034976106900000713
The normal distribution probability value of ,
Figure BDA00034976106900000714
represent prior information
Figure BDA00034976106900000715
The information entropy of , E x~px [ ] represents the expected value of the image x in the expression under its normal distribution px, x represents the image to be compressed,
Figure BDA00034976106900000716
is the compressed image; D is a distortion item, representing the difference between the compressed image and the image to be compressed,
Figure BDA00034976106900000717
Figure BDA00034976106900000718
Represents the original image x and the reconstructed image
Figure BDA00034976106900000719
The common evaluation standard is the mean square error MSE; this embodiment determines the appropriate hyperparameter λ by calculating the global loss L of the image compression model, and uses the target hyperparameter λ to obtain the target image compression model.
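Under these definitions, the training objective can be assembled as below. This is a minimal sketch, assuming the rate is a precomputed per-image bit count and the distortion is MSE; both are stand-ins for the learned quantities, and the function name is illustrative:

```python
import numpy as np

def global_loss(bits, x, x_hat, lam):
    """L = R + lambda * D, with D the mean squared error between the
    original image x and the reconstruction x_hat."""
    R = bits                           # rate term: size of the bit stream
    D = np.mean((x - x_hat) ** 2)      # distortion term: MSE
    return R + lam * D

x = np.zeros((4, 4))
x_hat = x + 0.1                        # toy reconstruction with uniform error
L_low_lam = global_loss(bits=100.0, x=x, x_hat=x_hat, lam=0.0018)
L_high_lam = global_loss(bits=100.0, x=x, x_hat=x_hat, lam=0.0483)
print(L_low_lam < L_high_lam)  # True: a larger lambda penalizes distortion more
```

Sweeping λ and minimizing L at each value is exactly what traces out the rate-distortion curve described above.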

The method of this embodiment provides a way to calculate the global loss L of the image compression model; the required hyperparameter λ is determined by reducing L, so as to obtain image compression models for different bit-rate or reconstruction-quality requirements.

Optionally, the image compression model is trained based on the BP algorithm, and the bit-stream size R and the distortion term D are adjusted to reduce the global loss L so as to obtain the target hyperparameter; the image compression model is then trained with the target hyperparameter to obtain the final image compression model.

Specifically, this embodiment uses the backpropagation algorithm and stochastic gradient descent to reduce the overall prediction error L and train the image compression model, and the final image compression model is obtained after multiple training iterations. For example, this embodiment sets the hyperparameter λ to the values {0.0018, 0.0035, 0.0067, 0.0130, 0.025, 0.0483} to obtain multiple image compression models suitable for different scenarios. Different image compression models are selected for different bit-rate or reconstruction-quality requirements: for scenarios demanding high reconstruction quality, a larger λ such as 0.0483 is chosen, and for scenarios requiring a lower bit rate, a smaller λ such as 0.0018 is chosen.

This embodiment provides a method of determining the hyperparameter λ with the BP training algorithm, so as to obtain image compression models that meet different bit-rate or reconstruction-quality requirements.

Optionally, the image to be compressed is normalized, and the processed image is divided evenly into a plurality of image blocks of a fixed area.

Specifically, in this embodiment 1 million acquired RGB color images are used as the images to be compressed. After the images to be compressed are normalized, RGB color images of the same size are obtained, each of dimension 768×512×3. Each image is split along the height (768) and width (512) dimensions into image blocks of unit size 2, and the resulting 98304 image blocks are then fed into the target encoder for training.
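The block division above can be sketched in NumPy. This is an illustrative implementation, not the patent's code; the function name `split_into_patches` is hypothetical. A 768×512×3 image with a 2×2 unit size yields (768/2)×(512/2) = 98304 blocks.

```python
import numpy as np

def split_into_patches(img, patch=2):
    """Split an (H, W, C) image into non-overlapping patch x patch blocks."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly"
    # Group rows and columns into (patch x patch) tiles, then flatten the grid
    blocks = img.reshape(h // patch, patch, w // patch, patch, c)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

img = np.random.rand(768, 512, 3).astype(np.float32)  # one normalized image
patches = split_into_patches(img)
print(patches.shape)  # (98304, 2, 2, 3)
```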

This embodiment provides a data-preprocessing and image-block-division method, which facilitates the subsequent step of feeding the image blocks into the target encoder to obtain the hidden variables.

Optionally, the Transformer module includes a window-based attention layer, a multilayer perceptron, and a normalization layer.

As shown in Figure 5, the Transformer module consists of window-based attention layers (W-MSA, SW-MSA), a multilayer perceptron (MLP) and layer normalization (LN). W-MSA and SW-MSA are used in pairs: if layer L uses W-MSA, then layer L+1 uses SW-MSA. Comparing the left and right diagrams shows that the windows are shifted, and the shifted windows allow previously adjacent windows to communicate, solving the problem that information cannot be exchanged between different windows. In this embodiment, an attention mechanism is introduced into the Transformer module to form the window-based attention layer.

The method of this embodiment thus specifies the structure of the Transformer module and introduces an attention mechanism within each window, so that the model attends more to the local structural information of the input image, i.e., the correlation between spatially adjacent elements. This overcomes the difficulty of missing semantic information in image compression and improves rate-distortion performance.
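A minimal sketch of the window mechanism described above, assuming a cyclic shift of half the window size between the W-MSA layer and the SW-MSA layer; the function names and the shift amount are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def window_partition(x, win):
    """Partition an (H, W, C) feature map into non-overlapping win x win
    windows -- the scope over which W-MSA computes self-attention."""
    h, w, c = x.shape
    x = x.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)

def shifted_window_partition(x, win):
    """Cyclically shift the map by win//2 before partitioning (SW-MSA):
    the shifted windows straddle the old window borders, letting formerly
    adjacent windows exchange information in layer L+1."""
    rolled = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(rolled, win)

feat = np.arange(8 * 8 * 1).reshape(8, 8, 1).astype(float)
regular = window_partition(feat, 4)          # layer L:   4 windows of 4x4
shift = shifted_window_partition(feat, 4)    # layer L+1: windows offset by 2
print(regular.shape, shift.shape)            # (4, 4, 4, 1) (4, 4, 4, 1)
```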

An image compression device provided by an embodiment of the present invention is described below with reference to Figure 6; the image compression device described below and the image compression method described above may be referred to in correspondence with each other.

The present invention provides an image compression device, the device comprising:

An image acquisition module 601, configured to acquire an image to be compressed; a decoding module 602, configured to divide the image to be compressed into a plurality of image blocks based on preprocessing rules, and to input all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder includes a linear embedding layer module, a Transformer module and a block merging module; a conversion module 603, configured to input the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and a decoding module 604, configured to input the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and to obtain the compressed image according to the compressed image blocks, wherein the target decoder includes a de-embedding layer module, the Transformer module and a block splitting module.

In the image compression device provided by the present invention, the image acquisition module 601 first acquires the image to be compressed; the decoding module 602 then divides the image to be compressed into a plurality of image blocks based on the preprocessing rules and inputs all the image blocks into the pre-stored target encoder to obtain the first hidden variable; the conversion module 603 next inputs the first hidden variable into the pre-stored entropy model to obtain the second hidden variable; finally, the decoding module 604 inputs the second hidden variable into the pre-stored target decoder to obtain the compressed image blocks, and obtains the compressed image according to the compressed image blocks. The device of the present invention introduces the Transformer module into the image compression task and adopts a symmetric processing architecture for image encoding and decoding, which improves image compression efficiency.
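The module pipeline above can be sketched schematically. The stand-in functions below are toys that only illustrate how the modules compose (encoder → entropy model → decoder); they are not the real networks, and all names are hypothetical.

```python
import numpy as np

def target_encoder(blocks):
    """Stand-in for linear embedding + Transformer + block merging."""
    return blocks.reshape(blocks.shape[0], -1).mean(axis=1)  # toy first hidden variable

def entropy_model(y):
    """Stand-in for the entropy model: quantize the latent."""
    return np.round(y)  # toy second hidden variable

def target_decoder(y_hat, shape):
    """Stand-in for de-embedding + Transformer + block splitting."""
    return np.broadcast_to(y_hat[:, None, None, None], shape).copy()  # toy reconstruction

blocks = np.random.rand(98304, 2, 2, 3)      # image blocks from the preprocessing step
y = target_encoder(blocks)                    # first hidden variable
y_hat = entropy_model(y)                      # second hidden variable
recon = target_decoder(y_hat, blocks.shape)   # compressed (reconstructed) image blocks
print(recon.shape)                            # (98304, 2, 2, 3)
```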

Figure 7 illustrates the physical structure of an electronic device. As shown in Figure 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730 and a communication bus 740, wherein the processor 710, the communications interface 720 and the memory 730 communicate with one another through the communication bus 740. The processor 710 can invoke logic instructions in the memory 730 to execute an image compression method, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on preprocessing rules, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder includes a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and obtaining the compressed image according to the compressed image blocks, wherein the target decoder includes a de-embedding layer module, the Transformer module and a block splitting module.

In addition, the above logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the image compression method provided by the methods above, the method comprising: acquiring an image to be compressed; dividing the image to be compressed into a plurality of image blocks based on preprocessing rules, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder includes a linear embedding layer module, a Transformer module and a block merging module; inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable; and inputting the second hidden variable into a pre-stored target decoder to obtain compressed image blocks, and obtaining the compressed image according to the compressed image blocks, wherein the target decoder includes a de-embedding layer module, the Transformer module and a block splitting module.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. An image compression method, comprising:
acquiring an image to be compressed;
dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule, and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module; normalizing the image to be compressed, and equally dividing the processed image into a plurality of image blocks according to a fixed division area, wherein the image blocks have the same size;
inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable;
inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block, and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises a de-embedding layer module, the Transformer module and a block splitting module;
after obtaining a compressed image according to the compressed image block, the method further comprises:
calculating the global loss L using the following formula:
L = R + λD;
wherein λ is a hyper-parameter used to obtain a rate-distortion curve by controlling the bit rate and compression quality of the compression;
R is the bit-stream size obtained by compression, and R is calculated as:
R = E_{x~p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ)] + E_{x~p_x}[−log₂ p_{ẑ}(ẑ)];
wherein x is the image to be compressed, ŷ is the first hidden variable, and ẑ is the hyper-hidden variable in the entropy model, used to obtain the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of the normal distribution of ŷ conditioned on the prior information ẑ, and E_{x~p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ)] is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the normal distribution of the prior information ẑ, and E_{x~p_x}[−log₂ p_{ẑ}(ẑ)] is the information entropy of the prior information ẑ; E_{x~p_x}[·] is the expected value of the expression over x under its distribution p_x;
D is a distortion term used to represent the difference between the compressed image and the image to be compressed, and D is calculated as:
D = E_{x~p_x}[d(x, x̂)];
wherein x̂ is the compressed image, and d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
2. The image compression method according to claim 1, wherein inputting the first hidden variable into a pre-stored entropy model to obtain a second hidden variable specifically comprises:
inputting the first hidden variable into the entropy model, acquiring the mean value and the variance of each element in the first hidden variable, and simulating the normal distribution of the first hidden variable according to the mean value and the variance of each element to acquire a probability distribution function;
performing arithmetic coding on the first hidden variable based on the probability distribution function to obtain a target bit stream;
arithmetically decoding the target bit stream based on the probability distribution function to obtain a third hidden variable;
and obtaining the quantized residual loss of the third hidden variable through the entropy model, and obtaining the second hidden variable based on the third hidden variable and the quantized residual loss.
3. The image compression method of claim 1, wherein obtaining a target image compression model based on the global loss comprises:
training an image compression model based on a BP algorithm, and adjusting the bit stream size R and the distortion item D to reduce the global loss L so as to obtain a target hyper-parameter;
and training the image compression model according to the target hyper-parameter to obtain the image compression model.
4. The image compression method of claim 1, wherein the Transformer module comprises a window-based attention layer, a multi-layer perceptron, and a normalization layer.
5. An image compression apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be compressed;
the decoding module is used for dividing the image to be compressed into a plurality of image blocks based on a preprocessing rule and inputting all the image blocks to be compressed into a pre-stored target encoder to obtain a first hidden variable, wherein the target encoder comprises a linear embedding layer module, a Transformer module and a block merging module;
the decoding module is specifically configured to perform normalization processing on the image to be compressed, and equally divide the processed image into a plurality of image blocks according to a fixed division area, where the image blocks are the same in size;
the conversion module is used for inputting the first hidden variable into a pre-stored entropy model so as to obtain a second hidden variable;
the decoding module is used for inputting the second hidden variable into a pre-stored target decoder to obtain a compressed image block and obtaining a compressed image according to the compressed image block, wherein the target decoder comprises a de-embedding layer module, a Transformer module and a block splitting module;
the decoding module is further configured to, after obtaining a compressed image according to the compressed image block, calculate a global loss L using the following formula:
L = R + λD;
wherein λ is a hyper-parameter used to obtain a rate-distortion curve by controlling the bit rate and compression quality of the compression;
R is the bit-stream size obtained by compression, and R is calculated as:
R = E_{x~p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ)] + E_{x~p_x}[−log₂ p_{ẑ}(ẑ)];
wherein ẑ is the hyper-hidden variable in the entropy model, used to obtain the mean and variance of ŷ; p_{ŷ|ẑ}(ŷ|ẑ) is the probability value of the normal distribution of ŷ conditioned on the prior information ẑ, and E_{x~p_x}[−log₂ p_{ŷ|ẑ}(ŷ|ẑ)] is the conditional entropy of ŷ; p_{ẑ}(ẑ) is the probability value of the normal distribution of the prior information ẑ, and E_{x~p_x}[−log₂ p_{ẑ}(ẑ)] is the information entropy of the prior information ẑ; E_{x~p_x}[·] is the expected value of the expression over the image x under the distribution p_x, x is the image to be compressed, and x̂ is the compressed image;
D is a distortion term used to represent the difference between the compressed image and the image to be compressed, and D is calculated as:
D = E_{x~p_x}[d(x, x̂)];
wherein d(x, x̂) denotes the distortion between x and x̂;
and acquiring a target image compression model according to the global loss.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image compression method according to any of claims 1 to 4 are implemented when the processor executes the program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the image compression method according to any one of claims 1 to 4.
CN202210118720.2A 2022-02-08 2022-02-08 Image compression method and device Active CN114663536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210118720.2A CN114663536B (en) 2022-02-08 2022-02-08 Image compression method and device


Publications (2)

Publication Number Publication Date
CN114663536A CN114663536A (en) 2022-06-24
CN114663536B true CN114663536B (en) 2022-12-06

Family

ID=82025927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210118720.2A Active CN114663536B (en) 2022-02-08 2022-02-08 Image compression method and device

Country Status (1)

Country Link
CN (1) CN114663536B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980611A (en) * 2023-02-09 2023-10-31 腾讯科技(深圳)有限公司 Image compression method, apparatus, device, computer program product, and medium
CN117036755A (en) * 2023-07-28 2023-11-10 上海交通大学 Image compression error detection method and error-resistant image compression method and system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238376B (en) * 2010-04-28 2014-04-23 鸿富锦精密工业(深圳)有限公司 Image processing system and method
CN104025561A (en) * 2012-11-23 2014-09-03 华为技术有限公司 Image compression method and image processing apparatus
CN108650509B (en) * 2018-04-04 2020-08-18 浙江工业大学 A multi-scale adaptive approximate lossless encoding and decoding method and system
US11335034B2 (en) * 2019-01-16 2022-05-17 Disney Enterprises, Inc. Systems and methods for image compression at multiple, different bitrates
CN111986278B (en) * 2019-05-22 2024-02-06 富士通株式会社 Image coding device, probability model generation device and image compression system
CN113259676B (en) * 2020-02-10 2023-01-17 北京大学 A method and device for image compression based on deep learning
CN112036292B (en) * 2020-08-27 2024-06-04 平安科技(深圳)有限公司 Word recognition method and device based on neural network and readable storage medium
CN113313777B (en) * 2021-07-29 2021-12-21 杭州博雅鸿图视频技术有限公司 Image compression processing method and device, computer equipment and storage medium
CN113709455B (en) * 2021-09-27 2023-10-24 北京交通大学 Multi-level image compression method using transducer



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant