CN113393377B - Single-frame image super-resolution method based on video coding - Google Patents
Single-frame image super-resolution method based on video coding
- Publication number
- CN113393377B CN113393377B CN202110541900.7A CN202110541900A CN113393377B CN 113393377 B CN113393377 B CN 113393377B CN 202110541900 A CN202110541900 A CN 202110541900A CN 113393377 B CN113393377 B CN 113393377B
- Authority
- CN
- China
- Prior art keywords
- network
- image
- sub
- resolution
- super
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of image processing, and in particular to a single-frame image super-resolution method based on video coding.
Background Art
Image super-resolution is the process of converting a low-resolution input image into a high-resolution one. An important focus of recent super-resolution work has been the design of networks that accelerate inference. One branch pursues efficient super-resolution with fewer parameters and higher speed: for example, the early FSRCNN extracts features directly from the input image and then passes the feature map through an upsampling network to construct the super-resolved image, and the more recent CARN uses grouped convolutions to build a residual network that processes the input quickly. Another branch increases the complexity of the network model and the number of branches, training separately on different kinds of inputs, as in ClassSR.
ClassSR trains and infers with neural networks of different complexity on low-resolution input patches of different complexity. Since most regions of an image only need to pass through a network with a relatively small amount of computation, this improves the running speed of the inference stage to a certain extent. Specifically, the method splits the image into 32×32-pixel patches and, using a pre-trained classification network, sorts them by texture complexity into three categories: simple, medium, and hard. Patches of different categories are routed to backbone networks with different numbers of channels.
Traditional super-resolution networks extract a feature map from the whole image at once. This structure prevents the network from learning the distinct features of each region: applying the same convolution kernels to different regions yields reconstructed texture details that do not match the real image. Moreover, because the complexity of texture detail varies across regions, heavy processing of low-detail regions adds unnecessary computation. On the other hand, a classify-first design such as ClassSR, with three neural networks that do not share parameters, costs a great deal of training time and compute and increases the complexity of the network. Beyond these drawbacks, most current super-resolution methods ignore the help that the image's inherent prior information can give to the super-resolution process. A super-resolution method is therefore urgently needed whose computational cost is small and whose recovered texture details match the real image with improved accuracy.
Summary of the Invention
To solve the problems existing in the prior art, the present invention provides a single-frame image super-resolution method based on video coding, which solves the problems mentioned in the background art above.
To achieve the above object, the present invention provides the following technical solution: a single-frame image super-resolution method based on video coding, comprising the following steps:
S1. Using the prior information carried by each encoded frame of the video, partition the low-resolution frame ILR into 4×4-, 8×8-, 16×16-, and 32×32-pixel sub-blocks according to its H.265 coding information; for the 4×4 and 8×8 sub-blocks, obtain the corresponding coding prediction mode Mpre and generate a corresponding Gaussian-distribution model Gm according to the coding mode;
S2. Train the channel-adaptive backbone network CAB with the 16×16 and 32×32 sub-blocks, splitting each convolution block in the CAB into two channel groups, conv1 and conv2; in each iteration, only the conv1 parameters are used for forward and backward propagation and the conv2 parameters are not used, and the final super-resolved output ISR is obtained by minimizing the perceptual loss and the MSE loss;
S3. Train the CAB with the 4×4 and 8×8 sub-blocks, now using both the conv1 and conv2 parameters in the forward pass; since conv1 has already learned how to extract the features of smooth content in step S2, its parameters are fixed during backpropagation and only the conv2 parameters are updated, and the final super-resolved output ISR is again obtained by minimizing the perceptual loss and the MSE loss;
S4. After the training of steps S2 and S3 is complete, train the whole network; during this stage the parameters of the channel-adaptive backbone network CAB are fixed and the remaining network parameters are updated by minimizing the perceptual loss and the MSE loss, training the feature extraction module of the corresponding branch to produce the initially extracted features;
S5. When training the branch of the CAB that handles the 4×4 and 8×8 sub-blocks, feed the sub-blocks into the network in the relative order of their indices i (i = 0, 1, 2, ..., 15), recording each sub-block together with its four adjacent sub-blocks of the same size, where i denotes the numeric index;
S6. Sample the Gaussian model Gm generated in step S1 on an equally spaced grid centred at (0, 0) to obtain a matrix (denoted Gs here) with the same width and height as the convolution kernel, and point-multiply Gs with the kernel of the convolution layer Conv in the adaptive convolution module ACB to weight it: Conv′ = Conv ⊙ Gs. The input image is then convolved with the weighted kernel in the ordinary way, and after the ACB module a feature map is obtained that concentrates more closely on the image's texture features;
S7. After every four adjacent sub-blocks have passed through the adaptive texture-processing module, stitch the four resulting maps according to their positions in the original image and pass them to the backbone network, giving a feature map whose width and height are twice those of a single sub-block; in matrix form it is the 2×2 arrangement of the four sub-block feature maps;
S8. Fine-tune the network by minimizing Ltotal, which completes the super-resolution of the picture.
Preferably, the coding prediction mode Mpre of step S1 includes the DC prediction mode, the planar prediction mode, and the angular prediction modes.
Preferably, the covariance matrix C of Gm is controlled by the coding prediction mode Mpre:
Gm = Gauss(C, θ | Mpre)
By adjusting the covariance matrix, the maximum of the generated Gaussian model is aligned with the texture angle of the prediction mode, so that the model adaptively focuses on the image's texture features. When Mpre is the DC or planar mode, the Gaussian model is given a unit (identity) covariance matrix; for a sub-block whose Mpre is an angular mode with angle θ, an initial covariance matrix C is set and then rotated by θ, giving:
Gm = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix A(θ) = [cos θ, −sin θ; sin θ, cos θ] and A(θ)^T denotes the transpose of A(θ).
Preferably, the fine-tuning in step S8 specifically includes:
Using the MSE loss Lmse = (1/N)·Σ(ISR − IHR)² to minimize the gap between the reconstruction of the input low-resolution image and the true high-resolution image, where N is the number of pixels and the outputs of the different branches are each compared against the real image of the corresponding branch; and adding a perceptual loss term Lper = ‖f(ISR) − f(IHR)‖₂² to the loss function, so that the L2 distance between the CNN features of the generated picture and the CNN features of the target picture is as small as possible, which makes the picture to be generated semantically more similar to the target picture, where f denotes the CNN, specifically the VGG-16 network.
A larger loss weight ω2 is used for the 4×4 and 8×8 sub-blocks, and a smaller weight ω1 for the larger, smoother 16×16 and 32×32 sub-blocks.
The loss function Ltotal is expressed as:
Ltotal = ω1·(Lmse + Lper)|16×16, 32×32 + ω2·(Lmse + Lper)|4×4, 8×8
where ω1 is 0.5 and ω2 is 1.
The beneficial effects of the present invention are:
1) The present invention uses prior information directly available from video coding to process the different sub-blocks of an image in a targeted way: the more complex network handles sub-blocks with more complex texture, while an adaptive convolution module treats sub-blocks of different coding modes specifically. This makes the network more targeted and lets it recover different detail information for different textures, improving the accuracy of the super-resolution result.
2) The present invention shares the parameters of the few-channel network with the deep-channel network, so that a single backbone network, used at different widths, super-resolves a whole picture; the relatively simple, shallow, few-channel path processes the larger, smoother sub-blocks, reducing the time the super-resolution process requires.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the network structure according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the network's adaptive texture-processing module according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the training input order of the 4×4 and 8×8 pixel sub-blocks according to an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Figs. 1-3, the present invention provides a technical solution: a single-frame image super-resolution method based on video coding, whose network structure is shown in Fig. 1, comprising the following steps:
S1. Using the prior information carried by each encoded frame of the video, the low-resolution frame ILR is partitioned into 4×4-, 8×8-, 16×16-, and 32×32-pixel sub-blocks according to its H.265 coding information. For the 4×4 and 8×8 sub-blocks, the corresponding coding prediction mode Mpre can be obtained (the DC prediction mode, the planar prediction mode, or an angular prediction mode), and a corresponding Gaussian-distribution model Gm is generated according to the coding mode.
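By way of illustration, the grouping in step S1 can be sketched as follows; the (x, y, size, intra_mode) record layout is an assumption made for this sketch, since the method only requires that partition sizes and intra prediction modes be read from the H.265 bitstream, for example through a modified HEVC reference decoder.

```python
# Sketch: group decoded H.265 partition records by coding-unit size.
# The (x, y, size, intra_mode) tuple layout is hypothetical; a real decoder
# integration would supply equivalent fields.
from collections import defaultdict

def group_subblocks(partitions):
    groups = defaultdict(list)              # size -> [(x, y, intra_mode), ...]
    for x, y, size, intra_mode in partitions:
        if size not in (4, 8, 16, 32):
            raise ValueError("the method assumes 4/8/16/32-pixel sub-blocks")
        groups[size].append((x, y, intra_mode))
    # 4x4 and 8x8 sub-blocks keep their prediction mode for the Gaussian
    # model of step S1; 16x16 and 32x32 sub-blocks go to the smooth branch.
    return groups
```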
The covariance matrix C of Gm is controlled by the coding prediction mode Mpre:
Gm = Gauss(C, θ | Mpre)
By adjusting the covariance matrix, the maximum of the generated Gaussian model is aligned with the texture angle of the prediction mode, so that the model adaptively focuses on the image's texture features. When Mpre is the DC or planar mode, the Gaussian model is given a unit (identity) covariance matrix; for a sub-block whose Mpre is an angular mode with angle θ, an initial covariance matrix C is set and then rotated by θ, giving:
Gm = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix A(θ) = [cos θ, −sin θ; sin θ, cos θ] and A(θ)^T denotes the transpose of A(θ).
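A minimal sketch of the mode-conditioned Gaussian model, already sampled on the equally spaced grid used later in step S6: the initial anisotropic covariance diag(1, 0.25) for angular modes and the peak normalisation are assumptions of this sketch, since the text fixes only the identity covariance for DC/planar modes and the rotation A(θ)CA(θ)^T for angular modes.

```python
import numpy as np

def gaussian_model(mode: str, theta: float = 0.0, ksize: int = 3) -> np.ndarray:
    """Return G_m sampled on a ksize x ksize grid centred at (0, 0)."""
    if mode in ("dc", "planar"):
        cov = np.eye(2)                       # unit covariance, isotropic
    else:                                     # angular mode with angle theta
        c = np.diag([1.0, 0.25])              # initial C (assumed anisotropy)
        a = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        cov = a @ c @ a.T                     # G_m = A(theta) C A(theta)^T
    r = (ksize - 1) / 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]     # equally spaced, centred at (0, 0)
    pts = np.stack([xs, ys], axis=-1)
    quad = np.einsum("...i,ij,...j->...", pts, np.linalg.inv(cov), pts)
    g = np.exp(-0.5 * quad)
    return g / g.max()                        # peak normalised to 1 (a choice)
```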
S2. The channel-adaptive backbone network (Channel-Adaptive Backbone, CAB) of Fig. 1 is trained with the 16×16 and 32×32 sub-blocks. Note that, in order to process different types of input efficiently, each convolution block in the CAB is split into two channel groups, conv1 and conv2. In each iteration of this stage, only the conv1 parameters are used for forward and backward propagation; the conv2 parameters are not used. The final super-resolved output ISR is obtained by minimizing the perceptual loss and the MSE loss.
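One way the conv1/conv2 split could be realised is sketched below; the channel counts and the zero-filling of the skipped conv2 group (so that the block's output width stays fixed on the smooth path) are design assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class CABlock(nn.Module):
    """One channel-adaptive convolution block of the CAB (sketch)."""
    def __init__(self, ch1: int = 32, ch2: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(ch1 + ch2, ch1, 3, padding=1)  # always active
        self.conv2 = nn.Conv2d(ch1 + ch2, ch2, 3, padding=1)  # complex branch only
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, complex_branch: bool) -> torch.Tensor:
        f1 = self.conv1(x)
        if complex_branch:                    # 4x4 / 8x8 sub-blocks
            return self.act(torch.cat([f1, self.conv2(x)], dim=1))
        # 16x16 / 32x32 sub-blocks: conv2 is skipped; zeros keep the channel
        # count fixed for the next block (assumed wiring).
        b, _, h, w = f1.shape
        pad = f1.new_zeros(b, self.conv2.out_channels, h, w)
        return self.act(torch.cat([f1, pad], dim=1))
```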
S3. The CAB of Fig. 1 is trained with the 4×4 and 8×8 sub-blocks. Note that, because complex texture information requires more network parameters to process, both the conv1 and conv2 parameters are now used in the forward pass; since conv1 has already learned, during the training of step S2, how to extract the features of smooth content, its parameters are fixed during backpropagation and only the conv2 parameters are updated. The final super-resolved output ISR is again obtained by minimizing the perceptual loss and the MSE loss.
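The alternating use of the two parameter groups in steps S2 and S3 amounts to toggling requires_grad; a sketch, assuming the group names conv1/conv2 appear in the parameter names as in the block above:

```python
import torch

def set_training_phase(model: torch.nn.Module, phase: int) -> None:
    """Phase 1 (step S2): train conv1 only.  Phase 2 (step S3): freeze conv1
    and train conv2 only."""
    for name, param in model.named_parameters():
        if "conv2" in name:
            param.requires_grad = (phase == 2)
        elif "conv1" in name:
            param.requires_grad = (phase == 1)

# usage sketch:
# set_training_phase(cab, 2)
# optimizer = torch.optim.Adam(p for p in cab.parameters() if p.requires_grad)
```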
S4. After the training of steps S2 and S3 is complete, the whole network is trained. During this stage the parameters of the channel-adaptive backbone network CAB are fixed, and training uses only the minimization of the perceptual loss and the MSE loss; the remaining network parameters are updated, first training the feature extraction module of the corresponding branch to produce the initially extracted features.
S5. When training the branch of the CAB that handles the 4×4 and 8×8 sub-blocks, the sub-blocks are fed into the network in the relative order of their indices i (i = 0, 1, 2, ..., 15); each sub-block is recorded together with its four adjacent sub-blocks of the same size, where i denotes the numeric index (as shown in Fig. 3, when the sub-block with i = 5 is input, its adjacent sub-blocks are those with i = 6, 7, 8).
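One numbering with the property suggested by Fig. 3 (consecutive indices covering one 2×2 tile of same-size sub-blocks) is Z-order, i.e. Morton, numbering; the sketch below assumes that ordering, while the text fixes the exact order only through the figure.

```python
def morton_index(row: int, col: int, bits: int = 8) -> int:
    """Interleave the bits of (row, col) to get the input order i (assumed)."""
    i = 0
    for b in range(bits):
        i |= ((col >> b) & 1) << (2 * b)
        i |= ((row >> b) & 1) << (2 * b + 1)
    return i

def tile_members(i: int):
    """Indices of the 2x2 tile that sub-block i is stitched with in step S7."""
    base = i - (i % 4)
    return [base, base + 1, base + 2, base + 3]
```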
S6. The Gaussian model Gm generated in step S1 is sampled on an equally spaced grid centred at (0, 0) to obtain a matrix (denoted Gs here) with the same width and height as the convolution kernel. Gs is point-multiplied with the kernel of the convolution layer Conv in the adaptive convolution module ACB (shown in Fig. 2) to weight it: Conv′ = Conv ⊙ Gs. The input image is then convolved with the weighted kernel in the ordinary way, and after the ACB module a feature map is obtained that concentrates more closely on the image's texture features.
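A sketch of the adaptive convolution module: the sampled matrix re-weights the kernel by point-wise multiplication before an ordinary convolution. Broadcasting the (k × k) matrix over the kernel's channel dimensions is an assumption of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACB(nn.Module):
    """Adaptive convolution module (sketch): Conv' = Conv ⊙ G_s."""
    def __init__(self, in_ch: int, out_ch: int, ksize: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, ksize, ksize) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x: torch.Tensor, g_s: torch.Tensor) -> torch.Tensor:
        # g_s: (ksize, ksize) sample of the mode-conditioned Gaussian model.
        w = self.weight * g_s                 # point-wise kernel weighting
        return F.conv2d(x, w, self.bias, padding=self.weight.shape[-1] // 2)
```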
S7. After every four adjacent sub-blocks have passed through the adaptive texture-processing module, the four resulting maps are stitched according to their positions in the original image and passed to the backbone network, giving a feature map whose width and height are twice those of a single sub-block; in matrix form it is the 2×2 arrangement of the four sub-block feature maps.
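The stitch in step S7 is two concatenations; a sketch for four feature maps in (C, h, w) layout, placed back at their original 2×2 positions:

```python
import torch

def stitch(tl: torch.Tensor, tr: torch.Tensor,
           bl: torch.Tensor, br: torch.Tensor) -> torch.Tensor:
    """(C, h, w) x 4 -> (C, 2h, 2w), preserving the original spatial layout."""
    top = torch.cat([tl, tr], dim=-1)         # concatenate along width
    bottom = torch.cat([bl, br], dim=-1)
    return torch.cat([top, bottom], dim=-2)   # concatenate along height
```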
S8. To attend more closely to detail information, the network is fine-tuned by minimizing Ltotal, which completes the super-resolution of the picture.
In the training process above, the MSE loss Lmse = (1/N)·Σ(ISR − IHR)² is used to minimize the gap between the network output and the true high-resolution image, where N is the number of pixels and the outputs of the different branches are each compared against the real image of the corresponding branch. However, because a per-pixel MSE loss often diverges from true visual perception, a perceptual loss term is added to the loss function so that the L2 distance between the CNN features of the generated picture and the CNN features of the target picture is as small as possible, which makes the generated picture semantically more similar to the target picture (relative to a pixel-level loss function): Lper = ‖f(ISR) − f(IHR)‖₂².
Here, the CNN represented by f is chosen to be the VGG-16 network.
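A sketch of the VGG-16 perceptual term; the feature layer used (relu2_2, i.e. features[:9]) and the torchvision weights argument are assumptions, since the text specifies only an L2 distance between VGG-16 features (input normalisation is omitted here).

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # torchvision >= 0.13 weights API; the relu2_2 layer choice is assumed
        self.f = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
        for p in self.f.parameters():
            p.requires_grad = False           # the loss network stays fixed

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return torch.mean((self.f(sr) - self.f(hr)) ** 2)
```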
In addition, since an image's super-resolution quality shows most in the rendering of details, more attention is paid to the texture-complex parts, that is, the reconstruction of the 4×4 and 8×8 sub-blocks; these two sub-block sizes are therefore given the larger loss weight ω2, while the larger, smoother 16×16 and 32×32 sub-blocks use the smaller weight ω1.
Therefore, the loss function Ltotal is expressed as:
Ltotal = ω1·(Lmse + Lper)|16×16, 32×32 + ω2·(Lmse + Lper)|4×4, 8×8
where ω1 is 0.5 and ω2 is 1.
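Combining the four terms with the stated weights; the per-branch grouping of the MSE and perceptual terms is reconstructed from the description above.

```python
def total_loss(mse_smooth, per_smooth, mse_detail, per_detail,
               w1: float = 0.5, w2: float = 1.0):
    """L_total = ω1·(Lmse + Lper) for 16x16/32x32 + ω2·(Lmse + Lper) for 4x4/8x8."""
    return w1 * (mse_smooth + per_smooth) + w2 * (mse_detail + per_detail)
```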
The present invention uses prior information directly available from video coding to process the different sub-blocks of an image in a targeted way: the more complex network handles sub-blocks with more complex texture, while an adaptive convolution module treats sub-blocks of different coding modes specifically. This makes the network more targeted and lets it recover different detail information for different textures, improving the accuracy of the super-resolution result. The present invention shares the parameters of the few-channel network with the deep-channel network, so that a single backbone network, used at different widths, super-resolves a whole picture; the relatively simple, shallow, few-channel path processes the larger, smoother sub-blocks, reducing the time the super-resolution process requires.
Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110541900.7A CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110541900.7A CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113393377A CN113393377A (en) | 2021-09-14 |
| CN113393377B true CN113393377B (en) | 2022-02-01 |
Family
ID=77617993
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110541900.7A Active CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113393377B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115115512B (en) * | 2022-06-13 | 2023-10-03 | 荣耀终端有限公司 | A training method and device for image super-resolution network |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102835105A (en) * | 2010-02-19 | 2012-12-19 | 斯凯普公司 | Data compression for video |
| CN110956671A (en) * | 2019-12-12 | 2020-04-03 | 电子科技大学 | An Image Compression Method Based on Multi-scale Feature Coding |
| CN112449140A (en) * | 2019-08-29 | 2021-03-05 | 华为技术有限公司 | Video super-resolution processing method and device |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110969577B (en) * | 2019-11-29 | 2022-03-11 | 北京交通大学 | A Video Super-Resolution Reconstruction Method Based on Deep Dual Attention Network |
- 2021-05-18: Application CN202110541900.7A filed in China; granted as patent CN113393377B (status: Active)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102835105A (en) * | 2010-02-19 | 2012-12-19 | 斯凯普公司 | Data compression for video |
| CN112449140A (en) * | 2019-08-29 | 2021-03-05 | 华为技术有限公司 | Video super-resolution processing method and device |
| CN110956671A (en) * | 2019-12-12 | 2020-04-03 | 电子科技大学 | An Image Compression Method Based on Multi-scale Feature Coding |
Non-Patent Citations (4)
| Title |
|---|
| A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images; Heqian Qiu et al.; Remote Sensing; 2019-07-04; pp. 2-23 * |
| Region Adaptive Two-Shot Network for Single Image Dehazing; Hui Li et al.; IEEE Xplore; 2020-06-19; pp. 1-6 * |
| Sparse Representation Moving Target Tracking Based on Compressed Features; Zhang Hongmei et al.; Journal of Zhengzhou University (Engineering Science); 2016-06-03 (No. 3); pp. 24-29 * |
| Fast Coding Unit Partition Algorithm for Intra Prediction in High Efficiency Video Coding; Qi Meibin et al.; Journal of Electronics & Information Technology; 2014-07-31; Vol. 36, No. 7; pp. 1699-1704 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113393377A (en) | 2021-09-14 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |