CN113393377B - Single-frame image super-resolution method based on video coding - Google Patents
Single-frame image super-resolution method based on video coding
- Publication number
- CN113393377B CN113393377B CN202110541900.7A CN202110541900A CN113393377B CN 113393377 B CN113393377 B CN 113393377B CN 202110541900 A CN202110541900 A CN 202110541900A CN 113393377 B CN113393377 B CN 113393377B
- Authority
- CN
- China
- Prior art keywords
- network
- image
- sub
- resolution
- super
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of image processing, and in particular to a single-frame image super-resolution method based on video coding.
Background Art
Image super-resolution is the process of converting a low-resolution input image into a high-resolution one. An important focus of recent super-resolution work has been the design of networks that accelerate inference. One branch pursues efficient super-resolution with fewer parameters and higher speed: for example, the early FSRCNN extracts features directly from the input image and then passes the feature map through an upsampling network to construct the super-resolved image, and the more recent CARN uses grouped convolutions to build a residual network that processes the input quickly. Another branch increases the complexity of the network model and the number of branches, training separately on different kinds of inputs, as in ClassSR.
ClassSR trains and infers with neural networks of different complexity on low-resolution input patches of different complexity. Since most regions of an image only need to pass through a network with a relatively small amount of computation, this improves the running speed of the inference stage to a certain extent. Specifically, the method splits the image into 32×32-pixel patches and, using a pre-trained classification network, sorts them by texture complexity into three categories: simple, medium, and hard. Patches of different categories are routed to backbone networks with different numbers of channels.
Traditional super-resolution networks extract a feature map from the whole image at once. This structure prevents the network from learning the distinct features of each region: applying the same convolution kernels to different regions yields reconstructed texture details that do not match the real image. Moreover, because the complexity of texture detail varies across regions, heavy processing of low-detail regions adds unnecessary computation. On the other hand, a classify-first design such as ClassSR, with three neural networks that do not share parameters, costs a great deal of training time and compute and increases the complexity of the network. Beyond these drawbacks, most current super-resolution methods ignore the help that the image's inherent prior information can give to the super-resolution process. A super-resolution method is therefore urgently needed whose computational cost is small and whose recovered texture details match the real image with improved accuracy.
Summary of the Invention
To solve the problems existing in the prior art, the present invention provides a single-frame image super-resolution method based on video coding, which solves the problems mentioned in the background art above.
To achieve the above object, the present invention provides the following technical solution: a single-frame image super-resolution method based on video coding, comprising the following steps:
S1. Using the prior information carried by each encoded frame of the video, partition the low-resolution frame ILR into 4×4-, 8×8-, 16×16-, and 32×32-pixel sub-blocks according to its H.265 coding information; for the 4×4 and 8×8 sub-blocks, obtain the corresponding coding prediction mode Mpre and generate a corresponding Gaussian-distribution model Gm according to the coding mode;
S2. Train the channel-adaptive backbone network CAB with the 16×16 and 32×32 sub-blocks, splitting each convolution block in the CAB into two channel groups, conv1 and conv2; in each iteration, only the conv1 parameters are used for forward and backward propagation and the conv2 parameters are not used, and the final super-resolved output ISR is obtained by minimizing the perceptual loss and the MSE loss;
S3. Train the CAB with the 4×4 and 8×8 sub-blocks, now using both the conv1 and conv2 parameters in the forward pass; since conv1 has already learned how to extract the features of smooth content in step S2, its parameters are fixed during backpropagation and only the conv2 parameters are updated, and the final super-resolved output ISR is again obtained by minimizing the perceptual loss and the MSE loss;
S4. After the training of steps S2 and S3 is complete, train the whole network; during this stage the parameters of the channel-adaptive backbone network CAB are fixed and the remaining network parameters are updated by minimizing the perceptual loss and the MSE loss, training the feature extraction module of the corresponding branch to produce the initially extracted features;
S5. When training the branch of the CAB that handles the 4×4 and 8×8 sub-blocks, feed the sub-blocks into the network in the relative order of their indices i (i = 0, 1, 2, ..., 15), recording each sub-block together with its four adjacent sub-blocks of the same size, where i denotes the numeric index;
S6. Sample the Gaussian model Gm generated in step S1 on an equally spaced grid centred at (0, 0) to obtain a matrix (denoted Gs here) with the same width and height as the convolution kernel, and point-multiply Gs with the kernel of the convolution layer Conv in the adaptive convolution module ACB to weight it: Conv′ = Conv ⊙ Gs. The input image is then convolved with the weighted kernel in the ordinary way, and after the ACB module a feature map is obtained that concentrates more closely on the image's texture features;
S7. After every four adjacent sub-blocks have passed through the adaptive texture-processing module, stitch the four resulting maps according to their positions in the original image and pass them to the backbone network, giving a feature map whose width and height are twice those of a single sub-block; in matrix form it is the 2×2 arrangement of the four sub-block feature maps;
S8. Fine-tune the network by minimizing Ltotal, which completes the super-resolution of the picture.
Preferably, the coding prediction mode Mpre of step S1 includes the DC prediction mode, the planar prediction mode, and the angular prediction modes.
Preferably, the covariance matrix C of Gm is controlled by the coding prediction mode Mpre:
Gm = Gauss(C, θ | Mpre)
By adjusting the covariance matrix, the maximum of the generated Gaussian model is aligned with the texture angle of the prediction mode, so that the model adaptively focuses on the image's texture features. When Mpre is the DC or planar mode, the Gaussian model is given a unit (identity) covariance matrix; for a sub-block whose Mpre is an angular mode with angle θ, an initial covariance matrix C is set and then rotated by θ, giving:
Gm = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix A(θ) = [cos θ, −sin θ; sin θ, cos θ] and A(θ)^T denotes the transpose of A(θ).
Preferably, the fine-tuning in step S8 specifically includes:
Using the MSE loss Lmse = (1/N)·Σ(ISR − IHR)² to minimize the gap between the reconstruction of the input low-resolution image and the true high-resolution image, where N is the number of pixels and the outputs of the different branches are each compared against the real image of the corresponding branch; and adding a perceptual loss term Lper = ‖f(ISR) − f(IHR)‖₂² to the loss function, so that the L2 distance between the CNN features of the generated picture and the CNN features of the target picture is as small as possible, which makes the picture to be generated semantically more similar to the target picture, where f denotes the CNN, specifically the VGG-16 network.
A larger loss weight ω2 is used for the 4×4 and 8×8 sub-blocks, and a smaller weight ω1 for the larger, smoother 16×16 and 32×32 sub-blocks.
The loss function Ltotal is expressed as:
Ltotal = ω1·(Lmse + Lper)|16×16, 32×32 + ω2·(Lmse + Lper)|4×4, 8×8
where ω1 is 0.5 and ω2 is 1.
The beneficial effects of the present invention are:
1) The present invention uses prior information directly available from video coding to process the different sub-blocks of an image in a targeted way: the more complex network handles sub-blocks with more complex texture, while an adaptive convolution module treats sub-blocks of different coding modes specifically. This makes the network more targeted and lets it recover different detail information for different textures, improving the accuracy of the super-resolution result.
2) The present invention shares the parameters of the few-channel network with the deep-channel network, so that a single backbone network, used at different widths, super-resolves a whole picture; the relatively simple, shallow, few-channel path processes the larger, smoother sub-blocks, reducing the time the super-resolution process requires.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the network structure according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the network's adaptive texture-processing module according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the training input order of the 4×4 and 8×8 pixel sub-blocks according to an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Figs. 1-3, the present invention provides a technical solution: a single-frame image super-resolution method based on video coding, whose network structure is shown in Fig. 1, comprising the following steps:
S1. Using the prior information carried by each encoded frame of the video, the low-resolution frame ILR is partitioned into 4×4-, 8×8-, 16×16-, and 32×32-pixel sub-blocks according to its H.265 coding information. For the 4×4 and 8×8 sub-blocks, the corresponding coding prediction mode Mpre can be obtained (the DC prediction mode, the planar prediction mode, or an angular prediction mode), and a corresponding Gaussian-distribution model Gm is generated according to the coding mode.
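By way of illustration, the grouping in step S1 can be sketched as follows; the (x, y, size, intra_mode) record layout is an assumption made for this sketch, since the method only requires that partition sizes and intra prediction modes be read from the H.265 bitstream, for example through a modified HEVC reference decoder.

```python
# Sketch: group decoded H.265 partition records by coding-unit size.
# The (x, y, size, intra_mode) tuple layout is hypothetical; a real decoder
# integration would supply equivalent fields.
from collections import defaultdict

def group_subblocks(partitions):
    groups = defaultdict(list)              # size -> [(x, y, intra_mode), ...]
    for x, y, size, intra_mode in partitions:
        if size not in (4, 8, 16, 32):
            raise ValueError("the method assumes 4/8/16/32-pixel sub-blocks")
        groups[size].append((x, y, intra_mode))
    # 4x4 and 8x8 sub-blocks keep their prediction mode for the Gaussian
    # model of step S1; 16x16 and 32x32 sub-blocks go to the smooth branch.
    return groups
```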
The covariance matrix C of Gm is controlled by the coding prediction mode Mpre:
Gm = Gauss(C, θ | Mpre)
By adjusting the covariance matrix, the maximum of the generated Gaussian model is aligned with the texture angle of the prediction mode, so that the model adaptively focuses on the image's texture features. When Mpre is the DC or planar mode, the Gaussian model is given a unit (identity) covariance matrix; for a sub-block whose Mpre is an angular mode with angle θ, an initial covariance matrix C is set and then rotated by θ, giving:
Gm = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix A(θ) = [cos θ, −sin θ; sin θ, cos θ] and A(θ)^T denotes the transpose of A(θ).
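A minimal sketch of the mode-conditioned Gaussian model, already sampled on the equally spaced grid used later in step S6: the initial anisotropic covariance diag(1, 0.25) for angular modes and the peak normalisation are assumptions of this sketch, since the text fixes only the identity covariance for DC/planar modes and the rotation A(θ)CA(θ)^T for angular modes.

```python
import numpy as np

def gaussian_model(mode: str, theta: float = 0.0, ksize: int = 3) -> np.ndarray:
    """Return G_m sampled on a ksize x ksize grid centred at (0, 0)."""
    if mode in ("dc", "planar"):
        cov = np.eye(2)                       # unit covariance, isotropic
    else:                                     # angular mode with angle theta
        c = np.diag([1.0, 0.25])              # initial C (assumed anisotropy)
        a = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        cov = a @ c @ a.T                     # G_m = A(theta) C A(theta)^T
    r = (ksize - 1) / 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]     # equally spaced, centred at (0, 0)
    pts = np.stack([xs, ys], axis=-1)
    quad = np.einsum("...i,ij,...j->...", pts, np.linalg.inv(cov), pts)
    g = np.exp(-0.5 * quad)
    return g / g.max()                        # peak normalised to 1 (a choice)
```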
S2. The channel-adaptive backbone network (Channel-Adaptive Backbone, CAB) of Fig. 1 is trained with the 16×16 and 32×32 sub-blocks. Note that, in order to process different types of input efficiently, each convolution block in the CAB is split into two channel groups, conv1 and conv2. In each iteration of this stage, only the conv1 parameters are used for forward and backward propagation; the conv2 parameters are not used. The final super-resolved output ISR is obtained by minimizing the perceptual loss and the MSE loss.
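One way the conv1/conv2 split could be realised is sketched below; the channel counts and the zero-filling of the skipped conv2 group (so that the block's output width stays fixed on the smooth path) are design assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class CABlock(nn.Module):
    """One channel-adaptive convolution block of the CAB (sketch)."""
    def __init__(self, ch1: int = 32, ch2: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(ch1 + ch2, ch1, 3, padding=1)  # always active
        self.conv2 = nn.Conv2d(ch1 + ch2, ch2, 3, padding=1)  # complex branch only
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, complex_branch: bool) -> torch.Tensor:
        f1 = self.conv1(x)
        if complex_branch:                    # 4x4 / 8x8 sub-blocks
            return self.act(torch.cat([f1, self.conv2(x)], dim=1))
        # 16x16 / 32x32 sub-blocks: conv2 is skipped; zeros keep the channel
        # count fixed for the next block (assumed wiring).
        b, _, h, w = f1.shape
        pad = f1.new_zeros(b, self.conv2.out_channels, h, w)
        return self.act(torch.cat([f1, pad], dim=1))
```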
S3. The CAB of Fig. 1 is trained with the 4×4 and 8×8 sub-blocks. Note that, because complex texture information requires more network parameters to process, both the conv1 and conv2 parameters are now used in the forward pass; since conv1 has already learned, during the training of step S2, how to extract the features of smooth content, its parameters are fixed during backpropagation and only the conv2 parameters are updated. The final super-resolved output ISR is again obtained by minimizing the perceptual loss and the MSE loss.
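The alternating use of the two parameter groups in steps S2 and S3 amounts to toggling requires_grad; a sketch, assuming the group names conv1/conv2 appear in the parameter names as in the block above:

```python
import torch

def set_training_phase(model: torch.nn.Module, phase: int) -> None:
    """Phase 1 (step S2): train conv1 only.  Phase 2 (step S3): freeze conv1
    and train conv2 only."""
    for name, param in model.named_parameters():
        if "conv2" in name:
            param.requires_grad = (phase == 2)
        elif "conv1" in name:
            param.requires_grad = (phase == 1)

# usage sketch:
# set_training_phase(cab, 2)
# optimizer = torch.optim.Adam(p for p in cab.parameters() if p.requires_grad)
```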
S4. After the training of steps S2 and S3 is complete, the whole network is trained. During this stage the parameters of the channel-adaptive backbone network CAB are fixed, and training uses only the minimization of the perceptual loss and the MSE loss; the remaining network parameters are updated, first training the feature extraction module of the corresponding branch to produce the initially extracted features.
S5. When training the branch of the CAB that handles the 4×4 and 8×8 sub-blocks, the sub-blocks are fed into the network in the relative order of their indices i (i = 0, 1, 2, ..., 15); each sub-block is recorded together with its four adjacent sub-blocks of the same size, where i denotes the numeric index (as shown in Fig. 3, when the sub-block with i = 5 is input, its adjacent sub-blocks are those with i = 6, 7, 8).
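One numbering with the property suggested by Fig. 3 (consecutive indices covering one 2×2 tile of same-size sub-blocks) is Z-order, i.e. Morton, numbering; the sketch below assumes that ordering, while the text fixes the exact order only through the figure.

```python
def morton_index(row: int, col: int, bits: int = 8) -> int:
    """Interleave the bits of (row, col) to get the input order i (assumed)."""
    i = 0
    for b in range(bits):
        i |= ((col >> b) & 1) << (2 * b)
        i |= ((row >> b) & 1) << (2 * b + 1)
    return i

def tile_members(i: int):
    """Indices of the 2x2 tile that sub-block i is stitched with in step S7."""
    base = i - (i % 4)
    return [base, base + 1, base + 2, base + 3]
```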
S6. The Gaussian model Gm generated in step S1 is sampled on an equally spaced grid centred at (0, 0) to obtain a matrix (denoted Gs here) with the same width and height as the convolution kernel. Gs is point-multiplied with the kernel of the convolution layer Conv in the adaptive convolution module ACB (shown in Fig. 2) to weight it: Conv′ = Conv ⊙ Gs. The input image is then convolved with the weighted kernel in the ordinary way, and after the ACB module a feature map is obtained that concentrates more closely on the image's texture features.
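A sketch of the adaptive convolution module: the sampled matrix re-weights the kernel by point-wise multiplication before an ordinary convolution. Broadcasting the (k × k) matrix over the kernel's channel dimensions is an assumption of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACB(nn.Module):
    """Adaptive convolution module (sketch): Conv' = Conv ⊙ G_s."""
    def __init__(self, in_ch: int, out_ch: int, ksize: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, ksize, ksize) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x: torch.Tensor, g_s: torch.Tensor) -> torch.Tensor:
        # g_s: (ksize, ksize) sample of the mode-conditioned Gaussian model.
        w = self.weight * g_s                 # point-wise kernel weighting
        return F.conv2d(x, w, self.bias, padding=self.weight.shape[-1] // 2)
```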
S7. After every four adjacent sub-blocks have passed through the adaptive texture-processing module, the four resulting maps are stitched according to their positions in the original image and passed to the backbone network, giving a feature map whose width and height are twice those of a single sub-block; in matrix form it is the 2×2 arrangement of the four sub-block feature maps.
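The stitch in step S7 is two concatenations; a sketch for four feature maps in (C, h, w) layout, placed back at their original 2×2 positions:

```python
import torch

def stitch(tl: torch.Tensor, tr: torch.Tensor,
           bl: torch.Tensor, br: torch.Tensor) -> torch.Tensor:
    """(C, h, w) x 4 -> (C, 2h, 2w), preserving the original spatial layout."""
    top = torch.cat([tl, tr], dim=-1)         # concatenate along width
    bottom = torch.cat([bl, br], dim=-1)
    return torch.cat([top, bottom], dim=-2)   # concatenate along height
```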
S8. To attend more closely to detail information, the network is fine-tuned by minimizing Ltotal, which completes the super-resolution of the picture.
In the training process above, the MSE loss Lmse = (1/N)·Σ(ISR − IHR)² is used to minimize the gap between the network output and the true high-resolution image, where N is the number of pixels and the outputs of the different branches are each compared against the real image of the corresponding branch. However, because a per-pixel MSE loss often diverges from true visual perception, a perceptual loss term is added to the loss function so that the L2 distance between the CNN features of the generated picture and the CNN features of the target picture is as small as possible, which makes the generated picture semantically more similar to the target picture (relative to a pixel-level loss function): Lper = ‖f(ISR) − f(IHR)‖₂².
Here, the CNN represented by f is chosen to be the VGG-16 network.
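A sketch of the VGG-16 perceptual term; the feature layer used (relu2_2, i.e. features[:9]) and the torchvision weights argument are assumptions, since the text specifies only an L2 distance between VGG-16 features (input normalisation is omitted here).

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # torchvision >= 0.13 weights API; the relu2_2 layer choice is assumed
        self.f = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
        for p in self.f.parameters():
            p.requires_grad = False           # the loss network stays fixed

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return torch.mean((self.f(sr) - self.f(hr)) ** 2)
```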
In addition, since an image's super-resolution quality shows most in the rendering of details, more attention is paid to the texture-complex parts, that is, the reconstruction of the 4×4 and 8×8 sub-blocks; these two sub-block sizes are therefore given the larger loss weight ω2, while the larger, smoother 16×16 and 32×32 sub-blocks use the smaller weight ω1.
Therefore, the loss function Ltotal is expressed as:
Ltotal = ω1·(Lmse + Lper)|16×16, 32×32 + ω2·(Lmse + Lper)|4×4, 8×8
where ω1 is 0.5 and ω2 is 1.
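Combining the four terms with the stated weights; the per-branch grouping of the MSE and perceptual terms is reconstructed from the description above.

```python
def total_loss(mse_smooth, per_smooth, mse_detail, per_detail,
               w1: float = 0.5, w2: float = 1.0):
    """L_total = ω1·(Lmse + Lper) for 16x16/32x32 + ω2·(Lmse + Lper) for 4x4/8x8."""
    return w1 * (mse_smooth + per_smooth) + w2 * (mse_detail + per_detail)
```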
The present invention uses prior information directly available from video coding to process the different sub-blocks of an image in a targeted way: the more complex network handles sub-blocks with more complex texture, while an adaptive convolution module treats sub-blocks of different coding modes specifically. This makes the network more targeted and lets it recover different detail information for different textures, improving the accuracy of the super-resolution result. The present invention shares the parameters of the few-channel network with the deep-channel network, so that a single backbone network, used at different widths, super-resolves a whole picture; the relatively simple, shallow, few-channel path processes the larger, smoother sub-blocks, reducing the time the super-resolution process requires.
Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110541900.7A CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110541900.7A CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113393377A CN113393377A (en) | 2021-09-14 |
| CN113393377B true CN113393377B (en) | 2022-02-01 |
Family
ID=77617993
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110541900.7A Active CN113393377B (en) | 2021-05-18 | 2021-05-18 | Single-frame image super-resolution method based on video coding |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113393377B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115115512B (en) * | 2022-06-13 | 2023-10-03 | 荣耀终端有限公司 | A training method and device for image super-resolution network |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102835105A (en) * | 2010-02-19 | 2012-12-19 | 斯凯普公司 | Data compression for video |
| CN110956671A (en) * | 2019-12-12 | 2020-04-03 | 电子科技大学 | An Image Compression Method Based on Multi-scale Feature Coding |
| CN112449140A (en) * | 2019-08-29 | 2021-03-05 | 华为技术有限公司 | Video super-resolution processing method and device |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110969577B (en) * | 2019-11-29 | 2022-03-11 | 北京交通大学 | A Video Super-Resolution Reconstruction Method Based on Deep Dual Attention Network |
- 2021-05-18: Application CN202110541900.7A filed in China; granted as patent CN113393377B (status: Active)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102835105A (en) * | 2010-02-19 | 2012-12-19 | 斯凯普公司 | Data compression for video |
| CN112449140A (en) * | 2019-08-29 | 2021-03-05 | 华为技术有限公司 | Video super-resolution processing method and device |
| CN110956671A (en) * | 2019-12-12 | 2020-04-03 | 电子科技大学 | An Image Compression Method Based on Multi-scale Feature Coding |
Non-Patent Citations (4)
| Title |
|---|
| A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images; Heqian Qiu et al.; Remote Sensing; 2019-07-04; pp. 2-23 * |
| Region Adaptive Two-Shot Network for Single Image Dehazing; Hui Li et al.; IEEE Xplore; 2020-06-19; pp. 1-6 * |
| Sparse Representation Moving Target Tracking Based on Compressed Features; Zhang Hongmei et al.; Journal of Zhengzhou University (Engineering Science); 2016-06-03 (No. 3); pp. 24-29 * |
| Fast Coding Unit Partition Algorithm for Intra Prediction in High Efficiency Video Coding; Qi Meibin et al.; Journal of Electronics & Information Technology; 2014-07-31; Vol. 36, No. 7; pp. 1699-1704 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113393377A (en) | 2021-09-14 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |