Disclosure of Invention
The invention aims to solve the problem that existing image super-resolution algorithms must train a separate model for each upsampling rate, and provides an image super-resolution reconstruction method, an electronic device, and a storage medium based on FSRCNN and OPE.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
An FSRCNN and OPE-based image super-resolution reconstruction method comprises the following steps:
S1, acquiring a high-resolution image, or obtaining one from the DIV2K Dataset open-source data set, and preprocessing it to obtain a high-resolution image HR and a corresponding low-resolution image LR, thereby constructing a data set for training the FSRCNN and OPE-based image super-resolution reconstruction method;
S2, constructing the FSRCNN network structure and the OPE module;
S3, inputting the low-resolution images from the data set obtained in step S1 into the FSRCNN network, extracting image features through convolution layers, and outputting a feature map F;
S4, inputting the feature map obtained in step S3 into the OPE module, encoding the feature map to obtain an orthogonal position encoding, then up-sampling the feature map using this encoding, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image;
S5, performing post-processing on the preliminary SR image obtained in step S4 to obtain the super-resolution-processed image;
S6, performing performance evaluation on the super-resolution-processed image obtained in step S5, calculating a loss function, adjusting the parameters of the FSRCNN network structure and the OPE module through back propagation according to the loss value, performing iterative training with the data set, and outputting the reconstruction result from the LR image to the SR image.
Further, the specific implementation method of the step S1 includes the following steps:
S1.1, downsampling the high-resolution image by bicubic interpolation to obtain a low-resolution image LR; adjusting the reduction ratio of the bicubic interpolation to obtain low-resolution images LR at the three ratios 1/2, 1/4 and 1/8; and placing the HR and LR images in separate folders;
S1.2, normalizing the HR and LR images obtained in step S1.1 so that pixel values lie in the range [0, 1], yielding the data set used for training the FSRCNN and OPE-based image super-resolution reconstruction method.
Further, the specific implementation method of the step S2 includes the following steps:
S2.1, constructing the FSRCNN network structure: the first layer is a 3×3 convolution layer with 32 convolution kernels followed by a ReLU activation function; the second layer is a 3×3 max-pooling layer; the third layer is a 3×3 convolution layer with 64 convolution kernels followed by a ReLU activation function; and the fourth layer is a 1×1 convolution layer used for compressing the feature maps;
S2.2, constructing the OPE module: selecting sine and cosine functions as the orthogonal basis, setting the maximum frequency and encoding length, selecting Adam as the optimizer for model training, and setting the initial learning rate lr to 0.001.
Further, the expression for the image features extracted in step S3 is:
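The formula itself did not survive the text extraction; reconstructed from the variable definitions that follow, a standard convolution followed by the ReLU activation would plausibly read (the kernel indexing here is an assumption):

```latex
y_{i,j} = f\left( \sum_{m=1}^{M} \sum_{n=1}^{N} B_{m,n} \, x_{i+m,\,j+n} \right)
```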
where x_{m,n} is the pixel value of the input image at position (m, n), with m the horizontal and n the vertical coordinate; B_{i,j} is a weight of the convolution kernel; f is the ReLU activation function; y_{i,j} is the pixel value of the output feature map; and M×N is the size of the convolution kernel;
The output of the FSRCNN network is the feature map F of dimension H×W×C, where H is the height of the feature map, W its width, and C its number of channels.
Further, the specific implementation method of the step S4 includes the following steps:
S4.1, defining a group of orthogonal basis functions comprising basis functions in the horizontal direction and basis functions in the vertical direction, with the following expressions:
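The expressions were lost in extraction; a plausible reconstruction for a sine/cosine orthogonal basis with frequency indices p and q (the exact frequency scaling is an assumption) is:

```latex
\varphi_p(x) = \cos(2\pi p x), \quad \psi_p(x) = \sin(2\pi p x); \qquad
\varphi_q(y) = \cos(2\pi q y), \quad \psi_q(y) = \sin(2\pi q y)
```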
where φ_p(x) and φ_q(y) are the cosine basis functions in the horizontal and vertical directions, ψ_p(x) and ψ_q(y) are the sine basis functions in the horizontal and vertical directions, p and q are the frequency indices of the orthogonal basis functions, and x and y are the horizontal and vertical positions in the feature map;
S4.2, for each pair of orthogonal basis functions, calculating the projection coefficients of the feature map on those basis functions, with the following expression:
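The expression is missing from the text; based on the definitions that follow, the projection of the feature map onto one basis pair plausibly reads (shown for the sine pair; the other sine/cosine combinations are analogous):

```latex
P_{p,q,c} = \sum_{m=1}^{H} \sum_{n=1}^{W} F_{m,n,c} \, \psi_p(m) \, \psi_q(n)
```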
where F_{m,n,c} is the value of the feature map at position (m, n) in channel c, ψ_p(m) and ψ_q(n) are the values of the sine basis functions at positions m and n, respectively, and φ_p(m) and φ_q(n) are the values of the cosine basis functions at positions m and n, respectively;
S4.3, up-sampling the feature map using the projection coefficients obtained on the orthogonal basis functions, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image, calculated as:
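The expression did not survive extraction; from the definitions that follow, the linear combination plausibly reads (shown for one basis family, evaluated at the SR grid coordinates x_i, y_j):

```latex
I_{i,j} = \sum_{p=1}^{K} \sum_{q=1}^{L} P_{p,q} \, \varphi_p(x_i) \, \varphi_q(y_j)
```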
where I_{i,j} is the pixel value of the high-resolution image at position (i, j), K and L are the numbers of orthogonal basis functions in the horizontal and vertical directions, and P_{p,q} is the coefficient of the p-th and q-th basis functions in the encoded feature map.
Further, step S5 post-processes the preliminary SR image: denoising with a non-local means technique, sharpening with a Sobel-operator-based high-pass filter, and finally applying histogram equalization to obtain the super-resolution-processed image.
Further, the specific implementation method of the step S6 includes the following steps:
S6.1, performance evaluation: evaluating the reconstructed super-resolution image with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes, where PSNR is calculated as:
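The PSNR formula is missing from the text; the standard definition consistent with the variables below is:

```latex
\mathrm{PSNR} = 10 \cdot \log_{10}\!\left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)
```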
where MAX_I is the maximum pixel value of the image and MSE is the mean square error, calculated as:
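The MSE formula is likewise missing; the standard definition consistent with the variables below is:

```latex
\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl( I_{\mathrm{HR}}(i,j) - I_{\mathrm{SR}}(i,j) \bigr)^2
```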
where I_HR is the original high-resolution image, I_SR is the super-resolution-processed image, and M and N are the numbers of rows and columns of the image, respectively;
The calculated expression for SSIM is:
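The SSIM formula is missing from the text; the standard definition consistent with the variables below is:

```latex
\mathrm{SSIM} = \frac{(2\mu_{\mathrm{HR}}\mu_{\mathrm{SR}} + C_1)(2\sigma_{\mathrm{HR,SR}} + C_2)}
{(\mu_{\mathrm{HR}}^2 + \mu_{\mathrm{SR}}^2 + C_1)(\sigma_{\mathrm{HR}}^2 + \sigma_{\mathrm{SR}}^2 + C_2)}
```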
where μ_HR and μ_SR are the means of the original high-resolution image and the super-resolution-processed image, σ²_HR and σ²_SR are their respective variances, σ_HR,SR is their covariance, and C_1 and C_2 are constants that stabilize the denominator of SSIM;
S6.2, combining MSE and SSIM to construct the loss function for model training, with the expression:
Loss=α·MSE(I,K)+(1-α)·(1-SSIM(I,K))
where α is a weight parameter in the range 0 to 1; the larger the Loss value, the worse the super-resolution image reconstructed by the model.
An electronic device comprises a memory and a processor, the memory storing a computer program; the processor implements the steps of the above FSRCNN and OPE-based image super-resolution reconstruction method when executing the computer program.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above-described FSRCNN and OPE-based image super-resolution reconstruction method.
The invention has the beneficial effects that:
The image super-resolution reconstruction method based on FSRCNN and OPE can solve the image super-resolution problem at any scale without training a separate model for each scale, saving training time and computing resources.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the OPE module is utilized to improve the accuracy of the up-sampling process, reduce the blurring and distortion introduced by the traditional interpolation method, and improve the definition and visual quality of the image.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the image super-resolution task of any scale can be processed by training the model once.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the lightweight design of the FSRCNN network ensures the calculation efficiency of the reconstruction process.
The image super-resolution reconstruction method based on FSRCNN and OPE demonstrates effectiveness and potential in practical applications. With the continuous development of technology, the method of the invention is expected to play an important role in more fields.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and detailed description. It should be understood that the embodiments described herein are for purposes of illustration only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations, and the present invention can have other embodiments as well.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
For a further understanding of the invention, the following detailed description is given in conjunction with figs. 1-3:
The first embodiment is as follows:
An FSRCNN and OPE-based image super-resolution reconstruction method comprises the following steps:
S1, acquiring a high-resolution image, or obtaining one from the DIV2K Dataset open-source data set, and preprocessing it to obtain a high-resolution image HR and a corresponding low-resolution image LR, thereby constructing a data set for training the FSRCNN and OPE-based image super-resolution reconstruction method;
Further, the specific implementation of step S1 includes the following steps:
S1.1, downsampling the high-resolution image by bicubic interpolation to obtain a low-resolution image LR; adjusting the reduction ratio of the bicubic interpolation to obtain low-resolution images LR at the three ratios 1/2, 1/4 and 1/8; and placing the HR and LR images in separate folders;
S1.2, normalizing the HR and LR images obtained in step S1.1 so that pixel values lie in the range [0, 1], yielding the data set used for training the FSRCNN and OPE-based image super-resolution reconstruction method.
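Steps S1.1-S1.2 can be sketched as follows. This is a minimal illustration: the `downsample` helper uses block averaging as a stand-in for bicubic interpolation (a real pipeline would use a bicubic resampler such as Pillow's `Image.resize`), and the folder layout is omitted.

```python
import numpy as np

def downsample(hr, factor):
    """Reduce an HR image by an integer factor (stand-in for bicubic).

    Averages each factor x factor block; bicubic interpolation would be
    used in the actual preprocessing described in step S1.1.
    """
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def normalize(img):
    """Scale 8-bit pixel values into [0, 1], as in step S1.2."""
    return img.astype(np.float64) / 255.0

# Build HR/LR pairs at the three ratios 1/2, 1/4 and 1/8.
hr = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(np.float64)
dataset = {f"1/{f}": (normalize(hr), normalize(downsample(hr, f))) for f in (2, 4, 8)}
```
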
S2, constructing the FSRCNN network structure and the OPE module;
Further, the specific implementation of step S2 includes the following steps:
S2.1, constructing the FSRCNN network structure: the first layer is a 3×3 convolution layer with 32 convolution kernels followed by a ReLU activation function; the second layer is a 3×3 max-pooling layer; the third layer is a 3×3 convolution layer with 64 convolution kernels followed by a ReLU activation function; and the fourth layer is a 1×1 convolution layer used for compressing the feature maps;
S2.2, constructing the OPE module: selecting sine and cosine functions as the orthogonal basis, setting the maximum frequency and encoding length, selecting Adam as the optimizer for model training, and setting the initial learning rate lr to 0.001.
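The orthogonal basis of step S2.2 can be sketched as below. The maximum frequency and encoding length are illustrative values, not ones stated by the source; the snippet samples the sine/cosine basis on a grid and checks that distinct basis functions are orthogonal under the discrete inner product.

```python
import numpy as np

MAX_FREQ = 8             # maximum frequency (illustrative value)
ENC_LEN = 2 * MAX_FREQ   # encoding length: one cos and one sin term per frequency

def ope_basis(n_samples, max_freq):
    """Sample cos(2*pi*p*x) and sin(2*pi*p*x), p = 1..max_freq, on [0, 1).

    Returns a (2 * max_freq, n_samples) matrix whose rows are the basis functions.
    """
    x = (np.arange(n_samples) + 0.5) / n_samples   # cell-centred sample points
    rows = []
    for p in range(1, max_freq + 1):
        rows.append(np.cos(2 * np.pi * p * x))
        rows.append(np.sin(2 * np.pi * p * x))
    return np.stack(rows)

B = ope_basis(64, MAX_FREQ)
gram = B @ B.T   # off-diagonal entries vanish: the sampled basis is orthogonal
```
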
S3, inputting the low-resolution images from the data set obtained in step S1 into the FSRCNN network, extracting image features through convolution layers, and outputting a feature map F;
Further, the expression for the image features extracted in step S3 is:
where x_{m,n} is the pixel value of the input image at position (m, n), with m the horizontal and n the vertical coordinate; B_{i,j} is a weight of the convolution kernel; f is the ReLU activation function; y_{i,j} is the pixel value of the output feature map; and M×N is the size of the convolution kernel;
The output of the FSRCNN network is the feature map F of dimension H×W×C, where H is the height of the feature map, W its width, and C its number of channels.
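The four layers of step S2.1 and the feature extraction of step S3 can be sketched with a plain NumPy forward pass. The weights here are random, the input size and the 1×1 layer's output channel count (16) are illustrative assumptions, and the max-pooling stride is assumed to equal its window size.

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2-D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, x.shape[1] - k + 1, w.shape[3]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.tensordot(x[i:i + k, j:j + k], w, axes=3)
    return out

def maxpool(x, k=3):
    """k x k max pooling with stride k (the stride is an assumption)."""
    h, w, c = x.shape
    h, w = h - h % k, w - w % k
    return x[:h, :w].reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

relu = lambda t: np.maximum(t, 0.0)

rng = np.random.default_rng(0)
lr = rng.random((24, 24, 1))                    # a 24x24 single-channel LR input
w1 = rng.standard_normal((3, 3, 1, 32)) * 0.1   # layer 1: 3x3 conv, 32 kernels
w2 = rng.standard_normal((3, 3, 32, 64)) * 0.1  # layer 3: 3x3 conv, 64 kernels
w3 = rng.standard_normal((1, 1, 64, 16)) * 0.1  # layer 4: 1x1 compression conv

f1 = relu(conv2d(lr, w1))   # layer 1 + ReLU
f2 = maxpool(f1)            # layer 2: 3x3 max pooling
f3 = relu(conv2d(f2, w2))   # layer 3 + ReLU
F = conv2d(f3, w3)          # feature map F of dimension H x W x C
```
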
S4, inputting the feature map obtained in step S3 into the OPE module, encoding the feature map to obtain an orthogonal position encoding, then up-sampling the feature map using this encoding, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image;
Further, the specific implementation of step S4 includes the following steps:
S4.1, defining a group of orthogonal basis functions comprising basis functions in the horizontal direction and basis functions in the vertical direction, with the following expressions:
where φ_p(x) and φ_q(y) are the cosine basis functions in the horizontal and vertical directions, ψ_p(x) and ψ_q(y) are the sine basis functions in the horizontal and vertical directions, p and q are the frequency indices of the orthogonal basis functions, and x and y are the horizontal and vertical positions in the feature map;
S4.2, for each pair of orthogonal basis functions, calculating the projection coefficients of the feature map on those basis functions, with the following expression:
where F_{m,n,c} is the value of the feature map at position (m, n) in channel c, ψ_p(m) and ψ_q(n) are the values of the sine basis functions at positions m and n, respectively, and φ_p(m) and φ_q(n) are the values of the cosine basis functions at positions m and n, respectively;
S4.3, up-sampling the feature map using the projection coefficients obtained on the orthogonal basis functions, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image, calculated as:
where I_{i,j} is the pixel value of the high-resolution image at position (i, j), K and L are the numbers of orthogonal basis functions in the horizontal and vertical directions, and P_{p,q} is the coefficient of the p-th and q-th basis functions in the encoded feature map;
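Steps S4.1-S4.3 can be sketched for a single-channel feature map as below. The basis frequencies, the normalization, and the use of cell-centred coordinates are assumptions of this sketch (a DC term is also omitted, so only zero-mean content is represented); for a band-limited input, the reconstruction on the denser grid is exact.

```python
import numpy as np

def basis(coords, K):
    """Rows are cos(2*pi*p*x) and sin(2*pi*p*x) for p = 1..K at the given coords."""
    rows = []
    for p in range(1, K + 1):
        rows.append(np.cos(2 * np.pi * p * coords))
        rows.append(np.sin(2 * np.pi * p * coords))
    return np.stack(rows)

def ope_upsample(F, scale, K=4):
    """Project F onto the basis (S4.2), then evaluate the linear
    combination on a denser pixel grid (S4.3)."""
    H, W = F.shape
    bx = basis((np.arange(H) + 0.5) / H, K)        # (2K, H)
    by = basis((np.arange(W) + 0.5) / W, K)        # (2K, W)
    P = (bx @ F @ by.T) * (2.0 / H) * (2.0 / W)    # projection coefficients
    Hs, Ws = round(H * scale), round(W * scale)
    bx_hi = basis((np.arange(Hs) + 0.5) / Hs, K)   # basis sampled on the SR grid
    by_hi = basis((np.arange(Ws) + 0.5) / Ws, K)
    return bx_hi.T @ P @ by_hi                     # preliminary SR image

# A band-limited test pattern is reconstructed exactly at 2x the resolution.
x = (np.arange(16) + 0.5) / 16
F = np.outer(np.cos(2 * np.pi * x), np.cos(2 * np.pi * x))
SR = ope_upsample(F, 2.0)
```

Because the coefficients are evaluated on an arbitrarily dense grid, the same projection step serves any upsampling rate, which is the point of the OPE module.
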
S5, performing post-processing on the preliminary SR image obtained in step S4 to obtain the super-resolution-processed image;
Further, step S5 post-processes the preliminary SR image: denoising with a non-local means technique, sharpening with a Sobel-operator-based high-pass filter, and finally applying histogram equalization to obtain the super-resolution-processed image;
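Of the three post-processing stages in step S5, histogram equalization is the simplest to illustrate; the sketch below implements it for an 8-bit grayscale image (the non-local means denoising and Sobel-based filtering stages are omitted for brevity).

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Histogram equalization for an 8-bit grayscale image.

    Maps pixel intensities through the normalized cumulative histogram so
    that the output spans the full [0, 255] range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                      # first occupied intensity level
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1)),
                  0, levels - 1)
    return lut.astype(np.uint8)[img]

rng = np.random.default_rng(1)
img = np.clip(rng.normal(100, 15, (32, 32)), 0, 255).astype(np.uint8)
eq = hist_equalize(img)   # contrast now spans the full intensity range
```
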
S6, performing performance evaluation on the super-resolution-processed image obtained in step S5, calculating a loss function, adjusting the parameters of the FSRCNN network structure and the OPE module through back propagation according to the loss value, performing iterative training with the data set, and outputting the reconstruction result from the LR image to the SR image;
Further, the specific implementation of step S6 includes the following steps:
S6.1, performance evaluation: evaluating the reconstructed super-resolution image with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes, where PSNR is calculated as:
where MAX_I is the maximum pixel value of the image and MSE is the mean square error, calculated as:
where I_HR is the original high-resolution image, I_SR is the super-resolution-processed image, and M and N are the numbers of rows and columns of the image, respectively;
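The PSNR and MSE of step S6.1 can be computed directly; the snippet below is a minimal sketch for images with pixel values in [0, 1], so that MAX_I = 1.0.

```python
import numpy as np

def mse(hr, sr):
    """Mean squared error over an M x N image pair."""
    return float(np.mean((hr - sr) ** 2))

def psnr(hr, sr, max_i=1.0):
    """Peak signal-to-noise ratio in decibels; max_i is MAX_I."""
    return 10.0 * np.log10(max_i ** 2 / mse(hr, sr))

hr = np.zeros((8, 8))
sr = np.full((8, 8), 0.1)   # uniform error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
```
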
The calculated expression for SSIM is:
where μ_HR and μ_SR are the means of the original high-resolution image and the super-resolution-processed image, σ²_HR and σ²_SR are their respective variances, σ_HR,SR is their covariance, and C_1 and C_2 are constants that stabilize the denominator of SSIM;
S6.2, combining MSE and SSIM to construct the loss function for model training, with the expression:
Loss=α·MSE(I,K)+(1-α)·(1-SSIM(I,K))
where α is a weight parameter in the range 0 to 1; the larger the Loss value, the worse the super-resolution image reconstructed by the model.
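The joint loss of step S6.2 can be sketched as below. The SSIM term here is computed globally over the whole image rather than over sliding windows, a simplification; the constants C_1 and C_2 use the conventional values for a [0, 1] dynamic range.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM; standard SSIM averages local windows."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def loss(hr, sr, alpha=0.5):
    """Loss = alpha * MSE + (1 - alpha) * (1 - SSIM), as in step S6.2."""
    m = np.mean((hr - sr) ** 2)
    return alpha * m + (1 - alpha) * (1 - ssim_global(hr, sr))

x = np.random.default_rng(2).random((16, 16))
```

An identical pair yields zero loss (MSE = 0 and SSIM = 1), and the loss grows as the reconstruction degrades, matching the statement above.
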
Further, as shown in fig. 2, the iterative training process over the data set is implemented by the following steps:
S6.2.1, inputting the training set, the validation set, the up-sampling rate and the number of iterations (epochs) into the FSRCNN network structure and the OPE module for training, obtaining a training result;
S6.2.2, calculating the average peak signal-to-noise ratio (PSNR) over the validation set from the training result, then judging whether the constructed FSRCNN network structure and OPE module constitute the best model so far; if so, updating the best model; if not, advancing the iteration count and performing the next round of training, until the best model is output.
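The loop of steps S6.2.1-S6.2.2 can be sketched in skeleton form. The `train_one_epoch` and `validate_psnr` bodies below are hypothetical placeholders (a real implementation would run forward/backward passes and compute PSNR over the validation set); only the best-model bookkeeping mirrors fig. 2.

```python
import random

random.seed(0)

def train_one_epoch(model, train_set):
    """Hypothetical placeholder for one epoch of forward/backward updates."""
    model["quality"] += random.uniform(0.1, 0.5)   # stand-in for learning progress

def validate_psnr(model, val_set):
    """Hypothetical placeholder for the mean PSNR over the validation set."""
    return 20.0 + model["quality"]

def fit(train_set, val_set, epochs):
    """Iterative training with best-model tracking (steps S6.2.1-S6.2.2)."""
    model = {"quality": 0.0}
    best_psnr, best_model = float("-inf"), None
    for _ in range(epochs):
        train_one_epoch(model, train_set)
        score = validate_psnr(model, val_set)
        if score > best_psnr:                      # better model: update the best
            best_psnr, best_model = score, dict(model)
    return best_model, best_psnr

best_model, best_psnr = fit([], [], epochs=5)
```
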
As a test case, a face image of 160x160 pixels is input into the model, and the up-sampling multiples are set to 2, 4 and 6, yielding HR images at three resolutions. Through the implementation steps above, high-resolution face images are obtained, raising the resolution from 160x160 pixels to 320x320 pixels and beyond; face images at multiple resolutions are obtained with only a single training run. The reconstructed HR images show significant improvement in detail, texture and edges, with no obvious artifacts or noise, and both quantitative evaluation indexes, PSNR and SSIM, reach high levels.
The second embodiment is as follows:
An electronic device includes a memory and a processor, the memory storing a computer program; the processor implements the steps of the FSRCNN and OPE-based image super-resolution reconstruction method of the first embodiment when executing the computer program.
The computer device of the present invention may be a device comprising a processor and a memory, such as a single-chip microcomputer including a central processing unit. The processor executes the computer program stored in the memory to implement the steps of the FSRCNN and OPE-based image super-resolution reconstruction method.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may mainly include a program storage area, which may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and a data storage area, which may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The third embodiment is as follows:
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the FSRCNN and OPE-based image super-resolution reconstruction method of the first embodiment.
The computer-readable storage medium of the present invention may be any form of storage medium readable by a processor of a computer device, including but not limited to non-volatile memory, volatile memory, and ferroelectric memory, on which a computer program is stored; when the processor of the computer device reads and executes the computer program stored on the medium, the steps of the FSRCNN and OPE-based image super-resolution reconstruction method described above are implemented.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code: a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Although the application has been described above with reference to specific embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the application. In particular, the features of the disclosed embodiments may be combined with each other in any manner so long as there is no structural conflict, and the exhaustive description of these combinations is not given in this specification solely for the sake of brevity and resource saving. Therefore, it is intended that the application not be limited to the particular embodiments disclosed herein, but that the application will include all embodiments falling within the scope of the appended claims.