Disclosure of Invention
The invention aims to solve the problem that existing image super-resolution algorithms must train a separate model for each upsampling rate, and provides an image super-resolution reconstruction method, an electronic device, and a storage medium based on FSRCNN and OPE.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
An FSRCNN and OPE-based image super-resolution reconstruction method comprises the following steps:
S1, acquiring a high-resolution image, or obtaining one from the DIV2K Dataset open-source data set, and preprocessing it to obtain a high-resolution image HR and a corresponding low-resolution image LR, thereby constructing a data set for training the FSRCNN and OPE-based image super-resolution reconstruction method;
S2, constructing the FSRCNN network structure and the OPE module;
S3, inputting the low-resolution images from the data set obtained in step S1 into the FSRCNN network, extracting image features through convolution layers, and outputting a feature map F;
S4, inputting the feature map obtained in step S3 into the OPE module, encoding the feature map to obtain an orthogonal position encoding, then up-sampling the feature map using this encoding, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image;
S5, performing post-processing on the preliminary SR image obtained in step S4 to obtain the super-resolution-processed image;
S6, performing performance evaluation on the super-resolution-processed image obtained in step S5, calculating a loss function, adjusting the parameters of the FSRCNN network structure and the OPE module through back propagation according to the loss value, performing iterative training with the data set, and outputting the reconstruction result from the LR image to the SR image.
Further, the specific implementation method of the step S1 includes the following steps:
S1.1, downsampling the high-resolution image by bicubic interpolation to obtain a low-resolution image LR; adjusting the reduction ratio of the bicubic interpolation to obtain low-resolution images LR at the three ratios 1/2, 1/4 and 1/8; and placing the HR and LR images in separate folders;
S1.2, normalizing the HR and LR images obtained in step S1.1 so that pixel values lie in the range [0, 1], yielding the data set used for training the FSRCNN and OPE-based image super-resolution reconstruction method.
Further, the specific implementation method of the step S2 includes the following steps:
S2.1, constructing the FSRCNN network structure: the first layer is a 3×3 convolution layer with 32 convolution kernels followed by a ReLU activation function; the second layer is a 3×3 max-pooling layer; the third layer is a 3×3 convolution layer with 64 convolution kernels followed by a ReLU activation function; and the fourth layer is a 1×1 convolution layer used for compressing the feature maps;
S2.2, constructing the OPE module: selecting sine and cosine functions as the orthogonal basis, setting the maximum frequency and encoding length, selecting Adam as the optimizer for model training, and setting the initial learning rate lr to 0.001.
Further, the expression for the image features extracted in step S3 is:
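The formula itself did not survive the text extraction; reconstructed from the variable definitions that follow, a standard convolution followed by the ReLU activation would plausibly read (the kernel indexing here is an assumption):

```latex
y_{i,j} = f\left( \sum_{m=1}^{M} \sum_{n=1}^{N} B_{m,n} \, x_{i+m,\,j+n} \right)
```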
where x_{m,n} is the pixel value of the input image at position (m, n), with m the horizontal and n the vertical coordinate; B_{i,j} is a weight of the convolution kernel; f is the ReLU activation function; y_{i,j} is the pixel value of the output feature map; and M×N is the size of the convolution kernel;
The output of the FSRCNN network is the feature map F of dimension H×W×C, where H is the height of the feature map, W its width, and C its number of channels.
Further, the specific implementation method of the step S4 includes the following steps:
S4.1, defining a group of orthogonal basis functions comprising basis functions in the horizontal direction and basis functions in the vertical direction, with the following expressions:
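The expressions were lost in extraction; a plausible reconstruction for a sine/cosine orthogonal basis with frequency indices p and q (the exact frequency scaling is an assumption) is:

```latex
\varphi_p(x) = \cos(2\pi p x), \quad \psi_p(x) = \sin(2\pi p x); \qquad
\varphi_q(y) = \cos(2\pi q y), \quad \psi_q(y) = \sin(2\pi q y)
```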
where φ_p(x) and φ_q(y) are the cosine basis functions in the horizontal and vertical directions, ψ_p(x) and ψ_q(y) are the sine basis functions in the horizontal and vertical directions, p and q are the frequency indices of the orthogonal basis functions, and x and y are the horizontal and vertical positions in the feature map;
S4.2, for each pair of orthogonal basis functions, calculating the projection coefficients of the feature map on those basis functions, with the following expression:
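The expression is missing from the text; based on the definitions that follow, the projection of the feature map onto one basis pair plausibly reads (shown for the sine pair; the other sine/cosine combinations are analogous):

```latex
P_{p,q,c} = \sum_{m=1}^{H} \sum_{n=1}^{W} F_{m,n,c} \, \psi_p(m) \, \psi_q(n)
```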
where F_{m,n,c} is the value of the feature map at position (m, n) in channel c, ψ_p(m) and ψ_q(n) are the values of the sine basis functions at positions m and n, respectively, and φ_p(m) and φ_q(n) are the values of the cosine basis functions at positions m and n, respectively;
S4.3, up-sampling the feature map using the projection coefficients obtained on the orthogonal basis functions, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image, calculated as:
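The expression did not survive extraction; from the definitions that follow, the linear combination plausibly reads (shown for one basis family, evaluated at the SR grid coordinates x_i, y_j):

```latex
I_{i,j} = \sum_{p=1}^{K} \sum_{q=1}^{L} P_{p,q} \, \varphi_p(x_i) \, \varphi_q(y_j)
```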
where I_{i,j} is the pixel value of the high-resolution image at position (i, j), K and L are the numbers of orthogonal basis functions in the horizontal and vertical directions, and P_{p,q} is the coefficient of the p-th and q-th basis functions in the encoded feature map.
Further, step S5 post-processes the preliminary SR image: denoising with a non-local means technique, sharpening with a Sobel-operator-based high-pass filter, and finally applying histogram equalization to obtain the super-resolution-processed image.
Further, the specific implementation method of the step S6 includes the following steps:
S6.1, performance evaluation: evaluating the reconstructed super-resolution image with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes, where PSNR is calculated as:
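The PSNR formula is missing from the text; the standard definition consistent with the variables below is:

```latex
\mathrm{PSNR} = 10 \cdot \log_{10}\!\left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)
```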
where MAX_I is the maximum pixel value of the image and MSE is the mean square error, calculated as:
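The MSE formula is likewise missing; the standard definition consistent with the variables below is:

```latex
\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl( I_{\mathrm{HR}}(i,j) - I_{\mathrm{SR}}(i,j) \bigr)^2
```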
where I_HR is the original high-resolution image, I_SR is the super-resolution-processed image, and M and N are the numbers of rows and columns of the image, respectively;
The calculated expression for SSIM is:
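The SSIM formula is missing from the text; the standard definition consistent with the variables below is:

```latex
\mathrm{SSIM} = \frac{(2\mu_{\mathrm{HR}}\mu_{\mathrm{SR}} + C_1)(2\sigma_{\mathrm{HR,SR}} + C_2)}
{(\mu_{\mathrm{HR}}^2 + \mu_{\mathrm{SR}}^2 + C_1)(\sigma_{\mathrm{HR}}^2 + \sigma_{\mathrm{SR}}^2 + C_2)}
```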
where μ_HR and μ_SR are the means of the original high-resolution image and the super-resolution-processed image, σ²_HR and σ²_SR are their respective variances, σ_HR,SR is their covariance, and C_1 and C_2 are constants that stabilize the denominator of SSIM;
S6.2, combining MSE and SSIM to construct the loss function for model training, with the expression:
Loss=α·MSE(I,K)+(1-α)·(1-SSIM(I,K))
where α is a weight parameter in the range 0 to 1; the larger the Loss value, the worse the super-resolution image reconstructed by the model.
An electronic device comprises a memory and a processor, the memory storing a computer program; the processor implements the steps of the above FSRCNN and OPE-based image super-resolution reconstruction method when executing the computer program.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above-described FSRCNN and OPE-based image super-resolution reconstruction method.
The invention has the beneficial effects that:
The image super-resolution reconstruction method based on FSRCNN and OPE can solve the image super-resolution problem at any scale without training a separate model for each scale, saving training time and computing resources.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the OPE module is utilized to improve the accuracy of the up-sampling process, reduce the blurring and distortion introduced by the traditional interpolation method, and improve the definition and visual quality of the image.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the image super-resolution task of any scale can be processed by training the model once.
According to the image super-resolution reconstruction method based on FSRCNN and OPE, the lightweight design of the FSRCNN network ensures the calculation efficiency of the reconstruction process.
The image super-resolution reconstruction method based on FSRCNN and OPE demonstrates effectiveness and potential in practical applications. With the continuous development of technology, the method of the invention is expected to play an important role in more fields.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and detailed description. It should be understood that the embodiments described herein are for purposes of illustration only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations, and the present invention can have other embodiments as well.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
For a further understanding of the invention, the following detailed description is given in conjunction with figs. 1-3:
The first embodiment is as follows:
An FSRCNN and OPE-based image super-resolution reconstruction method comprises the following steps:
S1, acquiring a high-resolution image, or obtaining one from the DIV2K Dataset open-source data set, and preprocessing it to obtain a high-resolution image HR and a corresponding low-resolution image LR, thereby constructing a data set for training the FSRCNN and OPE-based image super-resolution reconstruction method;
Further, the specific implementation of step S1 includes the following steps:
S1.1, downsampling the high-resolution image by bicubic interpolation to obtain a low-resolution image LR; adjusting the reduction ratio of the bicubic interpolation to obtain low-resolution images LR at the three ratios 1/2, 1/4 and 1/8; and placing the HR and LR images in separate folders;
S1.2, normalizing the HR and LR images obtained in step S1.1 so that pixel values lie in the range [0, 1], yielding the data set used for training the FSRCNN and OPE-based image super-resolution reconstruction method.
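Steps S1.1-S1.2 can be sketched as follows. This is a minimal illustration: the `downsample` helper uses block averaging as a stand-in for bicubic interpolation (a real pipeline would use a bicubic resampler such as Pillow's `Image.resize`), and the folder layout is omitted.

```python
import numpy as np

def downsample(hr, factor):
    """Reduce an HR image by an integer factor (stand-in for bicubic).

    Averages each factor x factor block; bicubic interpolation would be
    used in the actual preprocessing described in step S1.1.
    """
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def normalize(img):
    """Scale 8-bit pixel values into [0, 1], as in step S1.2."""
    return img.astype(np.float64) / 255.0

# Build HR/LR pairs at the three ratios 1/2, 1/4 and 1/8.
hr = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(np.float64)
dataset = {f"1/{f}": (normalize(hr), normalize(downsample(hr, f))) for f in (2, 4, 8)}
```
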
S2, constructing the FSRCNN network structure and the OPE module;
Further, the specific implementation of step S2 includes the following steps:
S2.1, constructing the FSRCNN network structure: the first layer is a 3×3 convolution layer with 32 convolution kernels followed by a ReLU activation function; the second layer is a 3×3 max-pooling layer; the third layer is a 3×3 convolution layer with 64 convolution kernels followed by a ReLU activation function; and the fourth layer is a 1×1 convolution layer used for compressing the feature maps;
S2.2, constructing the OPE module: selecting sine and cosine functions as the orthogonal basis, setting the maximum frequency and encoding length, selecting Adam as the optimizer for model training, and setting the initial learning rate lr to 0.001.
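The orthogonal basis of step S2.2 can be sketched as below. The maximum frequency and encoding length are illustrative values, not ones stated by the source; the snippet samples the sine/cosine basis on a grid and checks that distinct basis functions are orthogonal under the discrete inner product.

```python
import numpy as np

MAX_FREQ = 8             # maximum frequency (illustrative value)
ENC_LEN = 2 * MAX_FREQ   # encoding length: one cos and one sin term per frequency

def ope_basis(n_samples, max_freq):
    """Sample cos(2*pi*p*x) and sin(2*pi*p*x), p = 1..max_freq, on [0, 1).

    Returns a (2 * max_freq, n_samples) matrix whose rows are the basis functions.
    """
    x = (np.arange(n_samples) + 0.5) / n_samples   # cell-centred sample points
    rows = []
    for p in range(1, max_freq + 1):
        rows.append(np.cos(2 * np.pi * p * x))
        rows.append(np.sin(2 * np.pi * p * x))
    return np.stack(rows)

B = ope_basis(64, MAX_FREQ)
gram = B @ B.T   # off-diagonal entries vanish: the sampled basis is orthogonal
```
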
S3, inputting the low-resolution images from the data set obtained in step S1 into the FSRCNN network, extracting image features through convolution layers, and outputting a feature map F;
Further, the expression for the image features extracted in step S3 is:
where x_{m,n} is the pixel value of the input image at position (m, n), with m the horizontal and n the vertical coordinate; B_{i,j} is a weight of the convolution kernel; f is the ReLU activation function; y_{i,j} is the pixel value of the output feature map; and M×N is the size of the convolution kernel;
The output of the FSRCNN network is the feature map F of dimension H×W×C, where H is the height of the feature map, W its width, and C its number of channels.
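The four layers of step S2.1 and the feature extraction of step S3 can be sketched with a plain NumPy forward pass. The weights here are random, the input size and the 1×1 layer's output channel count (16) are illustrative assumptions, and the max-pooling stride is assumed to equal its window size.

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2-D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, x.shape[1] - k + 1, w.shape[3]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.tensordot(x[i:i + k, j:j + k], w, axes=3)
    return out

def maxpool(x, k=3):
    """k x k max pooling with stride k (the stride is an assumption)."""
    h, w, c = x.shape
    h, w = h - h % k, w - w % k
    return x[:h, :w].reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

relu = lambda t: np.maximum(t, 0.0)

rng = np.random.default_rng(0)
lr = rng.random((24, 24, 1))                    # a 24x24 single-channel LR input
w1 = rng.standard_normal((3, 3, 1, 32)) * 0.1   # layer 1: 3x3 conv, 32 kernels
w2 = rng.standard_normal((3, 3, 32, 64)) * 0.1  # layer 3: 3x3 conv, 64 kernels
w3 = rng.standard_normal((1, 1, 64, 16)) * 0.1  # layer 4: 1x1 compression conv

f1 = relu(conv2d(lr, w1))   # layer 1 + ReLU
f2 = maxpool(f1)            # layer 2: 3x3 max pooling
f3 = relu(conv2d(f2, w2))   # layer 3 + ReLU
F = conv2d(f3, w3)          # feature map F of dimension H x W x C
```
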
S4, inputting the feature map obtained in step S3 into the OPE module, encoding the feature map to obtain an orthogonal position encoding, then up-sampling the feature map using this encoding, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image;
Further, the specific implementation of step S4 includes the following steps:
S4.1, defining a group of orthogonal basis functions comprising basis functions in the horizontal direction and basis functions in the vertical direction, with the following expressions:
where φ_p(x) and φ_q(y) are the cosine basis functions in the horizontal and vertical directions, ψ_p(x) and ψ_q(y) are the sine basis functions in the horizontal and vertical directions, p and q are the frequency indices of the orthogonal basis functions, and x and y are the horizontal and vertical positions in the feature map;
S4.2, for each pair of orthogonal basis functions, calculating the projection coefficients of the feature map on those basis functions, with the following expression:
where F_{m,n,c} is the value of the feature map at position (m, n) in channel c, ψ_p(m) and ψ_q(n) are the values of the sine basis functions at positions m and n, respectively, and φ_p(m) and φ_q(n) are the values of the cosine basis functions at positions m and n, respectively;
S4.3, up-sampling the feature map using the projection coefficients obtained on the orthogonal basis functions, and computing the pixel values of the super-resolution image SR through the linear combination of the OPE module to generate a preliminary SR image, calculated as:
where I_{i,j} is the pixel value of the high-resolution image at position (i, j), K and L are the numbers of orthogonal basis functions in the horizontal and vertical directions, and P_{p,q} is the coefficient of the p-th and q-th basis functions in the encoded feature map;
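Steps S4.1-S4.3 can be sketched for a single-channel feature map as below. The basis frequencies, the normalization, and the use of cell-centred coordinates are assumptions of this sketch (a DC term is also omitted, so only zero-mean content is represented); for a band-limited input, the reconstruction on the denser grid is exact.

```python
import numpy as np

def basis(coords, K):
    """Rows are cos(2*pi*p*x) and sin(2*pi*p*x) for p = 1..K at the given coords."""
    rows = []
    for p in range(1, K + 1):
        rows.append(np.cos(2 * np.pi * p * coords))
        rows.append(np.sin(2 * np.pi * p * coords))
    return np.stack(rows)

def ope_upsample(F, scale, K=4):
    """Project F onto the basis (S4.2), then evaluate the linear
    combination on a denser pixel grid (S4.3)."""
    H, W = F.shape
    bx = basis((np.arange(H) + 0.5) / H, K)        # (2K, H)
    by = basis((np.arange(W) + 0.5) / W, K)        # (2K, W)
    P = (bx @ F @ by.T) * (2.0 / H) * (2.0 / W)    # projection coefficients
    Hs, Ws = round(H * scale), round(W * scale)
    bx_hi = basis((np.arange(Hs) + 0.5) / Hs, K)   # basis sampled on the SR grid
    by_hi = basis((np.arange(Ws) + 0.5) / Ws, K)
    return bx_hi.T @ P @ by_hi                     # preliminary SR image

# A band-limited test pattern is reconstructed exactly at 2x the resolution.
x = (np.arange(16) + 0.5) / 16
F = np.outer(np.cos(2 * np.pi * x), np.cos(2 * np.pi * x))
SR = ope_upsample(F, 2.0)
```

Because the coefficients are evaluated on an arbitrarily dense grid, the same projection step serves any upsampling rate, which is the point of the OPE module.
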
S5, performing post-processing on the preliminary SR image obtained in step S4 to obtain the super-resolution-processed image;
Further, step S5 post-processes the preliminary SR image: denoising with a non-local means technique, sharpening with a Sobel-operator-based high-pass filter, and finally applying histogram equalization to obtain the super-resolution-processed image;
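Of the three post-processing stages in step S5, histogram equalization is the simplest to illustrate; the sketch below implements it for an 8-bit grayscale image (the non-local means denoising and Sobel-based filtering stages are omitted for brevity).

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Histogram equalization for an 8-bit grayscale image.

    Maps pixel intensities through the normalized cumulative histogram so
    that the output spans the full [0, 255] range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                      # first occupied intensity level
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1)),
                  0, levels - 1)
    return lut.astype(np.uint8)[img]

rng = np.random.default_rng(1)
img = np.clip(rng.normal(100, 15, (32, 32)), 0, 255).astype(np.uint8)
eq = hist_equalize(img)   # contrast now spans the full intensity range
```
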
S6, performing performance evaluation on the super-resolution-processed image obtained in step S5, calculating a loss function, adjusting the parameters of the FSRCNN network structure and the OPE module through back propagation according to the loss value, performing iterative training with the data set, and outputting the reconstruction result from the LR image to the SR image;
Further, the specific implementation of step S6 includes the following steps:
S6.1, performance evaluation: evaluating the reconstructed super-resolution image with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes, where PSNR is calculated as:
where MAX_I is the maximum pixel value of the image and MSE is the mean square error, calculated as:
where I_HR is the original high-resolution image, I_SR is the super-resolution-processed image, and M and N are the numbers of rows and columns of the image, respectively;
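The PSNR and MSE of step S6.1 can be computed directly; the snippet below is a minimal sketch for images with pixel values in [0, 1], so that MAX_I = 1.0.

```python
import numpy as np

def mse(hr, sr):
    """Mean squared error over an M x N image pair."""
    return float(np.mean((hr - sr) ** 2))

def psnr(hr, sr, max_i=1.0):
    """Peak signal-to-noise ratio in decibels; max_i is MAX_I."""
    return 10.0 * np.log10(max_i ** 2 / mse(hr, sr))

hr = np.zeros((8, 8))
sr = np.full((8, 8), 0.1)   # uniform error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
```
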
The calculated expression for SSIM is:
where μ_HR and μ_SR are the means of the original high-resolution image and the super-resolution-processed image, σ²_HR and σ²_SR are their respective variances, σ_HR,SR is their covariance, and C_1 and C_2 are constants that stabilize the denominator of SSIM;
S6.2, combining MSE and SSIM to construct the loss function for model training, with the expression:
Loss=α·MSE(I,K)+(1-α)·(1-SSIM(I,K))
where α is a weight parameter in the range 0 to 1; the larger the Loss value, the worse the super-resolution image reconstructed by the model.
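The joint loss of step S6.2 can be sketched as below. The SSIM term here is computed globally over the whole image rather than over sliding windows, a simplification; the constants C_1 and C_2 use the conventional values for a [0, 1] dynamic range.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM; standard SSIM averages local windows."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def loss(hr, sr, alpha=0.5):
    """Loss = alpha * MSE + (1 - alpha) * (1 - SSIM), as in step S6.2."""
    m = np.mean((hr - sr) ** 2)
    return alpha * m + (1 - alpha) * (1 - ssim_global(hr, sr))

x = np.random.default_rng(2).random((16, 16))
```

An identical pair yields zero loss (MSE = 0 and SSIM = 1), and the loss grows as the reconstruction degrades, matching the statement above.
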
Further, as shown in fig. 2, the iterative training process over the data set is implemented by the following steps:
S6.2.1, inputting the training set, the validation set, the up-sampling rate and the number of iterations (epochs) into the FSRCNN network structure and the OPE module for training, obtaining a training result;
S6.2.2, calculating the average peak signal-to-noise ratio (PSNR) over the validation set from the training result, then judging whether the constructed FSRCNN network structure and OPE module constitute the best model so far; if so, updating the best model; if not, advancing the iteration count and performing the next round of training, until the best model is output.
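The loop of steps S6.2.1-S6.2.2 can be sketched in skeleton form. The `train_one_epoch` and `validate_psnr` bodies below are hypothetical placeholders (a real implementation would run forward/backward passes and compute PSNR over the validation set); only the best-model bookkeeping mirrors fig. 2.

```python
import random

random.seed(0)

def train_one_epoch(model, train_set):
    """Hypothetical placeholder for one epoch of forward/backward updates."""
    model["quality"] += random.uniform(0.1, 0.5)   # stand-in for learning progress

def validate_psnr(model, val_set):
    """Hypothetical placeholder for the mean PSNR over the validation set."""
    return 20.0 + model["quality"]

def fit(train_set, val_set, epochs):
    """Iterative training with best-model tracking (steps S6.2.1-S6.2.2)."""
    model = {"quality": 0.0}
    best_psnr, best_model = float("-inf"), None
    for _ in range(epochs):
        train_one_epoch(model, train_set)
        score = validate_psnr(model, val_set)
        if score > best_psnr:                      # better model: update the best
            best_psnr, best_model = score, dict(model)
    return best_model, best_psnr

best_model, best_psnr = fit([], [], epochs=5)
```
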
As a test case, a face image of 160x160 pixels is input into the model, and the up-sampling multiples are set to 2, 4 and 6, yielding HR images at three resolutions. Through the implementation steps above, high-resolution face images are obtained, raising the resolution from 160x160 pixels to 320x320 pixels and beyond; face images at multiple resolutions are obtained with only a single training run. The reconstructed HR images show significant improvement in detail, texture and edges, with no obvious artifacts or noise, and both quantitative evaluation indexes, PSNR and SSIM, reach high levels.
The second embodiment is as follows:
An electronic device includes a memory and a processor, the memory storing a computer program; the processor implements the steps of the FSRCNN and OPE-based image super-resolution reconstruction method of the first embodiment when executing the computer program.
The computer device of the present invention may be a device comprising a processor and a memory, such as a single-chip microcomputer including a central processing unit. The processor executes the computer program stored in the memory to implement the steps of the FSRCNN and OPE-based image super-resolution reconstruction method.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may mainly include a program storage area, which may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and a data storage area, which may store data created according to the use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The third embodiment is as follows:
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the FSRCNN and OPE-based image super-resolution reconstruction method of the first embodiment.
The computer-readable storage medium of the present invention may be any form of storage medium readable by a processor of a computer device, including but not limited to non-volatile memory, volatile memory, and ferroelectric memory, on which a computer program is stored; when the processor of the computer device reads and executes the computer program stored on the medium, the steps of the FSRCNN and OPE-based image super-resolution reconstruction method described above are implemented.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code: a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Although the application has been described above with reference to specific embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the application. In particular, the features of the disclosed embodiments may be combined with each other in any manner so long as there is no structural conflict, and the exhaustive description of these combinations is not given in this specification solely for the sake of brevity and resource saving. Therefore, it is intended that the application not be limited to the particular embodiments disclosed herein, but that the application will include all embodiments falling within the scope of the appended claims.