CN117522748A - Fisheye image correction method and system based on SKNet attention mechanism - Google Patents
- Publication number
- CN117522748A (application number CN202311571580.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- scale
- distortion
- corrected
- correction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Description
Technical Field

The present invention relates to the field of image processing technology, and in particular to a fisheye image correction method and system based on the SKNet attention mechanism.

Background Art

In recent years, with the rapid development of robot vision and virtual reality technology, fisheye cameras have been widely used for special-purpose image acquisition, such as surveillance systems, security systems, and conferencing, owing to their large viewing angle and small size. However, fisheye images suffer from distortion, which is especially severe in the peripheral regions. Distortion correction of fisheye images is therefore crucial, both to conform to human vision and to reduce interference with feature extraction.

A fisheye lens is an optical lens with an ultra-wide field of view, widely used in video surveillance, panoramic photography, aerial measurement, and other fields. In the measurement field, however, the nonlinear distortion of a fisheye lens introduces large errors that affect practical applications. At present, fisheye camera imaging systems correct image distortion with computer vision techniques, and fisheye distortion correction algorithms have become a research focus in order to meet the demands of high-precision measurement and practical applications. By applying fisheye correction, images captured by a fisheye lens can be converted into rectilinear projection images that better match human viewing habits, eliminating or reducing image distortion; fisheye correction improves the visualization quality of images, provides more accurate image information, and improves the performance of downstream algorithms and applications.

How to correct fisheye images while mitigating exploding and vanishing gradients during training is a technical problem that needs to be solved.
Summary of the Invention

The technical task of the present invention is to address the above shortcomings by providing a fisheye image correction method and system based on the SKNet attention mechanism, to solve the technical problem of how to correct fisheye images while mitigating exploding and vanishing gradients.

In a first aspect, the present invention provides a fisheye image correction method based on the SKNet attention mechanism, comprising the following steps:

Image acquisition: acquiring real images and corresponding fisheye images to construct a sample set;

Flow estimation model construction: constructing a flow estimation network model with a U-shaped encoder-decoder structure, which extracts appearance features from the input fisheye image, maps the appearance feature maps into flows, and outputs multi-scale appearance flows;

Distortion correction model construction: constructing a distortion correction network model comprising a generator, a correction layer, and a discriminator, wherein the generator has a U-shaped encoder-decoder structure with an SKNet attention mechanism layer introduced into the encoder, and extracts distortion features from the input fisheye image to output multi-scale distortion feature maps; the correction layer takes the multi-scale appearance flows and the multi-scale distortion feature maps as input, corrects the distortion feature map at each scale based on the appearance flow of the corresponding scale, and outputs multi-scale corrected distortion feature maps; the decoder performs image reconstruction from the input multi-scale corrected distortion feature maps and outputs multi-scale corrected images; the discriminator, into which a spectral normalization layer is introduced, takes the multi-scale corrected images as input, judges whether the corrected image at each scale is a real image, and predicts the class label of the corrected image at each scale together with the corresponding class probability, the class labels being real image and fake image;

Model training: training the flow estimation network model and the distortion correction network model on the sample set to obtain a trained flow estimation network model and a trained distortion correction network model;

Image correction: taking the fisheye image to be corrected as input, predicting the class label and class probability of the corrected image at each scale through the trained flow estimation network model and the trained distortion correction network model, and obtaining the final corrected image based on the class probabilities.
Preferably, in the flow estimation network model, the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures;

Each convolution structure, serving as a downsampling structure, comprises a convolution layer, a normalization layer, and a Leaky ReLU activation function connected in sequence, and is used to downsample the input image and extract appearance features, outputting an appearance feature map;

Each deconvolution structure, serving as an upsampling structure, comprises one residual module and one deconvolution layer, and is used to upsample the input appearance feature maps, outputting appearance feature maps at N-1 scales;

The output of each deconvolution structure is connected to a convolution layer, which performs a convolution operation on the input appearance feature map and maps it into a flow, yielding an appearance flow.
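The flow head described above can be sketched in plain NumPy: at each decoder scale, a 3×3 convolution maps the appearance feature map to a two-channel flow. The 3×3 kernel size matches the embodiment described later; the channel count, weights, and feature maps here are illustrative placeholders, not the patent's actual parameters:

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1, zero-padded 3x3 convolution: (C, H, W) -> (out_c, H, W)."""
    C, H, W = x.shape
    out_c = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((out_c, H, W))
    for o in range(out_c):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o])
    return out

rng = np.random.default_rng(0)
scales = [8, 16, 32, 64, 128]                      # the 5 flow resolutions named later
flows = []
for s in scales:
    feat = rng.standard_normal((16, s, s))         # placeholder decoder feature map
    w = rng.standard_normal((2, 16, 3, 3)) * 0.01  # placeholder 3x3 flow-head kernel
    flows.append(conv2d(feat, w))                  # two channels: (dx, dy) displacements
print([f.shape for f in flows])
```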
Preferably, the generator comprises an encoder, a decoder, a correction layer, an image processing structure, and an image correction structure; the encoder comprises N convolution structures and N-1 deconvolution structures, and the decoder comprises N-1 deconvolution structures;

For the first N-1 convolution structures, each convolution structure, serving as a downsampling structure, comprises a convolution layer and a Leaky ReLU activation function, and each downsampling structure has an SKNet attention mechanism layer introduced into it, used to downsample the input fisheye image and extract distortion features, outputting a distortion feature map at the corresponding scale;
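The selective-kernel (SK) idea behind the SKNet layer — fuse branches with different receptive fields, then softmax-weight them per channel — can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the two branch operations below stand in for real 3×3/5×5 convolutions, and all weights are random placeholders:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sk_attention(x, w_fc, w_a, w_b):
    """Selective-kernel attention over two branches for a (C, H, W) feature map."""
    u3 = x                                                     # stand-in "3x3" branch
    u5 = (np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1)) / 2  # stand-in "5x5" branch
    u = u3 + u5                                   # Fuse: element-wise sum of branches
    s = u.mean(axis=(1, 2))                       # global average pooling -> (C,)
    z = np.maximum(w_fc @ s, 0.0)                 # squeeze FC + ReLU -> (d,)
    logits = np.stack([w_a @ z, w_b @ z])         # per-branch channel logits, (2, C)
    att = softmax(logits, axis=0)                 # Select: softmax across branches
    return att[0][:, None, None] * u3 + att[1][:, None, None] * u5

rng = np.random.default_rng(1)
C, d = 8, 4
y = sk_attention(rng.standard_normal((C, 16, 16)),
                 rng.standard_normal((d, C)),     # squeeze weights
                 rng.standard_normal((C, d)),     # branch-A selection weights
                 rng.standard_normal((C, d)))     # branch-B selection weights
print(y.shape)
```

Because the branch weights come from a softmax, the per-channel attention coefficients of the two branches always sum to one, which is what makes the kernel selection "soft" and differentiable.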
For the N-th convolution structure, the convolution structure comprises a Leaky ReLU activation function, used to perform a feature transformation on the input distortion feature map and output a distortion feature map;

The correction layer is connected to the flow estimation network model through a progressive complementary mechanism and performs the following: taking the multi-scale appearance flows and the multi-scale distortion feature maps as input, for the distortion feature map at each scale, a spatial transformation is applied based on the appearance flow of the corresponding scale to correct the distortion feature map, and the corrected distortion feature map at the corresponding scale is output, there being N appearance flows and N distortion feature maps in one-to-one correspondence;
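The spatial transformation applied by the correction layer amounts to warping each distortion feature map by its appearance flow. A minimal bilinear-sampling warp, assuming the flow stores per-pixel (dx, dy) displacements, might look like:

```python
import numpy as np

def warp(feat, flow):
    """Bilinearly sample feat (C, H, W) at positions displaced by flow (2, H, W)."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[0], 0, W - 1)          # sampling x-coordinates
    sy = np.clip(ys + flow[1], 0, H - 1)          # sampling y-coordinates
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    return (feat[:, y0, x0] * (1 - wx) * (1 - wy)
          + feat[:, y0, x1] * wx * (1 - wy)
          + feat[:, y1, x0] * (1 - wx) * wy
          + feat[:, y1, x1] * wx * wy)

feat = np.arange(16, dtype=float).reshape(1, 4, 4)
flow = np.zeros((2, 4, 4))                        # zero flow -> identity warp
assert np.allclose(warp(feat, flow), feat)
flow[0] += 1.0                                    # shift sampling one pixel right
print(warp(feat, flow)[0])
```

The same function applies unchanged at every scale of the pyramid, which is all the correction layer needs besides the per-scale flows.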
For each deconvolution structure, the structure, serving as an upsampling structure, comprises a convolution layer and a ReLU activation function, used to upsample the corrected distortion feature map at the corresponding scale and output a corrected distortion feature map at the corresponding scale;

The output of each deconvolution structure is provided with an image processing structure comprising a convolution layer, a Tanh activation function, and a torgb function, used to perform a convolution operation and an activation operation on the input corrected distortion feature map and to convert the corrected distortion feature map into an RGB image through the torgb function, yielding the corrected distortion feature map in RGB image form;

The image correction structure is a convolution layer that takes the multi-scale corrected distortion feature maps in RGB image form as input, downsamples the corrected distortion feature map at each scale, and outputs multi-scale corrected images.
Preferably, the discriminator comprises M convolutional network structures, each comprising a convolution layer, a Leaky ReLU activation function, and a spectral normalization layer, used to take the corrected image at each scale as input, judge whether the corrected image is a real image, and output a class label and class probability, the class labels being real image and fake image.
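Spectral normalization divides a layer's weight matrix by an estimate of its largest singular value, usually obtained by power iteration, so that the layer becomes approximately 1-Lipschitz. A NumPy sketch (the iteration count and matrix shape are illustrative choices):

```python
import numpy as np

def spectral_normalize(w, n_iter=50):
    """Scale weight matrix w by its largest singular value (power iteration)."""
    u = np.ones(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v              # estimate of the spectral norm of w
    return w / sigma

rng = np.random.default_rng(2)
w = rng.standard_normal((16, 32)) * 3.0
w_sn = spectral_normalize(w)
print(np.linalg.svd(w_sn, compute_uv=False)[0])   # close to 1.0
```

In practice this normalization is applied to each discriminator layer at every training step, which is what keeps the gradients flowing through the critic well-behaved.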
Preferably, an adversarial loss function is constructed from the multi-scale corrected images and the real images at the corresponding scales through a Wasserstein distance loss function, the adversarial loss function being used for model training of the generator and the discriminator;

A multi-scale loss function is constructed from the multi-scale corrected images and the real images at the corresponding scales; the weighted sum of the adversarial loss function and the multi-scale loss function serves as the total loss function of the flow estimation network model and the distortion correction network model, and the two models are trained based on this total loss function to obtain the trained flow estimation network model and the trained distortion correction network model.
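As a sketch of how such a total loss could be assembled: the critic and generator losses below follow the standard WGAN formulation, while the L1 image-pyramid term and the weight 10.0 are assumptions for illustration only — the patent does not fix these details here:

```python
import numpy as np

def wgan_losses(d_real, d_fake):
    """Wasserstein critic/generator losses from discriminator scores."""
    d_loss = d_fake.mean() - d_real.mean()   # critic minimizes this
    g_loss = -d_fake.mean()                  # generator minimizes this
    return d_loss, g_loss

def multiscale_l1(corrected, real):
    """Average L1 reconstruction loss over an image pyramid."""
    return np.mean([np.abs(c - r).mean() for c, r in zip(corrected, real)])

rng = np.random.default_rng(3)
scales = [8, 16, 32]                                  # a small illustrative pyramid
real = [rng.random((3, s, s)) for s in scales]
fake = [rng.random((3, s, s)) for s in scales]
d_loss, g_loss = wgan_losses(rng.standard_normal(4), rng.standard_normal(4))
# weighted total generator loss; 10.0 is an assumed hyper-parameter
total = g_loss + 10.0 * multiscale_l1(fake, real)
print(total)
```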
In a second aspect, the present invention provides a fisheye image correction system based on the SKNet attention mechanism, for implementing fisheye image correction through the fisheye image correction method based on the SKNet attention mechanism according to any one of the first aspect, the system comprising an image acquisition module, a flow estimation model construction module, a distortion correction model construction module, a model training module, and an image correction module;

The image acquisition module is configured to: acquire real images and corresponding fisheye images to construct a sample set;

The flow estimation model construction module is configured to: construct a flow estimation network model with a U-shaped encoder-decoder structure, which extracts appearance features from the input fisheye image, maps the appearance feature maps into flows, and outputs multi-scale appearance flows;

The distortion correction model construction module is configured to: construct a distortion correction network model comprising a generator, a correction layer, and a discriminator, wherein the generator has a U-shaped encoder-decoder structure with an SKNet attention mechanism layer introduced into the encoder, and extracts distortion features from the input fisheye image to output multi-scale distortion feature maps; the correction layer takes the multi-scale appearance flows and the multi-scale distortion feature maps as input, corrects the distortion feature map at each scale based on the appearance flow of the corresponding scale, and outputs multi-scale corrected distortion feature maps; the decoder performs image reconstruction from the input multi-scale corrected distortion feature maps and outputs multi-scale corrected images; the discriminator, into which a spectral normalization layer is introduced, takes the multi-scale corrected images as input, judges whether the corrected image at each scale is a real image, and predicts the class label of the corrected image at each scale together with the corresponding class probability, the class labels being real image and fake image;

The model training module is configured to: train the flow estimation network model and the distortion correction network model on the sample set to obtain a trained flow estimation network model and a trained distortion correction network model;

The image correction module is configured to: take the fisheye image to be corrected as input, predict the class label and class probability of the corrected image at each scale through the trained flow estimation network model and the trained distortion correction network model, and obtain the final corrected image based on the class probabilities.
Preferably, in the flow estimation network model, the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures;

Each convolution structure, serving as a downsampling structure, comprises a convolution layer, a normalization layer, and a Leaky ReLU activation function connected in sequence, and is used to downsample the input image and extract appearance features, outputting an appearance feature map;

Each deconvolution structure, serving as an upsampling structure, comprises one residual module and one deconvolution layer, and is used to upsample the input appearance feature maps, outputting appearance feature maps at N-1 scales;

The output of each deconvolution structure is connected to a convolution layer, which performs a convolution operation on the input appearance feature map and maps it into a flow, yielding an appearance flow.
Preferably, the generator comprises an encoder, a decoder, a correction layer, an image processing structure, and an image correction structure; the encoder comprises N convolution structures and N-1 deconvolution structures, and the decoder comprises N-1 deconvolution structures;

For the first N-1 convolution structures, each convolution structure, serving as a downsampling structure, comprises a convolution layer and a Leaky ReLU activation function, and each downsampling structure has an SKNet attention mechanism layer introduced into it, used to downsample the input fisheye image and extract distortion features, outputting a distortion feature map at the corresponding scale;

For the N-th convolution structure, the convolution structure comprises a Leaky ReLU activation function, used to perform a feature transformation on the input distortion feature map and output a distortion feature map;

The correction layer is connected to the flow estimation network model through a progressive complementary mechanism and performs the following: taking the multi-scale appearance flows and the multi-scale distortion feature maps as input, for the distortion feature map at each scale, a spatial transformation is applied based on the appearance flow of the corresponding scale to correct the distortion feature map, and the corrected distortion feature map at the corresponding scale is output, there being N appearance flows and N distortion feature maps in one-to-one correspondence;

For each deconvolution structure, the structure, serving as an upsampling structure, comprises a convolution layer and a ReLU activation function, used to upsample the corrected distortion feature map at the corresponding scale and output a corrected distortion feature map at the corresponding scale;

The output of each deconvolution structure is provided with an image processing structure comprising a convolution layer, a Tanh activation function, and a torgb function, used to perform a convolution operation and an activation operation on the input corrected distortion feature map and to convert the corrected distortion feature map into an RGB image through the torgb function, yielding the corrected distortion feature map in RGB image form;

The image correction structure is a convolution layer that takes the multi-scale corrected distortion feature maps in RGB image form as input, downsamples the corrected distortion feature map at each scale, and outputs multi-scale corrected images.

Preferably, the discriminator comprises M convolutional network structures, each comprising a convolution layer, a Leaky ReLU activation function, and a spectral normalization layer, used to take the corrected image at each scale as input, judge whether the corrected image is a real image, and output a class label and class probability, the class labels being real image and fake image.
Preferably, the model training module is used to construct an adversarial loss function from the multi-scale corrected images and the real images at the corresponding scales through a Wasserstein distance loss function, the adversarial loss function being used for model training of the generator and the discriminator;

The model training module is used to construct a multi-scale loss function from the multi-scale corrected images and the real images at the corresponding scales, to take the weighted sum of the adversarial loss function and the multi-scale loss function as the total loss function of the flow estimation network model and the distortion correction network model, and to train the two models based on this total loss function to obtain the trained flow estimation network model and the trained distortion correction network model.
The fisheye image correction method and system based on the SKNet attention mechanism of the present invention have the following advantages:

1. Appearance features are extracted from the fisheye image through the flow estimation network model, which outputs multi-scale appearance flows; distortion features are extracted from the fisheye image through the encoder in the generator, which outputs multi-scale distortion features; the appearance flows are introduced through a progressive complementary mechanism to correct the distortion features at the corresponding scales; the decoder in the generator reconstructs corrected images from the corrected distortion features; and the discriminator judges whether each corrected image is a real image, thereby achieving fisheye image correction. The SKNet attention mechanism layer introduced into the encoder of the generator brings in selective multi-scale convolution kernels through the SKNet self-attention mechanism, improving feature extraction capability without increasing network complexity. The spectral normalization layer introduced into the discriminator constrains the spectral norm of the network's weight matrices to control the magnitude of the weights, which reduces exploding and vanishing gradients and makes deep neural networks easier to train, especially in the complex game between the generator and the discriminator;

2. The Wasserstein distance loss function is introduced as the loss function between the generator and the discriminator, using the Wasserstein distance to measure the difference between generated and real images; this resolves the instability of generator-discriminator training, so the training progress of the generator and the discriminator no longer needs to be carefully balanced.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

The present invention is further described below in conjunction with the accompanying drawings.

Figure 1 is a flow chart of the fisheye image correction method based on the SKNet attention mechanism of Embodiment 1;

Figure 2 is a structural block diagram of the SKNet attention mechanism layer in the fisheye image correction method based on the SKNet attention mechanism of Embodiment 1.
Detailed Description

The present invention is further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement it. The illustrated embodiments, however, do not limit the present invention, and where no conflict arises, the embodiments of the present invention and the technical features therein may be combined with one another.

Embodiments of the present invention provide a fisheye image correction method and system based on the SKNet attention mechanism, used to solve the technical problem of how to correct fisheye images while mitigating exploding and vanishing gradients.
Embodiment 1:

A fisheye image correction method based on the SKNet attention mechanism according to the present invention comprises five steps: image acquisition, flow estimation model construction, distortion correction model construction, model training, and image correction.

Step S100, image acquisition: acquire real images and corresponding fisheye images to construct a sample set.

Step S200, flow estimation model construction: construct a flow estimation network model with a U-shaped encoder-decoder structure, which extracts appearance features from the input fisheye image, maps the appearance feature maps into flows, and outputs multi-scale appearance flows.

In the flow estimation network model of this embodiment, the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures. Each convolution structure, serving as a downsampling structure, comprises a convolution layer, a normalization layer, and a Leaky ReLU activation function connected in sequence, and is used to downsample the input image and extract appearance features, outputting an appearance feature map. Each deconvolution structure, serving as an upsampling structure, comprises one residual module and one deconvolution layer, and is used to upsample the input appearance feature maps, outputting appearance feature maps at N-1 scales. The output of each deconvolution structure is connected to a convolution layer, which performs a convolution operation on the input appearance feature map and maps it into a flow, yielding an appearance flow.
作为流估计网络模型的具体实施,编码器为6层卷积层组成,每层包含1个卷积、归一化和LReLu激活函数激活等操作,用于进行图像的降采样和提取特征,将输入图像的分辨率降低,将256×256×3大小的图像降采样成4×4×512大小。As a specific implementation of the flow estimation network model, the encoder is composed of 6 convolutional layers. Each layer contains 1 convolution, normalization and LReLu activation function activation and other operations, which are used to downsample the image and extract features. The resolution of the input image is reduced, and the 256×256×3 size image is downsampled to a 4×4×512 size.
解码器由5层反卷积层组成,每层包含1个残差块和1个反卷积操作,用于进行图像的上采样操作,把图像恢复成原始分辨率。The decoder consists of 5 deconvolution layers, each layer contains 1 residual block and 1 deconvolution operation, which is used to upsample the image and restore the image to its original resolution.
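The resolution schedule described above can be checked with simple arithmetic. The sketch below traces the encoder's six downsampling steps from 256 to 4 and the decoder's five upsampling steps back to 128 (stride-2 sampling at every layer is assumed; the function names are illustrative):

```python
def encoder_sizes(inp=256, layers=6):
    # Each conv structure halves the spatial resolution (stride-2 assumed).
    sizes = [inp]
    for _ in range(layers):
        sizes.append(sizes[-1] // 2)
    return sizes

def decoder_sizes(bottleneck=4, layers=5):
    # Each deconv structure doubles the spatial resolution.
    sizes = [bottleneck]
    for _ in range(layers):
        sizes.append(sizes[-1] * 2)
    return sizes

print(encoder_sizes())   # [256, 128, 64, 32, 16, 8, 4]
print(decoder_sizes())   # [4, 8, 16, 32, 64, 128]
```

The decoder outputs at sizes 8 through 128 match the five appearance-flow scales (128, 64, 32, 16, 8) listed below.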
流估计模块利用编码器-解码器结构提取特征并生成一系列形状流。在解码器的输出端设置有卷积层,该卷积层利用3x3大小的卷积核对提取的外观特征图进行处理,得到两个通道的外观流,从将不同分辨率的特征图映射到流的输出,通过这种方式,得到了大小为128、64、32、16、8的5个外观流。The flow estimation module utilizes an encoder-decoder structure to extract features and generate a series of shape flows. A convolutional layer is provided at the output end of the decoder. The convolutional layer uses a 3x3 convolution kernel to process the extracted appearance feature map to obtain an appearance stream of two channels. The feature maps of different resolutions are mapped to the stream. The output, in this way, is obtained 5 appearance streams with sizes of 128, 64, 32, 16, and 8.
流估计网络模型评估鱼眼图像的失真程度并将其呈现为外观流。该网络模型利用编码器--解码器结构提取特征并生成序列。具体来说,每个解码器的反卷积结构的输出特征经过额外的3x3卷积操作,得到两个通道的外观流,最终得到5个图形的外观流。The flow estimation network model evaluates the degree of distortion of the fisheye image and renders it as an apparent flow. The network model uses an encoder-decoder structure to extract features and generate sequences. Specifically, the output features of the deconvolution structure of each decoder undergo an additional 3x3 convolution operation to obtain an appearance stream of two channels, and finally obtain an appearance stream of 5 graphics.
The flow estimation process can be written as f_s^i = G_s(I_in), where I_in is the input fisheye image, G_s denotes the flow estimation module, and f_s^i is the appearance flow output by the i-th deconvolution structure.
Step S300, distortion correction model construction: build a distortion correction network model comprising a generator, a correction layer, and a discriminator. The generator has a U-shaped encoder-decoder structure; an SKNet attention mechanism layer is introduced into the encoder to extract distortion features from the input fisheye image and output multi-scale distortion feature maps. The correction layer takes the multi-scale appearance flows and multi-scale distortion feature maps as input and corrects the distortion feature map at each scale using the corresponding appearance flow, outputting multi-scale corrected distortion feature maps. The decoder reconstructs the image from the input multi-scale corrected distortion feature maps and outputs multi-scale corrected images. The discriminator incorporates spectral normalization layers; it takes the multi-scale corrected images as input, judges whether the corrected image at each scale is a real image, and predicts a class label (real image or fake image) with the corresponding class probability for each scale.
The distortion correction model of this embodiment is similar to a GAN. The generator comprises an encoder, a decoder, a correction layer, an image processing structure, and an image correction structure; the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures.
For the first N-1 convolution structures, each serves as a downsampling structure comprising a convolution layer and a Leaky ReLU activation function, and each downsampling structure incorporates an SKNet attention mechanism layer; these downsample the input fisheye image, extract distortion features, and output distortion feature maps at the corresponding scales. The N-th convolution structure comprises a Leaky ReLU activation function and performs a feature transformation on the input distortion feature map, outputting a distortion feature map. The correction layer is connected to the flow estimation network model through the progressive complementation mechanism and operates as follows: taking the multi-scale appearance flows and multi-scale distortion feature maps as input, for the distortion feature map at each scale it applies a spatial transformation driven by the appearance flow of the corresponding scale, outputting the corrected distortion feature map at that scale; the appearance flows and distortion feature maps each have N branches in one-to-one correspondence. Each deconvolution structure serves as an upsampling structure comprising a convolution layer and a ReLU activation function; it upsamples the corrected distortion feature map of the corresponding scale and outputs the result. The output of each deconvolution structure is connected to an image processing structure comprising a convolution layer, a Tanh activation function, and a toRGB function; this structure convolves and activates the input corrected distortion feature map and converts it into an RGB image via the toRGB function, yielding corrected distortion feature maps in RGB form. The image correction structure is a convolution layer that takes the multi-scale RGB corrected distortion feature maps as input, downsamples each scale, and outputs the multi-scale corrected images.
The discriminator comprises M convolutional network structures, each consisting of a convolution layer, a Leaky ReLU activation function, and a spectral normalization layer. Taking the corrected image at each scale as input, it judges whether the corrected image is a real image and outputs a class label (real image or fake image) together with the class probability.
To address the problem of distortion divergence, this embodiment inserts a feature correction layer into the generator to pre-correct image features before they are passed on, and adds an SKNet attention mechanism layer to the generator's encoder. This enlarges the model's receptive field and improves its feature selection ability, making it better suited to complex image data; the predicted appearance flows are used to perform spatial transformations, yielding corrected feature maps.
As a specific implementation, the main part of the generator is a restoration network composed of an encoder and a decoder. The encoder extracts features from the fisheye image to obtain multi-scale distortion feature maps; each distortion feature map is corrected with the corresponding appearance flow to obtain a corrected distortion feature map, and the image is then reconstructed from the corrected distortion feature maps to generate the corrected image.
The encoder comprises six convolution structures. Each of the first five serves as a downsampling layer consisting of one convolution layer with a 3×3 kernel and stride 1 and a Leaky ReLU activation function; the last convolution structure uses a ReLU activation function without downsampling and performs a feature transformation on the extracted feature maps. The outputs of these convolution structures serve as intermediate feature representations of the generator network.
The decoder comprises five deconvolution structures, each serving as an upsampling layer composed of a 3×3 convolution kernel with stride 1 and a ReLU activation function, which upsamples the input corrected distortion feature map.
The output of each deconvolution structure is connected to an image processing structure comprising a convolution layer, a Tanh activation function, and a toRGB function. The convolution layer applies a 1×1 convolution with stride 1 to the corrected distortion feature map, followed by Tanh activation; the toRGB function then maps the corrected distortion feature map from grayscale to a 3-channel RGB image.
The image correction structure is a convolution layer with a 3×3 kernel; feeding the corrected distortion feature maps into this layer yields multi-scale corrected images of sizes 8, 16, 32, 64, and 128.
The discriminator comprises four convolutional network structures, each consisting of a convolution layer, a Leaky ReLU activation function, and a spectral normalization layer. The convolution layers of the first three structures use 5×5 kernels with stride 2, and the last uses a 5×5 kernel with stride 1. The discriminator receives a corrected image as input and progressively extracts features so that it can identify whether the input corrected image is a real image. The final convolution layer has one output channel, because the discriminator performs binary classification: it decides whether the input is real data (label 1) or fake data produced by the generator (label 0).
This embodiment introduces spectral normalization layers into the discriminator to constrain the discriminator's weight matrices, improving the stability and training efficiency of the network model. Spectral normalization is applied mainly to the discriminator's weight layers (convolution layers, fully connected layers, etc.) and limits the spectral norm of the weight matrix.
Each weight matrix W is normalized by an operation that ensures its largest singular value (the square root of the largest eigenvalue of W^T W) equals a predefined value, usually 1. The weights are adjusted before every parameter update, so the network's weights remain within a fixed range. Specifically:
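The spatial sizes produced by the four discriminator convolutions can be traced with simple arithmetic. The sketch below assumes "same" padding (so that out = ceil(in / stride)), which the embodiment does not state explicitly; the function name is illustrative:

```python
import math

def disc_feature_sizes(inp, strides=(2, 2, 2, 1)):
    """Spatial size after each 5x5 conv of the discriminator,
    assuming 'same' padding: out = ceil(in / stride)."""
    sizes = [inp]
    for s in strides:
        sizes.append(math.ceil(sizes[-1] / s))
    return sizes

print(disc_feature_sizes(128))  # [128, 64, 32, 16, 16]
```

The three stride-2 layers halve the resolution, and the final stride-1 layer keeps it, consistent with gradual feature extraction before the single-channel classification output.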
(1) Spectral norm restriction of the weights: for each weight matrix (weight layer), the spectral normalization layer constrains the spectral norm by rescaling the matrix to unit spectral norm, i.e. it ensures that the largest singular value of the matrix equals 1.
(2) Dynamic adjustment: spectral normalization is a dynamic process in which the weights are adjusted before every parameter update. Specifically, the largest singular value of the weight matrix is computed, and each column (or row) of the weight matrix is scaled accordingly so that the matrix conforms to the set maximum singular value.
(3) Position in the network: spectral normalization layers are usually attached to the weight layers of the discriminator network. As part of these layers, spectral normalization can be applied directly to the network's convolution or fully connected layers, limiting their spectral norm by adjusting their weights.
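The normalization in (1)-(2) can be illustrated by estimating the largest singular value with power iteration and dividing the weights by it. The sketch below is a minimal pure-Python illustration on a 2×2 matrix (function names and the toy matrix are illustrative, not the embodiment's implementation):

```python
def matmul_vec(M, v):
    # M v for a matrix stored as nested lists.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def spectral_norm(W, iters=50):
    """Largest singular value of W, estimated by power iteration on W^T W."""
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matmul_vec(W, v)              # u = W v
        v = matmul_vec(transpose(W), u)   # v = W^T u
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    u = matmul_vec(W, v)
    return sum(x * x for x in u) ** 0.5   # sigma = ||W v|| for unit v

W = [[3.0, 0.0], [0.0, 1.0]]             # singular values 3 and 1
sigma = spectral_norm(W)                 # ~3.0
W_sn = [[w / sigma for w in row] for row in W]
print(spectral_norm(W_sn))               # ~1.0 after normalization
```

After dividing by the estimated sigma, the matrix has unit spectral norm, which is exactly the constraint applied before each discriminator update.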
In this embodiment, an SKNet attention mechanism layer is introduced into the encoder; it consists of three parts: Split, Fuse, and Select. The first part, Split: for a given feature map X ∈ R^(C×h×w), M branches are created (M = 2 in the figure; each branch is a convolution, corresponding to the path from X to the two feature maps U in the SKNet structure diagram of Fig. 2). The kernel sizes of the two branches are 3×3 and 5×5, and each convolution consists of efficient grouped/depthwise convolution, batch normalization, and a ReLU function in sequence. In effect, depthwise and grouped convolutions are applied on the two branches, one using a 3×3 convolution and the other a 5×5 convolution (in practice a 3×3 dilated convolution with dilation D = 2).
The second part, Fuse, performs feature fusion. The feature maps U of the two Split branches are summed, and global average pooling yields a 1×1×C descriptor (s in the figure). A fully connected (FC) layer compresses s into a d×1 vector (z in the figure), where d is determined by d = max(C/r, L); r is a reduction ratio and L is the lower bound of d (L = 32 in the experiments).
The third part, Select, applies cross-channel soft attention, guided by the compressed feature z, to adaptively select information at different spatial scales. The vector z is projected back to length C by M FC layers (M = 2 in the diagram); a softmax over the channel dimension of the two resulting vectors yields the respective weight vectors, which are multiplied with the two Split outputs to produce refined feature maps, and the two refined feature maps are summed to obtain the final output V. Specifically, the softmax operator is applied along the channels: z passes through the two functions a_c and b_c, and the resulting values are multiplied with the original U1 and U2. Since the values of a_c and b_c sum to 1, weights are effectively assigned to the feature maps of the branches; because the branches use different kernel sizes, the network in effect learns to select the appropriate kernel by itself (the matrices A and B inside a_c and b_c are initialized before training and have size C×d).
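The Select step can be illustrated per channel: because the softmax weights a_c and b_c sum to 1, the output V is a convex combination of the two branch outputs. The sketch below is a minimal pure-Python illustration (function names and the two-channel toy inputs are assumptions for illustration):

```python
import math

def softmax_pair(a, b):
    # Two-way softmax; the returned weights always sum to 1.
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

def sk_select(logits_3x3, logits_5x5, u1, u2):
    """Per-channel soft selection between the 3x3 and 5x5 branch outputs.
    logits_* : channel-wise logits produced from z (via A and B);
    u1, u2   : channel descriptors of the two Split branches."""
    out = []
    for a, b, x1, x2 in zip(logits_3x3, logits_5x5, u1, u2):
        ac, bc = softmax_pair(a, b)       # a_c + b_c == 1 for every channel
        out.append(ac * x1 + bc * x2)     # V = a_c * U1 + b_c * U2
    return out

# Two channels: the first favours the 3x3 branch, the second the 5x5 branch.
v = sk_select([2.0, -2.0], [-2.0, 2.0], [1.0, 1.0], [0.0, 0.0])
print(v)  # first value close to 1 (3x3 chosen), second close to 0 (5x5 chosen)
```

When one branch's logit dominates, its weight approaches 1 and the network effectively "chooses" that kernel size for the channel, which is the behaviour described above.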
The predicted appearance flows are used to perform the spatial transformation: the distortion feature map output by the i-th convolution structure of the distortion correction network model is warped by the appearance flow of the corresponding scale, yielding the corresponding corrected distortion feature map.
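The spatial transformation performed by the correction layer is, in essence, a bilinear warp of a feature map by a two-channel flow. The sketch below is a minimal pure-Python illustration (the `warp` name, the (dy, dx) flow convention, and border clamping are assumptions; a real implementation would typically use a GPU sampler such as PyTorch's `grid_sample`):

```python
def warp(feature, flow):
    """Warp a 2-D feature map with a per-pixel 2-channel flow (dy, dx):
    output[y][x] = bilinear sample of feature at (y + dy, x + dx).
    Out-of-range samples are clamped to the border."""
    h, w = len(feature), len(feature[0])

    def sample(yf, xf):
        yf = min(max(yf, 0.0), h - 1.0)
        xf = min(max(xf, 0.0), w - 1.0)
        y0, x0 = int(yf), int(xf)
        y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
        wy, wx = yf - y0, xf - x0
        top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
        bot = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
        return top * (1 - wy) + bot * wy

    return [[sample(y + flow[y][x][0], x + flow[y][x][1])
             for x in range(w)] for y in range(h)]

feat = [[0.0, 1.0], [2.0, 3.0]]
shift_right = [[(0.0, 1.0)] * 2 for _ in range(2)]  # sample one pixel to the right
print(warp(feat, shift_right))  # [[1.0, 1.0], [3.0, 3.0]]
```

A zero flow reproduces the input unchanged; a non-zero flow displaces each pixel's sampling location, which is how the appearance flow moves distorted content back to its corrected position.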
Based on the visualization of the feature maps, this embodiment proposes a progressive complementation mechanism. In the encoder part of the distortion correction network model, successive convolution and pooling operations blur the edges of the feature maps, so the degree of distortion gradually shrinks. In the decoder part of the flow estimation network model, the distortion features passed through skip connections force the predicted flows to have larger displacements to represent larger distortions. Intuitively, this relationship can be characterized by a function D that estimates the degree of distortion of the input features and a function M that estimates the degree of displacement of the appearance flow, related through constants c and k: the greater the image distortion, the greater the displacement required for correction.
It follows that the distortion feature maps and the predicted appearance flows exhibit a clear level-by-level correspondence; both come in the sizes 128, 64, 32, 16, and 8. Through progressive complementation, the appearance flows can therefore be used to pre-correct the distortion feature maps.
In this embodiment, the distortion correction network model uses the appearance flows to correct the corresponding distortion features, and the corrected image features are supervised by a multi-scale loss to enhance performance. With the help of the progressive complementation mechanism, the distortion feature maps on the decoder side are already roughly corrected. The distortion correction network model extracts features on the encoder side; since these encoder features contain distortion, they are treated as distortion features. The features of each layer are sent to the correction layer and pre-corrected with the corresponding appearance flow; thereafter, with the help of the corrected features, the decoder can focus on content reconstruction.
With the help of the progressive complementation mechanism, the feature maps used for concatenation on the decoder side have already been roughly corrected; they contribute rich detail without passing on distorted structure, so the network can generate visually more realistic corrected images. The working process of the distortion correction network model can be summarized as follows: the corrected image at the i-th layer is produced by the distortion correction network model G_c from the pre-corrected features. To guarantee correction quality, this embodiment sends the concatenated features to a convolution layer with a 3×3 kernel to obtain multi-scale corrected images, and downsamples the real image to the same scales to supervise them.
Step S400, model training: train the flow estimation network model and the distortion correction network model on the sample set, obtaining the trained flow estimation network model and the trained distortion correction network model.
This embodiment adopts the Wasserstein distance as the loss between generator and discriminator: it measures the difference between the generated images and the real images, greatly alleviating the instability of GAN training so that the training progress of the generator and the discriminator no longer needs to be carefully balanced.
For the multi-scale loss function, an existing loss construction method can be used to build the loss from the corrected images and the real image.
Step S500, image correction: taking the fisheye image to be corrected as input, the trained flow estimation network model and the trained distortion correction network model predict the class label and class probability of the corrected image at each scale, and the final corrected image is obtained from the class probabilities.
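The Wasserstein objective can be sketched as follows. This is the standard WGAN formulation from the critic's raw (un-sigmoided) scores; the score lists and function names are illustrative, and the embodiment's full loss (including its multi-scale terms) is not reproduced here:

```python
def mean(xs):
    return sum(xs) / len(xs)

def wgan_losses(d_real, d_fake):
    """WGAN-style losses from the critic's raw scores.
    The critic maximises E[D(real)] - E[D(fake)], i.e. minimises the
    negated difference; the generator minimises -E[D(fake)]."""
    d_loss = mean(d_fake) - mean(d_real)   # critic/discriminator loss
    g_loss = -mean(d_fake)                 # generator loss
    return d_loss, g_loss

# Critic scores real images higher than fakes -> strongly negative critic loss.
d_loss, g_loss = wgan_losses(d_real=[1.0, 2.0], d_fake=[-1.0, 0.0])
print(d_loss, g_loss)  # -2.0 0.5
```

Because the loss is a difference of expectations rather than a saturating cross-entropy, its gradient stays informative even when the two distributions barely overlap, which is why the generator/discriminator balance is less delicate. Spectral normalization (above) supplies the Lipschitz constraint the Wasserstein formulation requires.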
The method of this embodiment separates correction from content reconstruction and contains two network models: flow estimation and distortion correction. First, the flow estimation network model estimates the distorted image structure and represents the result as dense appearance flows. Second, the distortion correction network model uses the flows to correct the distorted features and uses the corrected features to reconstruct a plausible result. To connect the two network models, a progressive complementation mechanism is introduced to achieve multi-level correction; the SKNet mechanism is added to the distortion correction network model to help feature extraction and to solve the problem of multi-scale information fusion, enabling the network to adapt to features at different scales. Meanwhile, spectral normalization is applied in the discriminator: constraining the spectral norm of the network's weight matrices controls the magnitude of the weights, alleviating gradient explosion and vanishing, and makes deep networks easier to train, especially in the complex game between generator and discriminator, so that the generated images are more realistic and natural.
Embodiment 2:
The present invention further provides a fisheye image correction system based on the SKNet attention mechanism, comprising an image acquisition module, a flow estimation model building module, a distortion correction model building module, a model training module, and an image correction module. The system executes the method disclosed in Embodiment 1.
The image acquisition module performs the following: obtain real images and the corresponding fisheye images to construct a sample set.
The flow estimation model building module performs the following: build a flow estimation network model with a U-shaped encoder-decoder structure, which extracts appearance features from the input fisheye image, maps the appearance feature maps into flows, and outputs multi-scale appearance flows.
In the flow estimation network model of this embodiment, the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures. Each convolution structure serves as a downsampling structure and consists of a convolution layer, a normalization layer, and a Leaky ReLU activation function connected in sequence; it downsamples the input image, extracts appearance features, and outputs an appearance feature map. Each deconvolution structure serves as an upsampling structure and consists of a residual module and a deconvolution layer; it upsamples the input appearance feature map, yielding appearance feature maps at N-1 scales. The output of each deconvolution structure is connected to a convolution layer that convolves the appearance feature map and maps it into a flow, producing an appearance flow.
As a specific implementation of the flow estimation network model, the encoder consists of 6 convolution layers, each comprising one convolution, normalization, and Leaky ReLU activation. These layers downsample the image and extract features, reducing a 256×256×3 input image to a 4×4×512 feature map.
The decoder consists of 5 deconvolution layers, each comprising one residual block and one deconvolution operation, which upsample the feature maps back toward the original resolution.
The flow estimation module uses this encoder-decoder structure to extract features and generate a series of shape flows. A convolution layer at the output of the decoder processes each extracted appearance feature map with a 3×3 kernel, producing a two-channel appearance flow; in this way, feature maps at different resolutions are mapped to flow outputs, yielding 5 appearance flows of sizes 128, 64, 32, 16, and 8.
The flow estimation network model evaluates the degree of distortion of the fisheye image and represents it as appearance flows. Specifically, the output features of each deconvolution structure in the decoder pass through an additional 3×3 convolution to produce a two-channel appearance flow, yielding 5 appearance flows in total.
The flow estimation process can be written as f_s^i = G_s(I_in), where I_in is the input fisheye image, G_s denotes the flow estimation module, and f_s^i is the appearance flow output by the i-th deconvolution structure.
The distortion correction model building module performs the following: build a distortion correction network model comprising a generator, a correction layer, and a discriminator. The generator has a U-shaped encoder-decoder structure; an SKNet attention mechanism layer is introduced into the encoder to extract distortion features from the input fisheye image and output multi-scale distortion feature maps. The correction layer takes the multi-scale appearance flows and multi-scale distortion feature maps as input and corrects the distortion feature map at each scale using the corresponding appearance flow, outputting multi-scale corrected distortion feature maps. The decoder reconstructs the image from the input multi-scale corrected distortion feature maps and outputs multi-scale corrected images. The discriminator incorporates spectral normalization layers; it takes the multi-scale corrected images as input, judges whether the corrected image at each scale is a real image, and predicts a class label (real image or fake image) with the corresponding class probability for each scale.
The distortion correction model of this embodiment is similar to a GAN. The generator comprises an encoder, a decoder, a correction layer, an image processing structure, and an image correction structure; the encoder comprises N convolution structures and the decoder comprises N-1 deconvolution structures.
For the first N-1 convolution structures, each serves as a downsampling structure comprising a convolution layer and a Leaky ReLU activation function, and each downsampling structure incorporates an SKNet attention mechanism layer; these downsample the input fisheye image, extract distortion features, and output distortion feature maps at the corresponding scales. The N-th convolution structure comprises a Leaky ReLU activation function and performs a feature transformation on the input distortion feature map, outputting a distortion feature map. The correction layer is connected to the flow estimation network model through the progressive complementation mechanism and operates as follows: taking the multi-scale appearance flows and multi-scale distortion feature maps as input, for the distortion feature map at each scale it applies a spatial transformation driven by the appearance flow of the corresponding scale, outputting the corrected distortion feature map at that scale; the appearance flows and distortion feature maps each have N branches in one-to-one correspondence. Each deconvolution structure serves as an upsampling structure comprising a convolution layer and a ReLU activation function; it upsamples the corrected distortion feature map of the corresponding scale and outputs the result. The output of each deconvolution structure is connected to an image processing structure comprising a convolution layer, a Tanh activation function, and a toRGB function; this structure convolves and activates the input corrected distortion feature map and converts it into an RGB image via the toRGB function, yielding corrected distortion feature maps in RGB form. The image correction structure is a convolution layer that takes the multi-scale RGB corrected distortion feature maps as input, downsamples each scale, and outputs the multi-scale corrected images.
The discriminator comprises M convolutional network structures, each comprising a convolution layer, a LeakyReLU activation function and a spectral normalization layer. Taking the corrected image at each scale as input, it judges whether the corrected image is a real image, outputting a class label and a class probability; the class labels are "real image" and "fake image".
To address the distortion-divergence problem, this embodiment inserts a feature correction layer into the generator, which pre-corrects the image features before they are passed on, and adds an SKNet attention mechanism layer to the generator's encoder, which enlarges the model's receptive field, improves its feature-selection ability, and makes it better suited to complex image data. The predicted appearance flows are used to perform spatial transformations, yielding the corrected feature maps.
In a specific implementation, the main part of the generator is an inpainting generation network composed of an encoder and a decoder. The encoder extracts features from the fisheye image to obtain multi-scale distortion feature maps; each distortion feature map is corrected through its appearance flow to obtain a corrected distortion feature map; the image is then reconstructed from the corrected feature maps to produce the corrected image.
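The pipeline above — extract a multi-scale feature pyramid, then correct each scale with its appearance flow — can be sketched with placeholder operators. This is a minimal structural illustration, not the patent's implementation: the "encoder" is just repeated 2×2 average pooling, the "correction" is nearest-neighbour warping, and the zero flows are stand-ins for the flow estimation network's output.

```python
import numpy as np

def encode(img, n_scales=5):
    """Downsample by 2 at each stage, collecting one feature map per scale."""
    feats, f = [], img
    for _ in range(n_scales):
        h, w = f.shape[0] // 2, f.shape[1] // 2
        f = f[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))  # 2x2 avg pool
        feats.append(f)
    return feats

def correct(feat, flow):
    """Warp a feature map with a per-pixel appearance flow (nearest sampling)."""
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return feat[sy, sx]

img = np.random.rand(256, 256)
feats = encode(img)                                 # scales 128, 64, 32, 16, 8
flows = [np.zeros(f.shape + (2,)) for f in feats]   # one flow per scale
corrected = [correct(f, fl) for f, fl in zip(feats, flows)]
print([f.shape for f in corrected])
```

The five pyramid levels match the scales (128, 64, 32, 16, 8) listed later in this embodiment.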
The encoder includes six convolution structures. Each of the first five serves as a downsampling layer comprising one convolution layer and one Leaky ReLU activation function, with a 3×3 kernel and stride 1. The last convolution structure uses a ReLU activation function without downsampling and performs a feature transformation on the extracted feature maps; the outputs of these convolution structures serve as the intermediate feature representations of the generator network.
The decoder includes five deconvolution structures. Each serves as an upsampling layer composed of a 3×3 convolution kernel with stride 1 and a ReLU activation function, and upsamples the input corrected distortion feature map.
The output of each deconvolution structure is connected to an image processing structure comprising a convolution layer, a Tanh activation function and a torgb function. The convolution layer applies a 1×1 convolution with stride 1 to the corrected distortion feature map, followed by Tanh activation; the torgb function then maps the grayscale corrected distortion feature map to a 3-channel RGB image.
The image correction structure is a convolution layer with a 3×3 kernel. The corrected distortion feature maps are fed into this layer for convolution to obtain multi-scale corrected images with sizes of 8, 16, 32, 64 and 128.
The discriminator includes four convolutional network structures, each comprising a convolution layer, a LeakyReLU activation function and a spectral normalization layer. The convolution layers in the first three structures use 5×5 kernels with stride 2; the last uses a 5×5 kernel with stride 1. Taking a corrected image as input, the discriminator progressively extracts features so that it can identify whether the input corrected image is a real image. The final convolution layer has one output channel, since the discriminator's task is binary classification: deciding whether the input is real data (1) or generator-produced fake data (0).
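The spectral normalization applied in each discriminator block divides a layer's weights by an estimate of their largest singular value, bounding the layer's Lipschitz constant near 1. A minimal numpy sketch of that computation via power iteration — illustrative only; practical implementations reshape the 4-D convolution kernel to 2-D and reuse the vector `u` across training steps:

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Scale W by an estimate of its largest singular value (power iteration)."""
    u = np.random.RandomState(0).randn(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimated largest singular value
    return W / sigma

W = np.random.RandomState(1).randn(8, 8)   # stand-in for a flattened kernel
W_sn = spectral_normalize(W)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # close to 1.0
```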
In this embodiment, an SKNet attention mechanism layer is introduced into the encoder. The SKNet attention mechanism layer consists of three parts: split, fuse and select. The first part is split: for a given feature map X ∈ C×h×w, M branches are created by default (M = 2 here; each branch is a convolution, corresponding to the mapping from X to the two feature maps U in the SKNet structure diagram of Fig. 2). The two branches use kernel sizes of 3×3 and 5×5 respectively; each convolution is in practice an efficient grouped/depthwise convolution followed by batch normalization and a ReLU function. In other words, depthwise and grouped convolutions are applied on the two branches, one branch using a 3×3 convolution and the other a 5×5 convolution (actually implemented as a 3×3 dilated convolution with dilation rate D = 2).
The second part, fuse, is feature fusion: the feature maps U of the two split branches are summed element-wise, then global average pooling produces a 1×1×C descriptor (s in the figure). A fully connected (FC) layer aggregates and compresses s into a d×1 vector (z in the figure), where d is determined by the formula d = max(C/r, L); r is a reduction ratio and L is a lower bound on d (L = 32 in the experiments).
The third part, select, uses cross-channel soft attention, guided by the compressed feature z, to adaptively select among the different spatial scales of information. The vector z is passed through M FC layers (M = 2 in the diagram) to recover vectors of length C; a softmax over the two vectors along the channel dimension yields the respective weight vectors, which are multiplied with the two split outputs to obtain refined feature maps; summing the two refined feature maps gives the final output V. Concretely, the softmax operator is applied per channel: z passes through the two functions a_c and b_c, and the resulting values are multiplied with the original U1 and U2. Since a_c + b_c = 1 for every channel, the branches' feature maps are weighted, and because the branches use different kernel sizes, the network effectively selects the appropriate kernel on its own (the A and B matrices inside a_c and b_c are initialized before training, each of size C×d).
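The fuse/select stages above can be sketched in numpy: sum the branch maps, pool to s, compress to z with d = max(C/r, L), expand back to two C-length logits, and softmax across branches so the per-channel weights sum to 1. The FC weights below are random stand-ins for the trained A and B matrices:

```python
import numpy as np

rng = np.random.RandomState(0)
C, H, W, r, L = 64, 8, 8, 16, 32
U1, U2 = rng.rand(C, H, W), rng.rand(C, H, W)  # 3x3- and 5x5-branch outputs

s = (U1 + U2).mean(axis=(1, 2))                # fuse: global average pool -> (C,)
d = max(C // r, L)                             # compressed width; here max(4, 32) = 32
z = np.tanh(rng.randn(d, C) @ s)               # z = FC(s) -> (d,)

logits = np.stack([rng.randn(C, d) @ z,        # A @ z -> branch-1 logits
                   rng.randn(C, d) @ z])       # B @ z -> branch-2 logits
e = np.exp(logits - logits.max(axis=0))
a, b = e / e.sum(axis=0)                       # per-channel softmax: a_c + b_c = 1

V = a[:, None, None] * U1 + b[:, None, None] * U2  # select: weighted sum
print(V.shape, np.allclose(a + b, 1.0))
```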
The predicted appearance flows are used to perform the spatial transformation: the distortion feature map output by the i-th convolution structure of the distortion correction network model is warped by the appearance flow of the corresponding scale to obtain its corrected distortion feature map.
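The warping step can be sketched as bilinear sampling of the feature map at flow-displaced locations (framework warpers such as PyTorch's `grid_sample` perform the same operation). This is an illustrative sketch, not the patent's code; the zero flow at the end is just a sanity check that warping reduces to the identity:

```python
import numpy as np

def warp(feat, flow):
    """Bilinearly sample feat (H, W) at positions shifted by flow (H, W, 2) = (dy, dx)."""
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(ys + flow[..., 0], 0, h - 1)
    x = np.clip(xs + flow[..., 1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

F = np.arange(16.0).reshape(4, 4)
assert np.allclose(warp(F, np.zeros((4, 4, 2))), F)  # zero flow = identity
```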
Based on the visualization results of the feature maps, this embodiment proposes the progressive complementation mechanism. In the encoder part of the distortion correction network model, successive convolution and pooling operations blur the edges of the feature maps, so the degree of distortion gradually shrinks. In the decoder part of the flow estimation network model, the distortion features passed through the skip connections force the predicted flows to take larger displacements to represent larger distortions. Intuitively, the degree of deformation and the degree of displacement are related as follows: D is a function estimating the degree of distortion of the input features, M is a function estimating the degree of displacement of the appearance flow, and c and k are constants; the greater the image distortion, the greater the displacement required for correction.
The formula shows a clear level-by-level correspondence between the distortion feature maps and the appearance flows, whose sizes match at 128, 64, 32, 16 and 8.
Through progressive complementation, the appearance flows can be used to pre-correct the distortion feature maps.
In this embodiment, the distortion correction network model uses the appearance flows to correct the corresponding distortion features, and the corrected image features are supervised by a multi-scale loss to enhance performance. With the help of the progressive complementation mechanism, the distortion feature maps on the decoder side are already roughly corrected. The model extracts features in the encoder; since the encoder features contain distortion, they are treated as distortion features. The features of each layer are sent to the correction layer and pre-corrected with the corresponding appearance flow. Thereafter, aided by the corrected features, the decoder can focus on content reconstruction.
With the help of the progressive complementation mechanism, the feature maps used for concatenation in the decoder have already been roughly corrected. The corrected feature maps contribute rich detail without passing distorted structure onward, so the network can generate visually more realistic corrected images. In the working process of the distortion correction network model Gc, the corrected image at each layer i is produced from the corresponding features. To guarantee correction quality, this embodiment feeds the concatenated features into a convolution layer with a 3×3 kernel to obtain multi-scale corrected images, and downsamples the real image to the same scales to supervise them.
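The multi-scale supervision step — average-pooling the ground-truth image down to each output scale so every corrected image has a same-size target — can be sketched as follows. A hedged illustration: the patent does not specify the downsampling operator, and average pooling is one common choice:

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2-D image by an integer factor."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

gt = np.random.rand(128, 128)                       # stand-in ground-truth image
scales = (128, 64, 32, 16, 8)                       # output sizes from this embodiment
targets = {s: downsample(gt, 128 // s) for s in scales}
print(sorted(t.shape for t in targets.values()))
```

Each corrected image at scale s would then be compared against `targets[s]` by the multi-scale loss.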
The model training module performs the following: the flow estimation network model and the distortion correction network model are trained on the sample set, yielding the trained flow estimation network model and the trained distortion correction network model.
This embodiment introduces the Wasserstein distance as the loss function between the generator and the discriminator: the Wasserstein distance measures the difference between the generated images and the real images, which resolves the instability of GAN training and removes the need to carefully balance the training of the generator and the discriminator.
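The WGAN objective can be sketched in a few lines: the critic maximizes E[D(real)] − E[D(fake)] (equivalently minimizes the negation), and the generator minimizes −E[D(fake)]. The critic scores below are random placeholders; real training backpropagates through the networks:

```python
import numpy as np

rng = np.random.RandomState(0)
d_real = rng.randn(32)   # critic scores on a batch of real images (placeholder)
d_fake = rng.randn(32)   # critic scores on a batch of generated images (placeholder)

critic_loss = d_fake.mean() - d_real.mean()  # minimized by the critic
gen_loss = -d_fake.mean()                    # minimized by the generator
print(float(critic_loss), float(gen_loss))
```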
For the multi-scale loss function, an existing loss-function construction method may be adopted, building the loss from the corrected images and the real images.
The image correction module performs the following: taking the fisheye image to be corrected as input, the trained flow estimation network model and the trained distortion correction network model predict and output the class label and class probability of the corrected image at each scale, and the final corrected image is obtained on the basis of the class probabilities.
The present invention has been shown and described in detail above with reference to the drawings and preferred embodiments. The invention is, however, not limited to the disclosed embodiments; based on the several embodiments above, those skilled in the art will appreciate that features of the different embodiments can be combined to obtain further embodiments, which likewise fall within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311571580.5A CN117522748A (en) | 2023-11-23 | 2023-11-23 | Fisheye image correction method and system based on SKNet attention mechanism |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117522748A true CN117522748A (en) | 2024-02-06 |
Family
ID=89764121
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117952877A (en) * | 2024-03-26 | 2024-04-30 | 松立控股集团股份有限公司 | Low-quality image correction method based on hierarchical structure modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||