
CN120640000A - Multi-scale semantic guided image compression method, system and storage medium - Google Patents

Multi-scale semantic guided image compression method, system and storage medium

Info

Publication number
CN120640000A
Authority
CN
China
Prior art keywords
scale
semantic
compression method
image compression
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511121158.9A
Other languages
Chinese (zh)
Inventor
周开军
廖婷
周鲜成
谭平
覃业梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangjiang Laboratory
Original Assignee
Xiangjiang Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangjiang Laboratory filed Critical Xiangjiang Laboratory
Priority to CN202511121158.9A priority Critical patent/CN120640000A/en
Publication of CN120640000A publication Critical patent/CN120640000A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract


The present invention discloses a multi-scale semantic-guided image compression method, system and storage medium, comprising the following steps: acquiring input image data, performing a preprocessing operation on the input image to obtain standardized image data; inputting the standardized image data into a pre-trained semantic segmentation network to generate a multi-scale semantic feature map and its corresponding semantic weight map; constructing a three-level pyramid encoder to perform the following steps on the standardized image data: depthwise separable convolution downsampling to generate multi-scale features; performing a nonlinear transformation on the multi-scale features using a reversible neural network; decomposing the nonlinearly transformed multi-scale features into low-frequency sub-bands and high-frequency sub-bands through adaptive discrete wavelet transform, performing dynamic selective state space modeling on the high-frequency sub-bands based on the semantic weight map to generate a compressed code stream; inputting the compressed code stream into a decoder, decoding based on a lightweight Mamba module, and reconstructing the image by combining inverse wavelet transform with the semantic weight map.

Description

Multi-scale semantic guided image compression method, system and storage medium
Technical Field
The invention relates to the technical field of image data processing, in particular to a multi-scale semantic guided image compression method, a system and a storage medium.
Background
In recent years, image compression methods based on deep learning have gradually replaced traditionally hand-designed compression algorithms. However, existing compression models based on attention mechanisms or convolutional structures still struggle with long-range dependencies and semantic region preservation. Especially in low-bit-rate scenarios, the perceptual quality of important regions (such as faces and text) is difficult to guarantee, and key regions become blurred and semantically distorted. Existing deep-learning-based models also find it difficult to balance long-range dependency modeling against computational efficiency.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. The invention therefore provides a multi-scale semantic guided image compression method, system and storage medium that address the low modeling efficiency and severe compression distortion current image compression algorithms exhibit when trying to model long-range dependencies efficiently, preserve image quality in key semantic regions, and compress high-resolution images.
According to an embodiment of the first aspect of the invention, a multi-scale semantic guided image compression method comprises the following steps:
S100, acquiring input image data, and preprocessing the input image to obtain standardized image data X;
S200, inputting the standardized image data X into a pre-trained semantic segmentation network to generate multi-scale semantic feature maps S_k and their corresponding semantic weight map W;
S300, constructing a three-level pyramid encoder and executing, stage by stage, on the standardized image data X:
S301, depthwise separable convolution downsampling to generate multi-scale features F_k;
S302, performing a nonlinear transformation on the multi-scale features F_k with a reversible neural network;
S303, decomposing the nonlinearly transformed multi-scale features through adaptive discrete wavelet transform into a low-frequency subband X_L and high-frequency subbands X_H;
S400, performing dynamic selective state space modeling on the high-frequency subbands X_H based on the semantic weight map W, comprising:
updating the state through a bidirectional scanning mechanism;
generating a dynamic convolution kernel;
enhancing features through channel-space dual attention gating;
S500, performing non-uniform quantization and entropy coding on the features modeled at each layer in step S400 to generate a compressed code stream;
S600, inputting the compressed code stream into a decoder, decoding based on a lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight map W.
According to some embodiments of the invention, in S200, inputting the standardized image data X into a pre-trained semantic segmentation network to generate multi-scale semantic feature maps S_k and their corresponding semantic weight map W comprises the following steps:
extracting multi-scale features with the Xception65 backbone of a DeepLabv3+ network, generating a fused semantic feature map with an atrous spatial pyramid pooling (ASPP) module, aligning it to the coding layers through bilinear interpolation, and generating the semantic weight map W by taking the maximum value along the channel dimension.
According to some embodiments of the invention, in S300, the depthwise separable convolution downsampling generates multi-scale features comprising:
original-scale features F_0 ∈ R^{H×W×C}, half-scale features F_1 ∈ R^{(H/2)×(W/2)×C} and quarter-scale features F_2 ∈ R^{(H/4)×(W/4)×C};
wherein R denotes real space, H the feature map height, and W the feature map width.
According to some embodiments of the invention, in S300, performing the nonlinear transformation on the multi-scale features with the reversible neural network comprises:
the reversible neural network performs a forward transform:
y_1 = x_1 + F(x_2);
y_2 = x_2 + G(y_1);
(y_1, y_2) = T(x_1, x_2);
the reversible neural network performs an inverse transform:
x_2 = y_2 − G(y_1);
x_1 = y_1 − F(x_2);
(x_1, x_2) = T^{-1}(y_1, y_2);
Wherein:
x_1 and x_2 are the two parts of the multi-scale features split along the channel dimension;
y_1 and y_2 are the output features;
F and G are three-layer convolutional residual blocks;
T denotes the forward mapping function;
T^{-1} denotes the inverse mapping function.
According to some embodiments of the invention, in S400, the bidirectional scanning mechanism comprises:
the forward state update equation:
h_t^f = A ⊙ h_{t−1}^f + B · DSC(x_t);
the backward state update equation:
h_t^b = A ⊙ h_{t+1}^b + B · DSC(x_t);
Wherein:
h_t^f and h_t^b respectively denote the forward and backward states;
A ∈ R^{C×1} is the state memory weight;
B ∈ R^{C×C} is the input gating weight matrix;
DSC(·) is the depthwise separable convolution operation with a dynamically generated kernel;
⊙ is the element-wise product.
According to some embodiments of the invention, in S400, the dynamic convolution kernel generation comprises:
generating a query Q, a key K and a value V based on the semantic weight map W and the high-frequency subbands X_H, and computing the convolution kernel parameter matrix Ω through multi-head attention:
Ω = softmax(QK^T / √d)·V;
wherein d is the dimension of an attention head, softmax(·) is the normalized exponential function, and T denotes matrix transposition.
According to some embodiments of the invention, in S400, the channel-space dual attention gating enhancement of features comprises:
channel attention:
M_c = σ(MLP(GAP(X)));
spatial attention:
M_s = σ(Conv([AvgPool(X); MaxPool(X)]));
final output features:
X' = X ⊙ M_c ⊙ M_s;
Wherein:
X is the input feature;
GAP is global average pooling;
MLP is a multi-layer perceptron;
σ is the sigmoid activation function;
⊙ denotes the element-wise product.
According to some embodiments of the invention, in S500, the non-uniform quantization comprises:
adaptively adjusting the quantization step size according to the semantic weight map:
Δ_{i,j} = Δ_0 / (1 + λ·W_{i,j});
Wherein:
Δ_0 is the base quantization step size;
W_{i,j} is the response intensity at position (i, j) in the semantic weight map;
λ is the adjustment coefficient;
the quantization operation is defined as:
Ẑ_{i,j} = round(Z_{i,j} / Δ_{i,j});
Wherein: (i, j) denotes the position; Z_{i,j} is the feature value to be quantized, output by the dynamic SSM modeling; round(·) is the rounding function; Δ_{i,j} is the adaptive quantization step size, dynamically adjusted by the semantic weight.
According to some embodiments of the invention, in S500, the entropy encoding comprises:
constructing a joint probability model for the latent variables based on a state-space-modeling super-prior network:
p(ẑ) = ∏_i N(ẑ_i; μ_i, σ_i²);
Wherein:
N denotes a Gaussian distribution;
μ_i and σ_i respectively denote the predicted mean and standard deviation of the i-th feature dimension, generated by the super-prior network:
μ = f_μ(concat(u, c̃)); σ = f_σ(concat(u, c̃));
Wherein:
f_μ and f_σ are lightweight state-space-modeling neural network modules that estimate the parameters for each feature location;
concat(·, ·) denotes the channel splicing operation;
u denotes the super-prior feature and c̃ is a decoded feature.
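The effect of such a Gaussian entropy model on code length can be illustrated by integrating each unit-width quantization bin under N(μ, σ²): symbols far from the predicted mean receive less probability mass and therefore cost more bits. A minimal NumPy sketch (the unit bin width and the example values are assumptions for illustration):

```python
import math
import numpy as np

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution of N(mu, sigma^2), vectorized via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)((x - mu) / (sigma * math.sqrt(2.0))))

def code_length_bits(z_hat, mu, sigma):
    """-log2 of the probability mass in the unit-width bin around each symbol."""
    p = gaussian_cdf(z_hat + 0.5, mu, sigma) - gaussian_cdf(z_hat - 0.5, mu, sigma)
    return -np.log2(np.maximum(p, 1e-12))

mu = np.zeros(4)
sigma = np.ones(4)
well_predicted = code_length_bits(np.zeros(4), mu, sigma)        # symbols at the mean
poorly_predicted = code_length_bits(np.full(4, 3.0), mu, sigma)  # symbols 3 sigma away
```

A better super-prior prediction (mean close to the actual symbol, small standard deviation) therefore directly translates into a shorter code stream.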
According to some embodiments of the invention, in S600, inputting the compressed code stream into a decoder, decoding based on the lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight map W comprises the following steps:
the Mamba module decodes with the state update:
h_t = g(x_t) ⊙ (A·h_{t−1} + B·x_t);
Wherein:
h_t is the state vector at the current moment;
A and B are state transition coefficients generated under semantic guidance;
g(·) is the gating function;
inverse wavelet transform reconstruction of features:
X̂_k = Φ^{-1}(X_L^k, X_H^k);
Wherein:
X̂_k is the reconstructed feature map of the k-th layer;
Φ^{-1} is the learnable inverse wavelet transform operator;
multi-scale semantic fusion of the output image:
X̂ = Σ_k α_k · X̂_k;
Wherein:
α_k are learnable fusion weights.
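The inverse-wavelet reconstruction step can be illustrated with a fixed (non-learnable) Haar transform; the patent's operator is learnable, so this is only a sketch of the perfect-reconstruction property the decoder relies on, with one low-frequency band and three directional high-frequency bands:

```python
import numpy as np

def haar_dwt(x):
    """One-level 2-D Haar transform: returns (LL, (LH, HL, HH))."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency: main structure
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt(ll, highs):
    """Inverse of haar_dwt: recombines the four subbands exactly."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
ll, highs = haar_dwt(x)
rec = haar_idwt(ll, highs)
```

The three high-frequency subbands correspond to the horizontal, vertical and diagonal detail directions described for the encoder, and the round trip is exact up to floating-point precision.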
According to some embodiments of the invention, in S500, the method further comprises median deviation mapping quantization encoding, comprising the following steps:
mapping the latent feature value z to a median reference coordinate system, looking up the median m_k of the interval to which it belongs, and calculating the deviation value:
δ = z − m_k;
symmetrically and discretely quantizing the deviation:
q̂ = round(δ / γ);
wherein γ is the quantization step size;
generating a ternary (i, j, q̂) as the compressed representation;
Wherein: (i, j) is the spatial position index and q̂ is the quantized deviation.
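A sketch of this encode/decode round trip follows. The interval edges are hypothetical (the patent does not specify the partition), and interval midpoints stand in for the interval medians; the point of the scheme is that only a small deviation index must be coded once the interval is known:

```python
import numpy as np

GAMMA = 0.25  # assumed quantization step for the deviation

def median_deviation_encode(z, edges, gamma=GAMMA):
    """Map each value to its interval, then quantize the deviation from the
    interval's reference point (midpoint used here as a stand-in median)."""
    k = np.clip(np.searchsorted(edges, z) - 1, 0, len(edges) - 2)
    medians = (edges[k] + edges[k + 1]) / 2.0
    q = np.round((z - medians) / gamma)
    return k, q

def median_deviation_decode(k, q, edges, gamma=GAMMA):
    medians = (edges[k] + edges[k + 1]) / 2.0
    return medians + q * gamma

edges = np.linspace(-4.0, 4.0, 9)          # 8 hypothetical intervals of width 1
rng = np.random.default_rng(5)
z = rng.normal(size=32).clip(-3.9, 3.9)    # latent feature values
k, q = median_deviation_encode(z, edges)
z_hat = median_deviation_decode(k, q, edges)
```

With interval width 1 and γ = 0.25, the reconstruction error is bounded by γ/2, while the deviation index q only takes a handful of small symmetric values around zero.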
According to an embodiment of the second aspect of the invention, a multi-scale semantic guided image compression system comprises a memory and a processor, wherein the processor implements the multi-scale semantic guided image compression method described above when executing a computer program stored in the memory.
According to an embodiment of the third aspect of the present invention, a storage medium stores a program of the multi-scale semantic guided image compression method, which, when executed by a processor, implements the multi-scale semantic guided image compression method described above.
The multi-scale semantic guided image compression method, system and storage medium have the following advantages. By introducing a dynamic selective state space modeling mechanism that combines bidirectional scanning with semantic-sensitive attention gating, the computational complexity is effectively reduced, the detail capture and global context understanding of key image regions such as faces and text are enhanced, and the image detail blurring caused by high-frequency information loss in traditional compression methods is overcome. The multi-scale-wavelet joint coding architecture, built by fusing the reversible neural network with the adaptive wavelet transform, achieves lossless compression of the low-frequency subband and avoids low-frequency distortion, while lightweight dynamic convolution coding of the high-frequency subbands retains texture details to meet the real-time processing requirements of edge devices. At the decoding end, a selective state space activation mechanism based on the Mamba decoding structure is introduced: only key channels are retained to participate in image reconstruction, which markedly reduces the decoding computation, and the reconstructed outputs are fused through the inverse wavelet transform and the semantic weight map W.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of a computer device in a hardware operating environment according to one embodiment of the invention;
FIG. 2 is a flow diagram of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of the overall network structure of a multi-scale semantic guided image compression method according to an embodiment of the present invention;
FIG. 4 is a block diagram of an ASPP module of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 5 is a block diagram of a multi-scale-wavelet coding module of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 6 is a diagram of a dynamic SSM module architecture of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 7 is a median deviation map quantized encoding strategy diagram of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 8 is a diagram of a decoding module of a multi-scale semantic guided image compression method according to one embodiment of the present invention;
FIG. 9 is a block diagram of a multi-scale semantic guided image compression system according to one embodiment of the present invention.
Reference numerals:
Processor 1001, communication bus 1002, user interface 1003, network interface 1004, memory 1005.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, plural means two or more. The description of the first and second is for the purpose of distinguishing between technical features only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.
Referring to fig. 1, fig. 1 is a schematic diagram of a computer device structure of a hardware running environment according to an embodiment of the present application.
As shown in FIG. 1, the computer device may include a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display and an input unit such as a Keyboard; the optional user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable nonvolatile memory (NVM), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is not limiting of a computer device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a multi-scale semantic guided image compression program may be included in the memory 1005 as one storage medium.
In the computer device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, the user interface 1003 is mainly used for data interaction with a user, the processor 1001 and the memory 1005 in the application can be arranged in the computer device, and the computer device calls a multi-scale semantic guidance image compression program based on dynamic state space modeling stored in the memory 1005 through the processor 1001 and executes the multi-scale semantic guidance image compression method based on dynamic state space modeling provided by the embodiment of the application.
Referring to fig. 2, the invention discloses a multi-scale semantic guided image compression method, which comprises the following steps:
S100, acquiring input image data, and preprocessing the input image to obtain standardized image data X;
S200, inputting the standardized image data X into a pre-trained semantic segmentation network to generate multi-scale semantic feature maps S_k and their corresponding semantic weight map W;
S300, constructing a three-level pyramid encoder and executing, stage by stage, on the standardized image data X:
S301, depthwise separable convolution downsampling to generate multi-scale features F_k;
S302, performing a nonlinear transformation on the multi-scale features F_k with a reversible neural network;
S303, decomposing the nonlinearly transformed multi-scale features through adaptive discrete wavelet transform into a low-frequency subband X_L and high-frequency subbands X_H;
S400, performing dynamic selective state space modeling on the high-frequency subbands X_H based on the semantic weight map W, comprising:
updating the state through a bidirectional scanning mechanism;
generating a dynamic convolution kernel;
enhancing features through channel-space dual attention gating;
S500, performing non-uniform quantization and entropy coding on the features modeled at each layer in step S400 to generate a compressed code stream;
S600, inputting the compressed code stream into a decoder, decoding based on a lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight map W.
In this embodiment, the input RGB image undergoes color space conversion (RGB→YUV), normalization ([0, 255] → [−1, 1]) and anomaly filtering to generate a normalized image. A semantic segmentation network (DeepLabv3+) extracts the multi-scale feature maps and generates the semantic weight map by taking the maximum value along the channel dimension. The encoder adopts a three-level pyramid structure:
depthwise separable convolution downsampling (stride = 2);
the INN block performs a coupled transformation: y_1 = x_1 + F(x_2), y_2 = x_2 + G(y_1);
the adaptive discrete wavelet transform decomposes the nonlinearly transformed multi-scale features into a low-frequency subband X_L and high-frequency subbands X_H.
The dynamic SSM coding module fuses dynamic convolution and an attention mechanism into the state space model, performing state space modeling and dynamic coding of the image data and enhancing the modeling capacity for key regions. The dynamic SSM coding module performs semantically guided state space modeling on the high-frequency subbands X_H; the entropy coding adopts a non-uniform quantization strategy; and at the decoding end a lightweight Mamba module and the inverse wavelet transform (IDWT) reconstruct the image. The semantic weight map W serves as a spatial-importance prior that guides the dynamic SSM coding module's modeling, the quantization step allocation and the decoding-path activation, realizing on-demand resource allocation.
In some embodiments of the present invention, in step S100, input image data is acquired by an image acquisition module, and color space conversion, normalization and abnormal-image filtering are performed to generate normalized image data. Generating the normalized image data comprises: converting the RGB image into the YUV color space with a standard conversion matrix; separating the luminance (Y) and chrominance (UV) components, preserving the Y channel for subsequent compression and subsampling the UV components to reduce the amount of data; linearly normalizing the YUV channel pixel values while preserving sign bits to support the negative-value computations of the subsequent wavelet transform; computing an image sharpness score with the Laplacian operator and counting the proportion of luminance-channel pixel values exceeding a threshold to determine and filter out abnormal images; and finally outputting the preprocessed normalized image data X ∈ R^{H×W×3}, whose dimensions are consistent with the original input.
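As a concrete illustration of the preprocessing above, a minimal NumPy sketch of the RGB→YUV conversion, signed normalization and Laplacian sharpness score follows. The BT.601 conversion matrix and the exact normalization constants are assumptions, since the text names only the operations:

```python
import numpy as np

# BT.601 RGB -> YUV matrix (an assumption; the text only says "standard
# conversion matrix").
RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                    [-0.147, -0.289,  0.436],
                    [ 0.615, -0.515, -0.100]])

def preprocess(rgb):
    """Convert an HxWx3 uint8 RGB image to signed, normalized Y and UV."""
    yuv = rgb.astype(np.float64) @ RGB2YUV.T   # color space conversion
    y = yuv[..., 0] / 255.0 * 2.0 - 1.0        # luma: [0, 255] -> [-1, 1]
    uv = yuv[..., 1:] / 127.5                  # chroma is already signed
    return y, uv

def sharpness_score(y):
    """Mean absolute response of a 3x3 Laplacian, used for anomaly filtering."""
    lap = (-4.0 * y[1:-1, 1:-1] + y[:-2, 1:-1] + y[2:, 1:-1]
           + y[1:-1, :-2] + y[1:-1, 2:])
    return float(np.abs(lap).mean())
```

A completely flat frame scores zero sharpness, which is the kind of degenerate input the anomaly filter is meant to reject.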
In step S200, inputting the normalized image data X into a pre-trained semantic segmentation network to generate multi-scale semantic feature maps and their corresponding semantic weight map W comprises the following steps:
extracting multi-scale features with the Xception65 backbone of a DeepLabv3+ network, generating a fused semantic feature map with an atrous spatial pyramid pooling (ASPP) module, aligning it to the coding layers through bilinear interpolation, and generating the semantic weight map W by taking the maximum value along the channel dimension.
In this embodiment, a pre-trained network is used to extract multi-scale semantic features, and the semantic weight map generated via atrous spatial pyramid pooling provides semantic guidance for the subsequent encoding.
It should be noted that a pre-trained DeepLabv3+ model is used, with Xception65 as the backbone network. The normalized image data X passes through the DeepLabv3+ convolution layers in sequence for feature extraction, yielding low-level features (C1 = 256), mid-level features (C2 = 512) and high-level features (C3 = 1024).
It should be noted that, for multi-scale context modeling, the high-level features are input to the ASPP module and, as shown in fig. 4, processed by five parallel branches:
branch 1: a 1×1 standard convolution, keeping the spatial dimensions unchanged;
branches 2-4: 3×3 atrous convolutions with dilation rates of 6, 12 and 18 respectively;
branch 5: global average pooling, followed by upsampling to the original spatial dimensions and a 1×1 convolution to restore the channel dimension.
The five branch outputs are spliced along the channel dimension and fused through a 1×1 convolution to obtain the multi-scale feature fusion map F_ASPP.
For semantic segmentation map generation, F_ASPP is input into a 3×3 convolution and a Softmax layer to generate a semantic prediction map S ∈ R^{H×W×Cs}, where Cs denotes the number of semantic categories.
S is upsampled through bilinear interpolation to align with the input resolution of each encoder level, generating a three-level semantic feature map {S_0, S_1, S_2} corresponding to the original, 1/2 and 1/4 resolutions respectively.
Subsequently, for each multi-scale semantic feature map S_k, the maximum value is taken along the channel dimension to obtain the weight map:
W_k(i, j) = max_c S_k(i, j, c);
The semantic weight maps W_k will be used for dynamic attention generation and important-region guidance during the encoding and compression stages.
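The channel-wise maximum that collapses class probabilities into a single per-pixel importance score can be sketched as follows; the softmax over class logits is an assumption standing in for the network's prediction head:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weight_map(logits):
    """Collapse an HxWxCs semantic prediction into an HxW weight map by
    taking the maximum class probability at each spatial position."""
    probs = softmax(logits, axis=-1)
    return probs.max(axis=-1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, 3))  # hypothetical 4x5 map, Cs = 3 classes
Wm = weight_map(logits)
```

Confidently classified pixels (one dominant class) receive weights near 1, while ambiguous pixels sit near 1/Cs, which is exactly the spatial-importance signal the encoder consumes.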
In some embodiments of the invention, in S300, the depthwise separable convolution downsampling generates multi-scale features comprising:
original-scale features F_0, half-scale features F_1 and quarter-scale features F_2.
In this embodiment, the multi-scale-wavelet joint coding module builds a three-level pyramid structure using depthwise separable convolutions with stride 2 for stage-by-stage downsampling, then performs the adaptive wavelet transform on the feature map of each scale to decompose the image features into a low-frequency subband X_L and high-frequency subbands X_H, with lossless compression achieved through the reversible neural network.
Specifically, each stage of the pyramid encoder performs the following operations, as shown in fig. 5:
initial downsampling and feature extraction: the normalized image data X is input to a three-stage depthwise separable convolution network to generate three levels of feature expressions:
first stage: input the normalized image data X, output F_0;
second stage: input F_0, output F_1;
third stage: input F_1, output F_2;
wherein the depthwise separable convolution operation is defined as:
DSC(X) = Conv_{1×1}(DWConv_{3×3}(X));
where DSC(·) is composed of a 3×3 depthwise convolution followed by a 1×1 pointwise convolution in series, which markedly reduces the computational cost and suits edge computing devices.
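The cost saving of the depthwise separable factorization can be checked by counting weights. This sketch compares a standard 3×3 convolution with the depthwise-plus-pointwise version for an assumed 64-in/64-out layer (the channel counts are illustrative, not from the patent):

```python
def conv_params(k, cin, cout):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * cin * cout

def separable_params(k, cin, cout):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    return k * k * cin + cin * cout

std = conv_params(3, 64, 64)        # standard 3x3: 9 * 64 * 64
sep = separable_params(3, 64, 64)   # depthwise 9 * 64 plus pointwise 64 * 64
ratio = std / sep
```

For these dimensions the separable form needs roughly 8x fewer weights (and proportionally fewer multiply-accumulates), which is why it suits edge devices.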
Reversible neural network (INN) transformation: each level of features F_k is split into two parts x_1 and x_2 along the channel dimension, and the following coupling-layer forward mapping is performed:
y_1 = x_1 + F(x_2);
y_2 = x_2 + G(y_1);
The output is (y_1, y_2), where F and G are three-layer convolutional residual blocks, each layer containing BatchNorm (batch normalization), LeakyReLU (leaky rectified linear unit) and a 3×3 convolution.
Adaptive wavelet decomposition: the INN output features F' are input to a custom wavelet decomposition module, which performs a learnable discrete wavelet transform (Learnable DWT):
(X_L, X_H) = LearnableDWT(F');
Wherein:
X_L is the low-frequency subband, keeping the main structural information of the image;
X_H are the high-frequency subbands, keeping texture and edge details, which are further sent to the state space modeling module for processing.
The high-frequency subband size and channel number are X_H ∈ R^{(H/2)×(W/2)×3C}, where the triple channels correspond to the horizontal, vertical and diagonal detail directions of the wavelet transform.
It should be noted that the reversible neural network INN supports exact inverse transformation, with the inverse functions defined as:
x_2 = y_2 − G(y_1);
x_1 = y_1 − F(x_2);
(x_1, x_2) = T^{-1}(y_1, y_2);
where T^{-1} denotes the inverse mapping function, ensuring that the encoder output can be recovered by the INN in the decoding stage and satisfying the lossless compression requirement of the low-frequency subband.
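The exact invertibility of additive coupling can be demonstrated in a few lines. The dense tanh maps below are stand-ins for the three-layer convolutional residual blocks F and G, chosen only to show that the coupling is invertible by construction for arbitrary F and G:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the residual blocks F and G: any functions work,
# because additive coupling never needs to invert F or G themselves.
Wf = rng.normal(size=(4, 4))
Wg = rng.normal(size=(4, 4))
F = lambda x: np.tanh(x @ Wf)
G = lambda x: np.tanh(x @ Wg)

def forward(x1, x2):
    """Coupling-layer forward map: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    """Exact inverse: x2 = y2 - G(y1), then x1 = y1 - F(x2)."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
r1, r2 = inverse(*forward(x1, x2))
```

Because the inverse only re-evaluates F and G at values it already knows, reconstruction is exact up to floating point, which is what underwrites the lossless low-frequency path.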
In some embodiments of the present invention, in S400, the bidirectional scanning mechanism comprises:
the forward state update equation:
h_t^f = A ⊙ h_{t−1}^f + B · DSC(x_t);
the backward state update equation:
h_t^b = A ⊙ h_{t+1}^b + B · DSC(x_t);
Wherein:
h_t^f and h_t^b respectively denote the forward and backward states;
A ∈ R^{C×1} is the state memory weight;
B ∈ R^{C×C} is the input gating weight matrix;
DSC(·) is the depthwise separable convolution operation with a dynamically generated kernel;
⊙ is the element-wise product.
In the embodiment, a dynamic SSM coding module is used for fusing a dynamic convolution and an attention mechanism in a state space model to perform state space modeling and dynamic coding on image data, so that the modeling capacity of a key region is enhanced.
And a bidirectional scanning mechanism and a dynamic convolution kernel driven by a semantic weight graph are introduced, and the modeling of a key region is enhanced through channel-space double-channel attention gating, so that the computational complexity is reduced. The method specifically comprises the following steps:
As shown in fig. 6, the implementation of the dynamic SSM module includes:
receiving the high-frequency subband features X_H from step S300 and the multi-scale semantic weight map W generated in step S200, a dynamic selective state space modeling (Dynamic Selective SSM) module is constructed to realize dynamic modeling and compressed representation of features in high-semantic regions.
The semantic alignment process first employs a bilinear interpolation resize function to adjust the spatial dimensions of the semantic weight map W to coincide with the corresponding high-frequency features X_H:
W̃ = resize(W, size(X_H));
Wherein: W̃ is the aligned semantic weight map and resize(·) is the spatial-dimension adjustment function.
This operation ensures a one-to-one correspondence between the semantic guidance and the feature space, enhancing the guidance precision.
The bidirectional state update mechanism comprises:
for feature modeling along the scanning sequence, a bidirectional state space structure is introduced, comprising forward and backward state updates. The specific update formulas are as follows:
the forward state update equation is:
h_t^f = A ⊙ h_{t−1}^f + B · DSC(x_t);
the backward state update equation is:
h_t^b = A ⊙ h_{t+1}^b + B · DSC(x_t);
wherein h_t^f and h_t^b respectively denote the forward and backward states; A ∈ R^{C×1} is the state memory weight, guided by the multi-scale semantic weight map W; B ∈ R^{C×C} is the input gating weight matrix; DSC(·) is the depthwise separable convolution operation with a dynamically generated kernel; and ⊙ is the element-wise product.
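The two recurrences can be sketched as a pair of linear scans over a feature sequence. The matrices A and B are random stand-ins, and the dynamically generated depthwise convolution DSC(·) is omitted (the raw features are fed in directly), so this shows only the scan structure:

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 4, 6
A = rng.uniform(0.1, 0.9, size=(C, 1))   # state memory weight, C x 1
B = rng.normal(size=(C, C)) * 0.1        # input gating weight matrix, C x C
x = rng.normal(size=(T, C))              # sequence of C-channel features

def bidirectional_scan(x, A, B):
    """Run the forward and backward state recurrences over the sequence."""
    T, C = x.shape
    hf = np.zeros((T, C))
    hb = np.zeros((T, C))
    h = np.zeros(C)
    for t in range(T):                    # forward: h_t = A*h_{t-1} + B x_t
        h = A[:, 0] * h + B @ x[t]
        hf[t] = h
    h = np.zeros(C)
    for t in reversed(range(T)):          # backward: h_t = A*h_{t+1} + B x_t
        h = A[:, 0] * h + B @ x[t]
        hb[t] = h
    return hf, hb

hf, hb = bidirectional_scan(x, A, B)
```

Each position thus aggregates context from both scan directions, which is what lets the module capture long-range structure in the high-frequency subbands at linear cost.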
In some embodiments of the present invention, in S400, the dynamic convolution kernel generation includes:
Semantic weight graph And high frequency sub-bandsGenerating a query Q, a key K and a value V, and calculating a convolution kernel parameter matrix through multiple attentions:
;
Where d is the dimension of the attention head, softmaxIs a normalized exponential function, and T is a matrix transposition.
In this embodiment, the dynamic convolution kernel generation mechanism:
the weights of the dynamic convolution kernel are generated based on an attention mechanism. Firstly, respectively carrying out 1×1 convolution on the semantic graph and the high-frequency features to obtain a query Q, a key K and a value V:
;
;
;
Based on the standard multi-head attention mechanism, the dynamic convolution kernel parameter matrix is computed:
;
where d is the attention head dimension, used for normalization to prevent gradient explosion.
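The Q/K/V construction and the Softmax(QKᵀ/√d)·V computation described above can be sketched single-head (random projection matrices stand in for the 1×1 convolutions; all names are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_kernel(w_sem, h_feat, d=8, seed=1):
    """Single-head sketch of K_dyn = Softmax(Q K^T / sqrt(d)) V.
    w_sem, h_feat: flattened (N, C) semantic map and high-frequency features."""
    rng = np.random.default_rng(seed)
    C = w_sem.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3))
    Q = w_sem @ Wq          # query from the semantic weight map
    K = h_feat @ Wk         # key from the high-frequency features
    V = h_feat @ Wv         # value from the high-frequency features
    attn = softmax(Q @ K.T / np.sqrt(d))   # rows sum to 1
    return attn @ V

rng = np.random.default_rng(0)
K_dyn = dynamic_kernel(rng.random((16, 4)), rng.random((16, 4)))
```

The √d divisor keeps the pre-softmax logits at unit scale, which is the "normalization to prevent gradient explosion" the text refers to.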
In some embodiments of the present invention, in step S400, the channel-spatial dual-path attention gating enhancement comprises:
channel attention:
;
Spatial attention:
;
final output characteristics:
;
wherein:
X is the input feature;
GAP is global average pooling;
MLP is a multi-layer perceptron;
σ is the activation function;
⊙ denotes the element-wise product.
Specifically, the channel-space dual attention gating mechanism includes:
Attention-gated enhancement is applied to the feature X generated by the dynamic convolution kernel, in two stages: Channel Attention (CA) followed by Spatial Attention (SA):
The channel attention calculation formula is: ;
The spatial attention calculation formula is: ;
The final output is the dual-path enhanced feature:
;
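The two-stage CA→SA gating can be illustrated with a CBAM-style sketch (the spatial branch's channel-wise average/max pooling and its small convolution are standard-practice assumptions, since the patent's formulas are not reproduced here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_attention(x, w1, w2, w_sp):
    """Channel-then-spatial gating sketch for x of shape (C, H, W).
    w1, w2: MLP weights of the channel branch;
    w_sp: two scalar weights standing in for the spatial branch's convolution."""
    gap = x.mean(axis=(1, 2))                      # GAP(X): (C,)
    ca = sigmoid(w2 @ np.maximum(w1 @ gap, 0.0))   # sigma(MLP(GAP(X)))
    x_ca = x * ca[:, None, None]                   # channel-gated feature
    avg = x_ca.mean(axis=0)                        # pool along channels: (H, W)
    mx = x_ca.max(axis=0)
    sa = sigmoid(w_sp[0] * avg + w_sp[1] * mx)     # spatial gate in (0, 1)
    return x_ca * sa[None, :, :]                   # dual-path enhanced output

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 5))
y = dual_attention(x, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)),
                   np.array([0.5, 0.5]))
```

Because both gates are sigmoids in (0, 1), the output never exceeds the input in magnitude: the mechanism can only attenuate less-relevant channels and positions.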
In some embodiments of the present invention, in step S500, the non-uniform quantization comprises:
The quantization step size is adaptively adjusted according to the semantic weight graph:
;
wherein:
the base quantization step size serves as the reference;
the response intensity at position (i, j) in the semantic weight map modulates the step size;
λ is an adjustment coefficient for enhancing the resolution of high-semantic regions;
The quantization operation is defined as: ;
wherein: Z i,j is the feature value at position (i, j) to be quantized, output by the dynamic SSM modeling; a rounding function is applied; and the adaptive quantization step size is dynamically adjusted by the semantic weights.
In this embodiment, the semantically guided heterogeneous quantization strategy and the joint probability modeling process comprise receiving the dynamic features Z output by step S400 and performing quantization and modeling on them, balancing compression rate and fidelity.
Calculating an adaptive quantization step size at each position using the formula:
;
The quantization operation is defined as:
;
The quantized features form an entropy-encodable compressed representation. This strategy guarantees that semantically important regions (e.g., faces, text) receive finer coding, while background regions are processed more coarsely to save bit rate.
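The semantically adaptive step can be sketched as follows (the specific step rule Δ = Δ₀ / (1 + λ·w) is an assumed form, chosen only to be consistent with the stated behaviour: smaller steps, hence finer coding, where the semantic weight is high):

```python
import numpy as np

def semantic_quantize(z, w, delta0=1.0, lam=4.0):
    """Non-uniform quantization sketch. Assumed step rule (illustrative, not the
    patent's exact formula): delta = delta0 / (1 + lam * w), so a high semantic
    weight w yields a smaller step and therefore finer coding."""
    delta = delta0 / (1.0 + lam * w)
    q = np.round(z / delta)      # integer symbols for entropy coding
    return q, q * delta          # symbols and dequantized reconstruction

z = np.full((2, 2), 0.35)                 # feature values from dynamic SSM modeling
w = np.array([[0.0, 1.0],
              [0.0, 1.0]])                # right column: high semantic weight
q, z_hat = semantic_quantize(z, w)
```

In this toy example the high-weight column is reconstructed as 0.4 (error 0.05) while the zero-weight column collapses to 0 (error 0.35), showing how bit rate is traded away from the background.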
In some embodiments of the present invention, in S500, entropy encoding includes:
Based on the state-space-modeling hyper-prior network, a joint probability model is constructed for the latent variables:
;
wherein:
N denotes a Gaussian distribution;
the predicted mean and standard deviation of each feature dimension are generated by the hyper-prior network:
;;
wherein:
the mean and standard deviation estimators are lightweight neural network modules based on state space modeling, which estimate the parameters at each feature location;
a channel concatenation operation combines their inputs;
the already-decoded features serve as context.
In this embodiment, to improve compression efficiency, the joint probability distribution of the features is modeled in the following form:
;
The hyper-prior network adopts multi-scale feature fusion: the decoded features are concatenated with the semantic feature map at the corresponding scale and fed into a state space modeling module (Mamba) for parameter prediction, specifically:
;
where the bracket denotes the channel concatenation operation, and an exponential function is applied to ensure that the predicted standard deviation is positive.
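The rate implied by the Gaussian model can be estimated per symbol as −log₂P; a sketch follows (integrating the Gaussian over a unit-width bin is the usual hyper-prior convention, assumed here rather than quoted from the patent):

```python
import numpy as np
from math import erf

def _cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / 2.0 ** 0.5))

def gaussian_bits(q, mu, sigma):
    """Estimated code length -log2 P(q) for integer symbols q under N(mu, sigma^2),
    taking P as the Gaussian mass of the unit-width bin centred at q."""
    cdf = np.vectorize(_cdf)
    p = cdf((q + 0.5 - mu) / sigma) - cdf((q - 0.5 - mu) / sigma)
    return -np.log2(np.maximum(p, 1e-12))   # clamp avoids log(0)

bits_centered = gaussian_bits(np.array([0.0]), mu=0.0, sigma=1.0)  # well-predicted symbol
bits_outlier = gaussian_bits(np.array([4.0]), mu=0.0, sigma=1.0)   # poorly predicted symbol
```

Symbols the hyper-prior predicts well cost few bits; mispredicted outliers cost many, which is exactly why accurate μ/σ prediction improves compression efficiency.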
In some embodiments of the present invention, in S600, inputting the compressed code stream into the decoder, decoding it with the lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight map comprises the following steps:
Mamba module decodes the state update:
;
wherein:
the state vector at the current time step is maintained;
A and B are state transition coefficients generated under semantic guidance;
a gating function is applied;
The inverse wavelet transform reconstructs the features:
;
Wherein:
is a learnable inverse wavelet transform operator;
Multi-scale semantic fusion produces the output image:
;
Wherein:
is a learnable fusion weight.
In one implementation, as shown in fig. 8, the semantically guided image reconstruction process based on the Mamba decoder includes:
receiving the code stream representation output by the encoding stage and, in combination with the semantic weight maps and the wavelet sub-bands, reconstructing the original image through the Mamba decoder and the inverse wavelet transform module.
The Mamba decoder is based on state space modeling: it uses a lightweight state space mechanism to update states over semantically guided, selectively activated channels. The computation proceeds as follows:
;
Wherein: e R C is the current channel state, State transition coefficients generated for semantic guidance;
Invalid feature channels are suppressed for gating functions.
The inverse wavelet reconstruction module receives the output features of the Mamba decoder and, together with the low-frequency and high-frequency sub-bands, performs the inverse wavelet transform:
;
where the learnable inverse wavelet transform operator reconstructs the image feature map at the current scale.
Multi-scale fusion and semantically guided reconstruction: the decoding results of the three scales are weighted and fused with the semantic map to generate the final reconstructed image:
;
wherein: the fusion weights are learnable parameters, and the semantic weight map at the corresponding scale guides the fusion.
This fusion strategy gives greater influence to high-semantic-weight regions, ensuring that key structures (such as face contours and text edges) are preserved more completely in the reconstructed image.
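The wavelet step of the pipeline above can be illustrated with a fixed one-level Haar transform standing in for the learnable operator (an assumption — the patent's operator is learned, not fixed):

```python
import numpy as np

def forward_haar_1level(x):
    """One-level 2-D Haar DWT: returns (LL, LH, HL, HH) quarter-size sub-bands."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def inverse_haar_1level(ll, lh, hl, hh):
    """Exact inverse of the transform above: perfect reconstruction."""
    h, w = ll.shape
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
x_rec = inverse_haar_1level(*forward_haar_1level(x))
```

The round trip is exact, mirroring the lossless low-frequency path the summary claims; in the actual system the high-frequency sub-bands would first pass through quantization and Mamba decoding before this inverse step.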
In a specific implementation, the added loss function is as follows:
To effectively improve the restoration quality, structural fidelity, and visual consistency of semantic regions during image compression, multiple loss functions between the reconstructed image and the original image are introduced in the model training stage to construct a composite loss objective. The loss comprises a semantic edge loss, a structure retention loss, and conventional reconstruction loss terms, as follows:
pixel level reconstruction loss (MSE):
This loss is used to measure the error between the reconstructed image and the original at the pixel level, defined as:
;
where N is the total number of pixels in the image, and the two terms are the values of pixel i in the reconstructed image and in the original image, respectively.
Structure retention Loss (SSIM Loss):
The structural similarity index (Structural Similarity Index, SSIM) is used to measure the structural consistency of the image, and is defined as:
;
The loss term mainly constrains the brightness, contrast and structural information of the image, ensuring that the reconstructed image remains perceptually similar to the original image.
Considering that the edges of semantic regions typically carry important structural information, a semantic edge-guided loss is introduced:
;
wherein:
M is a normalization factor;
k is the pyramid level index;
(i, j) are the pixel spatial coordinates;
the gradient values of the reconstructed image and of the original image are compared;
the high-response semantic region at the k-th level is selected;
the guidance strength at each position, drawn from the semantic weight map, lies in [0, 1];
The loss term emphasizes the fidelity of the semantic region edge reconstruction.
During training, the compression rate target is considered simultaneously by adding a rate control term:
;
wherein:
the expectation is taken over the quantized features;
the probability of the quantized features is given by the joint probability model;
the negative log-probability measures the information content.
The loss term measures the average number of bits of the encoded compressed code and is derived from the joint probability model established in step S500.
It will be appreciated that the total loss function combination is as follows:
;
Wherein:
λ1–λ4 are loss-term balance coefficients, optimized by cross-validation during training; λ3 > λ1 is generally set to emphasize the reconstruction accuracy of semantic edge regions, and λ4 can be increased appropriately for real-time compression applications to control the overall bit rate.
Through the multi-loss combined optimization strategy, the structure retaining capacity and reconstruction quality of an image compression system in a semantic significant region (such as a human face, a text and an object outline) can be effectively improved, and meanwhile, the overall image compression efficiency is considered, so that the method is applicable to various scenes requiring high compression ratio and high fidelity reconstruction.
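The composite objective can be sketched as follows (the SSIM term is omitted for brevity, and the gradient-difference form of the edge term plus the λ values are illustrative assumptions):

```python
import numpy as np

def mse_loss(x_hat, x):
    """Pixel-level reconstruction loss."""
    return np.mean((x_hat - x) ** 2)

def semantic_edge_loss(x_hat, x, w):
    """Semantic edge term: L1 difference of image gradients, weighted by the
    semantic map w (an assumed simplification of the patent's edge loss)."""
    gy = np.abs(np.diff(x_hat, axis=0) - np.diff(x, axis=0))   # vertical gradients
    gx = np.abs(np.diff(x_hat, axis=1) - np.diff(x, axis=1))   # horizontal gradients
    return np.mean(w[1:, :] * gy) + np.mean(w[:, 1:] * gx)

def total_loss(x_hat, x, w, bits, lam1=1.0, lam3=3.0, lam4=0.01):
    """Composite objective; the SSIM term (lam2) is omitted in this sketch.
    lam3 > lam1 emphasizes semantic edge fidelity, as the text recommends."""
    return (lam1 * mse_loss(x_hat, x)
            + lam3 * semantic_edge_loss(x_hat, x, w)
            + lam4 * bits)

x = np.zeros((4, 4))
x_hat = np.full((4, 4), 0.1)
L = total_loss(x_hat, x, np.ones((4, 4)), bits=100.0)
```

With a constant-offset reconstruction the edge term vanishes (gradients match), so the loss reduces to the MSE plus the rate term — a quick sanity check that each term penalizes what it claims to.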
In some embodiments of the present invention, step S500 further comprises median deviation mapping quantization encoding, comprising the following steps:
The latent feature value is mapped to a median reference coordinate system; the interval to which it belongs and the median of that interval are determined, and the deviation value is calculated:
;
where the input is the original feature value;
Symmetric discrete quantization of the deviation:
;
Wherein gamma is the quantization step length;
A triplet is generated as the compressed representation,
wherein the first two entries are the spatial position index and the third is the quantized deviation.
The distribution model is used in the encoder to optimize bit rate allocation and a priori guidance is made in the decoder for reconstruction.
In some embodiments of the present invention, a discrete coding mechanism based on a deviation representation is further introduced on top of the non-uniform quantization module, as shown in fig. 7. The latent feature value or pixel value range [0, 255] is first divided into N interval segments, each interval defined as:
;
A median is set for each interval as its reconstruction reference value. For each feature point to be encoded, the interval to which it belongs is looked up and the deviation relative to that interval's median is calculated:
;
Symmetric discrete quantization of the deviation:
;
Finally, the triplets are encoded, in which the first two entries are the spatial position index and the third is the quantized deviation. Entropy coding or variable-length coding may further be employed to compress the triplet data for a more efficient code stream representation.
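The interval/median/deviation encoding can be sketched end to end (N = 8 bins over [0, 255] and γ = 2 are illustrative choices, not values from the patent):

```python
import numpy as np

def median_deviation_encode(z, n_bins=8, lo=0.0, hi=255.0, gamma=2.0):
    """Encode each value as (bin index k, quantized deviation dq from the bin median).
    Bins partition [lo, hi] uniformly; deviation is symmetrically quantized with
    step gamma."""
    width = (hi - lo) / n_bins
    k = np.clip(((z - lo) // width).astype(int), 0, n_bins - 1)
    median = lo + (k + 0.5) * width            # per-bin reconstruction reference
    dq = np.round((z - median) / gamma).astype(int)
    return k, dq

def median_deviation_decode(k, dq, n_bins=8, lo=0.0, hi=255.0, gamma=2.0):
    """Reconstruct from (bin index, quantized deviation)."""
    width = (hi - lo) / n_bins
    return lo + (k + 0.5) * width + dq * gamma

z = np.array([10.0, 100.0, 250.0])
k, dq = median_deviation_encode(z)
z_hat = median_deviation_decode(k, dq)
```

Because the deviation is quantized with step γ, the reconstruction error is bounded by γ/2 everywhere, and small `dq` values cluster near zero, which is what makes the triplets amenable to entropy coding.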
According to the application, introducing a dynamic selective state space modeling mechanism that combines bidirectional scanning with semantically sensitive attention gating effectively reduces computational complexity, strengthens detail capture and global context understanding for key image regions such as faces and text, and addresses the image detail blurring caused by high-frequency information loss in traditional compression methods. Furthermore, the semantically guided non-uniform quantization strategy and the median deviation mapping quantization coding mechanism dynamically adjust the quantization step size according to the semantic weight map and perform three-dimensional coordinate coding from spatial position and deviation information, allocating finer quantization levels to high-semantic-value areas and optimizing the balance between compression efficiency and reconstruction quality. The multi-scale wavelet joint coding architecture, built by fusing the invertible neural network with the adaptive wavelet transform, achieves lossless compression of the low-frequency sub-band and avoids low-frequency distortion, while lightweight dynamic convolution coding of the high-frequency sub-bands retains texture details to meet the real-time processing requirements of edge devices. At the decoding end, a selective state space activation mechanism based on the Mamba structure retains only the key channels for image reconstruction, significantly reducing the decoding computation, and the output image is reconstructed through the inverse wavelet transform and multi-scale semantic fusion.
These technical advances give the application significant practical value and broad prospects in scenarios that must balance compression rate and visual fidelity, such as security monitoring, mobile communication, and medical imaging.
Referring to fig. 9, the invention also discloses a multi-scale semantic guided image compression system, which comprises a memory and a processor, wherein the processor realizes the multi-scale semantic guided image compression method when executing the computer program stored in the memory.
Further, to implement the image compression system end to end, at the system architecture level the image compression system provided by the present invention further includes the following modules:
And the image preprocessing module is used for acquiring an original image, performing color space conversion, normalization processing and abnormal image rejection and outputting a standardized image.
And the semantic guidance module is used for extracting semantic features by utilizing a DeepLabv3+ semantic segmentation model and obtaining a multi-scale semantic weight map through bilinear interpolation to guide subsequent encoding.
And the multi-scale wavelet coding module is used for constructing a three-level coding pyramid, wherein each level consists of depth separable convolution, a reversible neural network and self-adaptive wavelet transformation and outputs a low-frequency sub-band and a high-frequency sub-band.
And the dynamic SSM modeling module is used for integrating semantic graph guidance and a bidirectional scanning mechanism, realizing state space modeling and outputting dynamic convolution coding characteristics.
And the entropy coding module is used for extracting the context characteristics by using the super prior network, estimating probability distribution by combining the context characteristics with the semantic weight map, and executing a non-uniform quantization and median deviation mapping quantization coding strategy to generate a code stream.
And the decoding and reconstructing module combines Mamba state updating and a semantic gating mechanism, and finally generates a reconstructed image through inverse wavelet transformation and semantic weighted fusion.
The invention also discloses a storage medium storing a program for the multi-scale semantic guided image compression method; when executed by a processor, the program implements the multi-scale semantic guided image compression method described above.
The multi-scale semantic guidance image compression system and the storage medium adopt all the technical schemes of the multi-scale semantic guidance image compression method of the above embodiment, so that the multi-scale semantic guidance image compression system and the storage medium at least have all the beneficial effects brought by the technical schemes of the above embodiment, and are not repeated herein.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention.

Claims (13)

1. A multi-scale semantic guided image compression method, characterized by comprising the following steps:
S100, acquiring input image data and performing a preprocessing operation on the input image to obtain standardized image data;
S200, inputting the standardized image data into a pre-trained semantic segmentation network to generate multi-scale semantic feature maps and their corresponding semantic weight maps;
S300, constructing a three-level pyramid encoder and performing on the standardized image data, level by level:
S301, depthwise separable convolution downsampling to generate multi-scale features;
S302, nonlinear transformation of the multi-scale features by an invertible neural network;
S303, adaptive discrete wavelet transform decomposition of the nonlinearly transformed multi-scale features into low-frequency sub-bands and high-frequency sub-bands;
S400, performing dynamic selective state space modeling on the high-frequency sub-bands based on the semantic weight maps, comprising:
a bidirectional scanning mechanism for state updating;
dynamic convolution kernel generation;
channel-spatial dual-path attention gating to enhance the features;
S500, performing non-uniform quantization and entropy coding on the features modeled at each layer in step S400 to generate a compressed code stream;
S600, inputting the compressed code stream into a decoder, decoding based on a lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight maps.
2. The multi-scale semantic guided image compression method according to claim 1, wherein in S200, inputting the standardized image data into the pre-trained semantic segmentation network to generate the multi-scale semantic feature maps and their corresponding semantic weight maps comprises:
extracting multi-scale features with the Xception65 backbone of a DeepLabv3+ network, fusing them through an atrous spatial pyramid pooling module to generate the semantic feature maps, aligning them to the encoding levels by bilinear interpolation, and taking the maximum value along the channel dimension to generate the semantic weight maps.
3. The multi-scale semantic guided image compression method according to claim 1, wherein in S300, the depthwise separable convolution downsampling generates multi-scale features comprising:
original-scale features, half-scale features, and quarter-scale features;
wherein R is the real number space, H is the feature map height, and W is the feature map width.
4. The multi-scale semantic guided image compression method according to claim 1, wherein in S300, the nonlinear transformation of the multi-scale features by the invertible neural network comprises:
the invertible neural network performing a forward transformation and an inverse transformation;
wherein the multi-scale features are split into two parts along the channel dimension to produce the output features; F and G are three-layer convolutional residual blocks; T denotes the forward mapping function, and its counterpart denotes the inverse mapping function.
5. The multi-scale semantic guided image compression method according to claim 1, wherein in S400, the bidirectional scanning mechanism comprises:
a forward state update equation and a backward state update equation;
wherein the forward and backward states are maintained respectively; the state memory weight lies in R C×1; the input gating weight matrix lies in R C×C; a depthwise separable convolution operation with dynamically generated kernels is applied; and ⊙ denotes the element-wise product.
6. The multi-scale semantic guided image compression method according to claim 1, wherein in S400, the dynamic convolution kernel generation comprises:
generating a query Q, a key K, and a value V from the semantic weight map and the high-frequency sub-bands, and computing the convolution kernel parameter matrix via multi-head attention;
wherein d is the dimension of the attention head, Softmax is the normalized exponential function, and T denotes matrix transposition.
7. The multi-scale semantic guided image compression method according to claim 1, wherein in S400, the channel-spatial dual-path attention gating comprises:
channel attention, spatial attention, and the final output feature;
wherein X is the input feature, GAP is global average pooling, MLP is a multi-layer perceptron, σ is the activation function, and ⊙ is the element-wise product.
8. The multi-scale semantic guided image compression method according to claim 1, wherein in S500, the non-uniform quantization comprises:
adaptively adjusting the quantization step size according to the semantic weight map, wherein the base quantization step size serves as the reference, the response intensity at position (i, j) in the semantic weight map modulates the step size, and λ is an adjustment coefficient;
the quantization operation being defined with Z i,j the feature value to be quantized, a rounding function, and the adaptive quantization step size.
9. The multi-scale semantic guided image compression method according to claim 8, wherein in S500, the entropy coding comprises:
constructing a joint probability model for the latent variables based on the state-space-modeling hyper-prior network;
wherein N denotes a Gaussian distribution; the predicted mean and standard deviation of each feature dimension are generated by the hyper-prior network, which comprises lightweight neural network modules based on state space modeling that estimate the parameters at each feature location; a channel concatenation operation combines their inputs; and the already-decoded features serve as context.
10. The multi-scale semantic guided image compression method according to claim 1, wherein in S600, inputting the compressed code stream into the decoder, decoding based on the lightweight Mamba module, and reconstructing the image by combining the inverse wavelet transform with the semantic weight map comprises the following steps:
the Mamba module decoding state update, wherein the state vector at the current time step is maintained, A and B are state transition coefficients generated under semantic guidance, and a gating function is applied;
inverse wavelet transform reconstruction of the features, wherein the feature map of the k-th layer is reconstructed by a learnable inverse wavelet transform operator;
multi-scale semantic fusion of the output image, wherein learnable fusion weights are used.
11. The multi-scale semantic guided image compression method according to claim 1, wherein S500 further comprises median deviation mapping quantization encoding, comprising the following steps:
mapping the latent feature value to a median reference coordinate system, determining the median of the interval to which it belongs, and calculating the deviation value;
symmetrically and discretely quantizing the deviation, wherein a rounding function is applied and γ is the quantization step size;
generating a triplet as the compressed representation, wherein the first two entries are the spatial position index and the third is the quantized deviation.
12. A multi-scale semantic guided image compression system, characterized by comprising a memory and a processor, wherein the processor, when executing a computer program stored in the memory, implements the multi-scale semantic guided image compression method according to any one of claims 1 to 11.
13. A storage medium, characterized in that the storage medium stores a program for the multi-scale semantic guided image compression method, and the program, when executed by a processor, implements the multi-scale semantic guided image compression method according to any one of claims 1 to 11.
CN202511121158.9A 2025-08-12 2025-08-12 Multi-scale semantic guided image compression method, system and storage medium Pending CN120640000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511121158.9A CN120640000A (en) 2025-08-12 2025-08-12 Multi-scale semantic guided image compression method, system and storage medium


Publications (1)

Publication Number Publication Date
CN120640000A true CN120640000A (en) 2025-09-12

Family

ID=96970116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511121158.9A Pending CN120640000A (en) 2025-08-12 2025-08-12 Multi-scale semantic guided image compression method, system and storage medium

Country Status (1)

Country Link
CN (1) CN120640000A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3608844A1 (en) * 2018-08-10 2020-02-12 Naver Corporation Methods for training a crnn and for semantic segmentation of an inputted video using said crnn
CN113240040A (en) * 2021-05-27 2021-08-10 西安理工大学 Polarized SAR image classification method based on channel attention depth network
CN119904628A (en) * 2024-12-12 2025-04-29 河海大学 A remote sensing image semantic segmentation method and device based on wavelet transform convolution


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", COMPUTER VISION AND PATTERN RECOGNITION, 7 February 2018 (2018-02-07) *
闫河: "融合注意力机制的改进型Deeplabv3+语义分割", 光学精密工程, 20 March 2025 (2025-03-20) *

Similar Documents

Publication Publication Date Title
CN113658040B (en) Human face super-resolution method based on priori information and attention fusion mechanism
CN113177882B (en) Single-frame image super-resolution processing method based on diffusion model
CN119011851B (en) Video compression method and system based on variational autoencoder with improved entropy model
WO2020062074A1 (en) Reconstructing distorted images using convolutional neural network
EP2449524A1 (en) Contrast enhancement
US20240054605A1 (en) Methods and systems for wavelet domain-based normalizing flow super-resolution image reconstruction
CN116385281A (en) Remote sensing image denoising method based on real noise model and generated countermeasure network
Li et al. Underwater image high definition display using the multilayer perceptron and color feature-based SRCNN
CN115345785A (en) A dark-light video enhancement method and system based on multi-scale spatio-temporal feature fusion
CN118573888B (en) Adaptive image compression method, device, equipment and storage medium
CN117314733A (en) Video filling method, device, equipment and storage medium based on diffusion model
CN110796622A (en) An Image Bit Enhancement Method Based on Multilayer Features of Concatenated Neural Networks
Liu et al. Image compression based on octave convolution and semantic segmentation
US20250142099A1 (en) Parallel processing of image regions with neural networks – decoding, post filtering, and rdoq
CN118628406B (en) Image restoration method, image restoration device, electronic device and storage medium
WO2024164694A9 (en) Image compression method and apparatus, electronic device, computer program product, and storage medium
CN113628114A (en) A two-channel sparse coding method for image super-resolution reconstruction
CN119110084A (en) A high compression ratio image compression method based on optimal transmission mapping
CN105590296B (en) A kind of single-frame images Super-Resolution method based on doubledictionary study
Han et al. Toward variable-rate generative compression by reducing the channel redundancy
CN115035011B (en) A low-light image enhancement method based on adaptive RetinexNet under a fusion strategy
CN114663315B (en) Image bit enhancement method and device for generating countermeasure network based on semantic fusion
CN120198293A (en) Infrared image super-resolution reconstruction method based on noise decoupling
Hu et al. UAV image high fidelity compression algorithm based on generative adversarial networks under complex disaster conditions
CN120125436A (en) A method for image super-resolution reconstruction based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination