CN120238649A

CN120238649A - A lightweight semantic communication method and system for image transmission

Info

Publication number: CN120238649A
Application number: CN202510713473.4A
Authority: CN
Inventors: 涂杰楠; 何若欣; 吴志豪; 刘晓东; 徐子晨; 周福辉
Original assignee: Nanchang University
Current assignee: Nanchang University
Priority date: 2025-05-30
Filing date: 2025-05-30
Publication date: 2025-07-01
Anticipated expiration: 2045-05-30

Abstract

The invention relates to the technical field of communication and provides a lightweight semantic communication method and system for image transmission. Shift convolution replaces traditional convolution in downsampling, and an outer shift convolution is additionally performed on the input target image. The first feature map obtained by the outer shift convolution is spliced into the input data of the downsampling, so that two shift convolution operations are formed and the extraction efficiency of semantic features is effectively improved. Downsampling is performed multiple times in succession and the outer shift convolution is reused across them, which further reduces the demand on computing resources and improves the degree of lightweighting. Because the first feature map carries original information obtained from the target image, splicing it into each downsampling reduces the loss of original information caused by repeated shift convolution and improves the quality of the reconstructed image. Through the combination of inner and outer shift convolution, the method and system meet the lightweight requirement while effectively improving image reconstruction quality.

Description

Lightweight semantic communication method and system for image transmission

Technical Field

The invention relates to the technical field of communication, in particular to a lightweight semantic communication method and system for image transmission.

Background

With the rapid development of bandwidth-intensive applications such as the internet of things and virtual reality, the demand of wireless communication networks for data transmission has increased exponentially, which makes efficient wireless communication system design a research hotspot. Among them, semantic communication has become an important research direction of 6G communication technology by virtue of its potential in improving transmission efficiency, reducing redundancy, adapting to complex channel environments, and the like.

Research on improving the performance of semantic communication systems has made considerable progress, but efficient semantic extraction remains a key challenge. Traditional semantic extraction methods based on convolutional neural networks (Convolutional Neural Network, CNN) can achieve high accuracy but consume a large amount of computing resources. Deep learning (Deep Learning, DL) has been applied in semantic communication systems by virtue of its significant advantage in automatically extracting semantic features, but its demands on power consumption and computing resources remain high, and the network models employed have a large number of parameters. Because the power and computing resources of Internet of Things devices are limited, prior-art semantic communication systems are difficult to apply efficiently to such devices, which limits the development of Internet of Things technology.

Disclosure of Invention

Based on the above, the invention aims to provide an image transmission-oriented lightweight semantic communication method and system, so as to solve the problems that the semantic communication system in the prior art is high in resource consumption and difficult to be applied to Internet of things equipment efficiently, and the development of Internet of things technology is limited.

The invention provides a lightweight semantic communication method for image transmission, which comprises the following steps:

Sequentially performing first convolution processing, downsampling and first atrous spatial pyramid pooling on the target image to obtain encoded data;

Sequentially performing upsampling, second convolution processing and second atrous spatial pyramid pooling according to the received encoded data to obtain a reconstructed image of the target image;

Performing outer shift convolution on the target image to obtain a first feature map, wherein the feature size of the first feature map is consistent with the feature size of the downsampled input data;

The downsampling comprises feature stitching, inner shift convolution and nonlinear activation which are sequentially carried out, wherein the feature stitching is used for stitching the first feature map into the downsampled input data;

the first feature map is also superimposed into the output data of the inner shift convolution by a residual connection;

the outer shift convolution and the inner shift convolution each comprise shift, batch normalization and pointwise convolution which are sequentially carried out;

The downsampling is performed a plurality of times in succession.

Optionally, the downsampling further comprises:

Adjusting the channel weight of the down-sampled input data according to a channel attention mechanism to obtain first intermediate data;

the feature stitching is used for stitching the first feature map to the first intermediate data to obtain second intermediate data, so that inner shift convolution is performed according to the second intermediate data.

Optionally, the step of adjusting the channel weights of the input data according to a channel attention mechanism to obtain the first intermediate data further comprises:

third convolution processing, first global average pooling, fully connected layer dimension reduction, activation, fully connected layer dimension increase, second global average pooling and channel weighting, which are sequentially carried out;

Wherein the output of the third convolution process is also superimposed into the second global average pooled output data by a residual connection.

Optionally, the step of obtaining the encoded data operates in a sequential processing mode and further comprises: obtaining a scaling factor from the difference between the size of the current downsampled input data and the size of the target image, and obtaining the shift operation step size of the outer shift convolution from the scaling factor, such that the size of the obtained first feature map is consistent with that of the current downsampled input data.

Optionally, the method further comprises the step of stitching the first feature map to the input data of the first atrous spatial pyramid pooling.

Another aspect of the present invention provides an image transmission oriented lightweight semantic communication system, comprising:

The encoder comprises a first convolution module, a downsampling module and a first atrous spatial pyramid pooling module which are sequentially connected, and is used for sequentially carrying out first convolution processing, downsampling and first atrous spatial pyramid pooling on a target image so as to obtain encoded data;

the decoder comprises an upsampling module, a second convolution module and a second atrous spatial pyramid pooling module which are sequentially connected, and is used for sequentially carrying out upsampling, second convolution processing and second atrous spatial pyramid pooling according to the received encoded data so as to obtain a reconstructed image of the target image;

The encoder further comprises an outer shift convolution unit, wherein the outer shift convolution unit is used for performing outer shift convolution on the target image to obtain a first feature map, and the feature size of the first feature map is consistent with the feature size of the downsampling;

The downsampling module comprises a feature splicing unit, an inner shift convolution unit and a nonlinear activation unit which are connected in sequence, wherein the feature splicing unit is used for splicing the first feature map into the input data of the downsampling module;

The outer shift convolution unit is further connected to an output end of the inner shift convolution unit, so that the first feature map is superimposed into the output data of the inner shift convolution unit through a residual connection;

the outer shift convolution unit and the inner shift convolution unit each comprise a shift layer, a batch normalization layer and a pointwise convolution layer which are sequentially connected;

a plurality of the downsampling modules are arranged in succession.

Optionally, the downsampling module further comprises:

the squeeze-and-excitation unit is used for adjusting the channel weights of the downsampled input data according to a channel attention mechanism to obtain first intermediate data;

the feature stitching unit is used for stitching the first feature map to the first intermediate data to obtain second intermediate data, and the second intermediate data is used as input data of the inner shift convolution unit.

Optionally, the squeeze-and-excitation unit includes:

the third convolution layer, the first global average pooling layer, the fully connected dimension reduction layer, the activation layer, the fully connected dimension increase layer, the second global average pooling layer and the channel weighting layer, which are sequentially connected;

Wherein the output of the third convolutional layer is also superimposed into the output data of the second global average pooling layer by a residual connection.

Optionally, the working mode of the encoder is a sequential processing mode, and the outer shift convolution unit is further configured to obtain a scaling factor according to a difference between a size of the input data of the current downsampling module and a size of the target image, and obtain a shift operation step size of the outer shift convolution unit according to the scaling factor, so that the obtained size of the first feature map is consistent with the size of the input data of the current downsampling module.

Optionally, the first feature map is further spliced into the input data of the first atrous spatial pyramid pooling module.

In the lightweight semantic communication method for image transmission provided by the invention, shift convolution replaces the traditional convolution in downsampling, which effectively reduces computational complexity. An outer shift convolution is performed on the input target image, and the resulting first feature map is spliced into the input data of the downsampling, forming two shift convolution operations and effectively improving the extraction efficiency of semantic features. Downsampling is performed multiple times in succession and the outer shift convolution can be reused, which further reduces the demand on computing resources and improves the degree of lightweighting. Because the first feature map is original information obtained from the target image and is spliced into each downsampling, the loss of original information caused by repeated shift convolution is reduced and the quality of the reconstructed image is improved. Through the combination of the outer shift convolution with the inner shift convolutions of the multiple downsamplings, the method effectively reduces the computing resources required at the encoding end, reduces the loss of original information, and improves the quality of the reconstructed image obtained by decoding.

Drawings

FIG. 1 is a main flow chart of a lightweight semantic communication method for image transmission in an embodiment of the present invention;

FIG. 2 is a flow chart of downsampling correlation of a lightweight semantic communication method for image transmission in an embodiment of the present invention;

Fig. 3 is a test result of a lightweight semantic communication method for image transmission in an embodiment of the present invention.

The invention will be further described in the following detailed description in conjunction with the above-described figures.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The application addresses the problem that prior-art semantic communication systems consume substantial resources and are therefore difficult to apply efficiently to Internet of Things devices, which limits the development of Internet of Things technology. It provides a lightweight semantic communication method for image transmission in which shift convolution replaces the traditional convolution in downsampling, effectively reducing computational complexity. An outer shift convolution is performed on the input target image, and the resulting first feature map is spliced into the input of the inner shift convolution of the downsampling, forming two shift convolution operations and effectively improving the extraction efficiency of semantic features. Downsampling is performed multiple times in succession and the outer shift convolution can be reused, which further reduces the demand on computing resources and improves the degree of lightweighting. Because the first feature map is original information obtained from the target image, the loss of original information caused by repeated shift convolution is reduced and the quality of the reconstructed image is improved.

Referring to fig. 1 and fig. 2, flowcharts of a lightweight semantic communication method for image transmission according to an embodiment of the invention are shown.

At the encoder end, the encoding step comprises sequentially performing first convolution processing, downsampling and first atrous spatial pyramid pooling on the target image to obtain encoded data, and the encoded data are sent to the receiving end through a physical channel.

At the decoder end, the decoding step comprises sequentially performing upsampling, second convolution processing and second atrous spatial pyramid pooling on the received encoded data to obtain a reconstructed image of the target image.
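The physical channel between the encoder end and the decoder end is not specified further in this description. As a minimal sketch only, the following assumes an additive white Gaussian noise (AWGN) channel, a common simplification in simulations; the function name `awgn_channel` and the choice of a real-valued signal are illustrative assumptions, not part of the embodiment.

```python
import numpy as np


def awgn_channel(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that the signal-to-noise ratio equals snr_db.

    Assumption: a real-valued AWGN channel; the source does not name a channel model.
    """
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


rng = np.random.default_rng(0)
encoded = rng.standard_normal(100_000)        # stand-in for encoded data
received = awgn_channel(encoded, snr_db=10.0, rng=rng)

# Measure the SNR that was actually realized on this draw.
measured_snr_db = 10.0 * np.log10(np.mean(encoded ** 2) / np.mean((received - encoded) ** 2))
print(round(float(measured_snr_db), 1))
```

Sweeping `snr_db` over a range of values is one way to produce a reconstruction-quality-versus-SNR curve.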

The first convolution processing and the second convolution processing can be implemented with a convolutional neural network (Convolutional Neural Network, CNN). The first convolution processing increases the number of feature channels of the data, for example from the three R, G, B color channels of an RGB image to 32 channels; the second convolution processing performs tasks such as feature refinement, fusion, noise reduction and dimension adjustment so as to obtain the original information of the target image.

The downsampling uses shift convolution (ShiftConv). Compared with traditional CNN convolution, shift convolution effectively reduces the demand on computing resources and realizes a lightweight design.
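To make the resource saving concrete, the following sketch compares the weight count of a standard k×k convolution with that of the 1×1 pointwise convolution used inside a shift convolution, since the shift itself has no learnable weights. The 3×3 kernel and the 32-channel width are illustrative assumptions, and the helper names are hypothetical.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def shift_conv_weights(c_in: int, c_out: int) -> int:
    """Weights in a shift convolution.

    The shift is parameter-free, so only the 1 x 1 pointwise convolution
    contributes (bias and batch-normalization parameters ignored).
    """
    return c_in * c_out


# Illustrative sizes only (assumed, not taken from the embodiment).
standard = conv_weights(32, 32, 3)
shifted = shift_conv_weights(32, 32)
print(standard, shifted, standard / shifted)  # 9216 1024 9.0
```

For a 3×3 kernel the saving in weights is a factor of nine regardless of channel width, because the k×k term is the only difference between the two counts.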

As shown in fig. 2, the downsampling mainly comprises an inner shift convolution operation. Like the outer shift convolution, the inner shift convolution comprises shift (Shift), batch normalization (BatchNorm) and pointwise convolution (1×1 convolution) performed sequentially. The shift enhances feature diversity through spatial translation, the batch normalization ensures training stability, and the pointwise convolution optimizes channel interaction; taken together, they improve the model training efficiency and robustness of the inner shift convolution.
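The three stages named above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: the five-group shift pattern (left, right, up, down, unshifted), the one-pixel shift distance, the batch normalization without learnable scale and offset, and the `stride` argument realized by subsampling are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def shift(x: np.ndarray) -> np.ndarray:
    """Shift channel groups by one pixel in different directions.

    x has shape (N, C, H, W). Channels are split into five groups:
    left, right, up, down, and unshifted. Vacated pixels are zero.
    """
    out = np.zeros_like(x)
    g = x.shape[1] // 5
    out[:, 0 * g:1 * g, :, :-1] = x[:, 0 * g:1 * g, :, 1:]   # shift left
    out[:, 1 * g:2 * g, :, 1:] = x[:, 1 * g:2 * g, :, :-1]   # shift right
    out[:, 2 * g:3 * g, :-1, :] = x[:, 2 * g:3 * g, 1:, :]   # shift up
    out[:, 3 * g:4 * g, 1:, :] = x[:, 3 * g:4 * g, :-1, :]   # shift down
    out[:, 4 * g:] = x[:, 4 * g:]                            # remaining channels unshifted
    return out


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each channel over the batch and spatial axes (no learned scale/offset)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pointwise_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1 x 1 convolution: mix channels at every pixel. w has shape (C_out, C_in)."""
    return np.einsum("oc,nchw->nohw", w, x)


def shift_conv(x: np.ndarray, w: np.ndarray, stride: int = 1) -> np.ndarray:
    """Shift, then batch normalization, then pointwise convolution, in that order."""
    y = pointwise_conv(batch_norm(shift(x)), w)
    return y[:, :, ::stride, ::stride]   # strided 1 x 1 convolution as subsampling


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 8, 8))
w = rng.standard_normal((16, 10))
y = shift_conv(x, w, stride=2)
print(y.shape)  # (2, 16, 4, 4)
```

Only `pointwise_conv` carries weights; the spatial mixing that a k×k kernel would normally provide comes from the parameter-free shift.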

In order to improve the extraction efficiency of semantic features, in this embodiment, downsampling is continuously performed multiple times, and the encoder end also performs outer shift convolution, and the outer shift convolution processes the target image to obtain a first feature map, and the first feature map is spliced into input data of inner shift convolution of each downsampling through feature splicing operation, so that loss of original information of the target image in the multiple shift convolutions can be reduced.

The operation and architecture of the outer shift convolution and the inner shift convolution are consistent, so that the feature size of the first feature map obtained by the outer shift convolution is consistent with the feature size of the inner shift convolution, and the first feature map can be effectively spliced into the input data of the inner shift convolution through feature splicing.

The feature splicing operation can specifically be concatenation along the channel dimension (Channel-wise Concatenation).
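As a brief illustration of channel-wise concatenation, the following NumPy snippet joins two feature maps along the channel axis. The (N, C, H, W) layout and the concrete shapes are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical shapes in (N, C, H, W) layout.
downsample_input = np.zeros((1, 32, 16, 16))
first_feature_map = np.ones((1, 32, 16, 16))

# Channel-wise concatenation requires equal N, H and W; only C may differ.
spliced = np.concatenate([downsample_input, first_feature_map], axis=1)
print(spliced.shape)  # (1, 64, 16, 16)
```

This is why the feature size of the first feature map has to match that of the data it is spliced into: a mismatch in height or width makes the concatenation fail.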

After the inner shift convolution, nonlinear activation is performed by a PReLU activation function. Introducing nonlinearity breaks the linear constraint, optimizes gradient propagation, enhances feature expression, and effectively improves semantic feature extraction efficiency.
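For reference, PReLU passes positive inputs unchanged and scales negative inputs by a slope that is learned during training. In the sketch below the slope `alpha` is fixed at 0.25 purely for illustration; the embodiment does not state a value.

```python
import numpy as np


def prelu(x: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    """PReLU: x for x > 0, alpha * x otherwise (alpha is learnable in practice)."""
    return np.where(x > 0, x, alpha * x)


activated = prelu(np.array([-2.0, 0.0, 3.0]))
print(activated)
```

Unlike ReLU, the negative branch keeps a nonzero gradient, which is the property the paragraph above refers to as optimizing gradient propagation.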

In order to improve the efficiency of the downsampling training, in this embodiment, the first feature map is further superimposed into the output data of the inner shift convolution through a residual connection.

To improve the model performance, the downsampling further comprises a squeeze-and-excitation (Squeeze-and-Excitation, SE) operation for adjusting the channel weights of the downsampled input data according to a channel attention mechanism to obtain first intermediate data. The squeeze (Squeeze) operation is mainly global average pooling, and the excitation (Excitation) operation is mainly fully connected layer (Fully Connected Layer) processing. The global average pooling and the fully connected layers enhance the sensitivity of the model to the channels, and the network performance is improved while the image feature resolution is kept unchanged.

The feature stitching is used for stitching the first feature map to the first intermediate data output by the squeeze-and-excitation operation to obtain second intermediate data, and the inner shift convolution is carried out according to the second intermediate data.

As shown in FIG. 2, the squeeze-and-excitation operation specifically comprises a third convolution process (realized by CNN), a first global average pooling, fully connected layer dimension reduction, activation (by a ReLU activation function), fully connected layer dimension increase, a second global average pooling and channel weighting (Scale), which are sequentially performed, wherein the output of the third convolution process is further superimposed into the output data of the second global average pooling through a residual connection.
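The channel attention idea can be illustrated with the standard squeeze-and-excitation pattern: global average pooling, fully connected reduction with ReLU, fully connected expansion, and channel weighting. The sketch below is that standard form only. It does not reproduce the third convolution, the second global average pooling or the residual connection of the variant described above, and the sigmoid gate is an assumption taken from the standard form, not from this description.

```python
import numpy as np


def squeeze_excite(x: np.ndarray, w_reduce: np.ndarray, w_expand: np.ndarray) -> np.ndarray:
    """Standard SE channel attention (assumed form, see the note above).

    x: (N, C, H, W); w_reduce: (C // r, C); w_expand: (C, C // r).
    """
    squeezed = x.mean(axis=(2, 3))                        # squeeze: (N, C)
    hidden = np.maximum(squeezed @ w_reduce.T, 0.0)       # FC reduction + ReLU: (N, C // r)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_expand.T)))   # FC expansion + sigmoid: (N, C)
    return x * gate[:, :, None, None]                     # channel weighting (Scale)


rng = np.random.default_rng(0)
features = rng.standard_normal((2, 8, 4, 4))
w_reduce = rng.standard_normal((2, 8))    # reduction ratio r = 4 (illustrative)
w_expand = rng.standard_normal((8, 2))
weighted = squeeze_excite(features, w_reduce, w_expand)
print(weighted.shape)  # (2, 8, 4, 4)
```

The output has the same shape as the input, consistent with the statement that the image feature resolution is kept unchanged; only the per-channel scale differs.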

After downsampling, the output feature map differs in size from the input feature map by a scaling. To ensure that the first feature map produced by each of the multiple outer shift convolutions matches the size of the corresponding downsampling input, the step of obtaining the encoded data operates in a sequential processing mode (the next image is encoded only after the encoding of the current image is completed). The method further comprises obtaining a scaling factor from the difference between the size of the current downsampled input data and the size of the target image, and obtaining the shift operation step size of the outer shift convolution from the scaling factor, so that the size of the obtained first feature map is consistent with that of the current downsampled input data.

When the Internet of things equipment is applied to a scene with low requirements on communication speed, a sequential processing mode is adopted, and the size matching requirement of two feature graphs participating in splicing can be met through the step length dynamic adjustment of external shift convolution, so that the splicing effectiveness is ensured, the multiplexing of the external shift convolution is realized, and the light weight degree of the system is ensured.

Specifically, the image is generally two-dimensional data. After shift convolution, the two-dimensional size of the obtained feature map differs from that of the original image, so a scaling factor can be obtained in each of the two dimensions, and the average of the two scaling factors is used as the scaling factor of the outer shift convolution. The size transformation formula of the shift convolution is as follows: $o = \lfloor (i + 2p - k)/S \rfloor + 1$, wherein $o$ is the size of the feature map after shift convolution, $\lfloor \cdot \rfloor$ denotes rounding down to an integer, $i$ is the input image size, $p$ is the padding, $k$ is the size of the convolution kernel, and $S$ is the scaling factor, which is consistent with the shift step size of the shift convolution.
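The step-size selection can be sketched as follows, assuming the standard convolution output-size relation, floor((i + 2p − k)/S) + 1, and assuming that the averaged scaling factor is rounded to the nearest integer before being used as the step size. The rounding rule and both function names are hypothetical; the description states only that the two per-dimension factors are averaged.

```python
def conv_output_size(i: int, k: int, p: int, s: int) -> int:
    """Output size of a convolution: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


def outer_shift_stride(target_hw: tuple, current_hw: tuple) -> int:
    """Average the per-dimension scaling factors and use the result as the step size.

    Assumption: the average is rounded to the nearest integer, with a minimum of 1.
    """
    scale_h = target_hw[0] / current_hw[0]
    scale_w = target_hw[1] / current_hw[1]
    return max(1, round((scale_h + scale_w) / 2))


# Illustrative sizes: a 64 x 64 target image and a 16 x 16 downsampling input.
stride = outer_shift_stride((64, 64), (16, 16))
out_size = conv_output_size(i=64, k=1, p=0, s=stride)
print(stride, out_size)  # 4 16
```

With a 1×1 kernel and no padding, a step size of 4 maps the 64-pixel side to 16 pixels, so the first feature map matches the current downsampling input and can be spliced into it.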

In order to further enhance the feature extraction effect, as shown in fig. 1, in this embodiment, the method further includes stitching the first feature map to the input data of the first atrous spatial pyramid pooling (Atrous Spatial Pyramid Pooling, ASPP). Because the first feature map containing the original information of the target image is spliced into the first atrous spatial pyramid pooling, the loss of details (such as edges and textures) in deep features can be reduced, and the multi-scale feature extraction effect is improved.
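The multi-scale behaviour of atrous spatial pyramid pooling comes from running parallel convolutions with different dilation rates. A dilated k×k kernel with rate d covers k + (k − 1)(d − 1) pixels per side without adding weights. The sketch below evaluates this for a 3×3 kernel; the rates (1, 6, 12, 18) are a commonly used configuration chosen for illustration and are not stated in this description.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Illustrative dilation rates (assumed, not taken from the embodiment).
rates = (1, 6, 12, 18)
coverage = [effective_kernel_size(3, d) for d in rates]
print(coverage)  # [3, 13, 25, 37]
```

Each branch therefore sees context at a different scale while keeping nine weights per channel pair, which fits the lightweight goal.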

The invention also provides a lightweight semantic communication system facing image transmission, which comprises:

The decoder comprises an upsampling module, a second convolution module and a second atrous spatial pyramid pooling module which are sequentially connected, and is used for sequentially carrying out upsampling, second convolution processing and second atrous spatial pyramid pooling according to received encoded data so as to obtain a reconstructed image of the target image;

the encoder further comprises an outer shift convolution unit, wherein the outer shift convolution unit is used for performing outer shift convolution on the target image to obtain a first feature map, and the feature size of the first feature map is consistent with the feature size of the downsampling;

The outer shift convolution unit is also connected to the output end of the inner shift convolution unit so as to superimpose the first feature map into the output data of the inner shift convolution unit through a residual connection;

a plurality of the downsampling modules are arranged in succession.

The downsampling module further comprises a squeeze-and-excitation unit for adjusting the channel weights of the downsampled input data according to a channel attention mechanism to obtain first intermediate data, wherein the feature stitching unit is used for stitching the first feature map to the first intermediate data to obtain second intermediate data, and the second intermediate data is used as input data of the inner shift convolution unit.

The squeeze-and-excitation unit specifically comprises a third convolution layer, a first global average pooling layer, a fully connected dimension reduction layer, an activation layer, a fully connected dimension increase layer, a second global average pooling layer and a channel weighting layer which are connected in sequence, wherein the output of the third convolution layer is further superimposed into the output data of the second global average pooling layer through a residual connection.

The embodiment is mainly used in scenes with lower requirements on communication speed. Correspondingly, the working mode of the encoder is a sequential processing mode, and the outer shift convolution unit is further used for obtaining a scaling factor according to the difference between the size of the input data of the current downsampling module and the size of the target image, and obtaining the shift operation step size of the outer shift convolution unit according to the scaling factor, so that the size of the obtained first feature map is consistent with the size of the input data of the current downsampling module.

Fig. 3 shows the test result of the lightweight semantic communication method for image transmission in this embodiment, with peak signal-to-noise ratio (PSNR) used as the quality evaluation index of the reconstructed image. Under low signal-to-noise ratio (SNR) conditions, the performance of the method grows steadily as the SNR increases, with no steep decline, which indicates that the loss of original information is small and that image reconstruction quality is guaranteed.
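PSNR is computed from the mean squared error between the target image and the reconstructed image. The following is a minimal sketch assuming 8-bit images with a peak value of 255; the test images are synthetic and are not the data behind fig. 3.

```python
import numpy as np


def psnr(original: np.ndarray, reconstructed: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Synthetic example: every pixel is off by exactly one gray level.
target = np.full((4, 4), 100, dtype=np.uint8)
reconstruction = np.full((4, 4), 101, dtype=np.uint8)
quality_db = psnr(target, reconstruction)
print(round(quality_db, 2))
```

Converting to floating point before subtracting avoids wrap-around in unsigned 8-bit arithmetic.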

In summary, in the lightweight semantic communication method for image transmission provided by the invention, shift convolution replaces the traditional convolution in downsampling, which effectively reduces computational complexity. An outer shift convolution is performed on the input target image, and the first feature map obtained by the outer shift convolution is spliced into the input data of the downsampling, forming two shift convolution operations and effectively improving the extraction efficiency of semantic features. Downsampling is performed multiple times in succession, which further reduces the demand on computing resources and improves the degree of lightweighting. Because the first feature map is original information obtained from the target image, the loss of original information caused by repeated shift convolution is reduced and the quality of the reconstructed image is improved.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above examples merely represent a few specific embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A lightweight semantic communication method for image transmission, comprising:

The target image is sequentially subjected to a first convolution process, downsampling, and a first atrous spatial pyramid pooling to obtain encoded data;

Performing upsampling, a second convolution process, and a second atrous spatial pyramid pooling in sequence according to the received encoded data to obtain a reconstructed image of the target image;

and performing an outer shift convolution on the target image to obtain a first feature map, wherein a feature size of the first feature map is consistent with a feature size of the downsampled input data;

The downsampling includes sequentially performing feature splicing, inner shift convolution and nonlinear activation, and the feature splicing is used to splice the first feature map into the downsampled input data;

The first feature map is also superimposed on the output data of the inner shift convolution through a residual connection;

The outer shift convolution and the inner shift convolution both include shifting, batch normalization and pointwise convolution performed in sequence;

The downsampling is performed multiple times in succession.

2. The lightweight semantic communication method for image transmission according to claim 1, wherein the downsampling further comprises:

Adjusting the channel weights of the downsampled input data according to a channel attention mechanism to obtain first intermediate data;

The feature splicing is used to splice the first feature map to the first intermediate data to obtain second intermediate data, so as to perform inner shift convolution according to the second intermediate data.

3. The lightweight semantic communication method for image transmission according to claim 2, characterized in that the step of adjusting the channel weights of the input data according to the channel attention mechanism to obtain the first intermediate data further comprises:

The third convolution process, the first global average pooling, the fully connected layer dimensionality reduction, activation, the fully connected layer dimensionality increase, the second global average pooling and channel weighting are performed in sequence;

The output of the third convolution processing is also superimposed on the output data of the second global average pooling through a residual connection.

4. The lightweight semantic communication method for image transmission according to claim 1, characterized in that the step of obtaining the encoded data operates in a sequential processing mode and further comprises: obtaining a scaling factor based on the difference between the size of the currently downsampled input data and the size of the target image, and obtaining the shift operation step size of the outer shift convolution based on the scaling factor, so that the size of the obtained first feature map is consistent with the size of the currently downsampled input data.

5. The lightweight semantic communication method for image transmission according to claim 1, further comprising: splicing the first feature map into the input data of the first atrous spatial pyramid pooling.

6. A lightweight semantic communication system for image transmission, comprising:

An encoder, comprising a first convolution module, a downsampling module and a first atrous spatial pyramid pooling module connected in sequence, for sequentially performing a first convolution process, downsampling and a first atrous spatial pyramid pooling on a target image to obtain encoded data;

A decoder, comprising an upsampling module, a second convolution module and a second atrous spatial pyramid pooling module connected in sequence, configured to sequentially perform upsampling, a second convolution process and a second atrous spatial pyramid pooling according to the received encoded data to obtain a reconstructed image of the target image;

The encoder further includes an outer shift convolution unit, which is used to perform an outer shift convolution on the target image to obtain a first feature map, wherein the feature size of the first feature map is consistent with the feature size of the downsampling;

The downsampling module comprises a feature splicing unit, an inner shift convolution unit and a nonlinear activation unit connected in sequence, wherein the feature splicing is used to splice the first feature map into the input data of the downsampling module;

The outer shift convolution unit is also connected to the output end of the inner shift convolution unit to superimpose the first feature map onto the output data of the inner shift convolution unit through a residual connection;

The outer shift convolution unit and the inner shift convolution unit each include a shift layer, a batch normalization layer and a pointwise convolution layer connected in sequence;

A plurality of the downsampling modules are arranged in succession.

7. The lightweight semantic communication system for image transmission according to claim 6, wherein the downsampling module further comprises:

A squeeze-and-excitation unit, configured to adjust the channel weights of the downsampled input data according to a channel attention mechanism to obtain first intermediate data;

The feature splicing unit is used to splice the first feature map to the first intermediate data to obtain second intermediate data, and use the second intermediate data as input data of the inner shift convolution unit.

8. The lightweight semantic communication system for image transmission according to claim 7, wherein the squeeze-and-excitation unit comprises:

The third convolutional layer, the first global average pooling layer, the fully connected dimension reduction layer, the activation layer, the fully connected dimension increase layer, the second global average pooling layer and the channel weighting layer, connected in sequence;

The output of the third convolutional layer is also superimposed on the output data of the second global average pooling layer through a residual connection.

9. The lightweight semantic communication system for image transmission according to claim 6, characterized in that the working mode of the encoder is a sequential processing mode, and the outer shift convolution unit is also used to: obtain a scaling factor based on the difference between the size of the input data of the current downsampling module and the size of the target image, and obtain the shift operation step size of the outer shift convolution unit based on the scaling factor, so that the size of the obtained first feature map is consistent with the size of the input data of the current downsampling module.

10. The lightweight semantic communication system for image transmission according to claim 6, wherein the first feature map is also spliced into the input data of the first atrous spatial pyramid pooling module.