CN119831845B - Image processing and model training method and device - Google Patents
Image processing and model training method and device
- Publication number
- CN119831845B (application number CN202510314154.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
Abstract
The application discloses an image processing and model training method and device. The image processing method comprises: determining a target coding mode of a first image; determining, from a plurality of super-resolution models corresponding to different coding type groups, a target super-resolution model whose coding type group matches the target coding mode; and processing the first image with the target super-resolution model to generate a second image. The model training method comprises: determining a super-resolution model to be trained for each of a plurality of coding type groups; for each coding mode of a coding type group, obtaining a first sample image coded with that coding mode and a corresponding second sample image; and, for each coding type group, training the super-resolution model corresponding to the group, with the first sample images corresponding to the group as training data and the corresponding second sample images as the training target.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing and model training method and apparatus.
Background
Super-resolution reconstruction is a technique for recovering a higher-resolution image from a low-resolution image. However, the image quality of super-resolution images reconstructed from low-resolution images by current super-resolution reconstruction techniques is poor.
Disclosure of Invention
In one aspect, the present application provides an image processing method, including:
obtaining a decoded first image, and determining a target coding mode corresponding to the first image;
determining a target super-resolution model corresponding to the target coding mode from a plurality of super-resolution models, wherein the plurality of super-resolution models correspond to different coding type groups, the target coding mode is matched with the coding type group corresponding to the target super-resolution model, and the coding modes matched with the same coding type group have at least one same coding characteristic;
and processing the first image by using the target super-resolution model to generate a second image.
In one possible implementation manner, determining a target super-resolution model corresponding to the target coding mode from a plurality of super-resolution models includes:
And determining a target super-resolution model corresponding to the target coding mode based on a mapping list, wherein the mapping list records at least two coding modes respectively included by a plurality of coding type groups and the super-resolution model corresponding to each coding type group.
In yet another possible implementation manner, the determining, based on the mapping list, a target super-resolution model corresponding to the target coding manner includes:
if the target coding mode is found in the mapping list, determining the target super-resolution model corresponding to the target coding mode from the mapping list;
and if the target coding mode is not found in the mapping list, determining from the mapping list a target coding type group whose coding modes have a similarity to the target coding mode that meets a requirement, and determining the target super-resolution model corresponding to that target coding type group in the mapping list.
In yet another possible implementation manner, the determining the target super-resolution model corresponding to the target coding type group in the mapping list includes:
If a plurality of target coding type groups are determined from the mapping list, a plurality of target super-resolution models corresponding to the target coding type groups in the mapping list are determined;
The processing the first image by using the target super-resolution model to generate a second image includes:
processing the first image with each target super-resolution model, and fusing the candidate images obtained from the respective target super-resolution models, based on the weight corresponding to each target super-resolution model, to obtain the second image;
or, alternatively,
constructing a comprehensive super-resolution model based on the model parameters of each target super-resolution model, and processing the first image with the comprehensive super-resolution model to generate the second image.
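For the weighted-fusion branch above, a minimal sketch follows, assuming each target super-resolution model is exposed as a callable returning an image array; the function name and the normalization step are illustrative, not part of the application.

```python
import numpy as np

def fuse_candidates(first_image, models, weights):
    """Process the first image with each target super-resolution model and
    fuse the candidate images by the per-model weights (illustrative sketch)."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so the fusion is a convex combination
    candidates = [m(first_image).astype(np.float64) for m in models]
    fused = sum(wi * c for wi, c in zip(w, candidates))
    return np.clip(fused, 0, 255).astype(np.uint8)  # the second image
```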
In yet another possible implementation manner, each super-resolution model recorded in the mapping list is trained for its coding type group by taking first sample images coded with each coding mode in the group as training data and the second sample images corresponding to the first sample images as the training target, wherein the quality of the second sample image is higher than that of the first sample image.
In another possible implementation manner, the second sample image corresponding to the first sample image is an uncoded original image corresponding to the first sample image or an optimized image obtained by performing image optimization processing on the original image, wherein the quality of the original image corresponding to the first sample image is higher than that of the first sample image;
the optimized image is obtained by sharpening the detail area of the original image.
In yet another possible implementation manner, different area weights are given to different areas in the second sample image in a loss function adopted by training the super-resolution model, and the detail complexity of the different areas in the second sample image is different;
And/or training a loss function adopted by the super-resolution model to include a noise suppression term, wherein different suppression weights are given to different pixel point areas in the second sample image in the noise suppression term, wherein the larger the difference between the pixel point areas in the second sample image and corresponding pixel point areas in a target image is, the larger the suppression weights given to the pixel point areas in the second sample image are, the pixel point areas comprise at least one pixel point, and the target image is an image generated by processing the first sample image by the super-resolution model;
and/or training the loss function adopted by the super-resolution model corresponding to each coding type group to be matched with the coding characteristics of the coding mode corresponding to the coding type group.
In yet another aspect, the present application further provides a model training method, including:
Determining super-resolution models to be trained corresponding to at least two coding type groups respectively, wherein the coding modes matched with the same coding type group have at least one same coding characteristic;
For each coding mode corresponding to each coding type group, obtaining a first sample image coded by the coding mode corresponding to the coding type group and a second sample image corresponding to the first sample image, wherein the quality of the second sample image is higher than that of the first sample image;
And for each coding type group, taking a first sample image corresponding to the coding type group as training data, taking a second sample image corresponding to the first sample image as a training target, and training a super-resolution model corresponding to the coding type group to obtain a trained super-resolution model corresponding to the coding type group.
In one possible implementation manner, the obtaining a first sample image encoded by the encoding manner corresponding to the encoding type group and a second sample image corresponding to the first sample image includes:
Obtaining an uncoded original image;
encoding the original image by utilizing the encoding mode corresponding to the encoding type group to obtain a first sample image;
and determining the original image or an optimized image obtained by optimizing the original image as a second sample image corresponding to the first sample image.
In another possible implementation manner, the training of the super-resolution model corresponding to the coding type group with the first sample image corresponding to the coding type group as training data and the second sample image corresponding to the first sample image as a training target includes:
Taking a first sample image corresponding to the coding type group as training data, taking a second sample image corresponding to the first sample image as a training target, and training a super-resolution model corresponding to the coding type group by combining a loss function;
In the loss function, different area weights are given to different areas in the second sample image, and the detail complexity of the different areas in the second sample image is different;
And/or the loss function comprises a noise suppression term, in the noise suppression term, different suppression weights are given to different pixel point areas in the second sample image, wherein the larger the difference between the pixel point areas in the second sample image and corresponding pixel point areas in a target image is, the larger the suppression weights given to the pixel point areas in the second sample image are, the target image is an image generated by processing the first sample image through the super-resolution model, and the pixel point areas comprise at least one pixel point;
and/or training the loss function adopted by the super-resolution model corresponding to each coding type group to be matched with the coding characteristics of the coding mode corresponding to the coding type group.
In still another aspect, the present application also provides an image processing apparatus, including:
the type determining unit is used for obtaining a decoded first image and determining a target coding mode corresponding to the first image;
The model determining unit is used for determining a target super-resolution model corresponding to the target coding mode from a plurality of super-resolution models, wherein the plurality of super-resolution models correspond to different coding type groups, the target coding mode is matched with the coding type group corresponding to the target super-resolution model, and the coding modes matched with the same coding type group have at least one same coding characteristic;
And the image processing unit is used for processing the first image by utilizing the target super-resolution model to generate a second image.
In yet another aspect, the present application further provides a model training apparatus, including:
The model determining unit is used for determining super-resolution models to be trained corresponding to at least two coding type groups respectively, and the coding modes matched with the same coding type group have at least one same coding characteristic;
A sample obtaining unit, configured to obtain, for each coding mode corresponding to each coding type group, a first sample image coded by the coding mode corresponding to the coding type group, and a second sample image corresponding to the first sample image, where the quality of the second sample image is higher than that of the first sample image;
The model training unit is used for training the super-resolution model corresponding to the coding type group by taking the first sample image corresponding to the coding type group as training data and the second sample image corresponding to the first sample image as a training target for each coding type group so as to obtain the trained super-resolution model corresponding to the coding type group.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of an image processing method according to the present application;
FIG. 2 is a schematic flow chart of the model training method provided by the application;
FIG. 3 is a schematic flow chart of a model training method according to the present application;
FIG. 4 is an exemplary diagram of an implementation framework for training a super-resolution model corresponding to a single encoding mode in the present application;
FIG. 5 is a schematic flow chart of an image processing method according to the present application;
FIG. 6 is a schematic diagram of a composition of an image processing apparatus according to the present application;
FIG. 7 is a schematic diagram of a composition structure of the model training apparatus according to the present application;
FIG. 8 is a schematic diagram of a composition architecture of an electronic device according to the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application herein is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms "first", "second", and the like in the description, the claims, and the above figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances and merely distinguish objects of the same attribute when embodiments of the application are described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application. The method of the present application is applied to an electronic device, which may be a notebook computer, a desktop computer, a tablet computer, or the like, or may be a device node in a cloud or a distributed cluster, without limitation.
The method of the present embodiment may include the following steps S101 to S103:
s101, obtaining a decoded first image, and determining a target coding mode corresponding to the first image.
The first image is an image that needs image reconstruction processing by a super-resolution model. Since the first image is decoded from an image that was coded and compressed, it is affected by the encoder: for example, noise introduced while the encoder compression-codes the image leaves the decoded first image with relatively low image quality, e.g., relatively low image resolution, high noise, and so on.
The target coding scheme is a coding scheme used when the first image is coded. It will be appreciated that, when the type of encoder used to encode the first image is different, the encoding scheme used to encode the first image is also different, and thus the target encoding scheme is also the encoding scheme corresponding to the encoder used to encode the first image.
In the present application, the coding modes may be various; for example, they may include, but are not limited to, Joint Photographic Experts Group (JPEG) coding, High Efficiency Video Coding (HEVC), or H.264 (MPEG-4 Advanced Video Coding), wherein High Efficiency Video Coding is also called H.265 coding.
Determining the target coding mode corresponding to the first image may be done by analyzing metadata of the first image. The metadata of the first image may include at least the coding type of the first image, and may further include other attribute information of the first image, without particular limitation. For example, since the first image is obtained by decoding the encoded image, metadata of the encoded image can be obtained and the target coding mode of the encoded image determined based on that metadata, thereby obtaining the target coding mode corresponding to the first image. The metadata of the encoded image is, in effect, the metadata of the first image.
The target coding mode may be determined by analyzing an extension of the first image, or the target coding mode corresponding to the first image may be detected by a specific coding detection tool, or the target coding mode corresponding to the first image may be determined by analyzing a compression noise mode corresponding to the first image, a frequency domain feature, or other features related to image coding, which is not particularly limited.
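As a rough illustration of the metadata/extension route, the sketch below assumes the coding type is exposed in a metadata dictionary and falls back to the file extension; the extension table and the key name are assumptions, not from the application.

```python
from pathlib import Path

# Assumed extension-to-coding table, for illustration only.
EXTENSION_TO_CODING = {".jpg": "JPEG", ".jpeg": "JPEG",
                       ".heic": "HEVC", ".avif": "AV1"}

def detect_target_coding(image_path, metadata=None):
    """Return the target coding mode from metadata if present,
    otherwise guess it from the file extension."""
    if metadata and "coding_type" in metadata:
        return metadata["coding_type"]
    return EXTENSION_TO_CODING.get(Path(image_path).suffix.lower())
```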
In the present application, the first image may be an independent single-frame image, or may be an image frame in a video, which is not particularly limited.
S102, determining a target super-resolution model corresponding to the target coding mode from the plurality of super-resolution models.
The target coding mode is matched with the coding type group corresponding to the target super-resolution model. For convenience of distinction, the super-resolution model corresponding to the target encoding scheme is referred to as a target super-resolution model, and thus the target super-resolution model belongs to the plurality of super-resolution models.
In the present application, the coding modes matched with the same coding type group have at least one identical coding characteristic. For example, a coding scheme matching the same coding type group may include at least two coding schemes, which may have one or more identical coding features. Based on this, the present application actually matches each coding mode having at least one same coding feature to the same coding type group, and the coding modes matched to the same coding type group can share the same super-resolution model, so that it is not necessary to separately construct a super-resolution model for each coding mode, and naturally the number of super-resolution models to be constructed and stored can be reduced.
In one possible case, the possible code type groups may be pre-partitioned, so that the code patterns included in each code type group are predetermined, and of course, the code patterns included in the same code type group have at least one identical characteristic. In this case, after determining the target coding mode corresponding to the first image, a target coding type group including the target coding mode may be determined, and a target super-resolution model corresponding to the target coding type group may be determined.
In yet another possible scenario, the present application may determine only that each coding type group corresponds to at least one coding feature, but the coding modes included in the coding type group are not pre-partitioned. In this case, after determining the target coding mode corresponding to the first image, the present application may determine, based on the coding features of the target coding mode, a target coding type group to which the target coding mode is matched, and determine a target super-resolution model corresponding to the target coding type group.
The coding mode may have coding features with multiple dimensions, for example, the coding features of the coding mode may include, but are not limited to, coding algorithm principles, coding complexity, and coding applicable scenes.
Taking as an example the determination of different coding type groups based on this coding feature of the coding algorithm principle:
The coding algorithm principles corresponding to the different coding type groups are different, for example, the coding algorithm principles corresponding to the different coding type groups can be transform-based coding (also called transform coding), block prediction-based coding, entropy-based coding, or the like.
For example, for a coding type group whose matched coding characteristic is transform-based coding, the coding modes matched to the group employ transform-based coding and may include, but are not limited to, JPEG coding, AV1 (AOMedia Video 1) coding, and intra-frame coding in Versatile Video Coding (VVC Intra).
For a coding type group whose matched coding characteristic is block-prediction-based coding, the coding modes matched to the group employ block-prediction-based coding and may include, but are not limited to, H.264 coding, HEVC coding, and inter-frame coding in VVC (VVC Inter).
Similarly, for the coding type group matched by entropy-based coding modes, the coding modes may include, but are not limited to, Huffman coding, Shannon coding, and the like. In the application, the target super-resolution model is suitable for super-resolution reconstruction of images coded by any of the coding modes matched with its corresponding coding type group which, for convenience of distinction, may be referred to as the target coding type group. For example, the target super-resolution model can specifically reduce the noise of an image coded by any coding mode in the target coding type group, removing the types of noise introduced by the encoder of any such coding mode.
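The grouping by coding-algorithm principle described above can be pictured as a small lookup structure; the following sketch uses the example groups from the text, with identifiers that are illustrative only.

```python
# Coding type groups keyed by their shared coding feature (taken from
# the examples above); membership and identifiers are illustrative.
CODING_TYPE_GROUPS = {
    "transform_based":  {"JPEG", "AV1", "VVC_INTRA"},
    "block_prediction": {"H264", "HEVC", "VVC_INTER"},
    "entropy_based":    {"HUFFMAN", "SHANNON"},
}

def match_group(target_coding):
    """Return the coding type group the target coding mode matches, if any."""
    for group_id, codings in CODING_TYPE_GROUPS.items():
        if target_coding in codings:
            return group_id
    return None
```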
The target super-resolution model is a trained model, and the model type of the target super-resolution model can have various possibilities, for example, the target super-resolution model can be a convolutional neural network model, a deep recursive convolutional network or a generating countermeasure network model, and the like, which is not limited.
In the application, the specific implementation of training to obtain the target super-resolution model is not limited. For example, in one possible implementation manner, the target super-resolution model is a super-resolution model that uses a first sample image encoded by an encoding manner corresponding to the target encoding type group as training data, uses a second sample image corresponding to the first sample image as a training target, and is trained and corresponds to the target encoding type group. Wherein the quality of the second sample image is higher than the quality of the first sample image.
The second sample image having higher quality than the first sample image may be embodied in a number of ways. For example, it may at least mean that the resolution of the second sample image is higher than that of the first sample image; it may also mean that the second sample image contains less noise than the first sample image, or that the edge sharpness and detail richness of the second sample image are higher than those of the first sample image, and so on.
S103, processing the first image by using the target super-resolution model to generate a second image.
Wherein the second image is a high quality image generated by the target super resolution model based on the first image, and therefore the quality of the second image is higher than the quality of the first image.
For example, a quality of the second image being higher than a quality of the first image may at least be indicative of a resolution of the second image being higher than a resolution of the first image. Of course, the quality of the second image being higher than the quality of the first image may also be indicative of less noise contained in the second image than the first image, etc.
It will be appreciated that the present application may also output a second image after it is generated, so that a user can obtain the second image with relatively high image quality. Of course, in actual application, after the second image is generated, the second image may be further saved according to actual needs, which is not described in detail.
As can be seen from the above, after the first image is decoded, the present application determines the target coding mode corresponding to the first image, and processes the first image by using the target super-resolution model corresponding to the target coding type group matched with the target coding mode, so that the target super-resolution model suitable for processing the first image coded by the target coding mode can be selected, noise or other influence of the first image in the coding process can be removed more effectively by the target super-resolution model, and further, the image quality of the generated second image can be improved.
In addition, each coding mode matched to the same coding type group has at least one same coding characteristic, so that the coding characteristics of each coding mode matched to the same coding type group are similar, each coding mode matched to the same coding type group can share the same super-resolution model, the universality of the coding modes suitable for a single super-resolution model is improved, and the number of super-resolution models needing to be constructed and stored can be reduced.
In one possible implementation manner, in order to efficiently determine a super-resolution model corresponding to a coding type group matched to a coding manner, and reduce the time required for determining the super-resolution model corresponding to the coding manner, the present application may further construct a mapping list in which at least two coding manners respectively included in multiple coding type groups and a super-resolution model corresponding to each coding type group are recorded. Wherein different super-resolution models in the mapping list correspond to different groups of coding types, respectively.
Based on the mapping list, the application can determine the target super-resolution model corresponding to the target coding mode.
For example, the mapping list records the group identifier of each coding type group, at least two coding modes corresponding to the group identifier of each coding type group, and the model identifier of the super-resolution model corresponding to the group identifier of each coding type group. The group identifier may be a coding feature shared by all coding modes in the coding type group, or information for uniquely identifying the coding type group, such as a group number corresponding to the coding type group, which is not particularly limited. Similarly, the model identifier of the super-resolution model may be a model name, a number, or a call address of the super-resolution model, which is not limited in particular.
Determining the target super-resolution model corresponding to the target coding mode by querying the mapping list avoids the time spent computing which coding type group matches the coding characteristics of the target coding mode. This naturally improves the efficiency of determining the target super-resolution model, reduces the time needed to process the first image to be reconstructed, and improves the efficiency of super-resolution reconstruction of the first image.
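A possible shape for such a mapping list and its query, including the similarity fallback described earlier, is sketched below; the similarity function is passed in by the caller, and the threshold value is an assumption.

```python
# Mapping list: group identifier -> (coding modes, model identifier).
MAPPING_LIST = {
    "transform_based":  ({"JPEG", "AV1"},  "sr_model_transform"),
    "block_prediction": ({"H264", "HEVC"}, "sr_model_blockpred"),
}

def lookup_models(target_coding, similarity, threshold=0.8):
    """Direct lookup first; otherwise return every group whose coding
    modes are similar enough to the target (several may qualify)."""
    for codings, model_id in MAPPING_LIST.values():
        if target_coding in codings:
            return [model_id]
    return [model_id
            for codings, model_id in MAPPING_LIST.values()
            if any(similarity(target_coding, c) >= threshold for c in codings)]
```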
In the present application, the model structure of the super-resolution model corresponding to each coding type group may have various possibilities, and in particular, reference may be made to the description of the foregoing possible model structure of the target super-resolution model, which is not repeated.
In the application, each super-resolution model can be obtained by pre-training; the specific training process is not limited. Through training, super-resolution models corresponding to different coding type groups come to have different processing effects when performing super-resolution processing on images coded by the coding modes of their respective groups.
In one possible implementation manner, in order to make the super-resolution model corresponding to each coding type group pay more attention to the influence of noise and the like induced by using each coding mode (or the encoder corresponding to each coding mode) in the coding type group on image coding, for the super-resolution model corresponding to any coding type group, the super-resolution model may be a trained super-resolution model corresponding to the coding type group, where a first sample image coded by using each coding mode in the coding type group is used as training data, and a second sample image corresponding to the first sample image is used as a training target. As previously described, the quality of the second sample image is higher than the quality of the first sample image.
For any one coding type group, each first sample image corresponds to one coding mode in the coding type group, the number of the first sample images for training the super-resolution model can be multiple, and the coding modes corresponding to the plurality of the first sample images can comprise at least two coding modes matched with the coding type group.
For ease of understanding, one implementation of training to obtain super-resolution models corresponding to various coding type groups is described below in conjunction with the flowchart shown in fig. 2. Fig. 2 shows a flowchart of an implementation of a model training method according to an embodiment of the present application, where the model training method may include the following steps S201 to S203:
s201, determining super-resolution models to be trained corresponding to at least two coding type groups.
Wherein the coding modes matched with the same coding type group have at least one same coding characteristic. For example, before training the super-resolution model, at least two matching such groups of coding types may be determined for each group of coding types.
The model types of the super-resolution models to be trained corresponding to different coding type groups can be the same or different, and can be specifically selected according to actual needs without limitation. The types of the super-resolution models that may be selected may be referred to in the foregoing description, and will not be described herein.
S202, for each coding mode corresponding to each coding type group, obtaining a first sample image coded by the coding mode corresponding to the coding type group and a second sample image corresponding to the first sample image.
The first sample image may be an image obtained by encoding an original image with image quality meeting a requirement by an encoder corresponding to one encoding mode in the encoding type group. Wherein the quality of the original image is higher than the quality of the first sample image.
In one implementation, each encoding type group may correspond to a plurality of first sample images including at least one first sample image encoded with each encoding type of the encoding type group.
For example, in the case where each coding mode in the coding type group belongs to transform-based compression coding, assuming the group includes the two coding modes JPEG and AV1, at least one original image whose resolution exceeds a set threshold may be compression-coded by a JPEG encoder to obtain at least one first sample image corresponding to the JPEG coding mode, and likewise by an AV1 encoder to obtain at least one first sample image corresponding to the AV1 coding mode. Each first sample image contains the same object content as its corresponding original image, but the resolution of the first sample image is lower than that of the original image.
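A rough illustration for the JPEG case follows; it assumes Pillow is available, and the downscale factor and quality value are illustrative choices, not values from the application.

```python
import io
from PIL import Image

def make_first_sample(original, scale=2, quality=30):
    """Downscale and JPEG compression-code an original image, yielding a
    lower-quality first sample image (parameters are illustrative)."""
    small = original.resize((original.width // scale, original.height // scale),
                            Image.BICUBIC)
    buf = io.BytesIO()
    small.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()
```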
Correspondingly, the quality of the second sample image is higher than the quality of the first sample image.
The target of training the super-resolution model corresponding to the coding type group is to expect that the super-resolution model corresponding to the coding type group can generate the second sample image based on the first sample image, so that the second sample image is a training target of training the super-resolution model corresponding to the coding type group.
S203, for each coding type group, taking a first sample image corresponding to the coding type group as training data, taking a second sample image corresponding to the first sample image as a training target, and training a super-resolution model corresponding to the coding type group to obtain a trained super-resolution model corresponding to the coding type group.
Taking the second sample image corresponding to the first sample image as the training target essentially means minimizing the difference between the target image, generated by the super-resolution model based on the first sample image, and the second sample image, so that the target image is as similar as possible to the second sample image.
As can be seen from the above, for each coding type group, the present application uses each first sample image coded by each coding mode in the coding type group as training data, and uses a second sample image with relatively higher quality corresponding to the first sample image as a training target, and trains to obtain a super-resolution model corresponding to the coding type group, so that the super-resolution model corresponding to each coding type group can reconstruct an image coded by any coding mode in the coding type group into a high-quality image.
In addition, although the encoder corresponding to each coding mode in a coding type group introduces noise when coding an image, the types of noise introduced by the encoders of the coding modes matched to the same coding type group are similar. Therefore, using the same super-resolution model to perform super-resolution image reconstruction on images coded by the encoder of any coding mode in the group not only effectively removes the specific types of noise introduced by the encoders of that coding type group and improves the quality of the reconstructed super-resolution image, but also effectively avoids the huge number of models and high maintenance cost that training one super-resolution model per coding mode in the group would entail.
It can be appreciated that in the training process of the super-resolution model corresponding to each coding type group, the training of the super-resolution model needs to be controlled by combining the corresponding loss function. The loss function is used to quantify a difference of a target image generated by the super-resolution model based on the first sample image and a second sample image corresponding to the first sample image. For example, the function value of the loss function can be combined to judge whether the super-resolution model reaches the training target, and adjust the model parameters of the super-resolution model.
Based on this, in the present application, for each coding type group, the super-resolution model corresponding to the coding type group may be trained by taking the first sample image corresponding to the group as training data, taking the second sample image corresponding to the first sample image as the training target, and combining a loss function.
The loss function can be selected according to actual needs, and is not particularly limited.
In one possible case of a loss function, different regions in the second sample image are given different region weights in the loss function employed for training the super resolution model. Wherein the detail complexity of different regions in the second sample image is different.
Each region in the second sample image has a one-to-one correspondence with a corresponding region in the target image generated by the super-resolution model based on the first sample image. Based on this, the function value of the loss function is related to the difference between the target image and the second sample image over different areas and the area weights corresponding to the different areas.
Unlike a conventional loss function, whose value relates only to the overall difference between the target image generated by the super-resolution model and the second sample image, in the present application different areas of the second sample image are given different area weights in the loss function, so that during training the super-resolution model's ability to learn areas whose detail complexity meets certain requirements can be enhanced.
The second sample image may have multiple dividing bases for different regions, and several possible cases of dividing different regions in the second sample image are described below:
For example, in one possible case, a detail region and a flat region may be divided in the second sample image. On the basis, different area weights are given to the detail area and the flat area in the second sample image in the loss function, and the area weight of the detail area is larger than that of the flat area, so that the super-resolution model can learn more detail information in the second sample image in the process of training the super-resolution model, the sharpening degree of the detail area of the generated image is enhanced, and the flat area of the generated image can be kept smooth, so that new noise introduced into the generated image due to sharpening processing is reduced.
The detail area of the image refers to an area containing abundant information, wherein the pixel value of the image changes severely. The detail area of the image typically contains edges, textures, patterns or complex structures. The flat region of the image is a region in which the pixel value changes smoothly and contains less information. For example, a region of the image having a detail complexity level not lower than the set threshold may be a detail region, and a region of the image having a detail complexity level lower than the set threshold may be a flat region. One or more detail regions may be included in the image, and correspondingly, one or more flat regions may be included in the image.
In the present application, there is no limitation on the specific implementation of determining the detail area and the flat area in the image. For ease of understanding, two implementations are illustrated.
For example, in one implementation, edge detection algorithms may be employed to detect edge regions and non-edge regions in an image, the edge regions typically belonging to detail regions, and the non-edge regions belonging to flat regions. The specific implementation process includes that edge detection is carried out on an image to obtain a binarized edge map, an area where white pixels are located in the edge map is determined to be a detail area, and an area where black pixels are located is determined to be a flat area.
In yet another implementation, a texture analysis method may be used to determine a detail region and a flat region of an image based on a texture complexity of the image, where the texture complexity reflects a degree of richness of the texture of a local region of the image, a region having a texture complexity not lower than a set complexity threshold is the detail region, and a region having a texture complexity lower than the complexity threshold is the flat region. For example, texture feature extraction methods such as a local binary pattern, a gray level co-occurrence matrix and the like can be used for extracting texture feature values of different pixel areas in an image, the texture feature values can represent texture complexity, and accordingly, a detail area and a flat area of the image are determined by combining a set texture feature threshold.
Of course, there may be other ways of determining the detail area and the flat area in the image, which will not be described herein.
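The edge-detection route can be sketched as follows, assuming OpenCV; the Canny thresholds and the dilation kernel are illustrative choices, not values from the application.

```python
import cv2
import numpy as np

def detail_flat_masks(gray):
    """Split a grayscale image into a detail mask (edge regions) and a
    flat mask (their complement) via a binarized edge map."""
    edges = cv2.Canny(gray, 100, 200)           # white pixels = edges
    kernel = np.ones((5, 5), np.uint8)
    detail = cv2.dilate(edges, kernel) > 0      # grow edges into regions
    return detail, ~detail                      # detail region, flat region
```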
In yet another possible case, based on the objects present in the second sample image, the second sample image is divided into object regions containing objects and a background region containing no object, resulting in a plurality of different regions in the second sample image.
Wherein, the object area and the background area in the second sample image can be determined by using an edge detection algorithm or an object recognition model, and the like, and the method is not limited in particular. For example, taking an edge detection algorithm to determine an object region and a background region in the second sample image as an example, edges of each object in the second sample image can be detected based on the edge detection algorithm, so as to obtain an object region of each object in the second sample image and a background region outside the object region.
In yet another possible scenario, the object of interest in the second sample image may be set as desired, and the region of interest and the non-interest region in the second sample image determined, resulting in a plurality of different regions in the second sample image. For example, the region of interest and the non-interest region may be identified with a recognition model, or the second sample image may be semantically segmented with a semantic segmentation algorithm to separate the region of interest from the non-interest region. Of course, there may be other ways of determining the region of interest and the non-interest region in the second sample image, without limitation.
In the present application, the specific form of the loss function may be varied. For example, the loss function may be a pixel-level loss function in which at least one of an absolute difference value and a square difference of pixel values of the target image and the second sample image at pixel points at different positions is calculated. As another example, the loss function may be a perceptual loss function in which a distance of the target image from the second sample image in the feature space may be calculated. As another example, the loss function may be a gradient loss function. Of course, the loss function may take other forms, without limitation.
To facilitate understanding of the loss function, a specific implementation of weighting different regions of the second sample image is described below, taking the loss function to be a pixel-level loss function with a specific functional form. The loss function $L$ can be expressed as (squared-difference variant):

$$L = \frac{1}{N}\sum_{i=1}^{W}\sum_{j=1}^{H} w_{i,j}\,\bigl(\hat{y}_{i,j} - y_{i,j}\bigr)^{2}$$

wherein:
- $N$ is the total number of pixels in the second sample image (or the target image), i.e., $N = H \times W$;
- $H$ and $W$ are the height and width of the second sample image, i.e., the number of pixels on each column and on each row, respectively;
- $w_{i,j}$ is the weight corresponding to the pixel point at column $i$, row $j$ in the second sample image;
- $\hat{y}_{i,j}$ is the pixel value of the pixel point at column $i$, row $j$ in the target image generated by the super-resolution model;
- $y_{i,j}$ is the pixel value of the pixel point at column $i$, row $j$ in the second sample image.

In the present application, the region weights of different regions in the second sample image (or the target image) differ; therefore, in the above formula, the weight $w_{i,j}$ of a pixel point is the region weight corresponding to the region in which that pixel point lies in the second sample image (or the target image). It follows that the weights corresponding to pixel points located in different regions of the second sample image are likewise not the same.
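In code, the weighted pixel-level loss above might look like the sketch below (squared-difference variant), where the region weight map has the same height and width as the images; this is an illustration, not the application's implementation.

```python
import numpy as np

def weighted_pixel_loss(target, second_sample, region_weights):
    """L = (1/N) * sum_{i,j} w_{i,j} * (yhat_{i,j} - y_{i,j})^2, with the
    per-pixel weight taken from the region the pixel lies in."""
    diff = target.astype(np.float64) - second_sample.astype(np.float64)
    return float(np.mean(region_weights * diff ** 2))
```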
In yet another possible scenario, in order to be able to reduce the noise that the super resolution model may amplify during reconstruction of the super resolution image, in the present application, a noise suppression term may be included in the loss function, in which different suppression weights are given to different pixel point areas in the second sample image. Wherein each pixel area comprises at least one pixel.
For any pixel point region in the second sample image, the larger the difference between the pixel point region and the corresponding pixel point region in the target image is, the larger the suppression weight is given to the pixel point region. Wherein the target image is an image generated by processing the first sample image by the super-resolution model.
The corresponding pixel point area in the target image is the same as the pixel point area in the second sample image in the coordinate area. Based on this, the suppression weights corresponding to the pixel areas of the second sample image are actually suppression weights corresponding to the corresponding pixel areas in the target image.
The value of the noise suppression term is a weighted sum of differences between the target image and the second sample image on each pixel point area based on the suppression weight of each pixel point area in the second sample image. The larger the value of the noise suppression term is, the larger the function value of the loss function is.
It will be appreciated that the greater the difference between the target image and the second sample image in the same pixel point area, the more noise that pixel point area of the target image contains. In order to reduce noise in each pixel point area of the target image during super-resolution model training, the application assigns a larger weight (namely, a larger suppression weight) to pixel point areas where the target image and the second sample image differ more, so that the super-resolution model learns more from those areas and the noise of those areas in the target image is reduced.
The difference between the target image and the second sample image in different pixel areas may change along with the training of the super-resolution model, so that the suppression weights corresponding to the different pixel areas may also be adaptively adjusted along with the change of the difference between the target image and the second sample image in the different pixel areas.
Still taking the above formula as an example: with continued training of the super-resolution model, the difference between the target image and the second sample image in different pixel point areas changes, so the suppression weight corresponding to a pixel point area changes, and the weight $w_{i,j}$ of each pixel in that area varies accordingly.
It will be appreciated that, in practical applications, the loss function for training the super-resolution model may combine the two possibilities above: giving different region weights to different regions in the second sample image while also including the noise suppression term. For example, in the loss function, the weight given to a certain pixel point in the second sample image is composed of a weight a and a weight b, where weight a is a fixed weight assigned to the region in which the pixel point lies, and weight b is a suppression weight determined by the difference between that pixel point in the second sample image and the corresponding pixel point in the target image; as training of the super-resolution model continues, weight b is therefore a variable weight value.
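A sketch of combining the fixed region weight a with the adaptive suppression weight b follows; the scaling factor beta and the normalization are assumptions introduced for illustration.

```python
import numpy as np

def combined_weights(region_weight_a, target, second_sample, beta=1.0):
    """Per-pixel weight = fixed region weight a + suppression weight b,
    where b grows with the target/second-sample difference."""
    diff = np.abs(target.astype(np.float64) - second_sample.astype(np.float64))
    weight_b = beta * diff / (diff.max() + 1e-8)  # larger difference -> larger b
    return region_weight_a + weight_b
```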
In another possible case, considering that in the present application, the first sample image may be a still image including a static object such as a landscape or an object, or may be a dynamic image including a dynamic object such as an animal or a person, or may be a dynamic image derived from a video, or the like, in order to train the super-resolution model corresponding to each coding type group more reasonably, different loss functions may be determined by combining the duty ratio of the still image and the dynamic image in the first sample image corresponding to the coding type group.
In yet another possible scenario, the loss function employed to train the super-resolution model of each coding type group matches the coding characteristics of the coding modes of that group. Based on this, for coding type groups whose shared coding characteristics differ, the loss functions selected for training their respective super-resolution models may also differ. The appropriate loss function may be determined specifically in combination with the features common to the coding modes in the coding type group. It will be appreciated that the loss function for training the super-resolution model may include one or more of the many possible cases above, without limitation.
In the present application, many possible implementations of obtaining the second sample image are possible.
For example, in one possible case, the second sample image may be an original image corresponding to the above-mentioned first sample image, i.e., an unencoded original image corresponding to the first sample image.
In this case, an uncoded original image may be obtained first for each coding type group. Then, the original image is encoded by an encoder of an encoding mode corresponding to the encoding type group, and a first sample image is obtained. Accordingly, the original image corresponding to the first sample image may be determined as the second sample image.
It will be appreciated that compression-coding the original image not only reduces its resolution but may also introduce compression noise and the like, so the quality of the generated first sample image is lower than that of its corresponding original image. The present application trains the super-resolution model with the first sample image as training data, aiming, through training, to enable the super-resolution model to restore the first sample image to the original image.
In still another possible case, the second sample image is an optimized image obtained by performing image optimization processing on an original image corresponding to the first sample image.
It will be appreciated that optimizing the original image can improve its image quality; therefore, the quality of the optimized image is higher than that of not only the first sample image but also the original image. On this basis, using the optimized image as the second sample image corresponding to the first sample image to train the super-resolution model helps improve the quality of images generated by the super-resolution model.
The method of optimizing the original image may be any processing method capable of improving the quality of the original image, which is not limited.
In an alternative way, in consideration of the fact that the detail area information in the image is rich, the optimized image can be obtained by sharpening the detail area of the original image. Wherein, the flat area in the original image is kept unchanged in the process of sharpening the detail area of the original image.
On the basis, the optimized image is obtained by sharpening the detail area of the original image, so that the optimized image is used as a second sample image corresponding to the first sample image to train the super-resolution model, the capability of the super-resolution model to learn the detail information in the image can be enhanced, the image quality generated by the super-resolution model can be improved, the excessive processing of the flat area in the image can be avoided to a certain extent, and the situation that new noise is introduced due to sharpening enhancement of the flat area of the image in the process of generating the image by the super-resolution model is reduced.
For easy understanding, the model training method of the present application will be described below by taking the second sample image as an example of the optimized image corresponding to the first sample image. Referring to fig. 3, a schematic flow chart of a model training method provided by the present application is shown, where this embodiment may include:
s301, determining super-resolution models to be trained corresponding to at least two coding type groups.
S302, for each encoding type group, an uncoded original image is obtained.
The original image is a high-quality image that has not been coded, i.e., has not been coded by any coding mode in the coding type group.
For example, for each coding type group, at least one original image may be obtained.
S303, the original image is encoded by utilizing the encoding mode corresponding to the encoding type group, and the first sample image is obtained.
For example, the original image is encoded by using the encoder corresponding to each encoding mode in the encoding type group, so as to obtain at least one first sample image corresponding to each encoding mode in the encoding type group, and the original images corresponding to the first sample images are identical.
It can be understood that, after the original image is encoded by the encoder corresponding to each encoding mode, not only the resolution of the original image is reduced, but also noise of a noise type corresponding to the encoder is introduced, so that the quality of the first sample image obtained by encoding the original image by any encoding mode is lower than that of the original image.
S304, sharpening the detail area in the original image while keeping the flat area in the original image unchanged, to obtain an optimized image derived from the original image.
By sharpening the detail areas of the original image, edges and details in the original image can be enhanced, thereby improving image quality. And since only the detail areas are sharpened, as little new noise as possible is introduced, so the optimized image has higher image quality than the original image.
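One way to realize S304 is an unsharp mask restricted to the detail mask, sketched below with OpenCV; the blur sigma and the blend weights are illustrative, not values from the application.

```python
import cv2

def sharpen_detail_only(original, detail_mask):
    """Unsharp-mask the detail region of the original image while the
    flat region (mask complement) stays unchanged."""
    blurred = cv2.GaussianBlur(original, (0, 0), sigmaX=3)
    sharpened = cv2.addWeighted(original, 1.5, blurred, -0.5, 0)
    out = original.copy()
    out[detail_mask] = sharpened[detail_mask]  # flat pixels keep original values
    return out
```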
S305, taking the first sample image corresponding to the coding type group as training data, taking the optimized image as a training target, and training the super-resolution model corresponding to the coding type group by combining a loss function to obtain the trained super-resolution model corresponding to the coding type group.
Because the optimized image has higher image quality than the original image, the optimized image is used as a training target for training the super-resolution model, and the quality of the image generated by the super-resolution model can be further improved through continuous training.
The loss function may be any one or a combination of the possible cases described above, and is not described in detail again here.
In a possible implementation manner, the present application may also train a dedicated super-resolution model for each of some commonly used coding modes, so as to obtain a super-resolution model corresponding to each of at least one coding mode. On this basis, after the target coding mode corresponding to the first image is determined, whether a super-resolution model corresponding to the target coding mode exists may be queried from the super-resolution models corresponding to the at least one coding mode. If a dedicated super-resolution model corresponding to the target coding mode exists, that super-resolution model is determined as the target super-resolution model.
If no dedicated super-resolution model corresponding to the target coding mode exists, the target super-resolution model corresponding to the target coding mode may be determined based on the mapping list.
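A sketch of this dispatch logic follows; the data structures are hypothetical (a dictionary of dedicated per-mode models, and a mapping list in which each coding type group holds its coding modes and its group model):

```python
def select_model(target_mode: str, dedicated_models: dict, mapping_list: dict):
    """Prefer a model trained for the exact coding mode; otherwise fall back to
    the mapping list, where each coding type group maps to (modes, group_model)."""
    if target_mode in dedicated_models:        # dedicated model exists
        return dedicated_models[target_mode]
    for modes, group_model in mapping_list.values():
        if target_mode in modes:               # exact hit in the mapping list
            return group_model
    return None  # fall through to similarity matching (steps S504/S505 below)
```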
The specific implementation of training a super-resolution model corresponding to a single coding mode is similar to the foregoing process of training the super-resolution model corresponding to a coding type group, except that the first sample images used for training include only sample images encoded in that coding mode (or, equivalently, encoded by the encoder corresponding to that coding mode). The following briefly describes the process of training a super-resolution model corresponding to a single coding mode with reference to the exemplary model training implementation framework shown in fig. 4.
As can be seen from fig. 4, after at least one high-quality original image is collected, for each coding mode requiring a trained super-resolution model, the encoder of that coding mode may be used to compression-encode each original image, generating at least one low-quality image corresponding to the coding mode. Part of the at least one low-quality image is used as training data to form a training data set, and the remaining part is used as validation data to form a validation data set.
In addition, before training the super-resolution model of the encoding mode, the key region (such as a detail region) of the original image is optimized (e.g. sharpened), and other regions (such as a flat region) are kept unchanged, so as to obtain an optimized image (a golden image in fig. 4).
On this basis, for each coding mode, the super-resolution model of the coding mode can be trained using the low-quality images in the training data set. A noise suppression term can be added to the loss function used for training the super-resolution model, and different region weights can be given to the key regions (such as detail regions) and non-key regions (such as flat regions) of the optimized image in the loss function, so as to improve the learning capability and denoising capability of the super-resolution model for detail regions and other key regions.
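A minimal sketch of such a loss follows, assuming an L1 base term; deriving the suppression weights from the per-pixel deviation and taking the region weights as a precomputed mask are one plausible reading of the scheme, not the application's prescribed formula:

```python
import torch

def weighted_sr_loss(target_img: torch.Tensor,    # image generated by the model
                     optimized_img: torch.Tensor, # golden image (training target)
                     region_weight: torch.Tensor, # larger in detail regions
                     suppress_lambda: float = 0.1) -> torch.Tensor:
    err = (target_img - optimized_img).abs()
    # Reconstruction term: per-pixel error weighted by region importance.
    recon = (region_weight * err).mean()
    # Noise suppression term: pixel areas that deviate more from the golden
    # image receive a larger suppression weight.
    suppress_weight = err.detach()
    noise_term = (suppress_weight * err).mean()
    return recon + suppress_lambda * noise_term
```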
Before the super-resolution model is trained by combining the function values of the loss function, model parameters of the super-resolution model can be initialized.
In the training process, a low-quality image serving as training data is input into the super-resolution model to be trained to obtain a target image output by the super-resolution model (the forward propagation process). A function value of the loss function is calculated based on the target image and the optimized image. Model parameters of the super-resolution model are then updated by back propagation based on the function value of the loss function. This process is repeated multiple times to realize iterative training of the super-resolution model.
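Concretely, the iteration could look as follows; the placeholder network, optimizer choice, and epoch count are assumptions for illustration, `weighted_sr_loss` is the sketch above, and `train_loader` is assumed to yield (low-quality image, optimized image, region weight) batches:

```python
import torch
import torch.nn as nn

# Placeholder SR network: a small conv stack with x2 PixelShuffle upsampling.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * 4, 3, padding=1),
    nn.PixelShuffle(2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # parameters initialized

for epoch in range(10):                           # epoch count is illustrative
    for low_quality, optimized, region_weight in train_loader:
        target_img = model(low_quality)           # forward propagation
        loss = weighted_sr_loss(target_img, optimized, region_weight)
        optimizer.zero_grad()
        loss.backward()                           # back propagation
        optimizer.step()                          # update model parameters
```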
After the super-resolution model dedicated to the coding mode is trained, the super-resolution model can be verified using the low-quality images in the validation set, and model optimization can be performed. On this basis, the finally trained super-resolution model can be quantized, and the quantized super-resolution model can be stored in a model library, so that the model library includes super-resolution models applicable to the encoders of at least one coding mode.
It may be appreciated that, after determining the target coding mode corresponding to the first image, if the target coding mode is queried from the mapping list, the target super-resolution model corresponding to the target coding mode may be directly determined from the mapping list.
However, since there are many types of coding modes and only a limited number of them can be recorded in the mapping list, the target coding mode corresponding to the first image may turn out not to exist in the mapping list. In this case, the present application may determine, from the mapping list, a target coding type group whose corresponding coding modes have a similarity with the target coding mode that satisfies the requirement. Accordingly, the target super-resolution model corresponding to the target coding type group in the mapping list may be determined, so that the first image is processed using the target super-resolution model corresponding to the target coding type group.
For example, the coding type group having the highest similarity between the included coding scheme and the target coding scheme may be determined as the target coding type group, or the coding type group having the similarity between the included coding scheme and the target coding scheme exceeding the set threshold may be determined as the target coding type group.
The method for determining the similarity between the coding mode corresponding to the coding type group and the target coding mode can be implemented in various ways.
For example, in one possible implementation, for each coding type group in the mapping list, the similarity of the target coding mode to each coding mode in the coding type group may be calculated separately. Correspondingly, the coding type group containing the coding mode whose similarity with the target coding mode meets the requirement is determined as the target coding type group.
Calculating the similarity between the target coding mode and a coding mode in the coding type group may be calculating the vector similarity between the vector corresponding to the target coding mode and the vector corresponding to that coding mode. The vector corresponding to the target coding mode may be converted from the name of the target coding mode, or it may be determined based on the coding features of the target coding mode; for example, the coding features of the target coding mode may include characteristics such as the coding principle or the applicable coding scenario. The vectors corresponding to the coding modes in the coding type group are obtained similarly and are not repeated here.
In yet another possible implementation, for each coding type group in the mapping list, the similarity between the target coding mode and each coding mode in the coding type group is calculated. On this basis, for each coding type group, the mean or weighted mean of the similarities corresponding to the coding modes in the coding type group is determined and taken as the similarity between the coding modes of the coding type group and the target coding mode, and the target coding type group whose similarity satisfies the requirement is determined accordingly. For example, the coding type group with the largest mean similarity is determined as the target coding type group.
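A minimal sketch of this group-level matching, assuming cosine similarity over feature vectors of the coding modes and mean aggregation (the vectorization itself is left abstract, as discussed above):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_target_group(target_vec: np.ndarray,
                      groups: dict[str, list[np.ndarray]]) -> str:
    """groups maps each coding type group to the vectors of its coding modes;
    the group with the largest mean similarity to the target coding mode wins."""
    scores = {name: float(np.mean([cosine(target_vec, v) for v in vecs]))
              for name, vecs in groups.items()}
    return max(scores, key=scores.get)
```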
It is to be understood that there may be a plurality of coding type groups whose corresponding coding modes have a similarity with the target coding mode that satisfies the requirement. In this case, the present application may randomly select one coding type group from the determined plurality of coding type groups as the target coding type group.
In an optional manner, when it is determined that there are a plurality of coding type groups whose corresponding coding modes have a similarity with the target coding mode that meets the requirement, the present application may instead determine that the plurality of coding type groups are all target coding type groups. On this basis, the present application may combine the target super-resolution models corresponding to the plurality of target coding type groups to comprehensively process the first image.
The following is a description with reference to fig. 5. Fig. 5 shows a schematic flow chart of an image processing method according to the present application, where the method of this embodiment may include:
S501, obtaining a decoded first image, and determining a target coding mode corresponding to the first image.
S502, inquiring whether the target coding mode exists in the mapping list, if so, executing step S503, and if not, executing step S504.
The mapping list records at least two coding modes respectively included in a plurality of coding type groups and super-resolution models corresponding to each coding type group.
S503, determining a target super-resolution model corresponding to the target coding mode from the mapping list, and executing step S506.
S504, determining a target coding type group with the similarity of the corresponding coding mode and the target coding mode meeting the requirement from the mapping list.
In this embodiment, if the target coding mode is not found in the mapping list, a target coding type group whose corresponding coding modes have a similarity with the target coding mode that meets the requirement is determined based on the coding modes included in each coding type group. In other words, the target coding mode is matched to a coding type group containing similar coding modes, that is, the target coding type group whose coding features the target coding mode may match is determined.
For example, a target coding type group having the highest similarity between the coding scheme included in the mapping list and the target coding scheme may be determined.
S505, if a target coding type group with the similarity between the corresponding coding mode and the target coding mode meeting the requirement is determined, determining a target super-resolution model corresponding to the target coding type group in the mapping list, and executing step S506.
S506, inputting the first image into the target super-resolution model to obtain the second image generated by the target super-resolution model.
In this embodiment, if the target coding mode is not found in the mapping list, it is detected whether a target coding type group whose corresponding coding modes have a similarity with the target coding mode that meets the requirement exists in the mapping list. If one target coding type group is found, the target super-resolution model corresponding to that target coding type group is directly used to process the first image. In this way, even when a new coding mode not present in the mapping list is encountered, a suitable super-resolution model can be adapted from the mapping list without retraining a corresponding super-resolution model.
In particular, when there is only one target coding type group whose corresponding coding modes have a similarity with the target coding mode that meets the requirement, the coding features of the target coding mode are similar to the coding features of the coding modes in that target coding type group.
S507, if a plurality of target coding type groups with the similarity meeting the requirement of the corresponding coding mode and the target coding mode are determined, a plurality of target super-resolution models corresponding to the plurality of target coding type groups are determined from the mapping list.
S508, constructing a comprehensive super-resolution model based on model parameters in each target super-resolution model, and processing the first image by using the comprehensive super-resolution model to generate a second image.
Constructing the comprehensive super-resolution model based on the model parameters of each target super-resolution model may be fusing model parameters, such as the weights, of the plurality of target super-resolution models to generate a new super-resolution model.
For ease of distinction, the super-resolution model constructed based on the target super-resolution models is called a comprehensive super-resolution model.
The comprehensive super-resolution model integrates the image processing characteristics of each target super-resolution model and is applicable to a wider range of coding modes. Processing the first image with the comprehensive super-resolution model can therefore generate a high-quality second image without retraining a new super-resolution model.
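One plausible form of this parameter fusion is a similarity-weighted average of the models' state dictionaries, assuming all target super-resolution models share the same architecture and floating-point parameters; the application does not fix the fusion rule, so this is only a sketch:

```python
import torch

def fuse_state_dicts(models: list[torch.nn.Module], weights: list[float]) -> dict:
    """Weighted average of the parameters of same-architecture models; the result
    can be loaded into a fresh model to act as the comprehensive SR model."""
    total = sum(weights)
    dicts = [m.state_dict() for m in models]
    return {key: sum(w * d[key].float() for w, d in zip(weights, dicts)) / total
            for key in dicts[0]}

# Usage (hypothetical): comprehensive.load_state_dict(fuse_state_dicts(models, sims))
```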
It will be appreciated that step S508 illustrates one implementation of processing the first image when a plurality of target super-resolution models corresponding to a plurality of target coding type groups have been determined. In practical applications, after the plurality of target super-resolution models corresponding to the plurality of target coding type groups are determined, each target super-resolution model may instead be used to process the first image to obtain a candidate image generated by each target super-resolution model. The candidate images obtained by processing the first image with the respective target super-resolution models are then fused based on the weight corresponding to each target super-resolution model to obtain the second image.
The weight of the target super-resolution model corresponding to each target coding type group may be determined based on the similarity between the coding modes of that target coding type group and the target coding mode; this is not specifically limited.
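A sketch of this alternative path, normalizing the group similarities into fusion weights (the normalization rule is an assumption):

```python
import numpy as np

def fuse_candidates(candidates: list[np.ndarray],
                    similarities: list[float]) -> np.ndarray:
    """Fuse candidate second images; each weight comes from the similarity of
    its target coding type group to the target coding mode."""
    w = np.asarray(similarities, dtype=np.float64)
    w = w / w.sum()                              # normalize into fusion weights
    stacked = np.stack([c.astype(np.float64) for c in candidates])
    fused = np.tensordot(w, stacked, axes=1)     # weighted sum over candidates
    return np.clip(fused, 0, 255).astype(np.uint8)
```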
The application also provides an image processing device corresponding to the image processing method provided by the application.
As shown in fig. 6, which is a schematic diagram illustrating a composition structure of an image processing apparatus provided by the present application, the image processing apparatus of the present embodiment may include:
a type determining unit 601, configured to obtain a decoded first image, and determine a target coding mode corresponding to the first image;
A model determining unit 602, configured to determine a target super-resolution model corresponding to the target coding mode from multiple super-resolution models, where the multiple super-resolution models correspond to different coding type groups, the target coding mode matches the coding type group corresponding to the target super-resolution model, and the coding modes matched by the same coding type group have at least one identical coding feature;
an image processing unit 603 is configured to process the first image with the target super-resolution model, and generate a second image.
In one possible implementation, the model determining unit includes:
and the model determining subunit is used for determining a target super-resolution model corresponding to the target coding mode based on a mapping list, wherein the mapping list records at least two coding modes respectively included by a plurality of coding type groups and the super-resolution model corresponding to each coding type group.
In yet another possible implementation, the model determination subunit includes:
A first determining subunit, configured to determine, from a mapping list, a target super-resolution model corresponding to the target coding mode if the target coding mode is queried from the mapping list;
And the second determining subunit is used for determining a target coding type group with the similarity of the corresponding coding mode and the target coding mode meeting the requirement from the mapping list if the target coding mode is not queried from the mapping list, and determining a target super-resolution model corresponding to the target coding type group in the mapping list.
In yet another possible implementation manner, the second determining subunit is specifically configured to, when determining the target super-resolution model corresponding to the target coding type group in the mapping list, determine a plurality of target super-resolution models corresponding to a plurality of target coding type groups in the mapping list if a plurality of target coding type groups are determined from the mapping list;
The image processing unit includes:
The first processing subunit is used for processing the first image by utilizing each target super-resolution model respectively, and fusing candidate images obtained by processing the first image by each target super-resolution model based on the weight corresponding to each target super-resolution model to obtain a second image;
Or alternatively
And the second processing subunit is used for constructing a comprehensive super-resolution model based on model parameters in each target super-resolution model, and processing the first image by using the comprehensive super-resolution model to generate a second image.
In one possible implementation manner, the super-resolution model recorded in the mapping list used by the model determining subunit is a super-resolution model corresponding to a coding type group, trained by using first sample images encoded by the coding modes in the coding type group as training data and second sample images corresponding to the first sample images as training targets, where the quality of the second sample image is higher than that of the first sample image.
In one possible implementation manner, the second sample image corresponding to the first sample image is an uncoded original image corresponding to the first sample image or an optimized image obtained by performing image optimization processing on the original image, wherein the quality of the original image corresponding to the first sample image is higher than that of the first sample image.
In yet another possible implementation, in the case where the second sample image is an optimized image, the optimized image is obtained by sharpening a detail region of the original image.
In yet another possible implementation, the loss function used to train the super-resolution model assigns different region weights to different regions in the second sample image, and the detail complexity of the different regions in the second sample image is different.
In yet another possible implementation manner, the loss function used for training the super-resolution model includes a noise suppression term, in which different suppression weights are given to different pixel areas in the second sample image, where the greater the difference between the pixel areas in the second sample image and the corresponding pixel areas in the target image, the greater the suppression weight given to the pixel areas in the second sample image, the pixel areas include at least one pixel, and the target image is an image generated by processing the first sample image by the super-resolution model.
In yet another possible implementation, the loss function employed to train the super-resolution model corresponding to each coding type group matches the coding features of the coding scheme corresponding to that coding type group.
In another aspect, the application further provides a model training device corresponding to the model training method. Referring to fig. 7, which is a schematic diagram illustrating a composition structure of a model training apparatus provided by the present application, the apparatus of this embodiment may include:
The model determining unit 701 is configured to determine a super-resolution model to be trained corresponding to each of at least two coding type groups, where a coding mode matched with the same coding type group has at least one same coding feature;
A sample obtaining unit 702, configured to obtain, for each coding mode corresponding to each coding type group, a first sample image encoded by the coding mode corresponding to the coding type group, and a second sample image corresponding to the first sample image, where the quality of the second sample image is higher than that of the first sample image;
The model training unit 703 is configured to train, for each coding type group, a super-resolution model corresponding to the coding type group by using a first sample image corresponding to the coding type group as training data and a second sample image corresponding to the first sample image as a training target, so as to obtain a trained super-resolution model corresponding to the coding type group.
In one possible implementation, the sample obtaining unit includes:
an original image obtaining subunit for obtaining an original image that is not encoded;
The coding processing subunit is used for coding the original image by utilizing a coding mode corresponding to the coding type group to obtain a first sample image;
and the image determining subunit is used for determining the original image or an optimized image obtained by optimizing the original image as a second sample image corresponding to the first sample image.
In another possible implementation manner, the model training unit is specifically configured to use a first sample image corresponding to a coding type group as training data, use a second sample image corresponding to the first sample image as a training target, and train a super-resolution model corresponding to the coding type group in combination with a loss function;
in the loss function, different areas in the second sample image are given different area weights, and the detail complexity of the different areas in the second sample image is different;
And/or, the loss function comprises a noise suppression term, in the noise suppression term, different suppression weights are given to different pixel point areas in the second sample image, wherein the larger the difference between the pixel point areas in the second sample image and the corresponding pixel point areas in the target image is, the larger the suppression weights given to the pixel point areas in the second sample image are, the target image is an image generated by processing the first sample image through the super-resolution model, and the pixel point areas comprise at least one pixel point;
and/or training the loss function adopted by the super-resolution model corresponding to each coding type group to be matched with the coding characteristics of the coding mode corresponding to the coding type group.
The embodiment of the application also provides electronic equipment. As shown in fig. 8, which shows a schematic view of a composition structure of the electronic device, the electronic device includes at least a processor 801 and a memory 802;
The processor 801 is configured to perform the image processing method or the model training method according to any one of the above embodiments;
The memory 802 is used for storing programs required for the processor to perform operations.
It is understood that the electronic device may further comprise a display unit 803 and an input unit 804.
Of course, the electronic device may also have more or fewer components than in fig. 8, without limitation.
The embodiment of the application also provides a computer program product, which comprises computer readable instructions, wherein the computer readable instructions enable the electronic equipment to realize any one of the image processing method or the model training method provided by the embodiment of the application when the computer readable instructions are run on the electronic equipment.
The embodiment of the application also provides a computer readable storage medium, which carries one or more computer programs, and when the one or more computer programs are executed by the electronic device, the electronic device can realize any one of the image processing method or the model training method provided by the embodiment of the application.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, in many cases of the present application, a software program implementation is the preferred embodiment. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, including several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the methods according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) connection. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, integrating one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
Claims (12)
1. An image processing method, comprising:
obtaining a decoded first image, and determining a target coding mode corresponding to the first image;
Determining a target super-resolution model corresponding to the target coding mode from a plurality of super-resolution models, wherein the plurality of super-resolution models correspond to different coding type groups, the target coding mode is matched with the coding type group corresponding to the target super-resolution model, the same coding type group comprises at least two coding modes, and the coding modes matched with the same coding type group have at least one same coding characteristic;
Processing the first image by using the target super-resolution model to generate a second image;
The determining the target super-resolution model corresponding to the target coding mode comprises the following steps:
if the target coding mode exists in the coding modes included in each coding type group, determining a target super-resolution model corresponding to the target coding mode;
If the target coding mode does not exist in the coding modes included in each coding type group, determining a target coding type group with the similarity meeting the requirement between the corresponding coding mode and the target coding mode based on the coding modes included in each coding type group, and determining a target super-resolution model corresponding to the target coding type group.
2. The image processing method according to claim 1, wherein the determining a target super-resolution model corresponding to the target coding scheme from among a plurality of super-resolution models includes:
And determining a target super-resolution model corresponding to the target coding mode based on a mapping list, wherein the mapping list records at least two coding modes respectively included by a plurality of coding type groups and the super-resolution model corresponding to each coding type group.
3. The image processing method according to claim 2, wherein the determining, based on the mapping list, a target super-resolution model corresponding to the target coding mode includes:
if the target coding mode is queried from a mapping list, determining a target super-resolution model corresponding to the target coding mode from the mapping list;
And if the target coding mode is not queried from the mapping list, determining a target coding type group with the similarity of the corresponding coding mode and the target coding mode meeting the requirement from the mapping list, and determining a target super-resolution model corresponding to the target coding type group in the mapping list.
4. The image processing method according to claim 3, wherein the determining the target super-resolution model corresponding to the target coding type group in the mapping list includes:
If a plurality of target coding type groups are determined from the mapping list, a plurality of target super-resolution models corresponding to the target coding type groups in the mapping list are determined;
The processing the first image by using the target super-resolution model to generate a second image includes:
Processing the first image by utilizing each target super-resolution model, and fusing candidate images obtained by processing the first image by each target super-resolution model based on the weight corresponding to each target super-resolution model to obtain a second image;
Or alternatively
Based on model parameters in each target super-resolution model, a comprehensive super-resolution model is constructed, and the first image is processed by using the comprehensive super-resolution model to generate a second image.
5. The image processing method according to claim 2, wherein the super-resolution model recorded in the mapping list uses a first sample image coded by each coding mode in the coding type group as training data, uses a second sample image corresponding to the first sample image as a training target, and trains out a super-resolution model corresponding to the coding type group, and the quality of the second sample image is higher than that of the first sample image.
6. The image processing method according to claim 5, wherein the second sample image corresponding to the first sample image is an uncoded original image corresponding to the first sample image or an optimized image obtained by performing image optimization processing on the original image, and the quality of the original image corresponding to the first sample image is higher than that of the first sample image;
the optimized image is obtained by sharpening the detail area of the original image.
7. The image processing method according to claim 5, wherein different areas in the second sample image are given different area weights in a loss function used for training the super-resolution model, and the detail complexity of the different areas in the second sample image is different;
And/or training a loss function adopted by the super-resolution model to include a noise suppression term, wherein different suppression weights are given to different pixel point areas in the second sample image in the noise suppression term, wherein the larger the difference between the pixel point areas in the second sample image and corresponding pixel point areas in a target image is, the larger the suppression weights given to the pixel point areas in the second sample image are, the pixel point areas comprise at least one pixel point, and the target image is an image generated by processing the first sample image by the super-resolution model;
and/or training the loss function adopted by the super-resolution model corresponding to each coding type group to be matched with the coding characteristics of the coding mode corresponding to the coding type group.
8. A model training method, comprising:
determining super-resolution models to be trained corresponding to at least two coding type groups respectively, wherein the coding modes matched with the same coding type group have at least one same coding characteristic;
For each coding mode corresponding to each coding type group, obtaining a first sample image coded by the coding mode corresponding to the coding type group and a second sample image corresponding to the first sample image, wherein the quality of the second sample image is higher than that of the first sample image;
for each coding type group, taking a first sample image corresponding to the coding type group as training data and a second sample image corresponding to the first sample image as a training target, training a super-resolution model corresponding to the coding type group to obtain a trained super-resolution model corresponding to the coding type group, wherein determining, during image processing, a target super-resolution model corresponding to a target coding mode corresponding to a first image comprises the following steps:
if the target coding mode exists in the coding modes included in each coding type group, determining a target super-resolution model corresponding to the target coding mode;
If the target coding mode does not exist in the coding modes included in each coding type group, determining a target coding type group with the similarity meeting the requirement between the corresponding coding mode and the target coding mode based on the coding modes included in each coding type group, and determining a target super-resolution model corresponding to the target coding type group.
9. The model training method according to claim 8, wherein the obtaining a first sample image encoded by the encoding mode corresponding to the encoding type group, and a second sample image corresponding to the first sample image, comprises:
Obtaining an uncoded original image;
encoding the original image by utilizing the encoding mode corresponding to the encoding type group to obtain a first sample image;
and determining the original image or an optimized image obtained by optimizing the original image as a second sample image corresponding to the first sample image.
10. The model training method according to claim 8, wherein the training of the super-resolution model corresponding to the coding type group using the first sample image corresponding to the coding type group as training data and the second sample image corresponding to the first sample image as training target comprises:
Taking a first sample image corresponding to the coding type group as training data, taking a second sample image corresponding to the first sample image as a training target, and training a super-resolution model corresponding to the coding type group by combining a loss function;
In the loss function, different area weights are given to different areas in the second sample image, and the detail complexity of the different areas in the second sample image is different;
And/or the loss function comprises a noise suppression term, in the noise suppression term, different suppression weights are given to different pixel point areas in the second sample image, wherein the larger the difference between the pixel point areas in the second sample image and corresponding pixel point areas in a target image is, the larger the suppression weights given to the pixel point areas in the second sample image are, the target image is an image generated by processing the first sample image through the super-resolution model, and the pixel point areas comprise at least one pixel point;
and/or training the loss function adopted by the super-resolution model corresponding to each coding type group to be matched with the coding characteristics of the coding mode corresponding to the coding type group.
11. An image processing apparatus comprising:
the type determining unit is used for obtaining a decoded first image and determining a target coding mode corresponding to the first image;
The model determining unit is used for determining a target super-resolution model corresponding to the target coding mode from a plurality of super-resolution models, wherein the plurality of super-resolution models correspond to different coding type groups, the target coding mode is matched with the coding type group corresponding to the target super-resolution model, the same coding type group comprises at least two coding modes, and the coding modes matched with the same coding type group have at least one same coding characteristic;
The image processing unit is used for processing the first image by utilizing the target super-resolution model to generate a second image;
The determining the target super-resolution model corresponding to the target coding mode comprises the following steps:
if the target coding mode exists in the coding modes included in each coding type group, determining a target super-resolution model corresponding to the target coding mode;
If the target coding mode does not exist in the coding modes included in each coding type group, determining a target coding type group with the similarity meeting the requirement between the corresponding coding mode and the target coding mode based on the coding modes included in each coding type group, and determining a target super-resolution model corresponding to the target coding type group.
12. A model training apparatus comprising:
The model determining unit is used for determining super-resolution models to be trained corresponding to at least two coding type groups respectively, and the coding modes matched with the same coding type group have at least one same coding characteristic;
A sample obtaining unit, configured to obtain, for each coding mode corresponding to each coding type group, a first sample image coded by the coding mode corresponding to the coding type group, and a second sample image corresponding to the first sample image, where the quality of the second sample image is higher than that of the first sample image;
The model training unit is used for, for each coding type group, training the super-resolution model corresponding to the coding type group by taking a first sample image corresponding to the coding type group as training data and a second sample image corresponding to the first sample image as a training target, so as to obtain a trained super-resolution model corresponding to the coding type group, wherein determining, during image processing, a target super-resolution model corresponding to a target coding mode corresponding to a first image includes:
if the target coding mode exists in the coding modes included in each coding type group, determining a target super-resolution model corresponding to the target coding mode;
If the target coding mode does not exist in the coding modes included in each coding type group, determining a target coding type group with the similarity meeting the requirement between the corresponding coding mode and the target coding mode based on the coding modes included in each coding type group, and determining a target super-resolution model corresponding to the target coding type group.