Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first," "second," and the like in the description of the present application, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type not limited to the number of objects, for example, the first object may be one or more. In addition, "and/or" in the specification means at least one of the connected objects, and the character "/", generally means a relationship in which the associated objects are one kind of "or".
The terms "at least one", and the like in the description of the present application mean that they encompass any one, any two, or a combination of two or more of the objects. For example, at least one of a, b, c (item) may represent "a", "b", "c", "a and b", "a and c", "b and c" and "a, b and c", wherein a, b, c may be single or plural. Similarly, the term "at least two" means two or more, and the meaning of the expression is similar to the term "at least one".
The following explains terms involved in the embodiments of the present application:
Secondary composition: moving the position of the main subject contained in an image stored in the electronic device, so as to obtain a picture with a new composition in which the subject is at a new position.
The image processing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
The image processing method provided by the embodiment of the application can be applied to a scene needing to carry out secondary composition on the image.
The image processing method provided by the embodiment of the application is exemplified by a few specific scenes.
Scene 1-Secondary composition of a person in an image
If the user wants to perform secondary composition on one image, for example, adjust the position of the person 1 in the image 1 relative to the background building, the user may trigger the electronic device to input the image 1 and the matting image corresponding to the image 1 to the first matting model, so that the first matting model outputs the first matting image with the detail feature and the edge feature corresponding to the person 1 based on the input image.
Scene 2-Secondary composition of objects in an image
If the user wants to perform secondary composition on one image, for example, adjust a display position and a display size of one object, for example, object 2, in image 2, the user may trigger the electronic device to input image 2 and the matting image corresponding to image 2 into the first matting model, so that the first matting model outputs the first matting image with detail features and edge features corresponding to object 2 based on the input image.
It should be noted that, the above-mentioned scenes 1 and 2 are only exemplary examples of some scenes where the embodiments of the present application may be applied, and in practical implementation, the embodiments of the present application may be applied to any possible scenes where parameters such as a display size, a display position, etc. of an image element included in an image need to be adjusted, which are not limited herein.
According to the image processing method provided by the embodiments of the present application, the electronic device can input a first image and a second image into a first matting model and obtain a first feature map through the first matting model based on the first image and the second image, where the second image is an image obtained based on image elements in the first image. The electronic device can then process the first feature map through the first matting model to obtain at least one of a detail feature map and an edge feature map, and output a first matting image through the first matting model based on the first feature map and at least one of the detail feature map and the edge feature map, where the first matting model is obtained through training based on detail features and edge features of images. In this scheme, since the second image is obtained based on the image elements that need to be matted in the first image, the second image contains those image elements, and the electronic device can therefore perform processing based on the second image and the original first image to further obtain at least one of the detail feature map and the edge feature map. That is, on the basis of the second image, the first matting image output by the first matting model can accurately retain at least one of the image content of the detail parts and the image content of the edge parts of the image elements that need to be matted in the first image, so that the first matting image can meet the user's requirement on the fineness of the matting image, thereby improving the image quality of the matting image obtained by the electronic device.
The main execution body of the image processing method provided in the embodiment of the present application is an image processing apparatus, and the apparatus may be an electronic device, or a functional module or entity in the electronic device. An image processing method provided by an embodiment of the present application will be exemplarily described below using an electronic device as an example.
An embodiment of the present application provides an image processing method, and fig. 1 shows a flowchart of the image processing method provided by the embodiment of the present application. As shown in fig. 1, the image processing method provided by the embodiment of the present application may include the following steps 201 to 203.
Step 201, the electronic device inputs a first image and a second image into a first matting model, and obtains a first feature map based on the first image and the second image through the first matting model.
In the embodiment of the present application, the second image may be an image obtained based on image elements in the first image.
In some embodiments of the present application, the image elements included in the second image may be determined according to a matting requirement of the user on the first image.
In some embodiments of the present application, the second image may have the same image size as the first image.
In some embodiments of the present application, the second image may be a mask image corresponding to the first image.
It should be noted that the mask may be a simple black-and-white mask, or may be a complex mask based on the layer content included in the first image.
In some embodiments of the application, the mask may be used to define a display area for image elements for which a matting process is required to be performed from the first image.
For example, assuming that the first image includes a head image of a male user wearing glasses, and the user wants to acquire the head image through the electronic device, the user may first trigger the electronic device to perform a matting process based on the head image, so that the electronic device outputs a black-and-white mask image corresponding to the head image as shown in fig. 2, that is, the second image. The white area in the black-and-white mask image is the image area that the user wants to matte out, namely, the head image of the male user wearing glasses, and the black area is the image area that does not need to be matted, namely, the background area in the first image.
In the embodiment of the application, the first matting model is obtained based on training of detail features and edge features of the image.
In some embodiments of the present application, in a case where the electronic device inputs the first image and the second image into the first matting model, the first matting model may first combine the first image and the second image into one image through superposition of image channels, and then perform processing on the combined image to obtain the first feature map.
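For illustration, if the first matting model were implemented in PyTorch (one possible choice; the embodiments of the present application do not mandate any particular framework), the superposition of image channels described above can be sketched as a concatenation along the channel dimension. The 3-channel and 1-channel shapes below are illustrative assumptions:

```python
import torch

# Illustrative shapes: a 3-channel RGB first image and a 1-channel mask-like second image
first_image = torch.rand(1, 3, 1024, 1024)
second_image = torch.rand(1, 1, 1024, 1024)

# Superposition of image channels: the two inputs are combined into one 4-channel image
combined = torch.cat([first_image, second_image], dim=1)
print(combined.shape)  # torch.Size([1, 4, 1024, 1024])
```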
In some embodiments of the present application, the first matting model may include an input convolution layer and N convolution modules.
In some embodiments of the application, where the electronic device inputs the first image and the second image into the first matting model, the electronic device may first downsample (subsample) the image combined from the first image and the second image through the input convolution layer, so as to reduce the resolution of the image.
For example, assuming that the resolution of the first image and the second image is 1024×1024, the resolution of the output feature map may be reduced to 512×512 after the downsampling process by the input convolution layer.
In some embodiments of the present application, the first image and the second image may undergo matrix calculation in the input convolution layer based on Feature_0 = w_0 · x + b_0 to extract features in the input image, thereby outputting the feature map corresponding to the input convolution layer. Where x may be the image data obtained from the first image and the second image, w_0 may be the weight value corresponding to the input convolution layer, and b_0 may be the bias value corresponding to the input convolution layer.
In this way, the electronic device performs downsampling processing on the image input in the model to adjust the size of the input image, so that the calculated amount of processing the image in the model and the number of parameters used for calculating the model can be reduced, and meanwhile, the robustness and generalization capability of the model can be improved.
In some embodiments of the present application, the electronic device may input the feature map output by the input convolution layer to the convolution module in the first matting model, so as to perform a subsequent image processing process on the image.
It should be noted that the number of convolution modules included in the first matting model and the parameters of each convolution module may be determined according to actual requirements, which is not limited in the present application.
In some embodiments of the present application, the "acquiring the first feature map based on the first image and the second image" in the above step 201 may be specifically implemented by the following steps 201a to 201 c.
In step 201a, the electronic device performs downsampling processing on the first image and the second image through a first convolution module in the first matting model, and outputs a first feature vector.
Step 201b, the electronic device performs downsampling processing on the (i-1)th feature vector through the ith convolution module in the first matting model, and outputs the ith feature vector.
Where i ∈ [2, N], i is an integer, and N is the number of convolution modules.
Step 201c, the electronic device determines the Nth feature vector as the first feature map.
In some embodiments of the present application, the electronic device may perform downsampling processing N times based on the feature map output by the input convolution layer through N convolution modules, so as to extract semantic information of the first image and the second image, that is, the feature vectors.
The semantic information of the image may include, but is not limited to, basic information such as an outline and a category of an image element included in the image, and may also include information such as an attribute, a positional relationship, and a context of the image element.
Wherein the attribute of the image element may include, but is not limited to, at least one of color, shape, size.
In some embodiments of the present application, the electronic device may increase the number of convolution kernels and the number of output channels during the down-sampling process of the input image data by the convolution module.
Thus, since each convolution kernel can extract a specific image feature and output a corresponding feature map, more kinds of features can be extracted from the image, and the richness and diversity of feature representation are enhanced, so that the model can better understand and identify different elements and modes in the image, and further loss of image feature information can be avoided.
As shown in fig. 3, if N = 4, the electronic device may perform downsampling processing on the first image and the second image input into the first matting model through the input convolution layer to obtain a feature map with a resolution of 512×512, where the feature map is obtained by combining images of 32 channels. Secondly, the electronic device may perform downsampling processing on the 512×512 feature map output by the input convolution layer through the first convolution module, i.e., convolution module 1, to obtain a feature map with a resolution of 256×256 obtained by combining images of 64 channels, i.e., the first feature vector. Then, the electronic device may perform downsampling processing on the 256×256 feature map output by convolution module 1 through the second convolution module, i.e., convolution module 2, to obtain a feature map with a resolution of 128×128 obtained by combining images of 128 channels, i.e., the second feature vector. Next, the electronic device may perform downsampling processing on the 128×128 feature map through the third convolution module, i.e., convolution module 3, to obtain a feature map with a resolution of 64×64 obtained by combining images of 256 channels, i.e., the third feature vector. Finally, the electronic device may perform downsampling processing on the 64×64 feature map through the fourth convolution module, i.e., convolution module 4, to obtain a feature map with a resolution of 32×32 obtained by combining images of 512 channels, i.e., the fourth feature vector, which is the first feature map.
In some embodiments of the present application, the first matting model may perform matrix calculation through Feature_i = w_i(w_{i-1}(…(w_1 · x + b_1)…) + b_{i-1}) + b_i based on the ith convolution module it includes, so as to extract features in the input image and output the first feature map. Where Feature_i may be the feature vector output by the ith convolution module, x may be the image data received by the first convolution module, such as the feature map output by the input convolution layer, w_i may be the weight value corresponding to the ith convolution module, and b_i may be the bias value corresponding to the ith convolution module.
It should be noted that the weight value and the offset value corresponding to each convolution module may be different. The weight value and the bias value may be values determined during the training of the first matting model, and the specific values are not limited herein.
In this way, the electronic device can perform N times of downsampling processing on the image received by the convolution modules through the N convolution modules included in the first matting model, so as to adjust the size of the input image, thereby reducing the calculated amount of processing the image in the model and the number of parameters used for calculating the model, and increasing the robustness and generalization capability of the model.
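Steps 201a to 201c can be illustrated with the following PyTorch sketch. It assumes that each convolution module is a single stride-2 convolution and that the channel counts follow the example of fig. 3 (32, 64, 128, 256, 512); the real module structure, activation functions, and trained parameters may differ and are not limited by this sketch.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Input convolution layer followed by N convolution modules (downsampling path)."""
    def __init__(self, in_channels=4, channels=(32, 64, 128, 256, 512)):
        super().__init__()
        # Input convolution layer: Feature_0 = w_0 * x + b_0, halves the resolution
        self.input_conv = nn.Conv2d(in_channels, channels[0], 3, stride=2, padding=1)
        # N convolution modules: each halves the resolution and doubles the channels
        self.conv_modules = nn.ModuleList(
            [nn.Conv2d(channels[i], channels[i + 1], 3, stride=2, padding=1)
             for i in range(len(channels) - 1)]
        )

    def forward(self, x):
        feature0 = self.input_conv(x)          # e.g. 1024x1024 -> 512x512, 32 channels
        features = []                          # Feature_1 ... Feature_N
        f = feature0
        for conv in self.conv_modules:
            f = conv(f)                        # 512 -> 256 -> 128 -> 64 -> 32
            features.append(f)
        return feature0, features              # features[-1] is the first feature map

encoder = Encoder()
feature0, features = encoder(torch.rand(1, 4, 1024, 1024))
print(features[-1].shape)  # torch.Size([1, 512, 32, 32])
```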
Step 202, the electronic device processes the first feature map through the first matting model to obtain at least one of a detail feature map and an edge feature map.
In some embodiments of the present application, N deconvolution modules may be included in the first matting model.
In some embodiments of the present application, the electronic device may perform upsampling (Upsampling) processing on the first feature map through the deconvolution module to increase the resolution of the first feature map, so that the first matting model can subsequently output a first matting image with the same resolution as the first image and the second image.
In some embodiments of the present application, the detail feature map may be understood as a feature map with image details obtained by performing refined segmentation on the image element that the user needs to matte in the first image.
In some embodiments of the present application, the edge feature map may be understood as a feature map obtained by finely segmenting the edge region of the image element to be matted in the first image, with a significantly improved display effect in the edge region.
The above-described fine segmentation may be understood as a segmentation method whose segmentation accuracy is higher than that of conventional segmentation methods, for example, a segmentation method that can segment accurately down to the level of individual hair strands.
The conventional segmentation methods may include, but are not limited to, image matting methods that separate a single image based on an RGB image and a corresponding ternary image (trimap).
In some embodiments of the present application, in combination with the steps 201a to 201c, the step 202 of "processing the first feature map to obtain the detailed feature map" may be specifically implemented by the steps 202a1 to 202a3 described below.
In step 202a1, the electronic device performs upsampling processing on the first feature map through a first deconvolution module in the first matting model to obtain a first detail feature vector.
In some embodiments of the application, the first deconvolution module in the first matting model, i.e., deconvolution module 1, may perform matrix calculation through TransFeature_1 = w'_1 · x + b'_1 to perform upsampling processing on the first feature map, thereby obtaining the first detail feature vector TransFeature_1. Where x may be the first feature map received by deconvolution module 1, w'_1 may be the weight value corresponding to deconvolution module 1, and b'_1 may be the bias value corresponding to the first deconvolution module.
In some embodiments of the present application, the resolution of the image corresponding to the first detail feature vector may be greater than the resolution of the first feature map.
In some embodiments of the present application, the electronic device may perform the inverse operation of the convolution module, that is, a convolution operation using a transposed convolution kernel, to perform upsampling processing on the first feature map, thereby obtaining the first detail feature vector.
In some embodiments of the present application, a deconvolution module in the first matting model may be connected to a convolution module by a skip connection (skip). The skip connection may first linearly add the detail feature vector output by the connected deconvolution module and the feature vector output by the convolution module, and then transfer the result to a subsequent deconvolution module for processing.
Therefore, the first matting model can enhance the image details in the feature map by fusing feature maps carrying different levels of semantic information, and improve the prediction of image details in the feature map.
As shown in fig. 3, after convolution module 4 outputs the 32×32×512 first feature map, the electronic device may perform upsampling processing on the first feature map through the first deconvolution module, i.e., deconvolution module 1, to obtain a feature map with a resolution of 64×64, where the feature map is obtained by combining images of 64 channels, i.e., the first detail feature vector. Then, the electronic device may perform linear addition on TransFeature_1 output by deconvolution module 1 and Feature_3 output by convolution module 3 through skip1, that is, skip1 = TransFeature_1 + Feature_3, to serve as the input of the second deconvolution module for subsequent processing.
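A minimal sketch of this upsampling step and the skip connection, under the assumption that deconvolution module 1 is a single transposed convolution and that Feature_3 has already been brought to the same shape as TransFeature_1 so that the linear addition is well defined (the actual module structure is not limited here):

```python
import torch
import torch.nn as nn

# Deconvolution module 1: upsamples the 32x32, 512-channel first feature map to 64x64
deconv1 = nn.ConvTranspose2d(512, 64, kernel_size=4, stride=2, padding=1)

first_feature_map = torch.rand(1, 512, 32, 32)   # output of convolution module 4
feature3 = torch.rand(1, 64, 64, 64)             # Feature_3, assumed shape-matched here

trans_feature1 = deconv1(first_feature_map)      # TransFeature_1: (1, 64, 64, 64)
skip1 = trans_feature1 + feature3                # linear addition, input of deconvolution module 2
```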
Step 202a2, the electronic device performs upsampling processing on the (N-j)th feature vector and the (j-1)th detail feature vector through the jth deconvolution module in the first matting model, and outputs the jth detail feature vector.
Where j ∈ [2, N].
In some embodiments of the present application, while the deconvolution module performs upsampling processing on the input image data to increase the resolution of the feature map, the electronic device may also compress the channel dimension of the output feature map, that is, reduce the number of image channels, so that the first matting model can subsequently output a single matting image, thereby meeting the application requirements of users.
In some embodiments of the application, the method of reducing the number of image channels may include, but is not limited to, any of color space conversion and 1×1 convolution.
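For instance, a 1×1 convolution compresses the channel dimension while keeping the spatial resolution unchanged; a minimal illustrative sketch:

```python
import torch
import torch.nn as nn

# Reduce a 64-channel feature map to 32 channels without changing its 128x128 resolution
reduce_channels = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1)
x = torch.rand(1, 64, 128, 128)
print(reduce_channels(x).shape)  # torch.Size([1, 32, 128, 128])
```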
In some embodiments of the application, the jth deconvolution module in the first matting model may perform matrix calculation through TransFeature_j = w'_j · x + b'_j to perform upsampling processing on the received feature vectors, thereby obtaining the jth detail feature vector TransFeature_j. Where x may be the (N-j)th feature vector and the (j-1)th detail feature vector (for example, their linear sum), w'_j may be the weight value corresponding to the jth deconvolution module, and b'_j may be the bias value corresponding to the jth deconvolution module.
Illustratively, in conjunction with fig. 4A, the second deconvolution module in the first matting model, i.e., deconvolution module 2, may receive the image data transmitted by skip1, namely skip1 = TransFeature_1 + Feature_3, and deconvolution module 2 may then perform matrix calculation through TransFeature_2 = w'_2 · skip1 + b'_2 to perform upsampling processing on the feature vector corresponding to skip1, thereby obtaining the second detail feature vector TransFeature_2. Where w'_2 may be the weight value corresponding to deconvolution module 2, and b'_2 may be the bias value corresponding to deconvolution module 2. The second detail feature vector TransFeature_2 is a feature map with a resolution of 128×128, and the feature map is obtained by combining images of 32 channels.
Step 202a3, the electronic device determines the Nth detail feature vector as the detail feature map.
In some embodiments of the present application, the electronic device may determine the detail feature vector output by the last deconvolution module as the detail feature map.
For example, as shown in fig. 4B in conjunction with fig. 4A, the electronic device may first perform linear addition on TransFeature_2 output by deconvolution module 2 and Feature_2 output by convolution module 2 through skip2, that is, skip2 = TransFeature_2 + Feature_2, to serve as the input of the third deconvolution module, i.e., deconvolution module 3. Deconvolution module 3 may then perform matrix calculation through TransFeature_3 = w'_3 · skip2 + b'_3 to perform upsampling processing on the feature vector corresponding to skip2, thereby obtaining the third detail feature vector TransFeature_3, where w'_3 may be the weight value corresponding to deconvolution module 3 and b'_3 may be the bias value corresponding to deconvolution module 3. The third detail feature vector TransFeature_3 is a feature map with a resolution of 256×256, and the feature map is obtained by combining images of 16 channels. Then, the electronic device may perform linear addition on TransFeature_3 output by deconvolution module 3 and Feature_1 output by convolution module 1 through skip3, that is, skip3 = TransFeature_3 + Feature_1. The fourth deconvolution module in the first matting model, i.e., deconvolution module 4, may then receive the image data transmitted by skip3, and deconvolution module 4 may perform matrix calculation through TransFeature_4 = w'_4 · skip3 + b'_4 to perform upsampling processing on the feature vector corresponding to skip3, thereby obtaining the fourth detail feature vector TransFeature_4, where w'_4 may be the weight value corresponding to deconvolution module 4 and b'_4 may be the bias value corresponding to deconvolution module 4. The fourth detail feature vector TransFeature_4 is a feature map with a resolution of 512×512, and the feature map is obtained by combining images of 16 channels. The electronic device may then determine the fourth detail feature vector TransFeature_4 as the detail feature map.
Therefore, the electronic device can improve the segmentation precision of the first matting model at the detail positions of the image element to be matted through the convolution modules and the deconvolution modules, so that a matting image with a higher degree of fineness can be obtained directly without requiring the user to manually trace the edge positions of the image element, which reduces the steps that the user needs to perform manually. Furthermore, in the operation of the deconvolution modules, feature fusion processing can be performed on multiple feature maps, which enriches the details of each layer of features output during the calculation of the first matting model and further improves the degree of refinement.
In some embodiments of the present application, after the step 202A2, the image processing method provided in the embodiment of the present application further includes the following step A1 and step A2.
Step A1, the electronic device performs feature fusion on the (j-1)th detail feature vector and the jth detail feature vector through the (j/2)th edge attention module in the first matting model to obtain the (j/2)th edge feature vector.
Where j is an even number.
In some embodiments of the present application, the edge attention module may improve the segmentation details at the edges of the input image, so that the first matting model can determine the central region through high-level features, and the deconvolution modules following the edge attention module in the first matting model can refine the edge regions of the image elements in the image, so as to output a feature map with a fine edge effect.
In some embodiments of the present application, the above-described method of feature fusion may include, but is not limited to, at least one of linear multiplication of feature vectors, linear addition of feature vectors.
It should be noted that, the specific feature fusion manner may be determined according to actual requirements, and the present application is not limited herein.
In some embodiments of the present application, since the resolutions of the feature maps corresponding to the (j-1)th detail feature vector and the jth detail feature vector may be different, the first matting model may first perform upsampling processing on the feature map with the smaller resolution, so that the feature maps received by the edge attention module all have the same resolution and the edge attention module can perform the related feature fusion processing.
In some embodiments of the present application, the first matting model may first perform preprocessing on the feature map corresponding to the jth detail feature vector to output two mask images.
Illustratively, let j = 2. The first edge attention module in the first matting model, i.e., edge attention module 1, may first, based on the second detail feature vector output by the second deconvolution module (whose feature map may be an image similar to that shown in fig. 2), set the image area whose values lie between 0 and 1 in the feature map of the second detail feature vector to white and the other areas to black, so as to obtain the mask image shown in fig. 5A, i.e., mask1; the first matting model may then perform color inversion processing on mask1 to obtain the mask image shown in fig. 5B, i.e., mask2. Thus, the inputs to edge attention module 1 are the first detail feature vector TransFeature_1 output by deconvolution module 1, the second detail feature vector TransFeature_2 output by deconvolution module 2, mask1, and mask2.
Further, edge attention module 1 may perform linear multiplication of TransFeature_1 and mask2 to preserve the highly deterministic image portion in the central region of the feature map, and perform linear multiplication of TransFeature_2 and mask1 to obtain an accurate image portion of the edge region of the image element in the feature map. Then, edge attention module 1 may linearly add the results of the two multiplications, that is, EdgeAtt_1 = TransFeature_1 × mask2 + TransFeature_2 × mask1, to combine the central region and the edge region of the image element in the feature map, so that the output 1st edge feature vector EdgeAtt_1 provides a certain improvement in the matting refinement at the edge positions of the image element.
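Based on this description, the fusion performed by the edge attention module can be sketched as follows. The sketch assumes single-channel detail feature maps, builds mask1 by thresholding the values that lie strictly between 0 and 1 (the uncertain edge band), takes mask2 as its inversion, and upsamples the earlier detail feature vector so that the shapes match; these are illustrative assumptions rather than the exact module structure.

```python
import torch
import torch.nn.functional as F

def edge_attention(tf_prev, tf_curr):
    """tf_prev: (j-1)th detail feature vector; tf_curr: jth detail feature vector."""
    # Upsample the lower-resolution feature map so the two resolutions match
    tf_prev = F.interpolate(tf_prev, size=tf_curr.shape[-2:],
                            mode="bilinear", align_corners=False)
    # mask1: 1 where the prediction is strictly between 0 and 1 (uncertain edge band)
    mask1 = ((tf_curr > 0) & (tf_curr < 1)).float()
    # mask2: color inversion of mask1 (the confident central/background region)
    mask2 = 1.0 - mask1
    # EdgeAtt = TransFeature_{j-1} * mask2 + TransFeature_j * mask1
    return tf_prev * mask2 + tf_curr * mask1

tf1 = torch.rand(1, 1, 64, 64)       # stand-in for TransFeature_1
tf2 = torch.rand(1, 1, 128, 128)     # stand-in for TransFeature_2
edge_att1 = edge_attention(tf1, tf2) # 1st edge feature vector, 128x128
```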
Step A2, the electronic device performs upsampling processing on the (N-j)th feature vector, the jth detail feature vector, and the (j/2)th edge feature vector through the (j+1)th deconvolution module in the first matting model, and outputs the (j+1)th detail feature vector.
In some embodiments of the application, the (j+1)th deconvolution module in the first matting model may perform matrix calculation through TransFeature_{j+1} = w'_{j+1} · x + b'_{j+1} to perform upsampling processing on the received feature vectors, thereby obtaining the (j+1)th detail feature vector TransFeature_{j+1}. Where x may be the (N-j)th feature vector, the jth detail feature vector, and the (j/2)th edge feature vector (for example, their linear sum), w'_{j+1} may be the weight value corresponding to the (j+1)th deconvolution module, and b'_{j+1} may be the bias value corresponding to the (j+1)th deconvolution module.
For example, as shown in fig. 6 in conjunction with fig. 4A, the electronic device may first perform linear addition on TransFeature_2 output by deconvolution module 2, Feature_2 output by convolution module 2, and EdgeAtt_1 output by edge attention module 1 through skip2, that is, skip2 = TransFeature_2 + Feature_2 + EdgeAtt_1, to serve as the input of the third deconvolution module, i.e., deconvolution module 3. Deconvolution module 3 may then perform matrix calculation through TransFeature_3 = w'_3 · skip2 + b'_3 to perform upsampling processing on the feature vector corresponding to skip2, thereby obtaining the third detail feature vector TransFeature_3. Where w'_3 may be the weight value corresponding to deconvolution module 3, and b'_3 may be the bias value corresponding to deconvolution module 3. The third detail feature vector TransFeature_3 is a feature map with a resolution of 256×256, and the feature map is obtained by combining images of 16 channels.
In the execution process of step 202a2, step A1, and step A2, the electronic device sequentially executes the processing corresponding to each module based on the connection sequence of the modules in the first matting model. At the same time, among the N deconvolution modules, the deconvolution modules at even positions execute the content of step 202a2, while the edge attention modules and the deconvolution modules at odd positions execute the content of step A1 and step A2 respectively. The electronic device may then determine the detail feature vector output by the last deconvolution module as the detail feature map.
Illustratively, in the case where the first matting model includes 6 deconvolution modules and 3 edge attention modules, the electronic device may execute step 202a2 through deconvolution module 2, then execute step A1 through edge attention module 1 and step A2 through deconvolution module 3, further execute step 202a2 through deconvolution module 4, then execute step A1 through edge attention module 2 and step A2 through deconvolution module 5, further execute step 202a2 through deconvolution module 6, and then execute step A1 through edge attention module 3.
As shown in fig. 7, the electronic device may linearly add TransFeature_3 output by deconvolution module 3 and Feature_1 output by convolution module 1 through skip3, i.e., skip3 = TransFeature_3 + Feature_1. The fourth deconvolution module in the first matting model, i.e., deconvolution module 4, may then receive the image data transmitted by skip3, and deconvolution module 4 may perform matrix calculation through TransFeature_4 = w'_4 · skip3 + b'_4 to perform upsampling processing on the feature vector corresponding to skip3, thereby obtaining the fourth detail feature vector TransFeature_4. Where w'_4 may be the weight value corresponding to deconvolution module 4, and b'_4 may be the bias value corresponding to deconvolution module 4. The fourth detail feature vector TransFeature_4 is a feature map with a resolution of 512×512, and the feature map is obtained by combining images of 16 channels. The electronic device may then determine the fourth detail feature vector TransFeature_4 as the detail feature map.
In some embodiments of the present application, in combination with the steps 201a to 201c, the step 202 of "processing the first feature map to obtain the edge feature map" may be specifically implemented by the steps 202b1 and 202b2 described below.
Step 202b1, the electronic device performs feature fusion on the (j-1)th detail feature vector and the jth detail feature vector through the (j/2)th edge attention module in the first matting model to obtain the (j/2)th edge feature vector.
In the embodiment of the application, the (j-1)th detail feature vector is the detail feature vector output by the (j-1)th deconvolution module in the first matting model, and the jth detail feature vector is the detail feature vector output by the jth deconvolution module in the first matting model.
Where j ∈ [2, N], and j is an even number.
It should be noted that, for the specific process of obtaining the j/2 th edge feature vector in the electronic device in the step 202b1, reference may be made to the related descriptions in the step A1 and the step A2, which are not repeated herein.
Step 202b2, the electronic device determines an edge feature vector corresponding to the last edge attention module in the first matting model as an edge feature map.
In some embodiments of the present application, the last edge attention module may be the (N/2)th edge attention module when N is even, and may be the ((N-1)/2)th edge attention module when N is odd.
For example, as shown in fig. 7, through the second edge attention module in the first matting model, i.e., edge attention module 2, the electronic device may, based on TransFeature_3 output by deconvolution module 3, set the image area whose values lie between 0 and 1 in the feature map of TransFeature_3 to white and the other areas to black, so as to obtain a mask image similar to that shown in fig. 5A, i.e., mask3; the first matting model may then perform color inversion processing on mask3 to obtain a mask image similar to that shown in fig. 5B, i.e., mask4. The inputs to edge attention module 2 are then TransFeature_3 output by deconvolution module 3, TransFeature_4 output by deconvolution module 4, mask3, and mask4. Edge attention module 2 may perform linear multiplication of TransFeature_3 and mask4 to preserve the highly deterministic image portion in the central region of the feature map, and perform linear multiplication of TransFeature_4 and mask3 to obtain an accurate image portion of the edge region of the image element in the feature map. Then, edge attention module 2 may linearly add the results of the two multiplications, that is, EdgeAtt_2 = TransFeature_3 × mask4 + TransFeature_4 × mask3, to combine the central region and the edge region of the image element in the feature map, so that the output second edge feature vector EdgeAtt_2 provides a certain improvement in the matting refinement at the edge positions of the image element. Meanwhile, the electronic device may determine the edge feature vector corresponding to the last edge attention module, that is, EdgeAtt_2 corresponding to edge attention module 2, as the edge feature map.
Therefore, the electronic device can improve the segmentation precision at the detail positions and edge positions of the image element to be matted through the deconvolution modules and the edge attention modules, so that a matting image with a higher degree of fineness can be obtained directly without requiring the user to manually trace the edge positions of the image element, which reduces the steps that the user needs to perform manually. Furthermore, in the operation of the deconvolution modules and the edge attention modules, feature fusion processing can be performed on multiple feature maps, which enriches the details of each layer of features output during the calculation of the first matting model and further improves the degree of refinement.
Step 203, the electronic device outputs a first matting image through the first matting model based on at least one of the detail feature image and the edge feature image and the first feature image.
In some embodiments of the application, the first matting model may include an output convolution layer.
In some embodiments of the present application, the electronic device may perform feature fusion processing and upsampling processing based on the first feature map, the detail feature map, and the edge feature map through the output convolution layer, so as to output the first matting image.
For example, in conjunction with fig. 7, the electronic device may first perform linear addition on the detail feature map TransFeature_4 output by deconvolution module 4, Feature_0 output by the input convolution layer (that is, the feature map acquired based on the first image and the second image), and the edge feature map EdgeAtt_2 output by edge attention module 2 through skip4, that is, skip4 = TransFeature_4 + Feature_0 + EdgeAtt_2, to serve as the input of the output convolution layer. Thus, the output convolution layer may perform matrix calculation through TransFeature_0 = w'_0 · skip4 + b'_0 to perform upsampling processing on the feature vector corresponding to skip4, thereby obtaining the feature vector TransFeature_0, namely the first matting image. Where w'_0 may be the weight value corresponding to the output convolution layer, and b'_0 may be the bias value corresponding to the output convolution layer. The resolution of the first matting image is 512×512, and the first matting image is obtained by combining images of 4 channels.
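A sketch of this output stage, again assuming a single-layer output convolution and that the three feature maps have already been brought to a common shape so that the linear addition is well defined (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

# Output convolution layer producing the 4-channel first matting image
output_conv = nn.Conv2d(16, 4, kernel_size=3, padding=1)

trans_feature4 = torch.rand(1, 16, 512, 512)   # detail feature map
feature0 = torch.rand(1, 16, 512, 512)         # input-convolution feature map (assumed shape-matched)
edge_att2 = torch.rand(1, 16, 512, 512)        # edge feature map (assumed shape-matched)

skip4 = trans_feature4 + feature0 + edge_att2  # skip4 = TransFeature_4 + Feature_0 + EdgeAtt_2
first_matting_image = output_conv(skip4)       # (1, 4, 512, 512)
```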
In some embodiments of the present application, the electronic device may perform feature fusion processing and upsampling processing based on the edge feature map and the first feature map through the output convolution layer, so as to output the corresponding first matting image.
Illustratively, in conjunction with fig. 7, the electronic device may perform linear addition on the edge feature map EdgeAtt_2 output by edge attention module 2 and Feature_0 output by the input convolution layer (that is, the feature map acquired based on the first image and the second image) through skip4, that is, skip4 = Feature_0 + EdgeAtt_2, to serve as the input of the output convolution layer, so as to obtain the corresponding first matting image.
In some embodiments of the present application, the electronic device may perform feature fusion processing and upsampling processing based on the detail feature map and the first feature map through the output convolution layer, so as to output the corresponding first matting image.
For example, as shown in fig. 4B, the electronic device may perform linear addition on the detail feature map TransFeature_4 output by deconvolution module 4 and Feature_0 output by the input convolution layer (that is, the feature map acquired based on the first image and the second image) through skip4, that is, skip4 = TransFeature_4 + Feature_0, to serve as the input of the output convolution layer, so as to obtain the corresponding first matting image.
In the image processing method provided by the embodiments of the present application, since the second image is an image obtained based on the image elements that need to be matted in the first image, the second image contains those image elements, and the electronic device can therefore perform processing based on the second image and the original first image to further obtain at least one of the detail feature map and the edge feature map. That is, on the basis of the second image, the first matting image output by the first matting model can accurately retain at least one of the image content at the detail positions and the image content at the edge positions of the image elements that need to be matted in the first image, so that the first matting image can meet the user's requirement on the fineness of the matting image, thereby improving the image quality of the matting image acquired by the electronic device.
In some embodiments of the present application, as shown in fig. 8 in conjunction with fig. 1, before the step 201, the image processing method provided in the embodiment of the present application further includes the following steps 301 to 303.
Step 301, the electronic device performs matting processing on the first image element in the first image to obtain a second matting image.
In an embodiment of the present application, the second matte image may include a first image element.
In some embodiments of the present application, the first image element may be an image element in the first image that the user needs to matte.
In some embodiments of the present application, the above-described matting process may be an "immediate matting" method.
It should be noted that the algorithm mainly involved in the above-mentioned "immediate matting" method is an on-device salient object detection (Salient Object Detection, SOD) algorithm, which refers to a method of segmenting the main image elements in the image to be matted so as to extract a rough segmentation result of the image element selected by the user.
In some embodiments of the present application, in a case where the electronic device displays a first image that needs to be scratched, the electronic device may receive a second input of a user to the first image, so as to determine a first image element according to an image area corresponding to the second input, thereby performing a scratching process on the first image element.
In some embodiments of the present application, the second input may include any of a click input, a slide input, a long press input, or other feasibility inputs, which are not limited in embodiments of the present application.
In some embodiments of the present application, the above-described click input may be any number of click inputs.
In some embodiments of the present application, the above-described sliding operation may be a sliding operation in any direction, such as sliding upward, sliding downward, sliding leftward or sliding rightward, and the like, which is not limited in the embodiments of the present application.
In some embodiments of the present application, the electronic device may determine the image area corresponding to the second input according to the input parameter of the second input, so as to determine the first image element.
In some embodiments of the present application, the input parameters may include, but are not limited to, at least one of an input location, an input trajectory.
An electronic device is exemplified as a mobile phone. As shown in fig. 9A, the mobile phone may display, in an image editing interface 10, an image 1 that the user wants to edit; image 1 shows a man holding a beverage cup, with a building in the background. If the user wants to adjust the position of the man in image 1, as shown in fig. 9B, the user may long-press the "man" in image 1 to trigger the mobile phone to determine, according to the pressed position, that the user wants to obtain the image of the "man" in image 1. The mobile phone may then matte the image area of the "man" selected by the user using the "immediate matting" method, so as to matte out the gray area shown in fig. 9C and obtain the matting image corresponding to the "man", that is, the second matting image.
Step 302, the electronic device segments the first image based on the image elements in the first image, so as to obtain at least two segmented images.
In some embodiments of the application, the electronic device may determine the image element contained in the first image based on the type of the image element.
In some embodiments of the present application, the types of image elements described above may include, but are not limited to, any of a person, an animal, an object, a building, a natural environment.
In some embodiments of the present application, when the image element includes a plurality of characters, the electronic device may divide the images corresponding to the plurality of characters according to information such as gender, age group, character action, character wearing, and the like, which respectively correspond to the plurality of characters.
In some embodiments of the present application, in the case where the image element includes an animal, the electronic device may segment an image corresponding to the animal according to information such as a kind, a color, a posture, and the like of the animal.
In some embodiments of the present application, in the case where the image element includes an object, the electronic device may segment the image corresponding to the object according to information such as a shape, a color, a use, and the like of the object.
In some embodiments of the application, the electronic device may categorize the image elements in the first image by an end-to-end object detector.
For example, the end-to-end object detector described above may use the DINO classification method, or a classification method improved on the basis of the DINO classification method. For example, the DINO model may be fine-tuned (finetune) so that the fine-tuned classification model is more suitable for classifying the image elements of images stored in an album.
In some embodiments of the application, the electronic device may also segment the image elements in the first image based on the type of the image elements through a Segment Anything Model (SAM).
In some embodiments of the present application, the at least two segmented images may include a segmented image corresponding to at least one image element included in the first image, and a segmented image corresponding to an image background of the first image.
In some embodiments of the present application, after the electronic device obtains at least two divided images, the electronic device may add a tag to each divided image including an image element to label image information corresponding to each image element.
In some embodiments of the present application, the image information may include, but is not limited to, at least one of type, location.
Illustratively, in connection with fig. 9A, the mobile phone may determine, based on the DINO method, that image 1 includes three image elements, such as the "man", the beverage cup, and the building in the background, and then the mobile phone may segment the three image elements based on the SAM model to obtain four segmented images, such as segmented image 1 through segmented image 4. Segmented image 1 may be a segmented image including the "man", segmented image 2 may be a segmented image including the beverage cup, segmented image 3 may be a segmented image including the building, and segmented image 4 may be the segmented image corresponding to the remaining image excluding the three image elements in image 1. The mobile phone may then add labels representing the types of image elements to the segmented images that contain image elements, namely segmented image 1 to segmented image 3, respectively. For example, the mobile phone may add a "person" tag to segmented image 1 to indicate that the image element included in segmented image 1 is a person, or the mobile phone may add specific tags describing person information such as "young" and "man" to segmented image 1.
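The overall flow of steps 301 and 302 could be organised as in the sketch below. Here detect_elements and segment_element are hypothetical callables standing in for the end-to-end object detector and the SAM-style segmentation model respectively; they are placeholders rather than APIs defined by the present application, and the masks are assumed to be boolean arrays.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class SegmentedImage:
    mask: np.ndarray        # boolean mask over the first image
    label: Optional[str]    # e.g. "person"; None for the residual background segment

def split_first_image(first_image: np.ndarray,
                      detect_elements: Callable,
                      segment_element: Callable) -> List[SegmentedImage]:
    """Detect the image elements, segment each one, label the element segments,
    and keep the leftover pixels as the background segment (illustrative only)."""
    segments = []
    covered = np.zeros(first_image.shape[:2], dtype=bool)
    for element in detect_elements(first_image):          # each element: box + label
        mask = segment_element(first_image, element.box)   # boolean mask of this element
        segments.append(SegmentedImage(mask=mask, label=element.label))
        covered |= mask
    segments.append(SegmentedImage(mask=~covered, label=None))  # e.g. segmented image 4
    return segments
```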
Step 303, the electronic device determines the second image based on the area relation between the first segmentation image and the second matting image.
In the embodiment of the present application, the first segmented image may be a segmented image including a first image element of the at least two segmented images.
In some embodiments of the application, the method of the electronic device determining the first segmented image may include, but is not limited to, any of the following:
The electronic device can determine the first image element that the user needs to matte according to the input position of the second input, and then determine the first segmented image according to the image elements contained in each of the at least two segmented images;
The electronic device may determine a first divided image according to an input position of the second input and an original display position corresponding to the at least two divided images;
The electronic device may further determine a maximum circumscribed frame based on the circumscribed frame corresponding to the first image element in the second matting image and the circumscribed frame corresponding to the segmented image including the first image element among the at least two segmented images, so as to perform image segmentation on the image elements included in the maximum circumscribed frame and determine the first segmented image.
In some embodiments of the present application, the electronic device may represent the display position of the circumscribed frame in the image by coordinates of the vertex of the upper left corner and the vertex of the lower right corner in the circumscribed frame.
In some embodiments of the present application, the electronic device may determine the above-described maximum circumscribed frame based on the following formula set (1):
new_x1 = max(0, min(sod_x1, label_x1))
new_y1 = max(0, min(sod_y1, label_y1))
new_x2 = min(img_width, max(sod_x2, label_x2))
new_y2 = min(img_height, max(sod_y2, label_y2))    (1)
Where (new_x1, new_y1) may be the coordinates of the upper-left vertex of the maximum bounding box, (new_x2, new_y2) may be the coordinates of the lower-right vertex of the maximum bounding box, (sod_x1, sod_y1) may be the coordinates of the upper-left vertex of the bounding box of the first image element in the second matting image, (sod_x2, sod_y2) may be the coordinates of the lower-right vertex of the bounding box of the first image element in the second matting image, (label_x1, label_y1) may be the coordinates of the upper-left vertex of the bounding box in the segmented image corresponding to the first image element, (label_x2, label_y2) may be the coordinates of the lower-right vertex of the bounding box in the segmented image corresponding to the first image element, img_width may be the width of the first image, and img_height may be the height of the first image.
In some embodiments of the present application, in the above formula set (1), the electronic device may determine the lateral minimum value and the longitudinal minimum value based on the coordinates of the upper-left vertex in the second matting image and the coordinates of the upper-left vertex in the segmented image corresponding to the first image element, and combine them into the coordinates of the upper-left vertex of the maximum bounding box. Since the upper-left vertex of the maximum bounding box is located inside the first image, the minimum value of its coordinates is 0, that is, the upper left corner of the first image.
In some embodiments of the present application, in the above formula set (1), the electronic device may determine the lateral maximum value and the longitudinal maximum value based on the coordinates of the lower-right vertex in the second matting image and the coordinates of the lower-right vertex in the segmented image corresponding to the first image element, and combine them into the coordinates of the lower-right vertex of the maximum bounding box. Since the lower-right vertex of the maximum bounding box is located inside the first image, its abscissa is not greater than the width of the first image, and its ordinate is not greater than the height of the first image.
Illustratively, assume that the first image has a width of 12 and a height of 15. In connection with fig. 9C, the second matting image may be the matting image 11 as shown in fig. 10A, the mobile phone may establish a coordinate system based on the upper left corner of the matting image 11 as the origin of coordinates, the x axis in the lateral direction and the y axis in the longitudinal direction, and then the mobile phone may pass through the coordinates (sod_x1, sod_y1) of the vertex a of the upper left corner of the external frame 12, such as (5, 7), and the coordinates (sod_x2, sod_y2) of the vertex B of the lower right corner, such as (10, 12), to represent the display position of the external frame 12. The divided image corresponding to the first image element may be the divided image 13 as shown in fig. 10B, the mobile phone may establish a coordinate system based on the upper left corner of the divided image 13 as the origin of coordinates, the lateral direction as the x-axis, and the longitudinal direction as the y-axis, and then the mobile phone may indicate the display position of the external frame 14 by the coordinates (label_x1, label_y1) of the vertex C of the upper left corner of the external frame 14, such as (5, 6) and the coordinates (label_x2, label_y2) of the vertex D of the lower right corner such as (10, 12). The handset may then perform the following calculations:
new_x1=max(0,min(5,5))=5
new_y1=max(0,min(7,6))=6
new_x2=min(12,max(10,10))=10
new_y2=min(15,max(12,12))=12
to determine the coordinates of the upper-left vertex of the maximum bounding box as (5, 6) and the coordinates of the lower-right vertex as (10, 12), that is, the coordinates of the vertices of the circumscribed frame 14. Then, the mobile phone may determine the circumscribed frame 14 as the maximum circumscribed frame, and perform image segmentation based on the image elements included in the maximum circumscribed frame, to determine the first segmented image.
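The computation of formula set (1) can also be written directly as below; the sample values are those of the example above.

```python
def max_bounding_box(sod_box, label_box, img_width, img_height):
    """sod_box / label_box: (x1, y1, x2, y2) of the bounding boxes in the second matting
    image and the segmented image; the result is clamped to the first image."""
    sod_x1, sod_y1, sod_x2, sod_y2 = sod_box
    label_x1, label_y1, label_x2, label_y2 = label_box
    new_x1 = max(0, min(sod_x1, label_x1))
    new_y1 = max(0, min(sod_y1, label_y1))
    new_x2 = min(img_width, max(sod_x2, label_x2))
    new_y2 = min(img_height, max(sod_y2, label_y2))
    return new_x1, new_y1, new_x2, new_y2

# Values from the example: image width 12, height 15
print(max_bounding_box((5, 7, 10, 12), (5, 6, 10, 12), 12, 15))  # (5, 6, 10, 12)
```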
In some embodiments of the present application, the electronic device may determine the first segmented image as the second image when the multiple relationship between the first segmented image and the second matting image is within a preset range, and may determine the second matting image as the second image when the multiple relationship between the first segmented image and the second matting image is not within the preset range.
Illustratively, it is assumed that the maximum connected domain area of the second matting image is area1, and the maximum connected domain area of the first segmentation image is area2.
That is, the electronic device may determine the second matting image as the second image in the case where the area of the first divided image is greater than 1.75 times the area of the second matting image, may determine the second matting image as the second image in the case where the area of the first divided image is less than 0.5 times the area of the second matting image, and may determine the first divided image as the second image in the case where the area of the first divided image is 0.5 to 1.75 times the area of the second matting image.
It should be noted that the above 0.5 times and 1.75 times are only for illustration, and specific numerical settings may be determined according to the needs of the user in practical applications, and the present application is not limited thereto.
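The selection rule can be sketched as follows, with 0.5 and 1.75 as the illustrative thresholds from the example (actual values may be configured as needed):

```python
def choose_second_image(area_segmented: float, area_matting: float,
                        lower: float = 0.5, upper: float = 1.75) -> str:
    """area_segmented: maximum connected-domain area of the first segmented image (area2);
    area_matting: maximum connected-domain area of the second matting image (area1)."""
    ratio = area_segmented / area_matting
    if lower <= ratio <= upper:
        return "first segmented image"   # multiple relationship within the preset range
    return "second matting image"        # outside the range, fall back to the matting result
```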
In the embodiments of the present application, the electronic device can acquire multiple matting results through multiple different matting methods, so as to avoid the situation where a matting result obtained directly through a single matting method contains image parts that the user does not need, or fails to completely contain the image parts that the user does need. Meanwhile, the electronic device can determine the second image to be used subsequently through the direct ratio of the areas of the first segmented image and the second matting image, so as to avoid the situation where the difference between the two separation results is too large and the matting result deviates too much from the content required by the user.
In some embodiments of the present application, as shown in fig. 11 in conjunction with fig. 1, after the step 203, the image processing method provided in the embodiment of the present application further includes the following steps 401 to 403.
Step 401, the electronic device displays a first matting image in a first image.
Step 402, the electronic device receives a first input of a first matting image.
In some embodiments of the present application, the first input may be an input that triggers the electronic device to adjust a display position of the first matte image in the first image.
In some embodiments of the present application, the first input may include any of a click input, a slide input, a gesture input, or other feasibility input, which is not limited in embodiments of the present application.
In some embodiments of the present application, the above-described click input may be any number of click inputs.
In some embodiments of the present application, the above-described sliding operation may be a sliding operation in any direction, such as sliding upward, sliding downward, sliding leftward or sliding rightward, and the like, which is not limited in the embodiments of the present application.
In some embodiments of the present application, the gesture input may include, but is not limited to, at least one of a click gesture, a slide gesture, a drag gesture, a pressure recognition gesture, a long press gesture, an area change gesture, a double press gesture, a double click gesture, a specific gesture input, or other possible gesture inputs, and a specific gesture input form may be determined according to actual requirements, which is not limited in embodiments of the present application.
Step 403, in response to the first input, the electronic device adjusts the display position of the first matting image in the first image, and performs image filling on the original position of the first matting image.
In some embodiments of the present application, in the process of adjusting the display position of the first matte image by the electronic device, the electronic device may adjust the display size of the first matte image according to the input of the user.
In some embodiments of the present application, after the electronic device adjusts the display position of the first matte image according to the input of the user, if the electronic device does not receive the input of the user within the preset period of time, the electronic device may start image filling for the original position of the first matte image. For example, the preset time period may be 3s.
It should be noted that, the setting of the preset time period may be a default setting time length of the electronic device, or may be set according to a user's requirement, which is not limited herein.
In some embodiments of the present application, the electronic device may perform image filling on the original position of the first matte image through a diffusion model. The diffusion model (for example, Stable Diffusion) may include an image inpainting (Inpaint) module and an image outpainting (Outpaint) module.
It should be noted that the diffusion model is a deep-learning-based generative model, and conversion from text to image can be realized by learning the mapping relationship between a large number of images and corresponding text data. In addition, the structural and textural consistency of the image can be maintained in the process of generating the image based on the diffusion model.
The working principle of the Inpaint module can be summarized as two steps: first, the user provides the image to be repaired and draws a mask on the image to indicate the region to be repaired; second, the diffusion model generates new content for the masked region so that it remains consistent with the surrounding image content.
The above-described processing flow of the Outpaint module for images is similar to that of the Inpaint module, but the algorithm processing objective of the Outpaint module is to complement the details of the region outside the picture.
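By way of illustration only, the inpainting step could be driven by an off-the-shelf inpainting pipeline. The following sketch assumes the open-source diffusers library; the checkpoint name, file names and prompt are illustrative and not mandated by the embodiment:

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Load a Stable Diffusion inpainting pipeline (illustrative choice of weights).
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")

image = Image.open("first_image.png").convert("RGB")           # image after the element was moved
mask = Image.open("original_position_mask.png").convert("L")   # white = original position to fill

# Regenerate the masked (original) position of the first matting image so that it
# stays consistent with the surrounding background.
filled = pipe(prompt="background consistent with the scene",
              image=image, mask_image=mask).images[0]
filled.save("filled_image.png")
```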
For example, in combination with fig. 9A, if the user wants to adjust the display position of "man" in the image 1 and reduce the display size of "man", the user may long-press the display area of "man" in the image 1 to trigger the mobile phone to obtain the corresponding first matting image based on "man" through the first matting model. Then, the user may perform a two-finger touch input to trigger the mobile phone to reduce the display size of the first matting image according to the user's requirement, and may perform a drag input to trigger the mobile phone to adjust the display position of the first matting image. The mobile phone may then perform image filling on the original position of the first matting image, so as to display an image 15 as shown in fig. 12.
Illustratively, as shown in fig. 13A, assume that the first image is an image 16 including a teddy bear and a building, and the teddy bear is located in the right edge region of the image 16. If the user wants to move the teddy bear to the center region of the image 16, the user may long-press the display region of the teddy bear in the image 16 to trigger the mobile phone to obtain the corresponding matting image of the teddy bear through the first matting model, and may then perform a drag input to trigger the mobile phone to adjust the display position of the matting image corresponding to the teddy bear. The mobile phone may then fill the original position of the matting image corresponding to the teddy bear with the background building, and may complete the teddy bear based on the content in the matting image of the teddy bear, so as to display the image 17 shown in fig. 13B.
In the embodiment of the application, the electronic equipment can adjust the display parameters of the first matting image in the first image and can carry out image filling on the original display position of the first matting image, so that a user can carry out secondary composition on the first image, and the adjusted first image meets the actual demands of the user. Therefore, the convenience of image processing of the electronic equipment can be improved.
In some embodiments of the present application, in combination with the foregoing steps 401 to 403, the image processing method provided in the embodiment of the present application further includes the following step 404.
Step 404, the electronic device adjusts the display parameters of the first matte image based on the display parameters of the adjacent areas of the first matte image after the position adjustment.
In some embodiments of the present application, the display parameters described above may include, but are not limited to, any of image brightness, image saturation, image color level.
In some embodiments of the present application, before adjusting the display parameter of the first matte image, the electronic device may perform image processing on the edge area of the first matte image to soften the abrupt transition at the edge of the first matte image.
In some embodiments of the present application, the electronic device may calculate the center brightness of the image based on the main area of the first image and the original image background area corresponding to the first matting image after the display position is adjusted, so as to adjust the tone curve of the first matting image, so that the brightness of the first matting image is similar to the brightness of the adjacent area.
In some embodiments of the present application, the electronic device may represent the first image by using the LAB color model, extract the L channel, that is, the luminance channel, and compute the average value mean_light of the luminance values of all pixels in the first image. Different tone-level outputs may then be adjusted according to different luminance averages, so that the main color of the first matting image is adapted to the color of the adjacent area after the adjustment.
The Lab color model is composed of three elements: L represents luminance, and A and B are two color channels. The A channel ranges from dark green through gray to bright pink, that is, from a low channel value through a medium channel value to a high channel value, and the B channel ranges from bright blue through gray to yellow, that is, from a low channel value through a medium channel value to a high channel value.
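A minimal sketch of the luminance statistic described above, assuming OpenCV is available (the conversion and scaling details are illustrative):

```python
import cv2
import numpy as np

def mean_luminance(image_bgr: np.ndarray) -> float:
    """Average of the L channel of the image in the Lab color space.

    For 8-bit images OpenCV stores L in the range 0..255; rescale if a
    0..100 Lab range is preferred.
    """
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    return float(lab[:, :, 0].mean())
```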
In some embodiments of the present application, the electronic device may adjust a display parameter of the first matte image when a difference between the luminance average value of the first matte image and the luminance average value of an adjacent area of the first matte image after the position adjustment is greater than a preset threshold. For example, the preset threshold may be 50.
For example, assuming that the average brightness value of the first matting image is mean_light and the average brightness value of the adjacent region of the first matting image after the position adjustment is mean_light2, the electronic device may adjust the tone-level range of the first matting image in the case of mean_light < mean_light2 - 50 or mean_light > mean_light2 + 50.
It should be noted that, the setting of the preset threshold may be determined according to actual requirements, which is not limited herein.
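Combining the two statistics above, the decision of whether to adjust the first matting image could look like the following sketch; the threshold of 50 is only the example value mentioned above, and adjust_tone_levels is a hypothetical helper rather than a function defined by the embodiment:

```python
def harmonize_matting_brightness(matting_bgr, neighbor_bgr, threshold=50):
    """Adjust the matting image only when its mean luminance deviates too much
    from that of the region adjacent to its new position."""
    mean_light = mean_luminance(matting_bgr)     # defined in the sketch above
    mean_light2 = mean_luminance(neighbor_bgr)
    if abs(mean_light - mean_light2) > threshold:
        return adjust_tone_levels(matting_bgr, target_mean=mean_light2)  # hypothetical helper
    return matting_bgr
```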
In the embodiment of the application, the electronic equipment can adjust the display parameters of the first matting image so that the moved first matting image can be matched with the display parameters of the adjacent moved area, thereby improving the display effect of the adjusted first image.
In some embodiments of the present application, the image processing method provided in the embodiments of the present application further includes the following steps 501 to 503.
Step 501, the electronic device inputs the sample image into the second matting model, and performs matting processing on the second image element in the sample image to obtain a third matting image.
Step 502, the electronic device calculates a loss value based on the third matting image and the target matting image.
In the embodiment of the application, the loss value characterizes the detail characteristic difference and the edge characteristic difference of the third matting image and the target matting image.
And step 503, the electronic equipment adjusts model parameters in the second matting model based on the loss value to obtain a first matting model.
In some embodiments of the present application, the sample images may be a sufficiently large number of images, for example, 5000 sample images.
It should be noted that the number of specific sample images may be determined according to the actual requirement in the model training process, and the present application is not limited herein.
In some embodiments of the present application, the second matting model has a structure substantially the same as that of the first matting model, and specific reference may be made to the description of the foregoing embodiments, which are not repeated herein.
In some embodiments of the present application, the target matting image may be a matting image obtained by manual drawing, which has a higher degree of refinement.
The smaller the difference between the third matting image and the target matting image is, the closer the display effect of the third matting image and the matting image required by the user is, and the smaller the loss value is.
In some embodiments of the present application, the electronic device may iteratively update the weight value and the bias value in the second matting model until convergence by using a back propagation algorithm based on the loss value, so as to train to obtain the first matting model.
Illustratively, it is assumed that the second matting model includes 1 input convolution layer, 4 convolution modules, 4 deconvolution modules, 1 output convolution layer, and 2 edge attention modules. The second matting model may perform 5 downsampling operations on the input image through the input convolution layer and the 4 convolution modules, and then input the first feature map output by the fourth convolution module into deconvolution module 1 of the 4 deconvolution modules for upsampling processing to obtain detail feature vector 1. Through skip connection 1, the electronic device may linearly add detail feature vector 1 output by deconvolution module 1 and feature vector 3 output by convolution module 3, and use the result as the input of deconvolution module 2 for upsampling processing to obtain detail feature vector 2. The electronic device may then input detail feature vector 1 and detail feature vector 2 into edge attention module 1 for feature fusion to output edge feature vector 1. Through skip connection 2, detail feature vector 2 output by deconvolution module 2, feature vector 2 output by convolution module 2, and edge feature vector 1 may be linearly added and used as the input of deconvolution module 3 for upsampling processing to obtain detail feature vector 3. Through skip connection 3, detail feature vector 3 and feature vector 1 output by convolution module 1 may be linearly added and used as the input of deconvolution module 4 for upsampling processing to obtain detail feature vector 4, and detail feature vector 3 and detail feature vector 4 may be input into edge attention module 2 for feature fusion to output edge feature vector 2. Finally, the feature vector output by the input convolution layer and edge feature vector 2 output by edge attention module 2 are linearly added and processed by the output convolution layer, so that the third matting image is output.
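The forward pass above can be mirrored in code. The following PyTorch sketch is only illustrative: the channel counts, the concrete form of the edge attention module (upsample, concatenate, sigmoid gate), and the final upsampling are assumptions, not the exact structure of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def down(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU(inplace=True))

def up(cin, cout):
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU(inplace=True))

class EdgeAttention(nn.Module):
    """Fuses a coarse and a fine detail feature vector into an edge feature vector
    with the same shape as the fine one (illustrative design)."""
    def __init__(self, c_coarse, c_fine):
        super().__init__()
        self.fuse = nn.Conv2d(c_coarse + c_fine, c_fine, 3, padding=1)

    def forward(self, coarse, fine):
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.fuse(torch.cat([coarse, fine], dim=1)))
        return fine * attn

class MattingNet(nn.Module):
    def __init__(self, in_ch=4):  # e.g. image (3 channels) plus second image / mask (1 channel)
        super().__init__()
        self.input_conv = down(in_ch, 32)
        self.conv1, self.conv2 = down(32, 64), down(64, 128)
        self.conv3, self.conv4 = down(128, 256), down(256, 512)
        self.deconv1, self.deconv2 = up(512, 256), up(256, 128)
        self.deconv3, self.deconv4 = up(128, 64), up(64, 32)
        self.edge1, self.edge2 = EdgeAttention(256, 128), EdgeAttention(64, 32)
        self.output_conv = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        f0 = self.input_conv(x)              # feature vector of the input convolution layer
        f1 = self.conv1(f0)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        f4 = self.conv4(f3)                  # first feature map
        d1 = self.deconv1(f4)                # detail feature vector 1
        d2 = self.deconv2(d1 + f3)           # skip connection 1
        e1 = self.edge1(d1, d2)              # edge feature vector 1
        d3 = self.deconv3(d2 + f2 + e1)      # skip connection 2
        d4 = self.deconv4(d3 + f1)           # skip connection 3
        e2 = self.edge2(d3, d4)              # edge feature vector 2
        return torch.sigmoid(self.output_conv(f0 + e2))  # third matting image (alpha matte)
```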
Further, the electronic device may calculate a loss value based on the third matting image and the target matting image, and, based on the loss value, iteratively update the weight values and bias values in the second matting model by using a back propagation algorithm until convergence, so as to train and obtain the first matting model.
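A schematic training loop consistent with steps 501 to 503 might look as follows. The particular loss terms (an L1 term on the matte plus an L1 term on Sobel edges, weighted equally) are only one possible way to penalize both detail and edge differences, not the loss prescribed by the embodiment:

```python
import torch
import torch.nn.functional as F

def edge_map(alpha):
    """Simple Sobel-based edge response of a (B, 1, H, W) matte, used to
    compare edge features."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(alpha, kx.to(alpha.device), padding=1)
    gy = F.conv2d(alpha, ky.to(alpha.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def train_step(model, optimizer, sample, target):
    """One iteration of steps 501-503: forward pass, loss on detail and edge
    features, back propagation, parameter update."""
    pred = model(sample)                                      # third matting image
    detail_loss = F.l1_loss(pred, target)                     # detail feature difference
    edge_loss = F.l1_loss(edge_map(pred), edge_map(target))   # edge feature difference
    loss = detail_loss + edge_loss
    optimizer.zero_grad()
    loss.backward()                                           # back propagation
    optimizer.step()                                          # update weights and biases
    return loss.item()
```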
Specific examples are given below for each scenario to which the embodiment of the present application is applicable in combination with each implementation scheme of the embodiment of the present application, and implementation procedures in each scenario of the embodiment of the present application are described. The electronic device is taken as a mobile phone for illustration.
Example 1: a secondary composition is performed for a person in an image.
If the user wants to perform secondary composition on one image, for example, adjust the position of the person 1 in the image 1 relative to the background building, the user may trigger the electronic device to acquire a matting image corresponding to the image 1 based on the existing "immediate matting" function, and input the image 1 and the matting image corresponding to the image 1 into the first matting model, so that the first matting model outputs a first matting image with detail features and edge features corresponding to the person 1, such as matting image 1, based on the input images. Then, the electronic device may adjust the display position of the matting image 1 according to the user's input on the matting image 1, fill the original display position of the matting image 1 with the background building, and adjust parameters such as the brightness of the moved matting image 1, so that the user obtains the required image 1 and the secondary composition of the image 1 is realized.
Example 2: a secondary composition is performed for an object in an image.
If the user wants to perform secondary composition on an image, for example, adjust the display position and display size of an object, for example, object 2, in image 2, the user may trigger the electronic device to acquire a matting image corresponding to image 2 based on the existing "immediate matting" function, and input image 2 and the matting image corresponding to image 2 into the first matting model, so that the first matting model outputs a first matting image with detail features and edge features corresponding to object 2, such as matting image 2, based on the input images. Then, the electronic device may adjust the display position and display size of the matting image 2 according to the user's input on the matting image 2, fill the original display position of the matting image 2 according to other image elements in image 2, and adjust parameters such as the brightness of the moved matting image 2, so that the user obtains the required image 2 and the secondary composition of the image 2 is realized.
The embodiment of the application provides a matting model training method, and fig. 14 shows a flowchart of the matting model training method provided by the embodiment of the application, and the method can be applied to electronic equipment. As shown in fig. 14, the method for training a matting model provided by the embodiment of the present application may include the following steps 601 to 603.
Step 601, the electronic device inputs the sample image into a second matting model, and performs matting processing on the second image element in the sample image to obtain a third matting image.
Step 602, the electronic device calculates a loss value based on the third matting image and the target matting image.
In the embodiment of the application, the loss value characterizes the detail characteristic difference and the edge characteristic difference of the third matting image and the target matting image.
And 603, the electronic equipment adjusts model parameters in the second matting model based on the loss value to obtain a first matting model.
It should be noted that, the training method of the matting model in the foregoing steps 601 to 603 may be specifically referred to the descriptions of the steps 501 to 503 in the foregoing embodiments, and the disclosure is not repeated herein.
In the method for training the matting model provided by the embodiment of the application, the detail characteristic and the edge characteristic of the matting image can be combined on the basis of the second matting model to train to obtain the first matting model, so that in the actual use process, the electronic equipment can input the second image and the original first image which are obtained on the basis of the image elements needing matting into the first matting model, so that a user can acquire the first matting image with at least one of the detail characteristic and the edge characteristic, and the first matting image can meet the requirement of the user on the fineness of the matting image, and the image quality of the matting image acquired by the image processing device can be improved.
It should be noted that, the foregoing method embodiments, or various possible implementation manners in the method embodiments may be executed separately, or may be executed in combination with each other on the premise that no contradiction exists, and may be specifically determined according to actual use requirements, which is not limited by the embodiment of the present application.
It should be noted that, in the image processing method provided in the embodiment of the present application, the execution subject may be an image processing apparatus. In the embodiment of the present application, an image processing apparatus is described by taking an example of an image processing method performed by the image processing apparatus.
Fig. 15 shows a schematic diagram of one possible configuration of an image processing apparatus involved in an embodiment of the present application. As shown in fig. 15, the image processing apparatus 70 may include an acquisition module 71, a processing module 72, and an output module 73;
The acquiring module 71 is configured to input a first image and a second image into a first matting model, acquire a first feature image based on the first image and the second image through the first matting model, where the second image is an image obtained based on image elements in the first image;
the processing module 72 is configured to process, through the first matting model, the first feature map acquired by the acquiring module 71, to obtain at least one of a detail feature map and an edge feature map;
An output module 73, configured to output, through the first matting model, a first matting image based on the first feature image and at least one of the detail feature image and the edge feature image acquired by the acquisition module;
the first matting model is obtained through training based on detail features and edge features of the image.
In a possible implementation manner, the image processing apparatus 70 provided by the embodiment of the present application further includes a determining module, the processing module 72 is further configured to perform a matting process on a first image element in the first image before the obtaining module 71 inputs the first image and the second image into the first matting model, to obtain a second matting image, where the second matting image includes the first image element, and divide the first image based on the image element in the first image to obtain at least two divided images, and the determining module is configured to determine the second image based on an area relationship between the first divided image and the second matting image obtained by the processing module 72, where the first divided image is a divided image including the first image element in the at least two divided images.
In one possible implementation manner, the obtaining module 71 is specifically configured to: perform downsampling processing on the first image and the second image through a first convolution module in the first matting model to output a first feature vector; perform downsampling processing on an (i-1)-th feature vector through an i-th convolution module in the first matting model to output an i-th feature vector, where i ∈ [2, N], i is an integer, and N is the number of convolution modules; and determine the N-th feature vector as the first feature map.
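As a small, purely illustrative sketch of this indexing (the module container and variable names are hypothetical), the cascade of convolution modules can be written as:

```python
def encode(first_image_and_second_image, conv_modules):
    """conv_modules is a list of N convolution modules; each one downsamples
    the previous feature vector."""
    feats = [conv_modules[0](first_image_and_second_image)]   # first feature vector
    for i in range(2, len(conv_modules) + 1):                  # i in [2, N]
        feats.append(conv_modules[i - 1](feats[-1]))           # i-th from (i-1)-th
    return feats, feats[-1]                                    # N-th vector is the first feature map
```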
In one possible implementation manner, the processing module 72 is specifically configured to: perform upsampling processing on the first feature map through a first deconvolution module in the first matting model to obtain a first detail feature vector; perform upsampling processing on the (N-j)-th feature vector and the (j-1)-th detail feature vector through a j-th deconvolution module in the first matting model to output a j-th detail feature vector, where j ∈ [2, N]; and determine the N-th detail feature vector as the detail feature map.
In a possible implementation manner, the processing module 72 is further configured to: perform upsampling processing on the (N-j)-th feature vector and the (j-1)-th detail feature vector to output the j-th detail feature vector; perform feature fusion on the (j-1)-th detail feature vector and the j-th detail feature vector through a (j/2)-th edge attention module in the first matting model to obtain a (j/2)-th edge feature vector, where j is an even number; and perform upsampling processing on the (N-j)-th feature vector, the j-th detail feature vector and the (j/2)-th edge feature vector through a (j+1)-th deconvolution module in the first matting model to output a (j+1)-th detail feature vector.
In one possible implementation manner, the processing module 72 is specifically configured to: perform feature fusion on the (j-1)-th detail feature vector and the j-th detail feature vector through the (j/2)-th edge attention module in the first matting model to obtain the (j/2)-th edge feature vector, where the (j-1)-th detail feature vector is the detail feature vector output by the (j-1)-th deconvolution module in the first matting model, the j-th detail feature vector is the detail feature vector output by the j-th deconvolution module in the first matting model, j ∈ [2, N], and j is an even number; and determine the edge feature vector corresponding to the last edge attention module in the first matting model as the edge feature map.
In a possible implementation manner, the image processing apparatus 70 provided by the embodiment of the present application further includes a display module and a receiving module, where the display module is configured to display the first matting image in the first image after outputting the first matting image based on the first feature image and at least one of the detail feature image and the edge feature image, the receiving module is configured to receive a first input of the first matting image displayed by the display module, and the processing module 72 is further configured to adjust a display position of the first matting image in the first image and perform image filling on an original position of the first matting image in response to the first input received by the receiving module.
In a possible implementation manner, the processing module 72 is further configured to adjust a display parameter of the first matte image based on a display parameter of an adjacent area of the first matte image after the position is adjusted.
In a possible implementation manner, the image processing apparatus 70 provided by the embodiment of the present application further includes a calculating module, the acquiring module 71 is further configured to input a sample image into the second matting model, perform matting processing on the second image element in the sample image to obtain a third matting image, the calculating module is configured to calculate a loss value, where the loss value characterizes a detail feature difference and an edge feature difference of the third matting image and the target matting image, based on the third matting image and the target matting image acquired by the acquiring module 71, and the processing module 72 is further configured to adjust a model parameter in the second matting model based on the loss value to obtain the first matting model.
In the image processing device provided by the embodiment of the application, since the second image is an image obtained based on the image elements needing to be scratched in the first image, the second image comprises the image elements needing to be scratched, and then the image processing device can process based on the second image and the original first image to further obtain at least one of the detail feature image and the edge feature image, that is, the first scratched image output by the first scratched model can accurately reserve at least one of the image content at the detail position and the image content at the edge position in the image elements needing to be scratched in the first image on the basis of the second image, so that the first scratched image can meet the requirement of a user on the fineness degree of the scratched image.
The image processing device in the embodiment of the application can be an electronic device, or can be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (Augmented Reality, AR)/virtual reality (Virtual Reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (Personal Digital Assistant, PDA), etc., and may also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (Personal Computer, PC), a television (TV), a teller machine, a self-service machine, etc., which are not particularly limited in the embodiments of the present application.
The image processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The image processing device provided by the embodiment of the present application can implement each process implemented by the above method embodiment, and in order to avoid repetition, details are not repeated here.
Optionally, as shown in fig. 16, the embodiment of the present application further provides an electronic device 90, which includes a processor 91 and a memory 92, where the memory 92 stores a program or instructions that can be executed by the processor 91, and the program or instructions implement the steps of the embodiment of the image processing method when executed by the processor 91, and achieve the same technical effects, so that repetition is avoided and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 17 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to, a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for powering the various components, and that the power source may be logically coupled to the processor 110 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 17 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown in the drawings, or may combine some components, or may be arranged in different components, which will not be described in detail herein.
The processor 110 is configured to input a first image and a second image into a first matting model, obtain a first feature image based on the first image and the second image through the first matting model, and process the first feature image through the first matting model to obtain at least one of a detail feature image and an edge feature image, and output the first matting image based on the at least one of the detail feature image and the edge feature image and the first feature image through the first matting model, wherein the first matting model is obtained by training based on the detail feature and the edge feature of the image.
Optionally, the processor 110 is further configured to perform matting processing on a first image element in the first image to obtain a second matting image, where the second matting image includes the first image element, and segment the first image based on the image element in the first image to obtain at least two segmented images, where the at least two segmented images include a segmented image corresponding to at least one image element included in the first image and a segmented image corresponding to an image background of the first image, respectively, and determine the second image based on an area relationship between the first segmented image and the second matting image, where the first segmented image is a segmented image including the first image element in the at least two segmented images.
Optionally, the processor 110 is specifically configured to: perform downsampling processing on the first image and the second image through a first convolution module in the first matting model to output a first feature vector; perform downsampling processing on the (i-1)-th feature vector through an i-th convolution module in the first matting model to output an i-th feature vector, where i ∈ [2, N], i is an integer, and N is the number of convolution modules; and determine the N-th feature vector as the first feature map.
Optionally, the processor 110 is specifically configured to: perform upsampling processing on the first feature map through a first deconvolution module in the first matting model to obtain a first detail feature vector; perform upsampling processing on the (N-j)-th feature vector and the (j-1)-th detail feature vector through a j-th deconvolution module in the first matting model to output a j-th detail feature vector, where j ∈ [2, N]; and determine the N-th detail feature vector as the detail feature map.
Optionally, the processor 110 is further configured to: perform upsampling processing on the (N-j)-th feature vector and the (j-1)-th detail feature vector to output the j-th detail feature vector; perform feature fusion on the (j-1)-th detail feature vector and the j-th detail feature vector through a (j/2)-th edge attention module in the first matting model to obtain a (j/2)-th edge feature vector, where j is an even number; and perform upsampling processing on the (N-j)-th feature vector, the j-th detail feature vector and the (j/2)-th edge feature vector through a (j+1)-th deconvolution module in the first matting model to output a (j+1)-th detail feature vector.
Optionally, the processor 110 is specifically configured to: perform feature fusion on the (j-1)-th detail feature vector and the j-th detail feature vector through the (j/2)-th edge attention module in the first matting model to obtain the (j/2)-th edge feature vector, where the (j-1)-th detail feature vector is the detail feature vector output by the (j-1)-th deconvolution module in the first matting model, the j-th detail feature vector is the detail feature vector output by the j-th deconvolution module in the first matting model, j ∈ [2, N], and j is an even number; and determine the edge feature vector corresponding to the last edge attention module in the first matting model as the edge feature map.
Optionally, the display unit 106 is configured to display the first matting image in the first image after outputting the first matting image based on the first feature image and at least one of the detail feature image and the edge feature image, the user input unit 107 is configured to receive a first input to the first matting image, and the display unit 106 is further configured to adjust a display position of the first matting image in the first image in response to the first input, and perform image filling on an original position of the first matting image.
Optionally, the processor 110 is further configured to adjust a display parameter of the first matte image based on a display parameter of an adjacent area of the adjusted first matte image.
Optionally, the processor 110 is further configured to input the sample image into a second matting model, perform matting processing on the second image element in the sample image to obtain a third matting image, calculate a loss value based on the third matting image and the target matting image, characterize a detail feature difference and an edge feature difference of the third matting image and the target matting image, and adjust model parameters in the second matting model based on the loss value to obtain the first matting model.
In the electronic device provided by the embodiment of the application, since the second image is an image obtained based on the image elements needing to be scratched in the first image, the second image comprises the image elements needing to be scratched, and then the electronic device can process based on the second image and the original first image to further obtain at least one of the detail feature image and the edge feature image, that is, the first scratched image output by the first scratched model can accurately reserve at least one of the image content at the detail position and the image content at the edge position in the image elements needing to be scratched in the first image on the basis of the second image, so that the first scratched image can meet the requirement of a user on the fineness degree of the scratched image, and the image quality of the scratched image acquired by the electronic device is improved.
The electronic device provided by the embodiment of the application can realize each process realized by the embodiment of the method and can achieve the same technical effect, and in order to avoid repetition, the description is omitted here.
The beneficial effects of the various implementation manners in this embodiment may be specifically referred to the beneficial effects of the corresponding implementation manners in the foregoing method embodiment, and in order to avoid repetition, the description is omitted here.
It should be appreciated that in embodiments of the present application, the input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g. a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
Memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a first memory area storing programs or instructions and a second memory area storing data, wherein the first memory area may store an operating system, application programs or instructions (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, the memory 109 may include volatile memory or nonvolatile memory, or the memory 109 may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable PROM (Erasable PROM, EPROM), an electrically erasable EPROM (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synch Link DRAM, SLDRAM), or direct random access memory (DRRAM). Memory 109 in embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 110 may include one or more processing units, and optionally the processor 110 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes computer readable storage medium such as computer readable memory ROM, random access memory RAM, magnetic or optical disk, etc.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the embodiment of the method, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above method embodiments, and achieve the same technical effects, and for avoiding repetition, a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.