CN114897913A - A portrait segmentation method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN114897913A (application number CN202210501867.XA)
- Authority
- CN
- China
- Prior art keywords
- convolution
- picture
- image
- processing
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/10—Segmentation; Edge detection
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
Description
Technical Field
The present invention relates to the technical field of image processing, and in particular to a portrait segmentation method, apparatus, computer device, and storage medium.
Background Art
As image fusion and digital processing technology matures, higher requirements are placed on techniques for replacing faces, objects, and other targets: the replaced object should look natural, leaving no visible trace of artificial replacement. In film production, for example, face replacement must deliver higher precision and finer texture detail to keep characters lifelike.
To address this, researchers have applied deep neural networks to target-object segmentation. Color-based segmentation of color digital images, for instance, mainly uses artificial neural networks to separate pixels of interest from pixels of no interest in a face image within a color space. Later work proposed a multi-task joint learning algorithm based on deep convolutional neural networks (CNNs) that performs face alignment and segmentation jointly; through a designed residual module it cooperatively fuses cross-layer features and improves segmentation quality.
However, these algorithms rely on low-level visual information and manually supplied auxiliary information for shallow semantic segmentation, and lack substantial model training and deep semantic information. As a result, they are extremely fragile to changes in image background and cannot handle complex target-object segmentation tasks.
Summary of the Invention
The purpose of the present invention is to provide a portrait segmentation method, apparatus, computer device, and storage medium, aiming to solve the problem that existing segmentation algorithms are extremely fragile to changes in image background and cannot complete complex portrait segmentation tasks.
In a first aspect, an embodiment of the present invention provides a portrait segmentation method, which includes:
inputting a picture to be segmented into an EfficientNet network for multiple dimension-scaling operations, and outputting a corresponding dimension-reduced picture after each operation;
inputting the last two output dimension-reduced pictures, namely a 7×7×192 picture and a 7×7×320 picture, into an attention fusion module for feature fusion to obtain a fused picture;
inputting the fused picture into a strip attention module for striping processing to obtain an output image;
segmenting the target object in the output image, and outputting a target segmentation image.
In a second aspect, an embodiment of the present invention provides a portrait segmentation apparatus, which includes:
a dimension scaling unit, configured to input a picture to be segmented into an EfficientNet network for multiple dimension-scaling operations and output a corresponding dimension-reduced picture after each operation;
a feature fusion unit, configured to input the last two output dimension-reduced pictures, namely a 7×7×192 picture and a 7×7×320 picture, into an attention fusion module for feature fusion to obtain a fused picture;
a striping processing unit, configured to input the fused picture into a strip attention module for striping processing to obtain an output image;
a segmentation unit, configured to segment the target object in the output image and output a target segmentation image.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the portrait segmentation method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the portrait segmentation method of the first aspect.
Embodiments of the present invention disclose a portrait segmentation method, apparatus, computer device, and storage medium. The method includes: inputting a picture to be segmented into an EfficientNet network for multiple dimension-scaling operations and outputting a corresponding dimension-reduced picture after each operation; inputting the last two output dimension-reduced pictures, namely a 7×7×192 picture and a 7×7×320 picture, into an attention fusion module for feature fusion to obtain a fused picture; inputting the fused picture into a strip attention module for striping processing to obtain an output image; and segmenting the target object in the output image to output a target segmentation image. By improving the network structure, the embodiments of the present invention improve the accuracy of portrait segmentation.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a portrait segmentation method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sub-flow of the portrait segmentation method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of another sub-flow of the portrait segmentation method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flow of the portrait segmentation method provided by an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a portrait segmentation apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to FIG. 1, which is a schematic flowchart of a portrait segmentation method provided by an embodiment of the present invention.
As shown in FIG. 1, the method includes steps S101 to S104.
S101: Input the picture to be segmented into an EfficientNet network for multiple dimension-scaling operations, and output a corresponding dimension-reduced picture after each operation.
In this step, the EfficientNet network contains seven convolution modules.
S102: Input the last two output dimension-reduced pictures, namely a 7×7×192 picture and a 7×7×320 picture, into an attention fusion module for feature fusion to obtain a fused picture.
S103: Input the fused picture into a strip attention module for striping processing to obtain an output image.
S104: Segment the target object in the output image, and output a target segmentation image.
This embodiment adopts the AttaNet framework and uses an EfficientNet network as the core backbone, which balances the three dimensions of width, depth, and resolution, thereby improving both the accuracy and the speed of the AttaNet network.
The picture to be segmented passes through the seven convolution modules of the EfficientNet network in turn for dimension scaling, each outputting a corresponding dimension-reduced picture. The attention fusion module (AFM) then fuses the features of the last two dimension-reduced pictures into a fused picture, and the strip attention module (SAM) performs striping on the fused picture to obtain the output image, in which different objects are distinguished by different color features. Segmenting according to the features of the target object (i.e., portrait features) then yields the target segmentation image.
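The dataflow just described (backbone, then AFM fusion, then SAM striping, then thresholding into a mask) can be sketched end to end. This is a minimal illustrative composition, not the patented implementation: the function names `backbone`, `afm`, `sam`, and `segment` are assumptions, and the convolutional stages are replaced by shape-preserving stand-ins so only the pipeline structure is shown.

```python
import numpy as np

def backbone(x):
    """Stand-in for the EfficientNet stages: return the last two feature
    maps with the shapes named in the text (7x7x192 and 7x7x320)."""
    f192 = np.zeros((7, 7, 192))
    f320 = np.zeros((7, 7, 320))
    return f192, f320

def afm(f192, f320, alpha=0.5):
    """Toy attention fusion: collapse channels and take a weighted sum."""
    p192 = f192.mean(axis=-1, keepdims=True)
    p320 = f320.mean(axis=-1, keepdims=True)
    return alpha * p192 + (1.0 - alpha) * p320   # fused 7x7x1 map

def sam(fused):
    """Toy strip attention: average over H into a 1xW strip context,
    then add it back to every row (residual)."""
    strip = fused.mean(axis=0, keepdims=True)
    return fused + strip

def segment(out):
    """Threshold the output map into a binary target mask."""
    return (out > out.mean()).astype(np.uint8)

mask = segment(sam(afm(*backbone(None))))
```

The composition `segment(sam(afm(*backbone(...))))` mirrors steps S101 through S104 in order; each stage is expanded in the sub-flows below.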
In one embodiment, as shown in FIG. 2, step S101 includes:
S201: Input the 224×224×32 picture into a first convolution module for convolution processing, and output a 112×112×32 picture.
The first convolution module has one convolution layer with a 3×3 kernel; that is, the 224×224×32 picture undergoes one convolution in the first convolution module and a 112×112×32 picture is output.
S202: Input the 112×112×32 picture into a second convolution module for convolution processing, and output a 56×56×24 picture.
The second convolution module has three convolution layers with 3×3 kernels; that is, the 112×112×32 picture undergoes three convolutions in the second convolution module and a 56×56×24 picture is finally output.
S203: Input the 56×56×24 picture into a third convolution module for convolution processing, and output a 28×28×40 picture.
The third convolution module has two convolution layers with 5×5 kernels; that is, the 56×56×24 picture undergoes two convolutions in the third convolution module and a 28×28×40 picture is finally output.
S204: Input the 28×28×40 picture into a fourth convolution module for convolution processing, and output a 28×28×80 picture.
The fourth convolution module has three convolution layers with 3×3 kernels; that is, the 28×28×40 picture undergoes three convolutions in the fourth convolution module and a 28×28×80 picture is finally output.
S205: Input the 28×28×80 picture into a fifth convolution module for convolution processing, and output a 7×7×192 picture.
The fifth convolution module has seven convolution layers with 5×5 kernels; that is, the 28×28×80 picture undergoes seven convolutions in the fifth convolution module and a 7×7×192 picture is finally output.
S206: Input the 7×7×192 picture into a sixth convolution module for convolution processing, and output a 7×7×320 picture.
The sixth convolution module has one convolution layer with a 3×3 kernel; that is, the 7×7×192 picture undergoes one convolution in the sixth convolution module and a 7×7×320 picture is finally output.
The dimensionality reduction performed by the convolution modules in steps S201 to S206 maximizes the overall performance of the network.
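The stage specification of S201 to S206 can be restated compactly as data, which makes the shape bookkeeping checkable. This is a sketch for verifying that each stage's input shape matches the previous stage's output shape; the `STAGES` table and `check_chain` helper are illustrative names, not part of the patent.

```python
# Each entry: (number of layers, kernel size, input HxWxC, output HxWxC).
STAGES = [
    (1, 3, (224, 224, 32), (112, 112, 32)),  # S201
    (3, 3, (112, 112, 32), (56, 56, 24)),    # S202
    (2, 5, (56, 56, 24), (28, 28, 40)),      # S203
    (3, 3, (28, 28, 40), (28, 28, 80)),      # S204
    (7, 5, (28, 28, 80), (7, 7, 192)),       # S205
    (1, 3, (7, 7, 192), (7, 7, 320)),        # S206
]

def check_chain(stages):
    """Verify each stage's input shape equals the previous output shape,
    then return the overall input and output shapes of the chain."""
    for (_, _, _, prev_out), (_, _, cur_in, _) in zip(stages, stages[1:]):
        assert prev_out == cur_in, (prev_out, cur_in)
    return stages[0][2], stages[-1][3]

first_in, last_out = check_chain(STAGES)
```

Running the check confirms the chain is consistent from the 224×224×32 input down to the 7×7×320 output fed to the fusion module.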
In one embodiment, as shown in FIG. 3, step S102 includes:
S301: Perform convolution on the 7×7×192 picture to obtain a first picture.
In this step, a convolution layer with a 3×3 kernel performs the convolution.
S302: Upsample the 7×7×192 picture, then apply convolution, pooling, and activation, and assign a weight to obtain a second picture.
In this step, after upsampling, the picture passes in turn through Concat processing, a convolution with a 1×1 kernel, a ReLU activation, global pooling, another 1×1 convolution, and a Sigmoid activation, and is then assigned the weight α to obtain the second picture.
S303: Multiply the first picture by the second picture and then upsample the product to obtain a third picture.
In this step, the product is computed by element-wise image multiplication (Multiply).
S304: Perform convolution on the 7×7×320 picture to obtain a fourth picture.
In this step, a convolution layer with a 3×3 kernel performs the convolution.
S305: Apply convolution, pooling, and activation to the fourth picture, and assign a weight to obtain a fifth picture.
In this step, the fourth picture passes in turn through Concat processing, a convolution with a 1×1 kernel, a ReLU activation, global pooling, another 1×1 convolution, and a Sigmoid activation, and is then assigned the weight 1−α to obtain the fifth picture.
S306: Multiply the fourth picture by the fifth picture and then upsample the product to obtain a sixth picture.
S307: Fuse the features of the third picture and the sixth picture to obtain the fused picture.
In this embodiment, the attention fusion module fuses the 7×7×192 picture and the 7×7×320 picture through steps S301 to S307, further improving speed while preserving accuracy. It should be noted that this embodiment introduces a global attention mechanism into the attention fusion module to assign the weights and fuses features according to those weights, which greatly improves the feature accuracy of the fused picture.
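The gating structure of S302/S305 (global pool, then Sigmoid, then a branch weight α or 1−α) can be sketched in numpy. This is a hedged simplification under stated assumptions: the 3×3 and 1×1 convolutions, the Concat step, and the upsampling are replaced by toy stand-ins (channel means) so that only the global-attention weighting is demonstrated; `gate`, `branch`, and `afm_fuse` are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(x):
    """Global-attention gate: global average pool over HxW followed by a
    Sigmoid, giving one weight in [0, 1] per channel."""
    pooled = x.mean(axis=(0, 1))          # global pool -> (C,)
    return sigmoid(pooled)                # Sigmoid activation

def branch(x, w):
    """One AFM branch: the feature map reweighted by its gated channel
    weights, scaled by the branch weight w (alpha or 1 - alpha)."""
    return x * (w * gate(x))              # broadcasts over HxW

def afm_fuse(f192, f320, alpha=0.6):
    # Collapse channels to a common width before adding; the patent
    # instead convolves and upsamples each branch (simplified here).
    a = branch(f192, alpha).mean(axis=-1)
    b = branch(f320, 1.0 - alpha).mean(axis=-1)
    return a + b                          # fused 7x7 map

rng = np.random.default_rng(0)
fused = afm_fuse(rng.normal(size=(7, 7, 192)), rng.normal(size=(7, 7, 320)))
```

The design point the text makes is visible here: the weights α and 1−α are applied on top of a learned global gate, so the two branches contribute complementary, globally informed amounts to the fusion.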
In one embodiment, as shown in FIG. 4, step S103 includes:
S401: Apply three separate convolutions to the fused picture to obtain a first convolution image, a second convolution image, and a third convolution image.
In this step, convolution layers with 3×3 kernels may perform the three convolutions on the fused picture to obtain the first, second, and third convolution images.
S402: Perform dimension transposition on the first convolution image to obtain a transposed image.
S403: Apply striping and dimension rearrangement to the second convolution image, then batch-process it with the transposed image to obtain an attention matrix.
In this step, after Striping and Reshape dimension rearrangement, the second convolution image undergoes Affinity batch processing with the transposed image, yielding an N×W attention matrix. The H direction is removed, which greatly reduces the complexity of encoding global context in the vertical direction.
S404: Apply striping and dimension transposition to the third convolution image, multiply it by the attention matrix, rearrange the result, and add it to the fused picture to obtain the output image.
In this embodiment, the strip attention module performs striping on the fused picture and outputs the image; the output image precisely delimits the region of each object, which facilitates the subsequent accurate segmentation.
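The cost saving claimed in S403 (an N×W attention matrix instead of N×(H·W)) can be sketched in numpy. This is a toy stand-in under stated assumptions: the three "convolutions" of S401 are replaced by identity projections and a single-channel map is used, so only the striping-plus-attention structure is shown; `strip_attention` is an illustrative name.

```python
import numpy as np

def strip_attention(fused):
    h, w = fused.shape
    q = fused.reshape(h * w, 1)             # query: (N, 1), N = H*W
    k = fused.mean(axis=0).reshape(1, w)    # key striped over H: (1, W)
    v = fused.mean(axis=0)                  # value strip: (W,)

    logits = q @ k                          # affinity: (N, W), not (N, N)
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True) # softmax rows -> N x W attention

    ctx = (attn @ v).reshape(h, w)          # aggregate strip context per pixel
    return fused + ctx                      # residual add, as in S404

rng = np.random.default_rng(1)
out = strip_attention(rng.normal(size=(7, 7)))
```

Because the key and value are collapsed over H before the affinity product, each pixel attends over only W strip positions rather than all H·W positions, which is the complexity reduction the text describes.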
In one embodiment, step S104 includes:
obtaining the region edge information of each object in the output image, and using a semantic segmentation network to segment the target object according to the region edge information of the target object, thereby obtaining the target segmentation image.
This embodiment uses a semantic segmentation network: based on the precisely delimited object regions in the output image, the target object is segmented according to the edge information of the target object region to be segmented, yielding a highly accurate target segmentation image.
In one embodiment, the portrait segmentation method further includes:
optimizing the portrait segmentation network model with the following loss function:
Loss = Length + λ·Region;
where Length denotes the distance function, λ denotes a hyperparameter, and Region denotes the region function.
In this embodiment, when segmenting small targets, the background and foreground are markedly imbalanced: the background usually far exceeds the foreground, so the loss function becomes dominated by the background and the segmentation quality suffers. This embodiment therefore introduces a cross-entropy loss function to improve the segmentation of small targets.
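The composite loss Loss = Length + λ·Region can be sketched in numpy. The patent does not define the two terms, so an active-contour-style reading is assumed here: Length penalizes the boundary length of the predicted soft mask (via total variation) and Region penalizes disagreement between the mask and the ground truth. All function names are illustrative, not from the patent.

```python
import numpy as np

def length_term(pred):
    """Approximate boundary length as the total variation of the soft mask."""
    dy = np.abs(np.diff(pred, axis=0)).sum()
    dx = np.abs(np.diff(pred, axis=1)).sum()
    return dy + dx

def region_term(pred, target):
    """Region energy: squared disagreement between prediction and target."""
    return ((pred - target) ** 2).sum()

def segmentation_loss(pred, target, lam=1.0):
    """Loss = Length + lambda * Region, with lambda trading off the terms."""
    return length_term(pred) + lam * region_term(pred, target)

# Small demo: a square foreground mask on an 8x8 grid.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0
perfect = segmentation_loss(target, target)   # only the Length term remains
```

With a perfect prediction, Region vanishes and only the boundary-length penalty remains; λ controls how strongly region agreement is weighted against boundary smoothness.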
An embodiment of the present invention further provides a portrait segmentation apparatus, which is configured to perform any embodiment of the foregoing portrait segmentation method. Specifically, please refer to FIG. 5, a schematic block diagram of a portrait segmentation apparatus provided by an embodiment of the present invention.
As shown in FIG. 5, the portrait segmentation apparatus 500 includes: a dimension scaling unit 501, a feature fusion unit 502, a striping processing unit 503, and a segmentation unit 504.
The dimension scaling unit 501 is configured to input the picture to be segmented into the EfficientNet network for multiple dimension-scaling operations and output a corresponding dimension-reduced picture after each operation.
The feature fusion unit 502 is configured to input the last two output dimension-reduced pictures, namely the 7×7×192 picture and the 7×7×320 picture, into the attention fusion module for feature fusion to obtain the fused picture.
The striping processing unit 503 is configured to input the fused picture into the strip attention module for striping processing to obtain the output image.
The segmentation unit 504 is configured to segment the target object in the output image and output the target segmentation image.
The apparatus adopts the AttaNet framework with the EfficientNet network as its core backbone, balancing the three dimensions of width, depth, and resolution to improve the accuracy and speed of the AttaNet network; by improving the network structure, it improves the accuracy of portrait segmentation.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The above portrait segmentation apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 6.
Please refer to FIG. 6, a schematic block diagram of a computer device provided by an embodiment of the present invention. The computer device 600 is a server, which may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 6, the computer device 600 includes a processor 602, a memory, and a network interface 605 connected through a system bus 601, where the memory may include a non-volatile storage medium 603 and an internal memory 604.
The non-volatile storage medium 603 may store an operating system 6031 and a computer program 6032. When executed, the computer program 6032 causes the processor 602 to perform the portrait segmentation method.
The processor 602 provides computing and control capabilities and supports the operation of the entire computer device 600.
The internal memory 604 provides an environment for running the computer program 6032 stored in the non-volatile storage medium 603; when the computer program 6032 is executed by the processor 602, it causes the processor 602 to perform the portrait segmentation method.
The network interface 605 is used for network communication, such as transmitting data information. Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present invention and does not limit the computer device 600 to which the solution of the present invention is applied; a specific computer device 600 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 6 does not limit the specific construction of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such an embodiment, the structures and functions of the memory and the processor are the same as those of the embodiment shown in FIG. 6 and are not repeated here.
It should be understood that, in the embodiments of the present invention, the processor 602 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or any conventional processor.
Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the portrait segmentation method of the embodiments of the present invention.
The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or another physical storage medium capable of storing program code.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210501867.XA CN114897913B (en) | 2022-05-09 | 2022-05-09 | A portrait segmentation method, device, computer equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114897913A true CN114897913A (en) | 2022-08-12 |
| CN114897913B CN114897913B (en) | 2025-06-24 |
Family
ID=82721413
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210501867.XA Active CN114897913B (en) | 2022-05-09 | 2022-05-09 | A portrait segmentation method, device, computer equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114897913B (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090310887A1 (en) * | 2008-06-17 | 2009-12-17 | El-Mahdy Ahmed Hazem Mohamed R | Spatially selective transformation of a spatially varying optical characteristic of an image in an array of pixels |
| CN111639692A (en) * | 2020-05-25 | 2020-09-08 | 南京邮电大学 | Shadow detection method based on attention mechanism |
| CN113378989A (en) * | 2021-07-06 | 2021-09-10 | 武汉大学 | Multi-mode data fusion method based on compound cooperative structure characteristic recombination network |
| CN113592927A (en) * | 2021-07-26 | 2021-11-02 | 国网安徽省电力有限公司电力科学研究院 | Cross-domain image geometric registration method guided by structural information |
| CN113870283A (en) * | 2021-09-29 | 2021-12-31 | 深圳万兴软件有限公司 | Image matting method and device, computer equipment and readable storage medium |
| CN114418909A (en) * | 2021-12-22 | 2022-04-29 | 深圳万兴软件有限公司 | Image processing method, device, computer equipment and storage medium |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |