CN114998585B

CN114998585B - Open-world semantic segmentation method and device based on region-aware metric learning

Info

Publication number: CN114998585B
Application number: CN202210513831.3A
Authority: CN
Inventors: 董和鑫; 陈梓帆; 袁铭泽; 谢雨彤; 赵杰; 于飞; 张立; 董彬
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2022-05-11
Filing date: 2022-05-11
Publication date: 2025-09-05
Anticipated expiration: 2042-05-11
Also published as: CN114998585A

Abstract

The present invention provides an open-world semantic segmentation method and apparatus based on region-aware metric learning. The method comprises: performing abnormal region segmentation on a target image to generate unknown regions and region-aware features corresponding to the unknown regions; dividing the unknown region to generate multiple unknown subregions and region-aware features corresponding to the unknown subregions; and determining the category corresponding to each unknown subregion based on the region-aware features corresponding to the unknown subregion and the target region-aware features corresponding to a first target category, wherein the first target category is an unknown category among the multiple feature categories corresponding to the target image. By further segmenting the unknown region with the MCA module into unknown subregions for incremental few-shot learning, the method improves the model's recognition of out-of-distribution objects, increases the precision and accuracy of the segmentation results, and thus improves the final segmentation quality.

Description

Open-world semantic segmentation method and device based on region-aware metric learning

Technical Field

The invention relates to the technical field of image segmentation, and in particular to an open-world semantic segmentation method and device based on region-aware metric learning.

Background

Most computer-vision applications are now expected to handle unknown categories, which requires deep learning models to process out-of-distribution (OOD) data during use. Models that can accommodate new categories without forgetting old ones are known as "open-world" models. However, the related art lacks an image segmentation technique for the open world.

Disclosure of Invention

The invention provides an open-world semantic segmentation method and device based on region-aware metric learning, which remedy the lack of an open-world image segmentation technique in the prior art and realize image segmentation in an open-world environment.

The invention provides an open-world semantic segmentation method based on region-aware metric learning, comprising the following steps:

performing abnormal region segmentation on a target image to generate an unknown region and a region-aware feature corresponding to the unknown region;

segmenting the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions;

determining the category corresponding to each unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to a first target category, wherein the first target category is an unknown category among a plurality of feature categories corresponding to the target image.

According to the open-world semantic segmentation method based on region-aware metric learning provided by the invention, performing abnormal region segmentation on the target image to generate the unknown region and the region-aware feature corresponding to the unknown region comprises the following steps:

performing edge prediction on the target image to generate an edge prediction image;

performing post-processing on the edge prediction image to generate a plurality of candidate regions;

performing anomaly segmentation on the plurality of candidate regions to generate region-aware features corresponding to the candidate regions and region-aware anomaly probabilities corresponding to the candidate regions;

generating uncertainty intensities corresponding to pixels in the candidate regions based on the region-aware anomaly probabilities;

and, when an uncertainty intensity exceeds a first target threshold, determining the candidate region corresponding to that uncertainty intensity as the unknown region.

According to the open-world semantic segmentation method based on region-aware metric learning provided by the invention, performing anomaly segmentation on the plurality of candidate regions to generate the region-aware features and region-aware anomaly probabilities corresponding to the candidate regions comprises the following steps:

inputting the plurality of candidate regions into a RAML module, and obtaining the region-aware features corresponding to the candidate regions output by the RAML module;

generating, under a Circle loss constraint, the region-aware anomaly probabilities corresponding to the candidate regions based on the region-aware features corresponding to the candidate regions and the region-aware features corresponding to a second target category;

wherein the RAML module is obtained by training on sample images with region-aware feature labels, and the second target category is a known category among the plurality of feature categories corresponding to the target image.

According to the open-world semantic segmentation method based on region-aware metric learning provided by the invention, segmenting the unknown region to generate the plurality of unknown sub-regions and the region-aware features corresponding to the unknown sub-regions comprises the following steps:

inputting the unknown region into a plurality of meta-channels in an MCA module, and obtaining the plurality of unknown sub-regions output by the meta-channels;

wherein the MCA module is trained based on a target loss function.

According to the open-world semantic segmentation method based on region-aware metric learning provided by the invention, determining the category corresponding to the unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to the first target category comprises the following step:

determining the category corresponding to the unknown sub-region as the first target category when the cosine similarity between the region-aware feature of the unknown sub-region and the target region-aware feature of the first target category is greater than a second target threshold and is also greater than the cosine similarity between the region-aware feature of the unknown sub-region and the region-aware features of every other unknown category in the target image.

According to the open-world semantic segmentation method based on region-aware metric learning provided by the invention, before performing abnormal region segmentation on the target image to generate the unknown region and the region-aware feature corresponding to the unknown region, the method further comprises the following steps:

inputting the target image into a feature extractor of a closed-set segmentation module, and obtaining the region-aware features corresponding to a plurality of feature maps output by the feature extractor;

inputting the region-aware features corresponding to the plurality of feature maps into a label predictor of the closed-set segmentation module, and obtaining the second target category, output by the label predictor, that corresponds to the region-aware features of a target feature map among the plurality of feature maps;

wherein the closed-set segmentation module is obtained by training with sample images as samples and the semantic segmentation categories corresponding to the sample images as sample labels;

and the second target category corresponding to the target feature map is a trained category.

The invention also provides an open-world semantic segmentation device based on region-aware metric learning, comprising:

a first processing module, configured to perform abnormal region segmentation on the target image and generate an unknown region and a region-aware feature corresponding to the unknown region;

a second processing module, configured to segment the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions;

a third processing module, configured to determine the category corresponding to each unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to the first target category, wherein the first target category is an unknown category among the plurality of feature categories corresponding to the target image.

The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the open-world semantic segmentation methods based on region-aware metric learning described above.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the open-world semantic segmentation methods based on region-aware metric learning described above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the open-world semantic segmentation methods based on region-aware metric learning described above.

In the open-world semantic segmentation method and device based on region-aware metric learning provided by the invention, candidate regions are extracted with a classical uncertainty-based method and anomaly segmentation is performed on them to generate unknown regions, which preserves the integrity of each segmented region. The unknown regions are then further segmented by the MCA module into high-quality unknown sub-regions for incremental few-shot learning. This improves the model's prediction performance on out-of-distribution objects, increases the precision and accuracy of the segmentation results, and improves the final segmentation quality.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a first flow diagram of the open-world semantic segmentation method based on region-aware metric learning provided by the present invention;

FIG. 2 is a second flow diagram of the open-world semantic segmentation method based on region-aware metric learning provided by the present invention;

FIG. 3 is a first effect diagram of the open-world semantic segmentation method based on region-aware metric learning provided by the present invention;

FIG. 4 is a second effect diagram of the open-world semantic segmentation method based on region-aware metric learning provided by the present invention;

FIG. 5 is a schematic structural diagram of the open-world semantic segmentation device based on region-aware metric learning provided by the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The related art mainly includes the following open-world image segmentation techniques.

1) Uncertainty-based methods. The baseline uncertainty-based method uses the negative of the maximum SoftMax probability (MSP) over the known classes as the uncertainty score. However, on large-scale datasets, known classes may be highly similar to one another, which distorts the distribution after SoftMax normalization and thereby impairs the performance of MSP.

2) Methods based on pixel-level feature embedding. These methods may erroneously break an object into fragments, causing a large number of fine-grained segmentation errors. They also forcibly assign each class a one-hot vector as a fixed metric-center embedding, which ignores the natural distribution among different classes and does not agree with visual priors.

Both of the above approaches yield poor segmentation results.
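To make the MSP baseline in item 1) concrete, the following minimal Python sketch computes the per-pixel score from known-class logits; the function names are illustrative and not taken from the patent. The second call shows the drawback noted above: when two known classes look alike, SoftMax splits the probability mass between them, so an in-distribution pixel receives a high uncertainty score.

```python
import math


def softmax(logits):
    """Numerically stable SoftMax over one pixel's known-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_uncertainty(logits):
    """MSP baseline score: the negative of the maximum SoftMax probability.
    Larger (less negative) values mean the pixel looks more out-of-distribution."""
    return -max(softmax(logits))


# A confidently classified pixel vs. a pixel whose two known classes look alike.
confident = msp_uncertainty([8.0, 0.5, 0.2])
ambiguous = msp_uncertainty([3.0, 2.9, 0.1])
print(confident < ambiguous)  # True: the ambiguous in-distribution pixel looks "uncertain"
```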

The open-world semantic segmentation method based on region-aware metric learning of the present invention is described below with reference to FIGS. 1 to 4.

The method may be executed by an open-world semantic segmentation device based on region-aware metric learning, by a server, or by a user terminal, including but not limited to a mobile phone, a tablet computer, and a personal computer (PC).

As shown in FIG. 1, the open-world semantic segmentation method based on region-aware metric learning comprises step 110, step 120, and step 130.

Step 110: performing abnormal region segmentation on the target image to generate an unknown region and a region-aware feature corresponding to the unknown region.

In this step, the target image is the image to be segmented.

The target image may contain one or more foreground features, and the categories of these foreground features may be the same or different.

The target image may be an image or video frame retrieved from a database, pulled from a network, or acquired by a sensor, which is not limited by the invention.

The unknown region is a region corresponding to the foreground features of the unidentified category.

The region-aware features corresponding to the unknown region are features that characterize foreground features in the unknown region and may be represented as low-dimensional vectors.

Abnormal region segmentation based on Region-Aware Metric Learning (RAML) is used to identify abnormal segmented regions in the target image.

For example, the target image shown in FIG. 2 contains foreground objects such as a road surface, a bus, a car, a tree, a building, and a utility pole. The road surface, tree, building, and utility pole belong to identifiable categories, whereas the bus and the car do not, so the areas where the bus and the car are located are unknown regions.

With continued reference to FIG. 2, the flow shown by the broken line in FIG. 2 corresponds to this step: for the target image X, anomaly segmentation is performed with an uncertainty-based region separation (URS) method to generate the unknown region and the region-aware feature corresponding to the unknown region (the elliptical region in FIG. 2).

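In some embodiments, step 110 may include:

performing edge prediction on the target image to generate an edge prediction image;

performing post-processing on the edge prediction image to generate a plurality of candidate regions;

performing anomaly segmentation on the plurality of candidate regions to generate the region-aware features and region-aware anomaly probabilities corresponding to the candidate regions;

generating the uncertainty intensities corresponding to the pixels in the candidate regions based on the region-aware anomaly probabilities;

and, when an uncertainty intensity exceeds a first target threshold, determining the candidate region corresponding to that uncertainty intensity as an unknown region.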

In this embodiment, the candidate region is a region in the target image where abnormal segmentation may exist.

The unknown region is a region containing foreground features of unknown class.

The region-aware anomaly probability corresponding to the candidate region and the uncertainty intensity corresponding to the pixels in the candidate region are used to characterize the likelihood that the candidate region includes foreground features of unknown class.

Edge prediction is performed on the target image to identify the edge contour of each foreground feature, and the target image is divided into regions based on the edge prediction result to generate the edge prediction image.

With continued reference to FIG. 2, in actual implementation, the uncertainty-based OOD object detection method MSP may be used as a region separation module to generate the edge prediction image; its high uncertainty response around object edges can serve as a guide for region separation.

In other embodiments, Sobel filtering may also be applied to the target image when generating the edge prediction image, further enhancing the delineation of fine-grained edges.

Specifically, the edge prediction image is generated by a formula in which E is the edge prediction image, X is the target image, U denotes the non-normalized logits of the segmentation model being evaluated, 1(·) is the indicator function, and α and β are hyper-parameters controlling edge prediction.
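The exact way the two indicator terms are combined is not given in this text, so the sketch below is an assumption: a pixel is marked as an edge when its MSP-style uncertainty (computed beforehand from U) exceeds α, or when the Sobel gradient magnitude of the grayscale target image exceeds β. Images are plain nested lists, and the function names are hypothetical.

```python
import math


def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2-D grayscale image (list of rows).
    Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(gx_k[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_prediction(uncertainty, img, alpha, beta):
    """Hypothetical edge map E: 1(uncertainty > alpha) OR 1(Sobel magnitude > beta)."""
    sob = sobel_magnitude(img)
    return [[int(u > alpha or s > beta) for u, s in zip(urow, srow)]
            for urow, srow in zip(uncertainty, sob)]
```

For example, on a 5×5 image with a vertical step from 0 to 1 between columns 1 and 2, the two interior columns adjacent to the step are marked as edges.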

After the edge prediction image E is generated, a post-processing sub-module applies hole filling and a connected-component algorithm to E to generate a set of candidate regions

$\mathcal{R} = \{R_i\}_{i=1}^{T}, \quad R_i \in \{0,1\}^{H \times W},$

where $R_i$ denotes the i-th region, i is an integer with $1 \le i \le T$, T is the total number of candidate regions, and H and W denote the height and width of the target image, respectively.
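The post-processing sub-module can be sketched in pure Python as follows: holes are filled by flood-filling the background from the image border, and each connected component of the filled map becomes one binary mask $R_i$. The 4-connectivity and the function names are assumptions for illustration; in practice a library routine such as SciPy's hole-filling and labeling functions would typically be used.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-connectivity


def fill_holes(mask):
    """Set to 1 every background pixel that cannot reach the image border (a hole)."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if (r in (0, h - 1) or c in (0, w - 1)) and mask[r][c] == 0)
    for r, c in queue:
        outside[r][c] = True
    while queue:  # flood-fill the background reachable from the border
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not outside[nr][nc] and mask[nr][nc] == 0:
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[0 if outside[r][c] else 1 for c in range(w)] for r in range(h)]


def candidate_regions(edge_map):
    """Return one H x W binary mask R_i per connected component of the filled edge map."""
    filled = fill_holes(edge_map)
    h, w = len(filled), len(filled[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if filled[r0][c0] == 1 and not seen[r0][c0]:
                region = [[0] * w for _ in range(h)]
                seen[r0][c0] = True
                queue = deque([(r0, c0)])
                while queue:  # breadth-first search over one component
                    r, c = queue.popleft()
                    region[r][c] = 1
                    for dr, dc in NEIGHBORS:
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and filled[nr][nc] == 1 and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                regions.append(region)
    return regions
```

A closed edge contour thus becomes one solid region, while isolated edge pixels form small regions of their own.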

The set of candidate regions $\mathcal{R}$ generated in this step may or may not contain an unknown region.

After the set of candidate regions is obtained, each candidate region $R_i$ in the set is classified to generate the region-aware feature and the region-aware anomaly probability corresponding to $R_i$.

The region-aware anomaly probability of candidate region $R_i$ represents the probability that the candidate region contains an unknown region: the larger the anomaly probability, the more likely it is that the candidate region contains an unknown region.

In some embodiments, performing anomaly segmentation on the plurality of candidate regions to generate the region-aware features and region-aware anomaly probabilities corresponding to the candidate regions may include:

inputting the plurality of candidate regions into the RAML module, and obtaining the region-aware features corresponding to the candidate regions output by the RAML module;

generating the region-aware anomaly probabilities corresponding to the candidate regions based on the region-aware features corresponding to the candidate regions and the region-aware features corresponding to the second target category;

wherein the RAML module is obtained by training on sample images with region-aware feature labels, and the second target category is a known category among the plurality of feature categories corresponding to the target image.

In this embodiment, the RAML module (Region-Aware Metric Learning module) classifies the set of candidate regions $\mathcal{R}$. For each candidate region $R_i$, it generates a region-aware feature by a formula

in which $f_{object}$ is the region-aware feature of candidate region $R_i$, $F^{j,k}$ is the feature vector of pixel (j, k), (j, k) are the pixel coordinates, and D(·), consisting of two fully connected layers, controls the embedding dimension.
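A common way to turn per-pixel features $F^{j,k}$ into one region feature is masked average pooling over the pixels of $R_i$. The sketch below assumes that form and models D(·) as two fully connected layers with a ReLU between them; the activation, the weight shapes, and the function names are assumptions, not details given in the text.

```python
def masked_average_pool(feature_map, mask):
    """Average the C-dim feature vectors F^{j,k} over pixels where mask == 1.
    feature_map: H x W x C nested lists; mask: H x W of 0/1."""
    total, count = None, 0
    for frow, mrow in zip(feature_map, mask):
        for vec, m in zip(frow, mrow):
            if m:
                total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
                count += 1
    if count == 0:
        raise ValueError("candidate region is empty")
    return [v / count for v in total]


def linear(x, weight, bias):
    """Fully connected layer: weight is (out x in), bias has length out."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def embed_region(feature_map, mask, w1, b1, w2, b2):
    """Hypothetical f_object = D(pooled feature), with D = FC -> ReLU -> FC."""
    pooled = masked_average_pool(feature_map, mask)
    hidden = [max(0.0, v) for v in linear(pooled, w1, b1)]
    return linear(hidden, w2, b2)
```

The second layer's output size sets the embedding dimension, which is the role the text assigns to D(·).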

Then, $f_{object}$ is compared with the prototypes of all known classes through metric learning constrained by Circle loss. Circle loss enlarges the inter-class distances and reduces the intra-class distances of the data samples, thereby improving the performance of the RAML module.
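The sketch below applies the published Circle loss (Sun et al., CVPR 2020, pair-similarity form with margin m and scale γ) to one region feature and a list of known-class prototypes, treating the prototype at index `label` as the positive and all others as negatives. Using cosine similarity as the pair score, and the values m = 0.25 and γ = 64, are illustrative assumptions; the patent text does not state them here. At least two prototypes are required.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def logsumexp(values):
    """Stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def circle_loss(feature, prototypes, label, m=0.25, gamma=64.0):
    """Circle loss between one region feature and the known-class prototypes."""
    sims = [cosine(feature, p) for p in prototypes]
    s_p = sims[label]
    s_n = [s for i, s in enumerate(sims) if i != label]
    alpha_p = max(0.0, 1.0 + m - s_p)                   # adaptive weight, optimum O_p = 1 + m
    pos_logit = -gamma * alpha_p * (s_p - (1.0 - m))     # positive margin Delta_p = 1 - m
    neg_logits = [gamma * max(0.0, s + m) * (s - m) for s in s_n]  # O_n = -m, Delta_n = m
    z = logsumexp(neg_logits) + pos_logit
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # softplus(z), computed stably
```

A feature close to its own prototype yields a loss near zero, while the same feature labeled with a distant prototype yields a large loss; the adaptive weights α_p and α_n push harder on pairs that are far from their optima.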

Specifically, the region-aware anomaly probability corresponding to each candidate region is generated by a formula

in which the left-hand side is the region-aware anomaly probability of candidate region $R_i$, $f_{object}$ is the region-aware feature of $R_i$, F is the feature map (F may be generated in advance from the target image, as described in a later embodiment), $f_l$ is the region-aware feature of the l-th known class (i.e., a region-aware feature corresponding to the second target category), obtained from the feature map produced in advance by semantic segmentation prediction on the target image, and N is the total number of known classes.
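As an illustration only, the sketch below scores a region as anomalous according to how poorly its feature matches the best known-class prototype: one minus the maximum cosine similarity to any $f_l$ (l = 1..N), clipped to [0, 1]. This scoring rule and the function names are hypothetical stand-ins for the patent's formula.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def region_anomaly_probability(f_object, known_prototypes):
    """Hypothetical anomaly score: 1 - best cosine match to a known-class prototype."""
    best = max(cosine(f_object, f_l) for f_l in known_prototypes)
    return min(1.0, max(0.0, 1.0 - best))
```

A region whose feature coincides with a known prototype scores 0, and one orthogonal to all known prototypes scores 1.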

After the region-aware features and region-aware anomaly probabilities of the candidate regions are generated, the uncertainty intensity of each pixel in a candidate region can be generated from the region-aware anomaly probability by a formula

in which $Q^{j,k}$ is the uncertainty intensity of pixel (j, k), pixel (j, k) belongs to candidate region $R_i$, F is the feature map, the region-aware anomaly probability of $R_i$ is used, and the non-normalized output of U at pixel (j, k) is also used.

In some embodiments, the uncertainty intensity of each pixel may be normalized to $0 \le Q^{j,k} \le 1$ and then compared with the first target threshold to determine whether the candidate region containing that pixel is an unknown region.

The first target threshold may be user-defined, for example, set to 0.5.

That is, when the normalized uncertainty intensity exceeds the first target threshold, the corresponding candidate region is determined to be an unknown region; otherwise, it is determined not to be an unknown region.
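The normalize-and-threshold decision can be sketched as follows. For illustration, it assumes each pixel inherits the anomaly probability of the candidate region covering it (taking the maximum where regions overlap), rescales intensities from an assumed range [lo, hi] to [0, 1], and flags a region as unknown when the mean normalized intensity inside it exceeds the first target threshold (0.5 by default). All names are hypothetical.

```python
def uncertainty_map(regions, probs, h, w):
    """Hypothetical Q: each pixel takes the anomaly probability of the region(s)
    covering it (max over overlaps); uncovered pixels get 0."""
    q = [[0.0] * w for _ in range(h)]
    for mask, p in zip(regions, probs):
        for r in range(h):
            for c in range(w):
                if mask[r][c]:
                    q[r][c] = max(q[r][c], p)
    return q


def normalize(q, lo=0.0, hi=1.0):
    """Rescale intensities from the assumed range [lo, hi] to [0, 1], clipping outliers."""
    return [[min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in row] for row in q]


def unknown_regions(regions, probs, threshold=0.5, lo=0.0, hi=1.0):
    """Indices of candidate regions whose mean normalized intensity exceeds the threshold."""
    h, w = len(regions[0]), len(regions[0][0])
    q = normalize(uncertainty_map(regions, probs, h, w), lo, hi)
    result = []
    for idx, mask in enumerate(regions):
        vals = [q[r][c] for r in range(h) for c in range(w) if mask[r][c]]
        if vals and sum(vals) / len(vals) > threshold:
            result.append(idx)
    return result
```

Normalizing against a fixed range rather than the per-image minimum and maximum avoids flagging the least-certain region in an image that contains no anomaly at all.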

In the open-world semantic segmentation method based on region-aware metric learning provided by the embodiment of the invention, replacing the fixed-center one-hot embedding with Circle loss as the metric-learning objective not only preserves good, natural inter-class distances but also makes intra-class distributions more compact; the resulting partition of the feature space is more favorable for segmenting OOD data.

Step 120: segmenting the unknown region to generate a plurality of unknown sub-regions and the region-aware features corresponding to the unknown sub-regions.

In this step, when the target image is determined to contain an unknown region, the unknown region may be further segmented into a plurality of unknown sub-regions, and a region-aware feature may be generated for each unknown sub-region.

With continued reference to fig. 2, the flow shown by the solid line in fig. 2 corresponds to the flow of the present step.

In actual implementation, a Meta-Channel Aggregation (MCA) based method can be used for incremental few-shot learning, segmenting the unknown region into foreground features of new categories.

MCA first over-segments the unknown region into several meta-channels; the regions belonging to different meta-channels are then aggregated to form object partitions, which are evaluated by the region-aware metric learning module RAML.

For example, after the unknown region is obtained in step 110, it is marked to generate an abnormal image, which is used as the input to the MCA module. The MCA module further subdivides the unknown region and segments the OOD objects using metric learning, so that the foreground features in the unknown region are divided into features of different new classes.

The specific implementation of this step is described below.

In some embodiments, step 120 may include:

inputting the unknown region into a plurality of meta-channels in the MCA module, and obtaining the plurality of unknown sub-regions output by the meta-channels;

wherein the MCA module is trained based on a target loss function.

In this embodiment, an MCA module may be provided to create sub-regions within the unknown region from the abnormal image.

The abnormal image is the target image in which the unknown region has been marked.

It should be noted that the MCA module produces a SoftMax-activated output C, where:

$C \in [0,1]^{(N+K) \times H \times W}$

The first N channels are the segmentation results for all in-distribution categories, and the last K channels (K > M) are meta-channels used to over-segment the unknown region. M is the number of unknown categories among all foreground features in the target image, M ≥ 0, and M, N, and K are integers.
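To show how the channel layout of C translates into sub-regions, the sketch below over-segments the unknown region by assigning each unknown pixel to the meta-channel (indices N to N+K−1) with the highest SoftMax response. The argmax assignment rule and the function name are assumptions for illustration.

```python
def meta_channel_subregions(C, unknown_mask, n_known):
    """Over-segment the unknown region into meta-channel sub-regions.

    C: (N+K) x H x W nested lists of SoftMax outputs; channels 0..N-1 are the
    in-distribution classes and channels N..N+K-1 are the meta-channels.
    unknown_mask: H x W 0/1 mask of the unknown region.
    Returns {meta_channel_index: H x W 0/1 mask}; empty meta-channels are omitted.
    """
    h, w = len(unknown_mask), len(unknown_mask[0])
    meta = range(n_known, len(C))
    masks = {}
    for r in range(h):
        for c in range(w):
            if not unknown_mask[r][c]:
                continue
            # Assumption: each unknown pixel joins its highest-scoring meta-channel.
            k = max(meta, key=lambda ch: C[ch][r][c])
            mask = masks.setdefault(k, [[0] * w for _ in range(h)])
            mask[r][c] = 1
    return masks
```

Pixels outside the unknown region are ignored, so the in-distribution channels never receive sub-regions here.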

The target loss function combines all MCA-related loss terms.

For example, the target loss function may be determined by a formula comprising: a segmentation loss over all in-distribution categories; a Dice loss term; a term that prevents the unknown sub-regions (candidate OOD objects) from collapsing into a few specific channels; and a term that forces the outputs of all channels together to reconstruct the entire image.

Specifically, in the Dice loss term, Dice(·,·) denotes the Dice coefficient, $(C_i, C_j)$ are the i-th and j-th channels of the segmentation output, and N+K is the total number of channels.
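The pairwise Dice term can be sketched as below, assuming it averages the soft Dice coefficient over all unordered channel pairs $(C_i, C_j)$, i < j, among the N+K channels; the averaging scheme and the function names are assumptions. Minimizing it pushes channels toward non-overlapping supports.

```python
from itertools import combinations


def soft_dice(a, b, eps=1e-6):
    """Soft Dice coefficient between two H x W maps with values in [0, 1]."""
    inter = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    size = sum(map(sum, a)) + sum(map(sum, b))
    return 2.0 * inter / (size + eps)


def pairwise_dice_loss(C):
    """Hypothetical Dice term: mean Dice over all channel pairs (C_i, C_j), i < j."""
    pairs = list(combinations(range(len(C)), 2))
    if not pairs:
        return 0.0
    return sum(soft_dice(C[i], C[j]) for i, j in pairs) / len(pairs)
```

Disjoint channels give a loss of 0, and two identical channels give a loss close to 1.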

In the separation term, $C_i^{j,k}$ denotes the output of the i-th channel at pixel (j, k), and η is a hyper-parameter controlling separation; by Jensen's inequality, this term reaches its minimum when the unknown sub-regions are spread across the output channels.

In the reconstruction term, ⊙ is the element-wise multiplication operator, and $\mathbf{1}_{H \times W}$ is an all-ones matrix of size H×W, the size of the target image.
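The Jensen's-inequality argument for the separation term can be illustrated with a hypothetical convex penalty: the sum over meta-channels of each channel's mean response raised to the power η. For η > 1 the map x ↦ x^η is convex, so for a fixed total response the sum is smallest when the response is spread evenly across channels. This penalty is an assumed stand-in, not the patent's formula.

```python
def separation_penalty(meta_channels, eta=2.0):
    """Hypothetical separation term: sum over channels of (mean response) ** eta.
    With eta > 1 the power is convex, so by Jensen's inequality the sum is
    minimized when a fixed total response is spread evenly across channels."""
    penalty = 0.0
    for ch in meta_channels:
        n = sum(len(row) for row in ch)
        mean_response = sum(map(sum, ch)) / n
        penalty += mean_response ** eta
    return penalty
```

With two channels sharing the same total response, putting everything in one channel gives 1.0, whereas splitting it evenly gives 0.5.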

During development, the inventors found that in the related art, MCA tends to split objects according to local semantic information, so an unknown object may be split across multiple channels and lose its integrity (e.g., the windows and wheels of a car may be assigned to different channels).

In the present invention, however, the unknown sub-regions from several meta-channels are aggregated based on a few (here, L-shot) labeled images to generate a set of further-segmented candidate regions, which is fed to the final RAML module for incremental few-shot learning, where:

$\mathcal{R}' = \{R'_i\}_{i=1}^{T'}$ is the set of candidate regions generated by further segmenting the unknown region in the target image, and T' is the total number of candidate regions generated after this further segmentation.
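One way to picture the aggregation step is sketched below, as a hypothetical rule: each meta-channel is assigned to the new-class prototype (built from the L-shot labeled images) that its pooled feature is most similar to, and the masks of channels sharing a prototype are merged into one candidate region $R'_i$. This regroups, for instance, a car's window and wheel channels into one object. The rule and all names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def aggregate_meta_channels(channel_masks, channel_features, prototypes):
    """Hypothetical aggregation: assign each meta-channel to its most similar
    new-class prototype and union the masks of channels sharing a prototype.
    Returns the merged H x W 0/1 masks R'_i, ordered by prototype index."""
    groups = {}
    for ch, mask in channel_masks.items():
        best = max(range(len(prototypes)),
                   key=lambda i: cosine(channel_features[ch], prototypes[i]))
        if best in groups:
            groups[best] = [[a | b for a, b in zip(ra, rb)]
                            for ra, rb in zip(groups[best], mask)]
        else:
            groups[best] = [row[:] for row in mask]
    return [groups[i] for i in sorted(groups)]
```

Prototypes that attract no meta-channel simply produce no candidate region.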

Based on the refined candidate regions $R'_i$, the region-aware feature $f_{object}$ of each refined candidate region can be generated.

Table 1 compares the results of abnormal segmentation based on the RAML module with those obtained by other approaches. The RAML module provided by the invention produces higher response values and better integrity for abnormal object regions, and significantly reduces false negatives.

TABLE 1

FIG. 3 compares the anomaly segmentation method proposed by the present invention with other methods: (a) is the target image, (b) is the abnormal-region label, (c) is the edge prediction result generated in step 110, (d) is the result of another related technique, and (e) is the result of the proposed anomaly segmentation method. The proposed method produces higher response values and better integrity for abnormal object regions, and significantly reduces false negatives.

Step 130: determining the category corresponding to the unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to the first target category.

In this step, the first target category is any unknown category (i.e., an untrained category) among the categories corresponding to all foreground features of the target image.

The target region-aware feature is a pre-generated region-aware feature corresponding to the foreground features of the first target category.

In the incremental learning process, images containing the unknown foreground features of the target image are first acquired and manually annotated to generate labeled images.

It is understood that each unknown class may correspond to at least one annotation image.

For example, the prototype of the i-th unknown class may be defined from the newly labeled images as:

$c_i = \frac{1}{L} \sum_{j=1}^{L} f_i^{j}$

where $c_i$ is the target region-aware feature corresponding to the prototype of the i-th unknown class, $f_i^{j}$ is the feature embedding of the i-th unknown class in the j-th labeled image, L is the total number of labeled images for the i-th unknown class (for example, L may be 1 or 5), and $1 \le i \le M$.
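A minimal sketch of prototype construction, assuming each prototype $c_i$ is the plain mean of its L shot embeddings (the usual prototypical-network average); the function names are illustrative.

```python
def class_prototype(embeddings):
    """c_i = (1/L) * sum_j f_i^j: mean of the L labeled-shot embeddings of one class."""
    L = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / L for d in range(dim)]


def build_prototypes(shots_by_class):
    """Prototypes c_1..c_M, one per unknown class, in the given class order."""
    return [class_prototype(shots) for shots in shots_by_class]
```

With L = 1 the prototype is simply the single labeled embedding; larger L smooths out per-image variation.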

After the region-aware features of the unknown sub-regions and the region-aware features of the first target categories are generated, each unknown sub-region can be assigned to the corresponding first target category by computing the similarity between these features.

The implementation of this step will be described below by taking cosine similarity as an example.

In some embodiments, step 130 may include:

And determining the category corresponding to the unknown sub-region as the first target category under the condition that the cosine similarity between the region sensing feature corresponding to the unknown sub-region and the target region sensing feature corresponding to the first target category is larger than the second target threshold and the cosine similarity between the region sensing feature corresponding to the unknown sub-region and the target region sensing feature corresponding to the first target category is larger than the cosine similarity between the region sensing feature corresponding to the unknown region and the region sensing features corresponding to other position categories except the first target category in the target image.

In this embodiment, the cosine similarity may be determined by:

$$s_i = \frac{f_{object} \cdot c_i}{\lVert f_{object} \rVert \, \lVert c_i \rVert}, \quad i = 1, \ldots, M$$

where s_i is the cosine similarity between the unknown sub-region and the prototype of the i-th unknown class, f_object is the region-aware feature of the unknown sub-region, c_i is the region-aware feature of the prototype of the i-th unknown class, and M is the total number of unknown classes in the target image.
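A vectorized sketch of this cosine similarity, assuming f_object is a D-dimensional vector and the M prototypes are stacked as the rows of an M × D matrix. The function name and the small `eps` guard against zero-norm vectors are illustrative additions, not part of the source.

```python
import numpy as np


def cosine_similarities(f_object, prototypes, eps=1e-12):
    """Cosine similarity between one sub-region feature and each of M prototypes.

    f_object: (D,) region-aware feature of an unknown sub-region.
    prototypes: (M, D) matrix whose i-th row is the prototype c_i.
    Returns an (M,) array of similarities in [-1, 1].
    """
    f = np.asarray(f_object, dtype=float)
    c = np.asarray(prototypes, dtype=float)
    # L2-normalize the query feature and every prototype row (eps avoids division by zero).
    f_unit = f / max(np.linalg.norm(f), eps)
    c_unit = c / np.maximum(np.linalg.norm(c, axis=1, keepdims=True), eps)
    # Dot products of unit vectors are the cosine similarities.
    return c_unit @ f_unit
```

For f_object = [1, 0] and prototypes [2, 0], [0, 3] and [1, 1], the similarities are 1, 0 and about 0.707.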

After the cosine similarities are computed, the unknown class with the largest cosine similarity to the unknown sub-region is selected first, and its similarity value is obtained. That value is then compared with the second target threshold; if it exceeds the threshold, the category of the unknown sub-region is determined to be that class, i.e., the first target category.

That is, the unknown sub-region R'_i is assigned to the i-th unknown class (the first target class) when the cosine similarity between its region-aware feature and the region-aware feature c_i of the i-th unknown class (the target region-aware feature) is greater than the second target threshold, and is also greater than the cosine similarity between its region-aware feature and the region-aware feature c_k of every other unknown class k ≠ i.

The second target threshold may be user-defined or may be treated as a hyper-parameter that controls classification. It may be set to 0.5, although other values may be used in other embodiments; the present invention is not limited in this respect. For example, the unknown sub-regions may be classified by:

$$R'_i \in C_{out,i} \quad \text{if} \quad s_i > \theta_{novel} \;\; \text{and} \;\; s_i > s_k \;\; \forall k \neq i$$

where s_i is the cosine similarity between the unknown sub-region and the prototype of the i-th unknown class, s_k (k ≠ i) is the cosine similarity between the unknown sub-region and the prototype of another unknown class, and θ_novel is the hyper-parameter that controls classification (i.e., the second target threshold).

In other words, the candidate region R'_i is classified as the i-th novel class C_out,i only when its cosine similarity to the i-th prototype meets both of the above criteria.
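The two criteria can be combined into a single decision function. This sketch assumes θ_novel defaults to the example value 0.5 and uses 0-based class indices; it returns `None` when either criterion fails, because the source does not state what label such a sub-region receives, so that choice is an assumption. The function name is hypothetical.

```python
import numpy as np


def classify_subregion(similarities, theta_novel=0.5):
    """Assign an unknown sub-region to a novel class, or leave it unassigned.

    similarities: (M,) cosine similarities s_1..s_M to the M class prototypes.
    Returns the 0-based index i of the novel class C_out,i when s_i both exceeds
    theta_novel and is strictly larger than every other similarity; else None.
    """
    s = np.asarray(similarities, dtype=float)
    i = int(np.argmax(s))
    others = np.delete(s, i)
    # Criterion 1: the best similarity must exceed the threshold theta_novel.
    # Criterion 2: it must be strictly greater than the similarity to every other prototype.
    if s[i] > theta_novel and np.all(s[i] > others):
        return i
    return None
```

With similarities [0.9, 0.2, 0.4] the sub-region goes to class 0. With [0.3, 0.45] the best score is below the threshold, and with a tie such as [0.7, 0.7] the strict-maximum criterion fails, so both return `None`.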

During development, the method was tested on the Cityscapes dataset and compared with other methods; the comparison results are shown in Table 2.

In the experiment, car, truck and bus are three OOD classes that are excluded from the training phase, and the remaining 16 classes are treated as in-distribution classes.
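The class split can be written as a small configuration. The list below is the standard set of 19 Cityscapes evaluation classes; the variable names are hypothetical, and only the 16+3 split described above is shown.

```python
# The 19 Cityscapes evaluation classes, in the standard trainId order.
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train",
    "motorcycle", "bicycle",
]

# OOD classes withheld from the training phase in the experiment.
OOD_CLASSES = ["car", "truck", "bus"]

# In-distribution classes: every evaluation class that is not OOD.
IN_DIST_CLASSES = [c for c in CITYSCAPES_CLASSES if c not in OOD_CLASSES]
```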

TABLE 2

Table 2: Incremental few-shot learning results on Cityscapes for the 16+1 setting (OOD class is car) and the 16+3 setting (OOD classes are car, truck, bus). The unknown classes are in blue. Finetune (FT) is the baseline with catastrophic forgetting.

Table 2 shows the incremental few-shot learning results on Cityscapes for the 16+1 setting (the OOD class is car) and the 16+3 setting (the OOD classes are car, truck and bus), with the unknown classes highlighted and Finetune (FT) serving as the baseline that suffers from catastrophic forgetting. The method provided by the invention achieves better results than the compared methods.

Fig. 4 compares the proposed method with other methods: (a) target image, (b) label image, (c) closed-set output, (d) abnormal segmentation output, (e) MCA output, (f) output of another related method, and (g) final output of the present invention.

As shown in Fig. 4, the proposed method is markedly better at preserving the integrity of these objects. Furthermore, the feature embeddings generated by the proposed RAML maintain reasonable inter-class distances while their intra-class distributions are more compact, which helps the model obtain robust decision boundaries.

In the open-world semantic segmentation method based on region-aware metric learning, candidate regions are extracted with a classical uncertainty-based method and abnormal segmentation is performed to generate the unknown regions, which ensures the integrity of each segmented region. The unknown regions are then further divided by the MCA module into high-quality unknown sub-regions for incremental few-shot learning. This improves the model's prediction performance on out-of-distribution objects, and thereby the precision, accuracy and overall quality of the segmentation results.

In some embodiments, prior to step 110, the method may further comprise:

Inputting the target image to a feature extractor of a closed-set segmentation module to obtain the region-aware features of a plurality of feature maps output by the feature extractor; and inputting these region-aware features to a label predictor of the closed-set segmentation module to obtain the second target category output by the label predictor for each target feature map among the plurality of feature maps;

wherein the closed-set segmentation module (or semantic segmentation network) is obtained by training with sample images as samples and the semantic segmentation categories of the sample images as sample labels;

and the second target category of each target feature map is a category seen during training.

In this embodiment, the closed-set segmentation module (or semantic segmentation network) includes a feature extractor and a label predictor, wherein:

The output of the feature extractor is connected to the input of the label predictor. The feature extractor extracts the foreground features in the target image and passes them to the label predictor, which generates a region-aware feature for each foreground feature, predicts the category of each foreground feature, and outputs the predicted category.

The target feature maps are those feature maps that correspond to features of known categories, and each target feature map corresponds to one second target category.

The second target category is a known category among the categories of all foreground features in the target image, that is, a category seen during training and therefore one that the label predictor can predict.

For example, the second target category may be expressed as:

$$C_{in} = \{C_{in,1}, C_{in,2}, \ldots, C_{in,N}\}$$

where C_in denotes the set of N in-distribution classes, each of which is annotated in the training dataset.

The first target class may be expressed as:

$$C_{out} = \{C_{out,1}, C_{out,2}, \ldots, C_{out,M}\}$$

where C_out denotes the set of M novel classes that do not appear in the training dataset.

During training, the closed-set segmentation module is trained by minimizing a segmentation loss L_seg, which guides the module to produce pixel-level segmentation for the in-distribution categories:

$$\mathcal{L}_{seg} = -\frac{1}{H \times W}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{N} Y^{(h,w,c)} \log P^{(h,w,c)}$$

where L_seg is the segmentation loss over all in-distribution categories, expressed as a multi-class cross-entropy loss; X is a sample image; Y is the (one-hot) sample label of X; P^(h,w,c) is the probability that the closed-set segmentation module assigns to class c at pixel (h, w) of X; N is the number of in-distribution classes; and H × W denotes the height and width of the sample image.
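A NumPy sketch of this pixel-averaged cross-entropy for a single image follows. It assumes the labels are a dense integer class map with no ignore index (Cityscapes training pipelines commonly ignore a void label, which is omitted here) and computes a numerically stable log-softmax from non-normalized logits; the function name is hypothetical.

```python
import numpy as np


def segmentation_loss(logits, labels):
    """Pixel-averaged multi-class cross-entropy over the N in-distribution classes.

    logits: (N, H, W) non-normalized class scores for one sample image X.
    labels: (H, W) integer class indices in [0, N), i.e. the sample label Y.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the ground-truth class at every pixel.
    h_idx, w_idx = np.indices(labels.shape)
    picked = log_probs[labels, h_idx, w_idx]
    # Average the negative log-likelihood over all H x W pixels.
    return float(-picked.mean())
```

With all-zero logits for N = 3 classes every class has probability 1/3, so the loss equals ln 3 ≈ 1.0986.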

After training, a trained feature extractor and a trained label predictor are obtained.

In practical application, the target image X, of height H and width W, is input into the trained closed-set segmentation module to obtain a plurality of feature maps F and the non-normalized logits U output by the module. The logits U are obtained by removing the SoftMax layer of the label predictor.
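To make this data flow concrete, the toy sketch below stands in for the feature extractor with a per-pixel linear projection followed by ReLU, and for the label predictor with a per-pixel linear classifier whose SoftMax is omitted, so its output plays the role of the non-normalized logits U. The real modules are deep segmentation networks; the weights, shapes and function names here are illustrative assumptions. Taking the argmax of U over the class axis gives the predicted in-distribution category at each pixel.

```python
import numpy as np


def closed_set_forward(x, w_feat, w_cls):
    """Toy closed-set segmentation pass returning feature maps F and logits U.

    x: (K, H, W) input image with K channels.
    w_feat: (D, K) per-pixel projection standing in for the feature extractor.
    w_cls: (N, D) per-pixel linear classifier standing in for the label predictor.
    """
    x = np.asarray(x, dtype=float)
    # Feature extractor stand-in: a 1x1 "convolution" followed by ReLU -> (D, H, W).
    F = np.maximum(np.einsum("dk,khw->dhw", w_feat, x), 0.0)
    # Label predictor without its SoftMax layer: non-normalized logits U -> (N, H, W).
    U = np.einsum("nd,dhw->nhw", w_cls, F)
    return F, U


def predict_known_labels(U):
    """Per-pixel in-distribution class index (argmax over the N logits)."""
    return np.argmax(U, axis=0)
```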

The feature maps F and non-normalized logits U generated in this embodiment are used in steps 110 to 130 above.

The open-world semantic segmentation method based on region-aware metric learning provided by the embodiments of the invention realizes open-world semantic segmentation with an overall model comprising a backbone module for closed-set segmentation, an abnormal segmentation module that delineates the unknown regions of OOD data, and an incremental few-shot learning module that splits the unknown regions into objects of new categories. New categories in an image can therefore be segmented effectively from only a few samples, which significantly improves the image segmentation result.

The open-world semantic segmentation device based on region-aware metric learning provided by the invention is described below; the device described below and the method described above may be cross-referenced with each other.

As shown in Fig. 5, the open-world semantic segmentation device based on region-aware metric learning includes a first processing module 510, a second processing module 520 and a third processing module 530.

The first processing module 510 is configured to perform abnormal region segmentation on the target image to generate an unknown region and the region-aware feature of the unknown region;

The second processing module 520 is configured to divide the unknown region to generate a plurality of unknown sub-regions and the region-aware features of the unknown sub-regions;

The third processing module 530 is configured to determine the category of each unknown sub-region based on the region-aware feature of the unknown sub-region and the target region-aware feature of a first target category, where the first target category is an unknown category among the plurality of feature categories of the target image.
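The way the three processing modules chain together can be sketched as a small Python class in which each module is injected as a callable. Everything here (the class name, the callable signatures, and the choice to return `(sub-region mask, class index or None)` pairs) is a hypothetical illustration of the data flow, not the device's actual interface; real stages would wrap the uncertainty-based abnormal segmentation, the MCA module and the cosine-similarity classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import numpy as np


@dataclass
class OpenWorldSegmenter:
    """Chains the three processing modules; each stage is injected as a callable."""

    # Module 510 stand-in: image -> (unknown-region mask, region-aware feature).
    first_module: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]
    # Module 520 stand-in: (image, unknown-region mask) -> [(sub-region mask, feature), ...].
    second_module: Callable[[np.ndarray, np.ndarray], List[Tuple[np.ndarray, np.ndarray]]]
    # Module 530 stand-in: sub-region feature -> novel class index, or None if unassigned.
    third_module: Callable[[np.ndarray], Optional[int]]

    def __call__(self, image):
        unknown_mask, _ = self.first_module(image)
        results = []
        for sub_mask, sub_feature in self.second_module(image, unknown_mask):
            results.append((sub_mask, self.third_module(sub_feature)))
        return results
```

For a quick check, trivial stubs can be passed in, for example a threshold on pixel values for the first stage, a single sub-region equal to the whole unknown mask for the second, and an argmax over the feature for the third.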

In the open-world semantic segmentation device based on region-aware metric learning, candidate regions are extracted with a classical uncertainty-based method and abnormal segmentation is performed to generate the unknown regions, which ensures the integrity of each segmented region. The unknown regions are then further divided by the MCA module into high-quality unknown sub-regions for incremental few-shot learning. This improves the model's prediction performance on out-of-distribution objects, and thereby the precision, accuracy and overall quality of the segmentation results.

In some embodiments, the first processing module 510 may also be configured to:

In some embodiments, the second processing module 520 may also be configured to:

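input the unknown region into a plurality of meta-channels in an MCA module to obtain the plurality of unknown sub-regions output by the meta-channels, wherein the MCA module is trained based on a target loss function.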

In some embodiments, the third processing module 530 may also be configured to:

determine the category of the unknown sub-region to be the first target category when the cosine similarity between the region-aware feature of the unknown sub-region and the target region-aware feature of the first target category is greater than the second target threshold, and is also greater than the cosine similarity between the region-aware feature of the unknown sub-region and the region-aware feature of every other unknown category in the target image.

In some embodiments, the device may further include a fourth processing module configured to:

before abnormal region segmentation is performed on the target image to generate the unknown region and its region-aware feature, input the target image to a feature extractor of a closed-set segmentation module to obtain the region-aware features of a plurality of feature maps output by the feature extractor;

input the region-aware features of the plurality of feature maps to a label predictor of the closed-set segmentation module to obtain the second target category output by the label predictor for the region-aware feature of each target feature map among the plurality of feature maps;

wherein the closed-set segmentation module is obtained by training with sample images as samples and the semantic segmentation categories of the sample images as sample labels;

Fig. 6 illustrates the physical structure of an electronic device. As shown in Fig. 6, the electronic device may include a processor 610, a communications interface 620, a memory 630 and a communication bus 640, where the processor 610, the communications interface 620 and the memory 630 communicate with one another via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform the open-world semantic segmentation method based on region-aware metric learning, which includes: performing abnormal region segmentation on a target image to generate an unknown region and its region-aware feature; dividing the unknown region to generate a plurality of unknown sub-regions and their region-aware features; and determining the category of each unknown sub-region based on its region-aware feature and the target region-aware feature of a first target category, where the first target category is an unknown category among the plurality of feature categories of the target image.

Further, when implemented as software functional units and sold or used as a stand-alone product, the logic instructions in the memory 630 may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method in the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the open-world semantic segmentation method based on region-aware metric learning provided above, the method comprising: performing abnormal region segmentation on a target image to generate an unknown region and its region-aware feature; dividing the unknown region to generate a plurality of unknown sub-regions and their region-aware features; and determining the category of each unknown sub-region based on its region-aware feature and the target region-aware feature of a first target category, where the first target category is an unknown category among the plurality of feature categories of the target image.

In still another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the open-world semantic segmentation method based on region-aware metric learning provided above, the method comprising: performing abnormal region segmentation on a target image to generate an unknown region and its region-aware feature; dividing the unknown region to generate a plurality of unknown sub-regions and their region-aware features; and determining the category of each unknown sub-region based on its region-aware feature and the target region-aware feature of a first target category, where the first target category is an unknown category among the plurality of feature categories of the target image.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those of ordinary skill in the art can understand and implement the solution without creative effort.

From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the foregoing technical solution in essence, or the part contributing to the prior art, may be embodied as a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments or in parts of the embodiments.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims

1. An open world semantic segmentation method based on regional perception metric learning, comprising:

determining a category corresponding to the unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to a first target category, wherein the first target category is an unknown category among a plurality of feature categories corresponding to the target image;

wherein performing abnormal region segmentation on the target image to generate an unknown region and a region-aware feature corresponding to the unknown region comprises:

when the uncertainty intensity exceeds a first target threshold, determining the candidate region corresponding to that uncertainty intensity as the unknown region;

wherein performing abnormal segmentation on the plurality of candidate regions to generate region-aware features corresponding to the candidate regions and region-aware anomaly probabilities corresponding to the candidate regions comprises:

inputting the plurality of candidate regions to an RAML module to obtain the region-aware features corresponding to the candidate regions output by the RAML module;

wherein the RAML module is obtained by training on sample images with region-aware feature labels, and the second target category is a known category among a plurality of feature categories corresponding to the target image;

wherein dividing the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions comprises:

inputting the unknown region into a plurality of meta-channels in an MCA module to obtain the plurality of unknown sub-regions output by the meta-channels, wherein the MCA module further divides the unknown region into high-quality unknown sub-regions for incremental few-shot learning, and the MCA module is trained based on a target loss function.

2. The open world semantic segmentation method based on regional perception metric learning according to claim 1, wherein determining the category corresponding to the unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to the first target category comprises:

3. The open world semantic segmentation method based on region-aware metric learning of claim 1, wherein prior to the performing abnormal region segmentation on the target image to generate an unknown region and a region-aware feature corresponding to the unknown region, the method further comprises:

4. An open world semantic segmentation device based on regional perception metric learning, comprising:

the third processing module is used for determining a category corresponding to the unknown sub-region based on the region perception feature corresponding to the unknown sub-region and the target region perception feature corresponding to a first target category, wherein the first target category is an unknown category in a plurality of feature categories corresponding to the target image;

Wherein the device is further for:

the device is also for:

The device is further configured to input the unknown region into a plurality of meta-channels in an MCA module to obtain the plurality of unknown sub-regions output by the meta-channels, wherein the MCA module further divides the unknown region into high-quality unknown sub-regions for incremental few-shot learning, and the MCA module is trained based on a target loss function.

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the open world semantic segmentation method based on regional awareness metric learning of any one of claims 1 to 3 when executing the program.

6. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the open world semantic segmentation method based on regional awareness metric learning according to any one of claims 1 to 3.

7. A computer program product comprising a computer program which, when executed by a processor, implements the open world semantic segmentation method based on regional awareness metric learning as claimed in any one of claims 1 to 3.