Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the related art, there are mainly the following image segmentation techniques regarding the open world.
1) Uncertainty-based methods. The baseline uncertainty-based method uses the negative of the maximum SoftMax probability (MSP) over the known classes as the uncertainty score. However, on a large-scale dataset, known classes may be highly similar to one another, which affects the distribution after SoftMax normalization and thereby impairs the performance of MSP.
2) Methods based on pixel-level feature embedding. Such methods can erroneously segment an object into fragments and cause a large number of fine-grained segmentation errors. They also forcibly assign a one-hot vector to each class as a fixed metric-center embedding, which ignores the natural distribution among different classes and does not accord with the visual prior distribution.
Both of the above methods have a problem of poor segmentation effect.
The open world semantic segmentation method based on region-aware metric learning of the present invention is described below with reference to fig. 1 to 4.
The execution subject of the open world semantic segmentation method based on regional perception metric learning can be an open world semantic segmentation device based on regional perception metric learning, or a server, or a terminal of a user, including but not limited to a mobile phone, a tablet computer, a PC (personal computer) terminal and the like.
As shown in FIG. 1, the open world semantic segmentation method based on regional perception metric learning comprises a step 110, a step 120 and a step 130.
Step 110, carrying out abnormal region segmentation on a target image to generate an unknown region and a region perception feature corresponding to the unknown region;
in this step, the target image is an image to be segmented.
One or more foreground features may be included in the target image, and the classes of foreground features may be the same or different.
The target image may be an image or video frame retrieved from a database, or may be an image or video frame pulled from a network, or may also be an image or video frame acquired by a sensor, as the invention is not limited.
The unknown region is a region corresponding to the foreground features of the unidentified category.
The region-aware features corresponding to the unknown region are features that characterize foreground features in the unknown region and may be represented as low-dimensional vectors.
The abnormal region segmentation, based on region-aware metric learning (Region-Aware Metric Learning, RAML), is used to identify abnormal regions in the target image.
For example, the target image shown in fig. 2 comprises foreground objects such as a road surface, a bus, a car, a tree, a building, a telegraph pole and the like. The road surface, the tree, the building and the telegraph pole belong to identifiable categories, whereas the bus and the car belong to unidentifiable categories, so the regions where the bus and the car are located are unknown regions.
With continued reference to fig. 2, the flow shown by the broken line in fig. 2 corresponds to the flow of the present step: for the target image X, anomaly segmentation is performed with an uncertainty-based region planning (URS) method to generate an unknown region and the region-aware feature corresponding to the unknown region (the elliptical region in fig. 2).
In some embodiments, step 110 may include:
performing edge prediction on the target image to generate an edge prediction image;
post-processing is carried out on the edge prediction image to generate a plurality of candidate areas;
Performing abnormal segmentation on the plurality of candidate areas to generate area sensing characteristics corresponding to the candidate areas and area sensing abnormal probability corresponding to the candidate areas;
Generating uncertainty intensity corresponding to pixels in the candidate region based on the region perception anomaly probability;
and under the condition that the uncertainty intensity exceeds a first target threshold value, determining the candidate region corresponding to the uncertainty intensity as an unknown region.
In this embodiment, the candidate region is a region in the target image where abnormal segmentation may exist.
The unknown region is a region containing foreground features of unknown class.
The region-aware anomaly probability corresponding to the candidate region and the uncertainty intensity corresponding to the pixels in the candidate region are used to characterize the likelihood that the candidate region includes foreground features of unknown class.
Edge prediction is performed on the target image to identify the edge contour of each foreground feature in the target image, and region segmentation is performed on the target image based on the edge prediction result to generate the edge prediction image.
With continued reference to fig. 2, in actual implementation, the uncertainty-based OOD object detection method MSP may be employed as a region separation module to generate the edge prediction image, since its high uncertainty response around object edges can be used as a guide for region separation.
In other embodiments, Sobel filtering may also be applied to the target image when generating the edge prediction image, further enhancing the delineation of fine-grained edges.
Specifically, the edge prediction image is generated by a formula in which E is the edge prediction image, X is the target image, U is the unnormalized logits output by the segmentation model, an indicator function is used, and α and β are hyper-parameters that control edge prediction.
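The two edge cues described above can be sketched as follows. This is only an illustration under stated assumptions: the exact formula of the embodiment is not reproduced here, so the rule that combines the MSP response with the Sobel response (a logical OR of two thresholded maps) and the use of α and β as thresholds are hypothetical, as are the function names.

```python
import numpy as np

def msp_uncertainty(logits):
    """Negative maximum SoftMax probability per pixel; logits has shape (N, H, W)."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    return -probs.max(axis=0)

def sobel_magnitude(gray):
    """Sobel gradient magnitude of a 2-D array, using edge-replicated padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)

def edge_prediction(gray, logits, alpha, beta):
    """Hypothetical rule: a pixel is an edge if either response passes its threshold."""
    uncertain = msp_uncertainty(logits) > alpha
    sharp = sobel_magnitude(gray) > beta
    return (uncertain | sharp).astype(np.uint8)

# Toy input: the left half is dark and confidently class 0,
# the right half is bright and confidently class 1.
gray = np.zeros((6, 6))
gray[:, 3:] = 1.0
logits = np.zeros((2, 6, 6))
logits[0, :, :3] = 5.0
logits[1, :, 3:] = 5.0
E = edge_prediction(gray, logits, alpha=-0.6, beta=1.0)
```

In this toy input the classifier is confident everywhere, so only the Sobel response marks the boundary between the two halves.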
After the edge prediction image E is generated, a post-processing submodule applies processing including hole filling and a connected-component algorithm to E to generate a set of candidate regions $\{R_i\}_{i=1}^{T}$, with
$R_i \in \{0,1\}^{H \times W}$
where $R_i$ denotes the i-th region, 1 ≤ i ≤ T, i is an integer, T is the total number of candidate regions, and H and W denote the length and the width of the target image, respectively.
The set of candidate regions generated in this step may or may not include an unknown region.
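The connected-component part of this post-processing can be sketched as below. The sketch assumes that candidate regions are the 4-connected components of the non-edge pixels; hole filling is omitted for brevity, and the function name `candidate_regions` is hypothetical.

```python
from collections import deque
import numpy as np

def candidate_regions(edge_map):
    """Split the non-edge pixels of a binary edge map into 4-connected regions.

    Returns a list of binary masks R_i, each with the same H x W shape as edge_map.
    """
    h, w = edge_map.shape
    labels = np.zeros((h, w), dtype=int)  # 0 means "not yet labeled"
    regions = []
    for sy in range(h):
        for sx in range(w):
            if edge_map[sy, sx] or labels[sy, sx]:
                continue
            current = len(regions) + 1
            labels[sy, sx] = current
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and not edge_map[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            regions.append((labels == current).astype(np.uint8))
    return regions

# Toy edge map: a vertical edge in column 3 separates two regions.
edge_map = np.zeros((5, 7), dtype=np.uint8)
edge_map[:, 3] = 1
regions = candidate_regions(edge_map)
```

For real images, a library routine such as `scipy.ndimage.label` could replace the explicit flood fill.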
After the set of candidate regions is obtained, each candidate region $R_i$ in the set is classified, and a region-aware feature corresponding to the candidate region $R_i$ and a region-aware anomaly probability corresponding to the candidate region $R_i$ are generated.
The region-aware anomaly probability corresponding to the candidate region $R_i$ represents the probability that the candidate region contains an unknown region: the greater the anomaly probability, the greater the probability that the candidate region contains an unknown region.
In some embodiments, performing anomaly segmentation on a plurality of candidate regions to generate a region-aware feature corresponding to the candidate region and a region-aware anomaly probability corresponding to the candidate region may include:
inputting a plurality of candidate areas into an RAML module, and acquiring area sensing characteristics corresponding to the candidate areas output by the RAML module;
generating region perception abnormality probabilities corresponding to the candidate regions based on the region perception features corresponding to the candidate regions and the region perception features corresponding to the second target categories;
the RAML module is obtained after training according to a sample image with a region perception feature tag, and the second target category is a known category in a plurality of feature categories corresponding to the target image.
In this embodiment, the RAML module (Region-Aware Metric Learning module) is a region-aware metric learning module that classifies the set of candidate regions. For each candidate region $R_i$, a region sensing feature is generated by a formula in which $f_{object}$ is the region sensing feature corresponding to the candidate region $R_i$, $F_{j,k}$ is the feature vector of the pixel (j, k), (j, k) are the coordinates of the pixel, and D(·) consists of two fully connected layers used to control the embedding dimension.
Then, $f_{object}$ is compared with the prototypes of all known classes through metric learning constrained by Circle loss. Circle loss enlarges the inter-class distances and reduces the intra-class distances of the data samples, thereby improving the performance of the RAML module.
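One plausible reading of this computation is sketched below: the pixel feature vectors inside the region are averaged and the result is passed through two fully connected layers. The averaging step, the ReLU between the layers, the layer sizes and the name `region_feature` are assumptions made for illustration; the text only states that pixel features within the region are used and that D(·) has two fully connected layers.

```python
import numpy as np

def region_feature(feature_map, region_mask, w1, b1, w2, b2):
    """Pool pixel features inside a region, then embed them with two dense layers.

    feature_map: (C, H, W) pixel features F; region_mask: (H, W) binary mask R_i.
    """
    mask = region_mask.astype(bool)
    pooled = feature_map[:, mask].mean(axis=1)   # (C,) average over region pixels (assumed)
    hidden = np.maximum(w1 @ pooled + b1, 0.0)   # first layer with ReLU (assumed)
    return w2 @ hidden + b2                      # second layer sets the embedding size

# Toy example with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
F = rng.normal(size=(8, 4, 4))
R_i = np.zeros((4, 4), dtype=np.uint8)
R_i[1:3, 1:3] = 1
w1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
w2, b2 = rng.normal(size=(3, 16)), np.zeros(3)
f_object = region_feature(F, R_i, w1, b1, w2, b2)  # 3-dimensional embedding
```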
Specifically, the region perception abnormality probability corresponding to the candidate region $R_i$ is generated by a formula in which $f_{object}$ is the region sensing feature corresponding to the candidate region $R_i$, F is the feature image (F can be generated in advance by performing semantic segmentation prediction on the target image, as described in a later embodiment and not detailed here), the region sensing feature corresponding to the l-th known class is the region sensing feature corresponding to the second target class, and N is the total number of known classes.
After the region sensing feature corresponding to the candidate region and the region sensing abnormality probability corresponding to the candidate region are generated, the uncertainty intensity corresponding to the pixels in the candidate region is generated based on the region sensing abnormality probability by a formula in which $Q_{j,k}$ is the uncertainty intensity corresponding to the pixel (j, k) in the candidate region $R_i$, the pixel (j, k) belongs to the candidate region $R_i$, F is the feature image, and the remaining terms are the region sensing abnormality probability and the output for the pixel (j, k) in the unnormalized logits U.
In some embodiments, the uncertainty intensity corresponding to each pixel may be normalized so that $0 \le Q_{j,k} \le 1$, and the normalized uncertainty intensity is then compared with a first target threshold to determine whether the candidate region in which the corresponding pixel is located is an unknown region.
The first target threshold may be user-defined, such as set to 0.5.
It can be understood that the candidate region corresponding to the uncertainty intensity is determined to be an unknown region when the uncertainty intensity after normalization exceeds the first target threshold, and the candidate region corresponding to the uncertainty intensity is determined not to be an unknown region when the uncertainty intensity after normalization does not exceed the first target threshold.
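The normalization and threshold comparison can be sketched as follows. Min-max normalization and averaging the pixel values over each region before comparing with the threshold are assumptions made for illustration; the name `select_unknown_regions` is hypothetical, and 0.5 is the example threshold mentioned above.

```python
import numpy as np

def select_unknown_regions(uncertainty, regions, threshold=0.5):
    """Min-max normalize Q to [0, 1] and keep regions whose mean value exceeds the threshold."""
    q_min, q_max = uncertainty.min(), uncertainty.max()
    normalized = (uncertainty - q_min) / (q_max - q_min + 1e-12)
    return [r for r in regions if normalized[r.astype(bool)].mean() > threshold]

# Toy example: the right half of the image carries a high uncertainty intensity.
uncertainty = np.full((4, 4), 0.2)
uncertainty[:, 2:] = 3.0
left = np.zeros((4, 4), dtype=np.uint8)
left[:, :2] = 1
right = 1 - left
unknown = select_unknown_regions(uncertainty, [left, right])  # keeps only `right`
```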
According to the open world semantic segmentation method based on regional perception metric learning provided by the embodiment of the invention, Circle loss replaces the fixed-center one-hot embedding as the objective of metric learning, so that a good and natural inter-class distance is maintained and the intra-class distribution is more concentrated; such a partition of the feature space is more favorable for the segmentation of OOD data.
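For readers unfamiliar with Circle loss, the sketch below follows the commonly published pair-wise formulation, in which positive similarities are pushed toward 1 and negative similarities toward 0 with adaptive weights. It is not taken from this document: the margin and scale values are hypothetical, and how the embodiment forms positive and negative pairs from region features and class prototypes is not specified here.

```python
import numpy as np

def _logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def circle_loss(s_pos, s_neg, margin=0.25, gamma=32.0):
    """Circle loss for one sample.

    s_pos: similarities to same-class prototypes or samples.
    s_neg: similarities to other-class prototypes or samples.
    """
    s_pos = np.asarray(s_pos, dtype=float)
    s_neg = np.asarray(s_neg, dtype=float)
    alpha_p = np.maximum(1.0 + margin - s_pos, 0.0)   # weight grows when s_pos is far from 1
    alpha_n = np.maximum(s_neg + margin, 0.0)         # weight grows when s_neg is far from 0
    logit_p = -gamma * alpha_p * (s_pos - (1.0 - margin))
    logit_n = gamma * alpha_n * (s_neg - margin)
    # log(1 + sum(exp(logit_n)) * sum(exp(logit_p))), computed stably
    return float(np.logaddexp(0.0, _logsumexp(logit_n) + _logsumexp(logit_p)))

loss_good = circle_loss([0.95], [0.05])  # well-separated embedding, small loss
loss_bad = circle_loss([0.30], [0.70])   # poorly separated embedding, large loss
```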
Step 120, segmenting the unknown region to generate a plurality of unknown sub-regions and region perception features corresponding to the unknown sub-regions;
In this step, in the case where it is determined that the target image includes an unknown region, the unknown region may be further segmented to segment the unknown region into a plurality of unknown sub-regions, and a region sensing feature corresponding to each of the unknown sub-regions may be generated.
With continued reference to fig. 2, the flow shown by the solid line in fig. 2 corresponds to the flow of the present step.
In the actual implementation process, a meta-channel aggregation (MCA) based method can be used for incremental few-shot learning, segmenting the unknown region into foreground features of new categories.
The MCA module first over-segments the unknown region into several meta-channels; the regions belonging to different meta-channels are then aggregated to form the segmentation of an object, which is evaluated by the region-aware metric learning module RAML.
For example, after the unknown region is obtained in step 110, the unknown region is marked to generate an abnormal image, and the abnormal image is used as the input of the MCA module. The MCA module further subdivides the unknown region and segments the OOD objects using metric learning, so that the foreground features in the unknown region are subdivided into features of different new classes.
The specific implementation of this step is described below.
In some embodiments, step 120 may include:
inputting the unknown region into a plurality of meta-channels in the MCA module, and obtaining a plurality of unknown subregions output by the meta-channels;
the MCA module is trained based on a target loss function.
In this embodiment, an MCA module may be provided for creating sub-regions in the unknown region from the abnormal image.
The abnormal image is a sub-image of the target image that contains the unknown region.
It should be noted that the MCA module includes a SoftMax activation function whose output C satisfies:
$C \in [0,1]^{(N+K) \times H \times W}$
The first N channels are the segmentation results of all in-distribution classes, and the last K (K > M) channels are meta-channels, which are used to over-segment the unknown region; M is the number of unknown classes among all foreground features in the target image, M ≥ 0, and M and N are integers.
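The channel layout can be illustrated with the short sketch below, which applies SoftMax over N + K channels and splits the result into the N in-distribution channels and the K meta-channels. The value N = 16 matches the in-distribution classes of the experiment described later; K = 5 and the random logits are hypothetical.

```python
import numpy as np

def split_channels(logits, n_known):
    """Apply SoftMax over the channel axis and split known-class and meta-channels."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    c = np.exp(shifted)
    c /= c.sum(axis=0, keepdims=True)   # C lies in [0, 1]^{(N+K) x H x W}
    return c[:n_known], c[n_known:]

N, K, H, W = 16, 5, 4, 6
rng = np.random.default_rng(0)
known, meta = split_channels(rng.normal(size=(N + K, H, W)), N)
```

Because of the SoftMax, the N + K channel values at each pixel sum to 1.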
The target loss function is an integrated function generated based on all MCA-related loss functions.
For example, the objective loss function may be determined by the following formula.
In the target loss function, one term characterizes the segmentation loss of all in-distribution classes, one term is a Dice loss function, one term prevents the unknown sub-regions (candidates for OOD objects) from aggregating in a few specific channels, and one term causes the outputs of all channels to reconstruct the entire image.
Specifically:
In the Dice loss, the Dice coefficient is computed between the i-th and j-th channels $(C_i, C_j)$ of the segmentation output, and (N+K) is the total number of channels.
In the term that avoids aggregation, the output of the i-th channel at the pixel (j, k) is used, and η is the hyper-parameter controlling the separation; according to Jensen's inequality, this term reaches its minimum when the unknown sub-regions are spread over the output channels.
In the reconstruction term, ⊙ is the element-wise multiplication operator, and the all-ones matrix has size H × W, the length by the width of the target image.
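As an illustration of the Dice quantity between channel pairs, the sketch below computes the standard soft Dice coefficient for two channel maps and averages it over all distinct pairs. How this quantity is weighted and combined with the other terms of the target loss function is not reproduced here; the smoothing constant and the function names are assumptions.

```python
import numpy as np

def soft_dice(a, b, eps=1e-6):
    """Soft Dice coefficient between two channel maps with values in [0, 1]."""
    intersection = (a * b).sum()
    return (2.0 * intersection + eps) / (a.sum() + b.sum() + eps)

def mean_pairwise_dice(channels):
    """Average soft Dice over all distinct channel pairs (C_i, C_j) with i < j."""
    n = channels.shape[0]
    scores = [soft_dice(channels[i], channels[j])
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))

# Three channels that each claim a different image column do not overlap.
disjoint = np.zeros((3, 2, 3))
for ch in range(3):
    disjoint[ch, :, ch] = 1.0
low = mean_pairwise_dice(disjoint)            # close to 0

# Two identical channels overlap completely.
high = mean_pairwise_dice(np.ones((2, 2, 3)))  # close to 1
```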
The inventors have found during development that in the related art, MCA tends to split objects according to local semantic information, so that an unknown object may be split into multiple channels and lose integrity (e.g., windows and wheels of a car may be split into different channels).
In the present invention, however, unknown sub-regions from some of the meta-channels are aggregated based on few-shot (here L-shot) annotated images to generate a set of further segmented candidate regions $\{R'_i\}_{i=1}^{T'}$, which is passed to the final RAML module for incremental few-shot learning,
where the set is the set of candidate regions generated by further segmenting the unknown region in the target image, and T' is the total number of candidate regions generated after the further segmentation.
Based on the optimized candidate regions $R'_i$, the region-aware feature $f_{object}$ corresponding to each of the optimized candidate regions may be generated.
Table 1 compares the result of anomaly segmentation based on the RAML module with the results of anomaly segmentation based on other approaches. The RAML module provided by the invention generates a higher response value and better integrity for the regions of abnormal objects, and significantly reduces false negatives.
TABLE 1
Fig. 3 compares the anomaly segmentation method proposed by the present invention with other methods, where (a) is the target image, (b) is the abnormal region label, (c) is the edge prediction result generated in step 110, (d) is the result of other related technologies, and (e) is the result of the anomaly segmentation method proposed by the present invention. It can be seen that the present invention generates a higher response value and better integrity for the regions of abnormal objects and significantly reduces false negatives.
Step 130, determining the category corresponding to the unknown sub-region based on the region sensing feature corresponding to the unknown sub-region and the target region sensing feature corresponding to the first target category.
In this step, the first target class is any one of unknown classes (i.e., untrained classes) among the classes corresponding to all the foreground features of the target image.
The target region perception feature is a region perception feature corresponding to a pre-generated foreground feature of the first target class.
In the incremental learning process, firstly, images corresponding to unknown foreground features in a target image are acquired, and manual marking is carried out to generate a marked image.
It is understood that each unknown class may correspond to at least one annotation image.
For example, the prototype of the i-th unknown class may be defined from the new annotated images by a formula in which $c_i$ is the target region perception feature corresponding to the prototype of the i-th unknown class, the feature embedding of the i-th unknown class in the j-th annotated image is used, L is the total number of annotated images corresponding to the i-th unknown class (for example, L can be 1 or 5), and 1 ≤ i ≤ M.
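A common way to build such a prototype from L annotated images is to average the L feature embeddings, as sketched below. The averaging itself is an assumption made for illustration, since only the inputs of the definition are described here; the name `class_prototype` and the toy embeddings are hypothetical.

```python
import numpy as np

def class_prototype(embeddings):
    """Average L feature embeddings of one novel class into a single prototype c_i."""
    embeddings = np.asarray(embeddings, dtype=float)  # shape (L, D)
    return embeddings.mean(axis=0)

# Two annotated images (L = 2) of the same novel class, 3-dimensional embeddings.
shots = [[1.0, 0.0, 2.0],
         [3.0, 2.0, 0.0]]
c_i = class_prototype(shots)
```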
After the region sensing features corresponding to the unknown sub-regions and the target region sensing features corresponding to the first target categories are generated, each unknown sub-region can be assigned to the region of the matching first target category by calculating the similarity between the two kinds of features.
The implementation of this step will be described below by taking cosine similarity as an example.
In some embodiments, step 130 may include:
determining the category corresponding to the unknown sub-region as the first target category under the condition that the cosine similarity between the region sensing feature corresponding to the unknown sub-region and the target region sensing feature corresponding to the first target category is larger than the second target threshold, and is also larger than the cosine similarity between the region sensing feature corresponding to the unknown sub-region and the region sensing features corresponding to the other unknown categories, except the first target category, in the target image.
In this embodiment, the cosine similarity between the unknown sub-region and the prototype of the i-th unknown class may be determined by a formula in which $f_{object}$ is the region sensing feature corresponding to the unknown sub-region, $c_i$ is the region sensing feature corresponding to the prototype of the i-th unknown class, and M is the total number of unknown classes in the target image.
After the cosine similarities are generated, the unknown class having the maximum cosine similarity with the unknown sub-region is first selected and the value of that cosine similarity is obtained; it is then judged whether this value is larger than the second target threshold, and the class corresponding to the unknown sub-region is determined to be the first target class when the value is larger than the second target threshold.
That is, when the cosine similarity between the region-sensing feature corresponding to the unknown sub-region $R'_i$ and the region-sensing feature $c_i$ (i.e., the target region-sensing feature) corresponding to the i-th unknown class (i.e., the first target class) is greater than the second target threshold, and is also greater than the cosine similarity between the region-sensing feature corresponding to $R'_i$ and the region-sensing features corresponding to all unknown classes other than the i-th unknown class, the class corresponding to the unknown sub-region is determined to be the i-th unknown class.
The second target threshold may be user-defined or may be determined as a hyper-parameter controlling classification. The value of the second target threshold may be set to 0.5, although in other embodiments it may be set to other values, which is not limited by the present invention. For example, the unknown sub-regions may be classified by a formula in which the cosine similarity between the unknown sub-region and the prototype of the i-th unknown class is compared with the cosine similarities between the unknown sub-region and the prototypes of the other unknown classes, and $\theta_{novel}$ is the hyper-parameter (i.e., the second target threshold) that controls classification.
It will be appreciated that the candidate region $R'_i$ is classified as the i-th new class $C_{out,i}$ only when the cosine similarity meets both of the above criteria.
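The two criteria (largest cosine similarity among the novel-class prototypes, and similarity above $\theta_{novel}$) can be sketched as follows. The function name `classify_subregion` and the toy prototypes are hypothetical; 0.5 is the example threshold value given above.

```python
import numpy as np

def classify_subregion(f_object, prototypes, theta_novel=0.5):
    """Return the index of the best-matching novel class, or None if no match passes.

    f_object: (D,) region feature; prototypes: (M, D) novel-class prototypes c_i.
    """
    f = f_object / np.linalg.norm(f_object)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ f                      # cosine similarity to every prototype
    best = int(np.argmax(sims))       # criterion 1: largest similarity
    return best if sims[best] > theta_novel else None  # criterion 2: above threshold

prototypes = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
match = classify_subregion(np.array([0.9, 0.1]), prototypes)       # index 0
no_match = classify_subregion(np.array([-1.0, -1.0]), prototypes)  # None
```

A sub-region that returns `None` is not assigned to any of the novel classes.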
In the research and development process, the Cityscapes dataset was taken as an example for testing, and the method provided by the invention was compared with other methods to obtain the comparison results shown in Table 2.
It should be noted that in the experiment, car, truck and bus are 3 OOD classes that do not participate in the training phase, and the other 16 classes are regarded as in-distribution classes.
TABLE 2
Table 2 shows the incremental few-shot learning results on Cityscapes for the 16+1 setting (the OOD class is car) and the 16+3 setting (the OOD classes are car, truck and bus). The unknown classes are highlighted, and Finetune (FT) is the baseline, which suffers from catastrophic forgetting. The method provided by the invention achieves a better effect.
Fig. 4 compares the method proposed by the present invention with other methods, in which (a) is the target image, (b) is the label image, (c) is the closed set output, (d) is the anomaly segmentation output, (e) is the MCA output, (f) is the output result of other related technologies, and (g) is the final output result of the present invention.
According to fig. 4, the proposed method shows a significant ability to preserve the integrity of these objects in the results. Furthermore, the feature embeddings generated by the proposed RAML maintain reasonable inter-class distances while their intra-class distributions are more concentrated, which helps the model obtain robust decision boundaries.
According to the open world semantic segmentation method based on regional perception metric learning provided by the embodiment of the invention, candidate regions are extracted with a classical uncertainty-based method and anomaly segmentation is performed on them to generate the unknown region, which ensures the integrity of each segmented region; the unknown region is then further segmented by the MCA module to generate high-quality unknown sub-regions for incremental few-shot learning. This improves the prediction performance of the model on out-of-distribution objects, improves the precision and accuracy of the segmentation results, and improves the final segmentation effect.
In some embodiments, prior to step 110, the method may further comprise:
inputting the target image to a feature extractor of a closed set segmentation module, and acquiring region perception features corresponding to a plurality of feature images output by the feature extractor;
Inputting the region sensing features corresponding to the plurality of feature images to a tag predictor of the closed set segmentation module, and acquiring a second target category corresponding to a target feature image in the region sensing features corresponding to the plurality of feature images output by the tag predictor;
The closed set segmentation module (or semantic segmentation network) is trained by taking a sample image as a sample and a semantic segmentation class corresponding to the sample image as a sample label;
the second target category corresponding to the target feature image is a trained category.
In this embodiment, the closed set segmentation module (or semantic segmentation network) includes a feature extractor and a tag predictor, wherein:
the output end of the feature extractor is connected with the input end of the tag predictor, the feature extractor is used for extracting foreground features in the target image and inputting the foreground features to the tag predictor, and the tag predictor is used for generating regional perception features corresponding to each foreground feature, predicting the category corresponding to each foreground feature and outputting the predicted category.
The target feature images are feature images corresponding to the features of the known categories in the feature images, and each target feature image corresponds to a second target category.
The second target category is a known category in the categories corresponding to all foreground features in the target image, namely a trained category, namely a category which can be predicted by the label predictor.
For example, the second target category may be expressed as:
$C_{in} = \{C_{in,1}, C_{in,2}, \ldots, C_{in,N}\}$
where $C_{in}$ is the set of N in-distribution classes, each of which is annotated in the training dataset.
The first target class may be expressed as:
$C_{out} = \{C_{out,1}, C_{out,2}, \ldots, C_{out,M}\}$
where $C_{out}$ is the set of M new classes that do not participate in the training dataset.
During training, the closed set segmentation module is trained by minimizing a segmentation loss, which guides the module to generate a pixel-level segmentation for the in-distribution classes.
This loss is the segmentation loss of all in-distribution classes and uses a multi-class cross entropy loss, in which X is a sample image, Y is the sample label corresponding to the sample image, and H × W represents the length and the width of the sample image.
After training is completed, a trained feature extractor and a trained tag predictor are obtained.
In practical application, by inputting the target image into the trained closed set segmentation module, the feature images F and the unnormalized logits U output by the closed set segmentation module can be obtained,
where X is the target image, H × W is the length and the width of the target image, and U is obtained by removing the SoftMax layer of the closed set segmentation module.
The feature images F and the unnormalized logits U generated in this embodiment are used in steps 110-130 above.
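The relationship between the two outputs can be illustrated with the toy forward pass below. A real closed set segmentation module would be a deep segmentation network; here two per-pixel linear maps with random weights stand in for the feature extractor and the tag predictor, and all names and sizes are hypothetical. The point of the sketch is only that U is taken before SoftMax, while class probabilities and predicted labels are derived from U afterwards.

```python
import numpy as np

def closed_set_forward(image, w_feat, w_cls):
    """Toy per-pixel model that returns the feature map F and the unnormalized logits U."""
    features = np.maximum(np.einsum("dc,chw->dhw", w_feat, image), 0.0)  # F: (D, H, W)
    logits = np.einsum("nd,dhw->nhw", w_cls, features)                   # U: (N, H, W)
    return features, logits

rng = np.random.default_rng(0)
X = rng.random((3, 4, 5))              # toy RGB image of size 4 x 5
w_feat = rng.normal(size=(8, 3))       # stand-in for the feature extractor
w_cls = rng.normal(size=(16, 8))       # stand-in for the tag predictor, N = 16 classes
F, U = closed_set_forward(X, w_feat, w_cls)

# SoftMax is applied separately, so U itself stays unnormalized.
shifted = U - U.max(axis=0, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=0, keepdims=True)
labels = probs.argmax(axis=0)          # predicted known class per pixel
```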
According to the open world semantic segmentation method based on regional perception metric learning provided by the embodiment of the invention, open world semantic segmentation is realized by an integral model comprising a backbone module for closed set segmentation, an anomaly segmentation module for delineating the unknown regions of OOD data, and an incremental few-shot learning module for splitting the unknown regions into objects of new categories, so that new categories in an image are effectively segmented with only a few samples and the image segmentation effect is remarkably improved.
The open world semantic segmentation device based on regional perception metric learning provided by the invention is described below, and the open world semantic segmentation device based on regional perception metric learning described below and the open world semantic segmentation method based on regional perception metric learning described above can be correspondingly referred to each other.
As shown in fig. 5, the open world semantic segmentation device based on regional perception metric learning includes a first processing module 510, a second processing module 520, and a third processing module 530.
The first processing module 510 is configured to perform abnormal region segmentation on the target image, and generate an unknown region and a region sensing feature corresponding to the unknown region;
The second processing module 520 is configured to segment the unknown region, and generate a plurality of unknown sub-regions and region sensing features corresponding to the unknown sub-regions;
The third processing module 530 is configured to determine a class corresponding to the unknown sub-region based on the region sensing feature corresponding to the unknown sub-region and the target region sensing feature corresponding to the first target class, where the first target class is an unknown class in the plurality of feature classes corresponding to the target image.
According to the open world semantic segmentation device based on regional perception metric learning provided by the embodiment of the invention, candidate regions are extracted with a classical uncertainty-based method and anomaly segmentation is performed on them to generate the unknown region, which ensures the integrity of each segmented region; the unknown region is then further segmented by the MCA module to generate high-quality unknown sub-regions for incremental few-shot learning. This improves the prediction performance of the model on out-of-distribution objects, improves the precision and accuracy of the segmentation results, and improves the final segmentation effect.
In some embodiments, the first processing module 510 may also be configured to:
performing edge prediction on the target image to generate an edge prediction image;
post-processing the edge prediction image to generate a plurality of candidate regions;
performing abnormal segmentation on the plurality of candidate regions to generate a region-aware feature and a region-aware anomaly probability corresponding to each candidate region;
generating an uncertainty intensity corresponding to the pixels in each candidate region based on the region-aware anomaly probability;
and, in a case where the uncertainty intensity exceeds a first target threshold, determining the candidate region corresponding to the uncertainty intensity as the unknown region.
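The last two steps above can be sketched as follows. This is a minimal sketch under a stated assumption: the uncertainty intensity of every pixel in a candidate region is taken to be equal to the anomaly probability of that region, which this passage does not specify. The function name `select_unknown_regions` and the dictionary-based data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def select_unknown_regions(
    candidate_regions: Dict[int, List[Pixel]],
    anomaly_prob: Dict[int, float],
    first_target_threshold: float,
) -> Tuple[Dict[Pixel, float], List[int]]:
    """Return a per-pixel uncertainty map and the ids of the unknown regions."""
    uncertainty: Dict[Pixel, float] = {}
    unknown_ids: List[int] = []
    for region_id, pixels in candidate_regions.items():
        # Assumption: every pixel inherits the anomaly probability of its region.
        intensity = anomaly_prob[region_id]
        for pixel in pixels:
            uncertainty[pixel] = intensity
        # A candidate region becomes an unknown region when its uncertainty
        # intensity exceeds the first target threshold.
        if intensity > first_target_threshold:
            unknown_ids.append(region_id)
    return uncertainty, unknown_ids
```

Because the decision is made per region rather than per pixel, a region is kept or discarded as a whole, which matches the stated aim of preserving the integrity of each segmented region.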
In some embodiments, the first processing module 510 may also be configured to:
inputting the plurality of candidate regions into an RAML module, and acquiring the region-aware features corresponding to the candidate regions output by the RAML module;
generating, under a Circle loss constraint, the region-aware anomaly probability corresponding to each candidate region based on the region-aware feature corresponding to the candidate region and the region-aware feature corresponding to a second target class;
wherein the RAML module is obtained by training on sample images with region-aware feature labels, and the second target class is a known class among the plurality of feature classes corresponding to the target image.
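For reference, the Circle loss named above can be sketched in its commonly published pair-similarity form. This passage does not state which form or which hyperparameters the RAML module uses, so the margin `m`, the scale `gamma`, and the function names below are assumptions for illustration only. Here `sp` holds similarities between a region-aware feature and features of the same class, and `sn` holds similarities to features of other classes; both are assumed to be non-empty.

```python
import math
from typing import List, Sequence


def _logsumexp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(v))) for a non-empty sequence.
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(
    sp: Sequence[float],
    sn: Sequence[float],
    m: float = 0.25,      # margin (illustrative value, an assumption)
    gamma: float = 64.0,  # scale factor (illustrative value, an assumption)
) -> float:
    """Circle loss over within-class (sp) and between-class (sn) similarities."""
    optimum_p, optimum_n = 1.0 + m, -m
    delta_p, delta_n = 1.0 - m, m
    # Each similarity is re-weighted by its distance from its optimum.
    logits_p: List[float] = [
        -gamma * max(optimum_p - s, 0.0) * (s - delta_p) for s in sp
    ]
    logits_n: List[float] = [
        gamma * max(s - optimum_n, 0.0) * (s - delta_n) for s in sn
    ]
    return _softplus(_logsumexp(logits_p) + _logsumexp(logits_n))
```

The loss is small when within-class similarities are close to 1 and between-class similarities are close to 0, and it grows when the two are confused.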
In some embodiments, the second processing module 520 may also be configured to:
inputting the unknown region into a plurality of meta-channels in the MCA module, and obtaining a plurality of unknown sub-regions output by the meta-channels;
wherein the MCA module is trained based on a target loss function.
In some embodiments, the third processing module 530 may also be configured to:
determining the class corresponding to the unknown sub-region as the first target class in a case where both of the following conditions are met: the cosine similarity between the region-aware feature corresponding to the unknown sub-region and the target region-aware feature corresponding to the first target class is greater than a second target threshold; and that cosine similarity is greater than the cosine similarity between the region-aware feature corresponding to the unknown sub-region and the region-aware feature corresponding to any other unknown class in the target image.
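The decision rule above can be sketched as follows. The function names, the representation of features as plain lists of floats, and the use of a dictionary mapping each unknown class to its target region-aware feature are hypothetical; returning `None` when no class satisfies both conditions is likewise an assumption, since this passage does not state what happens in that case.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (an assumption)
    return dot / (norm_a * norm_b)


def assign_unknown_class(
    sub_region_feature: Sequence[float],
    unknown_class_features: Dict[str, Sequence[float]],
    second_target_threshold: float,
) -> Optional[str]:
    """Return the unknown class that is most similar and above the threshold."""
    best_class: Optional[str] = None
    best_similarity = -math.inf
    for class_name, class_feature in unknown_class_features.items():
        similarity = cosine_similarity(sub_region_feature, class_feature)
        # Condition 2: the chosen class must beat every other unknown class.
        if similarity > best_similarity:
            best_class, best_similarity = class_name, similarity
    # Condition 1: the best similarity must exceed the second target threshold.
    if best_class is not None and best_similarity > second_target_threshold:
        return best_class
    return None
```

For example, with class features `{"a": [1.0, 0.0], "b": [0.0, 1.0]}` and a threshold of 0.5, the feature `[0.9, 0.1]` is assigned to class `"a"`, while the feature `[1.0, 1.0]` with a threshold of 0.9 is assigned to no class.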
In some embodiments, the apparatus may further include a fourth processing module configured to:
before the abnormal region segmentation is performed on the target image to generate the unknown region and the region-aware feature corresponding to the unknown region, input the target image to a feature extractor of a closed-set segmentation module, and acquire the region-aware features corresponding to a plurality of feature images output by the feature extractor;
input the region-aware features corresponding to the plurality of feature images to a label predictor of the closed-set segmentation module, and acquire the second target class, output by the label predictor, corresponding to the region-aware feature of a target feature image among the plurality of feature images;
wherein the closed-set segmentation module is obtained by training with sample images as samples and the semantic segmentation classes corresponding to the sample images as sample labels;
and the second target class corresponding to the target feature image is a trained class.
Fig. 6 illustrates a schematic structural diagram of an electronic device. As shown in fig. 6, the electronic device may include a processor 610, a communication interface 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform the open world semantic segmentation method based on region-aware metric learning. The method includes: performing abnormal region segmentation on a target image to generate an unknown region and a region-aware feature corresponding to the unknown region; segmenting the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions; and determining a class corresponding to each unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and a target region-aware feature corresponding to a first target class, where the first target class is an unknown class among the plurality of feature classes corresponding to the target image.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the open world semantic segmentation method based on region-aware metric learning provided by the above methods. The method includes: performing abnormal region segmentation on a target image to generate an unknown region and a region-aware feature corresponding to the unknown region; segmenting the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions; and determining a class corresponding to each unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and a target region-aware feature corresponding to a first target class, where the first target class is an unknown class among the plurality of feature classes corresponding to the target image.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the open world semantic segmentation method based on region-aware metric learning provided above. The method includes: performing abnormal region segmentation on a target image to generate an unknown region and a region-aware feature corresponding to the unknown region; segmenting the unknown region to generate a plurality of unknown sub-regions and region-aware features corresponding to the unknown sub-regions; and determining a class corresponding to each unknown sub-region based on the region-aware feature corresponding to the unknown sub-region and a target region-aware feature corresponding to a first target class, where the first target class is an unknown class among the plurality of feature classes corresponding to the target image.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
It should be noted that the above embodiments are merely intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the above embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.