
CN110210560B - Incremental training method, classification method and device, equipment and medium of classification network - Google Patents


Info

Publication number
CN110210560B
Authority
CN
China
Prior art keywords
network
classification
old
image samples
normalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910472078.6A
Other languages
Chinese (zh)
Other versions
CN110210560A (en)
Inventor
侯赛辉 (Saihui Hou)
潘薪宇 (Xinyu Pan)
林达华 (Dahua Lin)
吕健勤 (Chen Change Loy)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910472078.6A priority Critical patent/CN110210560B/en
Publication of CN110210560A publication Critical patent/CN110210560A/en
Application granted granted Critical
Publication of CN110210560B publication Critical patent/CN110210560B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides an incremental training method of a classification network, a classification method and apparatus, a device, and a storage medium. The method comprises the following steps: performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features of the plurality of training image samples, wherein the plurality of training image samples comprise a first number of old-class image samples and a second number of new-class image samples, the second number being greater than the first number; normalizing the first sample features of the plurality of training image samples to obtain first normalized features of the plurality of training image samples; determining a network loss based on the first normalized features of the plurality of training image samples; and adjusting a network parameter of the first classification network based on the network loss.

Description

Incremental training method, classification method and device, equipment and medium of classification network
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to an incremental training method, a classification method and apparatus, a device, and a storage medium for a classification network.
Background
Incremental learning is an important challenge when deep learning technologies are deployed in big-data scenarios, and incremental image classification is a basic and representative instance of it. The multi-class incremental classification task means that, as the set of target classifiable classes gradually expands, a single classification model is learned incrementally while maintaining high classification accuracy on all target classes.
In practice, limited by computational overhead, storage space, and data privacy, it is often infeasible to retrain on all data, including previously seen data, each time the classifiable set is expanded. On the other hand, over a longer incremental learning sequence, training only on the newly added category data severely degrades performance on the original categories, which is the major difficulty of the multi-class incremental classification task.
Disclosure of Invention
The embodiment of the application provides an incremental training scheme of a classification network.
In one aspect of the embodiments of the present disclosure, a method for incremental training of a classification network is provided, including: performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features of the plurality of training image samples, wherein the plurality of training image samples comprise: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number; normalizing the first sample features of the plurality of training image samples to obtain first normalized features of the plurality of training image samples; determining a network loss based on the first normalized features of the plurality of training image samples; adjusting a network parameter of the first classification network based on the network loss.
Based on the above scheme, the method further comprises: carrying out normalization processing on the classification weights of the multiple categories to obtain first normalization weights of the multiple categories; determining a network loss based on the first normalized features of the plurality of training image samples, comprising: determining the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes.
Based on the above scheme, the determining the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes includes: selecting K new categories corresponding to the old category image samples from the multiple categories based on the first normalized features of the old category image samples; determining a first loss term in the network loss based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, wherein K is a positive integer not less than 2.
Based on the above scheme, the selecting, from the multiple categories, K new categories corresponding to the old category image samples based on the first normalized feature of the old category image samples includes: and selecting K new categories corresponding to the old category image samples from the multiple categories based on the similarity between the first normalized weight of each new category in the multiple categories and the first normalized feature of the old category image samples.
Based on the above scheme, the determining a first loss term in the network loss based on the first normalized features of the old category image samples and the first normalized weights of the K new categories includes: determining a first loss term in the network loss based on a similarity between the first normalized features of the old class image samples and the first normalized weight for each of the K new classes.
Based on the above scheme, the determining the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes includes: obtaining a classification probability of each training image sample based on a first normalization feature of each training image sample in the plurality of training image samples and a first normalization weight of a prediction class of each training image sample obtained by the first classification network; and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
Based on the above scheme, the method further comprises: acquiring second sample characteristics of the old category image samples obtained by performing characteristic extraction on the old category image samples by using a second classification network, wherein the second classification network is an initial network of the incremental training; normalizing the second sample characteristic of the old type image sample to obtain a second normalized characteristic of the old type image sample; determining a network loss based on the first normalized features of the plurality of training image samples, comprising: and obtaining a third loss term of the network loss based on the similarity between the second normalized feature of the old class image sample and the first normalized feature of the old class image sample.
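The third loss term above keeps the first classification network's normalized features for old-class samples close to those of the second classification network (the frozen initial network). A minimal plain-Python sketch, assuming the common choice of one minus the cosine similarity of the two unit-length features — the text states only that the term is based on their similarity:

```python
def third_loss_term(first_normalized_feature, second_normalized_feature):
    # Both inputs are unit-length (normalized) feature vectors, so their
    # dot product is the cosine similarity; 1 - cos penalizes drift of the
    # first network's old-class features away from the second network's.
    cos_sim = sum(a * b for a, b in zip(first_normalized_feature,
                                        second_normalized_feature))
    return 1.0 - cos_sim
```

Identical features yield zero loss; orthogonal features yield a loss of 1.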
Based on the above scheme, the method further comprises: determining a first weighting factor based on the first number; determining a network loss based on the first normalized features of the plurality of training image samples, comprising: and obtaining the network loss based on the product of the first weighting coefficient and the first loss term.
Based on the above scheme, the method further comprises: determining a second weighting coefficient of the third loss term according to the number of classes contained in the plurality of classes and the number of new classes; determining a network loss based on the first normalized features of the plurality of training image samples, comprising: and obtaining the network loss based on the product of the third loss term and the second weighting coefficient.
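A sketch of how the two weighting coefficients might scale the loss terms. The inverse form of the first coefficient, the square-root form of the second, and the base constants are all assumptions for illustration; the scheme states only that the first coefficient depends on the first number and the second on the class counts:

```python
import math

def first_weighting_coefficient(first_number, base=1.0):
    # Fewer retained old-class samples -> each contributes more to the
    # first (new-class separation) loss term; the inverse form is assumed.
    return base / max(first_number, 1)

def second_weighting_coefficient(num_classes, num_new_classes, base=5.0):
    # Scale the third (feature-similarity) loss term by the ratio of new
    # classes to old classes; the sqrt form and base value are assumed.
    num_old_classes = num_classes - num_new_classes
    return base * math.sqrt(num_new_classes / num_old_classes)

def network_loss(second_term, first_term, third_term, lambda1, lambda2):
    # Total loss: classification (second) term plus the weighted first
    # and third loss terms.
    return second_term + lambda1 * first_term + lambda2 * third_term
```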
A method of classification, comprising: acquiring an image to be processed; and classifying the image to be processed by using a target classification network to obtain a classification result of the image to be processed, wherein the target classification network is trained by using the incremental training method of the classification network provided by any of the above technical solutions.
In one aspect of the embodiments of the present disclosure, an incremental training apparatus for a classification network is provided, including: a first obtaining module, configured to perform feature extraction on multiple training image samples of multiple classes by using a first classification network, so as to obtain first sample features of the multiple training image samples, where the multiple training image samples include: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number; the first normalization module is used for performing normalization processing on the first sample characteristics of the training image samples to obtain first normalization characteristics of the training image samples; a first determination module to determine a network loss based on a first normalized feature of the plurality of training image samples; an adjusting module, configured to adjust a network parameter of the first classification network based on the network loss.
Based on the above scheme, the apparatus further comprises: the second normalization module is used for performing normalization processing on the classification weights of the multiple categories to obtain first normalization weights of the multiple categories; the first determining module is specifically configured to determine the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes.
Based on the above scheme, the first determining module is specifically configured to select, based on the first normalized feature of the old category image sample, K new categories corresponding to the old category image sample from the multiple categories; determining a first loss term in the network loss based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, wherein K is a positive integer not less than 2.
Based on the above scheme, the first determining module is specifically configured to select, based on a similarity between the first normalization weight of each new category in the multiple categories and the first normalization feature of the old category image sample, K new categories corresponding to the old category image sample from the multiple categories.
Based on the above scheme, the first determining module is specifically configured to determine the first loss term in the network loss based on a similarity between the first normalized feature of the old category image sample and the first normalized weight of each new category in the K new categories.
Based on the above scheme, the first determining module is specifically configured to obtain the classification probability of each training image sample based on the first normalized feature of each training image sample in the plurality of training image samples and the first normalized weight of the prediction class of each training image sample obtained by the first classification network; and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
Based on the above scheme, the apparatus further comprises: a second obtaining module, configured to acquire second sample features of the old-category image samples obtained by performing feature extraction on the old-category image samples by using a second classification network, wherein the second classification network is the initial network of the incremental training; and a third normalization module, configured to perform normalization processing on the second sample features of the old-category image samples to obtain second normalized features of the old-category image samples; the first determining module is specifically configured to obtain a third loss term of the network loss based on a similarity between the second normalized feature of the old-category image sample and the first normalized feature of the old-category image sample.
Based on the above scheme, the apparatus further comprises: a second determining module for determining a first weighting factor based on the first number; the first determining module is specifically configured to obtain the network loss based on a product of the first weighting coefficient and the first loss term.
Based on the above scheme, the apparatus further comprises: a third determining module, configured to determine a second weighting coefficient of the third loss term according to the number of classes included in the plurality of classes and the number of new classes; the first determining module is specifically configured to obtain the network loss based on a product of the third loss term and the second weighting coefficient.
In one aspect of the disclosed embodiments, a classification apparatus is provided, which includes: an acquisition module, configured to acquire an image to be processed; and a classification module, configured to perform classification processing on the image to be processed by using a target classification network to obtain a classification result of the image to be processed, wherein the target classification network is trained by using the incremental training method of the classification network provided by any of the above technical solutions.
In an aspect of the disclosed embodiments, there is further provided an electronic device, including: a memory; and a processor connected to the memory and configured to implement the incremental training method or the classification method of the classification network provided by any of the above technical solutions by executing the computer-executable instructions stored in the memory.
In an aspect of the embodiments of the present disclosure, a computer storage medium is provided, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions implement the incremental training method or the classification method of the classification network provided in any of the foregoing technical solutions when running.
According to the above technical solutions, during incremental training of the classification network, iterative training is performed with a first number of old-class image samples and a second number of new-class image samples, and the sample features of both are normalized so that the modular lengths of the sample features of the training image samples are similar. This avoids the low classification accuracy on old-class images that the imbalance between new- and old-class image samples would otherwise cause in the incrementally trained network, and improves the trained classification network's accuracy on images of all classes as well as its overall performance.
Drawings
Fig. 1 is a schematic flowchart of an incremental training method for a classification network according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of effects before and after normalization of a first sample feature provided in an embodiment of the present application;
fig. 3 is another schematic flowchart of an incremental training method for a classification network according to an embodiment of the present disclosure;
fig. 4 is another schematic flowchart of an incremental training method for a classification network according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating comparison of knowledge learned from old class image samples before and after introducing a second loss term according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating distances between an old class image sample and a new class image sample after a third loss term is introduced according to an embodiment of the present application;
fig. 7 is a schematic flowchart of an incremental training method for a classification network according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of a classification method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a classification network training apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a classification apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application is further described in detail below with reference to the accompanying drawings and specific embodiments of the specification.
As shown in fig. 1, some embodiments provide a method of incremental training of a classification network, comprising:
s110: performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features of the plurality of training image samples, wherein the plurality of training image samples comprise: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number;
s120: normalizing the first sample characteristics of the training image samples to obtain first normalized characteristics of the training image samples;
s130: determining a network loss based on the first normalized features of the plurality of training image samples;
s140: adjusting a network parameter of the first classification network based on the network loss.
In some embodiments, the classification network has classification capability and can identify the class to which an image belongs. In some embodiments, the classification network is a multi-class classification network that can distinguish more than two classes, for example, three classes.
In some embodiments, to reduce the severe degradation of the incrementally trained classification network's accuracy on old-class images, the training image samples in one iteration include two parts: a first number of old-class image samples and a second number of new-class image samples. The old-class image samples are image samples belonging to old classes, i.e., image samples labeled with old classes, where the old classes are the classes that the initial classification network of the current incremental training (i.e., the second classification network) can already identify. The new-class image samples are image samples belonging to new classes, which the second classification network cannot identify; one purpose of the incremental training is to give the classification network, through training, the capability of identifying images of the new classes.
In some embodiments, the first number is smaller than the second number, for example, the first number is much smaller than the second number, so that iteratively training the classification network with fewer old class image samples and more new class image samples is beneficial to make the incrementally trained classification network have higher classification accuracy on both the old class and the new class.
In some embodiments, the first number is a predetermined set number, for example, the predetermined number may be a number less than 10, for example, 5, 6, or 3, and the specific value of the first number is not limited in the embodiments of the present disclosure.
In some embodiments, the above-mentioned process is any iterative process of the incremental training, for example, the above-mentioned process is the first iteration of the incremental training, then the first classification network and the initial network of the incremental training (i.e., the second classification network) are the same network, and for example, the above-mentioned process is the middle or last iteration of the incremental training, then the first classification network is obtained by performing one or more network parameter adjustments on the second classification network, which is not limited in this disclosure.
In some embodiments, a first number of old image samples and a second number of new image samples are input together as a batch (batch) into the first classification network for processing, the first classification network processes the old image samples and the new image samples accordingly to obtain a processing result, and the network parameters of the first classification network are adjusted based on the processing result of the old image samples and the processing result of the new image samples.
In some optional embodiments, the first classification network mainly comprises: the device comprises a feature extraction module and a classification module connected with the feature extraction module. The feature extraction module may be configured to extract feature information in a training image sample, and the classification module may be configured to determine a category of the image sample based on the feature information extracted by the feature extraction module.
In this disclosure, the feature information, the feature data, and the sample feature are different description manners of the data output by the feature extraction module, and may specifically include at least one feature vector, at least one feature matrix, at least one feature map, or a feature tensor, which is not limited in this disclosure.
In some optional embodiments, the classification network is a deep learning network. The network parameters of the classification network may include, but are not limited to, weights and/or bias values of the deep learning network.
For example, the network parameters include, but are not limited to:
the weights and/or bias values of the feature extraction module;
the classification weights and/or classification bias values of the classification module.
As another example, the network parameters include: a classification weight for each of a plurality of classes.
In some embodiments, the plurality of layers included in the feature extraction module sequentially process the input image to obtain feature information of the input image. In order to solve the imbalance between the old image sample and the new image sample, the input feature data is normalized in the last layer of the feature extraction module, so that the feature data of the old image sample has the same modular length as the feature data of the new image sample.
In fig. 2, the dashed arrows respectively represent the first sample feature and the first normalized feature of the old class image sample; the solid arrows represent the first sample feature and the first normalized feature of the new class of image samples, respectively. As can be seen from fig. 2, before the normalization process, the modular length of the first sample feature of the old class image sample is much smaller than the first sample feature of the new class image sample when in the planar coordinate system or the three-dimensional rectangular coordinate system. If the normalization processing is not performed, because the first number is smaller than the second number, the first classification network pays more attention to the characteristics of the second number of new-class image samples in the training process, and the response of the first classification network to the new-class image samples is larger by adjusting the network parameters of the first classification network, so that the modulus of the first sample characteristics of the new-class image samples is larger than that of the old-class image samples. The sample characteristics of the old class image samples and the sample characteristics of the new class image samples are on the same spherical surface through normalization processing, so that the modular length difference between the characteristic data of the image samples of different classes is eliminated.
In some embodiments, the first sample features of the plurality of training image samples are normalized such that any two first sample features have equal moduli after normalization. For example, the modulus of the first normalized feature of an old-class image sample and that of a new-class image sample are both 1, but the embodiments of the present disclosure are not limited thereto.
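The effect shown in fig. 2 can be checked numerically. A small sketch with made-up feature values (the modulus of 1 is the convenient choice noted above):

```python
import math

def modulus(v):
    return math.sqrt(sum(x * x for x in v))

def l2_normalize(v):
    m = modulus(v)
    return [x / m for x in v]

# Illustrative values: before normalization, the old-class feature has a
# much smaller modular length than the new-class feature.
old_first_sample_feature = [0.3, 0.4]   # modulus 0.5
new_first_sample_feature = [3.0, 4.0]   # modulus 5.0

old_first_normalized = l2_normalize(old_first_sample_feature)
new_first_normalized = l2_normalize(new_first_sample_feature)

# After normalization both features lie on the same unit sphere, so the
# modular-length difference between the two classes is eliminated.
assert abs(modulus(old_first_normalized) - 1.0) < 1e-9
assert abs(modulus(new_first_normalized) - 1.0) < 1e-9
```

Note that normalization changes only the length of each feature vector; its direction, which carries the class-discriminative information, is preserved.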
During the iterative training of the first classification network, the classification network responds to the new class more during the adjustment of the network parameters due to more new class image samples. If left alone, it may cause the trained classification network to be more sensitive to the new class in the case of the imbalance between the new class image sample and the old class image sample, and ignore or suppress the old class, thereby causing the classification accuracy of the old class to be reduced. By carrying out normalization processing on the feature data of a plurality of training image samples, the new class image samples and the old class image samples can have similar contribution weights to parameter adjustment of the classification network, so that the prediction accuracy of the classification network obtained by training on all class images and the overall performance of the first classification network are improved.
In S140, if the network loss is smaller than the loss threshold, or the network loss has reached the minimum, it indicates that the network parameter of the first classification network has been optimized, or the iteration number reaches a preset value, and the iteration may be stopped, thereby completing the incremental training of the first classification network. If the network loss is not less than the loss threshold, or the network loss does not reach the minimum, it is determined that the network parameter of the first classification network has not reached the optimum, or the iteration number does not reach a preset value, the iteration is continued to optimize the network parameter of the first classification network, which is not limited in the embodiment of the present disclosure.
In some embodiments, as shown in fig. 3, the method further comprises:
s121: carrying out normalization processing on the classification weights of the multiple categories to obtain first normalization weights of the multiple categories;
accordingly, the S130 may include S131: determining the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes.
In the embodiments of the present disclosure, the classification weight of a class may include one or more vectors, matrices, tensors, or graphs, which is not limited by the embodiments of the present disclosure.
The classification weights of different classes are normalized by using a first classification network to obtain the first normalization weights, for example, J classes may include at least one old class and at least one new class, and a classification module of the first classification network may be provided with J classification weights, and the J classification weights are normalized to obtain J first normalization weights.
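For instance, with J = 3, the normalization of the classification weights can be sketched as follows; the weight values are made up for illustration (real classification weights are learned during training):

```python
import math

def l2_normalize(w):
    m = math.sqrt(sum(x * x for x in w))
    return [x / m for x in w]

# J = 3 classification weights: two old classes and one new class whose
# weight has grown a larger modular length during training.
classification_weights = [
    [0.2, 0.1],   # old class
    [0.1, 0.3],   # old class
    [2.0, 1.5],   # new class
]

# J first normalized weights, all with modular length 1.
first_normalized_weights = [l2_normalize(w) for w in classification_weights]
```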
In some embodiments, the classification module obtains a classification probability for each training image sample based on the first normalized feature of the training image sample and the first normalized weight of each of the plurality of classes, thereby obtaining the predicted class of the training image sample.
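A sketch of that computation: with unit-length features and weights, each logit is a cosine similarity, and a softmax over the J logits gives the classification probabilities. The scale factor is an assumption here — cosine logits are bounded to [-1, 1], so a fixed or learned scale is commonly applied before the softmax:

```python
import math

def classification_probabilities(first_normalized_feature,
                                 first_normalized_weights, scale=10.0):
    # Cosine similarity between the sample's normalized feature and each
    # class's normalized weight, scaled before the softmax.
    logits = [scale * sum(f * w for f, w in zip(first_normalized_feature, wt))
              for wt in first_normalized_weights]
    peak = max(logits)                     # subtract for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class of the sample is then the index of the largest probability.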
In some embodiments, a second loss term of the network loss may be determined based on the classification probability or predicted class of a training image sample.
During iterative training of the classification network, because there are more new-class samples, the classification weights that map samples to the new classes tend to take larger values. With the numbers of new- and old-class image samples unequal, the trained network becomes more sensitive to the new classes: since far more samples must be classified into the new classes, training keeps enlarging the new classes' classification weights, leaving the old classes' weights comparatively small. If the network loss were computed directly without normalizing the classification weights, the new-class weights would keep growing while adjustment of the old-class weights is reduced or suppressed, so the incrementally trained first classification network would lose accuracy on old-class images. In this embodiment, normalizing the classification weights reduces the phenomenon in which the new-class weights continuously grow relative to the old-class weights due to their differing modular lengths, so the first classification network obtained by incremental training keeps high classification accuracy on the old classes and high overall classification performance.
In some embodiments, through the normalization of the classification weights, the first normalized weights of all classes have equal modulus lengths regardless of how many samples each class has. As a result, when the loss value is computed based on the first normalized features and the first normalized weights, the classification weights corresponding to the new class image samples cannot grow far beyond those of the old classes, the loss value no longer favors the new class image samples while neglecting or suppressing the old class image samples, and the first classification network obtained after multiple iterations of optimization can still accurately classify the classes contained in the old class image samples, improving the accuracy on the classes contained in the old class image samples.
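As an illustrative sketch (not the claimed implementation), the normalization described above can be realized as L2 normalization, which maps each feature or classification weight onto the unit sphere so that all modulus lengths become equal; the function name here is an assumption for illustration:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm, so every normalized feature or weight
    has the same modulus length (1) regardless of its original magnitude."""
    return v / (np.linalg.norm(v) + eps)

f_new = np.array([8.0, 6.0])  # large-modulus feature, e.g. from a new class
f_old = np.array([0.3, 0.4])  # small-modulus feature, e.g. from an old class
n_new, n_old = l2_normalize(f_new), l2_normalize(f_old)
print(np.linalg.norm(n_new), np.linalg.norm(n_old))
```

After this step, inner products between normalized vectors are cosine similarities, so no class can dominate the loss merely through a larger modulus length.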
In some embodiments, as shown in fig. 4, the S131 may include:
S131a: selecting K new categories corresponding to the old category image samples from the multiple categories based on the first normalized features of the old category image samples;
S131b: determining a first loss term in the network loss based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, wherein K is a positive integer not less than 2.
For example, in some embodiments, a first similarity is obtained using a first normalization feature of an old class image sample and a first normalization weight of the old class image sample; and then obtaining a second similarity with the first normalized feature of the old class image sample by using the first normalized weights of the K new classes, and determining the first loss term by combining the first similarity and the second similarity.
In some embodiments, the first loss term may be obtained using the following equation (1):

L_{mr}(x) = \sum_{k=1}^{K} \max\left( m - \langle \bar{\theta}(x), \bar{f}(x) \rangle + \langle \bar{\theta}^{k}, \bar{f}(x) \rangle,\ 0 \right)    (1)

wherein L_{mr}(x) is the first loss term; \bar{f}(x) is the first normalized feature of the old class image sample x; \bar{\theta}(x) is the first normalized weight of the actual class of x; \bar{\theta}^{k} is the first normalized weight of the k-th of the K new classes; m is a margin constant; \langle \bar{\theta}(x), \bar{f}(x) \rangle is the inner product of \bar{\theta}(x) and \bar{f}(x); and \langle \bar{\theta}^{k}, \bar{f}(x) \rangle is the inner product of \bar{\theta}^{k} and \bar{f}(x).
In some embodiments, the S130 determines the network loss at least according to a first loss term.
Specifically, the S131a may include:
and selecting K new categories corresponding to the old category image samples from the multiple categories based on the similarity between the first normalized weight of each new category in the multiple categories and the first normalized feature of the old category image samples.
For example, the K new classes with the highest similarity are selected through similarity calculation between the first normalized weight of each new class in the multiple classes and the first normalized feature of the old class image sample.
When calculating the similarity between the first normalized weight of each new category and the first normalized feature of the old category image sample, the calculation can be realized by solving the inner product of the first normalized weight of each new category and the first normalized feature of the old category image sample.
As shown in fig. 5, by introducing the first loss term, the distance between the anchor point corresponding to an old class image sample and the positive sample of that old class image sample is made smaller than the distance between the anchor point and the negative samples of the old class image sample. The negative samples of an old class image sample include the new classes that are most similar to it, for example the K new classes whose first normalized weights are most similar to the first normalized feature of x.
In some embodiments, the S131b may include: determining a first loss term in the network loss based on a similarity between the first normalized features of the old class image samples and the first normalized weight for each of the K new classes.
In some embodiments, the first loss term is derived by calculating a similarity between a first normalized feature based on the old class image sample and a first normalized weight for each of the K new classes. Specifically, when the first loss term is not less than zero, the first loss term is positively correlated with the similarity.
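The selection of the K most similar new classes and the resulting first loss term can be sketched in numpy as follows. This is a hypothetical illustration: the function name and the margin value m=0.5 are assumptions, not taken from the embodiment, and all inputs are assumed already L2-normalized so that inner products are cosine similarities:

```python
import numpy as np

def margin_ranking_loss(f_old, w_true, w_new, K=2, m=0.5):
    """Sketch of the first loss term for one old-class sample: among the
    normalized new-class weights w_new (one row per class), select the K most
    similar to the normalized old-class feature f_old, then apply a hinge that
    keeps the true-class similarity at least margin m above each of them."""
    sim_true = float(f_old @ w_true)      # first similarity
    sims_new = w_new @ f_old              # similarity to every new class
    hardest = np.sort(sims_new)[-K:]      # the K most similar new classes
    # hinge: max(m - sim_true + sim_new_k, 0), summed over the K hard classes
    return float(np.sum(np.maximum(m - sim_true + hardest, 0.0)))

f = np.array([1.0, 0.0])                     # normalized old-class feature
w_true = np.array([1.0, 0.0])                # normalized weight of its actual class
w_new = np.array([[0.0, 1.0], [0.6, 0.8]])   # normalized new-class weights
loss = margin_ranking_loss(f, w_true, w_new, K=2, m=0.5)
print(loss)
```

The loss is zero when every selected new class is at least the margin less similar than the true class, and grows with the similarity of the hard new classes, matching the positive correlation described above.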
The method may further comprise: determining a first weighting coefficient according to the first number;
the S130 may further include: and obtaining the network loss based on the product of the first weighting coefficient and the first loss term.
In some embodiments, the network loss comprises a plurality of loss terms, and the first loss term may be only one of the loss terms.
To accurately calculate the network loss, in some embodiments, the first weighting coefficient is calculated based on the first number. The first weighting coefficient is multiplied by the first loss term, and the network loss is calculated based on the product.
For example, the first number is |N_o|; the first weighting coefficient may be: 1/|N_o|, wherein N_o is the set of old class image samples and N_n is the set of new class image samples.
In some embodiments, the S130 may further include:
obtaining the classification probability of each training image sample based on the first normalized feature of each training image sample in the plurality of training image samples and the first normalized weight of the prediction class of each training image sample obtained by the first classification network;
and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
In some embodiments, the prediction class is a class obtained by performing recognition and classification on each training image sample by the first classification network. The labeling class information may be an actual class of each training image sample labeled manually or by a device.
In some embodiments, the second loss term is derived from the classification probability.
For example, the classification probability is calculated by formula (2):

p_i(x) = \frac{ \exp\left( \eta \langle \bar{\theta}_i, \bar{f}(x) \rangle \right) }{ \sum_{j=1}^{J} \exp\left( \eta \langle \bar{\theta}_j, \bar{f}(x) \rangle \right) }    (2)

wherein \bar{\theta}_i is the first normalized weight of the i-th class; \bar{f}(x) is the first normalized feature of the training image sample x; p_i(x) is the classification probability that the classified object belongs to the i-th prediction class; J is the total number of classes; \langle \bar{\theta}_i, \bar{f}(x) \rangle is the inner product of the first normalized weight and the first normalized feature; and \eta is a parameter of the first classification network that scales the inner products and thereby limits the peak of the resulting distribution, where the peak may be the maximum or minimum value. The value of \eta may be a real number between 0.1 and 10; specifically, it may be a positive integer such as 3, 4 or 5, or a decimal such as 3.5 or 4.6.
After the classification probability is calculated, the second loss term is determined based on it. In some embodiments, the second loss term may be determined using equation (3):

L_{ce}(x) = - \sum_{i=1}^{|C|} y_i \log p_i(x)    (3)

wherein p_i(x) is the classification probability that the training image sample x belongs to the i-th prediction class; y_i is the i-th component of the labeled class information of x (1 for the actual class and 0 otherwise); and |C| is the total number of classes, i.e. the sum of the numbers of new classes and old classes.
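Formulas (2) and (3) can be illustrated with a small numpy sketch; the function names and the value eta=4.0 are illustrative assumptions, and the inputs are assumed already normalized:

```python
import numpy as np

def cosine_softmax(f, W, eta=4.0):
    """Sketch of formula (2): class probabilities from the inner products of a
    normalized feature f with normalized class weights W (one row per class),
    scaled by the network parameter eta that controls the peak of the
    distribution."""
    logits = eta * (W @ f)             # eta * <theta_i, f> for each class i
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, y):
    """Sketch of formula (3): -sum_i y_i * log p_i for a one-hot label y."""
    return float(-np.sum(y * np.log(p + 1e-12)))

f = np.array([1.0, 0.0])                # first normalized feature of a sample
W = np.array([[1.0, 0.0], [0.0, 1.0]])  # first normalized weights of 2 classes
p = cosine_softmax(f, W, eta=4.0)
loss = cross_entropy(p, np.array([1.0, 0.0]))
print(p, loss)
```

A larger eta sharpens the distribution; with eta=4.0 the correctly aligned class already receives most of the probability mass, giving a small second loss term.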
In some embodiments, the method further comprises:
acquiring second sample characteristics of the old category image samples obtained by performing characteristic extraction on the old category image samples by using a second classification network, wherein the second classification network is an initial network of the incremental training;
normalizing the second sample characteristic of the old type image sample to obtain a second normalized characteristic of the old type image sample;
the S130 may include: and obtaining a third loss term of the network loss based on the similarity between the second normalized feature of the old class image sample and the first normalized feature of the old class image sample.
Since the second classification network is the initial network of the first classification network, the structure of the second classification network and the first classification network is the same, but the network parameters may be different.
In some embodiments, to further improve the optimized training result of the first classification network, the old classification image samples are input into the second classification network; and the second classification network performs feature extraction on the old classification image sample to obtain the second sample feature.
Similarly, the second sample characteristic is normalized to obtain the second normalized characteristic.
In some embodiments, a similarity is obtained based on a first normalized feature converted from a first sample feature extracted from an old-category image sample by the first classification network and a second normalized feature converted from a second sample feature extracted from an old-category image sample by the second classification network, and the third loss term is obtained based on the similarity.
Specifically, the similarity is inversely related to the third loss term, that is, the higher the similarity between the normalized features corresponding to the old image samples by the first classification network and the second classification network is, the smaller the third loss term is.
In some embodiments, the third loss term may be calculated using equation (4):

L_{dis}(x) = 1 - \langle \bar{f}^{*}(x), \bar{f}(x) \rangle    (4)

wherein L_{dis}(x) is the third loss term; \bar{f}(x) is the first normalized feature of the old class image sample x obtained by the first classification network; \bar{f}^{*}(x) is the second normalized feature of the same training image sample x obtained by the second classification network; and \langle \bar{f}^{*}(x), \bar{f}(x) \rangle is the inner product of \bar{f}^{*}(x) and \bar{f}(x). In some embodiments, this inner product is used to indicate the aforementioned similarity.
As shown in fig. 6, if the third loss term is not used, then because the second number is far greater than the first number, the knowledge originally learned from the old class image samples drifts once the new class image samples are introduced, so a classification network trained on the new class image samples classifies the classes corresponding to the old class image samples with insufficient accuracy. After the third loss term is introduced, even when new class image samples are added, the drift of the knowledge learned from the old class image samples is small, as can be seen from fig. 6, so the classification network trained after introducing the new class image samples still retains high classification accuracy on the classes corresponding to the old class image samples.
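A minimal sketch of the third loss term, assuming both networks' features are compared after L2 normalization (the function name and example vectors are hypothetical):

```python
import numpy as np

def less_forget_loss(f_current, f_initial):
    """Sketch of equation (4): one minus the inner product of the L2-normalized
    features that the current (first) network and the frozen initial (second)
    network extract from the same old-class sample.  Identical feature
    directions give loss 0; drift increases the loss."""
    f1 = f_current / np.linalg.norm(f_current)
    f2 = f_initial / np.linalg.norm(f_initial)
    return float(1.0 - f1 @ f2)

no_drift = less_forget_loss(np.array([2.0, 0.0]), np.array([5.0, 0.0]))
drift = less_forget_loss(np.array([1.0, 1.0]), np.array([1.0, 0.0]))
print(no_drift, drift)
```

Because only directions are compared, the loss penalizes a change in the feature's orientation rather than its magnitude, which is what preserves the old-class knowledge.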
In some embodiments, as shown in fig. 7, the old class image samples are input into both the first classification network and the second classification network, while the new class image samples are input only into the first classification network. The first classification network extracts the first sample features f_old from the old class image samples and f_new from the new class image samples; meanwhile, the second classification network extracts the sample features f*_old of the old class image samples. Normalizing f_old and f*_old yields normalized features with the same modulus length, which are then used to calculate the third loss term.
The method further comprises the following steps: determining a second weighting coefficient of the third loss term according to the number of classes contained in the plurality of classes and the number of new classes;
the S130 may include: and obtaining the network loss based on the product of the third loss term and the second weighting coefficient.
For example, C_n is the set of new classes and C_o is the set of old classes; |C_n| is the number of new classes and |C_o| is the number of old classes. The second weighting coefficient may be:

\lambda = \lambda_{base} \sqrt{ |C_n| / |C_o| }

wherein \lambda_{base} is a base coefficient whose specific value may be 5 or 10, and the like; the specific value may be set according to requirements, which is not limited herein.
In some embodiments, the first number is |N_o|; the second number is |N_n|; and |N| = |N_o| + |N_n|, wherein N_o is the set of old class image samples and N_n is the set of new class image samples. The third weighting coefficient obtained based on the first number and the second number may be the reciprocal of the total number |N| of training image samples, that is: 1/|N|.
In some embodiments, the network loss is obtained by combining the first, second, and third loss terms. For example, the network loss is calculated using equation (5):

L = \frac{1}{|N|} \sum_{x1 \in N} \left( L_{ce}(x1) + \lambda L_{dis}(x1) \right) + \frac{1}{|N_o|} \sum_{x2 \in N_o} L_{mr}(x2)    (5)

wherein L is the network loss; L_{ce}(x1) is the second loss term; L_{dis}(x1) is the third loss term; L_{mr}(x2) is the first loss term; \lambda is the second weighting coefficient; 1/|N_o| is the first weighting coefficient and 1/|N| is the third weighting coefficient; x1 is any one of the training image samples and x2 is any one of the old class image samples; N_o is the set of old class image samples; and N is the set of training image samples including both the old class image samples and the new class image samples.
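Equation (5) and its weighting coefficients can be sketched as follows; the per-sample loss values and lam_base=5.0 here are illustrative assumptions, with the individual loss terms assumed precomputed:

```python
import numpy as np

def total_loss(ce_all, dis_all, mr_old, n_new_classes, n_old_classes, lam_base=5.0):
    """Sketch of equation (5): the second and third loss terms are averaged
    over all |N| training samples (the third scaled by the adaptive second
    weighting coefficient), and the first loss term is averaged over the |N_o|
    old-class samples."""
    lam = lam_base * np.sqrt(n_new_classes / n_old_classes)  # second weighting coefficient
    ce = np.asarray(ce_all, dtype=float)
    dis = np.asarray(dis_all, dtype=float)
    # 1/|N| is the third weighting coefficient, 1/|N_o| the first
    return float(np.mean(ce + lam * dis) + np.sum(mr_old) / len(mr_old))

loss = total_loss(ce_all=[0.2, 0.4], dis_all=[0.1, 0.1], mr_old=[0.3],
                  n_new_classes=4, n_old_classes=1, lam_base=5.0)
print(loss)
```

The adaptive lambda grows with the ratio of new to old classes, so the less-forget term is emphasized exactly when the imbalance is largest.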
In the embodiments of the disclosure, the network loss is determined based on the normalized first normalized features of the plurality of training image samples, not directly on the first sample features extracted by the first classification network. This avoids the situation in which, because the modulus lengths of the sample features of the new class and old class image samples participating in the loss calculation differ, the network loss favors the new class image samples while neglecting or suppressing the old class image samples, so that the classification network after multiple iterations of optimization cannot accurately classify the classes contained in the old class image samples. The accuracy on the classes contained in the old class image samples, and the overall performance of the first classification network, are thereby improved.
As shown in fig. 8, some embodiments provide a classification method comprising:
s210: acquiring an image to be processed;
s220: and classifying the images to be processed by using a target classification network to obtain a classification result of the images to be processed, wherein the target classification network is obtained by training by using the incremental training method provided by any of the embodiments.
The classification is carried out by using the target classification network, so that the classification accuracy of each class can be ensured to be higher.
As shown in fig. 9, some embodiments provide a classification network training apparatus, including:
a first obtaining module 110, configured to perform feature extraction on a plurality of training image samples of multiple classes by using a first classification network, to obtain first sample features of the plurality of training image samples, where the plurality of training image samples include: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number;
a first normalization module 120, configured to perform normalization processing on the first sample features of the plurality of training image samples to obtain first normalized features of the plurality of training image samples;
a first determining module 130, configured to determine a network loss based on the first normalized features of the plurality of training image samples;
an adjusting module 140, configured to adjust a network parameter of the first classification network based on the network loss;
in some embodiments, the first obtaining module 110, the first normalizing module 120, the first determining module 130, and the adjusting module 140 may be program modules; the program modules can be executed by a processor to implement the functions of the modules.
In other embodiments, the first obtaining module 110, the first normalization module 120, the first determining module 130, and the adjusting module 140 may be combined software-hardware modules; the combined software-hardware module includes, but is not limited to, a programmable array; the programmable array includes, but is not limited to, a complex programmable logic device or a field programmable gate array.
In some embodiments, the apparatus further comprises:
the second normalization module is used for performing normalization processing on the classification weights of the multiple categories to obtain first normalization weights of the multiple categories;
the first determining module 130 is specifically configured to determine the network loss based on the first normalized features of the plurality of training image samples and the first normalized weights of the plurality of classes.
In some embodiments, the first determining module 130 is specifically configured to select, based on the first normalized feature of the old category image sample, K new categories corresponding to the old category image sample from the multiple categories; determining a first loss term in the network loss based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, wherein K is a positive integer not less than 2.
In some embodiments, the first determining module 130 is specifically configured to select, based on a similarity between the first normalized weight of each new category in the multiple categories and the first normalized feature of the old category image sample, K new categories corresponding to the old category image sample from the multiple categories.
In some embodiments, the first determining module 130 is specifically configured to determine the first loss term in the network loss based on a similarity between the first normalized feature of the old class image sample and the first normalized weight of each of the K new classes.
In some embodiments, the first determining module 130 is specifically configured to obtain a classification probability of each training image sample based on the first normalized feature of each training image sample in the plurality of training image samples and the first normalized weight of the prediction class of each training image sample obtained by the first classification network; and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
In some embodiments, the apparatus further comprises:
the first obtaining module is used for obtaining second sample characteristics of the old category image samples obtained by performing characteristic extraction on the old category image samples by using a second classification network, wherein the second classification network is an initial network of the incremental training;
the third normalization module is used for performing normalization processing on the second sample characteristics of the old type image samples to obtain second normalization characteristics of the old type image samples;
the first determining module 130 is specifically configured to obtain a third loss term of the network loss based on a similarity between the second normalized feature of the old-category image sample and the first normalized feature of the old-category image sample.
In some embodiments, the apparatus further comprises: a second determining module for determining a first weighting factor based on the first number and the second number; the first determining module 130 is specifically configured to obtain the network loss based on a product of the first weighting coefficient and the first loss term.
In some embodiments, the apparatus further comprises:
a third determining module, configured to determine a second weighting coefficient of the third loss term according to the number of classes included in the plurality of classes and the number of new classes;
the first determining module 130 is specifically configured to obtain the network loss based on a product of the third loss term and the second weighting coefficient.
As shown in fig. 10, some embodiments also provide a sorting apparatus comprising:
an obtaining module 210, configured to obtain an image to be processed;
the classification module 220, configured to perform classification processing on the image to be processed by using a target classification network to obtain a classification result of the image to be processed, where the target classification network is obtained by training with the incremental training method of the classification network provided in any of the foregoing technical solutions.
In some embodiments, the obtaining module 210 and the classifying module 220 may be program modules; the program modules can be executed by a processor to implement the functions of the modules.
In other embodiments, the obtaining module 210 and the classification module 220 may be combined software-hardware modules; the combined software-hardware module includes, but is not limited to, a programmable array; the programmable array includes, but is not limited to, a complex programmable logic device or a field programmable gate array.
Several specific examples are provided below in connection with any of the embodiments described above:
example 1:
incremental learning is an important challenge to be met when the deep learning related technology is actually deployed in a big data scene, and the incremental image classification problem is a type of problem which is basic and representative. The multi-class incremental classification task means that under the condition that the target classifiable class set is gradually expanded, a single classification model is incrementally learned, and high classification precision is achieved on all target classes.
In practice, limited by computational overhead, storage space, and data privacy, it is often infeasible to retrain with all of the data each time the classifiable set is expanded. On the other hand, in a longer incremental learning sequence, training only with the newly added class data causes a severe drop in performance on the original classes, which is the main difficulty of the multi-class incremental classification task.
Related work on the multi-class incremental classification task is relatively scarce. The previous best solutions mainly propose variants of the cross-entropy loss function based on knowledge distillation, and replace the fully connected layer commonly used at the end of a classification deep neural network with a nearest-class-feature classifier. These methods lack an in-depth analysis of and response to the difficulties of this task, and still perform unsatisfactorily on longer incremental sequences.
The objectives are: in a longer multi-class incremental sequence, to simultaneously improve the performance of a single classification network on all classes; under the constraint of limited storage space, to effectively retain and utilize samples related to the old classes; and in particular, to solve the problem of unequal sample numbers between the new and old classes in the multi-class incremental classification task.
From the perspective of the learned embedding features, the imbalance of data samples between the old and new classes is found to be the core difficulty of this task. The algorithm is used to train the multi-class incremental learning model. At a given stage in the incremental sequence, the following steps are followed. Training is performed using the small number of retained samples of the original classes together with the samples of the classes newly added at this incremental step (the number of newly added samples is much greater than the number of retained ones). During training, normalized sample features are used in a targeted manner, and besides the usual classification cross-entropy loss function, a loss function that maintains the overall multi-class feature structure and a loss function that strengthens the feature discrimination between the new and old classes are introduced to solve the imbalance between new and old samples. After a single model capable of classifying all classes expanded up to this stage is obtained, the data of the classes newly introduced at this stage is sampled and retained. Iterating these steps applies the method to longer incremental sequences.
The algorithm can be applied to various depth classification models. Generally, a network for a multi-class incremental classification model is considered to be composed of two modules: a feature extraction layer and a classification layer (i.e., feature vectors for each class). At each stage of incremental learning, due to the introduction of a new class, a corresponding classification layer also introduces a new parameter (i.e., adds a class feature vector of the new class).
From the perspective of the learned embedding features, the imbalance of data samples between the old and new classes is the core difficulty of this task. To solve this imbalance problem from different angles, the algorithm integrates three key techniques: normalization of sample features, a loss function that maintains the overall multi-class feature structure, and a loss function that strengthens the feature discrimination between the new and old classes.
The concepts of the three technical solutions are explained first, and then a stage in the incremental learning process is specifically explained, and the process only needs to be iterated repeatedly for incremental sequences of other lengths.
First, Normalization of sample characteristics (Cosine Normalization)
Due to the severe imbalance in the number of samples between the new and old classes, if no additional constraint is imposed, the modulus lengths of the class feature vectors of the new classes in the trained model will be significantly larger than those of the old classes. To solve this problem, the feature vectors of the samples and the class feature vectors are normalized; specifically, the classification layer of the model computes classification probabilities from the inner products of the normalized vectors, as in formula (2) above.
Second, loss function maintaining the overall multi-class feature structure (Less-Forget Constraint)
Due to the serious imbalance of the number of samples in the old and new categories, if no additional supervision is adopted, the overall spatial configuration among the class feature vectors of the old categories shifts in the increment process, so that the distinguishing precision in the old categories is reduced. To address this problem, a loss function is introduced that maintains a multi-class global feature structure. The specific implementation of the loss function is described in the fourth section below.
Thirdly, strengthening the loss function of the feature discrimination between the new and old categories (Inter-Class Separation)
Due to the serious imbalance of the number of samples of the new and old classes, if no additional supervision is adopted, in the trained model, the spatial orientation of the feature vector of a single old class is difficult to distinguish from the feature vector between the adjacent new classes, so that the new and old classes are mixed in classification. To solve this problem, a loss function is introduced that enhances the degree of feature discrimination between the old and new classes. The specific implementation of the loss function is described in the fourth section below.
Specific implementation of four, some incremental stages
Inputting: the model of the previous stage, the retained data of a small number of previous classes, and the data of the classes newly introduced at this stage
Outputting: a single classification model capable of effectively classifying the new and old classes at the same time
The method comprises the following specific steps:
For each training batch of training image samples, the feature extractor F* of the old model is used to obtain the normalized feature vectors f̄*_old of the old class samples; the feature extractor F of the new model is used to obtain the normalized feature vectors f̄_old and f̄_new of the old class samples and the new class samples, respectively.
Using the computed sample feature vectors and the class feature vectors of the classification layers in the new and old models, the following three network losses are calculated:

L_{ce}(x) = - \sum_{i} y_i \log p_i(x),  where  p_i(x) = \frac{ \exp( \eta \langle \bar{\theta}_i, \bar{f}(x) \rangle ) }{ \sum_{j} \exp( \eta \langle \bar{\theta}_j, \bar{f}(x) \rangle ) }

L_{dis}(x) = 1 - \langle \bar{f}^{*}(x), \bar{f}(x) \rangle

L_{mr}(x) = \sum_{k=1}^{K} \max\left( m - \langle \bar{\theta}(x), \bar{f}(x) \rangle + \langle \bar{\theta}^{k}, \bar{f}(x) \rangle,\ 0 \right)

The final loss function for training combines the three:

L = \frac{1}{|N|} \sum_{x \in N} \left( L_{ce}(x) + \lambda L_{dis}(x) \right) + \frac{1}{|N_o|} \sum_{x \in N_o} L_{mr}(x)
after the incremental training of the stage is completed, sampling and reserving the data of the newly introduced class to make the scale of each class consistent with the scale of each class reserved previously. The sampling scheme can use uniform random non-back sampling, and can be replaced by other sampling modes.
In yet another aspect of the embodiments of the present application, there is further provided a computer storage medium configured to store computer-executable instructions, which when executed perform the operations of the incremental training method and/or the classification method of the classification network provided in any one of the above embodiments.
In a further aspect of the embodiments of the present application, there is also provided a computer program product, which includes computer-executable instructions, and when the computer-executable instructions are executed on a device, a processor in the device executes instructions for implementing the incremental training method and/or the classification method of the classification network provided in any one of the above embodiments.
The embodiment of the application also provides an electronic device, which can be a mobile terminal, a Personal Computer (PC), a tablet computer, a server and the like. Referring now to fig. 11, a schematic diagram of an electronic device 1000 suitable for implementing a terminal device or a server according to an embodiment of the present application is shown: as shown in fig. 11, the electronic device 1000 includes one or more processors, communication section, and the like, for example: one or more Central Processing Units (CPUs) 1001 and/or one or more special purpose processors, which may serve as acceleration units 1013 and may include, but are not limited to, image processors (GPUs), FPGAs, DSPs, and other special purpose processors such as ASIC chips, etc., which may perform various appropriate actions and processes according to executable instructions stored in a Read Only Memory (ROM)1002 or loaded from a storage portion 1008 into a Random Access Memory (RAM) 1003. Communications portion 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
The processor may communicate with the read-only memory 1002 and/or the random access memory 1003 to execute executable instructions, connect with the communication section 1012 through the bus 1004, and communicate with other target devices through the communication section 1012, so as to complete operations corresponding to any method provided by the embodiments of the present application, for example: performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features of the plurality of training image samples; normalizing the first sample features to obtain first normalized features, and normalizing the classification weights of the plurality of classes to obtain first normalized weights; selecting K new classes corresponding to the old class image samples from the plurality of classes based on the first normalized features of the old class image samples; determining a first loss term based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, and determining a network loss based at least on the first loss term; and adjusting a network parameter of the first classification network based on the network loss.
In addition, the RAM 1003 may also store various programs and data necessary for the operation of the device. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other via a bus 1004. In the case where there is a RAM 1003, the ROM 1002 is an optional module. The RAM 1003 stores executable instructions, or writes executable instructions into the ROM 1002 at runtime, and the executable instructions cause the central processing unit 1001 to execute operations corresponding to the above-described method. An input/output (I/O) interface 1005 is also connected to the bus 1004. The communication section 1012 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) respectively connected to the bus link.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.
It should be noted that the architecture shown in fig. 11 is only an optional implementation; in specific practice, the number and types of the components in fig. 11 may be selected, reduced, increased, or replaced according to actual needs. The different functional components may be arranged separately or in an integrated manner; for example, the acceleration unit 1013 and the CPU 1001 may be arranged separately, or the acceleration unit 1013 may be integrated on the CPU 1001; likewise, the communication section may be arranged separately, or may be integrated on the CPU 1001 or the acceleration unit 1013; and so on. These alternative embodiments all fall within the protection scope of the present disclosure.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for performing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features; normalizing the first sample features and the classification weights of the plurality of classes; selecting K new classes corresponding to the old class image samples based on the first normalized features of the old class image samples; determining a network loss based at least on a first loss term determined from the first normalized features of the old class image samples and the first normalized weights of the K new classes; and adjusting a network parameter of the first classification network based on the network loss. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, the above-described functions defined in the method of the present application are performed.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, all functional units in the embodiments of the present application may be integrated into one processing module, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Technical features disclosed in any embodiment of the present application may be combined arbitrarily to form a new method embodiment or an apparatus embodiment without conflict.
The method embodiments disclosed in any embodiment of the present application can be combined arbitrarily to form a new method embodiment without conflict.
The device embodiments disclosed in any embodiment of the present application can be combined arbitrarily to form a new device embodiment without conflict.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. An incremental training method of an image classification network comprises the following steps:
performing feature extraction on a plurality of training image samples of a plurality of classes by using a first classification network to obtain first sample features of the plurality of training image samples, wherein the plurality of training image samples comprise: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number;
normalizing the first sample characteristics of the training image samples to obtain first normalized characteristics of the training image samples;
carrying out normalization processing on the classification weights of a plurality of categories to obtain first normalization weights of the plurality of categories;
selecting K new categories corresponding to the old category image samples from the multiple categories based on the first normalized features of the old category image samples; wherein K is a positive integer not less than 2;
determining a first loss term based on first normalized features of the old class image samples and first normalized weights of the K new classes;
determining a network loss based at least on the first loss term;
adjusting a network parameter of the first classification network based on the network loss.
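The steps of claim 1 can be sketched as follows. The margin-ranking form of the first loss term is an assumption (one plausible reading of "determining a first loss term based on the first normalized features and the first normalized weights of the K new classes"), and all names and the `margin` hyperparameter are illustrative:

```python
import numpy as np

def inter_class_separation_loss(old_feats, old_labels, class_weights,
                                new_class_ids, k=2, margin=0.5):
    """Hedged sketch of claims 1-3: L2-normalize the old-class sample
    features and all classification weights, select for each old-class
    sample the K new classes whose normalized weights are most similar to
    its normalized feature, and penalize those similarities with a
    margin-ranking term (the margin form is assumed, not stated)."""
    f = old_feats / np.linalg.norm(old_feats, axis=1, keepdims=True)       # first normalized features
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)  # first normalized weights
    pos = np.sum(f * w[old_labels], axis=1)        # similarity to the sample's own old class
    neg = f @ w[new_class_ids].T                   # similarity to every new class
    hard = np.sort(neg, axis=1)[:, -k:]            # the K most confusable new classes
    loss = np.maximum(0.0, margin - pos[:, None] + hard)
    return float(loss.mean())
```

Selecting only the K hardest new classes (rather than all of them) concentrates the penalty on the new classes most likely to absorb old-class samples, which is the motivation suggested by the claim's top-K selection.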
2. The method according to claim 1, wherein the selecting K new classes corresponding to the old class image samples from the plurality of classes based on the first normalized features of the old class image samples comprises:
and selecting K new categories corresponding to the old category image samples from the multiple categories based on the similarity between the first normalized weight of each new category in the multiple categories and the first normalized feature of the old category image samples.
3. The method according to claim 1 or 2, wherein determining a first loss term based on the first normalized features of the old class of image samples and the first normalized weights of the K new classes comprises:
determining the first loss term based on a similarity between the first normalized features of the old class image samples and the first normalized weight for each of the K new classes.
4. The method of claim 3, wherein determining the network loss based at least on the first loss term comprises:
determining a first weighting factor based on the first number;
and obtaining the network loss based on the product of the first weighting coefficient and the first loss term.
5. The method according to any one of claims 1 to 4, further comprising:
obtaining a classification probability of each training image sample based on a first normalization feature of each training image sample in the plurality of training image samples and a first normalization weight of a prediction class of each training image sample obtained by the first classification network;
and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
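The second loss term of claim 5 can be sketched as a cross-entropy over cosine logits; the scale factor `eta` is an assumption (the claim only states that the classification probability comes from the normalized feature and the normalized weight), and the function name is illustrative:

```python
import numpy as np

def cosine_softmax_ce(feats, labels, class_weights, eta=1.0):
    """Hedged sketch of claim 5: classification probability from the first
    normalized features and first normalized class weights (cosine logits,
    optionally scaled by `eta` — an assumed hyperparameter), followed by
    cross-entropy against the labeled class."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    logits = eta * (f @ w.T)                        # cosine similarities as logits
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -float(np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))
```

Because cosine logits are bounded in [-1, 1], a scale larger than 1 is usually needed in practice to make the softmax sufficiently peaked.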
6. The method according to any one of claims 1 to 4, further comprising:
acquiring second sample characteristics of the old category image samples obtained by performing characteristic extraction on the old category image samples by using a second classification network, wherein the second classification network is an initial network of the incremental training;
normalizing the second sample characteristic of the old type image sample to obtain a second normalized characteristic of the old type image sample;
and obtaining a third loss term of the network loss based on the similarity between the second normalized feature of the old class image sample and the first normalized feature of the old class image sample.
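The third loss term of claim 6 compares normalized features of the same old-class samples under the two networks; the `1 - cosine` form below is an assumption consistent with the claim's "similarity" wording, not the patent's definitive formula:

```python
import numpy as np

def less_forget_loss(feats_first_net, feats_second_net):
    """Hedged sketch of claim 6: encourage the first normalized features
    (network being trained) to stay close to the second normalized
    features (the pre-update initial network), via 1 - cosine similarity
    averaged over the old-class samples."""
    f1 = feats_first_net / np.linalg.norm(feats_first_net, axis=1, keepdims=True)
    f2 = feats_second_net / np.linalg.norm(feats_second_net, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(f1 * f2, axis=1)))
```

Because both feature sets are normalized first, the term is invariant to feature magnitude and only constrains feature direction, so the new network can rescale features freely without being penalized.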
7. The method of claim 6, further comprising:
determining a second weighting coefficient of the third loss term according to the number of classes contained in the plurality of classes and the number of new classes;
and obtaining the network loss based on the product of the third loss term and the second weighting coefficient.
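One way to realize the second weighting coefficient of claim 7 is a square-root ratio of old to new class counts; the claim only states that the coefficient depends on the total and new class counts, so the sqrt form and `base` factor below are assumptions:

```python
import math

def adaptive_distill_weight(num_total_classes, num_new_classes, base=5.0):
    """Hedged sketch of claim 7: a second weighting coefficient for the
    third (feature-preserving) loss term, growing as the reserved old
    classes outnumber the newly introduced ones. The sqrt shape and the
    `base` constant are illustrative assumptions."""
    num_old = num_total_classes - num_new_classes
    return base * math.sqrt(num_old / num_new_classes)
```

Under this choice, later incremental stages (many accumulated old classes, few new ones) weight the preservation term more heavily, which matches the intent of scaling by the class counts.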
8. An image classification method, comprising:
acquiring an image to be processed;
and classifying the images to be processed by using a target classification network to obtain a classification result of the images to be processed, wherein the target classification network is obtained by training by using the increment training method according to any one of claims 1 to 7.
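Inference with the trained target network (claim 8) reduces to scoring the image feature against each class weight; the cosine-similarity argmax below is an assumption consistent with the normalized-feature training above, and the names are illustrative:

```python
import numpy as np

def classify(image_feat, class_weights):
    """Hedged sketch of claim 8: normalize the feature of the image to be
    processed and every class weight, then return the index of the most
    similar class as the classification result."""
    f = image_feat / np.linalg.norm(image_feat)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return int(np.argmax(w @ f))   # class with the highest cosine score
```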
9. An incremental training apparatus for an image classification network, comprising:
a first obtaining module, configured to perform feature extraction on multiple training image samples of multiple classes by using a first classification network, so as to obtain first sample features of the multiple training image samples, where the multiple training image samples include: a first number of old class image samples and a second number of new class image samples, the second number being greater than the first number;
the first normalization module is used for performing normalization processing on the first sample characteristics of the training image samples to obtain first normalization characteristics of the training image samples;
the second normalization module is used for performing normalization processing on the classification weights of the multiple categories to obtain first normalization weights of the multiple categories;
a first determining module, configured to select K new categories corresponding to the old category image samples from the multiple categories based on the first normalized features of the old category image samples, and determine a first loss term in network loss based on the first normalized features of the old class image samples and the first normalized weights of the K new classes, wherein K is a positive integer not less than 2;
the first determining module is further configured to determine a network loss according to at least the first loss term;
an adjusting module, configured to adjust a network parameter of the first classification network based on the network loss.
10. The apparatus according to claim 9, wherein the first determining module is specifically configured to select K new categories corresponding to the old category image sample from the plurality of categories based on a similarity between the first normalized weight of each new category in the plurality of categories and the first normalized feature of the old category image sample.
11. The apparatus of claim 9 or 10,
the first determining module is specifically configured to determine the first loss term based on a similarity between a first normalized feature of the old category image sample and a first normalized weight of each of the K new categories.
12. The apparatus of claim 11, further comprising:
a second determining module for determining a first weighting factor based on the first number;
the first determining module is specifically configured to obtain the network loss based on a product of the first weighting coefficient and the first loss term.
13. The apparatus according to any one of claims 9 to 12, wherein the first determining module is specifically configured to obtain the classification probability of each of the plurality of training image samples based on the first normalized feature of each of the plurality of training image samples and the first normalized weight of the predicted class of each of the plurality of training image samples obtained by the first classification network; and obtaining a second loss item of the network loss based on the classification probability of each training image sample in the plurality of training image samples and the labeling class information of each training image sample.
14. The apparatus of any one of claims 9 to 12, further comprising:
a second obtaining module, configured to obtain second sample features of the old category image samples obtained by performing feature extraction on the old category image samples by using a second classification network, wherein the second classification network is an initial network of the incremental training;
the third normalization module is used for performing normalization processing on the second sample characteristics of the old type image samples to obtain second normalization characteristics of the old type image samples;
the first determining module is specifically configured to obtain a third loss term of the network loss based on a similarity between the second normalized feature of the old-category image sample and the first normalized feature of the old-category image sample.
15. The apparatus of claim 14, further comprising:
a third determining module, configured to determine a second weighting coefficient of the third loss term according to the number of classes included in the plurality of classes and the number of new classes;
the first determining module is specifically configured to obtain the network loss based on a product of the third loss term and the second weighting coefficient.
16. An image classification apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
a classification module, configured to perform classification processing on the image to be processed by using a target classification network to obtain a classification result of the image to be processed, where the target classification network is obtained by using the incremental training method according to any one of claims 1 to 7.
17. An electronic device, comprising:
a memory;
a processor coupled to the memory for implementing the method provided by any of claims 1 to 7 or 8 by executing computer-executable instructions stored on the memory.
18. A computer storage medium having stored thereon computer-executable instructions which, when executed, implement the method provided in any one of claims 1 to 7 or 8.
CN201910472078.6A 2019-05-31 2019-05-31 Incremental training method, classification method and device, equipment and medium of classification network Active CN110210560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472078.6A CN110210560B (en) 2019-05-31 2019-05-31 Incremental training method, classification method and device, equipment and medium of classification network


Publications (2)

Publication Number Publication Date
CN110210560A CN110210560A (en) 2019-09-06
CN110210560B true CN110210560B (en) 2021-11-30

Family

ID=67790094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472078.6A Active CN110210560B (en) 2019-05-31 2019-05-31 Incremental training method, classification method and device, equipment and medium of classification network

Country Status (1)

Country Link
CN (1) CN110210560B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807493A (en) * 2019-11-06 2020-02-18 上海眼控科技股份有限公司 Optimization method and equipment of vehicle classification model
CN111178446B (en) 2019-12-31 2023-08-04 歌尔股份有限公司 Optimization method and device of target classification model based on neural network
US20210209514A1 (en) * 2020-01-06 2021-07-08 Electronics And Telecommunications Research Institute Machine learning method for incremental learning and computing device for performing the machine learning method
CN111275183B (en) * 2020-01-14 2023-06-16 北京迈格威科技有限公司 Visual task processing method, device and electronic system
CN111209977B (en) * 2020-01-16 2024-01-05 北京百度网讯科技有限公司 Classification model training and using method, device, equipment and medium
CN111414936B (en) * 2020-02-24 2023-08-18 北京迈格威科技有限公司 Determination method, image detection method, device, equipment and medium of classification network
CN111291841B (en) * 2020-05-13 2020-08-21 腾讯科技(深圳)有限公司 Image recognition model training method and device, computer equipment and storage medium
CN114021609B (en) * 2020-07-16 2025-09-05 深圳云天励飞技术有限公司 Vehicle attribute recognition model training method and device, recognition method and device
CN111950411B (en) * 2020-07-31 2021-12-28 上海商汤智能科技有限公司 Model determination method and related device
CN112232397B (en) * 2020-09-30 2024-12-20 上海眼控科技股份有限公司 Knowledge distillation method, device and computer equipment for image classification model
CN112559784B (en) * 2020-11-02 2023-07-04 浙江智慧视频安防创新中心有限公司 Image classification method and system based on incremental learning
CN112070777B (en) * 2020-11-10 2021-10-08 中南大学湘雅医院 A method and device for organ-at-risk segmentation in multiple scenarios based on incremental learning
CN112465042B (en) * 2020-12-02 2023-10-24 中国联合网络通信集团有限公司 Method and device for generating classified network model
CN112926621B (en) * 2021-01-21 2024-05-10 百度在线网络技术(北京)有限公司 Data labeling method, device, electronic equipment and storage medium
CN113139612A (en) * 2021-05-07 2021-07-20 上海商汤临港智能科技有限公司 Image classification method, training method of classification network and related products
CN113269255B (en) * 2021-05-26 2025-09-05 全芯智造技术有限公司 Method, apparatus, and computer-readable storage medium for detecting defects
CN113378911B (en) * 2021-06-08 2022-08-26 北京百度网讯科技有限公司 Image classification model training method, image classification method and related device
CN114065858B (en) * 2021-11-17 2025-09-05 杭州海康威视数字技术股份有限公司 Model training method, device, electronic device and storage medium
CN114913001B (en) * 2022-05-18 2025-02-28 中山大学 A fine-grained vehicle model recognition method and system with adaptively expanded categories
CN114972883B (en) * 2022-06-17 2024-05-10 平安科技(深圳)有限公司 Target detection sample generation method based on artificial intelligence and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809475A (en) * 2015-05-06 2015-07-29 西安电子科技大学 Multi-labeled scene classification method based on incremental linear discriminant analysis
US9760990B2 (en) * 2014-12-14 2017-09-12 International Business Machines Corporation Cloud-based infrastructure for feedback-driven training and image recognition
WO2018005413A1 (en) * 2016-06-30 2018-01-04 Konica Minolta Laboratory U.S.A., Inc. Method and system for cell annotation with adaptive incremental learning
CN108960283A (en) * 2018-05-30 2018-12-07 北京市商汤科技开发有限公司 Classification task incremental processing method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492765A (en) * 2018-11-01 2019-03-19 浙江工业大学 A kind of image Increment Learning Algorithm based on migration models
CN113688933B (en) * 2019-01-18 2024-05-24 北京市商汤科技开发有限公司 Classification network training method, classification method and device and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dynamic few-shot visual learning without forgetting; S. Gidaris et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-06-23; Sections 2-4 of the text *
Incremental classifier learning with generative adversarial networks; Yue Wu et al.; arXiv; 2018-02-02; Sections 2-3 and 5 of the text, figs. 1-2 *

Also Published As

Publication number Publication date
CN110210560A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110210560B (en) Incremental training method, classification method and device, equipment and medium of classification network
CN110288030B (en) Image identification method, device and equipment based on lightweight network model
CN110362677B (en) Text data category identification method and device, storage medium and computer equipment
CN108985190B (en) Target identification method and device, electronic equipment and storage medium
CN112784954B (en) Method and device for determining neural network
CN112132279B (en) Convolutional neural network model compression method, device, equipment and storage medium
CN108280455B (en) Human body key point detection method and apparatus, electronic device, program, and medium
CN112836756B (en) Image recognition model training method, system and computer equipment
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111161314B (en) Target object position area determination method and device, electronic equipment and storage medium
WO2010043954A1 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
CN109657083B (en) Method and device for establishing textile picture feature library
CN113762382B (en) Model training and scene recognition method, device, equipment and medium
CN110298394B (en) Image recognition method and related device
CN112200862B (en) Training method of target detection model, target detection method and device
CN116052218B (en) A Pedestrian Re-identification Method
CN117975204B (en) A model training method, defect detection method and related device
CN113971737A (en) Object recognition methods, electronic devices, media and program products for use in robots
CN106033613B (en) Target tracking method and device
CN103927529B (en) The preparation method and application process, system of a kind of final classification device
CN110135428B (en) Image segmentation processing method and device
CN113392241B (en) Method, device, medium and electronic equipment for identifying definition of well logging image
Lipianina-Honcharenko et al. Comparison of ResNet, EfficientNet, and Xception architectures for deepfake detection
CN112446428B (en) An image data processing method and device
CN115713663B (en) Training method of steganalyzer based on automatic virtual data enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant