CN112101217B - Person Re-identification Method Based on Semi-supervised Learning - Google Patents

Person Re-identification Method Based on Semi-supervised Learning

Info

Publication number: CN112101217B
Authority: CN (China)
Prior art keywords: samples, sample, pedestrian, new, labeled
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010970306.5A
Other languages: Chinese (zh)
Other versions: CN112101217A
Inventors: 葛永新, 高志顺
Current Assignee: Zhenjiang Qidi Digital World Technology Co ltd (the listed assignees may be inaccurate)
Original Assignee: Zhenjiang Qidi Digital World Technology Co ltd
Application filed by: Zhenjiang Qidi Digital World Technology Co ltd
Priority: CN202010970306.5A
Published as: CN112101217A (application), CN112101217B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G06F18/21328 Rendering the within-class scatter matrix non-singular involving subspace restrictions, e.g. nullspace techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method based on semi-supervised learning, comprising the following steps. S100: learn a projection matrix U ∈ R^{d×c} that projects the original d-dimensional feature space into a c-dimensional subspace, so that U^T X ∈ R^{c×N} satisfies the following in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is smaller, and the Euclidean distance between sample pairs from different pedestrians is larger; samples from the same pedestrian are defined as same-class samples, and samples from different pedestrians are defined as different-class samples. S200: project a new sample into the new subspace using the projection matrix U ∈ R^{d×c} to obtain a predicted sample sequence, sorted in ascending order of the Euclidean distance between the new sample and the samples in the training sample set. The method makes full use of the accurately labeled samples: the contrastive loss function constrains positive sample pairs while also exploiting negative sample pairs, so recognition is fast and accurate.

Description

Person Re-identification Method Based on Semi-supervised Learning

Technical Field

The present invention relates to the technical field of pedestrian re-identification, and in particular to a pedestrian re-identification method based on semi-supervised learning.

Background

Person re-identification, also called pedestrian re-identification, uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given an image of a pedestrian from one surveillance camera, retrieve images of that pedestrian captured by other devices. It aims to compensate for the limited field of view of fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security, and related fields.

Although computer vision practitioners have in recent years proposed many algorithms for pedestrian re-identification from different perspectives and have steadily improved recognition rates on public datasets, pedestrian re-identification remains a highly challenging task because of several practical factors.

At present, semi-supervised learning is commonly used for pedestrian re-identification. The process is roughly as follows: first, the unlabeled samples are labeled automatically; second, the labeled samples and the automatically labeled samples are trained on together to optimize the model and improve its discriminative ability. However, both the automatic labeling of unlabeled samples and the use of the resulting labels have problems:

① Existing methods all label unlabeled samples by running the K-Nearest Neighbor (KNN) algorithm in the new space obtained by the mapping. If the learned space is not discriminative enough, the automatic labels will contain large errors. Training on data with large labeling errors may fail to improve generalization despite the additional training samples, and may instead make the model less discriminative the longer it trains.

② During training, only the positive sample pairs in the training set are constrained and the negative sample pairs are ignored, so the training samples are not fully utilized.

Summary of the Invention

In view of the above problems in the prior art, the technical problem to be solved by the present invention is that existing semi-supervised methods for pedestrian re-identification produce automatic labels whose errors depend heavily on the learned space, and do not make full use of the training samples.

To solve the above technical problems, the present invention adopts the following technical solution: a pedestrian re-identification method based on semi-supervised learning, comprising the following steps:

S100: Learn a projection matrix U ∈ R^{d×c} that projects the original d-dimensional feature space into a c-dimensional subspace, so that U^T X ∈ R^{c×N} satisfies the following in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is smaller, and the Euclidean distance between sample pairs from different pedestrians is larger; samples from the same pedestrian are defined as same-class samples, and samples from different pedestrians are defined as different-class samples;

S200: Project a new sample into the new subspace using the projection matrix U ∈ R^{d×c} to obtain a predicted sample sequence, sorted in ascending order of the Euclidean distance between the new sample and the samples in the training sample set.

Preferably, the method for learning the projection matrix U ∈ R^{d×c} in S100 is specifically:

S110: Establish a training sample set containing a plurality of samples, both labeled and unlabeled; among the labeled samples, samples of the same pedestrian carry the same label;

Let X = [X_L, X_U] ∈ R^{d×N} denote all training samples, where N is the total number of images in the training set, d is the length of the feature vector, X_L ∈ R^{d×N_L} denotes the N_L labeled samples, and X_U ∈ R^{d×N_U} denotes the N_U unlabeled samples;

S120: Establish the objective function, which combines three terms weighted by balance coefficients α, λ > 0: a regression function L(U), a weighted regression function, and a regularization constraint Ω(U);

S130: The labeled-sample loss function is a contrastive loss. For each of the N_P sampled pairs, if the two samples come from the same pedestrian, the Euclidean distance d_n between them in the new projection space should be as small as possible, close to 0; otherwise, d_n should be at least greater than a preset threshold margin > 0. A pair that violates these conditions incurs a loss;

S140: Label the unlabeled samples using the K mutual nearest neighbor method. The loss function for the unlabeled samples weights each pair (x_i, x_j) by W_ij: if U^T x_i and U^T x_j are K mutual nearest neighbors and x_i and x_j come from different cameras, W_ij is nonzero; otherwise W_ij = 0 (8);

After the unlabeled samples are labeled, the labeled sample pairs are used to further constrain the current subspace, with each constraint weighted by the cosine distance between the two samples in the new projection space;

S150: Regularization term: the L2,1 norm is used to constrain the projection matrix U:

Ω(U) = ||U||_{2,1} (4).

Preferably, the labeled-sample loss function of S130 is the contrastive loss described above, computed from the pair distances d_n.

Preferably, the N_P sample pairs in S130 are sampled with a strategy that maximizes the top-k recognition rate: for each image, all samples among its k nearest neighbors are sampled.

Preferably, the K mutual nearest neighbor method used in S140 to label the unlabeled samples is as follows:

The K nearest neighbors N(x, k) of a sample x are defined as:

N(x, k) = {x_1, x_2, ..., x_k}, |N(x, k)| = k (5);

where |·| denotes the number of samples in a set. The K mutual nearest neighbors R(x, k) are then defined as:

R(x, k) = {x_i | (x_i ∈ N(x, k)) ∧ (x ∈ N(x_i, k))} (6).

Compared with the prior art, the present invention has at least the following advantages:

(1) The present invention uses K mutual nearest neighbors, which makes the automatic labels assigned to unlabeled samples more reliable.

(2) Accurately annotated labeled samples are fully utilized. The contrastive loss function commonly used to train deep neural networks constrains positive sample pairs while also making full use of negative sample pairs. Note that the labeled-sample loss function can be replaced by any loss used for recognition or classification.

(3) So that the model can later be migrated easily to a deep model, training is end-to-end and uses stochastic gradient descent. A batch generation strategy that maximizes the top-k recognition rate is proposed, which addresses the slow convergence of pairwise training under random batches and helps prevent overfitting.

Brief Description of the Drawings

Figure 1: K mutual nearest neighbor sampling strategy for pedestrian re-identification. First row: a query image and its 10 nearest neighbor images, where P1-P4 are positive samples and N1-N6 are negative samples. Second row: each pair of columns shows the 10 nearest neighbors of the corresponding first-row image. Thick-lined rectangles without rounded corners mark the query image, and thin-lined rectangles with rounded corners mark positive samples.

Figure 2: The negative sample closest to the query image is the hardest negative sample; the first positive sample whose distance is just smaller than that of the hardest negative is the moderate positive sample; the samples inside the box are those selected by the proposed sampling strategy.

Figure 3: Sampling of appropriate positive samples.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

Referring to Figures 1-3, the pedestrian re-identification method based on semi-supervised learning comprises the following steps:

S100: Learn a projection matrix U ∈ R^{d×c} that projects the original d-dimensional feature space into a c-dimensional subspace, so that U^T X ∈ R^{c×N} satisfies the following in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is smaller, and the Euclidean distance between sample pairs from different pedestrians is larger; samples from the same pedestrian are defined as same-class samples, and samples from different pedestrians are defined as different-class samples.

The method for learning the projection matrix U ∈ R^{d×c} is specifically:

S110: Establish a training sample set containing a plurality of samples, both labeled and unlabeled; among the labeled samples, samples of the same pedestrian carry the same label;

Let X = [X_L, X_U] ∈ R^{d×N} denote all training samples, where N is the total number of images in the training set, d is the length of the feature vector, X_L ∈ R^{d×N_L} denotes the N_L labeled samples, and X_U ∈ R^{d×N_U} denotes the N_U unlabeled samples;
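The data layout in S110 and the projection in S100 can be illustrated with a minimal NumPy sketch. The toy sizes and the random stand-in for U are assumptions made only for illustration; in the method, U is learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 8, 3          # original and subspace dimensions (toy values, assumed)
n_l, n_u = 6, 4      # numbers of labeled and unlabeled samples (toy values)

X_L = rng.normal(size=(d, n_l))   # labeled samples, one column per image
X_U = rng.normal(size=(d, n_u))   # unlabeled samples, one column per image
X = np.hstack([X_L, X_U])         # X = [X_L, X_U], shape (d, N)

U = rng.normal(size=(d, c))       # random stand-in for the learned projection
Z = U.T @ X                       # projected samples U^T X, shape (c, N)
```

Storing one sample per column matches the X ∈ R^{d×N} convention of the text, so U^T X directly yields one c-dimensional column per image.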

S120: Establish the objective function, which combines three terms weighted by balance coefficients α, λ > 0. L(U) is a regression function whose purpose is to ensure that, in the new space, labeled sample pairs with the same label are closer together and pairs with different labels are farther apart. The second term is a weighted regression function that uses the unlabeled samples to improve the discriminative power of the model. Ω(U) is a regularization constraint that selects the more discriminative features of the original feature space and avoids overfitting;

S130: Labeled-sample loss function: the purpose of this term is to make full use of the label information of the labeled samples. To constrain positive and negative sample pairs simultaneously, a contrastive loss function is adopted.

For each of the N_P sampled pairs, if the two samples come from the same pedestrian, the Euclidean distance d_n between them in the new projection space should be as small as possible, close to 0; otherwise, d_n should be at least greater than a preset threshold margin > 0. A pair that violates these conditions incurs a loss;
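The exact loss formula is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard contrastive loss form: squared distance for same-pedestrian pairs and a squared hinge on `margin - d_n` for different-pedestrian pairs, averaged over the pairs. The function name and argument layout are hypothetical.

```python
import numpy as np

def contrastive_loss(U, X_a, X_b, same, margin=0.5):
    """Average contrastive loss over sampled pairs (assumed standard form).

    X_a, X_b: (d, n_pairs) arrays holding the two members of each pair.
    same:     (n_pairs,) booleans, True when a pair shows the same pedestrian.
    """
    same = np.asarray(same, dtype=bool)
    # Euclidean distance d_n between the two projected samples of each pair.
    dist = np.linalg.norm(U.T @ X_a - U.T @ X_b, axis=0)
    pos = same * dist ** 2                                  # pull positives together
    neg = (~same) * np.maximum(margin - dist, 0.0) ** 2     # push negatives past margin
    return 0.5 * float(np.mean(pos + neg))
```

With this form, a negative pair already farther apart than `margin` contributes nothing, which matches the statement that a loss arises only when the conditions are violated.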

S140: Label the unlabeled samples using the K mutual nearest neighbor method, and define the loss function for the unlabeled samples.

To use the discriminative information in the unlabeled samples effectively and to reduce the harm that incorrect labels do to the model, K mutual nearest neighbors are used instead of K nearest neighbors to label the unlabeled samples, and this term constrains only positive sample pairs. The loss weights each pair (x_i, x_j) by W_ij: if U^T x_i and U^T x_j are K mutual nearest neighbors and x_i and x_j come from different cameras, W_ij is nonzero; otherwise W_ij = 0 (8).

The rationale for this term is that, in the learned discriminative subspace, a pair of K mutual nearest neighbors very probably comes from the same pedestrian. After the unlabeled samples are labeled, the labeled pairs are used to further constrain the current subspace, with each constraint weighted by the cosine distance between the two samples in the new projection space.
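The formula for the nonzero weights W_ij is not recoverable from this text, so the following is only a sketch under stated assumptions: the weight is taken to be the cosine similarity of the two projected samples (so that closer pairs are constrained more strongly), and the loss is assumed to be a weighted sum of squared distances. Function names are hypothetical.

```python
import numpy as np

def _sq_dists(Z):
    """Pairwise squared Euclidean distances between the columns of Z."""
    sq = np.sum(Z ** 2, axis=0)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z, 0.0)

def unlabeled_pair_weights(Z, cams, k):
    """W_ij > 0 only for K mutual nearest neighbors seen by different cameras.

    Z: (c, N) projected unlabeled samples; cams: (N,) camera ids; k < N.
    The cosine-similarity weight is an assumption, not the source formula.
    """
    n = Z.shape[1]
    D = _sq_dists(Z)
    np.fill_diagonal(D, np.inf)                 # a sample is not its own neighbor
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], np.argsort(D, axis=1)[:, :k]] = True
    mutual = knn & knn.T                        # i in N(j, k) and j in N(i, k)
    cross_cam = cams[:, None] != cams[None, :]
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=0, keepdims=True), 1e-12)
    return np.where(mutual & cross_cam, Zn.T @ Zn, 0.0)

def unlabeled_loss(Z, W):
    """Assumed form: 0.5 * sum_ij W_ij * ||z_i - z_j||^2."""
    return 0.5 * float(np.sum(W * _sq_dists(Z)))
```

Only pairs treated as positives receive a weight, which reflects the statement that this term constrains positive sample pairs only.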

S150: Regularization term: the regularization term is added to avoid overfitting while making the learned projection matrix sparser. The L2,1 norm is used to constrain the projection matrix U:

Ω(U) = ||U||_{2,1} (4).
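For reference, the L2,1 norm is the sum of the Euclidean norms of the rows of U. Since each row of U ∈ R^{d×c} corresponds to one original feature, penalizing it drives whole rows toward zero, which is how the term selects features. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def l21_norm(U):
    """||U||_{2,1}: sum over rows of the L2 norm of each row."""
    return float(np.sum(np.linalg.norm(U, axis=1)))

U_example = np.array([[3.0, 4.0],
                      [0.0, 0.0],   # an all-zero row: this feature is dropped
                      [1.0, 0.0]])
omega = l21_norm(U_example)         # 5 + 0 + 1
```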

As an improvement, the N_P sample pairs in S130 are sampled with a strategy that maximizes the top-k recognition rate: for each image, all samples among its k nearest neighbors are sampled. This avoids overfitting while making maximal use of the discriminative information in the labeled samples.

When stochastic gradient descent is used for optimization, all samples must be fed to the model in batches. Samples are drawn at random: each batch randomly selects a small number of classes and two images per class. When the loss is computed, every sample pair that can be formed from the images in the batch takes part. Although this evaluates many pairs at once, the randomness of class sampling means the resulting update direction may not be the direction in which the objective decreases fastest. After each optimization step, the distances between all samples under the current model are recomputed. To make the objective decrease faster, only the single hardest negative pair under the current model is selected for each image, as shown in Figure 2.
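The random batch scheme described above (a few random classes, two images per class, all pairs within the batch) might be sketched as follows. The function name is hypothetical, and the sketch assumes every class has at least two images.

```python
import numpy as np
from itertools import combinations

def random_batch_pairs(labels, n_classes, rng):
    """Pick n_classes identities, two images each, and form all pairs.

    Returns a list of (i, j, same) tuples over the chosen image indices.
    """
    classes = rng.choice(np.unique(labels), size=n_classes, replace=False)
    idx = []
    for cls in classes:
        members = np.flatnonzero(labels == cls)
        idx.extend(rng.choice(members, size=2, replace=False).tolist())
    return [(i, j, bool(labels[i] == labels[j])) for i, j in combinations(idx, 2)]

labels_demo = np.array([0, 0, 1, 1, 2, 2, 3, 3])
pairs_demo = random_batch_pairs(labels_demo, 3, np.random.default_rng(0))
```

With three classes and two images each, a batch holds six images and therefore 15 pairs, of which only three are positive; most pairs are random negatives, which is consistent with the slow convergence the text attributes to random batches.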

Note that some positive pairs have very large intra-class differences because of drastic appearance changes; training on such pairs is very likely to cause overfitting, as shown in Figure 3. To avoid this, one moderate positive sample is drawn for each image, as shown in Figure 2: the first positive sample whose distance is just smaller than that of the hardest negative. To use as much of the information in the labeled samples as possible, a sampling strategy that maximizes the top-k recognition rate is proposed: as shown in Figure 2, for each image all samples among its k nearest neighbors are sampled. This avoids overfitting while making maximal use of the discriminative information in the labeled samples.
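One possible reading of these sampling rules is sketched below. It assumes a precomputed distance matrix D under the current model, interprets the moderate positive as the sample ranked immediately before the hardest negative, and uses hypothetical function names.

```python
import numpy as np

def hardest_negative_and_moderate_positive(dist_row, labels, i):
    """For query i, return (hardest negative, moderate positive) indices.

    The hardest negative is the closest sample with a different label; the
    moderate positive is the sample ranked just before it (None if absent).
    """
    order = np.argsort(dist_row)
    order = order[order != i]                 # drop the query itself
    is_neg = labels[order] != labels[i]
    if not is_neg.any():
        return None, None
    r = int(np.argmax(is_neg))                # rank of the closest negative
    return int(order[r]), (int(order[r - 1]) if r > 0 else None)

def topk_pairs(D, labels, k):
    """Top-k strategy: pair every image with all of its k nearest neighbors."""
    pairs = []
    for i in range(D.shape[0]):
        order = np.argsort(D[i])
        order = order[order != i][:k]
        pairs += [(i, int(j), bool(labels[i] == labels[j])) for j in order]
    return pairs

p = np.array([0.0, 0.2, 0.5, 0.3, 0.9])       # toy 1-D projected samples
labels_demo = np.array([0, 0, 1, 0, 1])
D_demo = np.abs(p[:, None] - p[None, :])
```

In the toy data, sample 0 has positives at distances 0.2 and 0.3 and its closest negative at 0.5, so the hardest negative is sample 2 and the moderate positive is sample 3.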

As an improvement, the K mutual nearest neighbor method used in S140 to label the unlabeled samples is as follows:

As shown in Figure 1, P1-P4 are the four positive samples of the query image, but they do not occupy the top four positions among its nearest neighbors; using the K nearest neighbor result directly would therefore introduce large errors. However, the query image and each of the four positive samples are among each other's K nearest neighbors; this relation is called K mutual nearest neighbors. Labeling unlabeled data in this way reduces the errors introduced to some extent.

The K nearest neighbors N(x, k) of a sample x are defined as:

N(x, k) = {x_1, x_2, ..., x_k}, |N(x, k)| = k (5);

where |·| denotes the number of samples in a set. The K mutual nearest neighbors R(x, k) are then defined as:

R(x, k) = {x_i | (x_i ∈ N(x, k)) ∧ (x ∈ N(x_i, k))} (6).
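Definitions (5) and (6) translate directly into code. The sketch below works on projected samples stored as columns and returns index sets; the function names are hypothetical, and excluding a sample from its own neighbor set is an assumption.

```python
import numpy as np

def knn_sets(Z, k):
    """N(x, k) for every column of Z (c, N), as sets of neighbor indices."""
    sq = np.sum(Z ** 2, axis=0)
    D = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z   # squared distances
    np.fill_diagonal(D, np.inf)                     # exclude the sample itself
    nearest = np.argsort(D, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]

def mutual_knn(Z, k):
    """R(x, k): neighbors x_i of x such that x is also in N(x_i, k)."""
    nbrs = knn_sets(Z, k)
    return [{j for j in nbrs[i] if i in nbrs[j]} for i in range(len(nbrs))]

Z_demo = np.array([[0.0, 1.0, 2.5, 10.0]])          # four 1-D samples
R_demo = mutual_knn(Z_demo, k=1)
```

In the toy example, samples 0 and 1 are each other's nearest neighbor, while sample 2 points to sample 1 without being chosen in return, so it ends up with an empty mutual set; this asymmetry is exactly what plain K nearest neighbors would miss.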

S200: Project a new sample into the new subspace using the projection matrix U ∈ R^{d×c} to obtain a predicted sample sequence, sorted in ascending order of the Euclidean distance between the new sample and the samples in the training sample set. The predicted sample ranked first is the one most likely to show the same person as the new sample.
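The ranking step S200 can be sketched as follows, assuming the comparison samples are stored as columns of a matrix; the function and variable names are hypothetical.

```python
import numpy as np

def rank_samples(U, x_new, X_ref):
    """Rank reference samples by Euclidean distance to a new sample.

    U: (d, c) projection matrix; x_new: (d,) new sample; X_ref: (d, M)
    reference samples. Returns (indices sorted nearest first, distances).
    """
    z = U.T @ x_new
    Z_ref = U.T @ X_ref
    dist = np.linalg.norm(Z_ref - z[:, None], axis=0)
    order = np.argsort(dist)
    return order, dist[order]

U_demo = np.eye(2)
X_ref_demo = np.array([[3.0, 1.0, 0.0],
                       [0.0, 0.0, 2.0]])
order_demo, dist_demo = rank_samples(U_demo, np.zeros(2), X_ref_demo)
```

The first index in the returned order is the predicted best match, and the Rank-n recognition rates reported in the experiments count how often a correct match appears among the first n entries.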

Experiments and Analysis:

Feature selection: To quickly verify the effectiveness of the proposed method, the LOMO and GOG features commonly used in pedestrian re-identification are used.

Parameter settings: The algorithm was implemented with the Theano framework. The minimum margin is 0.5, the balance coefficients α and λ are 0.005 and 0.0001 respectively, the subspace dimension c is 512, and the batch size, learning rate, and k are 32, 1, and 10 respectively.
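The reported settings can be collected in a small configuration object. The field names are hypothetical, and the way the three terms are combined (α on the unlabeled term, λ on the regularizer) is an assumption, because the objective formula is not reproduced in this text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    margin: float = 0.5          # minimum margin of the contrastive loss
    alpha: float = 0.005         # balance coefficient α
    lam: float = 0.0001          # balance coefficient λ
    subspace_dim: int = 512      # subspace dimension c
    batch_size: int = 32
    learning_rate: float = 1.0
    k: int = 10                  # neighborhood size

def total_objective(labeled_loss, unlabeled_loss, reg, cfg):
    """Assumed combination: L + alpha * (unlabeled term) + lam * Omega."""
    return labeled_loss + cfg.alpha * unlabeled_loss + cfg.lam * reg

cfg = TrainConfig()
```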

VIPeR database test results and analysis

VIPeR is one of the most popular databases for pedestrian re-identification. It contains 1264 images of 632 pedestrians captured by two cameras with a 90° change in viewpoint and different lighting conditions. 316 pedestrians were randomly selected to form the training set and the remaining 316 form the test set; experiments were run in both semi-supervised and fully supervised settings.

Semi-supervised experiment: In the semi-supervised setting, the labels of the images of a randomly chosen 1/3 of the training pedestrians are erased to serve as unlabeled samples, and the images of the remaining 2/3 serve as labeled samples. The results are shown in Table 4.1. Compared with SSCDL and DLLAP, the proposed method improves the recognition rate at every reported rank; in particular, combining LOMO and GOG features raises the Rank-1 recognition rate to 47.5%.

Table 4.1 Comparison of recognition rates (%) of semi-supervised learning methods on the VIPeR database

Method          Rank-1   Rank-5   Rank-10   Rank-20
SSCDL           25.6     53.7     68.2      83.6
DLLAP           32.5     61.8     74.3      84.1
LOMO+Our        34.2     65.2     76.4      85.4
GOG+Our         42.4     73.4     83.9      91.0
LOMO+GOG+Our    47.5     78.3     86.9      92.1

Fully supervised experiment: The proposed method was also evaluated in a fully supervised setting, using the labels of all training samples. The results are shown in Table 4.2. Compared with DLLAP and L1Graph, the proposed method achieves a clearly higher Rank-1 recognition rate when using GOG features (48.6%) and when combining LOMO and GOG features (50.5%). Compared with the semi-supervised setting, the combined LOMO and GOG features reach a Rank-1 rate of 47.5% using labels for only 2/3 of the training samples, just 3 percentage points below the fully supervised result, which demonstrates the effectiveness of the proposed method.

Table 4.2 Comparison of recognition rates (%) under the fully supervised setting on the VIPeR database

Method          Rank-1   Rank-5   Rank-10   Rank-20
DLLAP [41]      38.5     70.8     78.5      86.1
L1Graph [42]    41.5     -        -         -
LOMO+Our        36.1     68.2     79.6      88.5
GOG+Our         48.6     77.1     87.3      92.9
LOMO+GOG+Our    50.5     79.6     88.8      94.3

The method of the present invention uses a contrastive loss function to make full use of the label information of labeled samples, and uses the K mutual nearest neighbor method instead of the K nearest neighbor method to label unlabeled samples. Experimental results on the public pedestrian re-identification dataset VIPeR confirm the effectiveness of the method.

Finally, the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or replaced by equivalents without departing from its purpose and scope, and all such modifications fall within the scope of the claims of the present invention.

Claims (3)

1. A person re-identification method based on semi-supervised learning, characterized in that it comprises the following steps:
S100: Learn a projection matrix $U \in \mathbb{R}^{d \times c}$ that projects the original d-dimensional feature space onto a c-dimensional subspace, such that $U^{T}X \in \mathbb{R}^{c \times N}$ satisfies the following in the new subspace: the Euclidean distance between sample pairs from the same pedestrian is smaller, and the Euclidean distance between sample pairs from different pedestrians is larger. Samples from the same pedestrian are defined as same-class samples, and samples from different pedestrians are defined as different-class samples.
The method for learning the projection matrix $U \in \mathbb{R}^{d \times c}$ in S100 is specifically:
S110: Establish a training sample set comprising a plurality of samples, including labeled samples and unlabeled samples; among the labeled samples, samples of the same pedestrian carry the same label.
Let $X = [X_L, X_U] \in \mathbb{R}^{d \times N}$ denote all training samples, where N is the number of images in the training set and d is the length of the feature vector; $X_L \in \mathbb{R}^{d \times N_L}$ denotes the $N_L$ labeled samples, and $X_U \in \mathbb{R}^{d \times N_U}$ denotes the $N_U$ unlabeled samples.
S120: Establish the objective function as follows:
[equation missing from the extracted text]
where $L(U)$ is a regression function, a second term [symbol missing from the extracted text] is a weighted regression function, $W(U)$ is a regularization constraint, and $\alpha, \lambda > 0$ are balance coefficients.
S130: The loss function for labeled samples is a contrastive loss. For the $N_P$ sampled sample pairs, if the two samples of a pair come from the same pedestrian, the Euclidean distance $d_n$ between their projections in the new space should be as small as possible, close to 0; otherwise, $d_n$ should be at least greater than a preset threshold margin > 0. A loss is incurred whenever these conditions are not met.
The labeled-sample loss function of S130 is:
[equation missing from the extracted text]
where:
[equation missing from the extracted text]
S140: Labeling of unlabeled samples: the unlabeled samples are labeled using the K-reciprocal nearest neighbor method, and the loss function for the unlabeled samples is:
[equation missing from the extracted text]
where, if $U^{T}x_i$ and $U^{T}x_j$ are K-reciprocal nearest neighbors and $x_i$ and $x_j$ come from different cameras, then $W_{ij}$ takes the value
[equation missing from the extracted text]
otherwise $W_{ij} = 0$ (8);
After the unlabeled samples are labeled, the labeled sample pairs are used to further constrain the existing subspace; the weight of each constraint is the cosine distance between the two samples in the new projection space.
S150: Regularization term: the $L_{2,1}$ norm is used to constrain the projection matrix U:
$W(U) = \|U\|_{2,1}$ (4);
S200: Project a new sample into the new subspace using the projection matrix $U \in \mathbb{R}^{d \times c}$ to obtain a predicted sample sequence, which is sorted in ascending order of the Euclidean distance between the new sample and the samples in the training sample set.

2. The person re-identification method based on semi-supervised learning according to claim 1, characterized in that the sampling strategy for the $N_P$ samples sampled in S130 is one that maximizes the top-k recognition rate, that is, for each image, all of its k nearest-neighbor samples are sampled.

3. The person re-identification method based on semi-supervised learning according to claim 1, characterized in that the K-reciprocal nearest neighbor method used in S140 to label the unlabeled samples is as follows:
The K nearest neighbors $N(x,k)$ of a sample x are defined as:
$N(x,k) = \{x_1, x_2, \ldots, x_k\}, \quad |N(x,k)| = k$ (5);
where $|\cdot|$ denotes the number of samples in a set. The K-reciprocal nearest neighbors $R(x,k)$ are then defined as:
$R(x,k) = \{x_i \mid (x_i \in N(x,k)) \wedge (x \in N(x_i,k))\}$ (6).
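
The steps recited in claims 1 and 3 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. All function and parameter names (`contrastive_loss`, `l21_norm`, `reciprocal_weights`, `rank_gallery`, `cams`) are hypothetical. Because the loss formulas are not reproduced in the text above, the sketch assumes the widely used contrastive-loss form (squared distance for same-pedestrian pairs, squared hinge on the margin otherwise), takes the row-wise convention for the $L_{2,1}$ norm, and uses cosine similarity of the projected samples as the weight $W_{ij}$.

```python
import numpy as np

def project(U, X):
    """Project d x N features into the c-dimensional subspace (U^T X)."""
    return U.T @ X

def contrastive_loss(U, X, pairs, same, margin=1.0):
    """Assumed standard contrastive loss over sampled labeled pairs.
    pairs: (P, 2) column indices into X; same: 1 if same pedestrian, else 0."""
    Z = project(U, X)
    same = np.asarray(same, dtype=float)
    d = np.linalg.norm(Z[:, pairs[:, 0]] - Z[:, pairs[:, 1]], axis=0)
    pos = same * d ** 2                                    # pull same-class pairs together
    neg = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2  # push others past the margin
    return 0.5 * np.mean(pos + neg)

def l21_norm(U):
    """L2,1 norm, assumed here to be the sum of the L2 norms of the rows of U."""
    return float(np.sum(np.linalg.norm(U, axis=1)))

def knn_sets(Z, k):
    """N(x, k) for every column of Z, excluding the sample itself."""
    D = np.linalg.norm(Z[:, :, None] - Z[:, None, :], axis=0)
    np.fill_diagonal(D, np.inf)
    idx = np.argsort(D, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def reciprocal_weights(U, X, cams, k):
    """W_ij > 0 only for K-reciprocal neighbors seen by different cameras.
    The weight is the cosine similarity of the projections (an assumption)."""
    Z = project(U, X)
    N = knn_sets(Z, k)
    Zn = Z / np.linalg.norm(Z, axis=0, keepdims=True)
    n = Z.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in N[i]:
            if i in N[j] and cams[i] != cams[j]:
                W[i, j] = Zn[:, i] @ Zn[:, j]
    return W

def rank_gallery(U, query, gallery):
    """S200: gallery indices sorted by ascending Euclidean distance to the query."""
    q = U.T @ query
    G = U.T @ gallery
    return np.argsort(np.linalg.norm(G - q[:, None], axis=0))

if __name__ == "__main__":
    # Toy data: four 2-D samples (columns), identity projection, camera ids.
    X = np.array([[1.0, 1.0, 5.0, 5.0],
                  [0.0, 0.1, 5.0, 5.2]])
    U = np.eye(2)
    cams = [0, 1, 0, 0]
    print(reciprocal_weights(U, X, cams, k=1))
    print(contrastive_loss(U, X, np.array([[0, 1], [0, 2]]), [1, 0]))
    print(rank_gallery(U, np.array([1.0, 0.02]), X))
```

In the toy data, samples 0 and 1 are mutual nearest neighbors from different cameras and so receive a nonzero weight, while samples 2 and 3 are mutual neighbors from the same camera and are skipped, mirroring the cross-camera condition in S140.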
CN202010970306.5A 2020-09-15 2020-09-15 Person Re-identification Method Based on Semi-supervised Learning Expired - Fee Related CN112101217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010970306.5A CN112101217B (en) 2020-09-15 2020-09-15 Person Re-identification Method Based on Semi-supervised Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010970306.5A CN112101217B (en) 2020-09-15 2020-09-15 Person Re-identification Method Based on Semi-supervised Learning

Publications (2)

Publication Number Publication Date
CN112101217A CN112101217A (en) 2020-12-18
CN112101217B true CN112101217B (en) 2024-04-26

Family

ID=73758623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010970306.5A Expired - Fee Related CN112101217B (en) 2020-09-15 2020-09-15 Person Re-identification Method Based on Semi-supervised Learning

Country Status (1)

Country Link
CN (1) CN112101217B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657176A (en) * 2021-07-22 2021-11-16 西南财经大学 Pedestrian re-identification implementation method based on active contrast learning
CN116052095B (en) * 2023-03-31 2023-06-16 松立控股集团股份有限公司 Vehicle re-identification method for smart city panoramic video monitoring

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527432B1 (en) * 2008-08-08 2013-09-03 The Research Foundation Of State University Of New York Semi-supervised learning based on semiparametric regularization
CN107145827A (en) * 2017-04-01 2017-09-08 浙江大学 Cross-camera person re-identification method based on adaptive distance metric learning
CN109522956A (en) * 2018-11-16 2019-03-26 哈尔滨理工大学 A low-rank discriminative feature subspace learning method
CN110008828A (en) * 2019-02-21 2019-07-12 上海工程技术大学 Pairwise constrained component analysis metric optimization method based on difference regularization
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A person re-identification method based on a deep multi-loss fusion model
CN110175511A (en) * 2019-04-10 2019-08-27 杭州电子科技大学 A person re-identification method embedding positive and negative sample pair distance distributions
CN111027442A (en) * 2019-12-03 2020-04-17 腾讯科技(深圳)有限公司 Model training method, recognition method, device and medium for pedestrian re-recognition
CN111033509A (en) * 2017-07-18 2020-04-17 视语智能有限公司 target re-identification
CN111027421A (en) * 2019-11-26 2020-04-17 西安宏规电子科技有限公司 Graph-based transductive semi-supervised pedestrian re-identification method
CN111144451A (en) * 2019-12-10 2020-05-12 东软集团股份有限公司 Training method, device and equipment of image classification model
CN111353516A (en) * 2018-12-21 2020-06-30 华为技术有限公司 A sample classification method and model update method for online learning
CN111563424A (en) * 2020-04-20 2020-08-21 清华大学 Method and device for pedestrian re-identification based on semi-supervised learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3979007B2 (en) * 2000-12-22 2007-09-19 富士ゼロックス株式会社 Pattern identification method and apparatus
US9116894B2 (en) * 2013-03-14 2015-08-25 Xerox Corporation Method and system for tagging objects comprising tag recommendation based on query-based ranking and annotation relationships between objects and tags
US9471847B2 (en) * 2013-10-29 2016-10-18 Nec Corporation Efficient distance metric learning for fine-grained visual categorization
US11537817B2 (en) * 2018-10-18 2022-12-27 Deepnorth Inc. Semi-supervised person re-identification using multi-view clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
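Center Based Pseudo-Labeling For Semi-Supervised Person Re-Identification; G. Ding et al.; 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW); 20181129; pp. 1-6 *
A Survey of Person Re-identification Research; Zhang Huaxiang et al.; Journal of Shandong Normal University (Natural Science Edition); 20181231; Vol. 33 (No. 04); pp. 379-387 *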

Also Published As

Publication number Publication date
CN112101217A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
Zhang et al. Metacleaner: Learning to hallucinate clean representations for noisy-labeled visual recognition
CN107515895B (en) A visual target retrieval method and system based on target detection
Zheng et al. Person re-identification: Past, present and future
CN105844239B (en) A violent and terrorist video detection method based on CNN and LSTM
Yang et al. Group-sensitive multiple kernel learning for object categorization
CN110490236B (en) Automatic image annotation method, system, device and medium based on neural network
Wang et al. Robust high dimensional stream classification with novel class detection
US20210319215A1 (en) Method and system for person re-identification
Zheng et al. Adaptive boosting for domain adaptation: Toward robust predictions in scene segmentation
CN108846413A (en) A zero-shot learning method based on a global semantic consistency network
CN111950372A (en) An unsupervised person re-identification method based on graph convolutional network
CN112101217B (en) Person Re-identification Method Based on Semi-supervised Learning
CN107480690A (en) A Multi-Classification Method Including Unknown Classes Based on Support Vector Machine
CN110647907A (en) Multi-label image classification algorithm using multi-layer classification and dictionary learning
Lu et al. Mask-aware pseudo label denoising for unsupervised vehicle re-identification
CN118628813A (en) Passive domain adaptive image recognition method based on transferable semantic knowledge
CN114580566A (en) A Few-Shot Image Classification Method Based on Interval Supervised Contrastive Loss
CN114882534A (en) Pedestrian re-identification method, system and medium based on counterfactual attention learning
CN103617609A (en) A k-means nonlinear manifold clustering and representative point selecting method based on a graph theory
CN117830891A (en) Method and system for detecting non-friendly event in audio and video
CN116912742A (en) A weakly supervised video anomaly detection method based on self-evolution
CN112001345A (en) Few-sample human behavior identification method and system based on feature transformation measurement network
CN114419382A (en) An unsupervised graph embedding method and system for multi-view images
CN118015507A (en) Weak supervision video violence detection method based on time domain enhancement and contrast learning
CN117876685A (en) A weakly supervised point cloud semantic segmentation method combining noise mining and correction strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20240426