Disclosure of Invention
The invention provides a pedestrian re-identification method based on double-constraint metric learning and sample reordering to solve the problems in the prior art, improving the accuracy and stability of existing metric-learning-based pedestrian re-identification methods.
In order to achieve the purpose, the invention discloses a pedestrian re-identification method based on double-constraint metric learning and sample reordering, which comprises two stages of training and testing;
the training phase comprises the steps of:
step 1, establishing cross-camera association constraint: forming a cross-camera sample pair by using pedestrian pictures from different cameras in the training set, and establishing a constraint item to ensure that the characteristic distance between the cross-camera positive sample pair is smaller than the characteristic distance between the cross-camera negative sample pair;
step 2, establishing a same-camera association constraint: forming same-camera sample pairs from pedestrian pictures taken by the same camera in the training set, and establishing a constraint term to ensure that the feature distance between a same-camera negative sample pair is greater than the feature distance between a cross-camera positive sample pair;
step 3, solving a measurement matrix: obtaining a target function of double-constraint metric learning by combining the two constraint terms in the step 1 and the step 2, solving a semi-positive definite metric matrix M which minimizes the target function to obtain a training result of metric learning, and ending the training stage;
the testing phase comprises the following steps:
step 4, performing feature space projection by using the metric matrix: according to the semi-positive definiteness of the metric matrix M, M is decomposed as M = P^T P; using the matrix P, the feature vector x_p of the query picture in the testing stage and the feature vectors {y_i, i = 1, ..., N} of the candidate set are uniformly projected into a new feature space, wherein N is the total number of pictures in the candidate set in the testing stage;
step 5, calculating Euclidean distances between the query picture and the candidate pictures in the feature space: respectively calculating the Euclidean distance between the query picture and each candidate picture in the new feature space, d(x_p, y_i) = ||P x_p - P y_i||, i = 1, ..., N;
step 6, calculating the initial ranking of the candidate pictures: sorting the candidate pictures according to the Euclidean distances calculated in the step 5, wherein a candidate picture with a smaller Euclidean distance to the query picture obtains a position closer to the front of the ranking;
step 7, selecting the first K candidate pictures in the sorting queue: selecting K candidate pictures with the top ranking from the candidate picture ranking queue obtained in the step 6;
step 8, constructing a probability hypergraph by using the relevance of the previous K candidate pictures in the feature space: taking the query picture and the K candidate pictures as vertexes of the probability hypergraph, generating hyperedges of the probability hypergraph through the relevance between the vertexes, and finally giving corresponding weight to each hyperedge;
step 9, calculating a reordering result based on the probability hypergraph: calculating a Laplace matrix of the probability hypergraph, establishing a target function by combining with the experience loss of the initial label, calculating the ranking score of the candidate pictures according to the target function, and reordering the K candidate pictures from large to small according to the ranking score;
step 10, returning the final ordering of the candidate pictures: and (4) replacing the sorting positions of the previous K pictures in the sorting queue in the step 6 with the re-sorting results of the K candidate pictures in the step 9, and returning the whole candidate set sorting queue as the final result of pedestrian re-identification.
Further: the establishment of the cross-camera association constraint in the step 1 comprises the following steps:
step 1.1, respectively defining the training pictures from different cameras as a query set {x_i, i = 1, ..., n} and a candidate set {y_j, j = 1, ..., m}, wherein x_i and y_j are feature vectors of pedestrian pictures, n is the number of pictures in the query set, and m is the number of pictures in the candidate set;
step 1.2, defining a sample pair (x_i, y_j) composed of pedestrian pictures from different cameras as a cross-camera sample pair; when x_i and y_j belong to the same pedestrian, (x_i, y_j) is called a cross-camera positive sample pair, and the label z_ij = 1 is defined; otherwise, (x_i, y_j) is a cross-camera negative sample pair, and z_ij = -1 is set;
Step 1.3, constraining the distance between any cross-camera positive sample pair (x_i, y_j) in the training set to be smaller than the distance between any cross-camera negative sample pair (x_i, y_k):

d_M(x_i, y_j) < d_M(x_i, y_k)
wherein d_M(·,·) is the Mahalanobis distance metric function to be learned, expressed as follows:

d_M(x_i, y_j) = (x_i - y_j)^T M (x_i - y_j)

in the above formula, M is a semi-positive definite metric matrix, i.e. the target of metric learning;
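As a concrete illustration of the metric above, the following minimal numpy sketch (the function name is illustrative, not from the patent) computes d_M(x, y) = (x - y)^T M (x - y):

```python
import numpy as np

def mahalanobis_dist(x, y, M):
    """Mahalanobis metric d_M(x, y) = (x - y)^T M (x - y) for a
    semi-positive definite matrix M (squared-distance form)."""
    d = x - y
    return float(d @ M @ d)
```

With M equal to the identity matrix this reduces to the squared Euclidean distance, which is the usual sanity check for a learned metric.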
step 1.4, performing an equivalent transformation on the constraint in step 1.3: the distance between any cross-camera positive sample pair in the training set is constrained to be smaller than a threshold ξ, and the distance between any cross-camera negative sample pair in the training set is constrained to be larger than the threshold ξ, so as to obtain the following loss functions:

E_p(M) = Σ_{z_ij = 1} ℓ(d_M(x_i, y_j) - ξ)

E_d(M) = Σ_{z_ij = -1} ℓ(ξ - d_M(x_i, y_j))

wherein ℓ(x) = log(1 + exp(x)) is the logistic regression function; E_p(M) is the loss function of the cross-camera positive sample pairs, E_d(M) is the loss function of the cross-camera negative sample pairs, and the threshold ξ takes the value of the average distance over all cross-camera sample pairs (x_i, y_j) and same-camera sample pairs (y_j, y_k).
Further: the establishment of the same-camera association constraint in the step 2 comprises the following steps:
step 2.1, defining a sample pair (y_j, y_k) composed of pictures of different pedestrians in the candidate set as a same-camera negative sample pair, and setting the label z_jk = -1;
Step 2.2, constraining any cross-camera positive sample pair (x) in the training seti,yj) Is less than the same camera negative sample pair (y)j,yk) The distance between:
step 2.3, since step 1.4 already constrains the distances between all cross-camera positive sample pairs to be smaller than the threshold ξ, the constraint in step 2.2 is equivalently converted into constraining the distance between any same-camera negative sample pair (y_j, y_k) in the training set to be larger than ξ, so as to obtain the following loss function:

E_s(M) = Σ_{z_jk = -1} ℓ(ξ - d_M(y_j, y_k))

wherein E_s(M) is the loss function of the same-camera negative sample pairs.
Further: the solving of the measurement matrix in the step 3 specifically comprises the following steps:
step 3.1, jointly considering the loss functions in step 1.4 and step 2.3, obtaining a target function of the double-constraint distance metric learning:
Φ(M)=Ep(M)+Ed(M)+Es(M)
step 3.2, giving weights w_ij and w_jk to the sample pairs in the objective function and simplifying the objective function expression in step 3.1 to obtain:

Φ(M) = Σ_{i,j} w_ij ℓ(z_ij (d_M(x_i, y_j) - ξ)) + Σ_{j,k} w_jk ℓ(z_jk (d_M(y_j, y_k) - ξ))

wherein, when z_ij = 1, w_ij = 1/N_pos, where N_pos is the total number of cross-camera positive sample pairs in the training set; when z_ij = -1, w_ij is set to 1/N_neg, where N_neg is the total number of all cross-camera and same-camera negative sample pairs in the training set; meanwhile, since there is no same-camera positive sample pair, w_jk is uniformly set to 1/N_neg;
Step 3.3, defining the double-constraint metric learning as the following optimization problem:

min_M Φ(M)   s.t.   M is semi-positive definite

step 3.4, solving the optimization problem in step 3.3 to obtain the semi-positive definite metric matrix M.
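Keeping M semi-positive definite during the optimization of step 3.3 is typically done by projecting onto the PSD cone after each update (eigenvalue clipping); a minimal sketch of that standard projection, not taken verbatim from the patent:

```python
import numpy as np

def project_psd(M):
    """Project a (near-)symmetric matrix onto the positive semi-definite
    cone by symmetrizing and clipping negative eigenvalues to zero."""
    M = (M + M.T) / 2.0
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T
```

The clipped matrix is the nearest PSD matrix in Frobenius norm, which is why this projection is the usual companion of gradient-descent metric learning.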
Further: the constructing of the probability hypergraph by using the relevance of the previous K candidate pictures in the feature space in the step 8 specifically comprises the following steps:
step 8.1, firstly, merging the query picture and the K candidate pictures to obtain the vertex set V = {v_1, ..., v_{K+1}} of the probability hypergraph;
step 8.2, taking each vertex v_i in V as a central node, generating three hyperedges by connecting the 5, 15 and 25 vertices closest to v_i in the projected feature space, and adding the three hyperedges into the hyperedge set ε of the probability hypergraph, so that the set ε contains 3 × (K+1) hyperedges in total;
step 8.3, assigning a non-negative weight w_h(e_i) to each hyperedge e_i in the hyperedge set ε: when the hyperedge takes the query picture as the central node, one weight value is assigned to the hyperedge; when the hyperedge takes a candidate picture as the central node, another weight value is assigned to the hyperedge;
step 8.4, according to the membership relation between the vertices in V and the hyperedges in ε, constructing an incidence matrix H of size (K+1) × 3(K+1), whose elements are defined as:

H(v_i, e_j) = A(v_i, e_j) if v_i ∈ e_j, and H(v_i, e_j) = 0 otherwise

wherein A(v_i, e_j) represents the probability that the vertex v_i belongs to the hyperedge e_j, calculated by the following formula:

A(v_i, e_j) = exp(-d(v_i, v_j)^2 / σ^2)

wherein v_j is the central node of the hyperedge e_j, d(·,·) is the distance in the projected feature space, and σ is the average distance between all vertices in the projected feature space; the construction of the probability hypergraph is thus completed and the incidence matrix H is obtained.
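Steps 8.1-8.4 can be sketched as follows; this is illustrative numpy code under the assumption A(v, e) = exp(-d(v, c)^2 / σ^2) with c the central node and σ the average pairwise distance, with the neighbourhood sizes defaulting to the 5/15/25 of step 8.2:

```python
import numpy as np

def build_incidence(X, knn_sizes=(5, 15, 25)):
    """Probabilistic-hypergraph incidence matrix: for every vertex and
    every neighbourhood size, one hyperedge connects the central vertex
    with its k nearest vertices; entries are membership probabilities
    A(v, e) = exp(-d(v, c)^2 / sigma^2), zero outside the hyperedge."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    sigma = D[np.triu_indices(n, 1)].mean()                    # average distance
    H = np.zeros((n, len(knn_sizes) * n))
    for i in range(n):
        order = np.argsort(D[i])                               # centre first (d = 0)
        for s, k in enumerate(knn_sizes):
            e = s * n + i                                      # column of this hyperedge
            members = order[: min(k + 1, n)]                   # centre plus k nearest
            H[members, e] = np.exp(-D[i, members] ** 2 / sigma ** 2)
    return H
```

Each central vertex has membership probability exp(0) = 1 in its own hyperedge, and more distant members decay smoothly toward 0, which is what distinguishes a probability hypergraph from a binary one.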
Further: in step 9, a reordering result is calculated based on the probabilistic hypergraph, which specifically includes the following substeps:
step 9.1, based on the incidence matrix H, calculating the degree d(v) of each vertex and the degree δ(e) of each hyperedge in the probability hypergraph, wherein

d(v) = Σ_{e ∈ ε} w_h(e) H(v, e),   δ(e) = Σ_{v ∈ V} H(v, e);

defining a diagonal matrix D_v whose diagonal elements correspond to the degrees of the vertices in the probability hypergraph; defining a diagonal matrix D_e whose diagonal elements correspond to the degrees of the hyperedges in the probability hypergraph; defining a diagonal matrix W whose diagonal elements correspond to the weights w_h(e) of the hyperedges;
step 9.2, calculating the Laplacian matrix L of the probability hypergraph by using the incidence matrix H, the vertex degree matrix D_v, the hyperedge degree matrix D_e and the hyperedge weight matrix W:

L = I - D_v^(-1/2) H W D_e^(-1) H^T D_v^(-1/2)

wherein I is the identity matrix of size (K+1) × (K+1);
step 9.3, by utilizing a regularization framework, simultaneously considering the Laplacian constraint of the probability hypergraph and the empirical loss of the initial labels, and defining the objective function of sample reordering as:

Q(f) = f^T L f + μ ||f - r||^2

wherein f represents the reordering score vector to be learned; r represents the initial label vector, in which the label of the query picture is set to 1 and the labels of all candidate pictures are set to 0; μ > 0 is a regularization parameter used for weighing the relative importance of the first term and the second term in the objective function; the first term of the objective function constrains vertices sharing more hyperedges in the probability hypergraph to obtain similar reordering scores, and the second term constrains the reordering scores to stay close to the initial label information;
step 9.4, by setting the first derivative of the objective function in step 9.3 with respect to f to zero, the optimal solution of the reordering problem can be obtained quickly:

f = (I + L/μ)^(-1) r

step 9.5, reordering the K candidate pictures from large to small according to the reordering scores of the candidate pictures in the vector f.
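Steps 9.1-9.5 admit a compact implementation; the sketch below assumes the standard normalized hypergraph Laplacian L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) and the closed-form minimiser f = (I + L/μ)^(-1) r implied by step 9.4 (all names are illustrative):

```python
import numpy as np

def rerank_scores(H, w, r, mu=0.01):
    """Hypergraph-regularized ranking: minimise f^T L f + mu * ||f - r||^2,
    whose stationary point is f = (I + L/mu)^{-1} r."""
    n = H.shape[0]
    dv = H @ w                        # vertex degrees d(v) = sum_e w(e) H(v, e)
    de = H.sum(axis=0)                # hyperedge degrees delta(e) = sum_v H(v, e)
    Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    W = np.diag(w)
    L = np.eye(n) - Dv_isqrt @ H @ W @ De_inv @ H.T @ Dv_isqrt
    return np.linalg.solve(np.eye(n) + L / mu, r)
```

Sorting the entries of f in descending order then gives the re-ranked top-K list of step 9.5.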
Compared with the prior art, the invention adopting the technical scheme has the following beneficial effects:
1) compared with existing metric-learning-based pedestrian re-identification methods, which consider only the cross-camera association constraint of the training samples, the method provided by the invention simultaneously considers the same-camera and cross-camera association information among the training samples during metric learning, so that the learned metric matrix has stronger discriminability;
2) the method constructs a probability hypergraph by using the association information among different candidate pictures and reorders the similarity ranking result of the testing stage, effectively alleviating the influence of the over-fitting phenomenon in metric learning and obtaining a more stable and accurate ranking of the candidate pictures;
3) only the K candidate pictures ranked at the front of the initial ordering are considered during reordering; compared with reordering the whole candidate set, this reduces the computational complexity of the probability hypergraph construction while maintaining ranking accuracy, thereby accelerating the reordering.
Examples
In the embodiment of the invention, pedestrian pictures shot by different cameras are processed: a metric matrix is learned from a training set, and in the testing stage a query picture of a certain pedestrian target is used to find the correct match of that pedestrian target in a candidate set shot by a different camera; referring to fig. 1, the method in the embodiment of the invention comprises two stages of training and testing;
the training phase comprises the steps of:
step 1, establishing cross-camera association constraint: the method comprises the following steps of forming a cross-camera sample pair by using pedestrian pictures from different cameras in a training set, establishing a constraint term to enable the characteristic distance between the cross-camera positive sample pair to be smaller than the characteristic distance between the cross-camera negative sample pair, and specifically comprising the following sub-steps:
step 1.1, respectively defining the training pictures from different cameras as a query set {x_i, i = 1, ..., n} and a candidate set {y_j, j = 1, ..., m}, wherein x_i and y_j are feature vectors of pedestrian pictures, n is the number of pictures in the query set, and m is the number of pictures in the candidate set;
step 1.2, defining a sample pair (x_i, y_j) composed of pedestrian pictures from different cameras as a cross-camera sample pair; when x_i and y_j belong to the same pedestrian, (x_i, y_j) is called a cross-camera positive sample pair, and the label z_ij = 1 is defined; otherwise, (x_i, y_j) is a cross-camera negative sample pair, and z_ij = -1 is set;
Step 1.3, constraining any cross-camera positive sample pair (x) in the training seti,yj) Is less than the negative sample pair (x) across the camerai,yk) The distance between:
wherein d_M(·,·) is the Mahalanobis distance metric function to be learned, expressed as follows:

d_M(x_i, y_j) = (x_i - y_j)^T M (x_i - y_j)

in the above formula, M is a semi-positive definite metric matrix, i.e. the target of metric learning;
step 1.4, performing an equivalent transformation on the constraint in step 1.3: the distance between any cross-camera positive sample pair in the training set is constrained to be smaller than a threshold ξ, and the distance between any cross-camera negative sample pair in the training set is constrained to be larger than the threshold ξ, so as to obtain the following loss functions:

E_p(M) = Σ_{z_ij = 1} ℓ(d_M(x_i, y_j) - ξ)

E_d(M) = Σ_{z_ij = -1} ℓ(ξ - d_M(x_i, y_j))

wherein ℓ(x) = log(1 + exp(x)) is the logistic regression function; E_p(M) is the loss function of the cross-camera positive sample pairs, E_d(M) is the loss function of the cross-camera negative sample pairs, and the threshold ξ takes the value of the average distance over all cross-camera sample pairs (x_i, y_j) and same-camera sample pairs (y_j, y_k).
Step 2, establishing a same-camera association constraint: forming same-camera sample pairs from pedestrian pictures taken by the same camera in the training set, and establishing a constraint term to make the feature distance between a same-camera negative sample pair larger than the feature distance between a cross-camera positive sample pair, specifically comprising the following substeps:
step 2.1, defining a sample pair (y_j, y_k) composed of pictures of different pedestrians in the candidate set as a same-camera negative sample pair, and setting the label z_jk = -1;
Step 2.2, constraining any cross-camera positive sample pair (x) in the training seti,yj) Is less than the same camera negative sample pair (y)j,yk) The distance between:
step 2.3, since step 1.4 already constrains the distances between all cross-camera positive sample pairs to be smaller than the threshold ξ, the constraint in step 2.2 is equivalently converted into constraining the distance between any same-camera negative sample pair (y_j, y_k) in the training set to be larger than ξ, so as to obtain the following loss function:

E_s(M) = Σ_{z_jk = -1} ℓ(ξ - d_M(y_j, y_k))

wherein E_s(M) is the loss function of the same-camera negative sample pairs;
step 3, solving a measurement matrix: obtaining a target function of double-constraint metric learning by combining the two constraint terms in the step 1 and the step 2, and solving a semi-positive definite metric matrix M which minimizes the target function to obtain a training result of metric learning, wherein the training result specifically comprises the following substeps:
step 3.1, jointly considering the loss functions in step 1.4 and step 2.3, obtaining a target function of the double-constraint distance metric learning:
Φ(M)=Ep(M)+Ed(M)+Es(M)
step 3.2, giving weights w_ij and w_jk to the sample pairs in the objective function and simplifying the objective function expression in step 3.1 to obtain:

Φ(M) = Σ_{i,j} w_ij ℓ(z_ij (d_M(x_i, y_j) - ξ)) + Σ_{j,k} w_jk ℓ(z_jk (d_M(y_j, y_k) - ξ))

wherein, when z_ij = 1, w_ij = 1/N_pos, where N_pos is the total number of cross-camera positive sample pairs in the training set; when z_ij = -1, w_ij is set to 1/N_neg, where N_neg is the total number of all cross-camera and same-camera negative sample pairs in the training set; meanwhile, since there is no same-camera positive sample pair, w_jk is uniformly set to 1/N_neg;
Step 3.3, defining the double-constraint metric learning as the following optimization problem:

min_M Φ(M)   s.t.   M is semi-positive definite
step 3.4, solving the optimization problem in step 3.3 to obtain the semi-positive definite metric matrix M; in this embodiment, firstly, a matrix X and a matrix Y are defined, which respectively store the feature vectors of the n pictures in the query set and of the m pictures in the candidate set; then, X and Y are combined into a matrix C = [X, Y], and c_i denotes the i-th column of the matrix C; by setting z_jk = 0 and w_jk = 0 when y_j and y_k are the same candidate picture, the objective function in step 3.2 can be rewritten accordingly; the gradient of the objective function with respect to the matrix M is then derived, and finally the metric matrix M minimizing the objective function is solved iteratively by using a gradient descent method;
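A toy version of the gradient-descent loop of step 3.4 is sketched below. The loss gradient uses the assumed logistic form (derivative sigmoid(z(d_M - ξ)) · z · (a - b)(a - b)^T), since the embodiment's exact gradient expression is not reproduced in the text; this is a sketch under stated assumptions, not the embodiment's implementation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def learn_metric(pairs, labels, xi, dim, lr=0.1, iters=200):
    """Toy gradient descent for a dual-constraint-style objective:
    each step accumulates the gradient of sum_p l(z_p * (d_M(p) - xi))
    and re-projects M onto the semi-definite cone."""
    M = np.eye(dim)
    for _ in range(iters):
        G = np.zeros((dim, dim))
        for (a, b), z in zip(pairs, labels):
            d = a - b
            g = sigmoid(z * (d @ M @ d - xi))   # derivative of log(1 + exp(.))
            G += g * z * np.outer(d, d)
        M -= lr * G
        # projection onto the PSD cone by eigenvalue clipping
        w, V = np.linalg.eigh((M + M.T) / 2.0)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M
```

On a toy problem where positive pairs differ only along one feature dimension and negative pairs only along another, the learned M down-weights the dimension that varies within identities and up-weights the discriminative one.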
the testing phase comprises the following steps:
step 4, performing feature space projection by using the metric matrix: according to the semi-positive definiteness of the metric matrix M, M is decomposed as M = P^T P; using the matrix P, the feature vector x_p of the query picture in the testing stage and the feature vectors {y_i, i = 1, ..., N} of the candidate set are uniformly projected into a new feature space, wherein N is the total number of pictures in the candidate set in the testing stage;
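The decomposition M = P^T P of step 4 can be obtained from the eigendecomposition of the learned PSD matrix; a minimal sketch (function name illustrative):

```python
import numpy as np

def metric_projection(M):
    """Factor a semi-positive definite metric M = P^T P via
    eigendecomposition, so that d_M(x, y) equals the squared Euclidean
    distance between the projected vectors P x and P y."""
    w, V = np.linalg.eigh(M)
    return np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
```

After this factorization, the Mahalanobis comparison of step 4 reduces to plain Euclidean distances in the projected space, which is what steps 5 and 6 exploit.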
step 5, calculating Euclidean distances between the query picture and the candidate pictures in the feature space: respectively calculating the Euclidean distance between the query picture and each candidate picture in the new feature space, d(x_p, y_i) = ||P x_p - P y_i||, i = 1, ..., N;
step 6, calculating the initial ranking of the candidate pictures: sorting the candidate pictures according to the Euclidean distances calculated in the step 5, wherein a candidate picture with a smaller Euclidean distance to the query picture obtains a position closer to the front of the ranking;
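Steps 5 and 6 reduce to a Euclidean distance computation and an argsort in the projected space; a minimal numpy sketch (names illustrative):

```python
import numpy as np

def rank_candidates(xp_proj, Y_proj):
    """Euclidean distances between the projected query vector and each
    projected candidate row, plus the initial ranking (ascending distance)."""
    d = np.linalg.norm(Y_proj - xp_proj, axis=1)
    order = np.argsort(d, kind="stable")
    return d, order
```

The returned order is the initial ranking queue from which step 7 takes the first K candidates.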
step 7, selecting the first K candidate pictures in the sorting queue: selecting K candidate pictures with the top ranking from the candidate picture ranking queue obtained in the step 6, wherein K is 100 in the embodiment;
step 8, constructing a probability hypergraph by using the relevance of the previous K candidate pictures in the feature space: taking the query picture and the K candidate pictures as vertexes of the probability hypergraph, generating hyperedges of the probability hypergraph through the relevance between the vertexes, and finally giving corresponding weight to each hyperedge; the method specifically comprises the following substeps:
step 8.1, firstly, merging the query picture and the K candidate pictures to obtain the vertex set V = {v_1, ..., v_{K+1}} of the probability hypergraph;
step 8.2, taking each vertex v_i in V as a central node, generating three hyperedges by connecting the 5, 15 and 25 vertices closest to v_i in the projected feature space, and adding the three hyperedges into the hyperedge set ε of the probability hypergraph, so that the set ε contains 3 × (K+1) hyperedges in total;
step 8.3, assigning a non-negative weight w_h(e_i) to each hyperedge e_i in the hyperedge set ε: when the hyperedge takes the query picture as the central node, a larger weight value is assigned to the hyperedge, emphasizing the role of the query picture in the reordering; when the hyperedge takes a candidate picture as the central node, a smaller weight value is assigned to the hyperedge;
step 8.4, according to the membership relation between the vertices in V and the hyperedges in ε, constructing an incidence matrix H of size (K+1) × 3(K+1), whose elements are defined as:

H(v_i, e_j) = A(v_i, e_j) if v_i ∈ e_j, and H(v_i, e_j) = 0 otherwise

wherein A(v_i, e_j) represents the probability that the vertex v_i belongs to the hyperedge e_j, calculated by the following formula:

A(v_i, e_j) = exp(-d(v_i, v_j)^2 / σ^2)

wherein v_j is the central node of the hyperedge e_j, d(·,·) is the distance in the projected feature space, and σ is the average distance between all vertices in the projected feature space; the construction of the probability hypergraph is thus completed and the incidence matrix H is obtained;
step 9, calculating a reordering result based on the probability hypergraph: calculating a Laplace matrix of the probability hypergraph, establishing a target function by combining with the experience loss of the initial label, calculating the ranking score of the candidate pictures according to the target function, and reordering the K candidate pictures from large to small according to the ranking score; the method specifically comprises the following substeps:
step 9.1, based on the incidence matrix H, calculating the degree d(v) of each vertex and the degree δ(e) of each hyperedge in the probability hypergraph, wherein

d(v) = Σ_{e ∈ ε} w_h(e) H(v, e),   δ(e) = Σ_{v ∈ V} H(v, e);

defining a diagonal matrix D_v whose diagonal elements correspond to the degrees of the vertices in the probability hypergraph; defining a diagonal matrix D_e whose diagonal elements correspond to the degrees of the hyperedges in the probability hypergraph; defining a diagonal matrix W whose diagonal elements correspond to the weights w_h(e) of the hyperedges;
step 9.2, calculating the Laplacian matrix L of the probability hypergraph by using the incidence matrix H, the vertex degree matrix D_v, the hyperedge degree matrix D_e and the hyperedge weight matrix W:

L = I - D_v^(-1/2) H W D_e^(-1) H^T D_v^(-1/2)

wherein I is the identity matrix of size (K+1) × (K+1);
step 9.3, by utilizing a regularization framework, simultaneously considering the Laplacian constraint of the probability hypergraph and the empirical loss of the initial labels, and defining the objective function of sample reordering as:

Q(f) = f^T L f + μ ||f - r||^2

wherein f represents the reordering score vector to be learned; r represents the initial label vector, in which the label of the query picture is set to 1 and the labels of all candidate pictures are set to 0; μ > 0 is a regularization parameter used for weighing the relative importance of the first term and the second term in the objective function; the first term of the objective function constrains vertices sharing more hyperedges in the probability hypergraph to obtain similar reordering scores, and the second term constrains the reordering scores to stay close to the initial label information; in this embodiment, μ is set to 0.01;
step 9.4, by setting the first derivative of the objective function in step 9.3 with respect to f to zero, the optimal solution of the reordering problem can be obtained quickly:

f = (I + L/μ)^(-1) r

step 9.5, reordering the K candidate pictures from large to small according to the reordering scores of the candidate pictures in the vector f;
step 10, returning the final ordering of the candidate pictures: and (4) replacing the sorting positions of the previous K pictures in the sorting queue in the step 6 with the re-sorting results of the K candidate pictures in the step 9, and returning the whole candidate set sorting queue as the final result of pedestrian re-identification.
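Step 10 is a simple splice of the re-ranked top-K list back into the initial queue; a sketch (function name illustrative):

```python
def splice_ranking(initial_order, topk_reordered):
    """Replace the first K positions of the initial ranking with the
    re-ranked order of those same K candidates; the tail is unchanged."""
    K = len(topk_reordered)
    assert set(topk_reordered) == set(initial_order[:K]), "must permute the top K"
    return list(topk_reordered) + list(initial_order[K:])
```

The assertion documents the invariant of step 10: re-ranking only permutes the first K candidates, so the spliced queue still contains every candidate exactly once.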
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.