Pedestrian tracking method based on federated learning and edge computing
Technical Field
The invention belongs to the technical field of computer vision and federated learning, and particularly relates to a pedestrian tracking method based on federated learning and edge computing.
Background
Existing pedestrian tracking methods require a large amount of pedestrian data to train a model, but a single data set usually contains data from only one specific scene, so the recognition accuracy of a model trained on it often drops sharply in real scenes. If data collected by video acquisition equipment in several different scenes are pooled for model training, data may be leaked, and user privacy and data security cannot be guaranteed. In addition, existing pedestrian tracking technology applies only to RGB video captured in the daytime, whereas pedestrian tracking must work at night as well as during the day. In summary, a model trained by existing pedestrian tracking technology on a single labeled data set generalizes poorly, cannot adapt to different usage scenes, and cannot process cross-modal data. A pedestrian tracking method that ensures data security, adapts to different application scenes, and supports cross-modal recognition is therefore the key to solving these problems.
Federated learning is a distributed machine learning technique that lets multiple participants train models jointly while protecting data privacy and meeting legal compliance requirements, which addresses the data-silo problem. Edge computing is mainly applied in the field of video processing and offers low transmission delay, high bandwidth, and high computing performance. Applying federated learning and edge computing to pedestrian tracking, with edge servers and a cloud server jointly training the model through federated learning, effectively protects data privacy and addresses the data-silo problem.
Chinese patent No. CN20191019437.X discloses a pedestrian tracking method, device and equipment. The method comprises: obtaining a video frame to be detected; detecting candidate pedestrians in the video frame; extracting the features of the candidate pedestrians; determining the differences between the candidate pedestrian features and the features stored in a feature queue, the stored features being those matched with the target pedestrian; and determining a candidate pedestrian to be the target pedestrian when the difference meets a preset condition. This improves the efficiency of the pedestrian tracking process. However, the method can only be used in the daytime and cannot be applied to night scenes.
Chinese patent CN201510548633.0 discloses a video pedestrian detection and tracking method based on motion information and track association. For pedestrian detection, motion is detected with a frame difference method combined with morphological operations from digital image processing: the motion regions in the video are detected first, features are then extracted within the motion regions by sliding-window search, and finally a pre-trained pedestrian detection classifier classifies them to obtain the detection result. For tracking, the pedestrian detection result from the previous step is used as input. A tracker is initialized for each detected pedestrian at the start, and each tracker holds the historical motion information and appearance information of its target. When the current frame is processed, position information and appearance information are extracted from each input detection result, and an association matrix built from this information associates the detections with the tracked targets of previous frames, finally yielding the tracking trajectories of the pedestrians. The method has good real-time performance and good robustness in more complex scenes. However, the method has a data security problem.
Disclosure of Invention
To address the defects of the prior art, the invention provides a pedestrian tracking method based on federated learning and edge computing. The method uses edge servers and a cloud server to jointly train the model through federated learning, which effectively protects data privacy and addresses the data-silo problem. It uses knowledge distillation and weight distribution to solve the precision loss and model convergence problems caused by the heterogeneity of multi-scene data when federated learning is applied to pedestrian tracking. It uses cross-modal processing to handle images of different modalities so as to meet the requirements of cross-modal pedestrian tracking.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A pedestrian tracking method based on federated learning and edge computing comprises the following steps:
Step one: detecting and acquiring a pedestrian image;
Step two: performing cross-modal processing on the pedestrian image;
Step three: extracting pedestrian features;
Step four: matching the features and determining the target pedestrian.
Further, detecting and acquiring a pedestrian image in step one specifically comprises:
11) Processing the video to be detected to obtain a plurality of video frame images;
12) Inputting each video frame image into a pedestrian tracking network, determining through the network whether a pedestrian appears in the current frame image, and, if a pedestrian is detected, acquiring and outputting the pedestrian image;
The pedestrian image is detected and acquired using a detection framework based on PP-YOLOv.
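Steps 11) and 12) can be sketched as a loop over frames. The following is a minimal Python sketch under stated assumptions, not the patented implementation: frames are plain nested lists, `detect` is a hypothetical callable standing in for the PP-YOLOv-based detector (which the text does not describe further), and the box layout `(top, left, bottom, right)` is likewise assumed for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Frame = List[List[int]]            # H x W grid of pixel values (illustrative)
Box = Tuple[int, int, int, int]    # assumed layout: (top, left, bottom, right), end-exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by `box` out of `frame`."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def collect_pedestrian_images(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Box]],   # hypothetical detector interface
) -> List[Tuple[int, Frame]]:
    """Run the detector on every frame and return (frame_index, crop) pairs."""
    crops = []
    for index, frame in enumerate(frames):
        for box in detect(frame):           # an empty list means no pedestrian in this frame
            crops.append((index, crop(frame, box)))
    return crops
```

Frames in which the detector reports nothing contribute no crops, which mirrors the "if a pedestrian is detected" condition in step 12).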
Further, the specific flow of the cross-modal processing in step two is as follows:
21) Determining the image data type: if the image is an RGB image, executing step 22); if the image is an IR image, executing step 23);
22) Converting the RGB image into a grayscale image, which has a single channel, then creating a second channel and filling it with the value 0;
23) Taking the IR image itself as the second channel and filling the first channel with the value 0;
24) Unifying the image data processed in step 22) or step 23) into a two-channel structure and outputting it to step three for pedestrian feature extraction.
During model training on the training set, different nodes can selectively deactivate different modalities, thereby achieving cross-modal re-identification.
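The two-channel unification of steps 21) to 24) can be illustrated with a short Python sketch. It is an illustration only: images are nested lists, the `modality` string is a hypothetical way of signalling the image type, and the grayscale conversion uses the ITU-R BT.601 luma weights as an assumption, because the text does not name a conversion formula.

```python
from typing import List, Sequence

TwoChannel = List[List[List[float]]]   # H x W x 2


def rgb_to_gray(rgb: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Convert an H x W x 3 RGB image to one channel (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def to_two_channel(image, modality: str) -> TwoChannel:
    """Channel 0 carries grayscale RGB content, channel 1 carries IR content."""
    if modality == "rgb":
        # Step 22): grayscale goes to the first channel, second channel is zero-filled.
        return [[[value, 0.0] for value in row] for row in rgb_to_gray(image)]
    if modality == "ir":
        # Step 23): the IR image is the second channel, first channel is zero-filled.
        return [[[0.0, float(value)] for value in row] for row in image]
    raise ValueError(f"unknown modality: {modality!r}")
```

Both branches return the same H x W x 2 layout, so the feature extractor in step three sees a single input shape regardless of modality.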
The specific flow of extracting the pedestrian features in step three is as follows:
31) Inputting the pedestrian image to be queried and the pedestrian images detected from the video and subjected to cross-modal processing into a trained feature extraction model based on federated learning and edge computing;
32) Extracting, with the feature extraction model, the feature vectors of the pedestrian image to be queried and of each detected pedestrian image.
The model training method based on federated learning and edge computing is as follows. First, the cloud server selects K participants from the N edge servers for training, obtains an initial global model, and issues the global model to each edge server. After receiving the global model, each edge server connects it with the local classifier obtained in the previous round of training to form a new local model, and trains this model on its local data set. The edge server then saves the local classifier and uploads the parameters of the backbone network to the cloud server. Finally, the cloud server aggregates the model parameters uploaded by all edge servers into a new global model and issues it again; these steps are repeated until the model converges and meets the accuracy requirement.
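One training round as described above can be sketched in Python as follows. This is a simplified illustration under assumptions: parameters are dictionaries of float lists, `train_fn` is a hypothetical stand-in for local training, and the aggregation weights each upload by the size of the local data set (a FedAvg-style choice that the text does not specify).

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, List[float]]
TrainFn = Callable[[Params, List[float], List[float]], Tuple[Params, List[float]]]


def aggregate(backbones: List[Params], weights: List[float]) -> Params:
    """Weighted average of the backbone parameters uploaded by the edge servers."""
    total = sum(weights)
    return {
        name: [
            sum(w * b[name][i] for b, w in zip(backbones, weights)) / total
            for i in range(len(backbones[0][name]))
        ]
        for name in backbones[0]
    }


def run_round(global_backbone: Params, edges: List[dict], train_fn: TrainFn) -> Params:
    """Issue the global backbone, train locally, upload backbones, aggregate."""
    uploads, weights = [], []
    for edge in edges:
        # Join a copy of the global backbone with the classifier kept from the last round.
        backbone = {name: list(values) for name, values in global_backbone.items()}
        backbone, edge["classifier"] = train_fn(backbone, edge["classifier"], edge["data"])
        uploads.append(backbone)                   # only the backbone leaves the edge server
        weights.append(float(len(edge["data"])))   # assumed weighting: local data set size
    return aggregate(uploads, weights)
```

The classifier is written back into the edge server's own record and never enters `uploads`, which reflects the split between the shared backbone and the locally stored classifier.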
Further, the specific flow of matching the features and determining the target pedestrian in step four is as follows:
41) Comparing the feature vector of the image to be queried with the feature vector of each pedestrian image detected in the video and calculating the feature similarity, wherein the feature similarity is calculated as:

$$T_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

wherein $x$ is the feature of the pedestrian to be queried, $y$ is the feature of a pedestrian image detected in the video, $n$ is the feature dimension, $x_i$ and $y_i$ are the $i$-th components of $x$ and $y$, and $T_{x,y}$ is the cosine of the angle between the two features;
42) Taking a pedestrian in the detected images whose similarity satisfies the similarity threshold as the target pedestrian in the video.
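The matching in steps 41) and 42) amounts to a cosine similarity followed by a threshold test. A minimal Python sketch is given below; the function names and the default threshold of 0.8 are assumptions for illustration, since the text does not state a threshold value.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal dimension n."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0   # convention chosen here: a zero vector matches nothing
    return dot / (norm_x * norm_y)


def match_targets(
    query: Sequence[float],
    detected: List[Sequence[float]],
    threshold: float = 0.8,   # assumed value, not taken from the text
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every detected feature meeting the threshold."""
    matches = []
    for index, feature in enumerate(detected):
        score = cosine_similarity(query, feature)
        if score >= threshold:
            matches.append((index, score))
    return matches
```

For example, `match_targets([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]], threshold=0.7)` keeps only the first detection, because cosine similarity ignores vector length and the second vector is orthogonal to the query.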
Further, the training process of the feature extraction model requires a cloud server, edge servers, and data acquisition equipment; whenever information is transmitted between an edge server and the cloud server, the data are encrypted with a homomorphic encryption algorithm before transmission.
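The text names only "a homomorphic encryption algorithm" without choosing one. As an illustration of why such a scheme suits aggregation, the sketch below uses the additively homomorphic Paillier cryptosystem as an assumed choice, with toy primes that are far too small to be secure and with parameters assumed to be encoded as non-negative integers (for example, fixed-point values).

```python
import math
import random
from typing import Tuple


def keygen(p: int = 293, q: int = 433) -> Tuple[int, Tuple[int, int]]:
    """Return (public modulus n, private key). Toy primes: insecure, demo only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)   # modular inverse; this shortcut holds for generator g = n + 1
    return n, (lam, mu)


def encrypt(n: int, message: int) -> int:
    """Encrypt an integer 0 <= message < n under public modulus n."""
    n_squared = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, private_key: Tuple[int, int], ciphertext: int) -> int:
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (((u - 1) // n) * mu) % n
```

With this property a server can sum encrypted updates, for example `add_encrypted(n, encrypt(n, 125), encrypt(n, 250))`, without seeing the individual values; only the holder of the private key recovers the total. Who holds the private key in the described system is not stated in the text.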
Further, the training method of the feature extraction model is optimized through knowledge distillation and weight distribution.
The knowledge distillation optimization method is as follows. First, a data set is taken as a shared data set, and the cloud server distributes the shared data set and the initialized model parameters to the edge servers. Each edge server uses the local model trained on its local data to predict on the shared data set and obtain predicted labels, then uploads the updated weights of the local model together with the predicted labels. The cloud server computes a weighted average of the received predicted labels. Finally, the cloud server trains the federated model using the shared data set and the averaged predicted labels, which reduces model instability and improves convergence.
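The server-side averaging of predicted labels can be sketched as below. This covers only the weighted-average step; the subsequent training of the federated model on the shared data set is omitted. The function name and the representation of predictions as per-class score lists are assumptions for illustration.

```python
from typing import List

# predictions[k][i][c]: score that edge server k assigns to class c for shared sample i
Predictions = List[List[List[float]]]


def average_predicted_labels(predictions: Predictions, weights: List[float]) -> List[List[float]]:
    """Weighted average of the per-sample class scores uploaded by the edge servers."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per edge server")
    total = sum(weights)
    num_samples = len(predictions[0])
    num_classes = len(predictions[0][0])
    return [
        [
            sum(w * p[i][c] for p, w in zip(predictions, weights)) / total
            for c in range(num_classes)
        ]
        for i in range(num_samples)
    ]
```

The averaged scores then serve as the training targets for the shared data set on the cloud server.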
Further, the weight distribution optimization method uses cosine distance as the metric and assigns a larger weight to the local model whose parameters change more in each round of training, so that more of the newly learned knowledge is reflected in the global model. The specific flow is as follows:
51) The edge server randomly selects a batch of training data $D_{batch}$ from its local training data;
52) In the next round of training, the edge server receives the model parameters issued by the cloud server, connects the global model with the local classifier obtained in the previous round of training to form a new local model with parameters $w_k^{t}$, and saves the outputs of this model on $D_{batch}$ as logits $l_k^{t}$;
53) The edge server continues training to obtain a new local model with new parameters $w_k^{t+1}$, and saves the outputs of that model on $D_{batch}$ as logits $l_k^{t+1}$;
54) The edge server calculates the weight $m_k^{t+1}$ by averaging the cosine distance over every data point in $D_{batch}$, wherein the calculation formula is:

$$m_k^{t+1} = \frac{1}{|D_{batch}|} \sum_{d \in D_{batch}} \left(1 - \cos\left(l_k^{t}(d),\, l_k^{t+1}(d)\right)\right)$$

wherein $k$ indexes the edge server and $t$ the training round;
55) The edge server uploads the weight $m_k^{t+1}$ to the cloud server, and the cloud server uses this value as the weight of each model in the weighted average.
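The weight described in steps 51) to 55) is the mean cosine distance between what the local model outputs on the same batch before and after local training. A minimal Python sketch follows; the function names are hypothetical, and taking the per-sample outputs to be logit vectors is an assumption drawn from the surrounding description.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cosine_distance_weight(
    old_outputs: List[Sequence[float]],   # outputs on the batch before local training
    new_outputs: List[Sequence[float]],   # outputs on the same batch after local training
) -> float:
    """Average the cosine distance over every data point of the batch."""
    if len(old_outputs) != len(new_outputs) or not old_outputs:
        raise ValueError("both output lists must be non-empty and of equal length")
    distances = [cosine_distance(o, n) for o, n in zip(old_outputs, new_outputs)]
    return sum(distances) / len(distances)
```

An edge server whose outputs barely move after training gets a weight near 0, while one whose outputs change direction substantially gets a larger weight and therefore more influence in the cloud server's weighted average.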
The invention has the following beneficial effects:
1. The invention processes images of different modalities using cross-modal processing so as to meet the requirements of cross-modal pedestrian tracking.
2. The invention uses knowledge distillation and weight redistribution to solve the precision loss and model convergence problems caused by the heterogeneity of multi-scene data when federated learning is applied to pedestrian tracking.
3. By using federated learning and edge servers for distributed training of the pedestrian tracking model, the invention completes all processing without the data leaving its local site, thereby protecting the security of the monitoring data and the privacy of users.
Drawings
Fig. 1 is a flowchart of pedestrian tracking.
Fig. 2 is a workflow diagram of the cross-modal processing module.
Fig. 3 is a schematic diagram of a training process of the pedestrian feature extraction module.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Example 1:
A pedestrian tracking method based on federated learning and edge computing comprises the following steps:
Step one: detecting and acquiring a pedestrian image;
Step two: performing cross-modal processing on the pedestrian image;
Step three: extracting pedestrian features;
Step four: matching the features and determining the target pedestrian.
Example 2:
A pedestrian tracking method based on federated learning and edge computing comprises the following steps:
Step one: detecting and acquiring a pedestrian image, which specifically comprises:
11) Processing the video to be detected to obtain a plurality of video frame images;
12) Inputting each video frame image into a pedestrian tracking network, determining through the network whether a pedestrian appears in the current frame image, and, if a pedestrian is detected, acquiring and outputting the pedestrian image;
Step two: performing cross-modal processing on the pedestrian image;
The specific flow of the cross-modal processing is as follows:
21) Determining the image data type: if the image is an RGB image, executing step 22); if the image is an IR image, executing step 23);
22) Converting the RGB image into a grayscale image, which has a single channel, then creating a second channel and filling it with the value 0;
23) Taking the IR image itself as the second channel and filling the first channel with the value 0;
24) Unifying the image data processed in step 22) or step 23) into a two-channel structure and outputting it to step three for pedestrian feature extraction.
Step three: extracting pedestrian features, wherein the specific flow is as follows:
31) Inputting the pedestrian image to be queried and the pedestrian images detected from the video and subjected to cross-modal processing into a trained feature extraction model based on federated learning and edge computing;
32) Extracting, with the feature extraction model, the feature vectors of the pedestrian image to be queried and of each detected pedestrian image.
Step four: matching the features and determining the target pedestrian, wherein the specific flow is as follows:
41) Comparing the feature vector of the image to be queried with the feature vector of each pedestrian image detected in the video and calculating the feature similarity, wherein the feature similarity is calculated as:

$$T_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

wherein $x$ is the feature of the pedestrian to be queried, $y$ is the feature of a pedestrian image detected in the video, $n$ is the feature dimension, $x_i$ and $y_i$ are the $i$-th components of $x$ and $y$, and $T_{x,y}$ is the cosine of the angle between the two features;
42) Taking a pedestrian in the detected images whose similarity satisfies the similarity threshold as the target pedestrian in the video.
The training process of the feature extraction model requires a cloud server, edge servers, and data acquisition equipment; whenever information is transmitted between an edge server and the cloud server, the data are encrypted with a homomorphic encryption algorithm before transmission.
Example 3:
A pedestrian tracking method based on federated learning and edge computing comprises the following steps:
Step one: detecting and acquiring a pedestrian image, which specifically comprises:
11) Processing the video to be detected to obtain a plurality of video frame images;
12) Inputting each video frame image into a pedestrian tracking network, determining through the network whether a pedestrian appears in the current frame image, and, if a pedestrian is detected, acquiring and outputting the pedestrian image;
Step two: performing cross-modal processing on the pedestrian image;
The specific flow of the cross-modal processing is as follows:
21) Determining the image data type: if the image is an RGB image, executing step 22); if the image is an IR image, executing step 23);
22) Converting the RGB image into a grayscale image, which has a single channel, then creating a second channel and filling it with the value 0;
23) Taking the IR image itself as the second channel and filling the first channel with the value 0;
24) Unifying the image data processed in step 22) or step 23) into a two-channel structure and outputting it to step three for pedestrian feature extraction.
Step three: extracting pedestrian features, wherein the specific flow is as follows:
31) Inputting the pedestrian image to be queried and the pedestrian images detected from the video and subjected to cross-modal processing into a trained feature extraction model based on federated learning and edge computing;
32) Extracting, with the feature extraction model, the feature vectors of the pedestrian image to be queried and of each detected pedestrian image.
The training process of the feature extraction model requires a cloud server, edge servers, and data acquisition equipment; whenever information is transmitted between an edge server and the cloud server, the data are encrypted with a homomorphic encryption algorithm before transmission.
The training method of the feature extraction model is optimized through knowledge distillation and weight distribution;
The knowledge distillation optimization method is as follows: first, a data set is taken as a shared data set, and the cloud server distributes the shared data set and the initialized model parameters to the edge servers; each edge server uses the local model trained on its local data to predict on the shared data set and obtain predicted labels, then uploads the updated weights of the local model together with the predicted labels; the cloud server computes a weighted average of the received predicted labels;
The weight distribution optimization method uses cosine distance as the metric and assigns a larger weight to the local model whose parameters change more in each round of training, so that more of the newly learned knowledge is reflected in the global model. The specific flow is as follows:
51) The edge server randomly selects a batch of training data $D_{batch}$ from its local training data;
52) In the next round of training, the edge server receives the model parameters issued by the cloud server, connects the global model with the local classifier obtained in the previous round of training to form a new local model with parameters $w_k^{t}$, and saves the outputs of this model on $D_{batch}$ as logits $l_k^{t}$;
53) The edge server continues training to obtain a new local model with new parameters $w_k^{t+1}$, and saves the outputs of that model on $D_{batch}$ as logits $l_k^{t+1}$;
54) The edge server calculates the weight $m_k^{t+1}$ by averaging the cosine distance over every data point in $D_{batch}$, wherein the calculation formula is:

$$m_k^{t+1} = \frac{1}{|D_{batch}|} \sum_{d \in D_{batch}} \left(1 - \cos\left(l_k^{t}(d),\, l_k^{t+1}(d)\right)\right)$$

wherein $k$ indexes the edge server and $t$ the training round;
55) The edge server uploads the weight $m_k^{t+1}$ to the cloud server, and the cloud server uses this value as the weight of each model in the weighted average.
Step four: matching the features and determining the target pedestrian, wherein the specific flow is as follows:
41) Comparing the feature vector of the image to be queried with the feature vector of each pedestrian image detected in the video and calculating the feature similarity, wherein the feature similarity is calculated as:

$$T_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

wherein $x$ is the feature of the pedestrian to be queried, $y$ is the feature of a pedestrian image detected in the video, $n$ is the feature dimension, $x_i$ and $y_i$ are the $i$-th components of $x$ and $y$, and $T_{x,y}$ is the cosine of the angle between the two features;
42) Taking a pedestrian in the detected images whose similarity satisfies the similarity threshold as the target pedestrian in the video.
The above embodiments are only exemplary embodiments of the present invention and are not intended to limit it; the scope of the invention is defined by the claims. Various modifications and equivalent substitutions may be made to the present invention within its spirit and scope, and such modifications and equivalents should be considered to fall within the scope of the invention.