CN114582011B

CN114582011B - Pedestrian tracking method based on federated learning and edge computing

Info

Publication number: CN114582011B
Application number: CN202111612745.XA
Authority: CN
Inventors: 曾锦超; 黄裕; 黄家铭; 刘洋; 覃雄宁
Original assignee: Guangxi Zhuang Autonomous Region Public Information Industry Co ltd
Current assignee: Guangxi Zhuang Autonomous Region Public Information Industry Co ltd
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2025-07-18
Anticipated expiration: 2041-12-27
Also published as: CN114582011A

Abstract

The present invention discloses a pedestrian tracking method based on federated learning and edge computing, comprising the following steps: step one, detecting and acquiring pedestrian images; step two, performing cross-modal processing on the pedestrian images; step three, extracting pedestrian features; step four, matching the features and determining the target pedestrian. Because the model is trained by federated learning, the method effectively protects data privacy and solves the data-island problem; it uses knowledge distillation and weight distribution to address the loss of accuracy and the convergence difficulties caused by data heterogeneity across multiple scenes; and it uses cross-modal processing to handle images of different modalities, meeting the needs of cross-modal pedestrian tracking.

Description

Pedestrian tracking method based on federated learning and edge computing

Technical Field

The invention belongs to the technical field of computer vision and federated learning, and particularly relates to a pedestrian tracking method based on federated learning and edge computing.

Background

Existing pedestrian tracking methods need a large amount of pedestrian data to train a model, but a single data set usually contains data from only one specific scene, and a model trained on such a data set often loses much of its recognition accuracy when applied to a real scene. If data collected by video acquisition equipment in several different scenes are pooled for model training, data leakage can occur, and the privacy and data security of users cannot be guaranteed. In addition, existing pedestrian tracking technology can only be applied to RGB video images captured in the daytime, whereas pedestrian tracking needs to work in night scenes as well. In summary, a model trained by existing pedestrian tracking technology on a single labeled data set generalizes poorly, cannot adapt to different usage scenes, and cannot process cross-modal data. A pedestrian tracking method that ensures data security, adapts to different application scenes, and realizes cross-modal recognition is therefore the key to solving these problems.

Federated learning is a distributed machine learning technology that lets multiple participants carry out machine learning while protecting data privacy and meeting legal compliance requirements, thereby solving the data-island problem. Edge computing is mainly applied in the field of video processing and offers low transmission delay, large bandwidth, and high computing performance. Applying federated learning and edge computing to pedestrian tracking, with edge servers and a cloud server jointly training the model by federated learning, effectively protects data privacy and solves the data-island problem.

Chinese patent No. CN20191019437. X discloses a pedestrian tracking method, device and equipment. The method comprises: obtaining a video frame to be detected; detecting candidate pedestrians in the video frame; extracting candidate pedestrian features of the candidate pedestrians; determining the differences between the candidate pedestrian features and the features stored in a feature queue, the stored features being features matched with the target pedestrian; and determining a candidate pedestrian to be the target pedestrian when the difference meets a preset condition. The method can improve the efficiency of the pedestrian tracking process, but it can only be used in the daytime and cannot be applied to night scenes.

Chinese patent CN201510548633.0 discloses a video pedestrian detection and tracking method based on motion information and track association. For pedestrian detection, motion is detected with a frame difference method combined with morphological methods from digital image processing: the motion region in the video is detected first, features are then extracted within the motion region by sliding-window search, and finally a pre-trained pedestrian detection classifier produces the classification result. For tracking, the pedestrian detection result of the previous step is used as input; a tracker is initialized for each detected pedestrian at the start, and each tracker holds the historical motion information and appearance information of its target. When the current frame is processed, position information and appearance information are extracted from each input detection result, and an association matrix built from this information associates the detections with the tracking targets of previous frames, finally yielding the tracking trajectory of each pedestrian. The method has good real-time performance and good robustness in more complex scenes, but it has a data security problem.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a pedestrian tracking method based on federated learning and edge computing. The method uses edge servers and a cloud server to jointly train the model by federated learning, which effectively protects data privacy and solves the data-island problem; it uses knowledge distillation and weight distribution to address the loss of accuracy and the convergence difficulties caused by the data heterogeneity of multiple scenes when federated learning is applied to pedestrian tracking; and it uses cross-modal processing to handle images of different modalities so as to meet the requirements of cross-modal pedestrian tracking.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A pedestrian tracking method based on federated learning and edge computing comprises the following steps:

Step one, detecting and acquiring a pedestrian image;

Step two, performing cross-modal processing on the pedestrian image;

Step three, extracting pedestrian features;

Step four, matching the features and determining the target pedestrian.

In a further aspect of the invention, detecting and acquiring a pedestrian image in step one specifically includes:

11) Processing the video to be detected to obtain a plurality of video frame images;

12) Inputting each video frame image of the video file into a pedestrian tracking network, determining through the pedestrian tracking network whether a pedestrian appears in the current frame image, and, if a pedestrian is detected, acquiring and outputting the pedestrian image.

The pedestrian image is obtained by detection based on the PP-YOLOv detection framework.
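As an illustration of steps 11) and 12), the following minimal Python sketch walks over already-decoded frames, calls a detector on each one, and outputs a cropped pedestrian image for every detection. The frame representation, the box format, and the `toy_detector` stand-in are assumptions made for this sketch; it does not call PP-YOLO or any real detection library.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]            # assumption: one frame as rows of pixel values
Box = Tuple[int, int, int, int]    # assumption: (top, left, bottom, right), end-exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by box out of a frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def collect_pedestrian_images(
    frames: Sequence[Frame],
    detect: Callable[[Frame], List[Box]],
) -> List[Tuple[int, Frame]]:
    """Step 12): run the detector on every frame and output each pedestrian image."""
    results = []
    for index, frame in enumerate(frames):
        for box in detect(frame):      # an empty list means no pedestrian was found
            results.append((index, crop(frame, box)))
    return results


def toy_detector(frame: Frame) -> List[Box]:
    """Stand-in detector (not PP-YOLO): bounding box of all non-zero pixels."""
    rows = [r for r, row in enumerate(frame) if any(row)]
    cols = [c for row in frame for c, value in enumerate(row) if value]
    if not rows:
        return []
    return [(min(rows), min(cols), max(rows) + 1, max(cols) + 1)]


frames = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],   # frame without a pedestrian
    [[0, 0, 0], [0, 7, 8], [0, 9, 6]],   # frame with a "pedestrian" in the lower right
]
pedestrian_images = collect_pedestrian_images(frames, toy_detector)
```

Passing the detector in as a callable keeps the frame loop independent of whichever detection network is actually deployed.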

In a further aspect of the invention, the specific flow of the cross-modal processing in step two is as follows:

21) Judging the image data type: if the image is an RGB image, executing step 22); if the image is an IR image, executing step 23);

22) Converting the RGB image into a grayscale image, which is single-channel, then creating a second channel and filling it with the value 0;

23) Taking the IR image itself as the second channel and filling the first channel with the value 0;

24) Unifying the image data processed in step 22) or step 23) into a dual-channel structure and outputting it to step three for pedestrian feature extraction.
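Steps 21) to 24) can be sketched in a few lines of Python. This is a minimal sketch on nested lists rather than an image library; the text does not say how the grayscale conversion is done, so the BT.601 luma weights used in `rgb_to_gray` are an assumption, as are the function names and the `modality` string.

```python
from typing import List

Gray = List[List[float]]               # single-channel image as rows of values
TwoChannel = List[List[List[float]]]   # rows of [channel_1, channel_2] pixels


def rgb_to_gray(rgb_image) -> Gray:
    """Step 22): RGB to single-channel gray (BT.601 weights are an assumption)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def to_two_channel(image, modality: str) -> TwoChannel:
    """Steps 21)-24): map an RGB or IR image onto the shared dual-channel layout."""
    if modality == "RGB":
        # Gray values go to channel 1, channel 2 is filled with 0.
        return [[[value, 0.0] for value in row] for row in rgb_to_gray(image)]
    if modality == "IR":
        # Channel 1 is filled with 0, the IR values go to channel 2.
        return [[[0.0, float(value)] for value in row] for row in image]
    raise ValueError(f"unsupported modality: {modality}")


rgb_pixels = [[(255, 255, 255), (0, 0, 0)]]   # 1 x 2 RGB image
ir_pixels = [[10, 20]]                        # 1 x 2 IR image
rgb_out = to_two_channel(rgb_pixels, "RGB")
ir_out = to_two_channel(ir_pixels, "IR")
```

Both outputs have the same shape, so a single feature extraction model can accept either modality; the zero-filled channel tells the model which modality is absent.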

During model training on the training set, different nodes can selectively deactivate different modalities, which achieves cross-modal re-identification.

The specific flow of extracting the pedestrian features in step three is as follows:

31) Inputting the pedestrian image to be queried and the pedestrian images detected from the video and subjected to cross-modal processing, respectively, into the trained feature extraction model based on federated learning and edge computing;

32) Extracting, with the feature extraction model, the feature vectors of the pedestrian image to be queried and of each detected pedestrian image.

The model training method based on federated learning and edge computing is as follows. First, the cloud server selects K of the N edge servers to participate in training and obtain an initial global model, and the cloud server sends the global model to each edge server. After receiving the global model, each edge server connects it with the local classifier obtained in the previous round of training to form a new local model, and trains this model on its local data set as a local update. The edge server then saves the local classifier and uploads the parameters of the backbone network to the cloud server. Finally, the cloud server aggregates the model parameters uploaded by all edge servers to obtain a new global model. The global model is issued again and the steps are repeated until the model converges and meets the accuracy requirement.
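One communication round of this procedure might look like the sketch below. It is a simplified illustration, not the patented implementation: parameters are flat lists of floats, `local_train` is a hypothetical callback standing in for real training on an edge server, and the equal aggregation weights are an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = List[float]   # assumption: backbone parameters as a flat list


def aggregate(backbones: List[Params], weights: List[float]) -> Params:
    """Cloud side: weighted average of the uploaded backbone parameters."""
    total = sum(weights)
    return [
        sum(w * params[i] for w, params in zip(weights, backbones)) / total
        for i in range(len(backbones[0]))
    ]


def federated_round(
    global_backbone: Params,
    edge_ids: List[int],
    k: int,
    local_train: Callable[[int, Params, object], Tuple[Params, object]],
    classifiers: Dict[int, object],
    rng: random.Random,
) -> Params:
    """Select K of the N edge servers, train locally, aggregate the backbones."""
    uploads = []
    for edge in rng.sample(edge_ids, k):
        # The edge server joins the global backbone with its own saved classifier.
        backbone, classifier = local_train(edge, list(global_backbone), classifiers.get(edge))
        classifiers[edge] = classifier   # the classifier stays on the edge server
        uploads.append(backbone)         # only backbone parameters are uploaded
    return aggregate(uploads, [1.0] * len(uploads))


def toy_local_train(edge, backbone, classifier):
    """Hypothetical stand-in for local training: shifts parameters by the edge id."""
    return [p + edge for p in backbone], (classifier or 0) + 1


classifiers: Dict[int, object] = {}
new_global = federated_round([0.0, 0.0], [1, 2, 3], 3, toy_local_train, classifiers, random.Random(0))
```

In a full training loop `federated_round` would be called repeatedly, feeding `new_global` back in, until the accuracy requirement is met.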

In a further aspect of the invention, the specific process of matching the features and determining the target pedestrian in step four is as follows:

41) Comparing the feature vector of the image to be queried with the feature vector of each pedestrian image detected in the video, and calculating the feature similarity, wherein the feature similarity calculation formula is as follows:

$$T_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

wherein x is the feature of the pedestrian to be queried, y is the feature of the pedestrian image detected in the video image, n is the feature dimension, and T_{x,y} is the cosine value between the two features;

42) Taking a pedestrian in the detected image that satisfies the similarity threshold as a target pedestrian in the video.
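The matching in steps 41) and 42) amounts to computing the cosine value between the query feature and each detected feature and keeping the detections that pass a threshold. The sketch below assumes features are plain lists of floats; the threshold of 0.8 and the convention of returning 0.0 for an all-zero vector are choices made for illustration only.

```python
import math
from typing import List, Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine value between two feature vectors of the same dimension n."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0   # assumption: an all-zero feature matches nothing
    return dot / norm


def find_targets(
    query: Sequence[float],
    detections: List[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Step 42): indices of detected pedestrians that satisfy the threshold."""
    return [
        index
        for index, feature in enumerate(detections)
        if cosine_similarity(query, feature) >= threshold
    ]


query_feature = [1.0, 0.0]
detected_features = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = find_targets(query_feature, detected_features, threshold=0.8)
```

Because the cosine value ignores vector length, the first detection matches the query exactly even though its magnitude differs.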

In a further aspect of the invention, the training process of the feature extraction model requires a cloud server, edge servers and data acquisition equipment, and when information is transmitted between an edge server and the cloud server, the data are encrypted with a homomorphic encryption algorithm before transmission.
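The text does not name the homomorphic encryption algorithm. Purely as an illustration of the property such a scheme offers, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The choice of Paillier, the tiny primes, the fixed-point `SCALE`, and all function names are assumptions; the key size is far too small for real use, and negative values are not handled.

```python
import math
import random


def generate_keys(p: int = 1789, q: int = 1861):
    """Toy key pair from two small primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)      # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n: int, message: int, rng: random.Random) -> int:
    """Encrypt a non-negative integer smaller than n."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**message mod n**2 equals 1 + message * n.
    return ((1 + message * n) % n_sq) * pow(r, n, n_sq) % n_sq


def decrypt(n: int, private_key, ciphertext: int) -> int:
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Ciphertext of the sum of the two underlying plaintexts."""
    return c1 * c2 % (n * n)


rng = random.Random(0)        # fixed seed only to make the example repeatable
n, private_key = generate_keys()
SCALE = 1000                  # hypothetical fixed-point scale for parameters
c1 = encrypt(n, round(0.125 * SCALE), rng)
c2 = encrypt(n, round(0.250 * SCALE), rng)
encrypted_sum = add_encrypted(n, c1, c2)
decrypted_sum = decrypt(n, private_key, encrypted_sum) / SCALE
```

With an additive scheme of this kind, uploaded values could in principle be summed without being decrypted first; whether the method aggregates on ciphertexts is not stated in the text.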

In a further aspect of the invention, the training method of the feature extraction model is optimized through knowledge distillation and weight distribution.

The knowledge distillation optimization method is as follows. First, a data set is taken as a shared data set, and the cloud server distributes the shared data set together with the initialized model parameters to the edge servers. Each edge server uses the local model trained on its local data to predict the shared data set and obtain prediction labels, then uploads the updated weights of the local model together with the prediction labels. The cloud server computes a weighted average of the received prediction labels. Finally, the cloud server trains the federated model with the shared data set and the prediction labels, which reduces model instability and lets the model converge better.
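The cloud-side averaging step of this knowledge distillation procedure can be sketched as follows. The sketch assumes that each prediction label is a vector of per-class scores and that each edge server has a scalar weight; neither detail is specified in the text, and the subsequent training of the federated model on the averaged labels is not shown.

```python
from typing import List

Predictions = List[List[float]]   # assumption: per sample, one score per class


def average_predictions(per_edge: List[Predictions], weights: List[float]) -> Predictions:
    """Weighted average of the prediction labels uploaded for the shared data set."""
    total = sum(weights)
    n_samples = len(per_edge[0])
    n_classes = len(per_edge[0][0])
    return [
        [
            sum(w * preds[s][c] for w, preds in zip(weights, per_edge)) / total
            for c in range(n_classes)
        ]
        for s in range(n_samples)
    ]


# Two edge servers predict the same two shared samples over two classes.
edge_a = [[1.0, 0.0], [0.5, 0.5]]
edge_b = [[0.0, 1.0], [0.5, 0.5]]
soft_labels = average_predictions([edge_a, edge_b], weights=[3.0, 1.0])
```

The averaged labels act as training targets on the shared data set, so the federated model learns from the consensus of the local models rather than from any single one.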

In a further aspect of the invention, the weight distribution optimization method uses cosine distance as the metric and assigns a larger weight to a local model whose parameters change more in each round of training, so that more of the newly learned knowledge is reflected in the global model. The specific flow is as follows:

51) The edge server randomly selects a group of training data D_batch from the local training data;

52) In the next round of training, the edge server receives the model parameters issued by the cloud server, connects the global model with the local classifier obtained in the previous round of training to form a new local model, and saves a log generated from the local model parameters with D_batch;

53) The edge server continues training to obtain a new local model, and saves a log generated from the new model parameters with D_batch;

54) The edge server calculates the weight by averaging the cosine distance for each data point in D_batch. The calculation formula is as follows:

55) The edge server uploads the weight to the cloud server, and the cloud server uses it as the weight value in the weighted average of the models.
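The symbols and the formula of steps 52) to 55) did not survive in this text, so the following is only one plausible reading: for every data point d in D_batch, the cosine distance 1 - cos(f_old(d), f_new(d)) is taken between the output saved before local training and the output saved after it, and the weight is the mean of these distances over D_batch. The names `old_outputs` and `new_outputs`, the zero-vector convention, and the fallback to equal weights are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine value between two output vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0   # assumption: a degenerate vector counts as "no change"
    return 1.0 - dot / norm


def cosine_distance_weight(old_outputs, new_outputs) -> float:
    """Step 54): mean cosine distance over all data points in D_batch."""
    distances = [cosine_distance(o, n) for o, n in zip(old_outputs, new_outputs)]
    return sum(distances) / len(distances)


def weighted_average(models: List[List[float]], weights: List[float]) -> List[float]:
    """Step 55), cloud side: weighted average of the uploaded model parameters."""
    total = sum(weights)
    if total == 0.0:                      # assumption: fall back to equal weights
        weights, total = [1.0] * len(models), float(len(models))
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]


# Outputs on a two-sample D_batch before and after local training.
old_outputs = [[1.0, 0.0], [0.0, 1.0]]
new_outputs = [[1.0, 0.0], [1.0, 0.0]]
weight = cosine_distance_weight(old_outputs, new_outputs)
new_global = weighted_average([[0.0], [4.0]], [0.5, 1.5])
```

An edge server whose outputs did not change receives weight 0, while one whose outputs rotated strongly receives a weight approaching 1, which matches the stated intent of favouring models that changed more.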

The invention has the following beneficial effects:

1. The invention processes images of different modalities with cross-modal processing technology, so as to meet the requirements of cross-modal pedestrian tracking.

2. The invention uses knowledge distillation and weight distribution to solve the problems of accuracy reduction and model convergence caused by the data heterogeneity of multiple scenes when federated learning is applied to pedestrian tracking.

3. By using federated learning and edge servers for distributed training of the pedestrian tracking model, the invention completes all processes without the data leaving the local site, thereby protecting the security of the monitoring data and the privacy of users.

Drawings

Fig. 1 is a flowchart of pedestrian tracking.

Fig. 2 is a workflow diagram of the cross-modal processing module.

Fig. 3 is a schematic diagram of a training process of the pedestrian feature extraction module.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Example 1:

Step one, detecting and acquiring a pedestrian image;

Step two, performing cross-modal processing on the pedestrian image;

Step three, extracting pedestrian characteristics;

Example 2:

Step one, detecting and acquiring a pedestrian image, which specifically comprises the following steps:

Step two, performing cross-modal processing on the pedestrian image;

The specific flow of the cross-modal processing is as follows:

Step three, extracting pedestrian characteristics, wherein the specific flow is as follows:

Step four, matching the characteristics and determining a target pedestrian, wherein the specific flow is as follows:

Wherein x is the feature of the pedestrian to be queried, y is the feature of the pedestrian image detected in the video image, n is the feature dimension, and T_{x,y} is the cosine value between the two features.

Example 3:

Step two, performing cross-modal processing on the pedestrian image;

The specific flow of the cross-modal processing is as follows:

The training process of the feature extraction model requires a cloud server, edge servers and data acquisition equipment, and when information is transmitted between an edge server and the cloud server, the data are encrypted with a homomorphic encryption algorithm before transmission.

The training method of the feature extraction model is optimized through knowledge distillation and weight distribution;

The knowledge distillation optimization method is as follows: first, a data set is taken as a shared data set, and the cloud server distributes the shared data set together with the initialized model parameters to the edge servers; each edge server uses the local model trained on its local data to predict the shared data set and obtain prediction labels, then uploads the updated weights of the local model together with the prediction labels; and the cloud server computes a weighted average of the received prediction labels;

The weight distribution optimization method uses cosine distance as the metric and assigns a larger weight to a local model whose parameters change more in each round of training, so that more of the newly learned knowledge is reflected in the global model. The specific flow is as follows:

The above embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements of parts may be made to the present invention within the spirit and scope of the invention, and such modifications and equivalents should be considered to fall within the scope of the invention.

Claims

1. A pedestrian tracking method based on federated learning and edge computing, characterized by comprising the following steps:

Step 1: Detect and obtain pedestrian images;

Step 2: Perform cross-modal processing on pedestrian images;

The specific process of the cross-modal processing is as follows:

21) Determine the image data type: if it is an RGB image, execute step 22); if it is an IR image, execute step 23);

22) Convert the RGB image to a grayscale image, which is single-channel at this point; create a second channel and fill it with zero values;

23) Use the IR image itself as the second channel and fill the first channel with zero values;

24) The image data processed in step 22) or step 23) is unified into a dual-channel structure and output to step 3 for pedestrian feature extraction;

Step 3: Extract pedestrian features. The specific process is as follows:

31) Input the pedestrian image to be queried and the pedestrian image detected from the video and processed cross-modally into the trained feature extraction model based on federated learning and edge computing respectively;

32) The feature extraction model extracts the feature vectors of the pedestrian image to be queried and each detected pedestrian image respectively;

Step 4: Match features and determine the target pedestrian. The specific process is as follows:

41) The feature vector of the image to be queried is compared with the feature vector of the pedestrian image detected in the video, and the feature similarity is calculated; the feature similarity calculation formula is:

$$T_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

where x is the feature of the pedestrian to be queried, y is the feature of the pedestrian image detected in the video image, n is the feature dimension, and T_{x,y} is the cosine value between the two features;

42) The pedestrians in the detected image that meet the similarity threshold are taken as the target pedestrians in the video.

2. The pedestrian tracking method based on federated learning and edge computing according to claim 1 is characterized in that: the step of detecting and acquiring pedestrian images in step 1 specifically includes:

11) Processing the video to be detected to obtain several video frame images;

12) Each video frame image of the video file is input into the pedestrian tracking network, and the pedestrian tracking network is used to determine whether a pedestrian appears in the current frame image. If a pedestrian is detected, the pedestrian image is obtained and output.

3. The pedestrian tracking method based on federated learning and edge computing according to claim 2 is characterized in that: a cloud server, edge servers and data acquisition equipment are required in the feature extraction model training process; when an edge server and the cloud server transmit information, the data is encrypted using a homomorphic encryption algorithm before transmission.

4. The pedestrian tracking method based on federated learning and edge computing according to claim 3 is characterized in that: the training method of the feature extraction model is optimized through knowledge distillation and weight distribution.

5. The pedestrian tracking method based on federated learning and edge computing according to claim 4 is characterized in that: the optimization method of the knowledge distillation is to first use a data set as a shared data set, and the cloud server distributes the shared data set together with the initialized model parameters to the edge servers, and each edge server uses the local model trained with local data to predict the shared data set to obtain predicted labels; the edge servers then upload the updated weights of the local model and the obtained predicted labels, and the cloud server performs a weighted average on the received predicted labels; finally, the cloud server uses the shared data set and the predicted labels to train the federated model, to reduce the instability of the model and make the model converge better.

6. The pedestrian tracking method based on federated learning and edge computing according to claim 5 is characterized in that: the optimization method of weight allocation is to use cosine distance as a metric, and to assign a larger weight to the local model with a larger parameter change in each training, so that more newly learned knowledge is reflected in the global model; the specific process is:

51) The edge server randomly selects a set of training data D_batch from the local training data;

52) In the next round of training, the edge server receives the model parameters sent by the cloud server, connects the global model with the local classifier obtained from the previous round of training to form a new local model, and saves a log generated from the local model parameters with D_batch;

53) The edge server continues to train to obtain a new local model, and saves a log generated from the new model parameters with D_batch;

54) The edge server calculates the weight by averaging the cosine distance of each data point in D_batch. The calculation formula is as follows:

55) The edge server uploads the weight to the cloud server, and the cloud server uses it as the weight value of the weighted average of each model.