CN119027860B - Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine - Google Patents
- Publication number
- CN119027860B (application CN202411504952.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- motion
- event
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a dynamic target tracking method and system for vehicle-mounted mining intrinsically safe video monitoring, relating to the technical field of video monitoring target tracking. The method comprises: preprocessing a real-time video stream acquired by a vehicle-mounted camera; extracting fine-grained tracking features from the target detection result; sorting captured events by timestamp to obtain the real-time position and motion trend of a moving target; performing short-term prediction with the Kalman filter of a TLD tracking algorithm to obtain an updated target model; applying a motion detection algorithm to obtain the motion characteristics of the moving target; applying a background-segmentation-based video tracking algorithm to obtain a specific-event prediction result; and finally combining the motion characteristics of the moving target with the specific-event prediction result to obtain the dynamic target tracking result. The invention improves the dynamic target tracking performance of vehicle-mounted mining intrinsically safe video monitoring systems and achieves accurate tracking of dynamic targets in mines.
Description
Technical Field
The invention relates to the technical field of video monitoring target tracking, in particular to a dynamic target tracking method and system for intrinsic safety type video monitoring of a vehicle-mounted mine.
Background
As industrialization and automation continue to advance, the safety and efficiency of mine operations are increasingly important. In mine monitoring systems, dynamic target tracking is one of the key technologies for safeguarding miners and optimizing operational workflows, and in recent years advances in computer vision and pattern recognition have significantly promoted its application in the field of mine monitoring.
Traditional monitoring systems usually rely on wired connections, whose lines are easily damaged in the complex environment of a mine. The interior of a mine is complex: light is dim, dust is heavy, and temperature varies widely, so conventional video monitoring technology struggles to operate stably and monitoring quality suffers. Harsh in-mine conditions such as high dust levels and strong vibration place extremely high demands on equipment stability and reliability that much existing equipment cannot meet. Existing video monitoring systems also tend to focus on recording rather than real-time analysis and lack rapid response capability for sudden events; video data usually must be uploaded to a central server for analysis, so processing is slow and cannot meet real-time monitoring requirements. Target tracking algorithms readily lose targets against complex backgrounds, such as rocks and vehicles of similar color in a mine, degrading the monitoring effect. Moreover, existing systems often lack an effective early-warning mechanism and cannot accurately predict and respond in time before a hazard occurs. The prior art is therefore challenged in the special environment of mines, and these factors seriously affect the stability and reliability of monitoring systems.
Disclosure of Invention
The invention aims to provide a dynamic target tracking method and a system for vehicle-mounted mining intrinsic safety type video monitoring so as to solve the problems. In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
in a first aspect, the application provides a dynamic target tracking method for intrinsic safety type video monitoring for a vehicle-mounted mine, which comprises the following steps:
The method comprises the steps that a real-time video stream is obtained through a vehicle-mounted camera, wherein the real-time video stream comprises a visible light video and an event stream captured by an event camera, the visible light video is a visible light activity scene in a mine, the visible light activity scene comprises states of miner activities, vehicle operation and other visible objects, and the event stream captured by the event camera comprises brightness change position and time information recorded when pixel brightness changes;
Preprocessing the real-time video stream, the preprocessing comprising noise reduction and illumination enhancement, to obtain optimized video frames; analyzing the optimized video frames with a lightweight deep learning model to obtain a target detection result; and performing fine-grained tracking feature extraction on the target detection result according to the high-time-resolution event stream and the visible-light video to obtain first features of the surface interest points of the target result, wherein the first features comprise local texture, shape information and motion pattern, and the target detection result comprises preliminary positioning results for miners, vehicle targets and other visible objects;
according to the first characteristics of the surface interest points, processing the moving targets in the event stream through a rapid tracking algorithm to obtain real-time positions and movement trends of the moving targets, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain short-term movement tracks of the moving targets, and obtaining updated target models through an online learning mechanism;
Based on the updated target model and the real-time video frame, a motion detection algorithm is used for processing to obtain the motion state of the moving target, wherein the processing process comprises the steps of identifying the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence;
For the color distribution characteristics of a specific event, applying a background-segmentation-based video tracking algorithm to obtain a specific-event prediction result, the processing comprising: calculating a color histogram of the dynamic target, searching the next frame for the region that best matches the histogram, and dynamically adjusting the size and shape of the target frame; and combining the motion characteristics of the moving target with the specific-event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and update correction.
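The motion-detection step above — identifying moving targets by analyzing pixel-level changes across the image sequence — can be sketched with a simple frame-differencing pass. This is an illustrative stand-in (the patent does not name the exact motion detection algorithm), and the threshold and frame sizes are hypothetical:

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose absolute intensity change exceeds a threshold."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def motion_bounding_box(mask):
    """Tight bounding box (x0, y0, x1, y1) around moving pixels, or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic example: a small bright "object" moves from column 1 to column 5,
# so the change mask covers both its old and new positions.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = np.zeros((8, 8), dtype=np.uint8)
prev[3:5, 1:3] = 200
curr[3:5, 5:7] = 200
mask = frame_difference_mask(prev, curr)
box = motion_bounding_box(mask)
```

A real pipeline would follow this with morphological cleanup and per-blob feature extraction; the sketch only shows the pixel-change analysis itself.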
Preferably, the method comprises the steps of analyzing an optimized video frame by using a lightweight deep learning model to obtain a detection target result, extracting fine granularity tracking characteristics of the detection target result to obtain first characteristics of surface interest points of the target result, wherein the first characteristics comprise:
Processing the optimized video frames through the YOLOv-Lite model, which obtains preliminary positioning results for miners, vehicle targets and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function;
Obtaining an optimization measure of the target candidate region through distance intersection ratio calculation based on the preliminary positioning result, and obtaining a boundary frame of the target candidate region through non-maximum suppression processing, wherein the processing comprises the steps of calculating the intersection ratio of a prediction frame and removing an overlapped frame based on a preset threshold value;
and extracting the characteristics which are still identified and matched under the illumination change and shielding rotation conditions by calculating the gradient direction histogram of the local image in the boundary box by using a scale-invariant characteristic transformation algorithm, and marking the characteristics as first characteristics of interest points of the target surface.
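The non-maximum suppression step above can be illustrated with plain IoU-based suppression. Note the patent specifies a distance-IoU (DIoU) measure for candidate-region optimization; this sketch uses the standard overlap ratio for brevity, and all box coordinates and thresholds are hypothetical:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box; drop boxes overlapping it beyond the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping detections of the same miner plus one separate vehicle box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

DIoU would additionally penalize the normalized distance between box centers, which helps separate nearby targets of similar size.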
Preferably, the extracting, by calculating a gradient direction histogram of a local image in a bounding box, a feature still having identification and matching under illumination change and occlusion rotation conditions by using a scale invariant feature transformation algorithm, and recording the feature as a first feature of a target surface interest point includes:
Constructing a differential Gaussian pyramid by utilizing a Gaussian pyramid technology based on a bounding box to obtain a first key point position, wherein the first key point position comprises points representing angular points, edges or other remarkable characteristics of targets in a mine in a scale space, and the differential Gaussian pyramid is obtained by calculating the difference value of two adjacent layers of the Gaussian pyramid;
According to the difference-of-Gaussians pyramid, performing extremum detection in scale space at each pixel of the first key point locations to obtain second key point locations, wherein each candidate point is compared with its 26 neighbors — 8 in the same scale layer and 9 in each of the two adjacent scale layers — and only extremum points are retained;
obtaining a unique orientation for each second key point location by calculating a gradient orientation histogram based on the gradient directions around it, wherein a dominant orientation is assigned by taking the 16×16-pixel neighborhood around the key point, calculating the gradient direction and magnitude of each pixel, and generating an 8-bin orientation histogram in each 4×4 sub-region;
Based on the dominant orientation of each second key point location, generating a feature descriptor in the 16×16-pixel neighborhood around it, yielding features that remain identifiable and matchable under illumination change and occlusion or rotation, recorded as the first features of the target surface interest points, wherein the descriptor is constructed by computing a gradient orientation histogram in each 4×4 sub-region, resulting in a 128-dimensional feature vector.
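The 128-dimensional descriptor construction described above (4×4 sub-regions × 8 orientation bins over a 16×16 neighborhood) can be sketched as follows. This is a simplified illustration: full SIFT also applies Gaussian weighting, rotation to the dominant orientation, and descriptor clipping, all omitted here:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-dim descriptor of a 16x16 patch: 4x4 sub-regions x 8 orientation bins."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)       # angle in [0, 2*pi)
    bins = (orientation / (2 * np.pi) * 8).astype(int) % 8
    descriptor = np.zeros(128)
    for cy in range(4):
        for cx in range(4):
            cell = (slice(cy * 4, cy * 4 + 4), slice(cx * 4, cx * 4 + 4))
            hist = np.zeros(8)
            # accumulate gradient magnitude into the 8 orientation bins
            np.add.at(hist, bins[cell].ravel(), magnitude[cell].ravel())
            descriptor[(cy * 4 + cx) * 8:(cy * 4 + cx) * 8 + 8] = hist
    norm = np.linalg.norm(descriptor)
    # unit-normalization gives the (partial) illumination invariance
    return descriptor / norm if norm > 0 else descriptor

patch = np.tile(np.arange(16, dtype=np.float64), (16, 1))  # horizontal ramp
d = sift_like_descriptor(patch)
```

On the ramp patch every gradient points in the same direction, so all energy lands in one bin per cell; a textured patch spreads energy across bins.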
Preferably, the processing the moving object in the event stream through the fast tracking algorithm according to the first feature of the surface interest point to obtain the real-time position and the motion trend of the moving object includes:
Calculating a color histogram of the moving target region based on the first feature of the surface interest point;
calculating a region similar to the color histogram of the moving target in the search window to determine the current position of the moving target, obtaining a matching result, and updating the position of the moving target according to the matching result;
Based on the updated position of the moving object, a movement trend of the moving object is analyzed, wherein the movement trend comprises a movement direction and a movement speed.
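The histogram-matching search in the steps above can be sketched as a sliding-window comparison. Simplifying assumptions: a single-channel frame stands in for the color image, histogram intersection is used as the similarity measure, and the window size and stride are illustrative:

```python
import numpy as np

def color_histogram(region, bins=8):
    """Normalized intensity histogram of a single-channel region."""
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

def best_match_position(frame, template_hist, win, stride=4):
    """Slide a window over the frame; return top-left of the most similar region."""
    best, best_pos = -1.0, (0, 0)
    h, w = frame.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            s = histogram_similarity(
                color_histogram(frame[y:y + win, x:x + win]), template_hist)
            if s > best:
                best, best_pos = s, (x, y)
    return best_pos

# A bright 16x16 target on a dark background; the tracker's template histogram
# was built from the target's previous appearance.
frame = np.zeros((64, 64), dtype=np.uint8)
frame[20:36, 40:56] = 220
template = color_histogram(np.full((16, 16), 220, dtype=np.uint8))
pos = best_match_position(frame, template, win=16)
```

Comparing positions across consecutive frames then yields the motion direction and speed mentioned above.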
Preferably, the performing short-term prediction by using a kalman filter in a TLD target movement algorithm to obtain a short-term movement track of a moving target, and obtaining an updated target model through an online learning mechanism, where the method includes:
Based on the real-time position and the motion trend of the moving target, performing optimal estimation calculation on the target state through the treatment of a Kalman filter to obtain a short-term motion track of the moving target, wherein the calculation process comprises a prediction step and an updating step, the prediction step is used for estimating the position and the speed of the moving target in the next frame, and the updating step is used for adjusting and predicting according to new observation data;
Collecting positive and negative samples according to the short-term motion trail of the moving target to form a training data set for model updating, wherein the positive samples are from detection results of the tracking target, and the negative samples are from areas identified as non-targets;
And processing the training data set by using an online random gradient descent algorithm, updating model parameters online, and adapting to the change of the appearance of the target, thereby obtaining an updated target model reflecting the current state of the moving target.
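The Kalman prediction and update steps described above can be illustrated with a one-dimensional constant-velocity model; a real tracker would carry 2D position and velocity, and all noise parameters here are illustrative assumptions:

```python
import numpy as np

class ConstantVelocityKalman:
    """1D constant-velocity Kalman filter; state = [position, velocity]."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = process_var * np.eye(2)             # process noise
        self.R = np.array([[meas_var]])              # measurement noise
        self.x = np.zeros((2, 1))                    # state estimate
        self.P = np.eye(2)                           # estimate covariance

    def predict(self):
        """Prediction step: estimate next-frame position and velocity."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x.copy()

    def update(self, z):
        """Update step: correct the prediction with a new observation z."""
        y = np.array([[z]]) - self.H @ self.x            # innovation
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x.copy()

# Track a target moving ~2 units per frame from noiseless position readings;
# the velocity estimate converges toward 2 without ever being observed directly.
kf = ConstantVelocityKalman()
for z in [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]:
    kf.predict()
    kf.update(z)
estimated_velocity = float(kf.x[1, 0])
```

In the TLD setting, the detection result of each frame plays the role of the observation z, and the positive/negative samples described above update the appearance model separately from this motion filter.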
Preferably, the color distribution characteristic of the specific event is processed by applying a video tracking algorithm based on background segmentation to obtain a prediction result of the specific event, wherein the processing includes calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, and dynamically adjusting the size and shape of a target frame, which includes:
Calculating a color histogram of a specific event in the mine to obtain a color feature description of the target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
Searching a region which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching region;
And dynamically adjusting the target frame from the target's color distribution at the potential position using the CamShift algorithm, comprising: initializing the target frame, tracking the target in subsequent frames using the color histogram, and iteratively updating the position and size of the target frame to match the target's color distribution until the region best matching the target's color histogram is obtained, thereby yielding the specific-event prediction result.
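The iterative window update at the core of CamShift can be sketched as a mean-shift loop over back-projection weights (each pixel's probability of matching the target histogram). This simplified version keeps the window size fixed; full CamShift also adapts the window's size and orientation from the image moments, which is how the target frame's shape is adjusted:

```python
import numpy as np

def mean_shift_window(weights, window, iterations=10):
    """Shift an (x, y, w, h) window toward the centroid of pixel weights."""
    x, y, w, h = window
    for _ in range(iterations):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break
        ph, pw = patch.shape
        ys, xs = np.mgrid[0:ph, 0:pw]
        cx = (xs * patch).sum() / total           # centroid inside the window
        cy = (ys * patch).sum() / total
        nx = max(int(round(x + cx - w / 2)), 0)   # re-center window on centroid
        ny = max(int(round(y + cy - h / 2)), 0)
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return x, y, w, h

# Hypothetical back-projection: a blob of "event-colored" pixels (e.g. a gas
# cloud) centered near (30, 12); the window starts off-target and slides on.
weights = np.zeros((40, 60))
weights[8:17, 26:35] = 1.0
tracked = mean_shift_window(weights, window=(20, 4, 10, 10))
```

The converged window encloses the blob's centroid, which is the "best matching region" the step above refers to.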
In a second aspect, the application also provides a dynamic target tracking system for vehicle-mounted mining intrinsically safe video monitoring, which comprises:
The acquisition module is used for acquiring real-time video streams through the vehicle-mounted camera, wherein the real-time video streams comprise visible light videos and event streams captured by the event camera, the visible light videos are visible light activity scenes in a mine, the visible light activity scenes comprise states of miner activities, vehicle operation and other visible objects, and the event streams captured by the event camera comprise brightness change position and time information recorded when pixel brightness changes;
The detection extraction module is used for generating an event stream containing a time stamp based on the brightness change of pixels in the event stream captured by the event camera, and sequencing the captured events according to the time stamp to generate a high-time-resolution event stream; preprocessing a real-time video stream, wherein preprocessing comprises noise reduction and illumination enhancement to obtain an optimized video frame, analyzing the optimized video frame by utilizing a lightweight deep learning model to obtain a detection target result, and extracting fine granularity tracking features of the detection target result according to a high-time-resolution event stream and a visible light video to obtain first features of surface interest points of the target result, wherein the first features comprise local textures, shape information and a movement mode, and the detection target result comprises preliminary positioning results of miners, vehicle targets and other visible objects;
The prediction module is used for processing the moving target in the event stream through a rapid tracking algorithm according to the first characteristics of the surface interest points to obtain the real-time position and the movement trend of the moving target, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain the short-term movement track of the moving target, and obtaining an updated target model through an online learning mechanism;
The recognition module is used for obtaining the motion state of the moving target based on the updated target model and the real-time video frame by utilizing a motion detection algorithm, wherein the motion detection algorithm processing process comprises the steps of recognizing the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence;
The tracking module is used for aiming at the color distribution characteristics of a specific event, applying a video tracking algorithm based on background segmentation to process to obtain a specific event prediction result, wherein the processing process comprises the steps of calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, dynamically adjusting the size and shape of a target frame, combining the motion characteristics of the moving target and the specific event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and updating correction.
In a third aspect, the present application also provides a dynamic target tracking device for vehicle-mounted mining intrinsic safety video monitoring, including:
a memory for storing a computer program;
and the processor is used for realizing the step of the dynamic target tracking method of the intrinsic safety type video monitoring for the vehicle-mounted mine when executing the computer program.
In a fourth aspect, the present application further provides a readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the dynamic target tracking method based on intrinsic safety video monitoring for vehicle-mounted mining.
The beneficial effects of the invention are as follows:
According to the invention, front-end processing of the video is performed on the vehicle-mounted computing platform, reducing data transmission delay and improving system real-time performance. Fine-grained tracking focuses on the target's surface interest points, improving detection accuracy. The event camera captures dynamic changes in the scene and generates a high-time-resolution event stream, enabling efficient tracking of fast-moving targets. A deep-learning-based detection and tracking algorithm improves the accuracy of target detection and tracking. The background-segmentation-based video tracking algorithm dynamically adjusts the size and shape of the target frame according to the target's color distribution, improving the tracking of events such as ore slippage and gas leakage, realizing dynamic monitoring and intelligent analysis of coal mine disaster events, and improving the accuracy and response speed of early warning.
By analyzing the optimized video frames with a lightweight deep learning model, the invention can accurately detect targets such as miners and vehicles, maintaining high accuracy even in a complex environment such as a mine. Fine-grained tracking feature extraction yields first features comprising local texture, shape information and motion pattern, which are robust to illumination change and occlusion and help maintain stable tracking in harsh environments. Using the high-time-resolution event stream captured by the event camera together with a fast tracking algorithm, fast-moving targets can be tracked in real time, which is particularly suitable for dynamic monitoring of equipment or personnel in a mine. Short-term prediction with the Kalman filter of the TLD algorithm, combined with an online learning mechanism that updates the target model, adapts to dynamic changes in target motion and improves tracking accuracy and adaptability.
For the color distribution characteristics of specific events such as ore slippage and gas leakage, the CamShift algorithm dynamically adjusts the size and shape of the target frame, so such events can be tracked accurately and early warnings issued in time, improving mine safety. Performing front-end processing and feature extraction on the vehicle-mounted computing platform reduces dependence on a central server, lowers data transmission delay, improves system real-time performance and reliability, and optimizes the use of computing resources. Multi-scale fusion processing and update correction comprehensively exploit the motion characteristics of the dynamic target and the specific-event prediction result, raising the system's level of intelligent decision-making and achieving more accurate target tracking.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a dynamic target tracking method for intrinsic safety type video monitoring for vehicle-mounted mining according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of a dynamic target tracking system for intrinsic safety type video monitoring for vehicle-mounted mining according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a dynamic target tracking device for vehicle-mounted mining intrinsic safety video monitoring according to an embodiment of the present invention.
In the figure, 701, an acquisition module, 702, a detection and extraction module, 703, a prediction module, 704, an identification module, 705, a tracking module, 800, a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine, 801, a processor, 802, a memory, 803, a multimedia component, 804, an I/O interface, 805 and a communication component.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1:
the embodiment provides a dynamic target tracking method for vehicle-mounted mining intrinsic safety type video monitoring.
Referring to fig. 1, the method is shown to include steps S100, S200, S300, S400, and S500.
S100, acquiring a real-time video stream through a vehicle-mounted camera, wherein the real-time video stream comprises a visible light video and an event stream captured by an event camera, the visible light video is a visible light activity scene in a mine, the visible light activity scene comprises states of miner activities, vehicle operation and other visible objects, and the event stream captured by the event camera comprises brightness change position and time information recorded when pixel brightness changes;
It will be appreciated that, unlike a conventional camera, the event camera records events only when pixel brightness changes, generating a timestamp-based event stream containing the location and time of each brightness change. The difference between the two sources is that the visible-light video provides continuous frames suited to capturing detail and color in the scene, but with a large data volume and high processing demands, while the event stream provides asynchronous, event-driven data suited to capturing rapid changes with low computational cost, though it lacks the detail of conventional video.
S200, generating an event stream containing timestamps based on the pixel brightness changes in the stream captured by the event camera, and sorting the captured events by timestamp to generate a high-time-resolution event stream; preprocessing the real-time video stream, the preprocessing comprising noise reduction and illumination enhancement, to obtain optimized video frames; analyzing the optimized video frames with a lightweight deep learning model to obtain a target detection result; and performing fine-grained tracking feature extraction on the target detection result according to the high-time-resolution event stream and the visible-light video to obtain first features of the surface interest points of the target result, wherein the first features comprise local texture, shape information and motion pattern, and the target detection result comprises preliminary positioning results for miners, vehicle targets and other visible objects.
The target surface interest points refer to points with unique and obvious characteristics on the surface of a target object in image processing and computer vision. These points are generally easily identified and tracked in the image because they have a significant texture change or color change in the surrounding area so that they can stand out from other image areas. In object tracking and recognition, points of interest are key features because they facilitate feature matching, i.e., matching the same object between different images even if the object is rotated, scaled, or illuminated, object positioning, i.e., accurately positioning and tracking the object in the images, and motion estimation, i.e., analyzing the motion pattern and direction of the object. Common points of interest include corner points, edge points, or other significant texture change points. In practical applications, various algorithms may be used to detect these points, such as Harris corner point detector, FAST or SIFT, etc.
In this step, based on the brightness change of the pixels in the event stream captured by the event camera, generating an event stream containing a timestamp, and sorting the captured events according to the timestamp, generating an event stream with high time resolution, which includes:
By utilizing the characteristics of the event camera, the position and time of each brightness change are recorded to form an event stream with high time resolution, and the sorting process is calculated as follows:
E_sorted = sort(E_captured, key = λx: x.timestamp)
where E_sorted denotes the ordered event stream, E_captured denotes the original event stream captured from the event camera, sort is the sorting function, λ introduces the key function applied to each event, x denotes a single event in the event stream, x.timestamp denotes the timestamp of event x, and key is the parameter used to specify the sorting function.
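The sorting step can be sketched in Python; the `Event` record and its field names are illustrative assumptions, not the patent's actual data structure:

```python
from collections import namedtuple

# Hypothetical event record: pixel coordinates, polarity of the
# brightness change, and a microsecond timestamp.
Event = namedtuple("Event", ["x", "y", "polarity", "timestamp"])

def sort_event_stream(e_captured):
    """Order raw events by timestamp, mirroring
    E_sorted = sort(E_captured, key = lambda x: x.timestamp)."""
    return sorted(e_captured, key=lambda ev: ev.timestamp)

raw = [Event(3, 7, 1, 1800), Event(5, 2, -1, 1200), Event(9, 4, 1, 1500)]
stream = sort_event_stream(raw)
print([ev.timestamp for ev in stream])  # → [1200, 1500, 1800]
```

Because `sorted` is stable, events sharing a timestamp keep their capture order, which preserves the asynchronous stream's causality.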
It will be appreciated that S201, S202, and S203 are included in this step S200, in which:
S201, processing the optimized video frame through a YOLOv-Lite model, wherein the model obtains preliminary positioning results of miners, vehicle targets and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function, and the calculation formula for target detection by the YOLOv-Lite model is as follows:
b_i = σ(w_i · x + b_i′)
wherein b_i represents the predicted value of a bounding box, w_i and b_i′ are the weight and bias parameters of the model, σ is the sigmoid activation function, and x represents the input feature map;
S202, based on the preliminary positioning result, obtaining an optimization measure of the target candidate region through distance intersection-over-union calculation, and obtaining the bounding box of the target candidate region through non-maximum suppression processing, wherein the processing comprises calculating the intersection-over-union of prediction boxes and removing overlapping boxes based on a preset threshold, and the calculation formula of the non-maximum suppression processing is as follows:
IoU(B1, B2) = area(B1 ∩ B2) / area(B1 ∪ B2)
wherein B1 and B2 are two bounding boxes, IoU is the intersection-over-union, and area is the area function;
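A minimal sketch of the IoU computation and greedy non-maximum suppression described in S202; box layout `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions for illustration:

```python
def iou(b1, b2):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any remaining
    box whose IoU with it exceeds the preset threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one: [0, 2]
```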
S203, extracting features which are still identified and matched under the illumination change and shielding rotation conditions by calculating a gradient direction histogram of a local image in the boundary box by using a scale invariant feature transformation algorithm, and marking the features as first features of interest points on the target surface.
It should be noted that the features with strong robustness refer to features that can still be accurately identified and matched under various environmental changes, such as illumination condition changes, image blurring, occlusion, rotation, scale changes, and the like. These features are important for the identification and tracking of objects because they provide a capability to operate stably under different conditions.
In the present embodiment, S2031, S2032, S2033, and S2034 are further included in step S203, wherein:
S2031, constructing a differential Gaussian pyramid by utilizing a Gaussian pyramid technology based on a bounding box to obtain a first key point position, wherein the first key point position comprises points representing angular points, edges or other remarkable characteristics of a target in a mine in a scale space, the differential Gaussian pyramid is obtained by calculating the difference value of two adjacent layers of the Gaussian pyramid, and the calculation formula is as follows:
DoG(x,y,σ)=G(x,y,kσ)−G(x,y,σ)
wherein DoG is the difference-of-Gaussians function, G(x, y, σ) is the Gaussian function, σ is the standard deviation of the Gaussian function, k is a constant controlling the change of scale, x and y are the coordinates of a pixel point in the image, and kσ is a scale larger than the current scale σ, used for constructing the blurred image of the adjacent scale in the Gaussian pyramid;
Thus, the difference-of-Gaussians function is in practice computed as the difference between two Gaussian-blurred images of different scales. This difference image highlights local features in the image, such as edges and corner points, which are invariant to scale transformation; in the SIFT algorithm, the difference-of-Gaussians function is used to identify potential key point positions.
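The DoG formula above can be evaluated pointwise as a small sketch; the choice k = √2 is a common convention assumed here, not mandated by the text:

```python
import math

def gaussian(x, y, sigma):
    """2-D Gaussian kernel value G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def dog(x, y, sigma, k=math.sqrt(2)):
    """DoG(x, y, sigma) = G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

# At the center the broader Gaussian has the lower peak, so DoG is
# negative there and rises toward zero away from the center -- the
# band-pass shape that makes DoG respond to edges and corners.
print(dog(0.0, 0.0, 1.6) < 0)  # → True
```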
S2032, carrying out extremum detection in scale space on each pixel point at the first key point position according to the differential Gaussian pyramid to obtain a second key point position, wherein each pixel point is compared with its 26 neighbors (8 in the same scale layer and 9 in each of the two adjacent scale layers) to determine whether it is an extremum point;
The differential Gaussian pyramid is formed from a series of images obtained by Gaussian filtering and scale transformation of the original image. Each layer represents a different scale (degree of blurring), with blur increasing from layer to layer. At the first key point position, each pixel point in the differential Gaussian pyramid is compared with the 8 pixel points around it in the same scale layer and the 18 pixel points of the upper and lower adjacent scale layers (26 pixel points in total), to determine whether it is a local extremum point (that is, a local maximum or minimum of brightness). The purpose of extremum detection is to find those points that are significantly prominent in brightness, which may be potential key points. If a pixel has higher or lower luminance than all 26 surrounding pixels at the current scale of the differential Gaussian pyramid, it is an extremum point. The second key point position is the pixel position identified as an extremum point in the first round of detection, which becomes a final key point position after further screening and accurate localization. These positions represent points in the image that have significant features, such as corner points and edges. Scale space refers to the representation of an image at different scales; in the SIFT algorithm, the scale space is implemented by a Gaussian pyramid, where each scale layer represents a blurred version of the image at that scale.
In summary, this sentence describes how potential keypoint locations are identified in the SIFT algorithm by comparing the luminance values of each pixel point in the DoG pyramid with its surrounding neighboring pixel points, which locations are further processed in subsequent steps to determine the final keypoint.
S2033, obtaining a unique direction for each second key point position by calculating a gradient direction histogram based on the gradient directions around the second key point position, wherein a main direction is allocated to the second key point position by taking the 16x16-pixel neighborhood around it, calculating the gradient magnitude and direction of each pixel, and generating an 8-direction histogram in each 4x4 sub-region;
In the SIFT (scale invariant feature transform) algorithm, the 8-direction histogram refers to the gradient direction histogram calculated for each 4×4-pixel sub-region of the neighborhood around a key point; this histogram is used to describe a local image feature. The method comprises the following. Meaning of the 8-direction histogram: in image processing, in order to make the feature descriptor invariant to rotation, a main direction must be allocated to each key point. Gradient calculation: the gradient vector (magnitude and direction) of each pixel around the key point is computed, where the gradient direction points in the direction of fastest change of image brightness. Histogram construction: the range of gradient orientations is divided into 8 bins, each covering a 45-degree angular range, and for each 4x4 sub-region a histogram is built by counting the gradient directions falling in each bin; the 8 directions are 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°, each corresponding to a direction interval used to accumulate the gradient information in that direction.
It will be appreciated that, in the calculation, the gradient direction of each pixel around the key point is computed; based on that direction, the pixel is assigned to one of the 8 direction bins, and the count in the histogram of the corresponding bin is incremented. In this way, even if the image is rotated, the 8-direction histograms can still describe the same features, since they are based on local gradient directions rather than on absolute coordinate directions. The generated 8-direction histograms are used to construct the feature descriptor of the key point, which can uniquely identify the key point in the image and remain stable as the image scales and rotates.
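The binning just described can be sketched as follows; magnitude-weighted votes are a standard SIFT convention assumed here (the text only mentions counting directions):

```python
import math

def orientation_histogram(gradients):
    """Accumulate gradient vectors (dx, dy) into 8 orientation bins of
    45 degrees each, weighting each vote by gradient magnitude, as in a
    SIFT sub-region descriptor."""
    bins = [0.0] * 8
    for dx, dy in gradients:
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        magnitude = math.hypot(dx, dy)
        bins[int(angle // 45.0) % 8] += magnitude
    return bins

# Two gradients pointing right (0 degrees) and one pointing up (90 degrees).
hist = orientation_histogram([(1, 0), (2, 0), (0, 3)])
print(hist[0], hist[2])  # → 3.0 3.0
```

Sixteen such 8-bin histograms (one per 4x4 sub-region of the 16x16 neighborhood) concatenate into the 128-dimensional descriptor discussed below in S2034.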
S2034, generating a feature descriptor in a neighborhood of 16x16 pixels around the second keypoint location according to the main direction of the second keypoint location, thereby obtaining a first feature which still has features of identification and matching under illumination change and occlusion rotation conditions and is recorded as a target surface interest point, wherein the descriptor is constructed by calculating a histogram of gradient directions in each 4x4 sub-region, and finally forming a 128-dimensional feature vector.
Wherein the "128-dimensional feature vector" is the set of values used in the SIFT algorithm to describe the features of a key point in the image; it encodes the image information around that point for the subsequent image matching and recognition process. The 128 dimensions describe the gradient direction histogram information around the key point: each dimension corresponds to the gradient statistics of one direction within one 4x4 sub-region of the 16x16-pixel area around the key point. The association with "recording the feature as the first feature of the target surface interest point" is that this 128-dimensional feature vector is a detailed description of the key point in the image (i.e. the target surface interest point): it captures local image features around the key point, including gradient direction and magnitude, and these feature descriptors enable the algorithm to identify and match the same target surface interest point under different scales, directions and lighting conditions. In the SIFT algorithm, each key point is assigned a principal direction, and its surrounding 16x16-pixel neighborhood is divided into 16 sub-regions, each of which yields an 8-direction gradient histogram; these combine to form the 128-dimensional feature vector. It will be appreciated that this vector provides a unique "fingerprint" for each key point in the image so that it can be accurately identified and matched during image matching. These feature vectors adapt well to the dust, water vapor and dim light common in mines; the 128-dimensional feature vector can further be normalized to reduce the influence of illumination change, improving the stability and accuracy of feature matching and helping to improve the reliability of the mine monitoring system.
S300, according to the first characteristics of the surface interest points, processing the moving targets in the event stream through a quick tracking algorithm to obtain real-time positions and movement trends of the moving targets, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain short-term movement tracks of the moving targets, and obtaining updated target models through an online learning mechanism;
Fast moving objects are tracked using event stream based tracking algorithms, such as optical flow methods or kalman filters. The method can quickly respond to the motion change of the target, realize real-time tracking, and obtain the real-time position and motion trend of the quick-moving target through algorithm processing. This information is critical to predicting potential safety hazards.
It will be appreciated that S301, S302 and S303 are included in this step S300, wherein:
S301, calculating the color histogram of the moving target area based on the first feature of the surface interest points, with the following calculation formula:
H(θ) = (1/N) Σ_{i=1}^{N} δ(θ_i − θ)
wherein H is the color histogram of the target area, N is the total number of pixels in the area, θ is a color angle, δ is the Dirac delta function used to construct the color histogram, i is the current term index in the summation, and θ_i is the color angle of the i-th pixel point;
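In discrete form the Dirac delta becomes a bin-membership test; a minimal sketch (the 16-bin quantization of the 0-360° color angle range is an assumption for illustration):

```python
def color_histogram(color_angles, n_bins=16):
    """Normalised histogram over quantised color (hue) angles in degrees:
    H(bin) = (1/N) * count of pixels whose angle falls in that bin,
    a discrete analogue of H(theta) = (1/N) * sum_i delta(theta_i - theta)."""
    hist = [0.0] * n_bins
    width = 360.0 / n_bins
    for theta in color_angles:
        hist[int(theta % 360.0 // width)] += 1.0
    n = len(color_angles)
    return [h / n for h in hist] if n else hist

hist = color_histogram([10.0, 15.0, 200.0, 350.0])
print(sum(hist))  # → 1.0  (normalised)
```

Normalising by N makes histograms of differently sized target regions directly comparable in the matching step S302.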
S302, calculating the region in the search window most similar to the color histogram of the moving target to determine its current position, obtaining a matching result, and updating the position of the moving target according to the matching result, with the following calculation formula:
S = Σ_{x,y} H_w(x, y) · H_i(x, y)
wherein S is the similarity within the search window, H_w is the color histogram in the search window, H_i is the color histogram of the target model, and x, y are pixel coordinates in the image or search window, with x and y traversing each pixel point in the search window;
In this step, it can be appreciated that the real-time position of the target is determined by finding the window position where the similarity is the greatest.
S303, analyzing the movement trend of the moving object based on the updated position of the moving object, wherein the movement trend comprises a movement direction and a movement speed.
In this step, the movement trend includes a movement direction and a speed, wherein the speed calculation formula is as follows:
v = d / t
where v is the speed of the target and d is the distance the target moves in time t.
In mine monitoring, the calculation of velocity can help to understand how fast the target is moving.
The direction calculation formula is as follows:
θ = arctan(Δy / Δx)
where θ represents the direction of movement of the target, and Δy and Δx are the displacements of the target on the y-axis and x-axis, respectively. The calculation of direction is important for tracking the movement trajectory of the target and predicting its possible travel route. Through these two steps, the mine monitoring system can track the positions of targets such as miners and vehicles in real time and analyze their movement trends, thereby improving the safety and efficiency of mine operations.
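The two formulas can be combined into one small helper; `atan2` is used instead of a bare arctangent so that all four quadrants are handled (a standard refinement, assumed here):

```python
import math

def motion_trend(p_prev, p_curr, dt):
    """Speed v = d / t and direction theta = atan2(dy, dx), in degrees,
    from two successive target positions (x, y) observed dt apart."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    speed = math.hypot(dx, dy) / dt   # distance moved over elapsed time
    direction = math.degrees(math.atan2(dy, dx))
    return speed, direction

v, theta = motion_trend((0.0, 0.0), (3.0, 4.0), 2.0)
print(v, round(theta, 2))  # → 2.5 53.13
```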
It will be appreciated that this step S300 further includes S304, S305, and S306, where:
S304, performing optimal estimation calculation on the state of the target through Kalman filter processing based on the real-time position and the motion trend of the moving target to obtain a short-term motion track of the moving target, wherein the calculation process comprises a prediction step and an updating step, the prediction step is to estimate the position and the speed of the moving target in the next frame, and the updating step is to adjust and predict according to new observation data;
The prediction step estimates the position and speed of the target in the next frame, using the following formula:
x̂_k⁻ = A x̂_{k−1} + B u_k
where x̂_k⁻ is the a priori state estimate at time step k, A is the state transition matrix, x̂_{k−1} is the posterior state estimate at time step k−1, B is the control matrix, and u_k is the control input.
The updating step adjusts the prediction according to the new observation data, using the following formula:
x̂_k = x̂_k⁻ + K_k (z_k − H x̂_k⁻)
where x̂_k is the posterior state estimate at time step k, K_k is the Kalman gain, z_k is the observation, x̂_k⁻ is the a priori state estimate at time step k, and H is the observation matrix.
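A scalar (one-dimensional) sketch of one predict/update cycle; the noise variances q and r and all default coefficients are illustrative assumptions, not values from the patent:

```python
def kalman_step(x_post, p_post, u, z, a=1.0, b=1.0, h=1.0, q=0.01, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.
    a: state transition, b: control, h: observation,
    q / r: process / measurement noise variances."""
    # Prediction: project the state and its uncertainty forward.
    x_prior = a * x_post + b * u
    p_prior = a * p_post * a + q
    # Update: blend the prediction with the new observation z.
    k = p_prior * h / (h * p_prior * h + r)      # Kalman gain
    x_new = x_prior + k * (z - h * x_prior)
    p_new = (1.0 - k * h) * p_prior
    return x_new, p_new

# Predicted position 10.0 plus control input 1.0, then observed at 11.5:
x, p = kalman_step(x_post=10.0, p_post=1.0, u=1.0, z=11.5)
print(round(x, 3))  # → 11.455
```

The posterior lands between the prediction (11.0) and the observation (11.5), weighted by the gain, and the uncertainty p shrinks after the update.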
S305, collecting positive and negative samples according to a short-term motion track of a moving target to form a training data set for model updating, wherein the positive samples are from detection results of a tracking target, and the negative samples are from a region identified as a non-target;
S306, processing the training data set by applying an online random gradient descent algorithm, updating model parameters online, and adapting to the change of the appearance of the target, thereby obtaining an updated target model reflecting the current state of the moving target.
It will be appreciated that in the mine monitoring scenario of this step, this means that the model can be updated immediately each time a new video frame or sensor data arrives to accommodate changes in the dynamic targets within the mine. In a specific application, the updating process of the SGD may be described as the following steps:
Firstly, the model parameter θ is initialized from the state of the previous model. For each newly arrived sample (x_i, y_i), the gradient ∇_θ L(θ, x_i, y_i) of the loss function L(θ, x_i, y_i) with respect to the parameter θ is calculated through the online learning mechanism, and the model parameters are updated using the stochastic gradient descent update rule:
θ_new = θ_old − η ∇_θ L(θ_old, x_i, y_i)
where η is the learning rate, a hyperparameter controlling the step size of each update, θ_new represents the model parameters after the update, θ_old represents the model parameters before the update, and ∇_θ L(θ_old, x_i, y_i) is the gradient of the loss function L on the sample (x_i, y_i) under the current parameters θ_old. Online learning may then need to dynamically adjust the learning rate η to adapt to changes in the target, which can be achieved through a learning-rate decay strategy or an adaptive learning-rate algorithm. Finally, after the model update, the performance of the model on new data is evaluated; if it is not good, the learning rate or the model structure may need further adjustment.
In summary, in mine monitoring scenarios, the SGD may be used to update the target detection and tracking model in real-time to accommodate dynamic changes in targets such as miners and vehicles. For example, if the model performs poorly on new samples, the model parameters can be quickly adjusted by the SGD to improve tracking accuracy and robustness. This method is particularly useful in environments where the data stream is changing in mines because it can quickly respond to changes in the appearance of the target, such as changes due to dust, changes in illumination, or occlusion of the target.
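The online update rule can be sketched with a toy linear model and squared-error loss; the loss choice and learning rate are illustrative assumptions:

```python
def sgd_update(theta, grad, lr=0.01):
    """One online SGD step: theta_new = theta_old - lr * grad."""
    return [t - lr * g for t, g in zip(theta, grad)]

def squared_error_grad(theta, x, y):
    """Gradient of L = (theta . x - y)^2 / 2 for a linear model."""
    err = sum(t * xi for t, xi in zip(theta, x)) - y
    return [err * xi for xi in x]

theta = [0.0, 0.0]
# Each newly arrived (x, y) sample updates the model immediately,
# as a new video frame would in the mine monitoring scenario.
for x, y in [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0)]:
    theta = sgd_update(theta, squared_error_grad(theta, x, y), lr=0.1)
print([round(t, 6) for t in theta])  # → [0.9, 1.2]
```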
S400, based on the updated target model and the real-time video frame, a motion detection algorithm is used for processing to obtain the motion state of the moving target, wherein the motion detection algorithm processing process comprises the steps of identifying the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence, and the motion characteristics of the moving target are obtained through self-adaptive tracking algorithm processing according to the motion state and the updated target model.
It is understood that S401, S402, S403, and S404 are included in the present step S400, in which:
S401, according to the updated target model and the real-time video frame, preliminary motion information of the moving target is obtained through inter-frame difference processing; the inter-frame difference algorithm detects the moving target by calculating pixel-level differences between consecutive video frames:
Motion = |I_t − I_{t−1}|
where I_t is the current frame, I_{t−1} is the previous frame, and Motion is the motion information; this step exploits the pixel changes caused by target motion in the mine environment to provide a basis for subsequent feature extraction;
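A minimal sketch of the inter-frame difference on grayscale frames stored as nested lists; the threshold of 25 is an illustrative assumption:

```python
def frame_difference(curr, prev, threshold=25):
    """Pixel-wise |I_t - I_{t-1}|, thresholded into a binary motion mask
    (1 = moving pixel, 0 = static background)."""
    return [[1 if abs(c - p) > threshold else 0
             for c, p in zip(crow, prow)]
            for crow, prow in zip(curr, prev)]

prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 90, 10], [10, 10, 80]]
mask = frame_difference(curr, prev)
print(mask)  # → [[0, 1, 0], [0, 0, 1]]
```

In production this would run on full camera frames (e.g. NumPy arrays), but the pixel-level logic is the same.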
S402, according to the motion information obtained in step S401, the motion vector and speed of the moving target are obtained through processing by an optical-flow-based algorithm; the optical flow algorithm estimates the motion characteristics of the moving target by analyzing the displacement of the target across consecutive frames:
Flow = (P_t − P_{t−1}) / Δt
where P_t is the position of the target in the current frame, P_{t−1} is its position in the previous frame, Δt is the time interval, and Flow represents the motion vector of the target; the motion vector extracted in this step reflects the motion trend of the target in the mine environment.
S403, according to the motion vector extracted in the step S402, obtaining an updated target model through Kalman filter processing, wherein the Kalman filter is a recursive algorithm, and estimates the target state through the steps of prediction and updating;
S404, according to the target model updated in step S403, the accurate motion characteristics of the moving target are obtained through processing by an adaptive tracking algorithm, wherein the adaptive tracking algorithm dynamically adjusts tracking parameters according to the motion characteristics of the target to adapt to the complex and changeable environment in the mine; the calculation formula is as follows:
Track = f(UpdatedModel, MotionFeatures)
wherein Track represents the tracking result, f is the adaptive tracking function, UpdatedModel is the updated target model, and MotionFeatures are the motion features; this step ensures stable and accurate tracking of moving targets in the mine environment.
S500, aiming at the color distribution characteristics of a specific event, a video tracking algorithm based on background segmentation is applied to process to obtain a specific event prediction result, wherein the processing process comprises the steps of calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, dynamically adjusting the size and shape of a target frame, combining the motion characteristics of the moving target and the specific event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and updating correction.
It will be appreciated that in this step, the color distribution characteristics of a particular event generally refer to the occurrence of a particular event, such as ore slip or gas leakage, which would exhibit a unique pattern of color change in the video image. For example, ore slip may generate a lot of dust, which may appear as abrupt changes in color or a significant increase in gray scale value for a specific area in a video, and gas leakage may appear as abnormal shadows or color changes in a video. These changes in color and brightness can be used as important cues for event detection.
In handling such specific events, these color distribution characteristics are typically analyzed and utilized for event identification and tracking. This includes: color histogram analysis, which generates a color histogram by counting the pixels of each color in the image to learn its overall color distribution; color moment calculation, where color moments are statistics describing the color distribution of the image, including mean, variance, skewness, etc.; color space conversion, which converts the image from the RGB color space to HSV or another color space to better emphasize specific color features; color clustering, which quantizes the colors of the image using an algorithm such as K-Means, clustering them into several main color categories; and color correlograms, which analyze the correlation of colors between pixels, describing the spatial distribution and relationships of colors in the image.
By the method, the characteristics of the specific events can be captured and described from the perspective of color, and further the identification and tracking of the events are realized.
It will be appreciated that in this step S500, S501, S502 and S503 are included, which include:
S501, calculating a color histogram of a specific event in a mine to obtain a color feature description of a target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
It should be noted that the "specific event" refers to various abnormal conditions that may occur in the mine, such as ore slip, gas leakage, and the like. These events are often accompanied by color changes such as ore slip, which may produce large amounts of soot that may appear to be of a particular color, such as black or gray. Gas leakage, gas leakage may be accompanied by a special color change, such as a yellow or orange tone, because gas leakage may react with other substances in the air to produce a color change.
The color distribution characteristics refer to distribution modes of colors in the specific events, such as concentration of the colors, distribution range, uniformity of the colors, and the like. The color characterization of the object is the quantification of these color distribution characteristics, which includes the frequency distribution of colors, i.e., the frequency of occurrence of different colors in the image, the spatial distribution of colors, i.e., the distribution of colors in the image space, such as whether concentrated or dispersed, and the brightness and saturation of colors, i.e., the darkness and vividness of colors.
S502, searching a region which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching region;
It should be noted that the matching process of the best matching region generally involves the steps of calculating a target histogram, first calculating a color histogram of a target region (e.g., a gas leakage region), searching for a next frame, searching for a region that best matches the target histogram in the next frame of the video, comparing the color histogram of each candidate region to the target histogram, generally using a histogram comparison algorithm such as correlation matching, chi-square testing, bayesian classification, etc., and determining the best matching region, namely selecting the region that has the highest degree of matching with the target histogram as the best matching region. The best matching region refers to a region in the current frame, the color histogram of which is most similar to the target histogram, and the region is most likely to be the place where the target event occurs.
S503, dynamically adjusting a target frame according to the potential position of the target by using a CamShift algorithm according to the color distribution of the target, wherein the method comprises initializing the target frame, tracking the target in a subsequent frame by using a color histogram, iteratively updating the position and the size of the target frame to match the color distribution of the target, and iteratively updating the new target frame position until the pixel color angle which is most matched with the color histogram of the target is obtained, thereby obtaining a specific event prediction result, wherein the calculation formula is as follows:
x_new = argmax_{(x,y)} Σ_{i=1}^{N} δ(θ_i − θ)
In the formula, x_new is the updated target position, argmax_{(x,y)} finds the coordinates that maximize the following expression among all possible (x, y) coordinates, θ_i is the pixel color angle in the current frame matched against the target color histogram, θ is one specific color angle in the target color histogram, N is the total number of pixels in the search window, Σ_{i=1}^{N} accumulates over all values of i from 1 to N, and δ is the Dirac function used to count the number of pixels at each color angle.
In this step, the process of dynamically adjusting the target box according to the color distribution of the target generally involves initializing the target box, determining the approximate position and size of the target in the initial frame, color histogram tracking, tracking the target using the color histogram in the subsequent frame, and CamShift algorithm, adjusting the target box using the CamShift algorithm. The CamShift algorithm iteratively updates the position and size of the target frame to match the color distribution of the target, updates the target frame such that in each iteration the algorithm calculates a new target frame position based on the pixel color angle in the current frame that best matches the target color histogram, and adapts to target changes such that the target frame dynamically adjusts to keep track of the target as the target moves and changes. In summary, in this way, the algorithm can keep track of the object and maintain the accuracy of the object box even if the object moves in the video or its color distribution changes.
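The argmax search over candidate window positions can be sketched as follows; histogram intersection as the similarity measure, and the candidate positions, are illustrative assumptions (the full CamShift algorithm also adapts the window size):

```python
def histogram_similarity(h1, h2):
    """Histogram intersection: higher means a closer color match."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_matching_window(candidate_hists, target_hist):
    """argmax over candidate (x, y) positions of the similarity between
    each window's color histogram and the target model histogram."""
    return max(candidate_hists,
               key=lambda pos: histogram_similarity(candidate_hists[pos], target_hist))

target = [0.6, 0.3, 0.1]            # normalised target color histogram
candidates = {
    (40, 25): [0.1, 0.2, 0.7],      # mostly background colors
    (52, 30): [0.5, 0.4, 0.1],      # close to the target model
    (70, 90): [0.3, 0.3, 0.4],
}
print(best_matching_window(candidates, target))  # → (52, 30)
```

Iterating this search and re-centering the window on the winner each frame is the mean-shift core that CamShift builds on.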
In this step, in combination with specific events in the mine (such as ore slip or gas leakage), prediction models (possibly based on machine learning or statistical analysis) are used to predict the occurrence of these events and analyze their impact on the target motion. Various features are then usually fused in the target tracking algorithm to improve the accuracy and robustness of tracking; for example, color features, fast histograms of oriented gradients, and local binary pattern features may be combined through linear weighting or other fusion strategies. In terms of scale fusion, the target must be tracked at different scales because it may exhibit scale changes (i.e., changes in size) in the video; this is done by constructing a scale filter that responds to the features of the target at different scales and fuses the responses to estimate the scale and location of the target.
It should be noted that, during the target tracking process, tracking drift may be caused by occlusion, illumination change, or other interference factors. To cope with this, a self-correction mechanism such as peak-to-side lobe ratio check may be introduced, and when an abnormality in the tracking output is detected, the self-correction mechanism is started to correct the tracking output so as to accurately re-track the target. Then, as the characteristics of the target change, the tracking model also needs to be updated accordingly, which is achieved by online learning or incremental learning, that is, the model parameters are updated at each frame or every few frames to adapt to the dynamic change of the target.
It will be appreciated that through the above steps, the resulting dynamic object tracking results are a continuous time series of the position, scale and motion state of the object in the video sequence, which information can be used for real-time monitoring, early warning systems or further analysis processes. In an actual mining vehicle-mounted video monitoring system, the tracking results can be used for improving the safety of mine operation, such as preventing accidents such as collision or landslide by monitoring the movement state of a mine car, or timely taking measures by detecting early signs of specific events.
Example 2:
As shown in fig. 2, this embodiment provides a dynamic target tracking system for vehicle-mounted mining intrinsic safety video monitoring. Referring to fig. 2, the system includes:
The acquisition module 701 is configured to acquire a real-time video stream through a vehicle-mounted camera, where the real-time video stream includes a visible light video and an event stream captured by an event camera. The visible light video records the visible activity scene in the mine, including miner activity, vehicle travel, and the state of other visible objects; the event stream captured by the event camera records the position and time information of each brightness change whenever a pixel's brightness changes;
The detection extraction module 702 is configured to generate an event stream containing timestamps based on pixel brightness changes in the event stream captured by the event camera, sort the captured events by timestamp to produce a high-time-resolution event stream, and preprocess the real-time video stream (the preprocessing including noise reduction and illumination enhancement) to obtain optimized video frames. The optimized video frames are analyzed with a lightweight deep learning model to obtain a target detection result, and fine-grained tracking feature extraction is performed on the detection result according to the high-time-resolution event stream and the visible light video to obtain first features of surface interest points of the target. The first features include local texture, shape information, and motion pattern, and the target detection result includes preliminary positioning results for miners, vehicle targets, and other visible objects;
The prediction module 703 is configured to process, according to the first features of the surface interest points, the moving target in the event stream through a fast tracking algorithm to obtain the moving target's real-time position and motion trend, perform short-term prediction with the Kalman filter in a TLD (tracking-learning-detection) target tracking algorithm to obtain the moving target's short-term motion track, and obtain an updated target model through an online learning mechanism;
The recognition module 704 is configured to obtain the motion state of the moving target from the updated target model and real-time video frames using a motion detection algorithm, where the motion detection processing includes identifying the moving target and extracting its features by analyzing pixel changes across the image sequence;
The tracking module 705 is configured to apply a video tracking algorithm based on background segmentation to obtain a prediction result of a specific event according to the event's color distribution characteristics. The processing includes calculating a color histogram of the dynamic target, searching the next frame for the region that best matches the histogram, and dynamically adjusting the size and shape of the target frame. The motion features of the moving target are then combined with the prediction result of the specific event, and the dynamic target tracking result is finally obtained through multi-scale fusion and update correction.
Specifically, the detection extraction module 702 includes:
the first processing unit is configured to process the optimized video frames through a YOLOv-Lite model, which obtains preliminary positioning results for miners, vehicle targets, and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function. The detection formula of the YOLOv-Lite model is:

bi = σ(Wi · x + bi′)

wherein bi represents the predicted value of a bounding box, Wi and bi′ are the model's weight and bias parameters, σ is the sigmoid activation function, and x represents the input feature map;
The second processing unit is configured to obtain an optimization measure of the target candidate region through distance intersection-ratio calculation based on the preliminary positioning result, and to obtain the bounding box of the target candidate region through non-maximum suppression. The processing includes computing the intersection ratio of prediction boxes and removing overlapping boxes against a preset threshold:

IoU(B1, B2) = area(B1 ∩ B2) / area(B1 ∪ B2)

wherein B1 and B2 are two bounding boxes, IoU is the distance intersection ratio, and area(·) denotes region area;
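An illustrative sketch of the intersection-over-union computation and the greedy non-maximum suppression step described above; the (x1, y1, x2, y2) box encoding and the 0.5 default threshold are assumptions for illustration:

```python
def iou(b1, b2):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box whose IoU with it exceeds `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

The indices returned by `nms` select the surviving candidate boxes after overlap removal.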
The extraction unit is configured to use a scale-invariant feature transform (SIFT) algorithm to compute gradient orientation histograms of the partial image within the bounding box, thereby extracting features that remain identifiable and matchable under illumination change, occlusion, and rotation; these are recorded as the first features of the target surface interest points.
Specifically, the extraction unit includes:
the construction unit is configured to construct a difference-of-Gaussians pyramid using the Gaussian pyramid technique on the bounding box to obtain first key point positions, where the first key point positions include points representing corners, edges, or other salient features of targets in the mine in scale space. The difference-of-Gaussians pyramid is obtained by computing the difference of two adjacent levels of the Gaussian pyramid:
DoG(x,y,σ)=G(x,y,kσ)−G(x,y,σ)
Wherein DoG is the difference-of-Gaussians function, G(x, y, σ) is a Gaussian function, σ is the standard deviation of the Gaussian function, k is a constant controlling the scale change, x and y are pixel coordinates in the image, and kσ is a scale larger than the current scale σ, used to construct the blurred image of the adjacent scale in the Gaussian pyramid;
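The DoG construction above amounts to subtracting two Gaussian-blurred copies of the image at scales σ and kσ. A minimal NumPy sketch (the separable blur implementation and the 3σ kernel radius are assumed details, not part of the embodiment):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: convolve along rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(image, sigma, k=2 ** 0.5):
    """DoG(x, y, sigma) = G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian_blur(image, k * sigma) - gaussian_blur(image, sigma)
```

On flat regions the DoG response is near zero; it responds most strongly to blob-like structure near the scale σ, which is what the extremum detection in the next unit exploits.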
The detection unit is configured to perform extremum detection in scale space at each pixel of the first key point positions according to the difference-of-Gaussians pyramid to obtain second key point positions, where each candidate pixel is compared with its 26 neighbors (8 in the same scale layer and 9 in each of the two adjacent scale layers) to find extremum points;
The first calculation unit is configured to obtain a unique orientation for each second key point position by computing a gradient orientation histogram from the gradients around it: a 16x16-pixel neighborhood is taken around the second key point position, the gradient magnitude and direction of each pixel are computed, an 8-direction histogram is generated in each 4x4 sub-region, and a dominant orientation is assigned to the second key point position;
And the generating unit is configured to generate a feature descriptor in the 16x16-pixel neighborhood around each second key point according to its dominant orientation, thereby obtaining first features that remain identifiable and matchable under illumination change, occlusion, and rotation, recorded as the target surface interest points. The descriptor is constructed by computing a gradient orientation histogram in each 4x4 sub-region, finally forming a 128-dimensional feature vector.
Specifically, the detection extraction module 702 includes:
the recording unit is configured to use the characteristics of the event camera to record the position and time information of each brightness change whenever a pixel's brightness changes, forming an event stream with high time resolution. The sorting process is:

Esorted = sort(Ecaptured, key = λx: x.timestamp)

Where Esorted denotes the ordered event stream, Ecaptured denotes the original event stream captured from the event camera, sort is the sorting function, λx: x.timestamp is an anonymous function returning the timestamp of a single event x in the event stream, and key is the parameter used to supply that function to the sort.
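The timestamp-based sorting above maps directly onto code; the `Event` record layout (coordinates, polarity, microsecond timestamp) is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A single event-camera event: pixel coordinates, polarity,
    and timestamp (e.g. in microseconds)."""
    x: int
    y: int
    polarity: int
    timestamp: int

def sort_events(captured):
    """E_sorted = sort(E_captured, key=lambda e: e.timestamp)."""
    return sorted(captured, key=lambda e: e.timestamp)
```

Sorting by timestamp restores a monotone high-time-resolution stream even when events arrive out of order from the sensor interface.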
Specifically, the prediction module 703 includes:
a second calculation unit configured to calculate a color histogram of the moving target region based on the first features of the surface interest points:

H(θ) = (1/N) · Σi=1..N δ(θ − θi)

Wherein H is the color histogram of the target region, N is the total number of pixels in the region, θ is the color angle, δ is the Dirac delta function used to construct the color histogram, i is the current index in the summation, and θi is the color angle of the i-th pixel;
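A discrete sketch of the normalized color-angle histogram H(θ) above, with the Dirac delta realized as bin counting; the 36-bin quantization of the 0–360° hue circle is an assumption:

```python
import numpy as np

def hue_histogram(hues, n_bins=36):
    """Discrete form of H(theta) = (1/N) * sum_i delta(theta - theta_i):
    count the pixels falling into each color-angle bin, normalized by N."""
    hues = np.asarray(hues) % 360.0
    hist, _ = np.histogram(hues, bins=n_bins, range=(0.0, 360.0))
    return hist / len(hues)
```

The result sums to 1, so histograms of regions with different pixel counts remain directly comparable.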
the matching update unit is configured to determine the current position of the moving target by computing, within the search window, the region similar to the moving target's color histogram, obtain a matching result, and update the moving target's position according to the matching result:

S = Σ(x, y) Hw(x, y) · Hi(x, y)

Wherein S is the similarity in the search window, Hw is the color histogram in the search window, Hi is the color histogram of the target model, and x, y are pixel coordinates that traverse each pixel in the search window;
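One plausible realization of the search-window scoring above is to back-project the target model onto the window pixels and sum the weights; the bin count and this particular scoring scheme are assumptions, not the claimed formula:

```python
import numpy as np

def window_similarity(window_hues, target_hist, n_bins=36):
    """Score a candidate window by back-projecting the target model:
    each pixel contributes the target-histogram weight of its hue bin."""
    bins = (np.asarray(window_hues) % 360.0 / (360.0 / n_bins)).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    return float(target_hist[bins].sum())
```

Sliding the window over the frame and keeping the maximum of this score gives the matching result used to update the target position.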
and the analysis unit is used for analyzing the movement trend of the moving object based on the updated position of the moving object, wherein the movement trend comprises a movement direction and a movement speed.
Specifically, the prediction module 703 includes:
the estimation unit is configured to perform optimal state estimation of the target with a Kalman filter, based on the moving target's real-time position and motion trend, to obtain its short-term motion track. The calculation comprises a prediction step, which estimates the moving target's position and velocity in the next frame, and an update step, which adjusts the prediction according to new observation data;
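The predict/update cycle described above can be sketched with a constant-velocity state model; the state layout [x, y, vx, vy] and the noise magnitudes are illustrative assumptions:

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for 2-D position tracking.
    State: [x, y, vx, vy]; measurement: [x, y]."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt          # position += velocity * dt
        self.H = np.eye(2, 4)                     # only position is observed
        self.Q = q * np.eye(4)                    # process noise
        self.R = r * np.eye(2)                    # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def predict(self):
        """Prediction step: propagate state and covariance one frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Update step: correct the prediction with a new observation z."""
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Chaining several `predict` calls without intermediate updates yields the short-term motion track used by the tracker.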
The training unit is used for collecting positive and negative samples according to the short-term motion trail of the moving target to form a training data set for model updating, wherein the positive samples are from detection results of the tracking target, and the negative samples are from areas which are identified as non-targets;
And the update adaptation unit is configured to process the training data set with an online stochastic gradient descent algorithm, updating the model parameters online to adapt to changes in the target's appearance, thereby obtaining an updated target model reflecting the current state of the moving target.
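The online stochastic gradient descent update over positive and negative samples can be sketched as a tiny logistic appearance model; the feature dimensionality, learning rate, and model form are all assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineAppearanceModel:
    """Tiny logistic model updated by one SGD step per labeled sample:
    positives come from tracked detections, negatives from background."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, feat):
        """Probability that `feat` belongs to the tracked target."""
        return sigmoid(self.w @ feat + self.b)

    def sgd_step(self, feat, label):
        """One gradient step on the log-loss for (feat, label in {0, 1})."""
        err = self.score(feat) - label
        self.w -= self.lr * err * np.asarray(feat)
        self.b -= self.lr * err
```

Because each sample triggers only one cheap parameter update, the model can be refreshed every frame without interrupting real-time tracking.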
Specifically, the tracking module 705 includes:
The description unit is used for calculating a color histogram of a specific event in the mine to obtain a color characteristic description of the target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
The searching and matching unit is used for searching the area which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching area;
The iteration unit is configured to dynamically adjust the target frame according to the target's potential position using the CamShift algorithm, driven by the target's color distribution. This includes initializing the target frame, tracking the target in subsequent frames with the color histogram, and iteratively updating the position and size of the target frame to match the target's color distribution, until the pixel color angle best matching the target's color histogram is reached, according to:
(x̂, ŷ) = argmax(x, y) Σi=1..N δ(θi − θ)

In the formula, (x̂, ŷ) is the updated target position, argmax(x, y) selects, among all candidate (x, y) coordinates, the one maximizing the expression that follows, θi is the color angle of the i-th pixel in the current frame matched against the target color histogram, θ is a specific color angle in the target color histogram, N is the total number of pixels in the search window, Σi=1..N accumulates all values of i from 1 to N, and δ is a Dirac function used to count the number of pixels at each color angle.
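A simplified CamShift-style iteration over a back-projection map, following the centroid and zeroth-moment logic described above; the window encoding and the square-root scale heuristic are assumptions, not the exact claimed procedure:

```python
import numpy as np

def camshift_step(backprojection, window):
    """One CamShift-style iteration: move the window to the centroid of
    the back-projection weights inside it, then rescale the window from
    the zeroth moment. window = (x, y, w, h), (x, y) = top-left corner."""
    x, y, w, h = window
    roi = backprojection[y:y + h, x:x + w]
    rh, rw = roi.shape
    m00 = roi.sum()                        # zeroth moment (total weight)
    if m00 <= 0:
        return window                      # no evidence: keep the window
    ys, xs = np.mgrid[0:rh, 0:rw]
    cx = int((xs * roi).sum() / m00)       # centroid inside the ROI
    cy = int((ys * roi).sum() / m00)
    side = max(2, int(2 * np.sqrt(m00)))   # scale from total weight
    nx = max(0, x + cx - side // 2)
    ny = max(0, y + cy - side // 2)
    return (nx, ny, side, side)
```

Iterating this step re-centers and re-sizes the target frame until it settles on the region whose color distribution best matches the target histogram.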
It should be noted that, regarding the system in the above embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail herein.
Example 3:
corresponding to the above method embodiment, in this embodiment, a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine is further provided, and a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine described below and a dynamic target tracking method for intrinsic safety type video monitoring of a vehicle-mounted mine described above may be referred to correspondingly with each other.
Fig. 3 is a block diagram illustrating a dynamic target tracking device 800 for vehicle-mounted mining intrinsic safety video monitoring, according to an exemplary embodiment. As shown in fig. 3, the dynamic target tracking device 800 comprises a processor 801 and a memory 802, and may further include one or more of a multimedia component 803, an I/O interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the dynamic target tracking device 800, so as to complete all or part of the steps of the dynamic target tracking method for vehicle-mounted mining intrinsic safety video monitoring. The memory 802 is used to store various types of data supporting operation of the device 800; such data may include, for example, instructions for any application or method operating on the device, as well as application-related data such as contact data, messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen; the audio component is for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals, which may be further stored in the memory 802 or transmitted through the communication component 805. The audio component further comprises at least one speaker for outputting audio signals.
The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, mouse, or buttons. These buttons may be virtual or physical. The communication component 805 is configured for wired or wireless communication between the dynamic target tracking device 800 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more thereof, so the corresponding communication component 805 may include a Wi-Fi module, a Bluetooth module, or an NFC module.
In an exemplary embodiment, the dynamic target tracking device 800 for vehicle-mounted mining intrinsic safety video monitoring may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described dynamic target tracking method for vehicle-mounted mining intrinsic safety video monitoring.
In another exemplary embodiment, a computer readable storage medium is also provided that includes program instructions that, when executed by a processor, implement the steps of the dynamic target tracking method for on-board mining intrinsic safety video monitoring described above. For example, the computer readable storage medium may be the memory 802 including program instructions described above that are executable by the processor 801 of the on-board mining intrinsically-safe video-monitoring dynamic object tracking device 800 to perform the on-board mining intrinsically-safe video-monitoring dynamic object tracking method described above.
Example 4:
Corresponding to the above method embodiment, a readable storage medium is further provided in this embodiment, and a readable storage medium described below and a dynamic target tracking method for on-vehicle mining intrinsic safety video monitoring described above may be referred to correspondingly.
The readable storage medium stores a computer program which is executed by a processor to realize the steps of the dynamic target tracking method for the vehicle-mounted mining intrinsic safety type video monitoring of the method embodiment.
The readable storage medium may be a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, any of which may store program code.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411504952.7A CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411504952.7A CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119027860A CN119027860A (en) | 2024-11-26 |
| CN119027860B true CN119027860B (en) | 2025-01-24 |
Family
ID=93529416
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411504952.7A Active CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119027860B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120071300A (en) * | 2025-02-13 | 2025-05-30 | 深圳云亿通数字技术有限公司 | Multi-target-area dynamic vehicle pedestrian recognition method and system |
| CN119649414B (en) * | 2025-02-17 | 2025-05-30 | 南京混沌信息科技有限公司 | Multi-target fish tracking method for land-based industrial circulating water culture scene |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102881022A (en) * | 2012-07-20 | 2013-01-16 | 西安电子科技大学 | Concealed-target tracking method based on on-line learning |
| CN112700475A (en) * | 2020-12-31 | 2021-04-23 | 荆门汇易佳信息科技有限公司 | Self-adaptive multi-target video tracking system under different scenes |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106683121A (en) * | 2016-11-29 | 2017-05-17 | 广东工业大学 | Robust object tracking method in fusion detection process |
| CN109785365B (en) * | 2019-01-17 | 2021-05-04 | 西安电子科技大学 | A real-time object tracking method for addressing event-driven unstructured signals |
| CN116402852B (en) * | 2023-03-24 | 2025-07-29 | 西安电子科技大学广州研究院 | Dynamic high-speed target tracking method and device based on event camera |
| CN117689686A (en) * | 2023-11-16 | 2024-03-12 | 西安科技大学 | An event camera-based detection method for large coal retention |
| CN118129692A (en) * | 2024-01-25 | 2024-06-04 | 深圳云程科技有限公司 | A method and system for tracking the position of a target moving object |
- 2024-10-28: CN application CN202411504952.7A granted as patent CN119027860B (status: Active)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102881022A (en) * | 2012-07-20 | 2013-01-16 | 西安电子科技大学 | Concealed-target tracking method based on on-line learning |
| CN112700475A (en) * | 2020-12-31 | 2021-04-23 | 荆门汇易佳信息科技有限公司 | Self-adaptive multi-target video tracking system under different scenes |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119027860A (en) | 2024-11-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN119027860B (en) | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine | |
| US9213901B2 (en) | Robust and computationally efficient video-based object tracking in regularized motion environments | |
| Bertini et al. | Multi-scale and real-time non-parametric approach for anomaly detection and localization | |
| US9323991B2 (en) | Method and system for video-based vehicle tracking adaptable to traffic conditions | |
| Gupte et al. | Detection and classification of vehicles | |
| CN108171196B (en) | Face detection method and device | |
| US7916944B2 (en) | System and method for feature level foreground segmentation | |
| CN115995063A (en) | Work vehicle detection and tracking method and system | |
| JP4479478B2 (en) | Pattern recognition method and apparatus | |
| Devasena et al. | Video surveillance systems-a survey | |
| Abdulghafoor et al. | A novel real-time multiple objects detection and tracking framework for different challenges | |
| Bedruz et al. | Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach | |
| JP3970877B2 (en) | Tracking device and tracking method | |
| CN108416780B (en) | An Object Detection and Matching Method Based on Siamese-Region of Interest Pooling Model | |
| Vignesh et al. | Abnormal event detection on BMTT-PETS 2017 surveillance challenge | |
| WO2013054130A1 (en) | Aerial survey video processing | |
| Ehsan et al. | Violence detection in indoor surveillance cameras using motion trajectory and differential histogram of optical flow | |
| CN112733770A (en) | Regional intrusion monitoring method and device | |
| Denman et al. | Multi-spectral fusion for surveillance systems | |
| Zakaria et al. | Particle swarm optimization and support vector machine for vehicle type classification in video stream | |
| Gautam et al. | Computer vision based asset surveillance for smart buildings | |
| Algethami et al. | Combining Accumulated Frame Differencing and Corner Detection for Motion Detection. | |
| Kodwani et al. | Automatic license plate recognition in real time videos using visual surveillance techniques | |
| Thotapalli et al. | Feature extraction of moving objects using background subtraction technique for robotic applications | |
| Kavitha et al. | Vision-based vehicle detection and tracking system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |