CN119027860B - Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine - Google Patents
- Publication number
- CN119027860B (application CN202411504952.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- motion
- event
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a dynamic target tracking method and system for vehicle-mounted mining intrinsically safe video monitoring, relating to the technical field of video monitoring target tracking. The method comprises: preprocessing a real-time video stream acquired by a vehicle-mounted camera; extracting fine-grained tracking features from the target detection result; sorting captured events by timestamp to obtain the real-time position and motion trend of a moving target; performing short-term prediction with the Kalman filter of a TLD tracking algorithm to obtain an updated target model; applying a motion detection algorithm to obtain the motion characteristics of the moving target; applying a background-segmentation-based video tracking algorithm to obtain a specific-event prediction result; and finally combining the motion characteristics of the moving target with the specific-event prediction result to obtain the dynamic target tracking result. The invention improves the dynamic target tracking performance of vehicle-mounted mining intrinsically safe video monitoring systems and achieves accurate tracking of dynamic targets in mines.
Description
Technical Field
The invention relates to the technical field of video monitoring target tracking, in particular to a dynamic target tracking method and system for intrinsic safety type video monitoring of a vehicle-mounted mine.
Background
As industrialization and automation continue to advance, the safety and efficiency of mine operations are increasingly important. In mine monitoring systems, dynamic target tracking is one of the key technologies for safeguarding miners and optimizing operational workflows, and in recent years advances in computer vision and pattern recognition have significantly promoted its application in the field of mine monitoring.
Traditional monitoring systems usually rely on wired connections, whose lines are easily damaged in the complex environment of a mine. The interior of a mine is complex: light is dim, dust is heavy, and temperature varies widely, so conventional video monitoring technology struggles to operate stably and monitoring quality suffers. Harsh in-mine conditions such as high dust levels and strong vibration place extremely high demands on equipment stability and reliability that much existing equipment cannot meet. Existing video monitoring systems also tend to focus on recording rather than real-time analysis and lack rapid response capability for sudden events; video data usually must be uploaded to a central server for analysis, so processing is slow and cannot meet real-time monitoring requirements. Target tracking algorithms readily lose targets against complex backgrounds, such as rocks and vehicles of similar color in a mine, degrading the monitoring effect. Moreover, existing systems often lack an effective early-warning mechanism and cannot accurately predict and respond in time before a hazard occurs. The prior art is therefore challenged in the special environment of mines, and these factors seriously affect the stability and reliability of monitoring systems.
Disclosure of Invention
The invention aims to provide a dynamic target tracking method and a system for vehicle-mounted mining intrinsic safety type video monitoring so as to solve the problems. In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
in a first aspect, the application provides a dynamic target tracking method for intrinsic safety type video monitoring for a vehicle-mounted mine, which comprises the following steps:
The method comprises the steps that a real-time video stream is obtained through a vehicle-mounted camera, wherein the real-time video stream comprises a visible light video and an event stream captured by an event camera, the visible light video is a visible light activity scene in a mine, the visible light activity scene comprises states of miner activities, vehicle operation and other visible objects, and the event stream captured by the event camera comprises brightness change position and time information recorded when pixel brightness changes;
Preprocessing the real-time video stream, the preprocessing comprising noise reduction and illumination enhancement, to obtain optimized video frames; analyzing the optimized video frames with a lightweight deep learning model to obtain a target detection result; and performing fine-grained tracking feature extraction on the target detection result according to the high-time-resolution event stream and the visible-light video to obtain first features of the surface interest points of the target result, wherein the first features comprise local texture, shape information and motion pattern, and the target detection result comprises preliminary positioning results for miners, vehicle targets and other visible objects;
according to the first characteristics of the surface interest points, processing the moving targets in the event stream through a rapid tracking algorithm to obtain real-time positions and movement trends of the moving targets, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain short-term movement tracks of the moving targets, and obtaining updated target models through an online learning mechanism;
Based on the updated target model and the real-time video frame, a motion detection algorithm is used for processing to obtain the motion state of the moving target, wherein the processing process comprises the steps of identifying the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence;
For the color distribution characteristics of a specific event, applying a background-segmentation-based video tracking algorithm to obtain a specific-event prediction result, the processing comprising: calculating a color histogram of the dynamic target, searching the next frame for the region that best matches the histogram, and dynamically adjusting the size and shape of the target frame; and combining the motion characteristics of the moving target with the specific-event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and update correction.
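The motion-detection step above — identifying moving targets by analyzing pixel-level changes across the image sequence — can be sketched with a simple frame-differencing pass. This is an illustrative stand-in (the patent does not name the exact motion detection algorithm), and the threshold and frame sizes are hypothetical:

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose absolute intensity change exceeds a threshold."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def motion_bounding_box(mask):
    """Tight bounding box (x0, y0, x1, y1) around moving pixels, or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic example: a small bright "object" moves from column 1 to column 5,
# so the change mask covers both its old and new positions.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = np.zeros((8, 8), dtype=np.uint8)
prev[3:5, 1:3] = 200
curr[3:5, 5:7] = 200
mask = frame_difference_mask(prev, curr)
box = motion_bounding_box(mask)
```

A real pipeline would follow this with morphological cleanup and per-blob feature extraction; the sketch only shows the pixel-change analysis itself.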
Preferably, the method comprises the steps of analyzing an optimized video frame by using a lightweight deep learning model to obtain a detection target result, extracting fine granularity tracking characteristics of the detection target result to obtain first characteristics of surface interest points of the target result, wherein the first characteristics comprise:
Processing the optimized video frames through the YOLOv-Lite model, which obtains preliminary positioning results for miners, vehicle targets and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function;
Obtaining an optimization measure of the target candidate region through distance intersection ratio calculation based on the preliminary positioning result, and obtaining a boundary frame of the target candidate region through non-maximum suppression processing, wherein the processing comprises the steps of calculating the intersection ratio of a prediction frame and removing an overlapped frame based on a preset threshold value;
and extracting the characteristics which are still identified and matched under the illumination change and shielding rotation conditions by calculating the gradient direction histogram of the local image in the boundary box by using a scale-invariant characteristic transformation algorithm, and marking the characteristics as first characteristics of interest points of the target surface.
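The non-maximum suppression step above can be illustrated with plain IoU-based suppression. Note the patent specifies a distance-IoU (DIoU) measure for candidate-region optimization; this sketch uses the standard overlap ratio for brevity, and all box coordinates and thresholds are hypothetical:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box; drop boxes overlapping it beyond the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping detections of the same miner plus one separate vehicle box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

DIoU would additionally penalize the normalized distance between box centers, which helps separate nearby targets of similar size.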
Preferably, the extracting, by calculating a gradient direction histogram of a local image in a bounding box, a feature still having identification and matching under illumination change and occlusion rotation conditions by using a scale invariant feature transformation algorithm, and recording the feature as a first feature of a target surface interest point includes:
Constructing a differential Gaussian pyramid by utilizing a Gaussian pyramid technology based on a bounding box to obtain a first key point position, wherein the first key point position comprises points representing angular points, edges or other remarkable characteristics of targets in a mine in a scale space, and the differential Gaussian pyramid is obtained by calculating the difference value of two adjacent layers of the Gaussian pyramid;
According to the difference-of-Gaussians pyramid, performing extremum detection in scale space at each pixel of the first key point locations to obtain second key point locations, wherein each candidate point is compared with its 26 neighbors — 8 in the same scale layer and 9 in each of the two adjacent scale layers — and only extremum points are retained;
obtaining a unique orientation for each second key point location by calculating a gradient orientation histogram based on the gradient directions around it, wherein a dominant orientation is assigned by taking the 16×16-pixel neighborhood around the key point, calculating the gradient direction and magnitude of each pixel, and generating an 8-bin orientation histogram in each 4×4 sub-region;
Based on the dominant orientation of each second key point location, generating a feature descriptor in the 16×16-pixel neighborhood around it, yielding features that remain identifiable and matchable under illumination change and occlusion or rotation, recorded as the first features of the target surface interest points, wherein the descriptor is constructed by computing a gradient orientation histogram in each 4×4 sub-region, resulting in a 128-dimensional feature vector.
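The 128-dimensional descriptor construction described above (4×4 sub-regions × 8 orientation bins over a 16×16 neighborhood) can be sketched as follows. This is a simplified illustration: full SIFT also applies Gaussian weighting, rotation to the dominant orientation, and descriptor clipping, all omitted here:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-dim descriptor of a 16x16 patch: 4x4 sub-regions x 8 orientation bins."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)       # angle in [0, 2*pi)
    bins = (orientation / (2 * np.pi) * 8).astype(int) % 8
    descriptor = np.zeros(128)
    for cy in range(4):
        for cx in range(4):
            cell = (slice(cy * 4, cy * 4 + 4), slice(cx * 4, cx * 4 + 4))
            hist = np.zeros(8)
            # accumulate gradient magnitude into the 8 orientation bins
            np.add.at(hist, bins[cell].ravel(), magnitude[cell].ravel())
            descriptor[(cy * 4 + cx) * 8:(cy * 4 + cx) * 8 + 8] = hist
    norm = np.linalg.norm(descriptor)
    # unit-normalization gives the (partial) illumination invariance
    return descriptor / norm if norm > 0 else descriptor

patch = np.tile(np.arange(16, dtype=np.float64), (16, 1))  # horizontal ramp
d = sift_like_descriptor(patch)
```

On the ramp patch every gradient points in the same direction, so all energy lands in one bin per cell; a textured patch spreads energy across bins.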
Preferably, the processing the moving object in the event stream through the fast tracking algorithm according to the first feature of the surface interest point to obtain the real-time position and the motion trend of the moving object includes:
Calculating a color histogram of the moving target region based on the first feature of the surface interest point;
calculating a region similar to the color histogram of the moving target in the search window to determine the current position of the moving target, obtaining a matching result, and updating the position of the moving target according to the matching result;
Based on the updated position of the moving object, a movement trend of the moving object is analyzed, wherein the movement trend comprises a movement direction and a movement speed.
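The histogram-matching search in the steps above can be sketched as a sliding-window comparison. Simplifying assumptions: a single-channel frame stands in for the color image, histogram intersection is used as the similarity measure, and the window size and stride are illustrative:

```python
import numpy as np

def color_histogram(region, bins=8):
    """Normalized intensity histogram of a single-channel region."""
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

def best_match_position(frame, template_hist, win, stride=4):
    """Slide a window over the frame; return top-left of the most similar region."""
    best, best_pos = -1.0, (0, 0)
    h, w = frame.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            s = histogram_similarity(
                color_histogram(frame[y:y + win, x:x + win]), template_hist)
            if s > best:
                best, best_pos = s, (x, y)
    return best_pos

# A bright 16x16 target on a dark background; the tracker's template histogram
# was built from the target's previous appearance.
frame = np.zeros((64, 64), dtype=np.uint8)
frame[20:36, 40:56] = 220
template = color_histogram(np.full((16, 16), 220, dtype=np.uint8))
pos = best_match_position(frame, template, win=16)
```

Comparing positions across consecutive frames then yields the motion direction and speed mentioned above.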
Preferably, the performing short-term prediction by using a kalman filter in a TLD target movement algorithm to obtain a short-term movement track of a moving target, and obtaining an updated target model through an online learning mechanism, where the method includes:
Based on the real-time position and the motion trend of the moving target, performing optimal estimation calculation on the target state through the treatment of a Kalman filter to obtain a short-term motion track of the moving target, wherein the calculation process comprises a prediction step and an updating step, the prediction step is used for estimating the position and the speed of the moving target in the next frame, and the updating step is used for adjusting and predicting according to new observation data;
Collecting positive and negative samples according to the short-term motion trail of the moving target to form a training data set for model updating, wherein the positive samples are from detection results of the tracking target, and the negative samples are from areas identified as non-targets;
And processing the training data set by using an online random gradient descent algorithm, updating model parameters online, and adapting to the change of the appearance of the target, thereby obtaining an updated target model reflecting the current state of the moving target.
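The Kalman prediction and update steps described above can be illustrated with a one-dimensional constant-velocity model; a real tracker would carry 2D position and velocity, and all noise parameters here are illustrative assumptions:

```python
import numpy as np

class ConstantVelocityKalman:
    """1D constant-velocity Kalman filter; state = [position, velocity]."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = process_var * np.eye(2)             # process noise
        self.R = np.array([[meas_var]])              # measurement noise
        self.x = np.zeros((2, 1))                    # state estimate
        self.P = np.eye(2)                           # estimate covariance

    def predict(self):
        """Prediction step: estimate next-frame position and velocity."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x.copy()

    def update(self, z):
        """Update step: correct the prediction with a new observation z."""
        y = np.array([[z]]) - self.H @ self.x            # innovation
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x.copy()

# Track a target moving ~2 units per frame from noiseless position readings;
# the velocity estimate converges toward 2 without ever being observed directly.
kf = ConstantVelocityKalman()
for z in [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]:
    kf.predict()
    kf.update(z)
estimated_velocity = float(kf.x[1, 0])
```

In the TLD setting, the detection result of each frame plays the role of the observation z, and the positive/negative samples described above update the appearance model separately from this motion filter.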
Preferably, the color distribution characteristic of the specific event is processed by applying a video tracking algorithm based on background segmentation to obtain a prediction result of the specific event, wherein the processing includes calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, and dynamically adjusting the size and shape of a target frame, which includes:
Calculating a color histogram of a specific event in the mine to obtain a color feature description of the target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
Searching a region which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching region;
And dynamically adjusting the target frame from the target's color distribution at the potential position using the CamShift algorithm, comprising: initializing the target frame, tracking the target in subsequent frames using the color histogram, and iteratively updating the position and size of the target frame to match the target's color distribution until the region best matching the target's color histogram is obtained, thereby yielding the specific-event prediction result.
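The iterative window update at the core of CamShift can be sketched as a mean-shift loop over back-projection weights (each pixel's probability of matching the target histogram). This simplified version keeps the window size fixed; full CamShift also adapts the window's size and orientation from the image moments, which is how the target frame's shape is adjusted:

```python
import numpy as np

def mean_shift_window(weights, window, iterations=10):
    """Shift an (x, y, w, h) window toward the centroid of pixel weights."""
    x, y, w, h = window
    for _ in range(iterations):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break
        ph, pw = patch.shape
        ys, xs = np.mgrid[0:ph, 0:pw]
        cx = (xs * patch).sum() / total           # centroid inside the window
        cy = (ys * patch).sum() / total
        nx = max(int(round(x + cx - w / 2)), 0)   # re-center window on centroid
        ny = max(int(round(y + cy - h / 2)), 0)
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return x, y, w, h

# Hypothetical back-projection: a blob of "event-colored" pixels (e.g. a gas
# cloud) centered near (30, 12); the window starts off-target and slides on.
weights = np.zeros((40, 60))
weights[8:17, 26:35] = 1.0
tracked = mean_shift_window(weights, window=(20, 4, 10, 10))
```

The converged window encloses the blob's centroid, which is the "best matching region" the step above refers to.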
In a second aspect, the application also provides a dynamic target tracking system for vehicle-mounted mining intrinsically safe video monitoring, which comprises:
The acquisition module is used for acquiring real-time video streams through the vehicle-mounted camera, wherein the real-time video streams comprise visible light videos and event streams captured by the event camera, the visible light videos are visible light activity scenes in a mine, the visible light activity scenes comprise states of miner activities, vehicle operation and other visible objects, and the event streams captured by the event camera comprise brightness change position and time information recorded when pixel brightness changes;
The detection extraction module is used for generating an event stream containing a time stamp based on the brightness change of pixels in the event stream captured by the event camera, and sequencing the captured events according to the time stamp to generate a high-time-resolution event stream; preprocessing a real-time video stream, wherein preprocessing comprises noise reduction and illumination enhancement to obtain an optimized video frame, analyzing the optimized video frame by utilizing a lightweight deep learning model to obtain a detection target result, and extracting fine granularity tracking features of the detection target result according to a high-time-resolution event stream and a visible light video to obtain first features of surface interest points of the target result, wherein the first features comprise local textures, shape information and a movement mode, and the detection target result comprises preliminary positioning results of miners, vehicle targets and other visible objects;
The prediction module is used for processing the moving target in the event stream through a rapid tracking algorithm according to the first characteristics of the surface interest points to obtain the real-time position and the movement trend of the moving target, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain the short-term movement track of the moving target, and obtaining an updated target model through an online learning mechanism;
The recognition module is used for obtaining the motion state of the moving target based on the updated target model and the real-time video frame by utilizing a motion detection algorithm, wherein the motion detection algorithm processing process comprises the steps of recognizing the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence;
The tracking module is used for aiming at the color distribution characteristics of a specific event, applying a video tracking algorithm based on background segmentation to process to obtain a specific event prediction result, wherein the processing process comprises the steps of calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, dynamically adjusting the size and shape of a target frame, combining the motion characteristics of the moving target and the specific event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and updating correction.
In a third aspect, the present application also provides a dynamic target tracking device for vehicle-mounted mining intrinsic safety video monitoring, including:
a memory for storing a computer program;
and the processor is used for realizing the step of the dynamic target tracking method of the intrinsic safety type video monitoring for the vehicle-mounted mine when executing the computer program.
In a fourth aspect, the present application further provides a readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the dynamic target tracking method based on intrinsic safety video monitoring for vehicle-mounted mining.
The beneficial effects of the invention are as follows:
According to the invention, front-end processing of the video is performed on the vehicle-mounted computing platform, reducing data transmission delay and improving system real-time performance. Fine-grained tracking focuses on the target's surface interest points, improving detection accuracy. The event camera captures dynamic changes in the scene and generates a high-time-resolution event stream, enabling efficient tracking of fast-moving targets. A deep-learning-based detection and tracking algorithm improves the accuracy of target detection and tracking. The background-segmentation-based video tracking algorithm dynamically adjusts the size and shape of the target frame according to the target's color distribution, improving the tracking of events such as ore slippage and gas leakage, realizing dynamic monitoring and intelligent analysis of coal mine disaster events, and improving the accuracy and response speed of early warning.
By analyzing the optimized video frames with a lightweight deep learning model, the invention can accurately detect targets such as miners and vehicles, maintaining high accuracy even in a complex environment such as a mine. Fine-grained tracking feature extraction yields first features comprising local texture, shape information and motion pattern, which are robust to illumination change and occlusion and help maintain stable tracking in harsh environments. Using the high-time-resolution event stream captured by the event camera together with a fast tracking algorithm, fast-moving targets can be tracked in real time, which is particularly suitable for dynamic monitoring of equipment or personnel in a mine. Short-term prediction with the Kalman filter of the TLD algorithm, combined with an online learning mechanism that updates the target model, adapts to dynamic changes in target motion and improves tracking accuracy and adaptability.
For the color distribution characteristics of specific events such as ore slippage and gas leakage, the CamShift algorithm dynamically adjusts the size and shape of the target frame, so such events can be tracked accurately and early warnings issued in time, improving mine safety. Performing front-end processing and feature extraction on the vehicle-mounted computing platform reduces dependence on a central server, lowers data transmission delay, improves system real-time performance and reliability, and optimizes the use of computing resources. Multi-scale fusion processing and update correction comprehensively exploit the motion characteristics of the dynamic target and the specific-event prediction result, raising the system's level of intelligent decision-making and achieving more accurate target tracking.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a dynamic target tracking method for intrinsic safety type video monitoring for vehicle-mounted mining according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of a dynamic target tracking system for intrinsic safety type video monitoring for vehicle-mounted mining according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a dynamic target tracking device for vehicle-mounted mining intrinsic safety video monitoring according to an embodiment of the present invention.
In the figure, 701, an acquisition module, 702, a detection and extraction module, 703, a prediction module, 704, an identification module, 705, a tracking module, 800, a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine, 801, a processor, 802, a memory, 803, a multimedia component, 804, an I/O interface, 805 and a communication component.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1:
the embodiment provides a dynamic target tracking method for vehicle-mounted mining intrinsic safety type video monitoring.
Referring to fig. 1, the method is shown to include steps S100, S200, S300, S400, and S500.
S100, acquiring a real-time video stream through a vehicle-mounted camera, wherein the real-time video stream comprises a visible light video and an event stream captured by an event camera, the visible light video is a visible light activity scene in a mine, the visible light activity scene comprises states of miner activities, vehicle operation and other visible objects, and the event stream captured by the event camera comprises brightness change position and time information recorded when pixel brightness changes;
It will be appreciated that, unlike a conventional camera, the event camera records events only when pixel brightness changes, generating a timestamp-based event stream containing the location and time of each brightness change. The difference between the two sources is that the visible-light video provides continuous frames suited to capturing detail and color in the scene, but with a large data volume and high processing demands, while the event stream provides asynchronous, event-driven data suited to capturing rapid changes with low computational cost, though it lacks the detail of conventional video.
S200, generating an event stream containing timestamps based on the pixel brightness changes in the stream captured by the event camera, and sorting the captured events by timestamp to generate a high-time-resolution event stream; preprocessing the real-time video stream, the preprocessing comprising noise reduction and illumination enhancement, to obtain optimized video frames; analyzing the optimized video frames with a lightweight deep learning model to obtain a target detection result; and performing fine-grained tracking feature extraction on the target detection result according to the high-time-resolution event stream and the visible-light video to obtain first features of the surface interest points of the target result, wherein the first features comprise local texture, shape information and motion pattern, and the target detection result comprises preliminary positioning results for miners, vehicle targets and other visible objects.
The target surface interest points refer to points with unique and obvious characteristics on the surface of a target object in image processing and computer vision. These points are generally easily identified and tracked in the image because they have a significant texture change or color change in the surrounding area so that they can stand out from other image areas. In object tracking and recognition, points of interest are key features because they facilitate feature matching, i.e., matching the same object between different images even if the object is rotated, scaled, or illuminated, object positioning, i.e., accurately positioning and tracking the object in the images, and motion estimation, i.e., analyzing the motion pattern and direction of the object. Common points of interest include corner points, edge points, or other significant texture change points. In practical applications, various algorithms may be used to detect these points, such as Harris corner point detector, FAST or SIFT, etc.
In this step, based on the brightness change of the pixels in the event stream captured by the event camera, generating an event stream containing a timestamp, and sorting the captured events according to the timestamp, generating an event stream with high time resolution, which includes:
By utilizing the characteristics of the event camera, the position and time of each brightness change are recorded to form an event stream with high time resolution, and the sorting process is calculated as follows:
E_sorted = sort(E_captured, key = λx: x.timestamp)
where E_sorted denotes the ordered event stream, E_captured denotes the original event stream captured from the event camera, sort is the sorting function, λ introduces the key function applied to each event, x denotes a single event in the event stream, x.timestamp denotes the timestamp of event x, and key is the parameter used to specify the sorting function.
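The sorting step can be sketched in Python; the `Event` record and its field names are illustrative assumptions, not the patent's actual data structure:

```python
from collections import namedtuple

# Hypothetical event record: pixel coordinates, polarity of the
# brightness change, and a microsecond timestamp.
Event = namedtuple("Event", ["x", "y", "polarity", "timestamp"])

def sort_event_stream(e_captured):
    """Order raw events by timestamp, mirroring
    E_sorted = sort(E_captured, key = lambda x: x.timestamp)."""
    return sorted(e_captured, key=lambda ev: ev.timestamp)

raw = [Event(3, 7, 1, 1800), Event(5, 2, -1, 1200), Event(9, 4, 1, 1500)]
stream = sort_event_stream(raw)
print([ev.timestamp for ev in stream])  # → [1200, 1500, 1800]
```

Because `sorted` is stable, events sharing a timestamp keep their capture order, which preserves the asynchronous stream's causality.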
It will be appreciated that S201, S202, and S203 are included in this step S200, in which:
S201, processing the optimized video frame through a YOLOv-Lite model, wherein the model obtains preliminary positioning results of miners, vehicle targets and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function, and the calculation formula for target detection by the YOLOv-Lite model is as follows:
b_i = σ(w_i · x + b_i′)
wherein b_i represents the predicted value of a bounding box, w_i and b_i′ are the weight and bias parameters of the model, σ is the sigmoid activation function, and x represents the input feature map;
S202, based on the preliminary positioning result, obtaining an optimization measure of the target candidate region through distance intersection-over-union calculation, and obtaining the bounding box of the target candidate region through non-maximum suppression processing, wherein the processing comprises calculating the intersection-over-union of prediction boxes and removing overlapping boxes based on a preset threshold, and the calculation formula of the non-maximum suppression processing is as follows:
IoU(B1, B2) = area(B1 ∩ B2) / area(B1 ∪ B2)
wherein B1 and B2 are two bounding boxes, IoU is the intersection-over-union, and area is the area function;
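A minimal sketch of the IoU computation and greedy non-maximum suppression described in S202; box layout `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions for illustration:

```python
def iou(b1, b2):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any remaining
    box whose IoU with it exceeds the preset threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one: [0, 2]
```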
S203, extracting features which are still identified and matched under the illumination change and shielding rotation conditions by calculating a gradient direction histogram of a local image in the boundary box by using a scale invariant feature transformation algorithm, and marking the features as first features of interest points on the target surface.
It should be noted that the features with strong robustness refer to features that can still be accurately identified and matched under various environmental changes, such as illumination condition changes, image blurring, occlusion, rotation, scale changes, and the like. These features are important for the identification and tracking of objects because they provide a capability to operate stably under different conditions.
In the present embodiment, S2031, S2032, S2033, and S2034 are further included in step S203, wherein:
S2031, constructing a differential Gaussian pyramid by utilizing a Gaussian pyramid technology based on a bounding box to obtain a first key point position, wherein the first key point position comprises points representing angular points, edges or other remarkable characteristics of a target in a mine in a scale space, the differential Gaussian pyramid is obtained by calculating the difference value of two adjacent layers of the Gaussian pyramid, and the calculation formula is as follows:
DoG(x,y,σ)=G(x,y,kσ)−G(x,y,σ)
wherein DoG is the difference-of-Gaussians function, G(x, y, σ) is the Gaussian function, σ is the standard deviation of the Gaussian function, k is a constant controlling the change of scale, x and y are the coordinates of a pixel point in the image, and kσ is a scale larger than the current scale σ, used for constructing the blurred image of the adjacent scale in the Gaussian pyramid;
Thus, the difference-of-Gaussians function is in practice computed as the difference between two Gaussian-blurred images of different scales. This difference image highlights local features in the image, such as edges and corner points, which are invariant to scale transformation; in the SIFT algorithm, the difference-of-Gaussians function is used to identify potential key point positions.
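The DoG formula above can be evaluated pointwise as a small sketch; the choice k = √2 is a common convention assumed here, not mandated by the text:

```python
import math

def gaussian(x, y, sigma):
    """2-D Gaussian kernel value G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def dog(x, y, sigma, k=math.sqrt(2)):
    """DoG(x, y, sigma) = G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

# At the center the broader Gaussian has the lower peak, so DoG is
# negative there and rises toward zero away from the center -- the
# band-pass shape that makes DoG respond to edges and corners.
print(dog(0.0, 0.0, 1.6) < 0)  # → True
```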
S2032, carrying out extremum detection in scale space on each pixel point at the first key point position according to the differential Gaussian pyramid to obtain a second key point position, wherein each pixel point is compared with its 26 neighbors (8 in the same scale layer and 9 in each of the two adjacent scale layers) to determine whether it is an extremum point;
The differential Gaussian pyramid is formed from a series of images obtained by Gaussian filtering and scale transformation of the original image. Each layer represents a different scale (degree of blurring), with blur increasing from layer to layer. At the first key point position, each pixel point in the differential Gaussian pyramid is compared with the 8 pixel points around it in the same scale layer and the 18 pixel points of the upper and lower adjacent scale layers (26 pixel points in total), to determine whether it is a local extremum point (that is, a local maximum or minimum of brightness). The purpose of extremum detection is to find those points that are significantly prominent in brightness, which may be potential key points. If a pixel has higher or lower luminance than all 26 surrounding pixels at the current scale of the differential Gaussian pyramid, it is an extremum point. The second key point position is the pixel position identified as an extremum point in the first round of detection, which becomes a final key point position after further screening and accurate localization. These positions represent points in the image that have significant features, such as corner points and edges. Scale space refers to the representation of an image at different scales; in the SIFT algorithm, the scale space is implemented by a Gaussian pyramid, where each scale layer represents a blurred version of the image at that scale.
In summary, this sentence describes how potential keypoint locations are identified in the SIFT algorithm by comparing the luminance values of each pixel point in the DoG pyramid with its surrounding neighboring pixel points, which locations are further processed in subsequent steps to determine the final keypoint.
S2033, obtaining a unique direction for each second key point position by calculating a gradient direction histogram based on the gradient directions around the second key point position, wherein a main direction is allocated to the second key point position by taking the 16x16-pixel neighborhood around it, calculating the gradient magnitude and direction of each pixel, and generating an 8-direction histogram in each 4x4 sub-region;
In the SIFT (scale invariant feature transform) algorithm, the 8-direction histogram refers to the gradient direction histogram calculated for each 4×4-pixel sub-region of the neighborhood around a key point; this histogram is used to describe a local image feature. The method comprises the following. Meaning of the 8-direction histogram: in image processing, in order to make the feature descriptor invariant to rotation, a main direction must be allocated to each key point. Gradient calculation: the gradient vector (magnitude and direction) of each pixel around the key point is computed, where the gradient direction points in the direction of fastest change of image brightness. Histogram construction: the range of gradient orientations is divided into 8 bins, each covering a 45-degree angular range, and for each 4x4 sub-region a histogram is built by counting the gradient directions falling in each bin; the 8 directions are 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°, each corresponding to a direction interval used to accumulate the gradient information in that direction.
It will be appreciated that, in the calculation, the gradient direction of each pixel around the key point is computed; based on that direction, the pixel is assigned to one of the 8 direction bins, and the count in the histogram of the corresponding bin is incremented. In this way, even if the image is rotated, the 8-direction histograms can still describe the same features, since they are based on local gradient directions rather than on absolute coordinate directions. The generated 8-direction histograms are used to construct the feature descriptor of the key point, which can uniquely identify the key point in the image and remain stable as the image scales and rotates.
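The binning just described can be sketched as follows; magnitude-weighted votes are a standard SIFT convention assumed here (the text only mentions counting directions):

```python
import math

def orientation_histogram(gradients):
    """Accumulate gradient vectors (dx, dy) into 8 orientation bins of
    45 degrees each, weighting each vote by gradient magnitude, as in a
    SIFT sub-region descriptor."""
    bins = [0.0] * 8
    for dx, dy in gradients:
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        magnitude = math.hypot(dx, dy)
        bins[int(angle // 45.0) % 8] += magnitude
    return bins

# Two gradients pointing right (0 degrees) and one pointing up (90 degrees).
hist = orientation_histogram([(1, 0), (2, 0), (0, 3)])
print(hist[0], hist[2])  # → 3.0 3.0
```

Sixteen such 8-bin histograms (one per 4x4 sub-region of the 16x16 neighborhood) concatenate into the 128-dimensional descriptor discussed below in S2034.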
S2034, generating a feature descriptor in a neighborhood of 16x16 pixels around the second keypoint location according to the main direction of the second keypoint location, thereby obtaining a first feature which still has features of identification and matching under illumination change and occlusion rotation conditions and is recorded as a target surface interest point, wherein the descriptor is constructed by calculating a histogram of gradient directions in each 4x4 sub-region, and finally forming a 128-dimensional feature vector.
Wherein the "128-dimensional feature vector" is the set of values used in the SIFT algorithm to describe the features of a key point in the image; it encodes the image information around that point for the subsequent image matching and recognition process. The 128 dimensions describe the gradient direction histogram information around the key point: each dimension corresponds to the gradient statistics of one direction within one 4x4 sub-region of the 16x16-pixel area around the key point. The association with "recording the feature as the first feature of the target surface interest point" is that this 128-dimensional feature vector is a detailed description of the key point in the image (i.e. the target surface interest point): it captures local image features around the key point, including gradient direction and magnitude, and these feature descriptors enable the algorithm to identify and match the same target surface interest point under different scales, directions and lighting conditions. In the SIFT algorithm, each key point is assigned a principal direction, and its surrounding 16x16-pixel neighborhood is divided into 16 sub-regions, each of which yields an 8-direction gradient histogram; these combine to form the 128-dimensional feature vector. It will be appreciated that this vector provides a unique "fingerprint" for each key point in the image so that it can be accurately identified and matched during image matching. These feature vectors adapt well to the dust, water vapor and dim light common in mines; the 128-dimensional feature vector can further be normalized to reduce the influence of illumination change, improving the stability and accuracy of feature matching and helping to improve the reliability of the mine monitoring system.
S300, according to the first characteristics of the surface interest points, processing the moving targets in the event stream through a quick tracking algorithm to obtain real-time positions and movement trends of the moving targets, performing short-term prediction by using a Kalman filter in a TLD target movement algorithm to obtain short-term movement tracks of the moving targets, and obtaining updated target models through an online learning mechanism;
Fast moving objects are tracked using event stream based tracking algorithms, such as optical flow methods or kalman filters. The method can quickly respond to the motion change of the target, realize real-time tracking, and obtain the real-time position and motion trend of the quick-moving target through algorithm processing. This information is critical to predicting potential safety hazards.
It will be appreciated that S301, S302 and S303 are included in this step S300, wherein:
S301, calculating the color histogram of the moving target area based on the first feature of the surface interest points, with the following calculation formula:
H(θ) = (1/N) Σ_{i=1}^{N} δ(θ_i − θ)
wherein H is the color histogram of the target area, N is the total number of pixels in the area, θ is a color angle, δ is the Dirac delta function used to construct the color histogram, i is the current term index in the summation, and θ_i is the color angle of the i-th pixel point;
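In discrete form the Dirac delta becomes a bin-membership test; a minimal sketch (the 16-bin quantization of the 0-360° color angle range is an assumption for illustration):

```python
def color_histogram(color_angles, n_bins=16):
    """Normalised histogram over quantised color (hue) angles in degrees:
    H(bin) = (1/N) * count of pixels whose angle falls in that bin,
    a discrete analogue of H(theta) = (1/N) * sum_i delta(theta_i - theta)."""
    hist = [0.0] * n_bins
    width = 360.0 / n_bins
    for theta in color_angles:
        hist[int(theta % 360.0 // width)] += 1.0
    n = len(color_angles)
    return [h / n for h in hist] if n else hist

hist = color_histogram([10.0, 15.0, 200.0, 350.0])
print(sum(hist))  # → 1.0  (normalised)
```

Normalising by N makes histograms of differently sized target regions directly comparable in the matching step S302.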
S302, calculating the region in the search window most similar to the color histogram of the moving target to determine its current position, obtaining a matching result, and updating the position of the moving target according to the matching result, with the following calculation formula:
S = Σ_{x,y} H_w(x, y) · H_i(x, y)
wherein S is the similarity within the search window, H_w is the color histogram in the search window, H_i is the color histogram of the target model, and x, y are pixel coordinates in the image or search window, with x and y traversing each pixel point in the search window;
In this step, it can be appreciated that the real-time position of the target is determined by finding the window position where the similarity is the greatest.
S303, analyzing the movement trend of the moving object based on the updated position of the moving object, wherein the movement trend comprises a movement direction and a movement speed.
In this step, the movement trend includes a movement direction and a speed, wherein the speed calculation formula is as follows:
v = d / t
where v is the speed of the target and d is the distance the target moves in time t.
In mine monitoring, the calculation of velocity can help to understand how fast the target is moving.
The direction calculation formula is as follows:
θ = arctan(Δy / Δx)
where θ represents the direction of movement of the target, and Δy and Δx are the displacements of the target on the y-axis and x-axis, respectively. The calculation of direction is important for tracking the movement trajectory of the target and predicting its possible travel route. Through these two steps, the mine monitoring system can track the positions of targets such as miners and vehicles in real time and analyze their movement trends, thereby improving the safety and efficiency of mine operations.
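The two formulas can be combined into one small helper; `atan2` is used instead of a bare arctangent so that all four quadrants are handled (a standard refinement, assumed here):

```python
import math

def motion_trend(p_prev, p_curr, dt):
    """Speed v = d / t and direction theta = atan2(dy, dx), in degrees,
    from two successive target positions (x, y) observed dt apart."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    speed = math.hypot(dx, dy) / dt   # distance moved over elapsed time
    direction = math.degrees(math.atan2(dy, dx))
    return speed, direction

v, theta = motion_trend((0.0, 0.0), (3.0, 4.0), 2.0)
print(v, round(theta, 2))  # → 2.5 53.13
```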
It will be appreciated that this step S300 further includes S304, S305, and S306, where:
S304, performing optimal estimation calculation on the state of the target through Kalman filter processing based on the real-time position and the motion trend of the moving target to obtain a short-term motion track of the moving target, wherein the calculation process comprises a prediction step and an updating step, the prediction step is to estimate the position and the speed of the moving target in the next frame, and the updating step is to adjust and predict according to new observation data;
The prediction step estimates the position and speed of the target in the next frame, using the following formula:
x̂_k⁻ = A x̂_{k−1} + B u_k
where x̂_k⁻ is the a priori state estimate at time step k, A is the state transition matrix, x̂_{k−1} is the posterior state estimate at time step k−1, B is the control matrix, and u_k is the control input.
The updating step adjusts the prediction according to the new observation data, using the following formula:
x̂_k = x̂_k⁻ + K_k (z_k − H x̂_k⁻)
where x̂_k is the posterior state estimate at time step k, K_k is the Kalman gain, z_k is the observation, x̂_k⁻ is the a priori state estimate at time step k, and H is the observation matrix.
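A scalar (one-dimensional) sketch of one predict/update cycle; the noise variances q and r and all default coefficients are illustrative assumptions, not values from the patent:

```python
def kalman_step(x_post, p_post, u, z, a=1.0, b=1.0, h=1.0, q=0.01, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.
    a: state transition, b: control, h: observation,
    q / r: process / measurement noise variances."""
    # Prediction: project the state and its uncertainty forward.
    x_prior = a * x_post + b * u
    p_prior = a * p_post * a + q
    # Update: blend the prediction with the new observation z.
    k = p_prior * h / (h * p_prior * h + r)      # Kalman gain
    x_new = x_prior + k * (z - h * x_prior)
    p_new = (1.0 - k * h) * p_prior
    return x_new, p_new

# Predicted position 10.0 plus control input 1.0, then observed at 11.5:
x, p = kalman_step(x_post=10.0, p_post=1.0, u=1.0, z=11.5)
print(round(x, 3))  # → 11.455
```

The posterior lands between the prediction (11.0) and the observation (11.5), weighted by the gain, and the uncertainty p shrinks after the update.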
S305, collecting positive and negative samples according to a short-term motion track of a moving target to form a training data set for model updating, wherein the positive samples are from detection results of a tracking target, and the negative samples are from a region identified as a non-target;
S306, processing the training data set by applying an online random gradient descent algorithm, updating model parameters online, and adapting to the change of the appearance of the target, thereby obtaining an updated target model reflecting the current state of the moving target.
It will be appreciated that in the mine monitoring scenario of this step, this means that the model can be updated immediately each time a new video frame or sensor data arrives to accommodate changes in the dynamic targets within the mine. In a specific application, the updating process of the SGD may be described as the following steps:
Firstly, the model parameter θ is initialized from the state of the previous model. For each newly arrived sample (x_i, y_i), the gradient ∇_θ L(θ, x_i, y_i) of the loss function L(θ, x_i, y_i) with respect to the parameter θ is calculated through the online learning mechanism, and the model parameters are updated using the stochastic gradient descent update rule:
θ_new = θ_old − η ∇_θ L(θ_old, x_i, y_i)
where η is the learning rate, a hyperparameter controlling the step size of each update, θ_new represents the model parameters after the update, θ_old represents the model parameters before the update, and ∇_θ L(θ_old, x_i, y_i) is the gradient of the loss function L on the sample (x_i, y_i) under the current parameters θ_old. Online learning may then need to dynamically adjust the learning rate η to adapt to changes in the target, which can be achieved through a learning-rate decay strategy or an adaptive learning-rate algorithm. Finally, after the model update, the performance of the model on new data is evaluated; if it is not good, the learning rate or the model structure may need further adjustment.
In summary, in mine monitoring scenarios, the SGD may be used to update the target detection and tracking model in real-time to accommodate dynamic changes in targets such as miners and vehicles. For example, if the model performs poorly on new samples, the model parameters can be quickly adjusted by the SGD to improve tracking accuracy and robustness. This method is particularly useful in environments where the data stream is changing in mines because it can quickly respond to changes in the appearance of the target, such as changes due to dust, changes in illumination, or occlusion of the target.
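The online update rule can be sketched with a toy linear model and squared-error loss; the loss choice and learning rate are illustrative assumptions:

```python
def sgd_update(theta, grad, lr=0.01):
    """One online SGD step: theta_new = theta_old - lr * grad."""
    return [t - lr * g for t, g in zip(theta, grad)]

def squared_error_grad(theta, x, y):
    """Gradient of L = (theta . x - y)^2 / 2 for a linear model."""
    err = sum(t * xi for t, xi in zip(theta, x)) - y
    return [err * xi for xi in x]

theta = [0.0, 0.0]
# Each newly arrived (x, y) sample updates the model immediately,
# as a new video frame would in the mine monitoring scenario.
for x, y in [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0)]:
    theta = sgd_update(theta, squared_error_grad(theta, x, y), lr=0.1)
print([round(t, 6) for t in theta])  # → [0.9, 1.2]
```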
S400, based on the updated target model and the real-time video frame, a motion detection algorithm is used for processing to obtain the motion state of the moving target, wherein the motion detection algorithm processing process comprises the steps of identifying the moving target and extracting the characteristics of the moving target by analyzing the change of pixel points in an image sequence, and the motion characteristics of the moving target are obtained through self-adaptive tracking algorithm processing according to the motion state and the updated target model.
It is understood that S401, S402, S403, and S404 are included in the present step S400, in which:
S401, according to the updated target model and the real-time video frame, preliminary motion information of the moving target is obtained through inter-frame difference processing; the inter-frame difference algorithm detects the moving target by calculating pixel-level differences between consecutive video frames:
Motion = |I_t − I_{t−1}|
where I_t is the current frame, I_{t−1} is the previous frame, and Motion is the motion information; this step exploits the pixel changes caused by target motion in the mine environment to provide a basis for subsequent feature extraction;
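A minimal sketch of the inter-frame difference on grayscale frames stored as nested lists; the threshold of 25 is an illustrative assumption:

```python
def frame_difference(curr, prev, threshold=25):
    """Pixel-wise |I_t - I_{t-1}|, thresholded into a binary motion mask
    (1 = moving pixel, 0 = static background)."""
    return [[1 if abs(c - p) > threshold else 0
             for c, p in zip(crow, prow)]
            for crow, prow in zip(curr, prev)]

prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 90, 10], [10, 10, 80]]
mask = frame_difference(curr, prev)
print(mask)  # → [[0, 1, 0], [0, 0, 1]]
```

In production this would run on full camera frames (e.g. NumPy arrays), but the pixel-level logic is the same.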
S402, according to the motion information obtained in step S401, the motion vector and speed of the moving target are obtained through processing by an optical-flow-based algorithm; the optical flow algorithm estimates the motion characteristics of the moving target by analyzing the displacement of the target across consecutive frames:
Flow = (P_t − P_{t−1}) / Δt
where P_t is the position of the target in the current frame, P_{t−1} is its position in the previous frame, Δt is the time interval, and Flow represents the motion vector of the target; the motion vector extracted in this step reflects the motion trend of the target in the mine environment.
S403, according to the motion vector extracted in the step S402, obtaining an updated target model through Kalman filter processing, wherein the Kalman filter is a recursive algorithm, and estimates the target state through the steps of prediction and updating;
S404, according to the target model updated in step S403, the accurate motion characteristics of the moving target are obtained through processing by an adaptive tracking algorithm, wherein the adaptive tracking algorithm dynamically adjusts tracking parameters according to the motion characteristics of the target to adapt to the complex and changeable environment in the mine; the calculation formula is as follows:
Track = f(UpdatedModel, MotionFeatures)
wherein Track represents the tracking result, f is the adaptive tracking function, UpdatedModel is the updated target model, and MotionFeatures are the motion features; this step ensures stable and accurate tracking of moving targets in the mine environment.
S500, aiming at the color distribution characteristics of a specific event, a video tracking algorithm based on background segmentation is applied to process to obtain a specific event prediction result, wherein the processing process comprises the steps of calculating a color histogram of a dynamic target, searching a region which is most matched with the histogram in a next frame, dynamically adjusting the size and shape of a target frame, combining the motion characteristics of the moving target and the specific event prediction result, and finally obtaining the dynamic target tracking result through multi-scale fusion processing and updating correction.
It will be appreciated that in this step, the color distribution characteristics of a particular event generally refer to the occurrence of a particular event, such as ore slip or gas leakage, which would exhibit a unique pattern of color change in the video image. For example, ore slip may generate a lot of dust, which may appear as abrupt changes in color or a significant increase in gray scale value for a specific area in a video, and gas leakage may appear as abnormal shadows or color changes in a video. These changes in color and brightness can be used as important cues for event detection.
In handling such specific events, these color distribution characteristics are typically analyzed and utilized for event identification and tracking. This includes: color histogram analysis, which generates a color histogram by counting the pixels of each color in the image to learn its overall color distribution; color moment calculation, where color moments are statistics describing the color distribution of the image, including mean, variance, skewness, etc.; color space conversion, which converts the image from the RGB color space to HSV or another color space to better emphasize specific color features; color clustering, which quantizes the colors of the image using an algorithm such as K-Means, clustering them into several main color categories; and color correlograms, which analyze the correlation of colors between pixels, describing the spatial distribution and relationships of colors in the image.
By the method, the characteristics of the specific events can be captured and described from the perspective of color, and further the identification and tracking of the events are realized.
It will be appreciated that in this step S500, S501, S502 and S503 are included, which include:
S501, calculating a color histogram of a specific event in a mine to obtain a color feature description of a target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
It should be noted that the "specific event" refers to various abnormal conditions that may occur in the mine, such as ore slip, gas leakage, and the like. These events are often accompanied by color changes such as ore slip, which may produce large amounts of soot that may appear to be of a particular color, such as black or gray. Gas leakage, gas leakage may be accompanied by a special color change, such as a yellow or orange tone, because gas leakage may react with other substances in the air to produce a color change.
The color distribution characteristics refer to distribution modes of colors in the specific events, such as concentration of the colors, distribution range, uniformity of the colors, and the like. The color characterization of the object is the quantification of these color distribution characteristics, which includes the frequency distribution of colors, i.e., the frequency of occurrence of different colors in the image, the spatial distribution of colors, i.e., the distribution of colors in the image space, such as whether concentrated or dispersed, and the brightness and saturation of colors, i.e., the darkness and vividness of colors.
S502, searching a region which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching region;
It should be noted that the matching process of the best matching region generally involves the steps of calculating a target histogram, first calculating a color histogram of a target region (e.g., a gas leakage region), searching for a next frame, searching for a region that best matches the target histogram in the next frame of the video, comparing the color histogram of each candidate region to the target histogram, generally using a histogram comparison algorithm such as correlation matching, chi-square testing, bayesian classification, etc., and determining the best matching region, namely selecting the region that has the highest degree of matching with the target histogram as the best matching region. The best matching region refers to a region in the current frame, the color histogram of which is most similar to the target histogram, and the region is most likely to be the place where the target event occurs.
S503, dynamically adjusting a target frame according to the potential position of the target by using a CamShift algorithm according to the color distribution of the target, wherein the method comprises initializing the target frame, tracking the target in a subsequent frame by using a color histogram, iteratively updating the position and the size of the target frame to match the color distribution of the target, and iteratively updating the new target frame position until the pixel color angle which is most matched with the color histogram of the target is obtained, thereby obtaining a specific event prediction result, wherein the calculation formula is as follows:
x_new = argmax_{(x,y)} Σ_{i=1}^{N} δ(θ_i − θ)
In the formula, x_new is the updated target position, argmax_{(x,y)} finds the coordinates that maximize the following expression among all possible (x, y) coordinates, θ_i is the pixel color angle in the current frame matched against the target color histogram, θ is one specific color angle in the target color histogram, N is the total number of pixels in the search window, Σ_{i=1}^{N} accumulates over all values of i from 1 to N, and δ is the Dirac function used to count the number of pixels at each color angle.
In this step, the process of dynamically adjusting the target box according to the color distribution of the target generally involves initializing the target box, determining the approximate position and size of the target in the initial frame, color histogram tracking, tracking the target using the color histogram in the subsequent frame, and CamShift algorithm, adjusting the target box using the CamShift algorithm. The CamShift algorithm iteratively updates the position and size of the target frame to match the color distribution of the target, updates the target frame such that in each iteration the algorithm calculates a new target frame position based on the pixel color angle in the current frame that best matches the target color histogram, and adapts to target changes such that the target frame dynamically adjusts to keep track of the target as the target moves and changes. In summary, in this way, the algorithm can keep track of the object and maintain the accuracy of the object box even if the object moves in the video or its color distribution changes.
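The argmax search over candidate window positions can be sketched as follows; histogram intersection as the similarity measure, and the candidate positions, are illustrative assumptions (the full CamShift algorithm also adapts the window size):

```python
def histogram_similarity(h1, h2):
    """Histogram intersection: higher means a closer color match."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_matching_window(candidate_hists, target_hist):
    """argmax over candidate (x, y) positions of the similarity between
    each window's color histogram and the target model histogram."""
    return max(candidate_hists,
               key=lambda pos: histogram_similarity(candidate_hists[pos], target_hist))

target = [0.6, 0.3, 0.1]            # normalised target color histogram
candidates = {
    (40, 25): [0.1, 0.2, 0.7],      # mostly background colors
    (52, 30): [0.5, 0.4, 0.1],      # close to the target model
    (70, 90): [0.3, 0.3, 0.4],
}
print(best_matching_window(candidates, target))  # → (52, 30)
```

Iterating this search and re-centering the window on the winner each frame is the mean-shift core that CamShift builds on.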
In this step, in combination with specific events in the mine (such as ore slip or gas leakage), prediction models (possibly based on machine learning or statistical analysis) are used to predict the occurrence of these events and analyze their impact on the target motion. Various features are then usually fused in the target tracking algorithm to improve the accuracy and robustness of tracking; for example, color features, fast histograms of oriented gradients, and local binary pattern features may be combined through linear weighting or other fusion strategies. In terms of scale fusion, the target must be tracked at different scales because it may exhibit scale changes (i.e., changes in size) in the video; this is done by constructing a scale filter that responds to the features of the target at different scales and fuses the responses to estimate the scale and location of the target.
It should be noted that, during the target tracking process, tracking drift may be caused by occlusion, illumination change, or other interference factors. To cope with this, a self-correction mechanism such as peak-to-side lobe ratio check may be introduced, and when an abnormality in the tracking output is detected, the self-correction mechanism is started to correct the tracking output so as to accurately re-track the target. Then, as the characteristics of the target change, the tracking model also needs to be updated accordingly, which is achieved by online learning or incremental learning, that is, the model parameters are updated at each frame or every few frames to adapt to the dynamic change of the target.
It will be appreciated that through the above steps, the resulting dynamic object tracking results are a continuous time series of the position, scale and motion state of the object in the video sequence, which information can be used for real-time monitoring, early warning systems or further analysis processes. In an actual mining vehicle-mounted video monitoring system, the tracking results can be used for improving the safety of mine operation, such as preventing accidents such as collision or landslide by monitoring the movement state of a mine car, or timely taking measures by detecting early signs of specific events.
Example 2:
As shown in fig. 2, this embodiment provides a dynamic target tracking system for vehicle-mounted mining intrinsic safety video monitoring. Referring to fig. 2, the system includes:
The acquisition module 701 is configured to acquire a real-time video stream through a vehicle-mounted camera, where the real-time video stream includes a visible light video and an event stream captured by an event camera. The visible light video records the visible activity scene in the mine, including miner activity, vehicle travel, and the state of other visible objects; the event stream captured by the event camera records the position and time information of each brightness change whenever a pixel's brightness changes;
The detection extraction module 702 is configured to generate an event stream containing timestamps based on pixel brightness changes in the event stream captured by the event camera, sort the captured events by timestamp to produce a high-time-resolution event stream, and preprocess the real-time video stream (the preprocessing including noise reduction and illumination enhancement) to obtain optimized video frames. The optimized video frames are analyzed with a lightweight deep learning model to obtain a target detection result, and fine-grained tracking feature extraction is performed on the detection result according to the high-time-resolution event stream and the visible light video to obtain first features of surface interest points of the target. The first features include local texture, shape information, and motion pattern, and the target detection result includes preliminary positioning results for miners, vehicle targets, and other visible objects;
The prediction module 703 is configured to process, according to the first features of the surface interest points, the moving target in the event stream through a fast tracking algorithm to obtain the moving target's real-time position and motion trend, perform short-term prediction with the Kalman filter in a TLD (tracking-learning-detection) target tracking algorithm to obtain the moving target's short-term motion track, and obtain an updated target model through an online learning mechanism;
The recognition module 704 is configured to obtain the motion state of the moving target from the updated target model and real-time video frames using a motion detection algorithm, where the motion detection processing includes identifying the moving target and extracting its features by analyzing pixel changes across the image sequence;
The tracking module 705 is configured to apply a video tracking algorithm based on background segmentation to obtain a prediction result of a specific event according to the event's color distribution characteristics. The processing includes calculating a color histogram of the dynamic target, searching the next frame for the region that best matches the histogram, and dynamically adjusting the size and shape of the target frame. The motion features of the moving target are then combined with the prediction result of the specific event, and the dynamic target tracking result is finally obtained through multi-scale fusion and update correction.
Specifically, the detection extraction module 702 includes:
the first processing unit is configured to process the optimized video frames through a YOLOv-Lite model, which obtains preliminary positioning results for miners, vehicle targets, and other visible objects by multiplying the input frame with pre-trained weights and applying an activation function. The detection formula of the YOLOv-Lite model is:

bi = σ(Wi · x + bi′)

wherein bi represents the predicted value of a bounding box, Wi and bi′ are the model's weight and bias parameters, σ is the sigmoid activation function, and x represents the input feature map;
The second processing unit is configured to obtain an optimization measure of the target candidate region through distance intersection-ratio calculation based on the preliminary positioning result, and to obtain the bounding box of the target candidate region through non-maximum suppression. The processing includes computing the intersection ratio of prediction boxes and removing overlapping boxes against a preset threshold:

IoU(B1, B2) = area(B1 ∩ B2) / area(B1 ∪ B2)

wherein B1 and B2 are two bounding boxes, IoU is the distance intersection ratio, and area(·) denotes region area;
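An illustrative sketch of the intersection-over-union computation and the greedy non-maximum suppression step described above; the (x1, y1, x2, y2) box encoding and the 0.5 default threshold are assumptions for illustration:

```python
def iou(b1, b2):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box whose IoU with it exceeds `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

The indices returned by `nms` select the surviving candidate boxes after overlap removal.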
The extraction unit is configured to use a scale-invariant feature transform (SIFT) algorithm to compute gradient orientation histograms of the partial image within the bounding box, thereby extracting features that remain identifiable and matchable under illumination change, occlusion, and rotation; these are recorded as the first features of the target surface interest points.
Specifically, the extraction unit includes:
the construction unit is configured to construct a difference-of-Gaussians pyramid using the Gaussian pyramid technique on the bounding box to obtain first key point positions, where the first key point positions include points representing corners, edges, or other salient features of targets in the mine in scale space. The difference-of-Gaussians pyramid is obtained by computing the difference of two adjacent levels of the Gaussian pyramid:
DoG(x,y,σ)=G(x,y,kσ)−G(x,y,σ)
Wherein DoG is the difference-of-Gaussians function, G(x, y, σ) is a Gaussian function, σ is the standard deviation of the Gaussian function, k is a constant controlling the scale change, x and y are pixel coordinates in the image, and kσ is a scale larger than the current scale σ, used to construct the blurred image of the adjacent scale in the Gaussian pyramid;
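The DoG construction above amounts to subtracting two Gaussian-blurred copies of the image at scales σ and kσ. A minimal NumPy sketch (the separable blur implementation and the 3σ kernel radius are assumed details, not part of the embodiment):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: convolve along rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(image, sigma, k=2 ** 0.5):
    """DoG(x, y, sigma) = G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian_blur(image, k * sigma) - gaussian_blur(image, sigma)
```

On flat regions the DoG response is near zero; it responds most strongly to blob-like structure near the scale σ, which is what the extremum detection in the next unit exploits.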
The detection unit is configured to perform extremum detection in scale space at each pixel of the first key point positions according to the difference-of-Gaussians pyramid to obtain second key point positions, where each candidate pixel is compared with its 26 neighbors (8 in the same scale layer and 9 in each of the two adjacent scale layers) to find extremum points;
The first calculation unit is configured to obtain a unique orientation for each second key point position by computing a gradient orientation histogram from the gradients around it: a 16x16-pixel neighborhood is taken around the second key point position, the gradient magnitude and direction of each pixel are computed, an 8-direction histogram is generated in each 4x4 sub-region, and a dominant orientation is assigned to the second key point position;
And the generating unit is configured to generate a feature descriptor in the 16x16-pixel neighborhood around each second key point according to its dominant orientation, thereby obtaining first features that remain identifiable and matchable under illumination change, occlusion, and rotation, recorded as the target surface interest points. The descriptor is constructed by computing a gradient orientation histogram in each 4x4 sub-region, finally forming a 128-dimensional feature vector.
Specifically, the detection extraction module 702 includes:
the recording unit is configured to use the characteristics of the event camera to record the position and time information of each brightness change whenever a pixel's brightness changes, forming an event stream with high time resolution. The sorting process is:

Esorted = sort(Ecaptured, key = λx: x.timestamp)

Where Esorted denotes the ordered event stream, Ecaptured denotes the original event stream captured from the event camera, sort is the sorting function, λx: x.timestamp is an anonymous function returning the timestamp of a single event x in the event stream, and key is the parameter used to supply that function to the sort.
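The timestamp-based sorting above maps directly onto code; the `Event` record layout (coordinates, polarity, microsecond timestamp) is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A single event-camera event: pixel coordinates, polarity,
    and timestamp (e.g. in microseconds)."""
    x: int
    y: int
    polarity: int
    timestamp: int

def sort_events(captured):
    """E_sorted = sort(E_captured, key=lambda e: e.timestamp)."""
    return sorted(captured, key=lambda e: e.timestamp)
```

Sorting by timestamp restores a monotone high-time-resolution stream even when events arrive out of order from the sensor interface.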
Specifically, the prediction module 703 includes:
a second calculation unit configured to calculate a color histogram of the moving target region based on the first features of the surface interest points:

H(θ) = (1/N) · Σi=1..N δ(θ − θi)

Wherein H is the color histogram of the target region, N is the total number of pixels in the region, θ is the color angle, δ is the Dirac delta function used to construct the color histogram, i is the current index in the summation, and θi is the color angle of the i-th pixel;
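A discrete sketch of the normalized color-angle histogram H(θ) above, with the Dirac delta realized as bin counting; the 36-bin quantization of the 0–360° hue circle is an assumption:

```python
import numpy as np

def hue_histogram(hues, n_bins=36):
    """Discrete form of H(theta) = (1/N) * sum_i delta(theta - theta_i):
    count the pixels falling into each color-angle bin, normalized by N."""
    hues = np.asarray(hues) % 360.0
    hist, _ = np.histogram(hues, bins=n_bins, range=(0.0, 360.0))
    return hist / len(hues)
```

The result sums to 1, so histograms of regions with different pixel counts remain directly comparable.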
the matching update unit is configured to determine the current position of the moving target by computing, within the search window, the region similar to the moving target's color histogram, obtain a matching result, and update the moving target's position according to the matching result:

S = Σ(x, y) Hw(x, y) · Hi(x, y)

Wherein S is the similarity in the search window, Hw is the color histogram in the search window, Hi is the color histogram of the target model, and x, y are pixel coordinates that traverse each pixel in the search window;
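One plausible realization of the search-window scoring above is to back-project the target model onto the window pixels and sum the weights; the bin count and this particular scoring scheme are assumptions, not the claimed formula:

```python
import numpy as np

def window_similarity(window_hues, target_hist, n_bins=36):
    """Score a candidate window by back-projecting the target model:
    each pixel contributes the target-histogram weight of its hue bin."""
    bins = (np.asarray(window_hues) % 360.0 / (360.0 / n_bins)).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    return float(target_hist[bins].sum())
```

Sliding the window over the frame and keeping the maximum of this score gives the matching result used to update the target position.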
and the analysis unit is used for analyzing the movement trend of the moving object based on the updated position of the moving object, wherein the movement trend comprises a movement direction and a movement speed.
Specifically, the prediction module 703 includes:
the estimation unit is configured to perform optimal state estimation of the target with a Kalman filter, based on the moving target's real-time position and motion trend, to obtain its short-term motion track. The calculation comprises a prediction step, which estimates the moving target's position and velocity in the next frame, and an update step, which adjusts the prediction according to new observation data;
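The predict/update cycle described above can be sketched with a constant-velocity state model; the state layout [x, y, vx, vy] and the noise magnitudes are illustrative assumptions:

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for 2-D position tracking.
    State: [x, y, vx, vy]; measurement: [x, y]."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt          # position += velocity * dt
        self.H = np.eye(2, 4)                     # only position is observed
        self.Q = q * np.eye(4)                    # process noise
        self.R = r * np.eye(2)                    # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def predict(self):
        """Prediction step: propagate state and covariance one frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Update step: correct the prediction with a new observation z."""
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Chaining several `predict` calls without intermediate updates yields the short-term motion track used by the tracker.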
The training unit is used for collecting positive and negative samples according to the short-term motion trail of the moving target to form a training data set for model updating, wherein the positive samples are from detection results of the tracking target, and the negative samples are from areas which are identified as non-targets;
And the update adaptation unit is configured to process the training data set with an online stochastic gradient descent algorithm, updating the model parameters online to adapt to changes in the target's appearance, thereby obtaining an updated target model reflecting the current state of the moving target.
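The online stochastic gradient descent update over positive and negative samples can be sketched as a tiny logistic appearance model; the feature dimensionality, learning rate, and model form are all assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineAppearanceModel:
    """Tiny logistic model updated by one SGD step per labeled sample:
    positives come from tracked detections, negatives from background."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, feat):
        """Probability that `feat` belongs to the tracked target."""
        return sigmoid(self.w @ feat + self.b)

    def sgd_step(self, feat, label):
        """One gradient step on the log-loss for (feat, label in {0, 1})."""
        err = self.score(feat) - label
        self.w -= self.lr * err * np.asarray(feat)
        self.b -= self.lr * err
```

Because each sample triggers only one cheap parameter update, the model can be refreshed every frame without interrupting real-time tracking.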
Specifically, the tracking module 705 includes:
The description unit is used for calculating a color histogram of a specific event in the mine to obtain a color characteristic description of the target, wherein the color distribution characteristics comprise the concentration, the distribution range and the uniformity of the color;
The searching and matching unit is used for searching the area which is most matched with the calculated color histogram in the next frame based on the color feature description of the target to obtain the potential position of the target, wherein the process of matching the histogram comprises the step of comparing the color histogram of each pixel point with the target histogram to find the best matching area;
The iteration unit is configured to dynamically adjust the target frame according to the target's potential position using the CamShift algorithm, driven by the target's color distribution. This includes initializing the target frame, tracking the target in subsequent frames with the color histogram, and iteratively updating the position and size of the target frame to match the target's color distribution, until the pixel color angle best matching the target's color histogram is reached, according to:
(x̂, ŷ) = argmax(x, y) Σi=1..N δ(θi − θ)

In the formula, (x̂, ŷ) is the updated target position, argmax(x, y) selects, among all candidate (x, y) coordinates, the one maximizing the expression that follows, θi is the color angle of the i-th pixel in the current frame matched against the target color histogram, θ is a specific color angle in the target color histogram, N is the total number of pixels in the search window, Σi=1..N accumulates all values of i from 1 to N, and δ is a Dirac function used to count the number of pixels at each color angle.
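A simplified CamShift-style iteration over a back-projection map, following the centroid and zeroth-moment logic described above; the window encoding and the square-root scale heuristic are assumptions, not the exact claimed procedure:

```python
import numpy as np

def camshift_step(backprojection, window):
    """One CamShift-style iteration: move the window to the centroid of
    the back-projection weights inside it, then rescale the window from
    the zeroth moment. window = (x, y, w, h), (x, y) = top-left corner."""
    x, y, w, h = window
    roi = backprojection[y:y + h, x:x + w]
    rh, rw = roi.shape
    m00 = roi.sum()                        # zeroth moment (total weight)
    if m00 <= 0:
        return window                      # no evidence: keep the window
    ys, xs = np.mgrid[0:rh, 0:rw]
    cx = int((xs * roi).sum() / m00)       # centroid inside the ROI
    cy = int((ys * roi).sum() / m00)
    side = max(2, int(2 * np.sqrt(m00)))   # scale from total weight
    nx = max(0, x + cx - side // 2)
    ny = max(0, y + cy - side // 2)
    return (nx, ny, side, side)
```

Iterating this step re-centers and re-sizes the target frame until it settles on the region whose color distribution best matches the target histogram.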
It should be noted that, regarding the system in the above embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail herein.
Example 3:
corresponding to the above method embodiment, in this embodiment, a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine is further provided, and a dynamic target tracking device for intrinsic safety type video monitoring of a vehicle-mounted mine described below and a dynamic target tracking method for intrinsic safety type video monitoring of a vehicle-mounted mine described above may be referred to correspondingly with each other.
Fig. 3 is a block diagram illustrating a dynamic target tracking device 800 for vehicle-mounted mining intrinsic safety video monitoring, according to an exemplary embodiment. As shown in fig. 3, the dynamic target tracking device 800 comprises a processor 801 and a memory 802, and may further include one or more of a multimedia component 803, an I/O interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the dynamic target tracking device 800, so as to complete all or part of the steps of the dynamic target tracking method for vehicle-mounted mining intrinsic safety video monitoring. The memory 802 is used to store various types of data supporting operation of the device 800; such data may include, for example, instructions for any application or method operating on the device, as well as application-related data such as contact data, messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen; the audio component is for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals, which may be further stored in the memory 802 or transmitted through the communication component 805. The audio component further comprises at least one speaker for outputting audio signals.
The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, mouse, or buttons. These buttons may be virtual or physical. The communication component 805 is configured for wired or wireless communication between the dynamic target tracking device 800 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more thereof, so the corresponding communication component 805 may include a Wi-Fi module, a Bluetooth module, or an NFC module.
In an exemplary embodiment, the dynamic target tracking device 800 for vehicle-mounted mining intrinsic safety video monitoring may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described dynamic target tracking method for vehicle-mounted mining intrinsic safety video monitoring.
In another exemplary embodiment, a computer readable storage medium is also provided that includes program instructions that, when executed by a processor, implement the steps of the dynamic target tracking method for on-board mining intrinsic safety video monitoring described above. For example, the computer readable storage medium may be the memory 802 including program instructions described above that are executable by the processor 801 of the on-board mining intrinsically-safe video-monitoring dynamic object tracking device 800 to perform the on-board mining intrinsically-safe video-monitoring dynamic object tracking method described above.
Example 4:
Corresponding to the above method embodiment, a readable storage medium is further provided in this embodiment, and a readable storage medium described below and a dynamic target tracking method for on-vehicle mining intrinsic safety video monitoring described above may be referred to correspondingly.
The readable storage medium stores a computer program which is executed by a processor to realize the steps of the dynamic target tracking method for the vehicle-mounted mining intrinsic safety type video monitoring of the method embodiment.
The readable storage medium may be a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, any of which may store program code.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411504952.7A CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411504952.7A CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119027860A CN119027860A (en) | 2024-11-26 |
| CN119027860B true CN119027860B (en) | 2025-01-24 |
Family
ID=93529416
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411504952.7A Active CN119027860B (en) | 2024-10-28 | 2024-10-28 | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119027860B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120071300A (en) * | 2025-02-13 | 2025-05-30 | 深圳云亿通数字技术有限公司 | Multi-target-area dynamic vehicle pedestrian recognition method and system |
| CN119649414B (en) * | 2025-02-17 | 2025-05-30 | 南京混沌信息科技有限公司 | Multi-target fish tracking method for land-based industrial circulating water culture scene |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102881022A (en) * | 2012-07-20 | 2013-01-16 | 西安电子科技大学 | Concealed-target tracking method based on on-line learning |
| CN112700475A (en) * | 2020-12-31 | 2021-04-23 | 荆门汇易佳信息科技有限公司 | Self-adaptive multi-target video tracking system under different scenes |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106683121A (en) * | 2016-11-29 | 2017-05-17 | 广东工业大学 | Robust object tracking method in fusion detection process |
| CN109785365B (en) * | 2019-01-17 | 2021-05-04 | 西安电子科技大学 | A real-time object tracking method for addressing event-driven unstructured signals |
| CN116402852B (en) * | 2023-03-24 | 2025-07-29 | 西安电子科技大学广州研究院 | Dynamic high-speed target tracking method and device based on event camera |
| CN117689686A (en) * | 2023-11-16 | 2024-03-12 | 西安科技大学 | An event camera-based detection method for large coal retention |
| CN118129692A (en) * | 2024-01-25 | 2024-06-04 | 深圳云程科技有限公司 | A method and system for tracking the position of a target moving object |
- 2024-10-28: CN application CN202411504952.7A granted as patent CN119027860B (status: Active)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102881022A (en) * | 2012-07-20 | 2013-01-16 | 西安电子科技大学 | Concealed-target tracking method based on on-line learning |
| CN112700475A (en) * | 2020-12-31 | 2021-04-23 | 荆门汇易佳信息科技有限公司 | Self-adaptive multi-target video tracking system under different scenes |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119027860A (en) | 2024-11-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN119027860B (en) | Dynamic target tracking method and system for intrinsic safety type video monitoring of vehicle-mounted mine | |
| US9213901B2 (en) | Robust and computationally efficient video-based object tracking in regularized motion environments | |
| Bertini et al. | Multi-scale and real-time non-parametric approach for anomaly detection and localization | |
| US9323991B2 (en) | Method and system for video-based vehicle tracking adaptable to traffic conditions | |
| Gupte et al. | Detection and classification of vehicles | |
| CN108171196B (en) | Face detection method and device | |
| US7916944B2 (en) | System and method for feature level foreground segmentation | |
| CN115995063A (en) | Work vehicle detection and tracking method and system | |
| JP4479478B2 (en) | Pattern recognition method and apparatus | |
| Devasena et al. | Video surveillance systems-a survey | |
| Abdulghafoor et al. | A novel real-time multiple objects detection and tracking framework for different challenges | |
| Bedruz et al. | Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach | |
| JP3970877B2 (en) | Tracking device and tracking method | |
| CN108416780B (en) | An Object Detection and Matching Method Based on Siamese-Region of Interest Pooling Model | |
| Vignesh et al. | Abnormal event detection on BMTT-PETS 2017 surveillance challenge | |
| WO2013054130A1 (en) | Aerial survey video processing | |
| Ehsan et al. | Violence detection in indoor surveillance cameras using motion trajectory and differential histogram of optical flow | |
| CN112733770A (en) | Regional intrusion monitoring method and device | |
| Denman et al. | Multi-spectral fusion for surveillance systems | |
| Zakaria et al. | Particle swarm optimization and support vector machine for vehicle type classification in video stream | |
| Gautam et al. | Computer vision based asset surveillance for smart buildings | |
| Algethami et al. | Combining Accumulated Frame Differencing and Corner Detection for Motion Detection. | |
| Kodwani et al. | Automatic license plate recognition in real time videos using visual surveillance techniques | |
| Thotapalli et al. | Feature extraction of moving objects using background subtraction technique for robotic applications | |
| Kavitha et al. | Vision-based vehicle detection and tracking system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |