
CN113256731A - Target detection method and device based on monocular vision - Google Patents

Target detection method and device based on monocular vision

Info

Publication number
CN113256731A
CN113256731A
Authority
CN
China
Prior art keywords
target
camera
coordinate system
slam
security camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110356609.2A
Other languages
Chinese (zh)
Other versions
CN113256731B (en)
Inventor
梁贵钘
黄银君
汪明明
冀怀远
荆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ningchuangyun Software Technology Co ltd
Original Assignee
Shenzhen Ningchuangyun Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ningchuangyun Software Technology Co ltd filed Critical Shenzhen Ningchuangyun Software Technology Co ltd
Priority to CN202110356609.2A priority Critical patent/CN113256731B/en
Publication of CN113256731A publication Critical patent/CN113256731A/en
Application granted granted Critical
Publication of CN113256731B publication Critical patent/CN113256731B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00 Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method and device based on monocular vision, relates to the technical field of image recognition, and can accurately position the 3D space coordinates of a target in a scene in real time, realizing real-time positioning and tracking of the target in 3D space. The method comprises the following steps: modeling a scene by adopting a SLAM algorithm to obtain a global map model, and acquiring the external parameters of the SLAM camera; acquiring an image set collected by a security camera, and calibrating the security camera based on the external parameters of the SLAM camera to obtain the external parameters of the security camera in the global map model; identifying a target in an image collected by the security camera through a target detection technique, and performing monocular distance measurement on the target; and converting the coordinates of the target in the camera coordinate system into the global coordinate system determined by the SLAM camera, and displaying the coordinates in real time in the global map model. The device applies the method provided by this scheme.

Description

Target detection method and device based on monocular vision
Technical Field
The invention relates to the technical field of image recognition, in particular to a target detection method and device based on monocular vision.
Background
In this era of vigorous development of artificial intelligence, applications of artificial intelligence are springing up like bamboo shoots after a spring rain. With the advent of the intelligent age, retail models that integrate offline with online, or replicate online practice offline, have become a booming line of research. Detecting, positioning and tracking targets in a scene in real time yields each target's motion trajectory and activity area, which is of great significance for the digitalization and intelligentization of offline operations.
Current target positioning methods include GPS-based methods relying on equipment carried by the target, visual-odometry SLAM methods, and methods that detect and track with externally mounted cameras. GPS-dependent methods often fail in indoor scenes, and tracking based on equipment carried by the target is unsuitable for retail scenes. Methods using externally mounted cameras divide into those that depend on overlapping camera fields of view and those that do not. The overlap-dependent (i.e. binocular) approach requires a large field-of-view overlap between associated cameras at installation time, and recovers the 3D position of the target by image matching between the two cameras. However, a large overlap means that the effective field of view of a single camera is greatly reduced, the number of cameras required per unit area rises correspondingly, and so does the hardware cost of the solution. Methods that do not depend on a camera overlap area combine technologies such as ReID with camera position information to obtain only a piecewise (breakpoint) moving track of the target: no real-time continuous trajectory is available, tracking is confined to the camera image space, and the result is not tied to a physical world coordinate frame, which limits its use. Moreover, cross-camera tracking relies on high-performance ReID, placing high demands on the accuracy and robustness of the related algorithms and on the computing power of the computing equipment.
Disclosure of Invention
The invention aims to provide a target detection method and device based on monocular vision, which can accurately position the 3D space coordinates of a target in a scene in real time and realize real-time positioning and tracking of the target in a 3D space.
In order to achieve the above object, a first aspect of the present invention provides a target detection method based on monocular vision, including:
modeling a scene by adopting a SLAM algorithm to obtain a global map model, and acquiring the external parameters of the SLAM camera;
acquiring an image set collected by a security camera, and calibrating the security camera based on the external parameters of the SLAM camera to obtain the external parameters of the security camera in the global map model;
identifying a target in an image collected by the security camera through a target detection technique, and performing monocular distance measurement on the target;
and converting the coordinates of the target in the camera coordinate system into the global coordinate system determined by the SLAM camera, and displaying the coordinates in real time in the global map model.
Preferably, the method for modeling the scene by using the SLAM algorithm to obtain the global map model while acquiring the external parameters of the SLAM camera includes:
acquiring a point cloud reconstruction result, a SLAM camera pose set and a reconstructed image frame set of the scene by adopting a SLAM algorithm;
constructing the global map model from the point cloud reconstruction result, and scanning and shooting the scene with the SLAM camera to obtain a reconstructed image frame set consisting of a plurality of scene images;
and sequentially extracting the feature points in each pair of adjacent scene images with the FAST algorithm, computing feature descriptors for the feature points, sequentially matching similar feature points in adjacent scene images based on those descriptors, constructing a unified camera coordinate system referenced to the camera coordinate system of the first scene image, and converting it into the global coordinate system.
Preferably, the method for acquiring the image set collected by the security camera and calibrating the security camera based on the external parameters of the SLAM camera to obtain the external parameters of the security camera in the global map model comprises the following steps:
matching homonymous (corresponding) points between the image set collected by the security camera and the reconstructed image frame set;
solving the 3D space coordinates of the homonymous points in the global coordinate system according to the SLAM camera pose set;
and calculating the pose parameters of the security camera relative to the SLAM camera based on the 3D space coordinates to obtain the external parameters of the security camera in the global map model.
Preferably, the method for identifying the target in the image collected by the security camera through the target detection technology and performing monocular distance measurement on the target comprises the following steps:
positioning a target in the security camera picture through a target detection technique, and identifying the circumscribed rectangular frame of the target;
calculating the average physical size of the target with a clustering algorithm, based on a set of depth images containing the target captured by a depth camera;
and measuring the distance of the target relative to the corresponding security camera by the monocular distance measuring principle, from the width and height parameters of the circumscribed rectangular frame, the focal length of the security camera, and the average physical size.
Preferably, after the step of measuring the distance of the target relative to the corresponding security camera by using the principle of monocular distance measurement, the method further comprises the following steps:
and converting the target position into coordinates relative to the camera coordinate system, based on the intrinsic parameters of the security camera and the coordinates of the center point of the circumscribed rectangular frame in the image coordinate system.
Preferably, the method for converting the coordinates of the target in the camera coordinate system into the global coordinate system determined by the SLAM camera and displaying them in real time in the global map model comprises the following steps:
converting the target coordinates relative to the camera coordinate system into the 3D space coordinates of the target in the global coordinate system through a space coordinate system conversion matrix, based on the extrinsic parameters of the security camera in the global map model;
and displaying the 3D space coordinates of the target in the global coordinate system in real time in the global map model.
Preferably, the method further comprises the following steps:
tracking the target in real time in the global map model with a target tracking technique, and drawing the target's real-time position trajectory and activity area heat map.
Compared with the prior art, the target detection method based on monocular vision provided by the invention has the following beneficial effects:
the monocular vision-based target detection method provided by the invention comprises the steps of firstly modeling a monitoring scene by adopting an SLAM algorithm to obtain a global map model, obtaining external parameters of an SLAM camera, exemplarily, the external parameters are pose parameters of each SLAM camera, then utilizing a security camera installed in the monitoring scene to shoot images in real time to construct an image set, calibrating the corresponding security camera by combining the external parameters of the SLAM camera at the corresponding position to respectively obtain the external parameters of each security camera in the global map model, then identifying a target in an acquired image of the security camera by using a target detection technology, carrying out monocular distance measurement on the target to obtain the position distance of the target relative to the corresponding security camera, finally converting the coordinate of the target in a camera coordinate system into a global coordinate system, and displaying the coordinate in the global map model in real time.
In conclusion, the invention adopts the monocular distance measurement method to perform monocular positioning on the target, and compared with the GPS-based positioning method adopted in the prior art, the problem of inaccurate indoor positioning can be solved. Meanwhile, a global map model is constructed by adopting an SLAM algorithm, and the targets in the monitoring picture of the security camera are converted into the global map model to be displayed in real time, so that the targets can be continuously tracked in a 3D space.
A second aspect of the present invention provides a target detection apparatus based on monocular vision, which applies the target detection method based on monocular vision described in the above technical solution. The apparatus includes:
the global modeling unit is used for modeling a scene by adopting an SLAM algorithm to obtain a global map model and acquiring external parameters of an SLAM camera;
the calibration unit is used for acquiring an image set acquired by a security camera, calibrating the security camera based on external parameters of the SLAM camera, and obtaining the external parameters of the security camera in a global map model;
the distance measurement unit is used for identifying a target in an image acquired by the security camera through a target detection technology and performing monocular distance measurement on the target;
and the coordinate conversion unit is used for converting the coordinates of the target in the camera coordinate system into a global coordinate system determined by the SLAM camera and displaying the coordinates in real time in the global map model.
Preferably, the apparatus further comprises:
a target tracking unit, which tracks the target in real time in the global map model with a target tracking technique and draws the target's real-time position trajectory and activity area heat map.
Compared with the prior art, the beneficial effects of the monocular vision based target detection device provided by the invention are the same as those of the monocular vision based target detection method provided by the technical scheme, and are not repeated herein.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described monocular vision-based object detecting method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as those of the monocular vision-based target detection method provided by the technical scheme, and are not repeated herein.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart illustrating a monocular vision based object detection method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of acquiring a SLAM camera pose set and a reconstructed image frame set by using a SLAM algorithm according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of calibrating a security camera by using a SLAM camera pose set and a reconstructed image frame set to obtain external parameters of the security camera in a global map model according to the embodiment of the present invention;
FIG. 4 is a schematic flow chart of acquiring a target 3D space coordinate by using a security camera pose set and a monocular distance measuring method in the embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides a target detection method based on monocular vision, including:
modeling a scene by adopting a SLAM algorithm to obtain a global map model, and acquiring the external parameters of the SLAM camera; acquiring an image set collected by a security camera, and calibrating the security camera based on the external parameters of the SLAM camera to obtain the external parameters of the security camera in the global map model; identifying a target in an image collected by the security camera through a target detection technique, and performing monocular distance measurement on the target; and converting the coordinates of the target in the camera coordinate system into the global coordinate system, and displaying the coordinates in the global map model in real time.
In the target detection method based on monocular vision provided by this embodiment, a SLAM algorithm is first used to model the monitored scene, yielding a global map model and the external parameters of the SLAM camera (illustratively, the pose parameters of each SLAM camera). Security cameras installed in the monitored scene then capture images in real time to construct an image set, and each security camera is calibrated against the external parameters of the SLAM camera at the corresponding position, yielding the external parameters of each security camera in the global map model. Next, targets in the images collected by the security cameras are identified through a target detection technique, and monocular distance measurement is performed on each target to obtain its distance from the corresponding security camera. Finally, the coordinates of the target in the camera coordinate system are converted into the global coordinate system and displayed in the global map model in real time.
In summary, the present embodiment employs a monocular distance measurement method to perform monocular positioning on a target, and compared with the positioning method based on GPS, which is employed in the prior art, the present embodiment can solve the problem of inaccurate indoor positioning. Meanwhile, a global map model is constructed by adopting an SLAM algorithm, and the targets in the monitoring picture of the security camera are converted into the global map model to be displayed in real time, so that the targets can be continuously tracked in a 3D space.
Referring to fig. 2, in the foregoing embodiment, the method for obtaining the global map model by modeling the scene with the SLAM algorithm and obtaining the external parameters of the SLAM camera includes:
acquiring a point cloud reconstruction result, a SLAM camera pose set and a reconstructed image frame set of the scene by adopting a SLAM algorithm; constructing the global map model from the point cloud reconstruction result, and scanning and shooting the scene with the SLAM camera to obtain a reconstructed image frame set consisting of a plurality of scene images; and sequentially extracting the feature points in each pair of adjacent scene images with the FAST algorithm, computing feature descriptors for the feature points, sequentially matching similar feature points in adjacent scene images based on those descriptors, constructing a unified camera coordinate system referenced to the camera coordinate system of the first scene image, and converting it into the global coordinate system.
In specific implementation, a point cloud reconstruction result, a SLAM camera pose set and a reconstructed image frame set of the monitored scene are obtained through SLAM. The SLAM camera pose set contains the pose parameters of each SLAM camera and represents each camera's position in the global map model; illustratively, the pose set is {(R_0, T_0), (R_1, T_1), (R_2, T_2), …, (R_n, T_n)} and the reconstructed image frame set is {I_0, I_1, I_2, …, I_n}, the images of the scene taken continuously by the SLAM camera, where I_0 is the scene image taken at time 1, I_1 the image taken at time 2, and so on up to I_n, the image taken at time n+1. In this embodiment, the SLAM algorithm models the monitored scene and builds the global map model from the point cloud reconstruction result, while the SLAM camera continuously scans and shoots the scene. Taking the scene images of frames t−1 and t as an example, feature points are extracted from both with the FAST algorithm. For each pixel of a scene image, the 16 pixels (P1, P2, …, P16) on the circle of radius 3 centered on that pixel are considered. For a cheap first test, the pixel differences between the center point P and the four pixels P1, P9, P5 and P13 are compared against a set threshold: if the absolute pixel difference exceeds the threshold for at least 3 of these, the center point P is kept as a candidate corner and examined further; otherwise P is not a candidate corner. If P is a candidate corner, the pixel differences between all 16 circle pixels P1 to P16 and the center point P are computed; if at least 9 contiguous circle pixels differ from P by more than the threshold, P is finally accepted as a feature point, otherwise it is rejected.
Adjacent frame scene images are traversed in this way to obtain the feature points of every scene image, and the feature points of each image are then screened by non-maximum suppression. Specifically, the FAST score value s of each feature point is calculated; it represents the sum of the absolute differences between the circle pixels and the center pixel. Within a neighborhood (e.g. 3x3 or 5x5) centered on a feature point p, if several feature points fall inside the neighborhood, p is retained only when its score is the maximum among all feature points in the neighborhood and suppressed otherwise; if p is the only feature point in its neighborhood it is retained. The score s is calculated as follows, where t denotes the threshold:
s = Σ |I(P_i) − I(P)|, the sum taken over the circle pixels P_i (i = 1, …, 16) with |I(P_i) − I(P)| > t
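The candidate test, the score s, and the non-maximum suppression described above can be sketched as follows. This is an illustrative NumPy sketch, not code from the patent, and the 9-contiguous-pixel segment test is omitted for brevity:

```python
import numpy as np

# Offsets of the 16 Bresenham-circle pixels of radius 3 used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast_score(img, y, x, t):
    """Score s: sum of |I(P_i) - I(P)| over the circle pixels whose
    absolute difference with the center exceeds the threshold t."""
    p = int(img[y, x])
    diffs = [abs(int(img[y + dy, x + dx]) - p) for dx, dy in CIRCLE]
    return sum(d for d in diffs if d > t)

def is_candidate(img, y, x, t):
    """Cheap first test on circle pixels P1, P5, P9, P13: at least 3
    of them must differ from the center by more than t."""
    p = int(img[y, x])
    quads = [CIRCLE[0], CIRCLE[4], CIRCLE[8], CIRCLE[12]]
    hits = sum(abs(int(img[y + dy, x + dx]) - p) > t for dx, dy in quads)
    return hits >= 3

def detect(img, t=20):
    """Score every candidate pixel, then keep only 3x3 local maxima
    (non-maximum suppression)."""
    h, w = img.shape
    scores = np.zeros((h, w))
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            if is_candidate(img, y, x, t):
                scores[y, x] = fast_score(img, y, x, t)
    keypoints = []
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            s = scores[y, x]
            if s > 0 and s == scores[y - 1:y + 2, x - 1:x + 2].max():
                keypoints.append((y, x, s))
    return keypoints
```

On a synthetic image containing a single bright dot on a dark background, only that pixel passes the candidate test and survives suppression.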
After the feature points are obtained, a feature descriptor is computed for each feature point in the adjacent scene images, and the feature points of the two adjacent images are matched through these descriptors. The depth image of frame t−1 is then used to compute the 3D space coordinates of the matched feature points relative to the camera coordinate system at time t−1, and finally the pose transformation (R_t, T_t) of the camera at time t relative to the camera at time t−1 is solved with the PnP algorithm. By analogy, a unified camera coordinate system referenced to the camera coordinate system at time 1 of the scene shooting (denoted o) is finally obtained, and the pose at time k relative to the reference pose o satisfies the formula:
R_o^k = R_o^(k−1) · R_k,  T_o^k = R_o^(k−1) · T_k + T_o^(k−1),  with R_o^1 = R_1, T_o^1 = T_1
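The recursive pose composition above can be sketched in Python (an illustrative sketch, not the patent's code): each relative pose (R_t, T_t), taken here to map frame-t coordinates into frame-(t−1) coordinates, is accumulated into a pose relative to the reference frame o:

```python
import numpy as np

def chain_poses(rel_poses):
    """Compose per-frame relative poses (R_t, T_t) into absolute poses
    relative to the first frame o, using the recursion
        R_o^t = R_o^(t-1) @ R_t,  T_o^t = R_o^(t-1) @ T_t + T_o^(t-1)."""
    absolute = [(np.eye(3), np.zeros(3))]  # pose of frame o in itself
    for R, T in rel_poses:
        R_prev, T_prev = absolute[-1]
        absolute.append((R_prev @ R, R_prev @ T + T_prev))
    return absolute
```

Two successive 90-degree rotations about z with a unit translation each compose, as expected, into a 180-degree rotation with translation (1, 1, 0).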
referring to fig. 3, in the above embodiment, the method for obtaining the image set acquired by the security camera and calibrating the security camera based on the external parameters of the SLAM camera to obtain the external parameters of the security camera in the global map model includes:
performing similarity matching between the image set collected by the security camera and the reconstructed image frame set, obtaining for each security camera image the most similar reconstructed image frames to form security-camera-image/reconstructed-image pairs, and then matching homonymous (corresponding) points within each pair; solving the 3D space coordinates of the homonymous points in the global coordinate system according to the SLAM camera pose set; and calculating the pose parameters of the security camera relative to the SLAM camera based on the 3D space coordinates to obtain the external parameters of the security camera in the global map model.
In specific implementation, after the security cameras are installed in the monitored scene, they collect monitoring images of the scene in real time to construct an image set; the pose of each security camera is then calibrated with the pose parameters of the SLAM camera as reference. Taking the k-th security camera as an example:
the bag-of-words model visual vocabulary is obtained through the images collected by the SLAM reconstruction image frame set and the k-th camera in real time, and the visual vocabulary corresponding to the security camera to be calibrated is matched with the visual vocabulary of each image in the reconstruction image frame set to obtain a similarity set { l }0,l1,l2,…,lnAnd screening m SLAM camera pose candidates with the highest matching degree based on a threshold value. Then calibrating the security camera by taking each candidate pose as a reference so as to obtain the pose (R) in the m SLAM camera posesk,Tk) Taking the calibration reference as an example, the corresponding reconstructed image frame is IkRespectively treating the image and I collected by the camera to be calibratedkAnd solving the 3D coordinates of the homonymous points under the coordinate system of the SLAM camera through the corresponding external parameters of the SLAM camera and the depth images of the homonymous points. Obtaining the pose parameter of the security camera k relative to the SLAM camera after solving and optimizing the reprojection error through pnp
denoted (r_k^(i), t_k^(i)) for the i-th candidate reference.
Finally, m candidate calibrated poses are obtained, one per candidate reference pose: {(r_k^(1), t_k^(1)), (r_k^(2), t_k^(2)), …, (r_k^(m), t_k^(m))}.
The reprojection errors {e_1, e_2, e_3, …, e_m} of the m poses are calculated, and each calibrated pose is comprehensively scored from its reprojection error and image-matching similarity; the score s of each pose is calculated by formula (3):
[formula (3), a composite score combining the reprojection error e_i and the image-matching similarity l_i; the expression appears only in the original patent drawings]
The poses are sorted by score, and the highest-scoring pose parameters are selected as the calibration result, denoted (r_k, t_k). By analogy, the security camera pose set is {(r_0, t_0), (r_1, t_1), (r_2, t_2), …, (r_n, t_n)}, where n is the number of installed security cameras and k ≤ n. The extrinsic parameters of security camera k relative to the global coordinate system
are denoted (r_k^o, t_k^o) and satisfy the conversion relation, formula (4):
r_k^o = R_k · r_k,  t_k^o = R_k · t_k + T_k
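The candidate screening and selection step can be sketched as below. This is an assumption-laden illustration: the composite score used here (similarity divided by one plus the reprojection error) is a stand-in, since the patent's formula (3) is given only in its drawings:

```python
def select_pose(candidates):
    """candidates: list of (pose, similarity l, reprojection error e).
    Returns the pose with the best composite score, rewarding high
    image-matching similarity and low reprojection error.
    The scoring rule here is illustrative, not the patent's formula (3)."""
    def score(c):
        _pose, l, e = c
        return l / (1.0 + e)  # high similarity, low error -> high score
    best = max(candidates, key=score)
    return best[0], score(best)
```

For example, a candidate with slightly lower similarity but a much smaller reprojection error wins over a high-similarity, high-error one.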
in the above embodiment, the method for identifying the target in the image collected by the security camera through the target detection technology and performing monocular distance measurement on the target includes:
positioning a target in the security camera picture through a target detection technique, and identifying the circumscribed rectangular frame of the target; measuring the size of the target with a depth camera from a set of depth images containing the target, and calculating the average physical size of the target from the measured sizes with a clustering algorithm; and measuring the distance between the target and the corresponding security camera by the monocular distance measuring principle, from the width and height parameters of the circumscribed rectangular frame, the focal length of the security camera, and the average physical size.
In specific implementation, the target marker in the security camera picture is positioned through a target detection technique. Taking security camera k as an example, the circumscribed rectangular frame of the target marker is detected in the camera-k picture, and the image width (or height) w′ of the target marker is computed from the width and height of that frame. A depth camera is installed at the scene entrance and collects images as targets enter the monitored scene; the depth camera measurements yield a set of target marker sizes, from which a clustering algorithm solves the average physical size w of the target marker. Combined with the focal length f_k of security camera k, the distance d_k of the target marker relative to security camera k can be measured by the monocular distance measuring principle; the measurement equation (5) is as follows:
d_k = f_k · w / w′
The coordinates of the target marker relative to the corresponding camera coordinate system, (x_c, y_c, z_c), can be solved from the intrinsic parameters of the security camera and the coordinates (u, v) of the center point of the target marker's circumscribed rectangular frame. The calculation formula (6) is as follows:
x_c = (u − c_u) · d_k / f_u
y_c = (v − c_v) · d_k / f_v
z_c = d_k
where c_u and c_v are the optical center coordinates of the security camera in the x and y directions, and f_u and f_v are its focal lengths. In actual scene applications, objects with standard or nearly constant sizes, such as license plates, lane lines and human heads, can be selected as reference markers for ranging.
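Formulas (5) and (6) amount to similar-triangles ranging plus pinhole back-projection, sketched here in Python (an illustrative sketch; the numeric values in the usage are assumed examples, not real calibration data):

```python
def monocular_range(f_px, w_metric, w_pixels):
    """Formula (5), similar triangles: d = f * w / w'.
    f_px: focal length in pixels; w_metric: average physical size of the
    marker; w_pixels: size of the marker in the image."""
    return f_px * w_metric / w_pixels

def backproject(u, v, d, fu, fv, cu, cv):
    """Formula (6), pinhole back-projection of pixel (u, v) at range d
    into the camera frame: x = (u-cu)*d/fu, y = (v-cv)*d/fv, z = d."""
    x = (u - cu) * d / fu
    y = (v - cv) * d / fv
    return (x, y, d)
```

With an assumed 1000 px focal length, a 0.18 m marker imaged at 60 px ranges to 3 m, and a detection 100 px right of the principal point back-projects to 0.3 m of lateral offset.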
In the above embodiment, after the step of measuring the distance between the target and the corresponding security camera by using the monocular distance measuring principle, the method further includes:
converting the coordinates of the target center point, used as the representation of the target position, into coordinates relative to the camera coordinate system, based on the internal reference of the security camera and the coordinates of the center point of the circumscribed rectangular frame in the image coordinate system.
Referring to fig. 4, in the above embodiment, the method for converting the coordinates of the target in the camera coordinate system into the global coordinate system and displaying the target in the global map model in real time includes:
based on the external reference of the security camera in the global map model, converting the target coordinates relative to the camera coordinate system into 3D space coordinates relative to the target in the global coordinate system through a space coordinate system conversion matrix; and displaying the 3D space coordinates of the target in the global coordinate system in real time in the global map model.
In specific implementation, after the 3D space coordinates of the target relative to the camera coordinate system are obtained, they are converted, through the space coordinate-system conversion matrix built from the calibrated security camera pose information, into the unified camera coordinate system determined by the SLAM camera, that is, the o coordinate system. Again taking the target position in security camera k as an example, with the extrinsic pose calibrated for camera k denoted $(R_k, T_k)$, the position coordinates $P^o$ of the target in the global coordinate system o are expressed by the conversion formula (7):

$$P^o = R_k P_k^c + T_k \tag{7}$$

Finally, the 3D space coordinates of the target in the global coordinate system are displayed in the global map model in real time.
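Conversion formula (7) is a rigid-body transform. A minimal sketch, assuming the calibrated pose (R_k, T_k) is available as a 3x3 rotation matrix and a 3-vector translation:

```python
import numpy as np

def camera_to_global(p_cam, R_k, T_k):
    """Eq. 7: map a point from camera k's coordinate system into the
    unified SLAM-defined global frame o using the calibrated pose."""
    R = np.asarray(R_k, dtype=float)
    return R @ np.asarray(p_cam, dtype=float) + np.asarray(T_k, dtype=float)
```

Because every security camera is calibrated against the same SLAM-defined o coordinate system, the same function maps any camera's local target coordinates into the shared global frame; only the per-camera pair (R_k, T_k) changes.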
the above embodiment further includes: and tracking the target in real time in a global map model by adopting a target tracking technology, and drawing a real-time position track and an active area thermodynamic diagram of the target.
During specific implementation, a target tracking technique is used to predict, track, and associate the target position directly in the 3D space of the global map model in real time. This yields the real-time position of the target, together with its real-time position trajectory and activity-area thermodynamic diagram over the whole scene, which are used to accurately analyze target behavior.
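The real-time trajectory and the activity-area thermodynamic diagram described above can be accumulated incrementally from the stream of global-frame positions. The sketch below is illustrative only; the class name, floor extents, and grid resolution are assumptions rather than part of the original method.

```python
import numpy as np

class TargetTrack:
    """Accumulate a target's global-frame positions into a trajectory
    list and a 2-D occupancy grid used as the activity-area heatmap."""

    def __init__(self, x_range=(0.0, 50.0), y_range=(0.0, 30.0), cell=0.5):
        self.trajectory = []                 # ordered 3D positions
        self.cell = cell                     # heatmap cell size, metres
        self.x0, self.y0 = x_range[0], y_range[0]
        nx = int((x_range[1] - x_range[0]) / cell)
        ny = int((y_range[1] - y_range[0]) / cell)
        self.heatmap = np.zeros((ny, nx))

    def update(self, p_global):
        """Append one 3D position and bump its floor cell in the heatmap."""
        x, y, _ = p_global
        self.trajectory.append(tuple(p_global))
        i = int((y - self.y0) / self.cell)
        j = int((x - self.x0) / self.cell)
        if 0 <= i < self.heatmap.shape[0] and 0 <= j < self.heatmap.shape[1]:
            self.heatmap[i, j] += 1.0
```

Calling `update` once per tracked frame keeps the trajectory and heatmap current; normalizing the grid by the total count would give a dwell-probability map of the kind used for passenger-flow analysis.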
In summary, the present embodiment has the following innovative points:
1. the target is positioned monocularly based on the monocular ranging principle, which solves the problem of inaccurate indoor positioning with GPS-based methods;
2. the target is positioned using security cameras, so continuous trajectory information of the target can be obtained in real time, and the cost of security cameras is relatively low;
3. the target is positioned monocularly based on the monocular ranging principle and its 3D space coordinates are obtained by combining the calibration information, so no large overlapping areas are required between security cameras, which increases the effective coverage area of each security camera and reduces the hardware cost per unit area;
4. a one-time joint calibration of the cameras across the whole scene is performed by utilizing the SLAM camera poses obtained from SLAM scene reconstruction, forming a unified coordinate system;
5. the target is tracked and positioned in the global map model in real time; the physical meaning is clear, the precision is higher than that of a purely image-space positioning scheme, and not relying on the ReID technique saves computing resources and reduces hardware cost.
Therefore, this embodiment uses reusable camera calibration and monocular ranging techniques to perform real-time full-scene 3D position locating and tracking of the target, realizing digital modeling of the target's activity trajectory in the offline monitored scene. The target is positioned in real time through monocular ranging and the SLAM camera calibration technique, with a positioning error within 10%. The applicable scenes of this embodiment are varied. For example, vehicles in a parking lot can be located and tracked: the real-time trajectory information after a vehicle enters the lot is acquired, the parking position of the vehicle is extracted from the final stop point of its trajectory, and the parking spaces in the garage are maintained intelligently in real time, so that it is known at any moment which spaces are occupied and which are idle. Combined with indication boards and similar means, parking guidance can be given to newly arriving vehicles, guiding owners to idle spaces and improving parking efficiency. Combined with a parking-space control system, part of the spaces can be reserved in advance, alleviating the difficulty of parking and improving the parking experience. In a large parking lot, a matching car-finding system can help owners who have forgotten their parking position find their vehicles.
For another example, the method can be used to detect and track pedestrians in scenes such as shopping malls, obtaining continuous real-time activity trajectories and regional thermodynamic diagrams of pedestrians, which play an important role in passenger-flow statistics and merchant drainage. When applied in stores, the regions of interest of customers are obtained, revealing customer preferences, which areas of goods are popular, and which areas customers tend to visit, thereby guiding the store in goods placement and commodity selection.
For ease of understanding, the following terms are now explained:
1. point cloud: and converting pixel points on the image into a set of points in a three-dimensional space.
2. Global coordinate system (world coordinate system): a certain point in the real world is used as a coordinate point formed by coordinate setting.
3. Camera coordinate system: parallel to the x-axis and y-axis of the imaging plane coordinate system, the axes being the optical axis of the camera, and a coordinate system perpendicular to the image plane.
4. Image coordinate system: a rectangular coordinate system u-v defined on the image in pixels.
5. Internal reference of the camera: parameters describing the camera optical center, focal length, distortion, etc.
6. External reference of the camera: parameters describing the rotation and translation of the camera with respect to the reference coordinate system.
Example two
The embodiment provides a target detection device based on monocular vision, including:
the global modeling unit is used for modeling a scene by adopting an SLAM algorithm to obtain a global map model and acquiring external parameters of an SLAM camera;
the calibration unit is used for acquiring an image set acquired by a security camera, calibrating the security camera based on external parameters of the SLAM camera, and obtaining the external parameters of the security camera in a global map model;
the distance measurement unit is used for identifying a target in an image acquired by the security camera through a target detection technology and performing monocular distance measurement on the target;
and the coordinate conversion unit is used for converting the coordinates of the target in the camera coordinate system into a global coordinate system and displaying the coordinates in the global map model in real time.
Compared with the prior art, the beneficial effects of the target detection device based on monocular vision provided by the embodiment of the present invention are the same as those of the target detection method based on monocular vision provided by the first embodiment, and are not repeated herein.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the above-mentioned monocular vision-based object detecting method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment are the same as those of the monocular vision-based target detection method provided by the above technical scheme, and are not repeated herein.
It will be understood by those skilled in the art that all or part of the steps of the method of the invention may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the embodiment. The storage medium may be: a ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A target detection method based on monocular vision is characterized by comprising the following steps:
modeling a scene by adopting an SLAM algorithm to obtain a global map model, and acquiring external parameters of an SLAM camera;
acquiring an image set acquired by a security camera, and calibrating the security camera based on external parameters of the SLAM camera to obtain the external parameters of the security camera in a global map model;
identifying a target in an image acquired by a security camera through a target detection technology, and performing monocular distance measurement on the target;
and converting the coordinates of the target in the camera coordinate system into a global coordinate system, and displaying the coordinates in the global map model in real time.
2. The method of claim 1, wherein modeling the scene using a SLAM algorithm to obtain a global map model, and the method of obtaining external parameters of the SLAM camera comprises:
acquiring a point cloud reconstruction result, an SLAM camera pose set and a reconstruction image frame set of a scene by adopting an SLAM algorithm;
constructing a global map model according to the point cloud reconstruction result, and scanning and shooting a scene by adopting an SLAM camera to obtain a reconstructed image frame set consisting of a plurality of scene images;
and sequentially acquiring the feature points in the two adjacent scene images by using a FAST algorithm, solving feature descriptors of the feature points, sequentially matching similar feature points in the two adjacent scene images based on the feature descriptors of the feature points, constructing a unified camera coordinate system with the camera coordinate system of the first scene image as a reference, and converting to obtain a global coordinate system.
3. The method of claim 2, wherein the step of obtaining an image set acquired by a security camera, calibrating the security camera based on external parameters of the SLAM camera, and obtaining the external parameters of the security camera in a global map model comprises the steps of:
matching the image set acquired by the security camera with the reconstructed image frame set at the same name point;
solving the 3D space coordinate of the same-name point in the global coordinate system according to the SLAM camera pose set;
and calculating the pose parameter of the security camera relative to the SLAM camera based on the 3D space coordinate to obtain the external parameters of the security camera in a global map model.
4. The method of claim 3, wherein identifying the target in the image captured by the security camera through a target detection technique and performing monocular distance measurement on the target comprises:
positioning a target in a security camera picture through a target detection technology, and identifying an external rectangular frame of the target;
calculating the average physical size of the target by adopting a clustering algorithm based on a plurality of depth image sets which are shot by a depth camera and contain the target;
and measuring the distance of the target relative to the corresponding security camera by utilizing a monocular distance measuring principle according to the width and height parameters of the external rectangular frame, the focal length of the security camera and the average physical size.
5. The method of claim 4, further comprising, after the step of measuring the distance of the target relative to the corresponding security camera by monocular distance measuring principles:
and converting the target into a coordinate relative to a camera coordinate system based on the internal reference of the security camera and the coordinate of the central point of the external rectangular frame in an image coordinate system.
6. The method of claim 5, wherein the method for converting the coordinates of the target in the camera coordinate system into a global coordinate system and displaying the target in the global map model in real time comprises:
based on the external reference of the security camera in the global map model, converting the target coordinates relative to the camera coordinate system into 3D space coordinates relative to the target in the global coordinate system through a space coordinate system conversion matrix;
and displaying the 3D space coordinates of the target in the global coordinate system in real time in the global map model.
7. The method of any one of claims 1-6, further comprising:
and tracking the target in real time in a global map model by adopting a target tracking technology, and drawing a real-time position track and an active area thermodynamic diagram of the target.
8. An object detection device based on monocular vision, comprising:
the global modeling unit is used for modeling a scene by adopting an SLAM algorithm to obtain a global map model and acquiring external parameters of an SLAM camera;
the calibration unit is used for acquiring an image set acquired by a security camera, calibrating the security camera based on external parameters of the SLAM camera, and obtaining the external parameters of the security camera in a global map model;
the distance measurement unit is used for identifying a target in an image acquired by the security camera through a target detection technology and performing monocular distance measurement on the target;
and the coordinate conversion unit is used for converting the coordinates of the target in the camera coordinate system into a global coordinate system and displaying the coordinates in the global map model in real time.
9. The device of claim 8, further comprising:
and the target tracking unit is used for tracking the target in real time in a global map model by adopting a target tracking technology and drawing a real-time position track and an activity area thermodynamic diagram of the target.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 7.
CN202110356609.2A 2021-04-01 2021-04-01 Target detection method and device based on monocular vision Active CN113256731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110356609.2A CN113256731B (en) 2021-04-01 2021-04-01 Target detection method and device based on monocular vision

Publications (2)

Publication Number Publication Date
CN113256731A true CN113256731A (en) 2021-08-13
CN113256731B CN113256731B (en) 2025-09-02

Family

ID=77181275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110356609.2A Active CN113256731B (en) 2021-04-01 2021-04-01 Target detection method and device based on monocular vision

Country Status (1)

Country Link
CN (1) CN113256731B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803271A (en) * 2016-12-23 2017-06-06 成都通甲优博科技有限责任公司 A kind of camera marking method and device of vision guided navigation unmanned plane
CN108447097A (en) * 2018-03-05 2018-08-24 清华-伯克利深圳学院筹备办公室 Depth camera scaling method, device, electronic equipment and storage medium
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration
CN110458897A (en) * 2019-08-13 2019-11-15 北京积加科技有限公司 Multi-camera automatic calibration method and system, monitoring method and system
CN110619662A (en) * 2019-05-23 2019-12-27 深圳大学 Monocular vision-based multi-pedestrian target space continuous positioning method and system
WO2020168667A1 (en) * 2019-02-18 2020-08-27 广州小鹏汽车科技有限公司 High-precision localization method and system based on shared slam map
CN112284390A (en) * 2020-10-14 2021-01-29 南京工程学院 VSLAM-based indoor high-precision positioning and navigation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO Huiwen; WU Xinyu; SU Shijuan; FU Ruiqing: "Moving object detection based on 3D background estimation under a moving camera", Chinese Journal of Scientific Instrument, no. 10, 15 October 2017 (2017-10-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744560A (en) * 2021-09-15 2021-12-03 厦门科拓通讯技术股份有限公司 Automatic parking method and device for parking lot, server and machine-readable storage medium
CN114419167A (en) * 2022-01-20 2022-04-29 浙江吉利控股集团有限公司 Method, device, device and storage medium for determining external parameters of monocular camera
CN114608521A (en) * 2022-03-17 2022-06-10 北京市商汤科技开发有限公司 Monocular distance measuring method and device, electronic equipment and storage medium
CN114608521B (en) * 2022-03-17 2024-06-07 北京市商汤科技开发有限公司 Monocular ranging method and device, electronic equipment and storage medium
EP4312189A1 (en) * 2022-07-29 2024-01-31 Sick Ag System for determining the distance of an object
CN117197245A (en) * 2023-09-26 2023-12-08 杭州萤石软件有限公司 Pose restoration method and device
CN118537935A (en) * 2024-05-30 2024-08-23 西藏大洋泊车科技有限公司 Automatic control system for three-dimensional parking lot
CN120233371A (en) * 2025-05-29 2025-07-01 江苏濠汉信息技术有限公司 A method for measuring the spatial distance between a hazard source and a target using monocular vision
CN120233371B (en) * 2025-05-29 2025-09-02 江苏濠汉信息技术有限公司 Method for measuring spatial distance between hazard source and target using monocular vision

Also Published As

Publication number Publication date
CN113256731B (en) 2025-09-02

Similar Documents

Publication Publication Date Title
Toft et al. Long-term visual localization revisited
CN113256731A (en) Target detection method and device based on monocular vision
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
Toulminet et al. Vehicle detection by means of stereo vision-based obstacles features extraction and monocular pattern analysis
EP3633615A1 (en) Deep learning network and average drift-based automatic vessel tracking method and system
US20040125207A1 (en) Robust stereo-driven video-based surveillance
CN109099929B (en) Intelligent vehicle positioning device and method based on scene fingerprint
JP2014504410A (en) Detection and tracking of moving objects
CN112598743B (en) Pose estimation method and related device for monocular vision image
CN109871739B (en) Automatic target detection and space positioning method for mobile station based on YOLO-SIOCTL
KR101645959B1 (en) The Apparatus and Method for Tracking Objects Based on Multiple Overhead Cameras and a Site Map
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
CN111062971B (en) Deep learning multi-mode-based mud head vehicle tracking method crossing cameras
CN107506753B (en) Multi-vehicle tracking method for dynamic video monitoring
US20220164595A1 (en) Method, electronic device and storage medium for vehicle localization
WO2020155075A1 (en) Navigation apparatus and method, and related device
CN110992424A (en) Positioning method and system based on binocular vision
Revaud et al. Robust automatic monocular vehicle speed estimation for traffic surveillance
Lashkov et al. Edge-computing-empowered vehicle tracking and speed estimation against strong image vibrations using surveillance monocular camera
CN114782500B (en) Karting racing behavior analysis method based on multi-target tracking
Vuong et al. Toward planet-wide traffic camera calibration
CN113870351B (en) Indoor large scene pedestrian fingerprint positioning method based on monocular vision
KR102249380B1 (en) System for generating spatial information of CCTV device using reference image information
CN118485898A (en) A method and device for detecting deformation of traffic tracks based on multimodal three-dimensional point cloud fusion
Le et al. TQU-SLAM Benchmark Feature-based Dataset for Building Monocular VO

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant