[go: up one dir, main page]

CN114185073A - Pose display method, device and system

Pose display method, device and system

Info

Publication number
CN114185073A
Authority
CN
China
Prior art keywords
positioning
target
image
map
pose
Legal status: Pending
Application number
CN202111350621.9A
Other languages
Chinese (zh)
Inventor
李佳宁
李杰
毛慧
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111350621.9A
Publication of CN114185073A
Priority to PCT/CN2022/131134 (published as WO2023083256A1)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S 19/38 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S 19/39 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S 19/42 Determining position
    • G01S 19/45 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G01S 19/47 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement the supplementary measurement being an inertial measurement, e.g. tightly coupled inertial
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 3/00 Measuring distances in line of sight; Optical rangefinders

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a pose display method, device and system. In the method, a terminal device obtains a target image of a target scene and motion data of the terminal device, and determines a self-positioning track based on the target image and the motion data; it then selects partial images from the multi-frame images as images to be detected and sends the images to be detected and the self-positioning track to a server. The server generates a fusion positioning track based on the images to be detected and the self-positioning track, the fusion positioning track comprising a plurality of fusion positioning poses; for each fusion positioning pose, the server determines a corresponding target positioning pose and displays it. This scheme achieves a high-frame-rate, high-precision positioning function; because the terminal device sends only the self-positioning track and the images to be detected, the data volume of network transmission is reduced, and the computing resource and storage resource consumption of the terminal device are reduced as well.

Description

Pose display method, device and system
Technical Field
The application relates to the field of computer vision, in particular to a pose display method, a pose display device and a pose display system.
Background
The GPS (Global Positioning System) is a high-precision radio navigation Positioning System based on artificial earth satellites, and can provide accurate geographic position, vehicle speed and time information anywhere in the world and in the near-earth space. The Beidou satellite navigation system consists of a space section, a ground section and a user section, can provide high-precision, high-reliability positioning, navigation and time service for users all day long in the global range, and has regional navigation, positioning and time service capabilities.
Because terminal devices are equipped with GPS or the Beidou satellite navigation system, GPS or Beidou can be used to position a terminal device when positioning is required. In an outdoor environment, GPS or Beidou signals are good, so GPS or the Beidou satellite navigation system can position the terminal device accurately. In an indoor environment, however, GPS or Beidou signals are poor, so GPS or the Beidou satellite navigation system cannot position the terminal device accurately. For example, in energy industries such as coal, electric power and petrochemicals, positioning needs are increasing, and these needs generally arise in indoor environments; due to problems such as signal shielding, the terminal device cannot be positioned accurately.
Disclosure of Invention
The application provides a pose display method, which is applied to a cloud-edge management system. The cloud-edge management system comprises a terminal device and a server, the server comprises a three-dimensional visual map of a target scene, and the method comprises the following steps:
the method comprises the steps that in the moving process of a target scene, the terminal equipment obtains a target image of the target scene and motion data of the terminal equipment, and determines a self-positioning track of the terminal equipment based on the target image and the motion data; if the target image comprises a plurality of frames of images, selecting a partial image from the plurality of frames of images as an image to be detected, and sending the image to be detected and the self-positioning track to a server;
the server generates a fused positioning track of the terminal equipment in the three-dimensional visual map based on the image to be detected and the self-positioning track, wherein the fused positioning track comprises a plurality of fused positioning poses;
and aiming at each fusion positioning pose in the fusion positioning track, the server determines a target positioning pose corresponding to the fusion positioning pose and displays the target positioning pose.
The application provides a cloud-edge management system. The cloud-edge management system comprises a terminal device and a server, and the server comprises a three-dimensional visual map of a target scene, wherein:
the terminal device is used for acquiring a target image of a target scene and motion data of the terminal device in the moving process of the target scene, and determining a self-positioning track of the terminal device based on the target image and the motion data; if the target image comprises a plurality of frames of images, selecting a partial image from the plurality of frames of images as an image to be detected, and sending the image to be detected and the self-positioning track to a server;
the server is used for generating a fused positioning track of the terminal equipment in the three-dimensional visual map based on the image to be detected and the self-positioning track, and the fused positioning track comprises a plurality of fused positioning poses; and aiming at each fusion positioning pose in the fusion positioning track, determining a target positioning pose corresponding to the fusion positioning pose, and displaying the target positioning pose.
The application provides a pose display apparatus, which is applied to the server in a cloud-edge management system, the server comprising a three-dimensional visual map of a target scene. The apparatus comprises:
the acquisition module is used for acquiring an image to be detected and a self-positioning track; the self-positioning track is determined by terminal equipment based on a target image of the target scene and motion data of the terminal equipment, and the image to be detected is a partial image in a multi-frame image included in the target image;
a generating module, configured to generate a fused positioning track of the terminal device in the three-dimensional visual map based on the image to be detected and the self-positioning track, where the fused positioning track includes multiple fused positioning poses;
and the display module is used for determining a target positioning pose corresponding to each fusion positioning pose in the fusion positioning track and displaying the target positioning pose.
According to the technical scheme, a cloud-edge combined positioning and display method is provided. The terminal device at the edge collects target images and motion data, performs high-frame-rate self-positioning according to the target images and the motion data, and obtains a high-frame-rate self-positioning track. The cloud server receives the images to be detected and the self-positioning track sent by the terminal device and obtains a high-frame-rate fusion positioning track from them, namely a high-frame-rate fusion positioning track in the three-dimensional visual map. This realizes a high-frame-rate, high-precision positioning function and a high-precision, low-cost, easily deployed vision-based indoor positioning function, and the fusion positioning track can be displayed. In this scheme, the terminal device computes the high-frame-rate self-positioning track and sends only the self-positioning track and a small number of images to be detected, which reduces the data volume transmitted over the network; global positioning is carried out on the server, which reduces the computing resource and storage resource consumption of the terminal device. The scheme can be applied to energy industries such as coal, electric power and petrochemicals to realize indoor positioning of personnel (such as workers and inspection personnel), rapidly obtain their position information and ensure personnel safety.
Drawings
Fig. 1 is a schematic flowchart of a pose display method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a cloud edge management system according to an embodiment of the present application;
FIG. 3 is a schematic flow chart for determining a self-positioning trajectory according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating a process for determining a global localization track according to an embodiment of the present application;
FIG. 5 is a schematic illustration of a self-localizing track, a global localizing track, and a fused localizing track;
FIG. 6 is a schematic flow chart illustrating a process for determining a fused localization track according to an embodiment of the present application;
fig. 7 is a schematic structural view of a pose display apparatus in an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, moreover, the word "if" as used may be interpreted as "at … …" or "when … …" or "in response to a determination".
The embodiment of the application provides a pose display method, which can be applied to a cloud-edge management system. The cloud-edge management system may include a terminal device (i.e., a terminal device at the edge) and a server (i.e., a server at the cloud), and the server may include a three-dimensional visual map of a target scene (e.g., an indoor environment, an outdoor environment, etc.). Referring to fig. 1, which shows a flowchart of the pose display method, the method may include:
step 101, in the moving process of a target scene, the terminal device obtains a target image of the target scene and motion data of the terminal device, and determines a self-positioning track of the terminal device based on the target image and the motion data.
Illustratively, if the target image comprises a multi-frame image, the terminal device traverses the current frame image from the multi-frame image; determining a self-positioning pose corresponding to the current frame image based on a self-positioning pose corresponding to a K frame image in front of the current frame image, a map position of the terminal equipment in a self-positioning coordinate system and the motion data; and generating a self-positioning track of the terminal equipment in a self-positioning coordinate system based on the self-positioning poses corresponding to the multi-frame images.
For example, if the current frame image is a key image, the map position in the self-localization coordinate system may be generated based on the current position of the terminal device (i.e., the position corresponding to the current frame image). If the current frame image is a non-key image, the map position in the self-positioning coordinate system does not need to be generated based on the current position of the terminal equipment.
And if the number of the matched feature points between the current frame image and the previous frame image of the current frame image does not reach a preset threshold value, determining that the current frame image is a key image. And if the number of the matched feature points between the current frame image and the previous frame image of the current frame image reaches a preset threshold value, determining that the current frame image is a non-key image.
And 102, if the target image comprises a plurality of frames of images, the terminal equipment selects a part of the images from the plurality of frames of images as an image to be detected, and sends the image to be detected and the self-positioning track to a server.
For example, the terminal device may select M frames of images from the multiple frames of images as the image to be measured, where M may be a positive integer, such as 1, 2, 3, and the like. Obviously, the terminal device sends the server a part of the images to be detected in the multi-frame images, so that the data volume of network transmission can be reduced, and the network bandwidth resources are saved.
103, the server generates a fusion positioning track of the terminal device in the three-dimensional visual map based on the image to be detected and the self-positioning track, wherein the fusion positioning track can comprise a plurality of fusion positioning poses.
For example, the server may determine a target map point corresponding to the image to be measured from a three-dimensional visual map of the target scene, and determine a global positioning track of the terminal device in the three-dimensional visual map based on the target map point. Then, the server generates a fused positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track. For example, the frame rate of the fusion localization poses included in the fusion localization track may be greater than the frame rate of the global localization poses included in the global localization track, that is, the frame rate of the fusion localization track is higher than the frame rate of the global localization track, the fusion localization track may be a high frame rate pose in a three-dimensional visual map, the global localization track may be a low frame rate pose in the three-dimensional visual map, and the frame rate of the fusion localization track is higher than the frame rate of the global localization track, which indicates that the number of the fusion localization poses is greater than the number of the global localization poses. Further, the frame rate of the fused localization pose included by the fused localization track may be equal to the frame rate of the self-localization pose included by the self-localization track, i.e., the frame rate of the fused localization track is equal to the frame rate of the self-localization track, i.e., the self-localization track may be a high frame rate pose. And the frame rate of the fusion positioning tracks is equal to the frame rate of the self-positioning tracks, and the number of the fusion positioning poses is equal to the number of the self-positioning poses.
In one possible embodiment, the three-dimensional visual map may include, but is not limited to, at least one of: the pose matrix corresponding to the sample image, the sample global descriptor corresponding to the sample image, the sample local descriptor corresponding to the characteristic point in the sample image and the map point information. The server determines a target map point corresponding to the image to be detected from a three-dimensional visual map of a target scene, and determines a global positioning track of the terminal device in the three-dimensional visual map based on the target map point, which may include but is not limited to: and aiming at each frame of image to be detected, selecting candidate sample images from the multi-frame sample images based on the similarity between the image to be detected and the multi-frame sample images corresponding to the three-dimensional visual map. Acquiring a plurality of characteristic points from the image to be detected; and for each feature point, determining a target map point corresponding to the feature point from a plurality of map points corresponding to the candidate sample image. And determining the global positioning pose in the three-dimensional visual map corresponding to the image to be detected based on the plurality of feature points and the target map points corresponding to the plurality of feature points. And generating a global positioning track of the terminal equipment in the three-dimensional visual map based on the global positioning poses corresponding to all the images to be detected.
The server selects a candidate sample image from the multiple frame sample images based on the similarity between the image to be detected and the multiple frame sample images corresponding to the three-dimensional visual map, and the selecting may include: determining a global descriptor to be detected corresponding to the image to be detected, and determining the distance between the global descriptor to be detected and a sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; the three-dimensional visual map at least comprises a sample global descriptor corresponding to each frame of sample image. Selecting candidate sample images from the multi-frame sample images based on the distance between the global descriptor to be detected and each sample global descriptor; the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance; or, the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than the distance threshold.
The server determines a global descriptor to be tested corresponding to the image to be tested, which may include but is not limited to: determining a bag-of-words vector corresponding to the image to be detected based on the trained dictionary model, and determining the bag-of-words vector as a global descriptor to be detected corresponding to the image to be detected; or inputting the image to be detected to the trained deep learning model to obtain a target vector corresponding to the image to be detected, and determining the target vector as a global descriptor to be detected corresponding to the image to be detected. Of course, the above is only an example of determining the global descriptor to be tested, and the method is not limited thereto.
The server determines a target map point corresponding to the feature point from a plurality of map points corresponding to the candidate sample image, which may include but is not limited to: and determining a local descriptor to be detected corresponding to the feature point, wherein the local descriptor to be detected is used for representing the feature vector of the image block where the feature point is located, and the image block can be located in the image to be detected. Determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; wherein, the three-dimensional visual map at least comprises a sample local descriptor corresponding to each map point corresponding to the candidate sample image. Then, a target map point can be selected from a plurality of map points corresponding to the candidate sample image based on the distance between the local descriptor to be detected and each sample local descriptor; the distance between the local descriptor to be detected and the sample local descriptor corresponding to the target map point may be a minimum distance, and the minimum distance is smaller than a distance threshold.
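To make this concrete, the following is a minimal Python sketch of associating query feature points with map points by local-descriptor distance and then estimating a global positioning pose from the resulting 2D-3D correspondences. The patent text does not prescribe a particular pose solver; the PnP-with-RANSAC call, the float descriptors, the distance threshold and all names are illustrative assumptions.

```python
import cv2
import numpy as np

def global_pose_from_query(query_pts_2d, query_local_descs,
                           map_points_3d, map_local_descs,
                           camera_matrix, desc_dist_thresh=0.7):
    """Match each query feature point to its nearest map point by local-descriptor
    distance (keeping only matches below a threshold), then estimate the global
    pose from the 2D-3D correspondences. All inputs are hypothetical numpy arrays;
    descriptors are assumed to be float vectors compared with the L2 norm."""
    pts_2d, pts_3d = [], []
    for pt, desc in zip(query_pts_2d, query_local_descs):
        dists = np.linalg.norm(map_local_descs - desc, axis=1)
        best = int(np.argmin(dists))
        if dists[best] < desc_dist_thresh:      # minimum distance below threshold
            pts_2d.append(pt)
            pts_3d.append(map_points_3d[best])  # target map point for this feature
    if len(pts_2d) < 4:                         # PnP needs at least 4 points
        return None
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(pts_3d, dtype=np.float32), np.asarray(pts_2d, dtype=np.float32),
        camera_matrix, None)                    # None: pixels assumed undistorted
    return (rvec, tvec) if ok else None
```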
The server generates a fused positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track, which may include but is not limited to: the server can select N self-positioning poses corresponding to the target time period from all the self-positioning poses included in the self-positioning track, and select P global positioning poses corresponding to the target time period from all the global positioning poses included in the global positioning track; wherein N is greater than P. And determining N fusion positioning poses corresponding to the N self-positioning poses based on the N self-positioning poses and the P global positioning poses, wherein the N self-positioning poses correspond to the N fusion positioning poses one by one. And generating a fusion positioning track of the terminal equipment in the three-dimensional visual map based on the N fusion positioning poses.
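The fusion algorithm itself is not specified above. One possible realization, shown purely as an assumption, is to estimate a rigid alignment between the self-positioning coordinate system and the three-dimensional visual map from the P global positioning positions and the temporally corresponding self-positioning positions, and then transform all N self-positioning positions into the map frame:

```python
import numpy as np

def fuse_tracks(self_positions, global_positions, pair_idx):
    """self_positions: (N, 3) positions of the self-positioning track.
    global_positions: (P, 3) positions of the global positioning track (P < N).
    pair_idx: indices of the N self-positioning poses that correspond in time
    to the P global positioning poses.
    Returns N fused positions expressed in the three-dimensional visual map."""
    src = self_positions[pair_idx]                  # P matched self positions
    dst = global_positions
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    # Kabsch: rotation that best aligns the self-positioning frame to the map
    u, _, vt = np.linalg.svd(dst_c.T @ src_c)
    d = np.sign(np.linalg.det(u @ vt))
    r = u @ np.diag([1.0, 1.0, d]) @ vt
    t = dst.mean(0) - r @ src.mean(0)
    return (r @ self_positions.T).T + t             # one fused position per self pose
```

A practical system would additionally fuse orientations and smooth the result, but the sketch conveys why N fused poses can be produced from only P global ones.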
After generating the fused positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track, the server may further select an initial fused positioning pose from the fused positioning track, and select an initial self-positioning pose corresponding to the initial fused positioning pose from the self-positioning track. And selecting a target self-positioning pose from the self-positioning track, and determining a target fusion positioning pose based on the initial fusion positioning pose, the initial self-positioning pose and the target self-positioning pose. And then, generating a new fusion positioning track based on the target fusion positioning pose and the fusion positioning track to replace the original fusion positioning track.
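A minimal sketch of this update step, assuming that poses are 4x4 homogeneous matrices and that the relative motion between the initial and target self-positioning poses is simply re-applied to the initial fused positioning pose (one plausible reading of the paragraph above, not the only one):

```python
import numpy as np

def extend_fused_pose(T_fused_init, T_self_init, T_self_target):
    """All arguments are 4x4 pose matrices. The relative motion measured in the
    self-positioning coordinate system is composed onto the initial fused pose,
    yielding the target fused pose in the three-dimensional visual map."""
    T_rel = np.linalg.inv(T_self_init) @ T_self_target   # motion since the anchor pose
    return T_fused_init @ T_rel
```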
And step 104, aiming at each fusion positioning pose in the fusion positioning track, the server determines a target positioning pose corresponding to the fusion positioning pose and displays the target positioning pose.
For example, the server may determine the fusion positioning pose as the target positioning pose and display the target positioning pose in the three-dimensional visual map. Or, the server converts the fusion positioning pose into a target positioning pose in the three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and displays the target positioning pose through the three-dimensional visualization map.
For example, the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map may be determined in the following ways, but is not limited to these. For each of a plurality of calibration points, a coordinate pair corresponding to the calibration point may be determined, where the coordinate pair includes the position coordinate of the calibration point in the three-dimensional visual map and the position coordinate of the calibration point in the three-dimensional visualization map; the target transformation matrix is then determined based on the coordinate pairs corresponding to the plurality of calibration points. Or, an initial transformation matrix is acquired, position coordinates in the three-dimensional visual map are mapped into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix, and whether the initial transformation matrix has converged is determined based on the relation between the mapping coordinates and the actual coordinates in the three-dimensional visualization map; if yes, the initial transformation matrix is determined to be the target transformation matrix; if not, the initial transformation matrix is adjusted, the adjusted transformation matrix is used as the initial transformation matrix, the mapping operation is repeated, and so on until the target transformation matrix is obtained. Or, the three-dimensional visual map is sampled to obtain a first point cloud, the three-dimensional visualization map is sampled to obtain a second point cloud, and the first point cloud and the second point cloud are registered by an ICP (Iterative Closest Point) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
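As an illustration of the first option, the sketch below fits a 4x4 transformation matrix to the calibration-point coordinate pairs by least squares. Treating the transform as a general linear mapping (rather than, say, a strictly rigid one) and requiring at least four non-coplanar points are assumptions made only for brevity.

```python
import numpy as np

def transform_from_calibration_points(pts_visual, pts_visualization):
    """pts_visual: (K, 3) calibration-point coordinates in the 3D visual map.
    pts_visualization: (K, 3) coordinates of the same points in the 3D
    visualization map. Returns a 4x4 matrix mapping visual-map coordinates to
    visualization-map coordinates (K >= 4 non-coplanar points assumed)."""
    k = len(pts_visual)
    src_h = np.hstack([pts_visual, np.ones((k, 1))])      # homogeneous source points
    m, *_ = np.linalg.lstsq(src_h, pts_visualization, rcond=None)
    target_transform = np.eye(4)
    target_transform[:3, :] = m.T                         # rows map to x, y, z
    return target_transform
```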
According to the technical scheme, a cloud-edge combined positioning and display method is provided. The terminal device at the edge collects target images and motion data, performs high-frame-rate self-positioning according to the target images and the motion data, and obtains a high-frame-rate self-positioning track. The cloud server receives the images to be detected and the self-positioning track sent by the terminal device and obtains a high-frame-rate fusion positioning track from them, namely a high-frame-rate fusion positioning track in the three-dimensional visual map. This realizes a high-frame-rate, high-precision positioning function and a high-precision, low-cost, easily deployed vision-based indoor positioning function, and the fusion positioning track can be displayed in the three-dimensional visual map. In this scheme, the terminal device computes the high-frame-rate self-positioning track and sends only the self-positioning track and a small number of images to be detected, which reduces the data volume transmitted over the network; global positioning is carried out on the server, which reduces the computing resource and storage resource consumption of the terminal device. The scheme can be applied to energy industries such as coal, electric power and petrochemicals to realize indoor positioning of personnel (such as workers and inspection personnel), rapidly obtain their position information and ensure personnel safety.
The following describes a pose display method according to an embodiment of the present application with reference to specific embodiments.
The embodiment of the application provides a cloud-edge combined visual positioning and displaying method. The target scene may be an indoor environment, that is, when the terminal device moves in the indoor environment, the server determines the fusion positioning track of the terminal device in the three-dimensional visual map, that is, an indoor positioning mode based on the vision is proposed, and of course, the target scene may also be an outdoor environment, which is not limited to this.
Referring to fig. 2, a schematic structural diagram of the cloud-edge management system is shown. The cloud-edge management system may include a terminal device (i.e., a terminal device at the edge) and a server (i.e., a server at the cloud); of course, the cloud-edge management system may further include other devices, such as a wireless base station and a router, which is not limited. The server can include a three-dimensional visual map of a target scene and a three-dimensional visualization map corresponding to the three-dimensional visual map, and can generate a fusion positioning track of the terminal device in the three-dimensional visual map and display the fusion positioning track in the three-dimensional visualization map (the fusion positioning track needs to be converted into a track that can be displayed in the three-dimensional visualization map), so that a manager can view the fusion positioning track in the three-dimensional visualization map through a web end.
The terminal device may include a visual sensor, a motion sensor, and the like, where the visual sensor may be a camera, and the visual sensor is configured to acquire an image of a target scene during movement of the terminal device, and for convenience of distinguishing, the image is recorded as a target image, and the target image includes multiple frames of images (i.e., multiple frames of real-time images during movement of the terminal device). The motion sensor may be, for example, an IMU (Inertial Measurement Unit), which is a Measurement device including a gyroscope and an accelerometer, and is used to acquire motion data of the terminal device, such as acceleration and angular velocity, during movement of the terminal device.
For example, the terminal device may be a wearable device (e.g., a video helmet, a smart watch, smart glasses, etc.), and the visual sensor and the motion sensor are disposed on the wearable device; or the terminal equipment is a recorder (for example, the terminal equipment is carried by a worker during work and has the functions of collecting video and audio in real time, taking pictures, recording, talkbacking, positioning and the like), and the visual sensor and the motion sensor are arranged on the recorder; alternatively, the terminal device is a camera (such as a split camera), and the vision sensor and the motion sensor are disposed on the camera. Of course, the above is only an example, and the type of the terminal device is not limited, for example, the terminal device may also be a smartphone, and the like, as long as a vision sensor and a motion sensor are deployed.
For example, the terminal device may acquire the target image and the motion data, perform high-frame-rate self-positioning according to the target image and the motion data, and obtain a high-frame-rate self-positioning trajectory (e.g., a 6DOF (six degrees of freedom) self-positioning trajectory), where the self-positioning trajectory may include multiple self-positioning poses, and since the self-positioning trajectory is a high-frame-rate self-positioning trajectory, the number of self-positioning poses in the self-positioning trajectory is large.
The terminal device can select a part of images from multi-frame images of the target image as images to be detected, and sends the self-positioning track with the high frame rate and the images to be detected to the server. The server can obtain a self-positioning track and an image to be detected, the server can perform global positioning at a low frame rate according to the image to be detected and a three-dimensional visual map of a target scene, and obtain a global positioning track (namely the global positioning track of the image to be detected in the three-dimensional visual map) at the low frame rate, the global positioning track can comprise a plurality of global positioning poses, and the global positioning track is the global positioning track at the low frame rate, so that the number of the global positioning poses in the global positioning track is small.
Based on the high-frame-rate self-positioning track and the low-frame-rate global positioning track, the server can fuse the high-frame-rate self-positioning track and the low-frame-rate global positioning track to obtain a high-frame-rate fusion positioning track, namely a high-frame-rate fusion positioning track in the three-dimensional visual map, so that a high-frame-rate global positioning result is obtained. The fusion positioning track can comprise a plurality of fusion positioning poses, and the fusion positioning track is a high-frame-rate fusion positioning track, so that the number of the fusion positioning poses in the fusion positioning track is large.
In the above embodiments, the pose (e.g., self-positioning pose, global positioning pose, fusion positioning pose, etc.) may be a position and a pose, and is generally represented by a rotation matrix and a translation vector, which is not limited to this.
In summary, in this embodiment, a globally unified high-frame-rate visual positioning function can be implemented based on the target image and the motion data, obtaining a high-frame-rate fused positioning track (e.g., 6DOF poses) in the three-dimensional visual map. This is a high-frame-rate, globally consistent positioning method that provides the terminal device with an indoor positioning function that is high in frame rate, high in precision, low in cost, and easy to deploy.
The above process of the embodiment of the present application is described in detail below with reference to specific application scenarios.
Firstly, self-positioning of the terminal equipment. The terminal device is an electronic device with a vision sensor and a motion sensor, and can acquire a target image (such as a continuous video image) of a target scene and motion data (such as IMU data) of the terminal device and determine a self-positioning track of the terminal device based on the target image and the motion data.
The target image may include multiple frames of images, and for each frame of image, the terminal device determines a self-positioning pose corresponding to the image, that is, the multiple frames of image correspond to multiple self-positioning poses, and the self-positioning trajectory of the terminal device may include multiple self-positioning poses, which may be understood as a set of multiple self-positioning poses.
The method comprises the steps that for a first frame image in a multi-frame image, the terminal equipment determines a self-positioning pose corresponding to the first frame image, for a second frame image in the multi-frame image, the terminal equipment determines a self-positioning pose corresponding to the second frame image, and the like. The self-positioning pose corresponding to the first frame image can be a coordinate origin of a reference coordinate system (namely, a self-positioning coordinate system), the self-positioning pose corresponding to the second frame image is a pose point in the reference coordinate system, namely, a pose point relative to the coordinate origin (namely, the self-positioning pose corresponding to the first frame image), the self-positioning pose corresponding to the third frame image is a pose point in the reference coordinate system, namely, a pose point relative to the coordinate origin, and so on, and the self-positioning poses corresponding to the frames of images are pose points in the reference coordinate system.
In summary, after obtaining the self-positioning poses corresponding to each frame of image, the self-positioning poses can be combined into a self-positioning track in the reference coordinate system, and the self-positioning track comprises the self-positioning poses.
In one possible embodiment, as shown in fig. 3, the self-localization trajectory is determined by the following steps:
step 301, acquiring a target image of a target scene and motion data of the terminal device.
Step 302, traversing the current frame image from the multiple frame images if the target image comprises the multiple frame images.
When the first frame image is traversed from the multiple frame images as the current frame image, the self-positioning pose corresponding to the first frame image may be a coordinate origin of a reference coordinate system (i.e., a self-positioning coordinate system), that is, the self-positioning pose coincides with the coordinate origin. When the second frame image is traversed from the multi-frame image as the current frame image, the self-positioning pose corresponding to the second frame image can be determined by adopting the subsequent steps. When a third frame image is traversed from the multi-frame image to serve as a current frame image, the self-positioning pose corresponding to the third frame image can be determined by adopting the subsequent steps, and by analogy, each frame image can be traversed to serve as the current frame image.
Step 303, calculating the feature point association between the current frame image and the previous frame image by using an optical flow algorithm. The optical flow algorithm finds the correspondence between the current frame image and the previous frame image by using the temporal change of pixels and the correlation between adjacent frames, thereby calculating the motion of objects between the two images.
Step 304, determining whether the current frame image is a key image based on the number of matched feature points between the current frame image and the previous frame image. If the number of matched feature points does not reach a preset threshold, the change between the two images is large and the matched feature points are few, so the current frame image is determined to be a key image and step 305 is executed. If the number of matched feature points reaches the preset threshold, the change between the two images is small and the matched feature points are many, so the current frame image is determined to be a non-key image and step 306 is executed.
For example, the matching proportion between the current frame image and the previous frame image, i.e., the ratio of the number of matched feature points to the total number of feature points, may also be calculated. If the matching proportion does not reach a preset proportion, the current frame image is determined to be a key image; if it does, the current frame image is determined to be a non-key image.
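A minimal sketch of steps 303 and 304 using OpenCV's pyramidal Lucas-Kanade optical flow; the threshold and ratio values are illustrative assumptions.

```python
import cv2
import numpy as np

def is_key_image(prev_gray, cur_gray, prev_pts, min_matches=80, min_ratio=0.6):
    """prev_pts: (N, 1, 2) float32 feature points detected in the previous frame.
    Tracks them into the current frame and decides whether the current frame is
    a key image from how many points were successfully matched."""
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                  prev_pts, None)
    matched = int(status.sum())                  # number of matched feature points
    ratio = matched / max(len(prev_pts), 1)      # matching proportion
    # few matches or a low ratio -> large change between frames -> key image
    return matched < min_matches or ratio < min_ratio
```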
Step 305, if the current frame image is a key image, generating a map position in a self-positioning coordinate system (i.e. a reference coordinate system) based on the current position of the terminal device (i.e. the position where the current frame image is acquired by the terminal device), i.e. generating a new 3D map position. If the current frame image is a non-key image, the map position in the self-positioning coordinate system does not need to be generated based on the current position of the terminal equipment.
Step 306, determining a self-positioning pose corresponding to the current frame image based on a self-positioning pose corresponding to a K frame image in front of the current frame image, a map position of the terminal device in a self-positioning coordinate system and motion data of the terminal device, wherein K may be a positive integer, may be a value configured according to experience, and is not limited.
For example, all motion data between a previous frame image and a current frame image of the current frame image may be pre-integrated to obtain an inertial measurement constraint between the two frame images. Based on the self-positioning pose and motion data (such as speed, acceleration, angular velocity and the like) corresponding to the K frame images (such as a sliding window) in front of the current frame image, the map position in a self-positioning coordinate system and inertial measurement constraints (such as speed, acceleration, angular velocity and the like between the previous frame image and the current frame image), the self-positioning pose corresponding to the current frame image can be obtained by adopting bundling optimization and updating in a combined optimization mode, and the bundling optimization process is not limited.
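As a rough illustration of the inertial side of this step, the sketch below integrates the motion data between two image timestamps to predict the next pose. Biases, noise terms and the joint visual-inertial bundle optimization described above are deliberately omitted, so this is only a propagation sketch, not a VIO implementation.

```python
import numpy as np

def propagate_imu(p, v, R, imu_samples, gravity=np.array([0.0, 0.0, -9.81])):
    """p, v: position and velocity; R: 3x3 body-to-world rotation.
    imu_samples: list of (dt, accel, gyro) tuples between two frames.
    Integrates the motion data to predict the pose at the next frame; the
    prediction would then be refined jointly with the visual constraints."""
    for dt, acc, gyro in imu_samples:
        angle = np.linalg.norm(gyro) * dt                 # rotation increment
        axis = gyro / (np.linalg.norm(gyro) + 1e-12)
        K = np.array([[0, -axis[2], axis[1]],
                      [axis[2], 0, -axis[0]],
                      [-axis[1], axis[0], 0]])
        dR = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)
        a_world = R @ acc + gravity                       # specific force to world
        p = p + v * dt + 0.5 * a_world * dt * dt
        v = v + a_world * dt
        R = R @ dR
    return p, v, R
```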
For example, in order to maintain the scale of the variable to be optimized, a certain frame and a part of the map position in the sliding window may be marginalized, and the constraint information is retained in a priori form.
For example, the terminal device may determine the self-positioning pose by using a VIO (Visual-Inertial Odometry) algorithm: the input data of the VIO algorithm are the target image and the motion data, and the output is the self-positioning pose; for instance, the VIO algorithm performs steps 301 to 306 to obtain the self-positioning pose. The VIO algorithm may include, but is not limited to, VINS (Visual-Inertial Navigation System), SVO (Semi-direct Visual Odometry), MSCKF (Multi-State Constraint Kalman Filter), and the like, as long as the self-positioning pose can be obtained.
And 307, generating a self-positioning track of the terminal device in a self-positioning coordinate system based on self-positioning poses corresponding to the multi-frame images, wherein the self-positioning track comprises a plurality of self-positioning poses in the self-positioning coordinate system.
Therefore, the terminal device can obtain the self-positioning track in the self-positioning coordinate system, the self-positioning track can comprise self-positioning poses corresponding to multiple frames of images, obviously, the vision sensor can collect a large number of images, so that the terminal device can obtain the self-positioning poses corresponding to the images, namely, the self-positioning track can comprise a large number of self-positioning poses, namely, the terminal device can obtain the self-positioning track with a high frame rate.
And secondly, data transmission. If the target image comprises a plurality of frames of images, the terminal device can select a part of the images from the plurality of frames of images as an image to be detected and send the image to be detected and the self-positioning track to the server. For example, the terminal device sends the self-positioning track and the image to be measured to the server through a wireless network (e.g. 4G, 5G, Wifi, etc.), and the frame rate of the image to be measured is low, so that the occupied network bandwidth is small.
And thirdly, the three-dimensional visual map of the target scene. The three-dimensional visual map of the target scene needs to be constructed in advance and stored in the server, so that the server can perform global positioning based on it. The three-dimensional visual map is a way of storing image information of the target scene: multiple frames of sample images of the target scene may be collected, and the three-dimensional visual map constructed based on these sample images, for example by using visual mapping algorithms such as SFM (Structure from Motion) or SLAM (Simultaneous Localization and Mapping); the construction method is not limited.
After obtaining the three-dimensional visual map of the target scene, the three-dimensional visual map may include the following information:
pose of sample image: the sample image is a representative image when the three-dimensional visual map is constructed, that is, the three-dimensional visual map can be constructed based on the sample image, the pose matrix of the sample image (which may be referred to as sample image pose for short) can be stored in the three-dimensional visual map, and the three-dimensional visual map can include the pose of the sample image.
Sample global descriptor: for each frame of sample image, the sample image may correspond to an image global descriptor, and the image global descriptor is denoted as a sample global descriptor, where the sample global descriptor represents the sample image by using a high-dimensional vector, and the sample global descriptor is used to distinguish image features of different sample images.
For each frame of sample image, a bag-of-words vector corresponding to the sample image may be determined based on a trained dictionary model, and the bag-of-words vector is determined as the sample global descriptor corresponding to the sample image. The bag-of-words method is one way of determining a global descriptor: a bag-of-words vector is constructed as a vector representation used for image similarity detection, and this bag-of-words vector can serve as the sample global descriptor corresponding to the sample image.
In the visual bag-of-words method, a "dictionary", also called a dictionary model, needs to be trained in advance, and generally, a classification tree is obtained by clustering feature point descriptors in a large number of images and training, each classification tree can represent a visual "word", and the visual "words" form the dictionary model.
For a sample image, all feature point descriptors in the sample image may be classified as words, and the occurrence frequency of all words is counted, so that the frequency of each word in a dictionary may form a vector, the vector is a bag-of-word vector corresponding to the sample image, the bag-of-word vector may be used to measure the similarity of two images, and the bag-of-word vector is used as a sample global descriptor corresponding to the sample image.
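A simplified sketch of turning the feature point descriptors of one image into such a bag-of-words vector, assuming the trained dictionary is available as a flat array of word centers (a real vocabulary tree adds a hierarchy and typically TF-IDF weighting):

```python
import numpy as np

def bag_of_words_vector(local_descriptors, word_centers):
    """local_descriptors: (N, D) descriptors of the image's feature points.
    word_centers: (W, D) cluster centers of the trained dictionary.
    Returns a normalized (W,) histogram of word frequencies that can serve as
    the image's global descriptor."""
    hist = np.zeros(len(word_centers))
    for desc in local_descriptors:
        word = int(np.argmin(np.linalg.norm(word_centers - desc, axis=1)))
        hist[word] += 1.0                        # count this word's occurrence
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```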
For each frame of sample image, the sample image may be input to a trained deep learning model to obtain a target vector corresponding to the sample image, and the target vector is determined as a sample global descriptor corresponding to the sample image. For example, a deep learning method is a method for determining a global descriptor, in the deep learning method, a sample image may be subjected to multilayer convolution through a deep learning model, and a high-dimensional target vector is finally obtained, and the target vector is used as the sample global descriptor corresponding to the sample image.
In the deep learning method, a deep learning model, such as a CNN (Convolutional Neural Networks) model, needs to be trained in advance, and the deep learning model is generally obtained by training a large number of images, and the training mode of the deep learning model is not limited. For a sample image, the sample image may be input to a deep learning model, the deep learning model processes the sample image to obtain a high-dimensional target vector, and the target vector is used as a sample global descriptor corresponding to the sample image.
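A toy PyTorch sketch of this idea: stacked convolutions followed by global pooling produce a high-dimensional, L2-normalized vector that can serve as the global descriptor. The architecture is purely illustrative and untrained.

```python
import torch
import torch.nn as nn

class GlobalDescriptorNet(nn.Module):
    """Maps an RGB image tensor to an L2-normalized global descriptor vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))             # global pooling to one vector

    def forward(self, image):                    # image: (B, 3, H, W) float tensor
        x = self.features(image).flatten(1)      # (B, dim)
        return nn.functional.normalize(x, dim=1)

# usage: descriptor = GlobalDescriptorNet()(torch.rand(1, 3, 480, 640))
```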
Sample local descriptors corresponding to feature points of the sample image: for each frame of sample image, the sample image may include a plurality of feature points, where a feature point may be a specific pixel position in the sample image, the feature point may correspond to an image local descriptor, and the image local descriptor is recorded as a sample local descriptor, where the sample local descriptor describes features of image blocks in a range near the feature point (i.e., the pixel position) with a vector, and the vector may also be referred to as a descriptor of the feature point. In summary, the sample local descriptor is a feature vector for representing an image block where the feature point is located, and the image block may be located in the sample image. It should be noted that, for a feature point (i.e., a two-dimensional feature point) in a sample image, the feature point may correspond to a map point (i.e., a three-dimensional map point) in a three-dimensional visual map, and therefore, the sample local descriptor corresponding to the feature point may also be a sample local descriptor corresponding to the map point corresponding to the feature point.
Algorithms such as ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) can be adopted to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points. A deep learning method (such as SuperPoint, DELF or D2-Net) may also be used to extract the feature points and determine the sample local descriptors, which is not limited, as long as the feature points can be obtained and the sample local descriptors can be determined.
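For instance, feature points and their local descriptors can be extracted with OpenCV's ORB implementation (any of the detectors listed above could be substituted):

```python
import cv2

def extract_local_features(gray_image, max_features=1000):
    """Returns feature-point pixel coordinates and their binary ORB descriptors,
    i.e., the local descriptors that would be stored alongside the map points."""
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    points = [kp.pt for kp in keypoints]         # (x, y) pixel positions
    return points, descriptors                   # descriptors: (N, 32) uint8
```

Note that binary ORB descriptors are normally compared with the Hamming distance rather than the Euclidean distance used for float descriptors.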
Map point information: the map point information may include, but is not limited to, the 3D spatial position of the map point, all the sample images in which the map point is observed, and the indices of the corresponding 2D feature points (i.e., the feature points corresponding to the map point) in those images.
And fourthly, global positioning of the server. Based on the acquired three-dimensional visual map of the target scene, after the server obtains the image to be detected, the server determines a target map point corresponding to the image to be detected from the three-dimensional visual map of the target scene, and determines a global positioning track of the terminal equipment in the three-dimensional visual map based on the target map point.
For each frame of image to be detected, the server may determine a global positioning pose corresponding to the image to be detected, assuming that M frames of images to be detected exist, the M frames of images to be detected correspond to M global positioning poses, and a global positioning track of the terminal device in the three-dimensional visual map may include M global positioning poses, which may be understood as a set of M global positioning poses. And determining the global positioning pose corresponding to the first frame of image to be detected in the M frames of images to be detected, determining the global positioning pose corresponding to the second frame of image to be detected in the second frame of image to be detected, and so on. For each global positioning pose, the global positioning pose is a pose point in the three-dimensional visual map, i.e. a pose point in the coordinate system of the three-dimensional visual map. In summary, after obtaining the global positioning poses corresponding to the M frames of images to be detected, the global positioning poses are combined into a global positioning track in the three-dimensional visual map, and the global positioning track includes the global positioning poses.
Based on the three-dimensional visual map of the target scene, in one possible implementation, referring to fig. 4, the server may determine the global positioning track of the terminal device in the three-dimensional visual map by using the following steps:
step 401, the server obtains an image to be detected of a target scene from the terminal device.
For example, the terminal device may acquire a target image comprising multiple frames of images, select M frames from them as images to be detected, and send the M frames of images to be detected to the server. For example, the multi-frame image includes key images and non-key images; on this basis, the terminal device may use the key images as the images to be detected and not use the non-key images. As another example, the terminal device may select images to be detected from the multiple frames at a fixed interval. Assuming the fixed interval is 5 (the interval may be configured according to experience and is not limited), the 1st frame is selected as an image to be detected, then the 6th (1+5) frame, then the 11th (6+5) frame, and so on, selecting one image to be detected every 5 frames.
Step 402, determining a global descriptor to be detected corresponding to each frame of image to be detected.
For each frame of image to be detected, the image to be detected may correspond to an image global descriptor, recorded as the global descriptor to be detected. The global descriptor to be detected represents the image to be detected with a high-dimensional vector and is used to distinguish the image features of different images to be detected.
And determining a bag-of-words vector corresponding to each frame of image to be detected based on the trained dictionary model, and determining the bag-of-words vector as a global descriptor to be detected corresponding to the image to be detected. Or, for each frame of image to be detected, inputting the image to be detected to the trained deep learning model to obtain a target vector corresponding to the image to be detected, and determining the target vector as a global descriptor to be detected corresponding to the image to be detected.
In summary, the global descriptor to be detected corresponding to the image to be detected may be determined based on a visual bag-of-words method or a deep learning method, and the determination manner refers to the determination manner of the sample global descriptor, which is not described herein again.
Step 403, determining, for each frame of image to be detected, a similarity between the global descriptor to be detected corresponding to the image to be detected and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map.
Referring to the above embodiment, the three-dimensional visual map may include a sample global descriptor corresponding to each frame of sample image; therefore, the similarity between the global descriptor to be detected and each sample global descriptor may be determined. Taking the similarity as a "distance similarity" as an example, the distance between the global descriptor to be detected and each sample global descriptor may be determined, such as a Euclidean distance, i.e., the Euclidean distance between the two feature vectors is calculated.
Step 404, selecting candidate sample images from the multi-frame sample images corresponding to the three-dimensional visual map based on the distance between the global descriptor to be detected and each sample global descriptor; the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance; or, the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than the distance threshold.
For example, assuming that the three-dimensional visual map corresponds to the sample image 1, the sample image 2, and the sample image 3, the distance 1 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 1 may be calculated, the distance 2 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 2 may be calculated, and the distance 3 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 3 may be calculated.
In one possible embodiment, if the distance 1 is the minimum distance, the sample image 1 is selected as the candidate sample image. Alternatively, if the distance 1 is smaller than the distance threshold (which may be configured empirically), and the distance 2 is smaller than the distance threshold, but the distance 3 is not smaller than the distance threshold, then both the sample image 1 and the sample image 2 are selected as candidate sample images. Or, if the distance 1 is the minimum distance and the distance 1 is smaller than the distance threshold, the sample image 1 is selected as the candidate sample image, but if the distance 1 is the minimum distance and the distance 1 is not smaller than the distance threshold, the candidate sample image cannot be selected, that is, the relocation fails.
In summary, for each frame of the image to be measured, the candidate sample image corresponding to the image to be measured may be selected from the multiple frame sample images corresponding to the three-dimensional visual map, where the number of the candidate sample images is at least one.
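The retrieval in steps 403-404 amounts to a nearest-neighbour search over global descriptors. The sketch below assumes the descriptors are already available as NumPy vectors; the function name and the optional threshold are illustrative only.

```python
import numpy as np

def select_candidate_samples(query_desc, sample_descs, distance_threshold=None):
    """Return indices of candidate sample images for one image to be detected.

    query_desc:   global descriptor of the image to be detected, shape (D,)
    sample_descs: global descriptors of all sample images, shape (S, D)
    If distance_threshold is None, only the closest sample image is returned;
    otherwise all sample images within the threshold are returned, and an
    empty result means relocation fails for this image.
    """
    # Euclidean distance between the query descriptor and every sample descriptor.
    distances = np.linalg.norm(np.asarray(sample_descs) - np.asarray(query_desc), axis=1)
    if distance_threshold is None:
        return [int(np.argmin(distances))]
    return [int(i) for i in np.where(distances < distance_threshold)[0]]
```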
Step 405, for each frame of image to be detected, obtaining a plurality of feature points from the image to be detected, and for each feature point, determining a local descriptor to be detected corresponding to the feature point, where the local descriptor to be detected represents a feature vector of the image block in which the feature point is located, and the image block is a local region of the image to be detected.
For example, the image to be measured may include a plurality of feature points, the feature points may be pixel positions having specificity in the image to be measured, the feature points may correspond to an image local descriptor, the image local descriptor is recorded as the local descriptor to be measured, the local descriptor to be measured describes the features of the image blocks in a range near the feature points (i.e., pixel positions) with a vector, and the vector may also be referred to as a descriptor of the feature points. To sum up, the local descriptor to be measured is a feature vector for representing an image block where the feature point is located.
The characteristic points can be extracted from the image to be detected by using algorithms such as ORB, SIFT, SURF and the like, and the local descriptors to be detected corresponding to the characteristic points are determined. A deep learning algorithm (such as SuperPoint, DELF, D2-Net, etc.) may also be used to extract feature points from the image to be detected and determine the local descriptor to be detected corresponding to the feature points, which is not limited to this, as long as the feature points can be obtained and the local descriptor to be detected can be determined.
Step 406, for each feature point corresponding to the image to be detected, determining a distance, such as a Euclidean distance, between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each map point of the candidate sample image corresponding to the image to be detected (i.e., the sample local descriptor of the map point corresponding to each feature point in the candidate sample image); that is, the Euclidean distance between the two feature vectors is calculated.
Referring to the above embodiment, for each frame of sample image, the three-dimensional visual map includes the sample local descriptor corresponding to each map point corresponding to the sample image, and therefore, after the candidate sample image corresponding to the image to be tested is obtained, the sample local descriptor corresponding to each map point corresponding to the candidate sample image is obtained from the three-dimensional visual map. After each feature point corresponding to the image to be detected is obtained, the distance between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image is determined.
Step 407, for each feature point, selecting a target map point from a plurality of map points corresponding to the candidate sample image based on the distance between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and the distance between the local descriptor to be detected and the sample local descriptor corresponding to the target map point is the minimum distance, and the minimum distance is smaller than the distance threshold.
For example, assuming that the candidate sample image corresponds to a map point 1, a map point 2, and a map point 3, a distance 1 between the local descriptor to be measured corresponding to the feature point and the sample local descriptor corresponding to the map point 1 may be calculated, a distance 2 between the local descriptor to be measured and the sample local descriptor corresponding to the map point 2 may be calculated, and a distance 3 between the local descriptor to be measured and the sample local descriptor corresponding to the map point 3 may be calculated.
In one possible implementation, if the distance 1 is the minimum distance, the map point 1 may be selected as the target map point. Alternatively, if the distance 1 is less than the distance threshold (which may be configured empirically), and the distance 2 is less than the distance threshold, but the distance 3 is not less than the distance threshold, then both the map point 1 and the map point 2 may be selected as the target map point. Or, if the distance 1 is the minimum distance and the distance 1 is smaller than the distance threshold, the map point 1 may be selected as the target map point, but if the distance 1 is the minimum distance and the distance 1 is not smaller than the distance threshold, the target map point cannot be selected, that is, the relocation fails.
In summary, for each feature point of the image to be detected, a target map point corresponding to the feature point is selected from the candidate sample image corresponding to the image to be detected, so as to obtain a matching relationship between the feature point and the target map point.
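Steps 405-407 amount to matching each feature point's local descriptor against the sample local descriptors of the candidate image's map points. A minimal sketch of this nearest-neighbour matching is given below; the array shapes, names and threshold value are assumptions for illustration.

```python
import numpy as np

def match_features_to_map_points(local_descs, map_point_descs, distance_threshold=0.7):
    """Build (feature index, map point index) matching relationship pairs.

    local_descs:     local descriptors of the image to be detected, shape (F, D)
    map_point_descs: sample local descriptors of the candidate image's map points, shape (Mp, D)
    A feature is matched to the closest map point only if that minimum
    distance is also below the threshold; otherwise the feature is discarded.
    """
    matches = []
    map_point_descs = np.asarray(map_point_descs)
    for f_idx, desc in enumerate(np.asarray(local_descs)):
        distances = np.linalg.norm(map_point_descs - desc, axis=1)
        best = int(np.argmin(distances))
        if distances[best] < distance_threshold:  # minimum distance must also pass the threshold
            matches.append((f_idx, best))
    return matches
```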
And 408, determining a global positioning pose in the three-dimensional visual map corresponding to the image to be detected based on the plurality of feature points corresponding to the image to be detected and the target map points corresponding to the plurality of feature points.
For a frame of image to be detected, the image to be detected may correspond to a plurality of feature points, and each feature point corresponds to a target map point; for example, the target map point corresponding to feature point 1 is map point 1, the target map point corresponding to feature point 2 is map point 2, and so on, so that a plurality of matching relationship pairs are obtained. Each matching relationship pair includes a feature point (i.e., a two-dimensional feature point) and a map point (i.e., a three-dimensional map point in the three-dimensional visual map): the feature point represents a two-dimensional position in the image to be detected, and the map point represents a three-dimensional position in the three-dimensional visual map. In other words, a matching relationship pair is a mapping from a two-dimensional position in the image to be detected to a three-dimensional position in the three-dimensional visual map.
If the total number of the matching relationship pairs does not meet the number requirement, the global positioning pose in the three-dimensional visual map corresponding to the image to be detected cannot be determined based on these matching relationship pairs. If the total number of the matching relationship pairs meets the number requirement (that is, the total number reaches a preset value), the global positioning pose in the three-dimensional visual map corresponding to the image to be detected can be determined based on the matching relationship pairs.
For example, a PnP (Perspective-n-Point) algorithm may be used to calculate the global positioning pose of the image to be detected in the three-dimensional visual map, and the calculation method is not limited. The input data of the PnP algorithm are the matching relationship pairs; each matching relationship pair includes a two-dimensional position in the image to be detected and a three-dimensional position in the three-dimensional visual map, and based on these matching relationship pairs, the pose of the image to be detected in the three-dimensional visual map, i.e. the global positioning pose, can be calculated by using the PnP algorithm.
In summary, for each frame of image to be detected, the global positioning pose in the three-dimensional visual map corresponding to the image to be detected is obtained, that is, the global positioning pose of the image to be detected in the three-dimensional visual map coordinate system is obtained.
In a possible implementation manner, after the plurality of matching relationship pairs are obtained, effective matching relationship pairs may first be found from them, and the global positioning pose of the image to be detected in the three-dimensional visual map is then calculated from the effective matching relationship pairs by using the PnP algorithm. For example, a RANSAC (RANdom SAmple Consensus) algorithm may be adopted to find the effective matching relationship pairs from all matching relationship pairs, and this process is not limited.
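As a hedged illustration of steps 406-408, OpenCV's RANSAC-based PnP solver can compute such a pose from 2D-3D matching pairs. The camera intrinsics, the assumption of undistorted images and the variable names are illustrative; the patent does not prescribe a particular implementation.

```python
import cv2
import numpy as np

def estimate_global_pose(points_2d, points_3d, camera_matrix):
    """Estimate the pose of the image to be detected in the 3D visual map.

    points_2d:     (K, 2) feature point positions in the image to be detected
    points_3d:     (K, 3) matched target map points in the 3D visual map
    camera_matrix: (3, 3) intrinsic matrix of the terminal device's camera (assumed known)
    Returns a 4x4 pose matrix, or None when the relocation fails.
    """
    if len(points_2d) < 4:          # PnP needs a minimum number of 2D-3D pairs
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix, None)        # None: assume undistorted (rectified) images
    if not ok or inliers is None:
        return None                 # too few effective matching relationship pairs
    rotation, _ = cv2.Rodrigues(rvec)    # rotation vector -> 3x3 rotation matrix
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = tvec.ravel()
    return pose
```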
And 409, generating a global positioning track of the terminal equipment in the three-dimensional visual map based on the global positioning poses corresponding to the M frames of images to be detected, wherein the global positioning track comprises a plurality of global positioning poses in the three-dimensional visual map. The server may obtain a global positioning track in the three-dimensional visual map, that is, a global positioning track in a coordinate system of the three-dimensional visual map, where the global positioning track may include global positioning poses corresponding to M frames of images to be detected, that is, the global positioning track may include M global positioning poses. Because the M frames of images to be detected are partial images selected from all the images, the global positioning track can include global positioning poses corresponding to a small number of images to be detected, that is, the server can obtain the global positioning track with a low frame rate.
Fifthly, fusion positioning of the server. After obtaining the self-positioning track with the high frame rate and the global positioning track with the low frame rate, the server fuses the self-positioning track with the high frame rate and the global positioning track with the low frame rate to obtain a fused positioning track with the high frame rate in a three-dimensional visual map coordinate system, namely a fused positioning track of the terminal equipment in the three-dimensional visual map. The fusion positioning track is a high frame rate pose in the three-dimensional visual map, the global positioning track is a low frame rate pose in the three-dimensional visual map, namely the frame rate of the fusion positioning track is higher than that of the global positioning track, and the number of the fusion positioning poses is larger than that of the global positioning poses.
Referring to fig. 5, a white solid circle represents a self-positioning pose, and the track formed by a plurality of self-positioning poses is called the self-positioning track, i.e. the self-positioning track includes a plurality of self-positioning poses. The self-positioning pose corresponding to the first frame image may be taken as the coordinate origin of the reference coordinate system S_L (the self-positioning coordinate system); recording the self-positioning pose corresponding to the first frame image as T^L_1, the pose T^L_1 coincides with the reference coordinate system S_L. Each self-positioning pose in the self-positioning track is a pose in the reference coordinate system S_L.
A gray solid circle represents a global positioning pose, and the track formed by a plurality of global positioning poses is called the global positioning track, i.e. the global positioning track includes a plurality of global positioning poses. Each global positioning pose in the global positioning track is a pose in the three-dimensional visual map coordinate system S_G, i.e. a global positioning pose under the three-dimensional visual map.
A white dotted circle represents a fusion positioning pose, and the track formed by a plurality of fusion positioning poses is called the fusion positioning track, i.e. the fusion positioning track includes a plurality of fusion positioning poses. Each fusion positioning pose in the fusion positioning track is a pose in the three-dimensional visual map coordinate system S_G, i.e. a fusion positioning pose under the three-dimensional visual map.
Referring to fig. 5, the target image includes multiple frames of images, each frame of image corresponds to a self-positioning pose, and a partial image is selected from the multiple frames of images as an image to be detected, and each frame of image to be detected corresponds to a global positioning pose, so that the number of self-positioning poses is greater than the number of global positioning poses. When the fusion positioning tracks are obtained based on the self-positioning tracks and the global positioning tracks, each self-positioning pose corresponds to one fusion positioning pose (namely, the self-positioning poses correspond to the fusion positioning poses one by one), namely, the number of the self-positioning poses is the same as that of the fusion positioning poses, and therefore, the number of the fusion positioning poses is larger than that of the global positioning poses.
In a possible implementation manner, the server may implement a track fusion function and a pose transformation function, as shown in fig. 6, the server may implement the track fusion function and the pose transformation function by the following steps to obtain a fusion positioning track of the terminal device in the three-dimensional visual map:
Step 601, selecting N self-positioning poses corresponding to a target time period from all the self-positioning poses included in the self-positioning track, and selecting P global positioning poses corresponding to the target time period from all the global positioning poses included in the global positioning track, where, for example, N may be greater than P.
For example, when fusing the self-positioning track and the global positioning track of the target time period, N self-positioning poses corresponding to the target time period (i.e. self-positioning poses determined based on the images acquired in the target time period) may be determined, and P global positioning poses corresponding to the target time period (i.e. global positioning poses determined based on the images acquired in the target time period) may be determined. As shown in fig. 5, the self-positioning poses lying between the two self-positioning poses that bound the target time period are taken as the N self-positioning poses corresponding to the target time period, and the global positioning poses lying between the two global positioning poses that bound the target time period are taken as the P global positioning poses corresponding to the target time period.
Step 602, determining N fusion positioning poses corresponding to the N self-positioning poses based on the N self-positioning poses and the P global positioning poses, wherein the N self-positioning poses correspond to the N fusion positioning poses one by one.
For example, referring to fig. 5, based on the N self-positioning poses and the P global positioning poses, the fusion positioning pose T^F_1 corresponding to the self-positioning pose T^L_1 may be determined, the fusion positioning pose T^F_2 corresponding to the self-positioning pose T^L_2 may be determined, the fusion positioning pose T^F_3 corresponding to the self-positioning pose T^L_3 may be determined, and so on.
In a possible implementation manner, it is assumed that there are N self-positioning poses, P global positioning poses, and N fusion positioning poses, where the N self-positioning poses and the P global positioning poses are known values, and the N fusion positioning poses are unknown values, i.e. the pose values to be solved. As shown in fig. 5, the self-positioning pose T^L_1 corresponds to the fusion positioning pose T^F_1, the self-positioning pose T^L_2 corresponds to the fusion positioning pose T^F_2, the self-positioning pose T^L_3 corresponds to the fusion positioning pose T^F_3, and so on. Similarly, each global positioning pose corresponds to the fusion positioning pose of the same frame: the global positioning pose T^G_1 corresponds to its fusion positioning pose, the global positioning pose T^G_2 corresponds to its fusion positioning pose, and so on.
A first constraint value may be determined based on the N self-positioning poses and the N fusion positioning poses; the first constraint value is used to represent the residual values between the fusion positioning poses and the self-positioning poses. For example, the first constraint value may be calculated based on the difference between T^F_1 and T^L_1, the difference between T^F_2 and T^L_2, ..., and the difference between T^F_N and T^L_N. The calculation formula of the first constraint value is not limited in this embodiment, and may be related to the above differences.
A second constraint value may be determined based on the P global positioning poses and the P fusion positioning poses (i.e. the P fusion positioning poses corresponding to the P global positioning poses, selected from the N fusion positioning poses); the second constraint value is used to represent the residual values (i.e. absolute differences) between the fusion positioning poses and the global positioning poses. For example, the second constraint value may be calculated based on the difference between each of the P global positioning poses and its corresponding fusion positioning pose. The calculation formula of the second constraint value is not limited in this embodiment, and may be related to the above differences.
The target constraint value may be calculated based on the first constraint value and the second constraint value, e.g., the target constraint value may be the sum of the first constraint value and the second constraint value. Because the N self-positioning poses and the P global positioning poses are known values and the N fusion positioning poses are unknown values, the target constraint value is minimum by adjusting the values of the N fusion positioning poses. And when the target constraint value is minimum, the values of the N fusion positioning poses are the pose values finally solved, so that the values of the N fusion positioning poses are obtained.
In one possible implementation, the target constraint value may be calculated using equation (1):
F(T) = \sum_{i \in N} e_{i,i+1}^{\top} \Omega_{i,i+1} e_{i,i+1} + \sum_{k \in P} e_k^{\top} \Omega_k e_k        (1)
In formula (1), F(T) represents the target constraint value, the part before the plus sign (hereinafter referred to as the first part) is the first constraint value, and the part after the plus sign (hereinafter referred to as the second part) is the second constraint value.
\Omega_{i,i+1} is the residual information matrix for the self-positioning poses, and \Omega_k is the residual information matrix for the global positioning poses; both may be configured according to experience and are not limited here.
The first part represents relative transformation constraint of the self-positioning pose and the fusion positioning pose and can reflect through a first constraint value, and N is all self-positioning poses in the self-positioning track, namely N self-positioning poses. The second part represents global positioning constraints of the global positioning pose and the fusion positioning pose, and can be reflected by a second constraint value, wherein P is all global positioning poses in the global positioning track, namely P global positioning poses.
The first part and the second part can also be expressed by formula (2) and formula (3):
e_{i,i+1} = \Delta T^L_{i,i+1} \ominus \left( (T^F_i)^{-1} T^F_{i+1} \right)        (2)
e_k = T^F_k \ominus T^G_k        (3)
In formula (2) and formula (3), T^F_i and T^F_{i+1} are fusion positioning poses (without a corresponding global positioning pose), T^L_i and T^L_{i+1} are self-positioning poses, \Delta T^L_{i,i+1} = (T^L_i)^{-1} T^L_{i+1} is the relative pose change constraint between the two self-positioning poses, and e_{i,i+1} is the residual between the relative pose change of T^F_i and T^F_{i+1} and the constraint \Delta T^L_{i,i+1}. T^F_k is a fusion positioning pose (with a corresponding global positioning pose), T^G_k is the global positioning pose corresponding to T^F_k, and e_k represents the residual of the fusion positioning pose T^F_k with respect to the global positioning pose T^G_k.
Because the self-positioning poses and the global positioning poses are known and the fusion positioning poses are unknown, the optimization goal is to minimize the value of F(T), as shown in formula (4): T* = argmin_T F(T). By minimizing the value of F(T), the fusion positioning track under the three-dimensional visual map coordinate system can be obtained, and the fusion positioning track may include a plurality of fusion positioning poses.
For example, in order to minimize the value of F(T), algorithms such as Gauss-Newton, gradient descent, or LM (Levenberg-Marquardt) may be used to solve for the fusion positioning poses, which is not described in detail here.
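Purely as an illustrative sketch (not the patent's implementation), a pose-graph style objective of this shape can be minimized with an off-the-shelf least-squares solver. Here the poses are simplified to 2D positions and the information matrices to identity so that the two residual terms stay visible; all names and shapes are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def fuse_tracks(self_poses, global_poses, global_indices):
    """Solve for N fused poses from N self-positioning poses and P global poses.

    self_poses:     (N, 2) self-positioning positions (poses simplified to 2D points)
    global_poses:   (P, 2) global positioning positions in the map frame
    global_indices: indices (length P) of the frames that have a global pose
    """
    self_poses = np.asarray(self_poses, dtype=float)
    global_poses = np.asarray(global_poses, dtype=float)
    global_indices = np.asarray(global_indices, dtype=int)
    n = len(self_poses)

    def residuals(x):
        fused = x.reshape(n, 2)
        # First part: relative-motion residuals between consecutive fused poses
        # and the relative motion given by the self-positioning track.
        rel = (fused[1:] - fused[:-1]) - (self_poses[1:] - self_poses[:-1])
        # Second part: absolute residuals between fused poses and global poses.
        glob = fused[global_indices] - global_poses
        return np.concatenate([rel.ravel(), glob.ravel()])

    # Initialize by shifting the self-positioning track onto the first global pose.
    offset = global_poses[0] - self_poses[global_indices[0]]
    x0 = (self_poses + offset).ravel()
    result = least_squares(residuals, x0)  # nonlinear least squares (TRF by default; LM also possible)
    return result.x.reshape(n, 2)
```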
And 603, generating a fusion positioning track of the terminal device in the three-dimensional visual map based on the N fusion positioning poses, wherein the fusion positioning track comprises the N fusion positioning poses in the three-dimensional visual map.
The server obtains a fusion positioning track in the three-dimensional visual map, namely the fusion positioning track in the coordinate system of the three-dimensional visual map, wherein the number of fusion positioning poses in the fusion positioning track is greater than that in the global positioning track, namely the fusion positioning track with a high frame rate can be obtained.
And 604, selecting an initial fusion positioning pose from the fusion positioning track, and selecting an initial self-positioning pose corresponding to the initial fusion positioning pose from the self-positioning track.
And 605, selecting a target self-positioning pose from the self-positioning track, and determining a target fusion positioning pose based on the initial fusion positioning pose, the initial self-positioning pose and the target self-positioning pose.
For example, after the fused positioning track is generated, the fused positioning track may be updated, and in the track updating process, an initial fused positioning pose may be selected from the fused positioning track, an initial self-positioning pose may be selected from the self-positioning track, and a target self-positioning pose may be selected from the self-positioning track. On this basis, a target fusion localization pose may be determined based on the initial fusion localization pose, the initial self-localization pose, and the target self-localization pose. A new fused localization track may then be generated based on the target fused localization pose and the fused localization track to replace the original fused localization track.
For example, after steps 601-603, referring to fig. 5, the self-positioning track includes T^L_1 to T^L_N, the global positioning track includes T^G_1 to T^G_P, and the fusion positioning track includes T^F_1 to T^F_N. After that, if a new self-positioning pose T^L_j is obtained but has no corresponding global positioning pose, the fusion positioning pose T^F_j corresponding to the self-positioning pose T^L_j cannot be determined on the basis of a global positioning pose and the self-positioning pose T^L_j. On this basis, in this embodiment, the fusion positioning pose T^F_j may also be determined according to the following formula (5):
T^F_j = T^F_i \cdot (T^L_i)^{-1} \cdot T^L_j        (5)
In formula (5), T^F_j represents the fusion positioning pose corresponding to the self-positioning pose T^L_j, i.e. the target fusion positioning pose; T^F_i represents a fusion positioning pose, i.e. the initial fusion positioning pose selected from the fusion positioning track; T^L_i represents the initial self-positioning pose selected from the self-positioning track and corresponding to T^F_i; and T^L_j represents the target self-positioning pose selected from the self-positioning track. In summary, the target fusion positioning pose T^F_j can be determined based on the initial fusion positioning pose T^F_i, the initial self-positioning pose T^L_i, and the target self-positioning pose T^L_j. After the target fusion positioning pose T^F_j is obtained, a new fusion positioning track may be generated, i.e. the new fusion positioning track may include the target fusion positioning pose T^F_j, thereby updating the fusion positioning track.
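A minimal sketch of this pose-transformation update, assuming 4x4 homogeneous pose matrices and the relation reconstructed above (the exact formula in the original filing is an image and is therefore an assumption here):

```python
import numpy as np

def transform_new_pose(initial_fused_pose, initial_self_pose, target_self_pose):
    """Propagate a new self-positioning pose into the map frame.

    All inputs are 4x4 homogeneous pose matrices:
    initial_fused_pose: fused pose already expressed in the 3D visual map frame
    initial_self_pose:  self-positioning pose of the same frame (self-positioning frame)
    target_self_pose:   new self-positioning pose without a global positioning result
    The relative motion (T^L_i)^-1 * T^L_j measured in the self-positioning frame
    is applied on top of the fused pose to obtain the target fused pose.
    """
    relative_motion = np.linalg.inv(initial_self_pose) @ target_self_pose
    return initial_fused_pose @ relative_motion
```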
In the above process, steps 601-603 are the track fusion process, and steps 604-605 are the pose transformation process. Track fusion registers and fuses the self-positioning track and the global positioning track, so as to convert the self-positioning track from the self-positioning coordinate system to the three-dimensional visual map coordinate system and correct the track with the global positioning result; track fusion is performed once whenever a new frame obtains a global positioning result. Because not all frames can successfully obtain a global positioning result, the poses of the remaining frames are fusion positioning poses in the three-dimensional visual map coordinate system output by pose transformation, i.e. the pose transformation process.
Sixthly, the three-dimensional visualization map of the target scene. The three-dimensional visualization map of the target scene needs to be constructed in advance and is stored in the server, and the server can display the track based on the three-dimensional visualization map. The three-dimensional visualization map is a 3D visualization map of the target scene that is mainly used for track display; it can be obtained through laser scanning and manual modeling, and is a visual map that can be viewed. The construction mode of the three-dimensional visualization map is not limited, and it can also be obtained by using a map construction algorithm.
The three-dimensional visualization map needs to be registered with the three-dimensional visual map of the target scene, so that the two maps are spatially aligned. For example, the three-dimensional visualization map is sampled so that it changes from a triangular patch form to a dense point cloud form, and this point cloud is registered with the 3D point cloud of the three-dimensional visual map through an ICP (Iterative Closest Point) algorithm to obtain a transformation matrix T from the three-dimensional visual map to the three-dimensional visualization map; finally, the transformation matrix T is used to transform the three-dimensional visual map into the three-dimensional visualization map coordinate system, obtaining a three-dimensional visual map aligned with the three-dimensional visualization map.
For example, the transformation matrix T (denoted as target transformation matrix) may be determined as follows:
Mode 1: when the three-dimensional visual map and the three-dimensional visualization map are constructed, a plurality of calibration points may be deployed in the target scene (different calibration points can be distinguished by different shapes, so that they can be recognized from images); both the three-dimensional visual map and the three-dimensional visualization map may then include the plurality of calibration points. For each of the plurality of calibration points, a coordinate pair corresponding to the calibration point may be determined, the coordinate pair including the position coordinate of the calibration point in the three-dimensional visual map and the position coordinate of the calibration point in the three-dimensional visualization map. The target transformation matrix can then be determined based on the coordinate pairs corresponding to the plurality of calibration points. For example, the target transformation matrix T may be an m × n-dimensional transformation matrix, and the transformation relationship between the two maps may be W = Q × T, where Q represents a position coordinate in the three-dimensional visual map and W represents the corresponding position coordinate in the three-dimensional visualization map. The coordinate pairs corresponding to the plurality of calibration points are substituted into this formula (i.e., the position coordinate of the calibration point in the three-dimensional visual map is taken as Q, and its position coordinate in the three-dimensional visualization map is taken as W), so that the target transformation matrix T can be solved, which is not described again.
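Under the assumption that coordinates are expressed as homogeneous row vectors and T maps visual-map coordinates Q to visualization-map coordinates W (the W = Q × T relation above), the calibration-point equations can be solved in a least-squares sense; the shapes and names below are illustrative only.

```python
import numpy as np

def solve_target_transform(points_visual, points_visualization):
    """Solve T in W = Q @ T from matched calibration points.

    points_visual:        (C, 3) calibration point coordinates in the 3D visual map
    points_visualization: (C, 3) the same points' coordinates in the 3D visualization map
    Homogeneous coordinates are used so the 4x4 matrix T can carry rotation,
    scale and translation; at least 4 non-coplanar calibration points are needed.
    """
    ones = np.ones((len(points_visual), 1))
    q = np.hstack([np.asarray(points_visual, dtype=float), ones])          # (C, 4) source
    w = np.hstack([np.asarray(points_visualization, dtype=float), ones])   # (C, 4) target
    t, *_ = np.linalg.lstsq(q, w, rcond=None)  # least-squares solution of Q @ T = W
    return t

# Hypothetical usage: convert a fused pose position from the visual map frame
# into the visualization map frame for display.
# w_point = np.append(q_point, 1.0) @ T
```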
Mode 2: an initial transformation matrix is acquired, the position coordinates in the three-dimensional visual map are mapped into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix, and whether the initial transformation matrix has converged is determined based on the relationship between the mapping coordinates and the actual coordinates in the three-dimensional visualization map. If so, the initial transformation matrix is determined as the target transformation matrix, thus obtaining the target transformation matrix; if not, the initial transformation matrix can be adjusted, the adjusted transformation matrix is used as the initial transformation matrix, and the operation of mapping the position coordinates in the three-dimensional visual map into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained.
For example, an initial transformation matrix may be obtained first, the obtaining method of the initial transformation matrix is not limited, and the initial transformation matrix may be an initial transformation matrix set randomly or an initial transformation matrix obtained by using a certain algorithm, where the initial transformation matrix is a matrix that needs iterative optimization, that is, the initial transformation matrix is continuously iteratively optimized, and the initial transformation matrix after iterative optimization is used as a target transformation matrix.
After the initial transformation matrix is obtained, the position coordinates in the three-dimensional visual map can be mapped to mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix. For example, using the transformation relationship W = Q × T, the position coordinates in the three-dimensional visual map are taken as Q and the initial transformation matrix as T, so that the corresponding coordinates in the three-dimensional visualization map (referred to as mapping coordinates for ease of distinction) are obtained. Then, whether the initial transformation matrix has converged is determined based on the relationship between the mapping coordinates and the actual coordinates in the three-dimensional visualization map. The mapping coordinates are coordinates converted based on the initial transformation matrix, while the actual coordinates are the real coordinates in the three-dimensional visualization map; the smaller the difference between the mapping coordinates and the actual coordinates, the higher the accuracy of the initial transformation matrix, and the larger the difference, the lower the accuracy. Based on this principle, whether the initial transformation matrix has converged can be determined from the difference between the mapping coordinates and the actual coordinates.
For example, if the difference between the mapped coordinates and the actual coordinates (which may be the sum of multiple sets of differences, each set of differences corresponds to the difference between one mapped coordinate and the actual coordinates) is smaller than a threshold, it is determined that the initial transformation matrix has converged, and if the difference between the mapped coordinates and the actual coordinates is not smaller than the threshold, it is determined that the initial transformation matrix has not converged.
If the initial transformation matrix has not converged, the initial transformation matrix can be adjusted; the adjustment process is not limited, for example, the initial transformation matrix may be adjusted by using an ICP (Iterative Closest Point) algorithm. The adjusted transformation matrix is then used as the initial transformation matrix, the operation of mapping the position coordinates in the three-dimensional visual map into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained. If the initial transformation matrix has converged, the initial transformation matrix is determined as the target transformation matrix.
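The convergence check in mode 2 can be sketched as a simple iterative loop; the adjustment step is left abstract here (the patent allows, for example, an ICP-style update), and the threshold is an assumed value.

```python
import numpy as np

def refine_transform(points_visual, points_visualization, t_init, adjust_fn,
                     threshold=1e-3, max_iters=100):
    """Iteratively refine an initial transformation matrix until the mapping error converges.

    adjust_fn(t, points_visual, points_visualization) -> adjusted 4x4 matrix
    is a caller-supplied update step (e.g. one ICP-like refinement iteration).
    """
    ones = np.ones((len(points_visual), 1))
    q = np.hstack([np.asarray(points_visual, dtype=float), ones])
    w = np.hstack([np.asarray(points_visualization, dtype=float), ones])
    t = t_init
    for _ in range(max_iters):
        error = np.linalg.norm(q @ t - w)  # difference between mapping and actual coordinates
        if error < threshold:              # converged: accept as the target transformation matrix
            return t
        t = adjust_fn(t, points_visual, points_visualization)
    return t                               # fall back to the last estimate
```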
Mode 3: the three-dimensional visualization map is sampled to obtain a first point cloud corresponding to the three-dimensional visualization map, and the three-dimensional visual map is sampled to obtain a second point cloud corresponding to the three-dimensional visual map. The first point cloud and the second point cloud are then registered by using an ICP (Iterative Closest Point) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map. Obviously, the first point cloud and the second point cloud can both be obtained, each including a large number of 3D points; based on these 3D points, registration can be performed with the ICP algorithm, and the registration process is not limited.
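A hedged sketch of mode 3 using the Open3D library's point-to-point ICP (one of several libraries that could be used; the correspondence distance and the identity initial guess are placeholders):

```python
import numpy as np
import open3d as o3d

def register_maps_icp(points_visualization, points_visual, max_corr_dist=0.5):
    """Estimate the target transformation matrix by ICP point cloud registration.

    points_visualization: (A, 3) points sampled from the 3D visualization map
    points_visual:        (B, 3) points sampled from the 3D visual map
    Returns a 4x4 matrix mapping visual-map coordinates to visualization-map coordinates.
    """
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(np.asarray(points_visual, dtype=np.float64))
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(np.asarray(points_visualization, dtype=np.float64))
    # Point-to-point ICP starting from an identity initial guess.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```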
Seventhly, track display. After the server obtains the fusion positioning track, for each fusion positioning pose, the server can convert the fusion positioning pose into a target positioning pose in the three-dimensional visualization map based on the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. On this basis, a manager can open a Web browser and access the server through the network to view the target positioning poses displayed in the three-dimensional visualization map; these target positioning poses form a track. The server displays the target positioning poses of the terminal device on the three-dimensional visualization map by reading and rendering the three-dimensional visualization map, so that the manager can view them and can change the viewing angle by dragging the mouse, achieving 3D viewing of the track. For example, the server includes client software which reads and renders the three-dimensional visualization map and displays the target positioning poses on it; a user (such as a manager) can access the client software through a Web browser to view the target positioning poses displayed in the three-dimensional visualization map, and can change the viewing angle of the three-dimensional visualization map by dragging the mouse.
According to the technical scheme, the cloud-edge combined positioning and displaying method is provided, the terminal device calculates the self-positioning track with the high frame rate, only the self-positioning track and a small number of images to be detected are sent, and the data volume of network transmission is reduced. And global positioning is carried out on the server, so that the consumption of computing resources and the consumption of storage resources of the terminal equipment are reduced. By adopting a cloud-edge integrated system architecture, the computing pressure can be shared, the hardware cost of the terminal equipment is reduced, and the network transmission data volume is reduced. The final positioning result can be displayed in a three-dimensional visual map, and managers can access the server through a Web end to carry out interactive display.
Based on the same application concept as the method, the embodiment of the application provides a cloud side management system, the cloud side management system comprises a terminal device and a server, the server comprises a three-dimensional visual map of a target scene, wherein: the terminal device is used for acquiring a target image of a target scene and motion data of the terminal device in the moving process of the target scene, and determining a self-positioning track of the terminal device based on the target image and the motion data; if the target image comprises a plurality of frames of images, selecting a partial image from the plurality of frames of images as an image to be detected, and sending the image to be detected and the self-positioning track to a server; the server is used for generating a fused positioning track of the terminal equipment in the three-dimensional visual map based on the image to be detected and the self-positioning track, and the fused positioning track comprises a plurality of fused positioning poses; and aiming at each fusion positioning pose in the fusion positioning track, determining a target positioning pose corresponding to the fusion positioning pose, and displaying the target positioning pose.
Illustratively, the terminal device includes a vision sensor and a motion sensor; the vision sensor is used for acquiring a target image of the target scene, and the motion sensor is used for acquiring motion data of the terminal equipment; wherein the terminal device is a wearable device and the visual sensor and the motion sensor are deployed on the wearable device; or the terminal equipment is a recorder, and the vision sensor and the motion sensor are arranged on the recorder; or, the terminal device is a camera, and the vision sensor and the motion sensor are disposed on the camera.
For example, when the server generates the fused positioning track of the terminal device in the three-dimensional visual map based on the image to be detected and the self-positioning track, the server is specifically configured to:
determining a target map point corresponding to the image to be detected from the three-dimensional visual map, and determining a global positioning track of the terminal equipment in the three-dimensional visual map based on the target map point;
generating a fused positioning track of the terminal equipment in the three-dimensional visual map based on the self-positioning track and the global positioning track; the frame rate of the fusion positioning poses included by the fusion positioning track is greater than the frame rate of the global positioning poses included by the global positioning track; the frame rate of the fused localization poses included in the fused localization tracks is equal to the frame rate of the self-localization poses included in the self-localization tracks.
Illustratively, when determining the target positioning pose corresponding to a fusion positioning pose and displaying the target positioning pose, the server is specifically configured to: convert the fusion positioning pose into a target positioning pose in the three-dimensional visualization map based on the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map;
the server comprises client software, and the client software reads and renders the three-dimensional visualization map and displays the target positioning pose on the three-dimensional visualization map;
a user accesses the client software through a Web browser so as to view the target positioning pose displayed in the three-dimensional visualization map through the client software;
when the target positioning pose displayed in the three-dimensional visualization map is viewed through the client software, the viewing angle of the three-dimensional visualization map can be changed by dragging the mouse.
Based on the same application concept as the method, the embodiment of the present application provides a pose display apparatus, which is applied to a server in a cloud edge management system, where the server includes a three-dimensional visual map of a target scene, as shown in fig. 7, and is a structure diagram of the pose display apparatus, and the pose display apparatus includes:
an obtaining module 71, configured to obtain an image to be detected and a self-positioning track; the self-positioning track is determined by terminal equipment based on a target image of the target scene and motion data of the terminal equipment, and the image to be detected is a partial image in a multi-frame image included in the target image; a generating module 72, configured to generate a fused positioning track of the terminal device in the three-dimensional visual map based on the image to be detected and the self-positioning track, where the fused positioning track includes multiple fused positioning poses; and the display module 73 is configured to determine, for each fusion positioning pose in the fusion positioning trajectory, a target positioning pose corresponding to the fusion positioning pose, and display the target positioning pose.
For example, the generating module 72 is specifically configured to, when generating the fused positioning track of the terminal device in the three-dimensional visual map based on the image to be detected and the self-positioning track: determining a target map point corresponding to the image to be detected from a three-dimensional visual map, and determining a global positioning track of the terminal equipment in the three-dimensional visual map based on the target map point; generating a fused positioning track of the terminal equipment in the three-dimensional visual map based on the self-positioning track and the global positioning track; the frame rate of the fusion positioning poses included in the fusion positioning track is greater than the frame rate of the global positioning poses included in the global positioning track; the frame rate of the fused localization poses included in the fused localization tracks is equal to the frame rate of the self-localization poses included in the self-localization tracks.
Illustratively, the three-dimensional visual map includes at least one of: a pose matrix corresponding to the sample image, a sample global descriptor corresponding to the sample image, a sample local descriptor corresponding to the characteristic point in the sample image and map point information; the generating module 72 determines a target map point corresponding to the image to be measured from the three-dimensional visual map, and when determining the global positioning track of the terminal device in the three-dimensional visual map based on the target map point, is specifically configured to: selecting candidate sample images from the multi-frame sample images according to the similarity between each frame of image to be detected and the multi-frame sample images corresponding to the three-dimensional visual map; acquiring a plurality of feature points from an image to be detected; for each feature point, determining a target map point corresponding to the feature point from a plurality of map points corresponding to the candidate sample image; determining a global positioning pose in a three-dimensional visual map corresponding to the image to be detected based on the plurality of feature points and target map points corresponding to the plurality of feature points; and generating a global positioning track of the terminal equipment in the three-dimensional visual map based on the global positioning poses corresponding to all the images to be measured.
For example, the generating module 72 is specifically configured to, when generating the fused localization track of the terminal device in the three-dimensional visual map based on the self-localization track and the global localization track: selecting N self-positioning poses corresponding to a target time period from all self-positioning poses included in a self-positioning track, and selecting P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning track; n is greater than P; determining N fusion positioning poses corresponding to the N self-positioning poses based on the N self-positioning poses and the P global positioning poses, wherein the N self-positioning poses correspond to the N fusion positioning poses one by one; and generating a fusion positioning track of the terminal equipment in the three-dimensional visual map based on the N fusion positioning poses.
For example, when determining the target positioning pose corresponding to a fusion positioning pose and displaying the target positioning pose, the display module 73 is specifically configured to: convert the fusion positioning pose into a target positioning pose in the three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. The display module 73 is further configured to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map by: for each of a plurality of calibration points, determining a coordinate pair corresponding to the calibration point, the coordinate pair including the position coordinate of the calibration point in the three-dimensional visual map and the position coordinate of the calibration point in the three-dimensional visualization map, and determining the target transformation matrix based on the coordinate pairs corresponding to the plurality of calibration points; or acquiring an initial transformation matrix, mapping the position coordinates in the three-dimensional visual map into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determining whether the initial transformation matrix has converged based on the relationship between the mapping coordinates and the actual coordinates in the three-dimensional visualization map; if so, determining the initial transformation matrix as the target transformation matrix; if not, adjusting the initial transformation matrix, taking the adjusted transformation matrix as the initial transformation matrix, and returning to the operation of mapping the position coordinates in the three-dimensional visual map into mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix; or sampling the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sampling the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and registering the first point cloud and the second point cloud by using an ICP (Iterative Closest Point) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
Based on the same application concept as the method, the embodiment of the present application provides a server, where the server may include: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is used for executing machine executable instructions to realize the pose display method disclosed by the above example of the application.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where a plurality of computer instructions are stored, and when the computer instructions are executed by a processor, the pose display method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. A pose display method, applied to a cloud-edge management system, wherein the cloud-edge management system comprises a terminal device and a server, and the server comprises a three-dimensional visual map of a target scene, the method comprising:

during movement in the target scene, acquiring, by the terminal device, a target image of the target scene and motion data of the terminal device, and determining a self-positioning track of the terminal device based on the target image and the motion data; if the target image comprises multiple frames of images, selecting part of the images from the multiple frames of images as images to be detected, and sending the images to be detected and the self-positioning track to the server;

generating, by the server, a fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track, wherein the fusion positioning track comprises a plurality of fusion positioning poses; and

for each fusion positioning pose in the fusion positioning track, determining, by the server, a target positioning pose corresponding to the fusion positioning pose, and displaying the target positioning pose.

2. The method according to claim 1, wherein determining, by the terminal device, the self-positioning track of the terminal device based on the target image and the motion data comprises:

traversing, by the terminal device, a current frame image from the multiple frames of images; determining a self-positioning pose corresponding to the current frame image based on the self-positioning poses corresponding to the K frames of images preceding the current frame image, a map position of the terminal device in a self-positioning coordinate system, and the motion data; and generating the self-positioning track of the terminal device in the self-positioning coordinate system based on the self-positioning poses corresponding to the multiple frames of images;

wherein, if the current frame image is a key image, the map position in the self-positioning coordinate system is generated based on the current position of the terminal device; and if the number of matched feature points between the current frame image and the previous frame image does not reach a preset threshold, the current frame image is determined to be a key image.
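Illustrative note (not part of the claims): the key-image test in claim 2 can be pictured with the minimal Python sketch below, which counts matched feature points between the current frame and the previous frame and flags the current frame as a key image when the count falls below a preset threshold. The ORB features, the Hamming-distance matcher, and the threshold value are assumptions of this sketch, not details fixed by the claims.

import cv2

MATCH_THRESHOLD = 50  # hypothetical preset threshold for matched feature points

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def is_key_image(prev_gray, curr_gray):
    # A frame becomes a key image when it matches too few feature points
    # against the previous frame (the preset-threshold condition of claim 2).
    _, des_prev = orb.detectAndCompute(prev_gray, None)
    _, des_curr = orb.detectAndCompute(curr_gray, None)
    if des_prev is None or des_curr is None:
        return True  # no descriptors at all: treat the frame as a key image
    matches = matcher.match(des_prev, des_curr)
    return len(matches) < MATCH_THRESHOLD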
3. The method according to claim 1, wherein generating, by the server, the fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track comprises:

determining, from the three-dimensional visual map, target map points corresponding to the images to be detected, and determining a global positioning track of the terminal device in the three-dimensional visual map based on the target map points; and

generating the fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track; wherein the frame rate of the fusion positioning poses included in the fusion positioning track is greater than the frame rate of the global positioning poses included in the global positioning track, and the frame rate of the fusion positioning poses included in the fusion positioning track is equal to the frame rate of the self-positioning poses included in the self-positioning track.

4. The method according to claim 3, wherein the three-dimensional visual map comprises at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to the sample images, sample local descriptors corresponding to feature points in the sample images, and map point information; and determining, by the server, the target map points corresponding to the images to be detected from the three-dimensional visual map and determining the global positioning track of the terminal device in the three-dimensional visual map based on the target map points comprises:

for each frame of the images to be detected, selecting, by the server, a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the image to be detected and the multiple frames of sample images;

obtaining, by the server, a plurality of feature points from the image to be detected, and, for each feature point, determining a target map point corresponding to the feature point from the plurality of map points corresponding to the candidate sample image;

determining a global positioning pose of the image to be detected in the three-dimensional visual map based on the plurality of feature points and the target map points corresponding to the plurality of feature points; and generating the global positioning track of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all the images to be detected.
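Illustrative note (not part of the claims): the retrieval-plus-matching flow of claim 4 can be sketched as follows. A candidate sample image is chosen by global-descriptor similarity, and a global positioning pose is then estimated from the 2D feature points and the 3D target map points they were matched to. The cosine-similarity retrieval and the OpenCV RANSAC PnP solver are assumed stand-ins; the application does not commit to these specific choices.

import numpy as np
import cv2

def retrieve_candidate(query_gdesc, sample_gdescs):
    # Pick the sample image whose global descriptor is most similar (cosine) to the query.
    sims = sample_gdescs @ query_gdesc / (
        np.linalg.norm(sample_gdescs, axis=1) * np.linalg.norm(query_gdesc) + 1e-12)
    return int(np.argmax(sims))

def global_pose_from_matches(pts2d, pts3d, camera_matrix):
    # Estimate the global positioning pose from 2D feature points and their
    # corresponding 3D target map points using RANSAC PnP.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        camera_matrix, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # world-to-camera rotation/translation; invert for the camera pose in the map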
5. The method according to claim 3, wherein generating, by the server, the fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track comprises:

selecting, by the server, N self-positioning poses corresponding to a target time period from all the self-positioning poses included in the self-positioning track, and selecting P global positioning poses corresponding to the target time period from all the global positioning poses included in the global positioning track, wherein N is greater than P;

determining, based on the N self-positioning poses and the P global positioning poses, N fusion positioning poses corresponding to the N self-positioning poses, the N self-positioning poses corresponding to the N fusion positioning poses one to one; and

generating the fusion positioning track of the terminal device in the three-dimensional visual map based on the N fusion positioning poses.

6. The method according to claim 1, wherein determining, by the server, the target positioning pose corresponding to the fusion positioning pose and displaying the target positioning pose comprises: converting the fusion positioning pose into a target positioning pose in a three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and displaying the target positioning pose through the three-dimensional visualization map; wherein the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map is determined in one of the following ways:

for each of a plurality of calibration points, determining a coordinate pair corresponding to the calibration point, the coordinate pair comprising the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determining the target transformation matrix based on the coordinate pairs corresponding to the plurality of calibration points;

or, obtaining an initial transformation matrix, mapping the position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determining, based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map, whether the initial transformation matrix has converged; if so, determining the initial transformation matrix as the target transformation matrix; if not, adjusting the initial transformation matrix, taking the adjusted transformation matrix as the initial transformation matrix, and returning to the operation of mapping the position coordinates in the three-dimensional visual map to the mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix;

or, sampling the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sampling the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and registering the first point cloud and the second point cloud by using an ICP algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.

7. A cloud-edge management system, comprising a terminal device and a server, wherein the server comprises a three-dimensional visual map of a target scene, and wherein:

the terminal device is configured to, during movement in the target scene, acquire a target image of the target scene and motion data of the terminal device, and determine a self-positioning track of the terminal device based on the target image and the motion data; and, if the target image comprises multiple frames of images, select part of the images from the multiple frames of images as images to be detected, and send the images to be detected and the self-positioning track to the server; and

the server is configured to generate a fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track, wherein the fusion positioning track comprises a plurality of fusion positioning poses; and, for each fusion positioning pose in the fusion positioning track, determine a target positioning pose corresponding to the fusion positioning pose, and display the target positioning pose.
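Illustrative note (not part of the claims): one simple way to realize the fusion described in claim 5 is to estimate a rigid transform from the P time-matched (self-positioning, global positioning) position pairs and apply it to all N self-positioning positions, so the fused track keeps the self-positioning frame rate while being expressed in the map frame. The Kabsch/SVD alignment below is an assumption of this sketch; the claims do not prescribe a particular fusion scheme.

import numpy as np

def rigid_align(src, dst):
    # Least-squares rotation R and translation t with R @ src_i + t ~= dst_i (both P x 3).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def fuse_trajectory(self_positions_N, self_positions_P, global_positions_P):
    # Map all N self-positioning positions into the map frame using the transform
    # estimated from the P time-matched (self, global) position pairs.
    R, t = rigid_align(np.asarray(self_positions_P), np.asarray(global_positions_P))
    return (np.asarray(self_positions_N) @ R.T) + t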
8. The system according to claim 7, wherein the terminal device comprises a visual sensor and a motion sensor, the visual sensor is configured to acquire the target image of the target scene, and the motion sensor is configured to acquire the motion data of the terminal device;

wherein the terminal device is a wearable device, and the visual sensor and the motion sensor are deployed on the wearable device; or the terminal device is a recorder, and the visual sensor and the motion sensor are deployed on the recorder; or the terminal device is a camera, and the visual sensor and the motion sensor are deployed on the camera.

9. The system according to claim 7, wherein, when generating the fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track, the server is specifically configured to:

determine, from the three-dimensional visual map, target map points corresponding to the images to be detected, and determine a global positioning track of the terminal device in the three-dimensional visual map based on the target map points; and

generate the fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track; wherein the frame rate of the fusion positioning poses included in the fusion positioning track is greater than the frame rate of the global positioning poses included in the global positioning track, and the frame rate of the fusion positioning poses included in the fusion positioning track is equal to the frame rate of the self-positioning poses included in the self-positioning track.

10. The system according to claim 7, wherein, when determining the target positioning pose corresponding to the fusion positioning pose and displaying the target positioning pose, the server is specifically configured to: convert the fusion positioning pose into a target positioning pose in a three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map;

wherein the server comprises client software, and the client software reads and renders the three-dimensional visualization map and displays the target positioning pose on the three-dimensional visualization map;

wherein a user accesses the client software through a web browser to view, through the client software, the target positioning pose displayed in the three-dimensional visualization map; and

wherein, when the target positioning pose displayed in the three-dimensional visualization map is viewed through the client software, the viewing angle of the three-dimensional visualization map is changed by dragging the mouse.

11. A pose display apparatus, applied to a server in a cloud-edge management system, wherein the server comprises a three-dimensional visual map of a target scene, the apparatus comprising:

an acquisition module, configured to acquire images to be detected and a self-positioning track, wherein the self-positioning track is determined by a terminal device based on a target image of the target scene and motion data of the terminal device, and the images to be detected are part of the multiple frames of images included in the target image;

a generation module, configured to generate a fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track, wherein the fusion positioning track comprises a plurality of fusion positioning poses; and

a display module, configured to, for each fusion positioning pose in the fusion positioning track, determine a target positioning pose corresponding to the fusion positioning pose, and display the target positioning pose.
12. The apparatus according to claim 11, wherein, when generating the fusion positioning track of the terminal device in the three-dimensional visual map based on the images to be detected and the self-positioning track, the generation module is specifically configured to: determine, from the three-dimensional visual map, target map points corresponding to the images to be detected, and determine a global positioning track of the terminal device in the three-dimensional visual map based on the target map points; and generate the fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track, wherein the frame rate of the fusion positioning poses included in the fusion positioning track is greater than the frame rate of the global positioning poses included in the global positioning track, and the frame rate of the fusion positioning poses included in the fusion positioning track is equal to the frame rate of the self-positioning poses included in the self-positioning track;

wherein the three-dimensional visual map comprises at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to the sample images, sample local descriptors corresponding to feature points in the sample images, and map point information; and, when determining the target map points corresponding to the images to be detected from the three-dimensional visual map and determining the global positioning track of the terminal device in the three-dimensional visual map based on the target map points, the generation module is specifically configured to: select a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the image to be detected and the multiple frames of sample images; obtain a plurality of feature points from the image to be detected; for each feature point, determine a target map point corresponding to the feature point from the plurality of map points corresponding to the candidate sample image; determine a global positioning pose of the image to be detected in the three-dimensional visual map based on the plurality of feature points and the target map points corresponding to the plurality of feature points; and generate the global positioning track of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all the images to be detected;

wherein, when generating the fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track, the generation module is specifically configured to: select N self-positioning poses corresponding to a target time period from all the self-positioning poses included in the self-positioning track, and select P global positioning poses corresponding to the target time period from all the global positioning poses included in the global positioning track, wherein N is greater than P; determine, based on the N self-positioning poses and the P global positioning poses, N fusion positioning poses corresponding to the N self-positioning poses, the N self-positioning poses corresponding to the N fusion positioning poses one to one; and generate the fusion positioning track of the terminal device in the three-dimensional visual map based on the N fusion positioning poses;

wherein, when determining the target positioning pose corresponding to the fusion positioning pose and displaying the target positioning pose, the display module is specifically configured to: convert the fusion positioning pose into a target positioning pose in a three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map; and the display module is further configured to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map in one of the following ways: for each of a plurality of calibration points, determining a coordinate pair corresponding to the calibration point, the coordinate pair comprising the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determining the target transformation matrix based on the coordinate pairs corresponding to the plurality of calibration points; or, obtaining an initial transformation matrix, mapping the position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, determining, based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map, whether the initial transformation matrix has converged, and, if so, determining the initial transformation matrix as the target transformation matrix, or, if not, adjusting the initial transformation matrix, taking the adjusted transformation matrix as the initial transformation matrix, and returning to the operation of mapping the position coordinates in the three-dimensional visual map to the mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix; or, sampling the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sampling the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and registering the first point cloud and the second point cloud by using an ICP algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
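Illustrative note (not part of the claims): the ICP alternative in claims 6 and 12 registers a point cloud sampled from the three-dimensional visual map against a point cloud sampled from the three-dimensional visualization map and reads the target transformation matrix off the registration result. The sketch below uses the Open3D library; Open3D itself, the sampled inputs, and the correspondence distance are assumptions of this illustration, not requirements of the application.

import numpy as np
import open3d as o3d

def estimate_target_transform(visual_map_points, visualization_map_points,
                              max_corr_dist=0.5):
    # Register the point cloud sampled from the 3D visual map (source) to the
    # point cloud sampled from the 3D visualization map (target); the resulting
    # 4x4 matrix maps visual-map coordinates into the visualization map.
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(np.asarray(visual_map_points))
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(np.asarray(visualization_map_points))
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # the target transformation matrix (4x4)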
CN202111350621.9A 2021-11-15 2021-11-15 Pose display method, device and system Pending CN114185073A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111350621.9A CN114185073A (en) 2021-11-15 2021-11-15 Pose display method, device and system
PCT/CN2022/131134 WO2023083256A1 (en) 2021-11-15 2022-11-10 Pose display method and apparatus, and system, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111350621.9A CN114185073A (en) 2021-11-15 2021-11-15 Pose display method, device and system

Publications (1)

Publication Number Publication Date
CN114185073A (en) 2022-03-15

Family

ID=80540921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111350621.9A Pending CN114185073A (en) 2021-11-15 2021-11-15 Pose display method, device and system

Country Status (2)

Country Link
CN (1) CN114185073A (en)
WO (1) WO2023083256A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984377A (en) * 2022-12-21 2023-04-18 浙江大学 Visual positioning method and system for wide-area-oriented end edge cloud collaborative computing
WO2023083256A1 (en) * 2021-11-15 2023-05-19 杭州海康威视数字技术股份有限公司 Pose display method and apparatus, and system, server and storage medium
WO2024001849A1 (en) * 2022-06-28 2024-01-04 中兴通讯股份有限公司 Visual-localization-based pose determination method and apparatus, and electronic device
WO2024140962A1 (en) * 2022-12-30 2024-07-04 优奈柯恩(北京)科技有限公司 Method, apparatus and system for determining relative pose, and device and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958773A (en) * 2023-08-02 2023-10-27 奥比中光科技集团股份有限公司 Point cloud fusion method and device and 3D printer

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8174568B2 (en) * 2006-12-01 2012-05-08 Sri International Unified framework for precise vision-aided navigation
CN107818592B (en) * 2017-11-24 2022-04-01 北京华捷艾米科技有限公司 Method, system and interactive system for collaborative synchronous positioning and map construction
CN114120301B (en) * 2021-11-15 2025-09-26 杭州海康威视数字技术股份有限公司 Method, device and apparatus for determining posture
CN114185073A (en) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 Pose display method, device and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105143821A (en) * 2013-04-30 2015-12-09 高通股份有限公司 Wide area localization from SLAM maps
CN106595659A (en) * 2016-11-03 2017-04-26 南京航空航天大学 Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
US20200004266A1 (en) * 2019-08-01 2020-01-02 Lg Electronics Inc. Method of performing cloud slam in real time, and robot and cloud server for implementing the same
CN111738281A (en) * 2020-08-05 2020-10-02 鹏城实验室 Simultaneous positioning and mapping system, map soft switching method and storage medium
CN112115874A (en) * 2020-09-21 2020-12-22 武汉大学 Cloud-fused visual SLAM system and method
CN112945233A (en) * 2021-01-15 2021-06-11 北京理工大学 Global drift-free autonomous robot simultaneous positioning and map building method
CN113140040A (en) * 2021-04-26 2021-07-20 北京天地玛珂电液控制系统有限公司 Multi-sensor fusion coal mine underground space positioning and mapping method and device
CN113382365A (en) * 2021-05-21 2021-09-10 北京索为云网科技有限公司 Pose tracking method and device of mobile terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张剑华; 王燕燕; 王曾媛; 陈胜勇; 管秋: "Map recovery and fusion in monocular simultaneous localization and mapping" (单目同时定位与建图中的地图恢复融合技术), 中国图象图形学报 (Journal of Image and Graphics), no. 03, 16 March 2018 (2018-03-16) *

Also Published As

Publication number Publication date
WO2023083256A1 (en) 2023-05-19

Similar Documents

Publication Publication Date Title
US11860923B2 (en) Providing a thumbnail image that follows a main image
US10134196B2 (en) Mobile augmented reality system
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US20200364509A1 (en) System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression
US9342927B2 (en) Augmented reality system for position identification
CN114185073A (en) Pose display method, device and system
CN111081199B (en) Selecting a temporally distributed panoramic image for display
Chen et al. Rise of the indoor crowd: Reconstruction of building interior view via mobile crowdsourcing
CN110617821B (en) Positioning method, positioning device and storage medium
US9551579B1 (en) Automatic connection of images using visual features
CN114120301B (en) Method, device and apparatus for determining posture
US9756260B1 (en) Synthetic camera lenses
CN114187344B (en) Map construction method, device and equipment
CN112991441A (en) Camera positioning method and device, electronic equipment and storage medium
Ayadi et al. A skyline-based approach for mobile augmented reality
JP6154759B2 (en) Camera parameter estimation apparatus, camera parameter estimation method, and camera parameter estimation program
US10878278B1 (en) Geo-localization based on remotely sensed visual features
CN114707392A (en) A local SLAM construction method, global SLAM construction method and construction device
Chang et al. Augmented reality services of photos and videos from filming sites using their shooting locations and attitudes

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination