Disclosure of Invention
The present application provides a method, a system, an electronic device and a storage medium for optical perspective (optical see-through, OST) calibration, which address the high cost of existing automatic OST calibration techniques: automatic OST calibration is built around a large language model, so that the OST intrinsic parameters are adapted to different users automatically through fuzzy instruction input.
In a first aspect of the present application, there is provided a method of optical perspective calibration applied to an augmented reality headset comprising a display screen, the method comprising:
according to a received first user voice, adjusting the position and pose of a virtual calibration object on the display screen so that the virtual calibration object coincides with an actual calibration object in physical space;
acquiring an image of the actual calibration object according to a received second user voice, acquiring a first two-dimensional pixel coordinate of a first target corner point from the image, and determining a three-dimensional coordinate of the first target corner point according to the first two-dimensional pixel coordinate;
acquiring a second two-dimensional pixel coordinate of a second target corner point of the virtual calibration object on the display screen, and calculating a projection matrix according to the three-dimensional coordinate and the second two-dimensional pixel coordinate, wherein the first target corner point and the second target corner point correspond to the same point on the actual calibration object;
and converting points in virtual space onto the screen of the augmented reality headset using the projection matrix.
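For orientation only, the following minimal sketch shows how the projection matrix obtained by the method above could be applied in the last step, i.e. mapping a point in virtual space to a pixel on the headset screen via homogeneous coordinates. The 3x4 matrix P and its example values are illustrative assumptions, not the calibrated result.

```python
import numpy as np

def project_to_screen(P, point_3d):
    """Map a 3D point in virtual space to 2D screen pixels with a 3x4 projection matrix."""
    X = np.append(np.asarray(point_3d, dtype=float), 1.0)  # homogeneous 3D point
    u, v, w = P @ X                                         # perspective projection
    return np.array([u / w, v / w])                         # dehomogenise to pixels

# Illustrative projection matrix P = K [R | t] with identity extrinsics
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
print(project_to_screen(P, [0.1, 0.05, 1.5]))               # -> pixel (u, v)
```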
By adopting this technical scheme, the position and pose of the virtual calibration object are adjusted through voice control, so the user does not need to operate the device or a physical interface directly, which greatly improves convenience and comfort of use. This contact-free control mode is particularly suitable for operation while wearing a head-mounted device and reduces the constraints and interference of manual operation. The coincidence of the actual calibration object and the virtual calibration object is indicated by the user's voice, which ensures high-precision alignment during calibration. Meanwhile, the two-dimensional pixel coordinates of the corner points are extracted from the image of the actual calibration object and the three-dimensional coordinates are calculated from them, further improving calibration accuracy. The method can automatically calculate the projection matrix from the two-dimensional and three-dimensional coordinates of the corresponding corner points of the virtual and actual calibration objects. The projection matrix is the key parameter connecting the virtual world with the physical world, and its exact computation is critical for the correct rendering and display of subsequent virtual content. The calculated projection matrix is used to convert points in virtual space onto the screen of the augmented reality headset, so that virtual content can be fused accurately and seamlessly into the physical environment. This not only promotes the user's sense of immersion, but also makes the presentation of virtual information in physical space more natural and intuitive. The application is suitable for different types of actual and virtual calibration objects and therefore has strong flexibility and adaptability. The calibration process can be completed quickly with simple voice instructions and image recognition, without complex setup or adjustment. As an innovative calibration method, it promotes the further development of augmented reality technology: by optimizing the calibration flow and improving calibration precision, it provides more stable and reliable technical support for AR applications and facilitates the adoption of AR technology in more fields.
Optionally, the method includes:
identifying the actual calibration object in physical space by using a first detection algorithm to obtain first data of each point in the camera coordinate system, wherein the first data comprises a position and an orientation;
converting the first data into second data in the display-screen coordinate system, and positioning the virtual calibration object according to the second data; rendering the virtual calibration object by using OpenGL drawing commands, and presenting the rendering result on the display screen;
and locating the actual calibration object in physical space through PnP.
By adopting this technical scheme, the first detection algorithm (for example, an algorithm based on image processing or a machine-learning model) automatically identifies the actual calibration object in physical space and acquires its position and orientation in the camera coordinate system (the first data). This reduces the need for manual intervention and improves the degree of automation of the calibration process. The first data (position and orientation in the camera coordinate system) are converted into second data in the display-screen coordinate system, and the virtual calibration object is accurately positioned according to the second data. This conversion ensures that the virtual calibration object corresponds accurately to the actual calibration object in physical space, improving calibration accuracy. The virtual calibration object is rendered with OpenGL drawing commands and the rendering result is presented in real time on the display screen of the augmented reality headset. OpenGL provides powerful graphics rendering capabilities, ensuring that the virtual calibration object is presented to the user with a high-quality visual effect. The precise position of the actual calibration object in physical space is further determined through the PnP (Perspective-n-Point) algorithm. PnP is an effective camera pose estimation algorithm that can estimate the position and pose of a camera from a small number of point correspondences; this helps to track and locate the actual calibration object more accurately in dynamic or complex environments. By reducing manual intervention, improving calibration accuracy and rendering a high-quality virtual calibration object in real time, the method noticeably improves the user experience: the user simply wears the headset and issues voice instructions to complete calibration quickly and begin enjoying the immersive experience brought by augmented reality.
Optionally, locating the actual calibration object in physical space through PnP includes:
calibrating a camera of the augmented reality headset to obtain an intrinsic matrix of the camera;
matching the feature points in the image with the three-dimensional feature points on the actual calibration object, and calculating a first pose of the camera from the matched feature-point pairs by using a PnP algorithm, wherein the first pose comprises a first rotation matrix and a first translation vector;
and determining the position of the actual calibration object in physical space according to the first pose.
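A minimal OpenCV sketch of the three steps above is given below. It assumes a checkerboard is available for intrinsic calibration and that the 2D–3D correspondences of the calibration object have already been matched; the function and variable names are illustrative, not part of the claimed method.

```python
import cv2
import numpy as np

def locate_calibration_object(object_points, image_points, image_size, pts_3d, pts_2d):
    """Calibrate the camera intrinsics, then recover the first pose via PnP.

    object_points / image_points : matched 3D/2D checkerboard corners from
                                   several views (for intrinsic calibration only)
    pts_3d / pts_2d              : matched corners of the actual calibration object
    """
    # Intrinsic matrix K and distortion coefficients of the headset camera
    _, K, dist, _, _ = cv2.calibrateCamera(object_points, image_points,
                                           image_size, None, None)

    # First pose (first rotation matrix + first translation vector) via PnP
    ok, rvec, tvec = cv2.solvePnP(np.asarray(pts_3d, np.float32),
                                  np.asarray(pts_2d, np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_EPNP)
    R1, _ = cv2.Rodrigues(rvec)
    t1 = tvec.reshape(3, 1)

    # The calibration object's origin expressed in the camera frame is simply t1
    # (X_cam = R1 * 0 + t1), which locates the object relative to the camera.
    return K, R1, t1
```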
By adopting this technical scheme, the intrinsic matrix of the camera (comprising the focal length, the optical center and the like) is obtained by calibrating the camera of the augmented reality headset, which is the basis for accurate three-dimensional positioning; acquiring the intrinsic matrix precisely can significantly improve the accuracy of subsequent positioning calculations. The feature points in the image are matched with the three-dimensional feature points on the actual calibration object, and the pose of the camera (comprising a rotation matrix and a translation vector) is calculated with the PnP algorithm. The PnP algorithm is robust to noise, partial occlusion or missing feature points, so the camera pose can still be calculated accurately in complex environments. The PnP algorithm is also efficient: it can match a large number of feature points and compute the pose in a short time, which is a clear advantage for augmented reality applications with strict real-time requirements. From the calculated camera pose (the first rotation matrix and the first translation vector), the position of the calibration object in physical space can be determined accurately. This is critical for subsequent tasks such as virtual content rendering and spatial mapping, ensuring precise alignment of virtual elements with the physical environment. Accurate localization of the calibration object enhances the realism and immersion of the augmented reality application: the user perceives a seamless fusion of virtual elements with the physical environment and thus a more natural and smooth experience. The application is also suitable for dynamic environments, because the PnP algorithm can compute the camera pose in real time; when the camera or the calibration object moves, the system quickly updates the pose information and relocates the calibration object, ensuring that virtual elements remain displayed continuously and accurately.
Optionally, adjusting the position and pose of the virtual calibration object on the display screen according to the received first user voice includes:
converting the first user voice into instructions or parameters through a large language model, and calculating a new state of the virtual calibration object according to the instructions or parameters;
applying the new state to a state model of the calibration object to update the position and pose of the virtual calibration object in virtual space, and rendering the updated position and pose to the display screen using a graphics rendering engine.
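As a hedged illustration of applying a parsed instruction to the state model described above, the sketch below assumes the large language model has already returned a structured command such as {"translate": [dx, dy, dz], "rotate_deg": {"axis": "y", "angle": 30}}. The CalibrationState class and the command format are hypothetical stand-ins for the state model; the graphics rendering engine would then redraw the object at the updated pose.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class CalibrationState:
    """Hypothetical state model: pose of the virtual calibration object."""
    position: np.ndarray = field(default_factory=lambda: np.zeros(3))
    rotation: np.ndarray = field(default_factory=lambda: np.eye(3))

def axis_angle_matrix(axis: str, angle_deg: float) -> np.ndarray:
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return {"x": np.array([[1, 0, 0], [0, c, -s], [0, s, c]]),
            "y": np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]),
            "z": np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])}[axis]

def apply_command(state: CalibrationState, cmd: dict) -> CalibrationState:
    """Compute the new state from an LLM-parsed command and update the model."""
    if "translate" in cmd:
        state.position = state.position + np.asarray(cmd["translate"], float)
    if "rotate_deg" in cmd:
        r = cmd["rotate_deg"]
        state.rotation = axis_angle_matrix(r["axis"], r["angle"]) @ state.rotation
    return state  # the rendering engine would redraw the object at this pose

state = apply_command(CalibrationState(), {"translate": [0.0, 0.0, 0.05],
                                           "rotate_deg": {"axis": "y", "angle": 30}})
```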
By adopting this technical scheme, the user's voice is converted into instructions or parameters through a large language model, so the user can interact with the system in natural language without learning specific commands or gestures. This natural interaction mode improves the user experience and reduces the user's learning cost. The large language model can understand and parse complex voice instructions, including adjustments of position, pose, size and so on; this level of intelligence allows the system to capture the user's intent accurately and compute the new state of the virtual calibration object. After receiving the voice command, the system rapidly calculates the new state and applies it to the state model of the calibration object. This happens in real time, and the user sees the adjusted result almost immediately, which improves the responsiveness and dynamic adjustment capability of the system. The updated position and pose of the virtual calibration object are rendered on the display screen by a graphics rendering engine, ensuring accurate and realistic rendering results; the engine can handle complex graphics transformations and lighting effects, making the on-screen presentation of the virtual calibration object more lifelike. Adjusting the position and pose of the virtual calibration object through voice control greatly simplifies the calibration process: the user does not need to adjust parameters manually on the device or on the screen, and can complete calibration by voice alone, improving calibration efficiency and convenience. The method is suitable not only for static calibration scenes but also for real-time adjustment in dynamic environments; when the physical space or the camera position changes, the position and pose of the virtual calibration object can be updated quickly, keeping virtual elements continuously aligned with the physical environment. Combining a large language model, a state model and a graphics rendering engine provides a new multi-modal interaction mode for augmented reality applications; the interaction is not limited to voice and can be extended to gestures, eye movement and other modalities, offering a useful exploration and reference for future intelligent interaction technology.
Optionally, acquiring the first two-dimensional pixel coordinate of the first target corner point from the image, and determining the three-dimensional coordinate of the first target corner point according to the first two-dimensional pixel coordinate, includes:
identifying the boundary of the actual calibration object in the image by using a second detection algorithm, and acquiring the first two-dimensional pixel coordinate of the first target corner point, wherein the first two-dimensional pixel coordinate comprises a row coordinate and a column coordinate;
and obtaining, through the solvePnP algorithm, a second pose of the camera in the world coordinate system according to the first two-dimensional pixel coordinate and the size of the actual calibration object, wherein the second pose comprises a second rotation matrix and a second translation vector, and converting the first two-dimensional pixel coordinate into a three-dimensional coordinate through the second pose.
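A minimal sketch of this 2D-to-3D step is given below, under the assumption that the calibration object is planar with corner positions known from its printed size, so the corner detected in the image can be expressed in the camera frame through the second pose. Function and variable names are illustrative.

```python
import cv2
import numpy as np

def corner_to_3d(corner_rc, board_corners_2d, board_corners_3d, K, dist=None):
    """Recover the 3D coordinate of one detected corner (illustrative sketch).

    corner_rc        : (row, col) pixel coordinate of the first target corner
    board_corners_2d : Nx2 detected pixel coordinates (x = col, y = row)
    board_corners_3d : Nx3 corner coordinates in the calibration-object frame,
                       known from the printed size of the calibration object
    K, dist          : camera intrinsic matrix / distortion coefficients
    """
    b2 = np.asarray(board_corners_2d, np.float32)
    b3 = np.asarray(board_corners_3d, np.float32)

    # Second pose of the camera from the 2D-3D correspondences
    ok, rvec, tvec = cv2.solvePnP(b3, b2, K, dist)
    R2, _ = cv2.Rodrigues(rvec)                    # second rotation matrix
    t2 = tvec.reshape(3, 1)                        # second translation vector

    # Pick the known board corner closest to the target pixel, then express it
    # in the camera frame with the second pose:  X_cam = R2 * X_obj + t2
    row, col = corner_rc
    idx = int(np.argmin(np.linalg.norm(b2 - np.array([col, row]), axis=1)))
    X_obj = b3[idx].reshape(3, 1)
    return (R2 @ X_obj + t2).ravel()               # (x, y, z) in the camera frame
```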
By adopting this technical scheme, the boundary of the actual calibration object in the image is identified with a second detection algorithm (such as edge detection or contour detection), and the two-dimensional pixel coordinates of the first target corner point (including row and column coordinates) are extracted precisely. This ensures that the corner position information in the image is accurate and error-free, providing a reliable basis for the subsequent three-dimensional coordinate calculation. Through the solvePnP algorithm, the pose of the camera in the world coordinate system, including the rotation matrix and the translation vector, can be computed accurately from the known dimensions of the actual calibration object (such as the side length and shape of the calibration plate) and the two-dimensional pixel coordinates of the corner points detected in the image. This step is the key link between two-dimensional image information and three-dimensional spatial information, providing the transformation needed to convert two-dimensional coordinates into three-dimensional coordinates. Once the camera pose is obtained, the two-dimensional pixel coordinates in the image are converted into three-dimensional coordinates using that pose. This realizes an accurate mapping from the two-dimensional image to three-dimensional space, so that corner information that originally existed only in the image can be expressed in three-dimensional space, providing strong support for subsequent three-dimensional modeling, target tracking, scene reconstruction and other applications. Because the flow is based on detecting and computing from the actual calibration object in the image, it can tolerate, to a certain extent, complex environmental factors such as illumination changes and viewing-angle changes, making the technique more robust and adaptable in practice. The method is suitable not only for corner detection and three-dimensional reconstruction in static images but can also be extended to dynamic target tracking and three-dimensional motion analysis in video sequences; combined with other computer vision techniques (such as feature matching and image stitching), its application scenarios and functions can be extended further.
Optionally, calculating the projection matrix according to the three-dimensional coordinates and the second two-dimensional pixel coordinates includes:
determining a third rotation matrix and a third translation vector of the camera by direct linear transformation using a plurality of feature points;
calculating an estimated position of a target feature point in the camera coordinate system, and projecting the estimated position onto the image plane to obtain estimated pixel coordinates, wherein the target feature point is any one of the feature points;
calculating the reprojection error between the two-dimensional pixel coordinates of the target feature point and the estimated pixel coordinates, and taking the sum of squares of the reprojection errors of all the feature points as an objective function;
iteratively updating the parameters between the eye and the display screen by linearizing the objective function and solving a linear least-squares problem, until the amount of change in the objective function is less than a threshold;
and calculating the projection matrix according to the updated parameters between the eye and the display screen.
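The sketch below illustrates the DLT initialisation and the reprojection-error minimisation described above. For brevity it parameterises the optimisation directly by the entries of the 3x4 projection matrix rather than by explicit eye-to-display parameters, and it assumes SciPy is available; this is an illustrative simplification, not the exact parameterisation of the method.

```python
import numpy as np
from scipy.optimize import least_squares

def dlt_projection(pts_3d, pts_2d):
    """Initial 3x4 projection matrix by Direct Linear Transformation (>= 6 points)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts_3d, pts_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 4)              # right singular vector of smallest singular value

def reprojection_residuals(p, pts_3d, pts_2d):
    """Differences between measured pixels and pixels estimated from P = p.reshape(3, 4)."""
    P = p.reshape(3, 4)
    X_h = np.hstack([pts_3d, np.ones((len(pts_3d), 1))])
    proj = X_h @ P.T
    est = proj[:, :2] / proj[:, 2:3]         # estimated pixel coordinates
    return (est - pts_2d).ravel()

def refine_projection(pts_3d, pts_2d):
    """DLT initialisation followed by iterative least-squares refinement."""
    pts_3d = np.asarray(pts_3d, float)
    pts_2d = np.asarray(pts_2d, float)
    P0 = dlt_projection(pts_3d, pts_2d)
    res = least_squares(reprojection_residuals, P0.ravel(),
                        args=(pts_3d, pts_2d), xtol=1e-10)  # stops once changes fall below tolerance
    return res.x.reshape(3, 4)
```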
By adopting this technical scheme, a plurality of feature points are used with direct linear transformation (DLT) to determine preliminary values of the third rotation matrix and third translation vector of the camera, providing an initial estimate close to the true values for the subsequent computation. Then, through an iterative optimization process, the reprojection error between the two-dimensional pixel coordinates and the estimated pixel coordinates of the target feature points is reduced continuously until a convergence condition is met (i.e., the change in the objective function is smaller than the threshold). This significantly improves the accuracy of the projection matrix and brings it closer to the actual situation. Because of interference factors in real environments such as illumination changes, occlusion and noise, the directly acquired feature-point information may contain errors; by iteratively optimizing and computing the reprojection error, these errors are adjusted and corrected automatically, so the system can adapt to complex and changing environmental conditions and overall robustness is improved. In augmented reality, accurate target tracking and localization is critical: the projection matrix calculated by this method can map targets in three-dimensional space onto the two-dimensional image plane precisely, enabling high-precision tracking and localization, which is important for user experience and system performance. Since the application is based on feature-point detection and matching, it extends easily to different scenarios; as long as sufficient feature-point information can be acquired, whether indoors or outdoors, the projection matrix can be calculated. Although the iterative optimization requires some computing resources, the computational burden can be kept low while maintaining accuracy by setting the number of iterations and the convergence threshold reasonably; moreover, as computer hardware continues to improve, the real-time performance of the application improves further.
Optionally, calculating the projection matrix according to the updated parameters between the eye and the display screen includes:
constructing projection intrinsic parameters according to the focal length and the optical center between the eye and the display screen, and constructing projection extrinsic parameters according to the rotation and translation parameters from the hardware coordinate system to the eye coordinate system;
and calculating the projection matrix according to the projection intrinsic parameters and the projection extrinsic parameters.
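In other words, the projection matrix can be composed as P = K [R | t], where K holds the intrinsic parameters (focal length and optical center of the eye-display combination) and [R | t] holds the extrinsic parameters (hardware-to-eye rotation and translation). The short sketch below shows this composition; the parameter values are purely illustrative.

```python
import numpy as np

def build_projection(fx, fy, cx, cy, R_hw_to_eye, t_hw_to_eye):
    """Compose P = K [R | t] from eye-display intrinsics and hardware-to-eye extrinsics."""
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])                           # projection intrinsics
    Rt = np.hstack([R_hw_to_eye, t_hw_to_eye.reshape(3, 1)])  # projection extrinsics
    return K @ Rt                                             # 3x4 projection matrix

P = build_projection(800.0, 800.0, 640.0, 360.0,
                     np.eye(3), np.array([0.0, 0.0, 0.0]))
```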
By adopting this technical scheme, the relative position and pose relationship between the eye and the display screen can be described accurately by separately constructing the projection intrinsic parameters (focal length and optical center) and extrinsic parameters (rotation and translation from the hardware coordinate system to the eye coordinate system). These parameters are the key to computing the projection matrix and directly determine the accuracy and fidelity with which virtual content is presented on the display screen. An accurate projection matrix ensures that virtual content appears at the correct viewing angle, position and scale, providing high-quality visual effects; this is particularly important for applications such as augmented reality (AR) and virtual reality (VR), and can significantly enhance the user's sense of immersion and interaction. Because the projection matrix is calculated from physical parameters between the eye and the display screen, the application has wide applicability: for different types of display devices (such as head-mounted displays or projectors) or different users (with different eye positions and focal lengths), a suitable projection matrix can be obtained simply by adjusting the corresponding parameters. In practice, the relative position and pose between the eye and the display screen change as the user moves, adjusts the display device or performs other operations; by updating the parameters between the eye and the display screen and recalculating the projection matrix, the presentation of virtual content can be adjusted flexibly, keeping the system in an optimal state. Although computing the projection matrix involves several parameters and transformation relationships, once the initial calibration is completed and the intrinsic and extrinsic parameters are determined, the projection matrix can be obtained by simple matrix multiplication, which helps reduce the computational complexity of the system and improve real-time performance.
In a second aspect of the present application, a system for optical perspective calibration is provided, including an adjustment module, a coordinate module, a calculation module, and a display module, where:
The adjustment module is configured to adjust the position and pose of the virtual calibration object on the display screen according to the received first user voice, so that the virtual calibration object coincides with the actual calibration object in physical space;
the coordinate module is configured to acquire an image of the actual calibration object according to the received second user voice, acquire a first two-dimensional pixel coordinate of a first target corner point from the image, and determine a three-dimensional coordinate of the first target corner point according to the first two-dimensional pixel coordinate;
the calculation module is configured to acquire a second two-dimensional pixel coordinate of a second target corner point of the virtual calibration object on the display screen, and to calculate a projection matrix according to the three-dimensional coordinate and the second two-dimensional pixel coordinate, wherein the first target corner point and the second target corner point correspond to the same point on the actual calibration object;
and the display module is configured to convert points in virtual space onto the screen of the augmented reality headset using the projection matrix.
In a third aspect, the application provides an electronic device comprising a processor, a memory, a user interface and a network interface, wherein the memory is used for storing instructions, the user interface and the network interface are both used for communicating with other devices, and the processor is used for executing the instructions stored in the memory so as to cause the electronic device to perform any of the methods described above.
In a fourth aspect, the application provides a computer-readable storage medium storing instructions which, when executed, perform any of the methods described above.
In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
1. The position and pose of the virtual calibration object are controlled through the user's voice, making the calibration process more intuitive and convenient; the user can complete calibration quickly with voice instructions, without complex operations or professional knowledge, which greatly improves the user experience;
2. The user can fine-tune the virtual calibration object through voice instructions until the best coincidence is achieved; compared with traditional manual adjustment, this is more accurate and reduces human error;
3. By acquiring the two-dimensional pixel coordinates of the corner points from the image of the actual calibration object and combining them with the camera pose information, the three-dimensional coordinates of the corner points can be calculated accurately, providing precise data support for the subsequent projection-matrix calculation and further improving calibration precision;
4. The application supports receiving user voice commands in real time and adjusting the virtual calibration object accordingly, so the calibration process can be completed rapidly without waiting, which is particularly important for augmented reality applications that must adapt quickly to different environments or scenes;
5. Whatever environment the user is in, calibration only requires seeing the actual calibration object clearly and issuing voice instructions; the application therefore has strong adaptability and flexibility and can meet the needs of different users;
6. By precisely calculating the projection matrix, the presentation of virtual content on the display screen is optimized and unnecessary computation and rendering burden is reduced, which helps improve the overall performance and response speed of the system;
7. The calculated projection matrix is used to convert points in virtual space onto the screen of the augmented reality headset, ensuring that virtual content is presented at the correct viewing angle, position and scale, improving the display effect and the user experience.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
In describing embodiments of the present application, words such as "for example" or "such as" are used to indicate examples, illustrations, or descriptions. Any embodiment or design described with "for example" or "such as" in the embodiments of the application should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
In the description of embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The embodiment discloses a method for optical perspective calibration, which is applied to an augmented reality headset. FIG. 1 is a schematic flow chart of the method for optical perspective calibration disclosed in the embodiment of the application; as shown in FIG. 1, the method comprises the following steps:
s110, adjusting the position and the gesture of a virtual calibration object on a display screen according to the received first user voice so that the virtual calibration object is overlapped with an actual calibration object in a physical space;
The user's voice (i.e., the first user voice) is obtained, for example "move the calibration object a bit farther away", "the calibration object is too far, bring it back a little" or "rotate it backwards". The augmented reality headset converts it into a text command through a large speech model. The headset then analyzes the converted text command with natural language processing, recognizing the user's intention (such as moving or rotating) and the specific action parameters (such as how far, back, or rotate backwards). From this analysis, the direction and magnitude of the spatial transformation (translation or rotation) to be applied to the virtual calibration object are calculated. Based on the calculated spatial transformation, the position and pose of the virtual calibration object displayed on the screen are dynamically adjusted by a three-dimensional graphics rendering engine. This process runs in real time, ensuring that the change of the virtual calibration object is almost synchronized with the issuing of the user's voice command. The headset provides visual feedback on the display screen, so the user can intuitively see the virtual calibration object change on the screen. The user can compare the image on the display screen with the actual calibration object and further fine-tune the position of the virtual calibration object through voice instructions until the two coincide completely in the user's view. The user can continue to fine-tune the virtual calibration object by voice until a satisfactory calibration effect is achieved. The headset can also record the user's operating habits to optimize the accuracy of speech recognition and natural language processing and to improve the efficiency and realism of three-dimensional graphics rendering.
Optionally, the method includes:
identifying the actual calibration object in physical space by using a first detection algorithm to obtain first data of each point in the camera coordinate system, wherein the first data comprises a position and an orientation;
converting the first data into second data in the display-screen coordinate system, and positioning the virtual calibration object according to the second data; rendering the virtual calibration object by using OpenGL drawing commands, and presenting the rendering result on the display screen;
and locating the actual calibration object in physical space through PnP.
FIG. 2 is a schematic diagram of an AprilTag disclosed in an embodiment of the present application; an AprilTag is used as the actual calibration object in this embodiment. An AprilTag two-dimensional marker with a specific code is printed out and placed accurately at a specific position in the fixed physical space that needs to be calibrated. An image of the physical space is captured with the camera integrated on the headset, and a specific detection algorithm (for example, an OpenCV-based AprilTag detection algorithm) is run to identify the AprilTag markers in the image. The algorithm can parse the coded information of each marker as well as its position and orientation in the camera coordinate system, i.e., the first data. The first data (including position and orientation) in the camera coordinate system are converted into second data in the display-screen coordinate system through a coordinate transformation matrix. This step is critical because it ensures that the virtual calibration object can be displayed on the headset screen at the correct viewing angle and position, corresponding to the physical calibration object. The rendering position and orientation of the virtual calibration object on the headset display screen are then determined from the converted second data, so the user sees in the VR environment a virtual image corresponding to the actual calibration object in physical space. Using OpenGL graphics drawing commands, the virtual calibration object is rendered on the headset display screen according to its position, orientation and size. OpenGL provides rich graphics rendering functions and can generate high-quality images efficiently, ensuring the smoothness and realism of the user's visual experience. The rendered virtual calibration object is presented on the headset screen in real time, so the user can intuitively see and interact with it in the VR environment. While the virtual calibration object is being rendered and displayed, the absolute position and orientation of the actual calibration object in physical space are further calculated precisely through the PnP (Perspective-n-Point) algorithm. The PnP algorithm uses the geometric relationship between multiple feature points on the calibration object (in this example, the AprilTag's coded corner points) and their camera images to achieve high-precision spatial positioning. This step not only improves positioning accuracy but also enhances the stability and reliability of the system.
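A possible sketch of such a first detection algorithm is shown below, assuming OpenCV's aruco module (version 4.7 or later) with an AprilTag dictionary is used; a dedicated AprilTag library could be substituted, and the function names and tag parameters here are illustrative.

```python
import cv2
import numpy as np

def detect_apriltags(image_bgr):
    """Detect AprilTag markers; return per-tag corner pixels and decoded IDs."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)
    return corners, ids

def tag_pose(corners_4x2, tag_size, K, dist=None):
    """First data for one tag: its orientation and position in the camera frame."""
    s = tag_size / 2.0
    obj = np.array([[-s,  s, 0], [s,  s, 0], [s, -s, 0], [-s, -s, 0]], np.float32)
    ok, rvec, tvec = cv2.solvePnP(obj, np.asarray(corners_4x2, np.float32), K, dist)
    return cv2.Rodrigues(rvec)[0], tvec   # rotation matrix, translation vector
```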
The actual calibration object in physical space is identified accurately by the first detection algorithm, and its precise position and orientation in the camera coordinate system (i.e., the first data) are acquired. This provides a reliable basis for the subsequent spatial transformation and for positioning the virtual calibration object. The PnP algorithm is then used to further compute the absolute position and orientation of the actual calibration object in physical space; by matching the geometric relationship between multiple feature points on the calibration object and their camera images, it achieves high-precision spatial positioning and enhances the stability and reliability of the system. The first data in the camera coordinate system are converted in real time into second data in the display-screen coordinate system, and the virtual calibration object is positioned quickly according to the second data; this ensures that the virtual calibration object is displayed on the headset screen at the correct viewing angle and position, in synchrony with the physical calibration object. The virtual calibration object is rendered efficiently with OpenGL drawing commands and the result is displayed on the headset screen in real time; OpenGL's powerful rendering capabilities produce high-quality images and guarantee the fluency and realism of the user's visual experience. By displaying in the VR environment a virtual calibration object that corresponds to the actual calibration object in physical space, the user can see and interact with it intuitively; this visual presentation reduces the user's learning cost and makes interaction more convenient and natural. The user can adjust the position and pose of the virtual calibration object by voice or other interaction modes to suit different scenes and needs, which allows the system to adapt to different application scenarios and user habits.
Optionally, locating the actual calibration object in physical space through PnP includes:
calibrating a camera of the augmented reality headset to obtain an intrinsic matrix of the camera;
matching the feature points in the image with the three-dimensional feature points on the actual calibration object, and calculating a first pose of the camera from the matched feature-point pairs by using a PnP algorithm, wherein the first pose comprises a first rotation matrix and a first translation vector;
and determining the position of the actual calibration object in physical space according to the first pose.
The camera of the AR headset is calibrated first. This is typically done by taking a series of images of a known pattern (for example, a checkerboard) and using these images to compute the camera's intrinsic matrix. The intrinsic matrix contains key information such as the focal length and optical center of the camera and is the basis for subsequent three-dimensional reconstruction and positioning. The intrinsic matrix obtained here is used in the subsequent PnP calculation to relate two-dimensional pixel coordinates in the image to three-dimensional coordinates in the camera coordinate system. An image-processing algorithm (such as the AprilTag detector in OpenCV) is used to detect the AprilTag in the image and extract its code and corner points as feature points. At the same time, the corresponding three-dimensional feature points are generated in memory from the known size and pattern of the AprilTag; these three-dimensional feature points represent the actual positions on the calibration object in physical space. The two-dimensional feature points in the image are then matched with the three-dimensional feature points; because the AprilTag has a unique code and a fixed pattern, this matching is generally straightforward and accurate. The matched two-dimensional feature-point coordinates and the corresponding three-dimensional feature-point coordinates are used as input to a PnP algorithm (for example, EPnP or UPnP), which computes the first pose of the camera from these matched pairs. This pose comprises a rotation matrix (representing the orientation of the camera) and a translation vector (representing the position of the camera relative to a reference frame); together, the first rotation matrix and first translation vector describe the camera's pose when the image was captured. Since the absolute position of the AprilTag marker in physical space (represented by its three-dimensional feature points) is known and the pose of the camera relative to those feature points has been computed, the position of the marker relative to the camera can be calculated easily. Using the camera pose (rotation matrix and translation vector) and the three-dimensional feature points of the calibration object, the position of the calibration object in the camera coordinate system or the world coordinate system is obtained through a simple coordinate transformation. The precise position of the calibration object in physical space is thus obtained and can be used for subsequent AR rendering, spatial positioning or interactive operations.
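The "simple coordinate transformation" mentioned above can be written out as in the sketch below, which assumes the first pose (R1, t1) maps marker coordinates into the camera frame and, optionally, that the marker's pose in a world frame is known; the names are illustrative.

```python
import numpy as np

def marker_and_camera_positions(R1, t1, marker_world=None):
    """Positions implied by the first pose (R1, t1): marker origin in the camera
    frame, camera centre in the marker frame, and optionally the camera in a
    world frame whose marker pose (R_wm, t_wm) is known."""
    t1 = np.asarray(t1, float).reshape(3, 1)
    marker_in_cam = t1                       # X_cam = R1 * 0 + t1
    cam_in_marker = -R1.T @ t1               # inverse of the rigid transform
    if marker_world is not None:
        R_wm, t_wm = marker_world
        cam_in_world = R_wm @ cam_in_marker + np.asarray(t_wm, float).reshape(3, 1)
        return marker_in_cam, cam_in_marker, cam_in_world
    return marker_in_cam, cam_in_marker
```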
The PnP algorithm can accurately compute the camera pose (including the rotation matrix and the translation vector) by matching the two-dimensional feature points in the image with the three-dimensional feature points on the calibration object. Compared with methods based on a single feature point or simple geometric relationships, this matching and computation over multiple feature points gives higher positioning accuracy. Because multiple feature points are used, the PnP algorithm is also robust to image noise, partial occlusion, slight deformation and other interference, making it reliable and stable in practical applications. With the continued development of computer vision and computer hardware, the computational efficiency of PnP algorithms has improved markedly; modern implementations can complete feature-point matching and pose computation in a short time, enabling real-time or near real-time spatial localization. Since the PnP algorithm can compute the camera pose in real time, the position of the calibration object in physical space can be updated promptly in response to camera movement and rotation, which is particularly important for applications that require high-precision dynamic tracking. In augmented reality (AR) applications, virtual information can thus be overlaid accurately onto the real-world scene, which is significant for improving the user experience and enhancing the sense of reality.
Optionally, adjusting the position and pose of the virtual calibration object on the display screen according to the received first user voice includes:
converting the first user voice into instructions or parameters through a large language model, and calculating a new state of the virtual calibration object according to the instructions or parameters;
applying the new state to a state model of the calibration object to update the position and pose of the virtual calibration object in virtual space, and rendering the updated position and pose to the display screen using a graphics rendering engine.
The received first user voice is passed to the speech recognition unit of the headset, which converts the speech signal into text. The converted text is then fed into a large language model (for example, a GPT-series or BERT-based model) for processing. The large language model uses deep-learning techniques to understand the meaning of the text and resolve it into specific instructions or parameters; these may include the distance to move, the angle to rotate, the scale factor and so on, describing the new state of the virtual calibration object. The parsed instructions or parameters are sent to a state computation unit, which calculates the new state of the virtual calibration object from its current state (position, pose, size and so on) and the instructions or parameters supplied by the user. For example, if the user says "move the calibration object forward by 5 cm and rotate it clockwise by 30 degrees", the state computation unit calculates the new position and pose of the calibration object from these parameters. The calculated new state is applied to the state model of the virtual calibration object; the state model is a data structure that stores and represents all state information of the object (position, pose, size, color, etc.). After the state model is updated, the representation of the virtual calibration object in virtual space changes accordingly. The updated state is passed to the graphics rendering engine, which converts the three-dimensional model in virtual space (including the updated calibration object) into a two-dimensional image for the display screen; during rendering, the engine considers factors such as lighting, shadow and texture to generate a realistic visual effect. Finally, the updated position and pose of the virtual calibration object are presented on the display screen as an image for the user to view and interact with.
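The sketch below illustrates, under loose assumptions, how recognised text might be turned into the structured instructions or parameters described above. The prompt wording, the JSON command format and the llm_complete callable (a stand-in for whichever large language model is deployed) are all hypothetical.

```python
import json

def parse_voice_command(sentence: str, llm_complete) -> dict:
    """Turn recognised speech into structured instructions/parameters.

    `llm_complete` is a hypothetical callable wrapping whichever large language
    model is deployed (GPT-style, BERT-based parser, ...); it is assumed to
    return the model's text completion for the given prompt.
    """
    prompt = (
        "Convert the user's sentence into a JSON command for a virtual "
        "calibration object. Allowed keys: 'translate' (metres, list of three "
        "numbers) and 'rotate_deg' (object with 'axis' and 'angle'). "
        "Sentence: " + sentence
    )
    reply = llm_complete(prompt)
    try:
        # e.g. {"translate": [0, 0, 0.05], "rotate_deg": {"axis": "z", "angle": 30}}
        return json.loads(reply)
    except (json.JSONDecodeError, TypeError):
        return {}   # treat malformed model output as "no change"
```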
The user can control objects in the virtual environment through natural voice instructions without manually operating a mouse, keyboard or other physical controller. This interaction mode is more intuitive and convenient and enhances the user's sense of immersion and the interaction experience. For complex or fine-grained operations in the virtual environment, voice control can convey the user's intent quickly and accurately, avoiding the tedium and errors that manual operation may introduce; in particular, in scenes that require frequent adjustment of the position and pose of the virtual calibration object, voice control significantly improves operating efficiency. By converting voice into instructions or parameters through a large language model, the application can understand and respond to diverse voice instructions and meet the personalized needs of different users; both simple position movements and complex pose adjustments can be achieved by voice commands. In applications such as virtual reality (VR) and augmented reality (AR), adjusting the position and pose of a virtual calibration object is a common interaction requirement; voice control further strengthens the interactivity and immersion of the virtual experience, allowing the user to interact with the virtual environment more naturally.
S120, acquiring an image of the actual calibration object according to the received second user voice, acquiring a first two-dimensional pixel coordinate of a first target corner point from the image, and determining a three-dimensional coordinate of the first target corner point according to the first two-dimensional pixel coordinate;
The user's voice (i.e., the second user voice) is acquired; the second user voice may ask to acquire an image of the actual calibration object or, more specifically, to acquire the three-dimensional coordinates of a specific corner point on the actual calibration object. The system converts the second user voice into executable commands or parameters through a large language model or dedicated speech-recognition techniques. According to the parsed voice instruction, an image of the actual calibration object is captured by the camera of the headset. After the image is acquired, the headset uses image-processing techniques to identify and analyze the image content; this typically includes edge detection, feature extraction, corner detection and so on. In this process the headset pays particular attention to each target corner point (illustrated here with the first target corner point). Through an image-processing algorithm, the headset identifies the position of the first target corner point in the image and records its two-dimensional pixel coordinates (i.e., the x and y coordinates in the image coordinate system). With the two-dimensional pixel coordinates of the first target corner point, the headset needs further information to determine its coordinates in three-dimensional space. This usually involves the camera's intrinsic and extrinsic parameters (such as focal length, optical-center position, and the rotation and translation between the camera and the world coordinate system) and possibly depth information (acquired, for example, by binocular stereo vision, structured light or ToF sensors). The headset can convert the two-dimensional pixel coordinates into coordinates in the three-dimensional world coordinate system through a camera model and a three-dimensional reconstruction algorithm; this conversion may involve mathematical methods such as the inverse of perspective projection and matrix operations. Finally, the headset outputs the calculated three-dimensional coordinates of the first target corner point to the user or passes them on for subsequent processing.
Optionally, acquiring the first two-dimensional pixel coordinate of the first target corner point from the image, and determining the three-dimensional coordinate of the first target corner point according to the first two-dimensional pixel coordinate, includes:
identifying the boundary of the actual calibration object in the image by using a second detection algorithm, and acquiring the first two-dimensional pixel coordinate of the first target corner point, wherein the first two-dimensional pixel coordinate comprises a row coordinate and a column coordinate;
and obtaining, through the solvePnP algorithm, a second pose of the camera in the world coordinate system according to the first two-dimensional pixel coordinate and the size of the actual calibration object, wherein the second pose comprises a second rotation matrix and a second translation vector, and converting the first two-dimensional pixel coordinate into a three-dimensional coordinate through the second pose.
The boundary of the actual calibration object in the image is identified with a second detection algorithm, which may be based on edge detection, contour extraction, template matching, deep learning or other methods. After the boundary of the calibration object is identified, the headset analyzes it further to locate the first target corner point; this usually involves corner detection on the boundary, or inferring the position of the first target corner from the known geometry of the calibration object (rectangular, circular, etc.). Once the first target corner point is found, the headset records its two-dimensional pixel coordinates in the image, including the row and column coordinates. solvePnP (Perspective-n-Point) is a computer-vision algorithm that computes the pose of a camera by matching n two-dimensional points in an image with their corresponding points in three-dimensional space; the pose comprises the camera's rotation matrix and translation vector, which describe the position and orientation of the camera in the world coordinate system. In this scenario, the input to the solvePnP algorithm consists of two main parts: first, the first two-dimensional pixel coordinates of the first target corner point obtained from the image, and second, a three-dimensional model of the actual calibration object or at least its dimensions, which lets the algorithm know the positions of the corresponding corner points in three-dimensional space. Through the solvePnP algorithm, the headset computes the second pose of the camera in the world coordinate system from the input two-dimensional pixel coordinates and the three-dimensional size information. This pose is expressed as a second rotation matrix and a second translation vector, which together describe the exact position and orientation of the camera when the image was captured. With the second pose, the headset converts the two-dimensional pixel coordinates in the image (i.e., the coordinates of the first target corner point) into coordinates in three-dimensional space; this conversion typically involves the inverse of perspective projection and takes into account the camera's intrinsic matrix (focal length, optical center, etc.) and extrinsic matrix (rotation matrix and translation vector). Finally, the headset outputs the coordinates of the first target corner point in three-dimensional space; in the application, the three-dimensional coordinate of the first target corner point is (x, y, z).
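One way to carry out the inverse perspective projection mentioned above, assuming the calibration object is planar (its own z = 0 plane), is to intersect the viewing ray through the detected pixel with that plane expressed in the camera frame via the second pose. The sketch below illustrates this under those assumptions; names are illustrative.

```python
import numpy as np

def pixel_to_3d_on_plane(u, v, K, R2, t2):
    """Back-project the first target corner pixel (u = column, v = row) onto the
    calibration plane, given the camera intrinsic matrix K and the second pose
    (R2, t2) mapping calibration-object coordinates (plane z = 0) into the camera frame."""
    t2 = np.asarray(t2, float).reshape(3)
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray in the camera frame
    n = R2[:, 2]                                   # plane normal in the camera frame
    s = np.dot(n, t2) / np.dot(n, d)               # ray/plane intersection scale
    X_cam = s * d                                  # 3D point in the camera frame
    X_obj = R2.T @ (X_cam - t2)                    # same point in object coordinates
    return X_cam, X_obj
```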
By using a second detection algorithm (such as edge detection or corner detection), the boundary of the actual calibration object in the image and the position of the first target corner point are identified precisely, and a high-precision first two-dimensional pixel coordinate (including row and column coordinates) is acquired. This provides an accurate data basis for the subsequent three-dimensional coordinate calculation. Through the solvePnP algorithm, combined with the known size of the actual calibration object and the two-dimensional pixel coordinates in the image, the second pose of the camera in the world coordinate system (including the second rotation matrix and second translation vector) can be computed. This pose information reflects the precise position and orientation of the camera relative to the world coordinate system and thereby ensures the accuracy of the three-dimensional coordinate conversion. The whole pipeline, from image acquisition to three-dimensional coordinate calculation, runs automatically without manual intervention, which greatly improves processing efficiency and reduces human error. In real-time scenarios (such as augmented reality or robot navigation), the pipeline can respond quickly and output three-dimensional coordinate information, meeting real-time requirements. The pipeline is not limited to a specific actual calibration object: by adjusting the parameters of the detection algorithm and the solvePnP algorithm, it can adapt to calibration objects of different shapes, sizes and materials, which increases its flexibility and versatility. Moreover, since solvePnP is one of the most widely used algorithms in computer vision, it is highly compatible and extensible and can be integrated seamlessly with other computer-vision algorithms or systems. In applications such as augmented reality, accurately calculating the three-dimensional coordinates of the actual calibration object enables a more realistic and natural fusion of the virtual and the real, which helps enhance the user's immersion and experience quality. In scenarios such as robot navigation, accurate three-dimensional coordinate information helps the robot perceive and understand its environment more precisely, so that it can make more reasonable decisions and actions.
S130, acquiring a second two-dimensional pixel coordinate of a second target corner point of the virtual calibration object on the display screen, and calculating a projection matrix according to the three-dimensional coordinate and the second two-dimensional pixel coordinate, wherein the first target corner point and the second target corner point correspond to the same point on the actual calibration object;
The second target corner point corresponding to the first target corner point is found on the virtual calibration object; its specific position on the screen, i.e., the second two-dimensional pixel coordinate, is usually computed from the graphics rendering engine and the resolution of the display device. The projection matrix is the matrix that converts three-dimensional coordinates into two-dimensional screen coordinates. In AR/MR/VR applications, in order to place virtual elements accurately at specific locations in the real world, a projection matrix reflecting the camera's view, focal length, position and so on is calculated. Knowing the three-dimensional coordinates of the actual calibration object (the three-dimensional coordinate of the first target corner point) and the second two-dimensional pixel coordinate of the virtual calibration object on the display screen, the projection matrix can be solved by mathematical methods such as the inverse of perspective projection. This matrix ensures that, when the virtual calibration object is rendered to the display screen, its second target corner point corresponds exactly to the position of the first target corner point of the actual calibration object. In the application, the second two-dimensional pixel coordinate of the second target corner point is (u, v).
Optionally, calculating the projection matrix according to the three-dimensional coordinates and the second two-dimensional pixel coordinates includes:
determining a third rotation matrix and a third translation vector of the camera by direct linear transformation using a plurality of feature points;
calculating an estimated position of a target feature point in the camera coordinate system, and projecting the estimated position onto the image plane to obtain estimated pixel coordinates, wherein the target feature point is any one of the feature points;
calculating the reprojection error between the two-dimensional pixel coordinates of the target feature point and the estimated pixel coordinates, and taking the sum of squares of the reprojection errors of all the feature points as an objective function;
iteratively updating the parameters between the eye and the display screen by linearizing the objective function and solving a linear least-squares problem, until the amount of change in the objective function is less than a threshold;
and calculating the projection matrix according to the updated parameters between the eye and the display screen.
A plurality of corresponding feature points are selected from the actual calibration object and the virtual calibration object. These feature points may be points that are easy to identify and track, such as corner points and edge intersections. Using the three-dimensional coordinates of the feature points and their two-dimensional pixel coordinates on the display screen, the third rotation matrix and the third translation vector of the camera can be initially estimated by a Direct Linear Transformation (DLT) algorithm. These parameters describe the position and attitude of the camera relative to a world or reference coordinate system. The three-dimensional coordinates of each target feature point are then converted to an estimated position in the camera coordinate system using the rotation matrix and translation vector obtained in the previous step, and these estimated positions are projected onto the image plane through the camera's projection model (e.g. perspective projection) to obtain the corresponding estimated pixel coordinates. For each feature point, the difference between its actual two-dimensional pixel coordinates and the estimated pixel coordinates obtained by projection, i.e. the reprojection error, is calculated, and the sum of squares of the reprojection errors of all the feature points is taken as the objective function. The value of this function reflects the accuracy of the current parameters between the eye and the display screen. To find the minimum of the objective function, it is usually necessary to linearize it; this amounts to making small adjustments to the parameters and observing how these adjustments affect the objective function. In each iteration, the parameter update between the eye and the display screen that minimizes the linearized objective function is found by solving a linear least squares problem. These updates are applied to the rotation matrix, translation vector and other parameters between the eye and the display screen, and the objective function is recalculated. The process is repeated until the variation of the objective function is smaller than a preset threshold or the maximum number of iterations is reached, which indicates that the parameters between the eye and the display screen have converged to a stable state. After the iterative optimization is completed, the projection matrix is calculated using the updated parameters between the eyes and the display screen. The projection matrix is a matrix that converts three-dimensional coordinates into two-dimensional screen coordinates, and contains the internal parameters (e.g. focal length and optical center) and the external parameters (i.e. rotation matrix and translation vector) between the eyes and the display screen.
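The following Python sketch illustrates the two-stage scheme described above under simplified assumptions: pts3d and pts2d are N ≥ 6 corresponding three-dimensional and two-dimensional feature-point coordinates, the direct linear transformation here initializes the full 3 × 4 projection matrix rather than separately recovering the rotation and translation, and SciPy's Levenberg-Marquardt solver stands in for the linearize-and-solve iteration; none of these choices are mandated by the method itself.

```python
import numpy as np
from scipy.optimize import least_squares

def dlt_projection(pts3d, pts2d):
    """Initial estimate of the 3x4 projection matrix by direct linear transformation."""
    rows = []
    for (x, y, z), (u, v) in zip(pts3d, pts2d):
        X = np.array([x, y, z, 1.0])
        rows.append(np.concatenate([X, np.zeros(4), -u * X]))
        rows.append(np.concatenate([np.zeros(4), X, -v * X]))
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)          # null-space vector; P is defined up to scale

def reprojection_residuals(p, pts3d, pts2d):
    """Per-coordinate reprojection errors for the current projection-matrix estimate."""
    P = p.reshape(3, 4)
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])   # homogeneous 3D points
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]                    # perspective divide
    return (proj - pts2d).ravel()

def refine_projection(pts3d, pts2d):
    """DLT initialisation followed by iterative least-squares refinement."""
    pts3d = np.asarray(pts3d, float)
    pts2d = np.asarray(pts2d, float)
    P0 = dlt_projection(pts3d, pts2d)
    # Levenberg-Marquardt repeatedly linearises the sum-of-squared-reprojection-error
    # objective and solves the resulting linear least-squares problem.
    result = least_squares(reprojection_residuals, P0.ravel(),
                           args=(pts3d, pts2d), method="lm", xtol=1e-12)
    return result.x.reshape(3, 4)
```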
By using a plurality of feature points to perform direct linear transformation, the rotation matrix and translation vector of the camera can be estimated preliminarily, which provides a basis for subsequent high-precision alignment. The iterative optimization process gradually optimizes parameters between the eyes and the display screen by continuously reducing the reprojection error, namely the difference between the actual pixel coordinates and the estimated pixel coordinates, so that high-precision alignment between the virtual elements and the real world is realized. And a plurality of characteristic points are used for calculation, so that the robustness of the system is improved. Even if some feature points are blocked or misidentified, the system can still rely on other feature points to maintain a stable alignment effect. The iterative optimization process avoids the problem of alignment failure caused by inaccurate initial parameters by gradually approaching to the optimal solution, and improves the stability of the system. Although the iterative optimization process involves complex mathematical calculations, this process can be performed in real time with the support of modern computer hardware. This means that when the user moves the head or changes the viewing angle, the headset can immediately update the projection matrix to maintain continuous alignment of the virtual elements with the real world.
Optionally, the calculating a projection matrix according to the updated parameters between the eyes and the display screen comprises constructing a projection internal parameter according to the focal length and the optical center between the eyes and the display screen, and constructing a projection external parameter according to the rotation parameters and the translation parameters from the hardware coordinate system to the eye coordinate system;
And calculating according to the projection internal parameters and the projection external parameters to obtain the projection matrix.
The conversion relationship between the real space and the digital space can be expressed as:

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = E_k \begin{bmatrix} R_{he} & t_{he} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = P_k \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} $$

wherein $E_k$ represents the projection internal reference between the eye and the screen, $R_{he}$ represents the rotation parameters from the hardware coordinate system to the eye coordinate system, $t_{he}$ represents the translation parameters from the hardware coordinate system to the eye coordinate system, $P_k$ represents the projection matrix, $(x, y, z)$ are the three-dimensional coordinates, $(u, v)$ is the second two-dimensional pixel coordinate, and $s$ is a scale factor. $E_k$ can be expressed by:

$$ E_k = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix} $$

where $f_u$ and $f_v$ denote the focal length between the eye and the screen, and $c_u$ and $c_v$ denote the optical center between the eye and the screen.
The projection matrix is a 3 x 4 matrix with a total of 12 unknowns. A feature point may give a set of 2 equations for u, v and x, y, z. If there are 6 sets of feature points, an analytical solution can be found.
For example, a set of 2 equations may be as follows:

$$ u = \frac{p_{11}x + p_{12}y + p_{13}z + p_{14}}{p_{31}x + p_{32}y + p_{33}z + p_{34}}, \qquad v = \frac{p_{21}x + p_{22}y + p_{23}z + p_{24}}{p_{31}x + p_{32}y + p_{33}z + p_{34}} $$

where $p_{ij}$ denotes the element in the $i$-th row and $j$-th column of the projection matrix $P_k$.
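As a small numerical illustration of the relationship above, the following sketch composes a projection matrix from assumed intrinsic values (f_u, f_v, c_u, c_v) and assumed hardware-to-eye extrinsics, and projects a three-dimensional point onto the screen; all numeric values are placeholders, not calibrated parameters of the present application.

```python
import numpy as np

# Assumed eye-screen intrinsics and hardware-to-eye extrinsics (illustrative only).
f_u, f_v, c_u, c_v = 1400.0, 1400.0, 960.0, 540.0
E_k = np.array([[f_u, 0.0, c_u],
                [0.0, f_v, c_v],
                [0.0, 0.0, 1.0]])

R_he = np.eye(3)                           # assumed hardware-to-eye rotation
t_he = np.array([[0.032], [0.0], [0.0]])   # assumed hardware-to-eye translation (m)

P_k = E_k @ np.hstack([R_he, t_he])        # 3x4 projection matrix

def project(P, point3d):
    """Map a 3D point (hardware coordinates) to screen pixel coordinates (u, v)."""
    uvw = P @ np.append(point3d, 1.0)
    return uvw[:2] / uvw[2]                # perspective divide

print(project(P_k, np.array([0.1, 0.05, 0.8])))
```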
By building the projection internal parameters from the focal length and optical center between the eye and the display screen, the headset can accurately simulate the optical characteristics of the camera lens, including how the focal length affects the magnification and reduction of the image and how the optical center determines the center point of the image. This makes the rendering of virtual elements on the display screen more accurate and consistent with actual optical laws. Meanwhile, because the projection external parameters are constructed from the rotation and translation parameters from the hardware coordinate system to the eye coordinate system, the head-mounted device can accurately convert a virtual element from its position in virtual space to its position under the user's viewing angle. This conversion takes factors such as the user's head movement and viewing-angle changes into account, so that precise alignment between the virtual element and the user's viewpoint is achieved. An accurate projection matrix ensures the rendering quality of virtual elements on the display screen: because the optical characteristics of the camera and the changes in the user's viewing angle are considered, virtual elements can be rendered with a more realistic and natural visual effect, with fewer artifacts such as distortion and blurring. Updating the parameters between the eyes and the display screen through the iterative optimization process, and calculating the projection matrix from those parameters, can also improve the overall efficiency of the system, because the iterative optimization gradually approaches the optimal solution and avoids unnecessary computation. At the same time, an accurate projection matrix reduces errors and repeated calculations during rendering, thereby improving the rendering speed and responsiveness of the system.
And S140, converting points in a virtual space to a screen of the augmented reality headset by using the projection matrix.
After conversion by the projection matrix, points in virtual space are mapped to two-dimensional coordinates on the screen plane. These two-dimensional coordinates are then used to render the virtual elements on the screen of the AR headset. The rendering process may also involve advanced graphics processing techniques, such as texture mapping, illumination computation and shadow generation, to further enhance the realism and immersion of the virtual elements. In AR applications, the user's head movements and viewing-angle changes are continuous, so the projection matrix and related parameters need to be updated in real time to ensure that the virtual elements always remain correctly aligned with the user's perspective. This is typically achieved by tracking the user's head movements through sensors in the headset (e.g. gyroscopes and accelerometers) and updating the projection matrix and related parameters accordingly.
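A hedged per-frame sketch of this step is given below; it assumes the calibrated 3 × 4 projection matrix P_k from the previous steps, a tracked world-to-hardware head pose supplied by the headset's sensors, and a caller-provided draw_marker callback standing in for the actual rendering path (for example, the OpenGL drawing commands mentioned elsewhere in this application). The names are illustrative, not part of any specific SDK.

```python
import numpy as np

def render_frame(P_k, R_wh, t_wh, virtual_points, draw_marker):
    """Project virtual-space points to screen pixels for the current frame.

    P_k        : calibrated 3x4 eye-to-screen projection matrix
    R_wh, t_wh : rotation / translation mapping world (virtual) coordinates into the
                 headset hardware frame, updated each frame from the head-tracking sensors
    """
    for X_w in virtual_points:
        X_h = R_wh @ X_w + t_wh              # world -> hardware coordinates
        u, v, w = P_k @ np.append(X_h, 1.0)  # hardware -> homogeneous pixel coordinates
        draw_marker(u / w, v / w)            # perspective divide, then hand off to rendering
```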
This embodiment also discloses a system for optical perspective calibration. Fig. 3 is a schematic block diagram of the system for optical perspective calibration disclosed in the embodiment of the application. As shown in fig. 3, the system includes an adjustment module 301, a coordinate module 302, a calculation module 303, and a display module 304, where:
The adjusting module 301 is configured to adjust the position and the posture of the virtual calibration object on the display screen according to the received voice of the first user, so that the virtual calibration object coincides with the actual calibration object in the physical space;
The coordinate module 302 is configured to obtain an image of the actual calibration object according to the received voice of the second user, obtain a first two-dimensional pixel coordinate of a first target angular point from the image, and determine a three-dimensional coordinate of the first target angular point according to the first two-dimensional pixel coordinate;
The calculating module 303 is configured to obtain a second two-dimensional pixel coordinate of a second target angular point of the virtual calibration object on the display screen, calculate a projection matrix according to the three-dimensional coordinate and the second two-dimensional pixel coordinate, where the first target angular point and the second target angular point correspond to the same point in the actual calibration object;
a display module 304 configured to convert points in virtual space onto a screen of the augmented reality headset using the projection matrix.
Optionally, the system includes a rendering module configured to:
Identifying an actual calibration object in a physical space by using a first detection algorithm to obtain first data of each point in a camera coordinate system, wherein the first data comprises positions and directions;
Converting the first data into second data in a display screen coordinate system, and positioning a virtual calibration object according to the second data; rendering the virtual calibration object by using a drawing command of OpenGL, and presenting a rendering result on the display screen;
and positioning the actual calibration object in the physical space through PnP.
Optionally, the rendering module is configured to:
calibrating a camera of the augmented reality head-mounted device to obtain an internal reference matrix of the camera;
Matching the characteristic points in the image with the three-dimensional characteristic points on the actual calibration object, and calculating a first gesture of the camera according to the matched characteristic point pairs by using a PnP algorithm, wherein the first gesture comprises a first rotation matrix and a first translation vector;
And determining the position of the calibration object in the physical space according to the first gesture, as sketched below.
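A minimal sketch of this camera calibration and first-pose estimation, using OpenCV and assuming several chessboard-style views whose object/image point lists have already been collected, is as follows; the variable names are illustrative only.

```python
import cv2

def calibrate_and_first_pose(object_points_list, image_points_list, image_size,
                             obj_pts, img_pts):
    """object_points_list / image_points_list: per-view 3D-2D correspondences of the
    calibration pattern; obj_pts / img_pts: matched 3D-2D feature point pairs in the
    view used for pose estimation (all names are assumptions for illustration)."""
    # Internal reference matrix K and distortion coefficients from several views.
    rms, K, dist, _, _ = cv2.calibrateCamera(
        object_points_list, image_points_list, image_size, None, None)
    # First gesture (pose) of the camera from the matched feature point pairs via PnP.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    R1, _ = cv2.Rodrigues(rvec)     # first rotation matrix
    return K, dist, R1, tvec        # tvec is the first translation vector
```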
Optionally, the adjustment module 301 is configured to:
Converting the first user voice into instructions or parameters through a large language model, and calculating a new state of the virtual calibration object according to the instructions or parameters;
applying the new state to a state model of the calibration object to update the position and the posture of the virtual calibration object in the virtual space, and rendering the updated position and posture to the display screen using a graphics rendering engine.
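A hedged sketch of how the first user voice might be converted into adjustment parameters is given below; llm_complete stands in for whatever large language model interface the headset integrates and is purely hypothetical, as are the field names and the simple additive state model.

```python
import json

def voice_to_adjustment(transcribed_text, llm_complete):
    """Ask the language model to turn a fuzzy instruction into numeric parameters."""
    prompt = (
        "Convert the user's instruction into a JSON object with numeric fields "
        "dx, dy, dz (metres) and yaw, pitch, roll (degrees); use 0 for fields the "
        "instruction does not mention. Instruction: " + transcribed_text
    )
    return json.loads(llm_complete(prompt))   # e.g. {"dx": 0.01, "yaw": -2, ...}

def apply_adjustment(state, params):
    """Update the virtual calibration object's state model (position and posture)."""
    state["position"] = [p + params.get(k, 0.0)
                         for p, k in zip(state["position"], ("dx", "dy", "dz"))]
    state["rotation"] = [r + params.get(k, 0.0)
                         for r, k in zip(state["rotation"], ("yaw", "pitch", "roll"))]
    return state
```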
Optionally, the coordinate module 302 is configured to:
Identifying the boundary of an actual calibration object in the image by using a second detection algorithm, and acquiring a first two-dimensional pixel coordinate of a first target angular point, wherein the first two-dimensional pixel coordinate comprises a row coordinate and a column coordinate;
and obtaining a second posture of the camera under the world coordinate system according to the first two-dimensional pixel coordinate and the size of the actual calibration object through solvePnP algorithm, wherein the second posture comprises a second rotation matrix and a second translation vector, and the first two-dimensional pixel coordinate is converted into a three-dimensional coordinate through the second posture.
Optionally, the computing module 303 is configured to:
determining a third rotation matrix and a third translation vector of the camera by direct linear transformation using the plurality of feature points;
Calculating an estimated position of a target feature point under a camera coordinate system, and projecting the estimated position onto an image plane to obtain estimated pixel coordinates, wherein the target feature point is any one of the feature points;
calculating the reprojection errors between the two-dimensional pixel coordinates of the target feature points and the estimated pixel coordinates, and taking the sum of squares of the reprojection errors of all the feature points as an objective function;
iteratively updating parameters between the eye and the display screen by linearizing the objective function and solving a linear least squares problem until the amount of change in the objective function is less than a threshold;
And calculating a projection matrix according to the updated parameters between the eyes and the display screen.
Optionally, the computing module 303 is configured to:
Constructing a projection internal parameter according to the focal length and the optical center between the eyes and the display screen, and constructing a projection external parameter according to the rotation parameter and the translation parameter from the hardware coordinate system to the eye coordinate system;
And calculating according to the projection internal parameters and the projection external parameters to obtain the projection matrix.
It should be noted that when the apparatus provided in the foregoing embodiment implements its functions, the division into the functional modules described above is merely an example. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiment and the method embodiments provided above belong to the same concept; for the specific implementation process, reference may be made to the method embodiments, which are not repeated here.
This embodiment also discloses an electronic device. Referring to fig. 4, the electronic device may comprise at least one processor 401, at least one communication bus 402, a user interface 403, a network interface 404, and at least one memory 405.
Wherein the communication bus 402 is used to enable connection and communication between these components.
The user interface 403 may include a display screen (Display) and a camera (Camera); optionally, the user interface 403 may further include a standard wired interface and a standard wireless interface.
The network interface 404 may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.
Wherein the processor 401 may include one or more processing cores. The processor 401 connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 405 and invoking the data stored in the memory 405. Optionally, the processor 401 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 401 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed on the display screen; and the modem is used for handling wireless communication. It will be appreciated that the modem may also not be integrated into the processor 401 and may instead be implemented by a separate chip.
The memory 405 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 405 includes a non-transitory computer-readable storage medium. The memory 405 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 405 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, and an image playing function), instructions for implementing the above-described method embodiments, and the like, and the data storage area may store the data involved in the above-described method embodiments. Optionally, the memory 405 may also be at least one storage device located remotely from the aforementioned processor 401. As shown in fig. 4, the memory 405, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program of the method of optical perspective calibration.
In the electronic device shown in fig. 4, the user interface 403 is mainly used as an interface for obtaining data input by the user, while the processor 401 may be used to invoke the application program of the method of optical perspective calibration stored in the memory 405; when executed by the one or more processors 401, the application program causes the electronic device to perform the method of one or more of the embodiments described above.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of the acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division of logical functions, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between components may be indirect coupling or communication connection through some service interfaces, devices or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application may be embodied, in whole or in part, in the form of a software product, which is stored in a memory 405 and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The memory 405 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure; equivalent changes and modifications made according to the teachings of this disclosure also fall within its scope. Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.