Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present application. It is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
The term "and/or" is used herein to describe an association relationship of associated objects, and means that there may be three relationships, for example, a and/or B, and that there may be three cases where a exists alone, while a and B exist together, and B exists alone. The symbol "/" herein indicates that the associated object is or is a relationship, e.g., A/B indicates A or B.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. In the description of the embodiments of the present application, unless otherwise specified, the meaning of "plurality" means two or more, for example, a plurality of processing units means two or more processing units and the like, and a plurality of elements means two or more elements and the like.
In the embodiments of the application, words such as "exemplary" or "such as" are used to indicate an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
Currently, in the fields of image processing, machine vision, and graphics-rendering-related applications such as autonomous driving and VR/AR, computing efficiency and computing accuracy have been major concerns of the industry. Among them, feature-point-based image recognition, matching, and stereoscopic vision have attracted great interest because of their wide range of application scenarios.
The following describes the current image feature recognition and matching process, and referring to fig. 1, the image feature recognition and matching process may include the following steps:
(1) Feature recognition, also called feature detection: feature points with salient properties, such as corner points, edge points, or blobs, are found in the image. These feature points are typically located at salient structures of the image and have good stability and repeatability.
(2) Feature description: for each detected feature point, a descriptor is computed. The descriptor is a vector or matrix representing the local image information around the feature point. As shown in fig. 1, in this step, a feature vector is calculated separately for each feature point in the two frames of images.
(3) Feature matching: the most similar feature point pairs are found by comparing the descriptors of feature points in different images. Feature matching may be performed using various distance or similarity metrics, such as Euclidean distance, Hamming distance, or correlation metrics. A threshold is typically set to filter out the matching pairs that satisfy the requirement. As shown in fig. 1, in this step, image feature matching is performed based on the feature vectors of the feature points of the two frames of images. Applications such as motion estimation, image stitching, and background segmentation can be implemented based on the identified image feature points and the matching of feature points between different images.
Feature point recognition algorithms require a large number of per-pixel calculations; the higher the image resolution, the more pixels need to be processed, and the more computing resources and computing time are consumed.
At present, image feature point recognition is mainly performed on the CPU side in a multithreaded manner, but the degree of parallelism is low and the computation time is relatively long. In addition, in graphics rendering applications the image data in the rendering pipeline must be copied or mapped before the CPU can access and process it, which leads to long processing times and high memory power consumption, making it difficult for mobile applications to run at a high frame rate with limited computing capability.
As more specialized computing units such as GPUs and DSPs appear in mobile devices, it is desirable to implement feature recognition efficiently and with low power consumption on such computing units. However, even with the hardware acceleration of such computing units, feature recognition and machine vision still face efficiency and power consumption problems on mobile devices; in particular, how to interact efficiently with the graphics rendering pipeline when applying feature recognition and machine vision to graphics rendering fields such as VR/AR applications constitutes a major performance bottleneck.
Based on the above, the embodiments of the present application provide a GPU-based image feature recognition method and an electronic device. In the process of recognizing feature points of an image generated by graphics rendering, image resources in the graphics rendering pipeline are accessed directly, so copying data through memory between the GPU and the CPU can be avoided and the heating of the mobile device caused by memory bandwidth overhead is greatly reduced. At the same time, the highly concurrent multithreading capability of the GPU and atomic thread access to the shared cache within a core are utilized, so that the computing efficiency of feature point recognition can be greatly accelerated and high-frame-rate applications can be achieved.
Compared with the related art, the improvement of the present solution is that the feature point recognition operation is executed directly in the rendering pipeline using data specific to the GPU in the graphics rendering application, which greatly improves recognition efficiency while ensuring recognition accuracy and reduces computing power consumption.
That is, the embodiments of the present application use certain characteristic functions of the GPU and logic that processes feature point recognition directly in the graphics rendering pipeline to realize feature point recognition of the rendered image efficiently and with low power consumption, and can realize high-frame-rate image feature point recognition even on a mobile platform with limited computing capability.
In order to facilitate understanding of the embodiments of the present application, some terms of the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
The image coordinate system is a two-dimensional coordinate system, and as shown in fig. 2, the origin (0, 0) of coordinates of the image coordinate system is located at the center, and the pixels are used as units. Based on the image coordinate system, any one pixel on the image can be located.
World coordinate system-a three-dimensional coordinate system whose origin of coordinates may be determined as appropriate, as shown in fig. 2, the world coordinate system may represent an object in space in units of length, such as millimeters (mm).
In particular, as shown in fig. 2, in the image processing process, coordinate mapping may be performed between an image coordinate system and a world coordinate system. For example, the image coordinates of the pixel points are mapped with the object vertex coordinates, and the whole texture is attached to the surface of the three-dimensional object model in the rendering process, so that a finer texture representation effect is realized.
Homogeneous coordinates - homogeneous coordinates are a very useful basic concept in computer graphics. By adding an extra dimension w to the Cartesian coordinate system, scaling, rotation, translation, and perspective projection can all be expressed as matrix operations. A point (x, y) in the Cartesian coordinate system becomes (X, Y, w) in the homogeneous coordinate system. Converting the homogeneous coordinates (X, Y, w) back to Cartesian coordinates simply requires dividing X and Y by w, so x and y in the Cartesian coordinate system can be expressed as x = X/w and y = Y/w.
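As a concrete illustration of the conversion described above, the following GLSL helper (an illustrative sketch, not part of the original description; the function name is chosen here) performs the division by w that maps homogeneous coordinates back to Cartesian coordinates.

```glsl
// Illustrative helper: convert a homogeneous coordinate (X, Y, Z, w) back to
// Cartesian form by dividing by w (assumes w is non-zero).
vec3 homogeneousToCartesian(vec4 p)
{
    return p.xyz / p.w;   // (X/w, Y/w, Z/w)
}
```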
A compute shader is a shader that can flexibly use the high-speed computation of the GPU to process non-graphics tasks, that is, a shader program that runs on the GPU outside the ordinary rendering pipeline. It is used for massively parallel general-purpose GPU algorithms and can improve the rendering speed of some games.
A compute shader is invoked over a global work group, which is composed of local work groups, and each local work group contains execution units. Illustratively, as shown in FIG. 3, the global work group size is (4, 4, 1) and the local work group size is 4×4.
The compute pipeline is the pipeline in the GPU that controls the compute shader stage; it has a single programmable stage, and its built-in variables can be used to determine the position (index) of the local work group and of the execution units.
gl_LocalInvocationID denotes the position of the execution unit within its local work group.
gl_GlobalInvocationID denotes the position of the current execution unit within the global work group and is a three-dimensional variable.
gl_LocalInvocationIndex is a one-dimensional index obtained by flattening the three-dimensional position of the execution unit; it is the flattened form of gl_LocalInvocationID.
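To make the above built-in variables concrete, the following minimal GLSL compute shader sketch (illustrative only; the local work group size matches the 4×4 example of FIG. 3) shows how a local work group size is declared and how the three built-in variables are read.

```glsl
#version 310 es
// Each local work group is 4x4x1 execution units, matching the example of FIG. 3.
layout(local_size_x = 4, local_size_y = 4, local_size_z = 1) in;

void main()
{
    uvec3 localId  = gl_LocalInvocationID;    // position within the local work group
    uvec3 globalId = gl_GlobalInvocationID;   // position within the global work group
    uint  flatId   = gl_LocalInvocationIndex; // flattened form of gl_LocalInvocationID
}
```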
Rendering pipeline - the process of converting a three-dimensional scene model into output in screen pixel space. The graphics rendering pipeline accepts a set of three-dimensional coordinates and converts them into a colored two-dimensional image on the screen.
It should be noted that shader code may be used to instruct the GPU how to draw. The shader code determines how a particular effect is drawn, while the rendering pipeline controls "what is drawn" and "which shaders are used for drawing".
SSBO - a shader storage buffer object (SSBO) stores byte data and uploads it to graphics memory so that shader programs can process the data. For example, an SSBO may be used to store vertex data, element data (vertex indices), uniform block data, and the like. In an actual implementation, processed data can be transferred between two compute shaders by means of an SSBO.
The present solution employs two SSBOs, referred to as SSBO1 and SSBO2 respectively. For example, in the step of calculating random numbers, the calculated random number queue is stored in SSBO1. In the feature point sorting step, the N feature point data items with the largest response values are copied to SSBO2.
Multithread shared cache region - a shared cache allows access by two or more processes or threads. Specifically, in the feature point sorting step, the first N feature points with the largest response values are selected from a local pixel area, the N feature point data items are sorted in descending order of response value, and the sorted feature point data are stored in the shared cache region.
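The following GLSL declarations sketch how SSBO1, SSBO2, and a multithread shared cache region could look in a compute shader. The binding indices, the FeaturePoint layout, and the work group size are assumptions made for illustration; they are not prescribed by the text.

```glsl
#version 310 es
precision highp float;
layout(local_size_x = 16, local_size_y = 16) in;

// Feature point record; the exact layout is an assumption for illustration.
struct FeaturePoint {
    vec2  coord;     // pixel coordinates of the feature point
    float response;  // response function value
    float pad;       // padding to keep the std430 layout simple
};

// SSBO1 and SSBO2 as described above; binding indices are illustrative.
layout(std430, binding = 1) buffer SSBO1 { FeaturePoint data1[]; };
layout(std430, binding = 2) buffer SSBO2 { FeaturePoint data2[]; };

// Shared cache region visible to every thread of one local work group, used
// when selecting and sorting feature points inside a 16x16 pixel block.
shared FeaturePoint sharedPoints[16 * 16];

void main()
{
    // Body intentionally left empty; the declarations above are the point of this sketch.
}
```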
The present application provides an image processing acceleration scheme at the software level, which uses certain characteristic functions of the GPU and logic that processes feature point recognition directly in the graphics rendering pipeline to realize feature point recognition of the rendered image efficiently and with low power consumption, and can realize high-frame-rate image feature point recognition on a mobile platform with limited computing capability. With the scheme of the present application, the power consumption of mobile devices in feature recognition and machine vision algorithms can be effectively reduced; the scheme can be widely applied to image rendering application scenarios such as VR/AR, improving image processing efficiency and computing accuracy while reducing computing power consumption.
The GPU-based image feature recognition method provided by the embodiments of the present application can be applied to electronic devices with a display function. The electronic devices may include mobile phones, smart televisions, wearable devices, tablet computers (Pad), computers with wireless transceiving functionality, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in unmanned driving (self-driving), wireless terminals in tele-surgery (remote medical surgery), wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, and so forth. The embodiments of the present application do not limit the specific technology and specific device form adopted by the terminal device.
First, an exemplary electronic device 100 provided in an embodiment of the present application is described. In some embodiments, the electronic device 100 may be a cell phone, tablet computer, desktop computer, laptop computer, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, cellular telephone, personal digital assistant (PDA), augmented reality (AR) device, virtual reality (VR) device, artificial intelligence (AI) device, wearable device, in-vehicle device, smart home device, and/or smart city device; the embodiments of the present application do not particularly limit the specific type of the electronic device 100.
Fig. 4A is a schematic hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 4A, the electronic device 100 may include a processor 110 and a display screen 120.
Processor 110 may include one or more processing units; for example, processor 110 may include a central processing unit (CPU) 111 and a graphics processing unit (GPU) 112. The different processing units may be separate devices or may be integrated in one or more processors.
Referring to fig. 4B, the CPU includes a controller, one or more arithmetic logic units (ALUs), and a memory unit. The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals to complete the control of instruction fetching and instruction execution. The ALUs are used to perform arithmetic and logical operations. The memory unit is used to store instructions and data. In some embodiments, the memory unit may be a cache. The memory unit may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to reuse the instructions or data, they can be called directly from the memory unit. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
The GPU 112 is a microprocessor for image processing and is connected to the CPU 111 and to the display screen 120. Referring to fig. 4B, the GPU 112 is correspondingly provided with a display memory unit. The GPU 112 may receive the image data sent by the CPU and store it in the display memory unit, perform graphics rendering and other processing on the image data, and write the processed image back to the display memory unit, from which the display screen 120 can read and display the processed image.
The display screen 120 is used to display the image rendered by the GPU. The display 120 includes a display panel. For example, the display panel may employ an organic light-emitting diode (OLED).
The electronic device 100 realizes an image display function through the CPU 111, the GPU 112, the display screen 120, and the like.
Taking a game scene as an example, the rendering of an image in a game is completed by the CPU and the GPU together. The CPU is responsible for some global tasks, such as determining the objects and lights that need to be provided to the GPU in a drawing instruction (Drawcall) and setting up the GPU's rendering pipeline, and then notifies the GPU to render. The GPU is responsible for the detailed work of image rendering.
In an actual implementation, the CPU initiates the rendering flow. The game application uses the CPU to determine which objects need to be rendered; after the objects to be rendered are found, the graphics APIs of the system are called one by one, and each object corresponds to one Drawcall. A Drawcall is the process by which the CPU notifies the GPU to start rendering. When the GPU receives a Drawcall, the GPU draws the specified object, and the drawn two-dimensional image is then displayed on the screen.
In general, the CPU and the GPU have different advantages in the image processing process, respectively.
On the one hand, the CPU has the advantage of computing power and parallel processing power, and can rapidly process a large amount of data and perform complex computation. The CPU may also perform operations such as matrix operations and vectorization, which are important in computer vision. In addition, the CPU may perform time-consuming tasks such as sorting and searching.
On the other hand, the GPU has an advantage in that it is capable of graphics processing, and image processing and computation can be performed quickly. GPUs can also perform parallelized computations, which are particularly useful for large-scale image processing tasks. The GPU may also perform texture merging and shading, which may increase the accuracy of image recognition.
Specifically, in order to solve the efficiency and power consumption problems that mobile devices face in feature recognition and machine vision, the embodiments of the present application directly access image resources in the graphics rendering pipeline during the recognition of feature points of images generated by graphics rendering, so as to interact with the graphics rendering pipeline more efficiently and reduce the power consumption of mobile devices in feature recognition and machine vision algorithms. For example, when applied to graphics rendering application scenarios such as VR/AR, recognition efficiency can be greatly improved while recognition accuracy is ensured, and computing power consumption is reduced.
The above is a specific description of the embodiment of the present application using the electronic device 100 as an example. It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. The electronic device 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In addition, an operating system runs on the above components, such as the iOS operating system developed by Apple, the Android open-source operating system developed by Google, or the Windows operating system developed by Microsoft. Applications can be installed and run on the operating system.
The operating system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 5 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer (applications), the application framework layer (application framework), the Android runtime module and system libraries, and the kernel layer (kernel).
The application layer may include a series of application packages, among other things. For example, the application layer may include applications such as video applications, gaming applications, autopilot applications, AR/VR applications, and the like, to which embodiments of the present application are not limited in any way.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions. Illustratively, the application framework layer may include a window manager, a content provider, a view system, a resource manager, etc., to which embodiments of the application are not limited in any way.
The Android runtime module is responsible for the scheduling and management of the Android system. The Android runtime module includes a virtual machine and a core library. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection. The core library contains the functions that the java language needs to call.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), two-dimensional graphics engine (e.g., SGL), etc. The surface manager is used to manage the display subsystem and provides a fusion of the two-dimensional and three-dimensional layers for the plurality of applications.
The kernel layer is a layer between hardware and software. The kernel layer at least contains display drivers and sensor drivers.
For ease of illustration, the hardware layers that interact with the software architecture described above are also embodied in FIG. 5. For example, the hardware layers may include a CPU, a GPU, and a display screen.
Although the Android system is described as an example in the embodiment of the present application, the basic principle is equally applicable to electronic devices based on the iOS or Windows and other operating systems.
The execution subject of the GPU-based image feature recognition method provided by the embodiments of the present application may be the above-mentioned electronic device, or a functional module and/or functional entity in the electronic device capable of implementing the GPU-based image feature recognition method. The solution of the present application may be implemented by hardware and/or software, and may be determined according to actual use requirements, which is not limited by the embodiments of the present application. The GPU-based image feature recognition method provided by the embodiments of the present application is described below by taking an electronic device as an example with reference to the accompanying drawings.
In order to better understand the embodiments of the present application, the following briefly describes the flow of the GPU-based image feature point recognition method provided in the embodiments of the present application.
Step 101, creating an image pyramid. An image pyramid is created for the image to be processed in the rendering pipeline. Referring to fig. 6, the pyramid contains K levels of image data, and the image data of one pixel in each pyramid level is the average of the image data of the four pixels at the corresponding position in the level below it, until the topmost level of the pyramid has only one pixel, whose value is the average of all pixels of the whole image.
Step 102, calculating corner points of the pyramid levels. A compute shader of the GPU binds the corresponding level of the image pyramid according to the scaling factor parameter. For example, the level of the image pyramid whose size is a 16×16 pixel block is bound, and the luminance data and depth data of the pixels are used to determine the corner points in the 16×16 pixel block.
Step 103, calculating the response function value (which may simply be referred to as the response value) of each corner point. The response function value is calculated for the corner pixels. If the response function value is greater than or equal to a preset response threshold, the corner point is determined to be a feature point.
Step 104, maximum value suppression of the feature points. Using the shared cache variables of the compute shader's thread work group, the maximum of the response function values of all feature points in the 16×16 pixel block area is taken to obtain a local maximum, and locally aggregated similar feature points are removed.
Step 105, outputting the feature point information. The function value of the local maximum and its pixel coordinates are recorded, and the response function value is written to the corresponding coordinate position in the mask image.
Thus, the identification of the image feature points is completed. In this scheme, the feature point recognition operation is executed directly in the rendering pipeline using data specific to the GPU in the graphics rendering application, so that recognition efficiency is greatly improved while recognition accuracy is ensured, and computing power consumption is reduced.
The following describes in detail a specific implementation procedure of the GPU-based image feature recognition method according to the embodiment of the present application with reference to fig. 7.
Step 101, creating an image pyramid
An image pyramid is a collection of images made up of multiple sub-images of different resolutions of an image. The set of images is generated from a single image by continuous downsampling, the smallest image possibly having only one pixel.
In an embodiment of the application, the color image (also referred to as the rendered image) generated by the rendering pipeline is bound as the current operation image by a texture binding instruction (the glBindTexture function), and then a mipmap generation instruction (the glGenerateMipmap function) is called to generate the image pyramid.
The glBindTexture function is used to bind a texture; it replaces the currently bound texture object with the texture object specified in its parameters, so that subsequent operations are applied to that texture object. For example, after the color image texture 101 is bound, image processing operations can be performed on the color image texture 101.
In addition, the embodiment of the present application creates a depth image (also referred to as a texture image) DepthMipmap with two 16-bit floating-point channels (R16F/G16F format) and the same size as the color image described above, binds the depth image DepthMipmap as an operation image by the texture binding instruction (glBindTexture function), and calls the mipmap generation instruction (glGenerateMipmap function) to generate its pyramid.
It should be noted that, in the related art, image recognition is performed by the CPU and the GPU on the color image only. An improvement of the present scheme over the related art is that, on the GPU side, in addition to binding the color image as an operation image and generating the corresponding image pyramid, the depth image is also bound as an operation image and a corresponding image pyramid is generated for it. In this way, the corner points in the image can be determined based on the image data of both the color image and the depth image, the response function values of the corner points can be calculated, and feature point recognition can be performed according to the response function values of the corner points.
In the embodiment of the application, layer 0 of the pyramid is the original image, and the sampled depth data is stored in the R channel of the depth image DepthMipmap.
In the embodiment of the application, starting from the first layer of the pyramid, the depth value corresponding to the color image is calculated for each layer i of the image pyramid through a for loop, so that the corresponding depth value of each layer in the image pyramid can be obtained.
It should be noted that the process of calculating the image data of a pixel in layer i from the image data of four pixels in layer i-1 may include: converting the depth values of the four pixels in layer i-1 of the image pyramid into the camera coordinate system, averaging the camera space coordinates of the four pixels in the camera coordinate system to obtain the camera space coordinates of the pixel in layer i of the image pyramid, and then converting the camera space coordinates of layer i of the image pyramid to obtain the pixel depth value corresponding to layer i of the image pyramid.
First, by the following equation 1, the depth value of the i-1 th layer pixel point (x, y) in the image pyramid is converted into a three-dimensional camera coordinate system.
View_mip_i-1(x, y) = (x, y, Depth_mip_i-1(x, y), 1.0) * Inverse(projectionMatrix)    (Equation 1)
Where Depth_mip_i-1(x, y) represents the depth value corresponding to layer i-1 of the image pyramid, projectionMatrix denotes the projection matrix, and Inverse() denotes the matrix inverse.
View_mip_i-1(x, y) includes the components View_mip_i-1(x, y).x, View_mip_i-1(x, y).y, and View_mip_i-1(x, y).z.
Then, homogeneous coordinate transformation is performed through equation 2, and camera space coordinates of the i-1 layer pixel point (x, y) of the image pyramid are calculated:
View_mip_i-1(x, y) = (View_mip_i-1(x, y).x / View_mip_i-1(x, y).w, View_mip_i-1(x, y).y / View_mip_i-1(x, y).w, View_mip_i-1(x, y).z / View_mip_i-1(x, y).w, 1.0)    (Equation 2)
Where View_mip_i-1(x, y).w represents the variable introduced by the homogeneous coordinate transformation.
View_mip_i-1(x, y) represents the camera space coordinates of the pixel point (x, y) of layer i-1 of the image pyramid in the three-dimensional camera coordinate system.
The depth value of the pixel point (x, y) is a one-dimensional quantity; it is converted into the three-dimensional camera coordinate system, and the camera space coordinates of the pixel point (x, y), which are three-dimensional coordinates, are obtained through the homogeneous coordinate transformation.
According to Equations 1 and 2 above, the depth value of the pixel point (x, y) in layer i-1 of the image pyramid is converted into the three-dimensional camera coordinate system, giving the camera space coordinates View_mip_i-1(x, y) corresponding to the pixel point (x, y) in layer i-1 of the image pyramid.
View_mip_i-1(x, y) comprises the following three components:
(1) X component: x1 = View_mip_i-1(x, y).x / View_mip_i-1(x, y).w;
(2) Y component: y1 = View_mip_i-1(x, y).y / View_mip_i-1(x, y).w;
(3) Z component: z1 = View_mip_i-1(x, y).z / View_mip_i-1(x, y).w.
Similarly, according to Equations 1 and 2 above, the depth value of pixel (x+1, y) of layer i-1 of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinates View_mip_i-1(x+1, y) of pixel (x+1, y) of layer i-1 in the three-dimensional camera coordinate system. View_mip_i-1(x+1, y) comprises the following three components:
(1) X component: x2 = View_mip_i-1(x+1, y).x / View_mip_i-1(x+1, y).w;
(2) Y component: y2 = View_mip_i-1(x+1, y).y / View_mip_i-1(x+1, y).w;
(3) Z component: z2 = View_mip_i-1(x+1, y).z / View_mip_i-1(x+1, y).w.
Similarly, according to Equations 1 and 2 above, the depth value of pixel (x, y+1) of layer i-1 of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinates View_mip_i-1(x, y+1) of pixel (x, y+1) of layer i-1 in the three-dimensional camera coordinate system. View_mip_i-1(x, y+1) comprises the following three components:
(1) X component: x3 = View_mip_i-1(x, y+1).x / View_mip_i-1(x, y+1).w;
(2) Y component: y3 = View_mip_i-1(x, y+1).y / View_mip_i-1(x, y+1).w;
(3) Z component: z3 = View_mip_i-1(x, y+1).z / View_mip_i-1(x, y+1).w.
Similarly, according to Equations 1 and 2 above, the depth value of pixel (x+1, y+1) of layer i-1 of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinates View_mip_i-1(x+1, y+1) of pixel (x+1, y+1) of layer i-1 in the three-dimensional camera coordinate system. View_mip_i-1(x+1, y+1) comprises the following three components:
(1) X component: x4 = View_mip_i-1(x+1, y+1).x / View_mip_i-1(x+1, y+1).w;
(2) Y component: y4 = View_mip_i-1(x+1, y+1).y / View_mip_i-1(x+1, y+1).w;
(3) Z component: z4 = View_mip_i-1(x+1, y+1).z / View_mip_i-1(x+1, y+1).w.
Thus, the camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1), and (x+1, y+1) of layer i-1 of the pyramid are calculated by Equations 1 and 2 above.
The average of the camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1), and (x+1, y+1) of layer i-1 of the image pyramid is calculated by the following Equation 3, giving the camera space coordinates of the pixel point (x, y) of layer i.
View_mip_i(x, y) = (View_mip_i-1(x, y) + View_mip_i-1(x+1, y) + View_mip_i-1(x, y+1) + View_mip_i-1(x+1, y+1)) / 4.0    (Equation 3)
View_mip_i-1(x, y), View_mip_i-1(x+1, y), View_mip_i-1(x, y+1), and View_mip_i-1(x+1, y+1) represent the camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1), and (x+1, y+1) of layer i-1 of the image pyramid in the three-dimensional camera coordinate system, respectively. It will be appreciated that the camera space coordinate values of the four pixels are summed and then divided by 4 to yield their average, i.e., the camera space coordinates of the layer i pixel.
View_mip_i(x, y) represents the camera space coordinates, in the three-dimensional camera coordinate system, of the layer i pixel obtained by averaging the camera space coordinates of the four layer i-1 pixel points. View_mip_i(x, y) includes three components:
(1) X component: View_mip_i(x, y).x = (x1 + x2 + x3 + x4) / 4;
(2) Y component: View_mip_i(x, y).y = (y1 + y2 + y3 + y4) / 4;
(3) Z component: View_mip_i(x, y).z = (z1 + z2 + z3 + z4) / 4.
Then, the camera space coordinates View_mip_i(x, y) obtained in the three-dimensional camera coordinate system are multiplied by the projection matrix to obtain NDC coordinates as an intermediate result, by the following Equation 4:
NDC_mip_i(x, y) = View_mip_i(x, y) * projectionMatrix    (Equation 4)
Where NDC_mip_i(x, y) represents the NDC coordinates of the layer i pixel of the image pyramid. NDC coordinates are four-dimensional, comprising x, y, z, and w components; that is, NDC_mip_i(x, y) includes NDC_mip_i(x, y).x, NDC_mip_i(x, y).y, NDC_mip_i(x, y).z, and NDC_mip_i(x, y).w, where NDC_mip_i(x, y).w is the variable introduced by the homogeneous coordinate transformation.
Then, through the following Equation 5, the ratio between NDC_mip_i(x, y).z and NDC_mip_i(x, y).w is obtained by the homogeneous coordinate transformation, giving the depth value corresponding to layer i of the image pyramid.
Depth_mip_i(x, y) = NDC_mip_i(x, y).z / NDC_mip_i(x, y).w    (Equation 5)
Where Depth_mip_i(x, y) represents the depth value corresponding to layer i of the image pyramid.
In the embodiment of the application, through Equations 1 to 5, the depth values of four pixels of layer i-1 of the image pyramid are converted into the camera coordinate system, the camera space coordinates of the four pixels are averaged in the camera coordinate system to obtain the camera space coordinates of the layer i pixel, and the camera space coordinates of layer i of the image pyramid are then converted to obtain the pixel depth value corresponding to layer i of the image pyramid.
In Equations 1 to 5 above, i takes the values 1, 2, ..., K in order, so that the depth value corresponding to each layer of the color image in the K-layer image pyramid can be calculated.
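The per-level computation of Equations 1 to 5 can be sketched as a single GLSL compute shader pass that is dispatched once for each level i (the host-side loop over i is not shown). The binding points, uniform names, and the rgba16f image format are assumptions made for illustration; the text describes a two-channel 16-bit float depth image, and the use of pixel coordinates and the row-vector multiplication order mirror the equations above.

```glsl
#version 310 es
precision highp float;
layout(local_size_x = 8, local_size_y = 8) in;

// Depth pyramid level i-1 as input and level i as output (bindings illustrative).
layout(rgba16f, binding = 0) uniform readonly highp image2D depthMipPrev;  // level i-1
layout(rgba16f, binding = 1) uniform writeonly highp image2D depthMipCurr; // level i

uniform mat4 projectionMatrix;        // projection matrix of the rendering camera
uniform mat4 inverseProjectionMatrix; // Inverse(projectionMatrix), precomputed

// Equations 1 and 2: lift the level i-1 depth sample at pixel p into camera space.
vec3 toViewSpace(ivec2 p)
{
    float depth = imageLoad(depthMipPrev, p).r;
    vec4 v = vec4(vec2(p), depth, 1.0) * inverseProjectionMatrix;
    return v.xyz / v.w;                          // homogeneous coordinate transformation
}

void main()
{
    ivec2 dst = ivec2(gl_GlobalInvocationID.xy); // pixel (x, y) of level i
    ivec2 src = dst * 2;                         // pixel (x, y) of level i-1

    // Equation 3: average the camera space coordinates of the four source pixels.
    vec3 viewPos = (toViewSpace(src) +
                    toViewSpace(src + ivec2(1, 0)) +
                    toViewSpace(src + ivec2(0, 1)) +
                    toViewSpace(src + ivec2(1, 1))) / 4.0;

    // Equations 4 and 5: project back and take z / w as the level i depth value.
    vec4 ndc = vec4(viewPos, 1.0) * projectionMatrix;
    imageStore(depthMipCurr, dst, vec4(ndc.z / ndc.w, 0.0, 0.0, 0.0));
}
```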
The process of calculating the depth value corresponding to the color image for each layer of the image pyramid is described below by way of example.
Illustratively, as shown in FIG. 8, the image pyramid includes 8 layers. Starting from the first layer pyramid, a depth value corresponding to the color image is calculated for each layer of the image pyramid by a for loop.
(1) Calculating the depth value corresponding to layer 1 of the image pyramid (i takes 1)
First, a color image is converted from a two-dimensional image coordinate system to a three-dimensional camera coordinate system by the following equation:
View_mip_0(x, y) = (x, y, Depth_mip_0(x, y), 1.0) * Inverse(projectionMatrix)
View_mip_0(x, y) includes View_mip_0(x, y).x, View_mip_0(x, y).y, and View_mip_0(x, y).z.
Then, camera space coordinates of the four pixel points of the 0 th layer are calculated in the three-dimensional camera coordinate system through homogeneous coordinate transformation:
View_mip_0(x,y)=
(View_mip_0(x,y).x/View_mip_0(x,y).w,
View_mip_0(x,y).y/View_mip_0(x,y).w,
View_mip_0(x,y).z/View_mip_0(x,y).w,
1.0)
View_mip_0(x+1,y)=
(View_mip_0(x+1,y).x/View_mip_0(x+1,y).w,
View_mip_0(x+1,y).y/View_mip_0(x+1,y).w,
View_mip_0(x+1,y).z/View_mip_0(x+1,y).w,
1.0)
View_mip_0(x,y+1)=
(View_mip_0(x,y+1).x/View_mip_0(x,y+1).w,
View_mip_0(x,y+1).y/View_mip_0(x,y+1).w,
View_mip_0(x,y+1).z/View_mip_0(x,y+1).w,
1.0)
View_mip_0(x+1,y+1)=
(View_mip_0(x+1,y+1).x/View_mip_0(x+1,y+1).w,
View_mip_0(x+1,y+1).y/View_mip_0(x+1,y+1).w,
View_mip_0(x+1,y+1).z/View_mip_0(x+1,y+1).w,
1.0)
View_mip_0(x, y).w, View_mip_0(x+1, y).w, View_mip_0(x, y+1).w, and View_mip_0(x+1, y+1).w represent the variables introduced by the homogeneous coordinate transformation.
Then, the camera space coordinates of the four pixel points of the 0 th layer are averaged in a three-dimensional camera coordinate system to obtain the camera space coordinates of the pixel points (x, y) of the 1 st layer:
View_mip_1(x, y) = (View_mip_0(x, y) + View_mip_0(x+1, y) + View_mip_0(x, y+1) + View_mip_0(x+1, y+1)) / 4.0
Where View_mip_1(x, y) contains three components: View_mip_1(x, y).x, View_mip_1(x, y).y, and View_mip_1(x, y).z.
Then, View_mip_1(x, y) is converted from the three-dimensional camera coordinate system to the NDC coordinate system by the following equation:
NDC_mip_1(x, y) = View_mip_1(x, y) * projectionMatrix
Where NDC_mip_1(x, y) comprises four components: NDC_mip_1(x, y).x, NDC_mip_1(x, y).y, NDC_mip_1(x, y).z, and NDC_mip_1(x, y).w.
The homogeneous coordinate transformation is then carried out, and the depth value corresponding to layer 1 of the image pyramid is calculated:
Depth_mip_1(x, y) = NDC_mip_1(x, y).z / NDC_mip_1(x, y).w
Thus, the depth value Depth_mip_1(x, y) corresponding to layer 1 of the image pyramid is calculated.
(2) Calculating the depth value corresponding to layer 2 of the image pyramid (i takes 2)
First, the depth value Depth_mip_1(x, y) corresponding to layer 1 of the image pyramid is converted into the three-dimensional camera coordinate system:
View_mip_1(x, y) = (x, y, Depth_mip_1(x, y), 1.0) * Inverse(projectionMatrix)
View_mip_1(x, y) includes View_mip_1(x, y).x, View_mip_1(x, y).y, and View_mip_1(x, y).z.
Then, camera space coordinates of the four pixel points of the 1 st layer are calculated in the three-dimensional camera coordinate system through homogeneous coordinate transformation:
View_mip_1(x,y)=
(View_mip_1(x,y).x/View_mip_1(x,y).w,
View_mip_1(x,y).y/View_mip_1(x,y).w,
View_mip_1(x,y).z/View_mip_1(x,y).w,
1.0)
View_mip_1(x+1,y)=
(View_mip_1(x+1,y).x/View_mip_1(x+1,y).w,
View_mip_1(x+1,y).y/View_mip_1(x+1,y).w,
View_mip_1(x+1,y).z/View_mip_1(x+1,y).w,
1.0)
View_mip_1(x,y+1)=
(View_mip_1(x,y+1).x/View_mip_1(x,y+1).w,
View_mip_1(x,y+1).y/View_mip_1(x,y+1).w,
View_mip_1(x,y+1).z/View_mip_1(x,y+1).w,
1.0)
View_mip_1(x+1,y+1)=
(View_mip_1(x+1,y+1).x/View_mip_1(x+1,y+1).w,
View_mip_1(x+1,y+1).y/View_mip_1(x+1,y+1).w,
View_mip_1(x+1,y+1).z/View_mip_1(x+1,y+1).w,
1.0)
View_mip_1(x, y).w, View_mip_1(x+1, y).w, View_mip_1(x, y+1).w, and View_mip_1(x+1, y+1).w represent the variables introduced by the homogeneous coordinate transformation.
Then, the camera space coordinates of the four pixels of the 1 st layer are averaged in a three-dimensional camera coordinate system to obtain the camera space coordinates of one pixel in the 2 nd layer image:
View_mip_2(x, y) = (View_mip_1(x, y) + View_mip_1(x+1, y) + View_mip_1(x, y+1) + View_mip_1(x+1, y+1)) / 4.0
Where View_mip_2(x, y) contains three components: View_mip_2(x, y).x, View_mip_2(x, y).y, and View_mip_2(x, y).z.
Then, view_mip2 (x, y) is converted from the three-dimensional camera coordinate system to DNC coordinates by the following equation:
NDC_mip_2(x,y)=View_mip_2(x,y)*projectionMatrix。
wherein, NDC_mip2 (x, y) comprises four components, NDC_mip2 (x, y). X, NDC_mip2 (x, y). Y, NDC_mip2 (x, y). Z and NDC_mip2 (x, y). W.
The homogeneous coordinate transformation is then carried out, and the depth value corresponding to layer 2 of the image pyramid is calculated:
Depth_mip_2(x, y) = NDC_mip_2(x, y).z / NDC_mip_2(x, y).w
Thus, the depth value Depth_mip_2(x, y) corresponding to layer 2 of the image pyramid is calculated.
(3) i takes 3, 4, ..., K in turn, and the depth values corresponding to layer 3 through layer K of the image pyramid are calculated
In the above manner, the depth value Depth_mip_3(x, y) corresponding to layer 3 of the image pyramid, up to the depth value Depth_mip_K(x, y) corresponding to layer K, can be calculated in turn.
By the method provided by the embodiment of the application, the depth value corresponding to each layer of the image pyramid can be calculated. It should be noted that the depth values (or referred to as depth data) of different levels calculated herein may be applied to the following steps of calculating corner points, which will be described in detail below.
Step 102, calculating corner points of pyramid layers
Here, a corner point can be understood as a point at which moving a small pixel window in any direction produces a large change in the response value.
In the embodiment of the application, by calculating the brightness difference of each pixel of the color image with respect to its surrounding area, a point whose brightness difference is larger than a threshold can be taken as a corner point. It should be noted that, unlike the related art, the scheme of the present application performs the brightness difference judgment not only on the color image but also on the basis of the depth image, which can improve the accuracy of corner point identification.
Since the corner points are points that are significantly different from surrounding pixels, the corner points can be used as feature points in the image. By finding the corner points in the image, the characteristic points in the image can be identified according to the corner points.
In an embodiment of the present application, the color image (Color) and the depth image (DepthMipmap) corresponding to a pyramid level may be bound in the compute shader according to specified parameters.
Wherein the specified parameters may include a scaling factor and an ID of the image.
For example, the specified parameters glBindImage(id=101, α=4) indicate that the identifier of the image is 101 and the scaling factor α takes the value 4. According to these parameters, layer 4 of the image pyramid of the color image identified as 101, and the corresponding depth image, may be bound in the compute shader.
Alternatively, different scaling factors may be preset for different game types or scenes. For example, for a type 1 game, the corresponding scaling factor parameter is 3, indicating that the third layer of the pyramid is taken; for a type 2 game, the corresponding scaling factor parameter is 4, indicating that the fourth layer of the pyramid is taken. This can be set according to actual use requirements and is not limited by the embodiments of the present application. For ease of explanation, the embodiments of the present application take a scaling factor of 4 as an example.
For example, referring again to fig. 8, assuming that the original image is 256×256 pixels in size, an 8-layer pyramid is generated. Assuming the scaling factor α=4, the fourth-layer image of the pyramid, which is 16×16 pixels in size, may be bound in the compute shader. The present application can determine all corner points in the 16×16 pixel block area by using the luminance data of the 16×16 pixel block and its depth data (i.e., the depth values corresponding to the fourth level).
The process of determining corner points according to the luminance data of pixels and the depth data thereof according to the embodiment of the present application is described in detail below.
(1) Process for determining corner points according to brightness data of pixel points
In the compute shader of the GPU, the luminance data may include luminance values for individual pixels in a 16 x16 block of pixels.
The luminance value of each pixel is calculated in the calculation shader of the GPU by the following equation 6:
Intensity = Color.r * 0.299 + Color.g * 0.587 + Color.b * 0.114    (Equation 6)
Where Color.r, Color.g, and Color.b represent the values of a pixel of the color image (Color) on the R, G, and B channels, respectively, and Intensity represents the luminance value of that pixel. For convenience of description, the luminance value corresponding to the pixel point (x, y) is denoted as Intensity(x, y).
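Equation 6 corresponds to a single dot product in GLSL; the following helper (its name is chosen here purely for illustration) is the form such a computation typically takes in the compute shader.

```glsl
// Equation 6: luminance (Intensity) of one pixel, given its RGBA value
// sampled from the bound color image (Color).
float intensityOf(vec4 color)
{
    return dot(color.rgb, vec3(0.299, 0.587, 0.114));
}
```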
In the embodiment of the application, the brightness difference between the pixel points is calculated in the calculation shader of the GPU based on the pixel points (x, y) and a plurality of pixel points which are at a preset distance from the pixel points (x, y).
The preset distance is an empirical value of multiple tests, and can be determined according to the application scene, perspective, aperture and other factors. The preset distance may be 2 pixels, for example.
Illustratively, FIG. 9A shows the positional relationship of pixel (x, y) with pixel (x-2, y), (x+2, y), (x, y-2), and (x, y+2).
As shown in Table 1 below, the luminance values corresponding to the pixel points (x-2, y), (x+2, y), (x, y-2), and (x, y+2) are denoted as Intensity(x-2, y), Intensity(x+2, y), Intensity(x, y-2), and Intensity(x, y+2), respectively.
TABLE 1
Pixel point  | Luminance value
(x-2, y)     | Intensity(x-2, y)
(x+2, y)     | Intensity(x+2, y)
(x, y-2)     | Intensity(x, y-2)
(x, y+2)     | Intensity(x, y+2)
As shown in Table 2 below, the absolute values of the luminance differences between Intensity(x, y) and Intensity(x-2, y), between Intensity(x, y) and Intensity(x+2, y), between Intensity(x, y) and Intensity(x, y-2), and between Intensity(x, y) and Intensity(x, y+2) are calculated.
TABLE 2
Two pixel points     | Absolute value of luminance difference
(x, y) and (x-2, y)  | ∣Intensity(x, y) - Intensity(x-2, y)∣
(x, y) and (x+2, y)  | ∣Intensity(x, y) - Intensity(x+2, y)∣
(x, y) and (x, y-2)  | ∣Intensity(x, y) - Intensity(x, y-2)∣
(x, y) and (x, y+2)  | ∣Intensity(x, y) - Intensity(x, y+2)∣
Then, in the compute shader of the GPU, it is judged whether each absolute value of the luminance difference is greater than or equal to a preset brightness threshold, and the number of absolute differences that are greater than or equal to the preset brightness threshold is counted.
If the counted number is greater than or equal to a preset number (for example, 3), the initial selection condition is met; it is then judged that the current pixel point may be a corner point, and the next step is executed to further judge whether the current pixel point is a corner point.
If the counted number is smaller than the preset number, it is judged that the current pixel point cannot be a corner point, and processing of the current pixel point is stopped.
It should be noted that, through step (1), the compute shader of the GPU determines that some of the pixels in the image cannot be corner points, so this part of the pixels can be excluded; the remaining pixels that may be corner points are then further judged according to the depth data in the compute shader of the GPU, which reduces the amount of computation and improves recognition accuracy.
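The brightness pre-selection of step (1) can be sketched in GLSL as follows. The sampler name, the concrete threshold value, and the function names are assumptions made for illustration; the text only specifies that a preset brightness threshold and a preset number (e.g., 3) are used, with a preset distance of 2 pixels.

```glsl
uniform highp sampler2D colorMip;        // bound pyramid level of the color image
const float BRIGHTNESS_THRESHOLD = 0.05; // preset brightness threshold (value assumed)
const int   MIN_COUNT = 3;               // preset number, as in the example above

// Equation 6 applied to the pixel at integer coordinate p.
float intensityAt(ivec2 p)
{
    return dot(texelFetch(colorMip, p, 0).rgb, vec3(0.299, 0.587, 0.114));
}

// Step (1): a pixel remains a corner candidate only if enough of the four
// pixels at the preset distance of 2 differ strongly from it in brightness.
bool passesPreselection(ivec2 p)
{
    float c = intensityAt(p);
    int count = 0;
    if (abs(c - intensityAt(p + ivec2(-2,  0))) >= BRIGHTNESS_THRESHOLD) count++;
    if (abs(c - intensityAt(p + ivec2( 2,  0))) >= BRIGHTNESS_THRESHOLD) count++;
    if (abs(c - intensityAt(p + ivec2( 0, -2))) >= BRIGHTNESS_THRESHOLD) count++;
    if (abs(c - intensityAt(p + ivec2( 0,  2))) >= BRIGHTNESS_THRESHOLD) count++;
    return count >= MIN_COUNT;
}
```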
(2) Process for determining corner points according to depth data of pixel points
The depth data includes the depth values corresponding to the pixels of the selected level of the image pyramid calculated in step 101.
The depth data corresponding to the pixel point (x, y) of layer i of the depth image DepthMipmap may be represented as Depth_mip_i(x, y). Illustratively, with the scaling factor taking 4, the present application uses the 16×16 pixel block of layer 4 of the image pyramid, and the corresponding depth data of the depth image DepthMipmap may be denoted as Depth_mip_4(x, y).
For a pixel point that satisfies the initial selection condition determined in the previous step, the depth data of the depth image DepthMipmap is sampled according to the image coordinates (x, y) of the pixel point, and the sampling result is multiplied by the inverse of the projection matrix, thereby calculating the spatial coordinates of the pixel in the camera coordinate system:
Pos(x, y, z) = [(x, y) * 2.0 - 1.0, Depth_mip_4(x, y), 1.0] * Inverse(projectionMatrix)    (Equation 7)
Where projectionMatrix denotes the projection matrix and Inverse() denotes the matrix inverse.
Pos(x, y, z) includes three components: Pos(x, y, z).x, Pos(x, y, z).y, and Pos(x, y, z).z.
Then, by the homogeneous coordinate transformation, the spatial coordinates Pos(x, y, z) can be expressed as:
Pos(x, y, z) = (Pos(x, y, z).x / Pos(x, y, z).w, Pos(x, y, z).y / Pos(x, y, z).w, Pos(x, y, z).z / Pos(x, y, z).w)    (Equation 8)
Where Pos(x, y, z).w represents the variable introduced by the homogeneous coordinate transformation.
Then, the spatial coordinates Pos (x, y, z) of the pixel point are converted into two-dimensional image coordinates Pos (x, y), so that the two-dimensional image coordinates Pos (x, y) of the pixel point are calculated according to the depth value and the coordinate conversion of the pixel point.
Similarly, according to the depth value and coordinate conversion of the pixel point, the image coordinates of four pixel points near the pixel point (x, y) are calculated again:
(a) Two-dimensional image coordinates Pos (x-2, y) of the pixel point (x-2, y);
(b) Two-dimensional image coordinates Pos (x+2, y) of the pixel point (x+2, y);
(c) Two-dimensional image coordinates Pos (x, y-2) of the pixel point (x, y-2);
(d) Two-dimensional image coordinates Pos (x, y+2) of the pixel point (x, y+2).
Then, the absolute values of the differences in pixel brightness between Pos(x, y) and Pos(x-2, y), between Pos(x, y) and Pos(x+2, y), between Pos(x, y) and Pos(x, y-2), and between Pos(x, y) and Pos(x, y+2) are calculated.
Next, using an algorithm similar to step (1) above, it is judged in the compute shader of the GPU whether each absolute difference is greater than or equal to a preset threshold, and the number of absolute differences that are greater than or equal to the preset threshold is counted. If the counted number is greater than or equal to the preset number (e.g., 3), i.e., the condition is satisfied, it can be determined that the pixel point is a corner point. If the counted number is smaller than the preset number (e.g., 3), the current pixel cannot be a corner point, and processing of the current pixel is stopped.
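Equations 7 and 8 amount to lifting a corner candidate into camera space using its level-4 depth value. The following GLSL sketch illustrates that part of step (2); the sampler and uniform names are assumptions, and the subsequent conversion back to two-dimensional image coordinates and the brightness comparison are omitted here.

```glsl
uniform highp sampler2D depthMip4;    // depth image DepthMipmap, level 4
uniform mat4 inverseProjectionMatrix; // Inverse(projectionMatrix)

// Equations 7 and 8: camera space position of the pixel at image coordinates
// uv (in [0,1]), using its depth value from level 4 of the depth pyramid.
vec3 cameraSpacePosition(vec2 uv)
{
    float depth = textureLod(depthMip4, uv, 0.0).r;
    vec4 pos = vec4(uv * 2.0 - 1.0, depth, 1.0) * inverseProjectionMatrix; // Equation 7
    return pos.xyz / pos.w;                                                // Equation 8
}
```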
In the embodiment of the present application, for each pixel point in a 16×16 pixel block area, the compute shader of the GPU first determines, according to step (1) and the luminance data of the pixel point, which pixels cannot be corner points and excludes them, and then, according to step (2) and the depth data of the pixel points, further judges the remaining pixels that may be corner points, thereby determining all corner points in the 16×16 pixel block area.
Step 103, calculating the response function value of the corner pixel
The process of calculating the response function value (otherwise referred to as the response value) for the corner pixels identified in step 102 in the computation shader of the GPU is described in detail below.
In the embodiment of the application, the response function value of each corner pixel is calculated according to the brightness difference and the depth difference of each corner pixel (x, y) and four adjacent pixels (x-1, y), (x+1, y), (x, y-1) and (x, y+1). Fig. 9B shows the positional relationship of the corner pixel (x, y) with four neighboring pixels (x-1, y), (x+1, y), (x, y-1), and (x, y+1).
(1) Luminance values of four neighboring pixels (x-1, y), (x+1, y), (x, y-1), and (x, y+1) around the corner pixel (x, y) are calculated, and depth values of four neighboring pixels around the corner pixel are calculated.
The luminance values of the two neighboring pixels of the corner pixel (x, y) in the x direction are Intensity(x-1, y) and Intensity(x+1, y).
The luminance values of the two neighboring pixels of the corner pixel (x, y) in the y direction are Intensity(x, y-1) and Intensity(x, y+1).
The depth values of the two neighboring pixels of the corner pixel (x, y) in the x direction are Depth_mip(x-1, y) and Depth_mip(x+1, y).
The depth values of the two neighboring pixels of the corner pixel (x, y) in the y direction are Depth_mip(x, y-1) and Depth_mip(x, y+1).
(2) The brightness differences of the corner pixel (x, y) in the x direction and the y direction are calculated from the luminance values obtained in (1).
Brightness difference of the corner pixel (x, y) in the x direction: Ix = Intensity(x+1, y) - Intensity(x-1, y);
Brightness difference of the corner pixel (x, y) in the y direction: Iy = Intensity(x, y+1) - Intensity(x, y-1).
(3) And (3) calculating the depth difference between the x direction and the y direction of the corner pixel (x, y) according to the depth value calculated in the step (1).
Depth difference of the corner pixel (x, y) in the x direction: dx = Depth_mip(x+1, y) - Depth_mip(x-1, y);
Depth difference of the corner pixel (x, y) in the y direction: dy = Depth_mip(x, y+1) - Depth_mip(x, y-1).
(4) Using Ix and Iy calculated in step (2) and dx and dy calculated in step (3), the response function value (denoted as response) of the corner pixel (x, y) is calculated using an exponential function.
Response function value of the corner pixel (x, y): response = exp(Ix, Iy, dx, dy) = e^(Ix, Iy, dx, dy). (equation 9)
In the embodiment of the application, the response function value is calculated in this way for every corner point in the 16×16 pixel block area, so that the response function values of all corner pixels in the area are obtained.
It should be noted that in the related art the response function value of a pixel is calculated only from the brightness differences between adjacent pixels. Unlike the related art, in the embodiment of the present application both the brightness differences and the depth differences between adjacent pixels are calculated on the GPU side, and the response function value of each pixel is computed on the GPU side from both, thereby improving the accuracy of the response function value.
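Extending the shader sketch above, step 103 might look roughly as follows. The depth-texture binding is an assumption, and because equation 9 only states that the response is an exponential function of Ix, Iy, dx and dy, the concrete combination inside exp() used below (a sum of absolute values) is an illustrative assumption rather than the application's exact formula.

```glsl
// Response function of step 103 for a corner pixel p.
layout(binding = 1) uniform highp sampler2D uDepthMip;   // mip-level depth (assumed binding)

float cornerResponse(ivec2 p) {
    // Central differences of luminance (Ix, Iy), step (2).
    float Ix = texelFetch(uLuminanceTex, p + ivec2( 1, 0), 0).r
             - texelFetch(uLuminanceTex, p + ivec2(-1, 0), 0).r;
    float Iy = texelFetch(uLuminanceTex, p + ivec2( 0, 1), 0).r
             - texelFetch(uLuminanceTex, p + ivec2( 0,-1), 0).r;
    // Central differences of depth (dx, dy), step (3).
    float dx = texelFetch(uDepthMip, p + ivec2( 1, 0), 0).r
             - texelFetch(uDepthMip, p + ivec2(-1, 0), 0).r;
    float dy = texelFetch(uDepthMip, p + ivec2( 0, 1), 0).r
             - texelFetch(uDepthMip, p + ivec2( 0,-1), 0).r;
    // Equation 9: exponential of the brightness and depth differences.
    // The sum of absolute values is an assumed combination for illustration.
    return exp(abs(Ix) + abs(Iy) + abs(dx) + abs(dy));
}
```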
In the embodiment of the application, whether each corner pixel is a feature point can be judged by comparing its response function value with a preset response threshold. If the response function value is greater than or equal to the preset response threshold, the current pixel point is a feature point, and processing continues with step 104. If the response function value is smaller than the preset response threshold, the current pixel is not a feature point, and the processing of the current pixel is stopped.
In this way, feature points are screened out of the corner pixels according to their response function values, which reduces the amount of computation and improves computation accuracy.
It should be noted that, because feature point screening is performed independently for each 16×16 pixel block area, some 16×16 pixel block areas contain one or more feature points while others contain none.
Thus, all feature points and their response function values within each 16×16 pixel block area can be determined by step 103.
Step 104, maximum value suppression of feature points
Maximum value suppression means searching local maximum values and removing similar feature points gathered locally.
Since the 16×16 pixel block areas determined in step 103 may contain locally aggregated similar feature points, which are redundant, these similar feature points need to be rejected.
It should be noted that in the related art maximum value suppression is implemented by executing program code on the CPU through OpenCV. Unlike the related art, the present application implements maximum value suppression with shared cache variables on the GPU.
In the embodiment of the application, the shared buffer variable of the thread working group of the computation shader can be utilized to take the maximum value of the response function values of all the characteristic points in each 16×16 pixel block area so as to eliminate the similar characteristic points which are locally gathered.
Referring to FIG. 10, in an embodiment of the present application, assuming the original image size is W×H, W/16 × H/16 thread work groups (alternatively referred to as local work groups) may be employed in the compute shader. Each thread work group comprises 16×16 threads, and each thread corresponds to a feature point pixel.
In the embodiment of the present application, each thread work group defines two shared cache variables (declared with the shared qualifier), referred to as shared variable 1 and shared variable 2. Shared variable 1 is used to store the maximum response value max_response in the current work group, and shared variable 2 is used to store the pixel coordinate max_response_index corresponding to that maximum response value.
Referring to table 3 below, assuming W×H is 256×256, W/16 × H/16 = 256 thread work groups are employed in the compute shader, each thread work group including 16×16 threads and defining the two shared cache variables. For each thread work group, the maximum response value of the 16×16 threads in the work group is determined and stored in shared variable 1, and the pixel coordinate corresponding to the maximum response value is stored in shared variable 2.
TABLE 3
In the embodiment of the present application, for each thread work group, each feature point pixel invokes the atomic operation atomicMax of the compute shader, which computes the maximum of the thread's own response value and max_response and assigns that maximum to max_response. If the maximum response value max_response equals the thread's own response value, the thread assigns its own image coordinates to max_response_index.
For example, consider thread work group 1 with 16×16 threads, one per feature point pixel; three of them are used for illustration, with response values 1, 2 and 1.5, respectively. The maximum response value max_response stored in shared variable 1 is initialized to 0, and the response value of each feature point is compared with max_response, with the larger value assigned to max_response.
First, an atomic operation atomicMax (1, 0) =1 of the compute shader is invoked, and the maximum value of the response value 1 and the initialization value 0 can be determined to be 1, and the maximum value 1 is assigned to max_response, that is, the maximum response value max_response=1.
Then, the atomic operation atomicMax (2, 1) =2 of the computation shader is called again, the maximum value can be determined to be 2, and the maximum value 2 is assigned to max_response, that is, the maximum response value max_response=2.
Then, the atomic operation atomicMax (1.5, 2) =2 of the computation shader is called again, the maximum value can be determined to be 2, and the maximum value 2 is assigned to max_response, that is, the maximum response value max_response=2.
Thus, a local maximum response value can be determined. For thread work group 1, shared variable 1 holds the maximum response value in thread work group 1, and shared variable 2 holds the pixel coordinates corresponding to the maximum response value.
Through step 104, the shared cache variables of the thread work groups of the compute shader are used to take the maximum of the response function values of all feature points in each 16×16 pixel block area, so as to eliminate the locally aggregated similar feature points.
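A minimal GLSL sketch of this shared-variable reduction is shown below. Because atomicMax() operates on integer types, the sketch assumes the non-negative float response is reinterpreted with floatBitsToUint(), which preserves ordering for non-negative floats; packing the pixel coordinate into a single uint is likewise an assumption. Ties between equal responses may race on the index, as in the description above.

```glsl
// Shared variable 1: the maximum response of the 16x16 work group.
shared uint max_response;
// Shared variable 2: packed pixel coordinate (y << 16 | x) of that maximum (assumed packing).
shared uint max_response_index;

// 'response' is assumed to have been computed in step 103 for this thread.
void suppressNonMaxima(float response) {
    if (gl_LocalInvocationIndex == 0u) {
        max_response       = 0u;   // initialise shared variable 1 to 0
        max_response_index = 0u;
    }
    barrier();

    uint r = floatBitsToUint(max(response, 0.0));
    atomicMax(max_response, r);    // keep only the largest response of the group
    barrier();

    if (max_response == r && r != 0u) {   // this thread holds the local maximum
        max_response_index = (gl_GlobalInvocationID.y << 16) | gl_GlobalInvocationID.x;
    }
    barrier();
}
```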
For example, referring again to FIG. 10, for the W/16 × H/16 thread work groups (also referred to as local work groups), a local maximum response value of the 16×16 threads in each thread work group may be determined according to step 104, so that W/16 × H/16 local maximum response values may be determined. These W/16 × H/16 local maximum response values and the pixel coordinates corresponding to them are recorded respectively through the shared cache variables of the thread work groups.
Step 105, outputting the characteristic point information
According to the local maximum response values and their corresponding pixel coordinates recorded by the shared cache variables of the thread work groups, each local maximum response value is written to the corresponding pixel coordinate of a mask image.
In an embodiment of the present application, the G channel of each pixel in the depth image DepthMipmap is initialized to 0.0. The G channel of a pixel may be used to hold the maximum response value max_response of a thread work group. It should be noted that the G channel is only taken as an example; in an actual implementation the R channel or B channel may also be used, as determined by usage requirements, which is not limited by the embodiment of the present application.
For each feature point, the index gl_LocalInvocationIndex of the feature point pixel in its thread work group is looked up. If, for example, the index of feature point pixel 1 in thread work group 1 is gl_LocalInvocationIndex = 0, the maximum response value max_response of thread work group 1 is written to the G channel at the max_response_index position of the depth image DepthMipmap, thereby generating the image feature point information. This completes the identification of the image feature points. The position index 0 is used here as an example; in an actual implementation the position index of the thread may be set to another value.
The embodiment of the application takes the depth image written with the maximum response value as a mask image.
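Continuing the sketch above, the write-out of step 105 might look as follows; the image binding, the rgba16f format and the coordinate unpacking are assumptions, and only the G channel is filled here for simplicity.

```glsl
// One thread per work group (index 0, as in the text) writes the group's
// maximum response into the G channel of the depth/mask image.
layout(binding = 2, rgba16f) writeonly uniform highp image2D uDepthMipmap;

void writeFeaturePoint() {
    if (gl_LocalInvocationIndex == 0u && max_response != 0u) {
        ivec2 p = ivec2(max_response_index & 0xFFFFu, max_response_index >> 16);
        float r = uintBitsToFloat(max_response);
        // Simplified: only the G channel carries the response value here.
        imageStore(uDepthMipmap, p, vec4(0.0, r, 0.0, 0.0));
    }
}
```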
Illustratively, as shown in fig. 11, by performing feature point recognition on a color image, 5 local maximum response values, namely, feature point 1 to feature point 5, are determined for the color image, the response values and pixel coordinates of the five feature points are stored in a shared buffer, and then the response values of the five feature points are written into corresponding pixel coordinates of a mask image, so as to obtain the mask image.
The mask image is a result of performing feature point recognition on the color image. All the identified feature points are marked in the mask image, and the response value of each feature point is marked at the pixel coordinates of the feature point.
It should be noted that, the results of feature point recognition are simply illustrated herein, and it is to be understood that, in actual implementation, more or fewer feature points may be recognized for a frame of image, and may be specifically determined according to actual situations, which is not limited by the embodiment of the present application.
According to the above scheme, the feature point identification operation is executed directly in the rendering pipeline using data specific to the GPU in the graphics rendering application, so that identification efficiency is greatly improved while identification accuracy is preserved, and computational power consumption is reduced.
The image feature point recognition method based on the GPU provided by the embodiment of the application is described above with reference to a specific embodiment, and the feature point recognition result can be used for feature point matching. The following describes a possible implementation manner of image feature matching based on GPU according to the embodiment of the present application with reference to a specific embodiment.
For a better understanding of the embodiments of the present application, feature point recognition and matching will be described in association with the accompanying drawings.
After feature point recognition is performed on each color image, a recognition result corresponding to that color image, namely a mask image, is obtained. A mask image marked with feature points and response values is thus generated for each color image, and the corresponding mask images are used to match feature points between different color images. For convenience of explanation, feature point recognition and matching of a first color image and a second color image are taken as an example.
For example, referring to fig. 12A and 12B, feature point recognition is performed on the first color image and the second color image, respectively, to obtain a first mask image and a second mask image, and then feature points marked in the two mask images are compared to determine matched feature point pairs in the two mask images.
Specifically, response values of the feature points are marked in the first mask image; all feature points in the first mask image are sorted by response value, and the N feature points with the largest response values are selected as a first feature point sequence. Then a feature vector is calculated for each feature point in the first feature point sequence.
Similarly, response values of the feature points are marked in the second mask image; all feature points in the second mask image are sorted by response value, and the N feature points with the largest response values are selected as a second feature point sequence. A feature vector is calculated for each feature point in the second feature point sequence.
Then, distance values between the feature vector of each feature point in the first feature point sequence and the feature vectors of all points in the second feature point sequence are calculated; the two feature points with the smallest distance are the best matching points. This yields a group of mutually corresponding feature point pairs [(x0, y0), (x1, y1)], where (x0, y0) is the image coordinate of a feature point in the first color image and (x1, y1) is the image coordinate of the feature point in the second color image matched with (x0, y0).
As can be seen from fig. 12A and fig. 12B, in the embodiment of the present application, feature point recognition is performed on two frames of images, respective feature points of the two frames of images are recognized, and then feature point matching is performed on the respective feature points of the two frames of images, so as to determine matched feature point pairs in the two frames of images.
The following briefly describes a flow of image feature point matching provided by an embodiment of the present application. As shown in fig. 13, in an embodiment of the present application, the process of implementing image feature point matching by the electronic device through the GPU may include the following steps 201 to 205:
step 201, calculating a random vector by the GPU.
And loading the first mask image corresponding to the first color image and the second mask image corresponding to the second color image into a computing shader of the GPU. And calculating random numbers on the GPU by using the image coordinates of the pixels to generate 256 4-dimensional integer random constants.
The mask image is an image obtained by recognizing feature points of a color image (which may also be referred to as a real frame image), and features points and response values of the features points are marked in the mask image. Illustratively, referring back to fig. 11, a black region in the mask image is a non-feature point, a white region in the mask image is a feature point, and each feature point marks a response value of the feature point.
And 202, sequencing the feature points through the GPU.
Wherein the first color image and the second color image are loaded into a compute shader of the GPU. According to the feature points and response values marked in the mask images, the feature points are sorted on the GPU by response value, and the first N feature points with the largest response values are selected.
And 203, calculating the feature vector of each feature point through the GPU.
Wherein the global work group size of the compute shader of the GPU is set to 16×16 (i.e., 256 threads). The feature vectors of the selected N feature points are calculated using the shared cache and atomic operations of the thread work groups of the GPU's compute shader, in combination with the 256 random constants (ivec4) generated in step 201.
And 204, calculating the feature distance between the feature vectors in the two frames of images by using the GPU.
And calculating feature distances between every two N feature vectors of the two frames of images, and counting the minimum feature distance.
And 205, performing feature matching according to the feature distance through the GPU.
If the minimum feature distance is smaller than the preset distance threshold, the feature points to which the two feature vectors of the minimum feature distance belong are successfully matched, and the two feature points successfully matched form a feature point pair. If the minimum feature distance is greater than or equal to the preset distance threshold, then the match fails. Thereby completing the matching of the image feature points.
The application provides a technique for matching image features on mobile terminal devices in real time and with low power consumption through the GPU. Feature points of different images are matched inside the graphics rendering pipeline by means of functions specific to the GPU, such as the graphics rendering unit, the graphics rendering pipeline, the compute shader, the inter-core shared cache and the atomic operations of thread work groups. This greatly improves computing efficiency, reduces the time cost of pixel-by-pixel computation, and at the same time reduces the cost of data copying and context switching between feature matching and the graphics rendering pipeline.
In the embodiment of the application, in the process of matching the image feature points generated by the graphic rendering, the image resources are directly accessed in the graphic rendering pipeline, the copying operation of DDR (double data rate) between the GPU and the CPU is canceled, the heating of mobile terminal equipment caused by the DDR bandwidth overhead is greatly reduced, and meanwhile, the high concurrent multithreading processing capacity provided by the GPU and the shared cache among cores accessed by the thread atomic operation are utilized, so that the computing efficiency of feature point matching can be greatly accelerated, and the purpose of high-frame-rate application is achieved.
The flow of image feature point matching provided by the embodiment of the application is briefly described above, and a specific implementation manner of image feature matching based on the GPU provided by the embodiment of the application is described in detail below with reference to the accompanying drawings.
Step 201, calculating a random vector
In an embodiment of the present application, a readable and writable SSBO of size 256 that holds ivec4 vectors is created, where an ivec4 vector is a signed integer four-component vector. For ease of illustration, this SSBO is denoted SSBO 1.
It should be noted that, after the feature point recognition is performed on the color image, a mask image may be obtained. The mask image is marked with response function values corresponding to the image feature point pixels. Wherein the first color image corresponds to the first mask image and the second color image corresponds to the second mask image.
In the random number calculation step, the first mask image and the second mask image are loaded into the compute shader of the GPU. As shown in fig. 14A, the local work group size of the compute pipeline is specified as 16×16 and the global work group size as (1, 1); that is, the global work group contains only one local work group, with 16×16 threads as one local work group. 16×16 threads, i.e., 256 threads, are started on the GPU.
In an embodiment of the application, a random number seed is generated in the compute shader. For example, the value of the position index (gl_LocalInvocationIndex) is multiplied by a specified random-number scale factor (scale) to obtain a random number seed (denoted as r) according to the following equation 10:
r = gl_LocalInvocationIndex × scale (equation 10)
In the embodiment of the application, each thread in the work group applies a combination of a trigonometric function and an exponential function to the random number seed r to generate a 4-dimensional random vector (denoted as R).
R = e^sin(r) (equation 11)
Wherein each component of the 4-dimensional random vector R lies in the range [0.0, 1.0].
It should be noted that, the embodiment of the present application generates 256 4-dimensional random vectors in total through calculation.
In an embodiment of the present application, each 4-dimensional random vector (R) is adjusted according to an offset parameter (offset). Illustratively, the random number may be adjusted by the following equation 12:
R′ = R × 20 × offset (equation 12)
It should be noted that, in equation 12, 20 is a value that can be set according to practical situations, that is, the sampling size is defined as 20×20, and the offset parameter offset may be used to scale on the basis of the sampling size of 20×20, so that the scaled sampling size meets the requirement of sampling the image to be processed.
Illustratively, assuming that the local work group size employed for the image to be processed is 32×32, i.e., the image block size is 32×32, the offset parameter may take values in the range [1.5, 2.0]. As a further example, assuming that the local work group size is 16×16, i.e., the image block size is 16×16, the offset parameter may take values in the range [0.25, 0.5].
In an actual implementation, R′ = R × 20 × offset may have fractional components; for convenience of calculation, only the integer part of R′ is kept. After this adjustment, each component of the 4-dimensional random vector is a natural number.
For ease of illustration, the 4-dimensional integer vector random number may be denoted as [A, B, C, D], where the two random numbers A and B represent a first random offset and the two random numbers C and D represent a second random offset. The 4-dimensional integer vector random numbers are used in the following steps to calculate the feature vectors of the feature points; the specific calculation process is described in detail below.
Illustratively, the adjusted 4-dimensional random vector is [A, B, C, D] = [2, 2, 3, 5].
In the embodiment of the application, all 256 4-dimensional random vectors are adjusted to obtain 256 adjusted 4-dimensional random vectors, and the 256 adjusted 4-dimensional random vectors form a random number queue.
As shown in fig. 14B, in the embodiment of the present application, 256 adjusted 4-dimensional random vectors may be cached in SSBO 1.
In summary, the present application loads the two frames of mask images into the compute shader of the GPU, generates random number seeds in the compute shader, computes 256 four-dimensional vector random numbers from the seeds using a trigonometric function and an exponential function, then scales and adjusts these four-dimensional vector random numbers, and finally combines the adjusted 256 four-dimensional integer vector random numbers [A, B, C, D] into a random number queue.
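Step 201 could be sketched as the following compute shader. The uniform names, and the per-component phase shifts that make the four components of R differ, are assumptions, since the text only specifies that sin() and exp() are combined and the result is scaled by 20 × offset before taking the integer part.

```glsl
#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;   // one local work group = 256 threads

// SSBO 1: 256 four-component integer random vectors produced by step 201.
layout(std430, binding = 0) buffer RandomSSBO { ivec4 randoms[256]; } ssbo1;

uniform float uScale;    // random-number scale factor of equation 10 (assumed name)
uniform float uOffset;   // offset parameter of equation 12 (assumed name)

void main() {
    // Equation 10: random seed from the thread's position index.
    float r = float(gl_LocalInvocationIndex) * uScale;

    // Equation 11 combines sin() and exp(); the per-component phase shifts below
    // are an assumption so that the four components differ.
    vec4 R = vec4(exp(sin(r)), exp(sin(r + 1.0)), exp(sin(r + 2.0)), exp(sin(r + 3.0)));

    // Equation 12: scale by the 20x20 sampling size and the offset parameter,
    // then keep only the integer part so that the offsets are whole pixels.
    ivec4 Rp = ivec4(R * 20.0 * uOffset);

    ssbo1.randoms[gl_LocalInvocationIndex] = Rp;   // e.g. [2, 2, 3, 5]
}
```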
It should be noted that when the feature vectors of the feature points are calculated later, each thread of the local work group looks up the random number corresponding to its own index (gl_LocalInvocationIndex) in the random number queue to obtain the corresponding random offsets, samples the color image using the thread coordinates and the random offsets to obtain rgb color values, calculates brightness values from those rgb color values, and then determines the feature vector of the feature point from the brightness values.
Step 202, feature point ordering
First, the mask image obtained by recognizing the image feature points has a certain width and height. For convenience of explanation, the width of the mask image is denoted as W, and the height of the mask image is denoted as H.
In an embodiment of the present application, an SSBO of vec4 vectors of size N × (W/16) × (H/16) is created. For ease of illustration, this SSBO is denoted SSBO 2. Here a vec4 vector is a 4-component floating point vector.
SSBO 2 comprises (W/16) × (H/16) segments, each of which can store N pieces of data.
In the embodiment of the application, SSBO 2 can be used for segment storage of the characteristic point data of each of (W/16) x (H/16) local work groups, wherein each local work group is a 16 x 16 image block. Since the number of feature points stored in one 16×16 image block is at most 16×16, N is less than or equal to 256.
It should be noted that N may be 8 or 12, and may be set according to practical requirements, which is not limited by the embodiment of the present application. For convenience of explanation, N is taken as 8 in the examples below; when N is 8, the number n of feature points in a local work group satisfies 0 ≤ n ≤ 8.
Illustratively, it is assumed that the mask image has a size of 256×256, i.e., a width W and a height H of 256 pixels. As described above, in the embodiment of the present application, 16×16=256 threads are taken as one local work group, so that 256 local work groups can be divided for a mask image of 256×256 in size.
Accordingly, the size of SSBO 2 may be:
N × (W/16) × (H/16) = N × (256/16) × (256/16) = 256 × N (equation 13)
That is, SSBO 2 may store 256 pieces of data of a local work group in segments, and each segment may store N pieces of data.
In the feature point sorting step, the two frames of mask images are loaded into the compute shader, the local work group size of the compute pipeline is specified as 16×16, and the global work group size as (W/16, H/16, 1); that is, the global work group contains W/16 × H/16 local work groups, with 16×16 threads, i.e., 256 threads, as one local work group started on the GPU.
Illustratively, assuming that the width W and the height H of the mask image are 256 pixels, the global workgroup of the computation pipeline of the computation shader has a size (16,16,1) and the local workgroup has a size of 16×16, as shown in fig. 15.
In the embodiment of the application, the characteristic point ordering is realized by utilizing a shared buffer mechanism among thread work groups of the computing shader. Illustratively, a shared cache of vec4 vectors of size N may be defined. The shared cache may store data for N feature points.
In an embodiment of the present application, referring to FIG. 16, the global work group includes W/16 × H/16 local work groups, and feature points are extracted for each local work group. A local work group may contain one or more feature points, or none. To improve the computation rate, the number n of feature points kept per local work group is capped at N, i.e., n lies in the range [0, N] for each local work group.
Specifically, referring to fig. 17, for the feature points of each local work group, the response values of the feature points are extracted, the n feature points with the largest response values in the current local pixel region are collected using the atomic operation (atomicMax), and these n feature points are arranged in descending order of response value. The data of the n feature points are then stored in the shared cache. In the thread with position index (gl_LocalInvocationIndex) 0, the data of the n feature points in the shared cache are copied to the corresponding positions of SSBO 2. The thread with index 0 is taken here as an example; in an actual implementation a thread with another index may be chosen, as long as only one thread of the work group is selected to perform the subsequent operations, so as to avoid errors caused by multiple threads executing them simultaneously.
For example, assuming that N is 8 and n is 3, that is, the length of each sub-array is 8 and there are 3 feature points, the 3 feature points are sorted in descending order of response value into one sub-array of SSBO 2, and the remaining 5 positions of the sub-array are filled with 0.
The embodiment of the application determines the N feature points with the largest response values over the whole image through segmented sorting and segmented storage.
In one implementation, the whole image is divided into a plurality of local pixel areas, and N feature points with the largest response values in each local pixel area are respectively stored in a plurality of segments of SSBO 2 after being sequenced. Referring to fig. 18A, all feature points in all local pixel areas stored in a plurality of segments of SSBO 2 are sorted, and feature points of the first N maximum response values in the entire image are selected.
In another implementation, the whole image is divided into a plurality of local pixel areas; the n feature points with the largest response values in each local pixel area are sorted in descending order of response value and stored in the corresponding segment of SSBO 2. One feature point with the local maximum response value is then selected from each local pixel region, all the selected local-maximum feature points are sorted, and the feature points with the first N largest local maximum response values over the whole image are selected.
Referring to fig. 18B, SSBO 2 may store (W/16) × (H/16) sub-arrays, each of length N, where each sub-array stores the data of up to N feature points arranged in descending order of response value. That is, the first position of each sub-array stores the feature point with the largest response value, i.e., the local maximum, among its N feature points. From the (W/16) × (H/16) sub-arrays, (W/16) × (H/16) local maxima may be extracted. These local maxima are then sorted from large to small, and the N globally largest feature points are selected from them as the N feature points with the largest response values over the whole image.
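A simplified sketch of the per-work-group collection into SSBO 2 is given below. For clarity it serializes the top-n selection in thread 0 with a partial selection sort instead of the atomicMax-based selection described above; the mask-image binding and the (response, x, y) packing of each vec4 entry are assumptions.

```glsl
#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;

// Mask image from feature-point recognition: the G channel holds the response,
// 0.0 for non-feature pixels (binding is an assumption).
layout(binding = 0) uniform highp sampler2D uMaskImage;

const int N = 8;   // at most N feature points kept per 16x16 block

// SSBO 2: one N-element segment per local work group, each vec4 = (response, x, y, unused).
layout(std430, binding = 1) buffer FeatureSSBO { vec4 features[]; } ssbo2;

// Shared cache: responses and coordinates of all 256 pixels of the block.
shared vec4 blockData[256];

void main() {
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    float response = texelFetch(uMaskImage, p, 0).g;
    blockData[gl_LocalInvocationIndex] = vec4(response, float(p.x), float(p.y), 0.0);
    barrier();

    // Thread 0 picks the N largest responses of the block and writes them,
    // in descending order, into this work group's segment of SSBO 2.
    if (gl_LocalInvocationIndex == 0u) {
        uint groupIndex = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;
        uint base = groupIndex * uint(N);
        for (int k = 0; k < N; ++k) {
            int best = -1;
            float bestResp = 0.0;
            for (int i = 0; i < 256; ++i) {
                if (blockData[i].x > bestResp) { bestResp = blockData[i].x; best = i; }
            }
            if (best >= 0) {
                ssbo2.features[base + uint(k)] = blockData[best];
                blockData[best].x = 0.0;                      // exclude it from later rounds
            } else {
                ssbo2.features[base + uint(k)] = vec4(0.0);   // pad the segment with zeros
            }
        }
    }
}
```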
Referring to fig. 19, after feature point identification a large number of feature points are identified in an image. To improve the efficiency of feature point matching, the embodiment of the application screens out the N globally largest feature points by means of segmented feature point sorting: only the N feature points with the largest response values are retained by the sorting, and feature vectors are then calculated only for these N feature points.
By utilizing the feature point segmentation sequencing mode provided by the application, feature point sequencing is respectively carried out on the first mask image and the second mask image, so that two groups of feature point sequences are obtained, wherein the two groups of feature point sequences are respectively the first feature point sequence of the first mask image and the second feature point sequence of the second mask image. For example, N may be 8, that is, the first feature point sequence includes 8 feature points with the largest response values on the first mask image, and the second feature point sequence includes 8 feature points with the largest response values on the second mask image.
Step 203, calculating the feature vector of the feature point
For the first and second feature point sequences generated in step 202, a feature vector (specialVector) is calculated for each feature point in the two sequences.
Referring to fig. 20, after the segment ordering of a large number of feature points in the mask image, N (e.g., 8) feature points having the largest response values are retained in the mask image. Then, a feature vector is calculated for each feature point by first determining two pixels offset by a random offset with respect to the feature point and then calculating the feature vector for the feature point based on the luminance difference of the two pixels.
The process of calculating the feature vector of each feature point will be described in detail below.
In an embodiment of the present application, the first color image and the second color image are loaded to the compute shader, specifying that the local workgroup of the compute pipeline of the compute shader is 16 x 16 in size and the global workgroup is (1, 1) in size. That is, the global workgroup contains only one local workgroup, 16×16 threads as one local workgroup. 16 x 16 threads, i.e., 256 threads, are started on the GPU.
In the GPU's compute shader, each thread of the local work group looks up, according to its own position index (gl_LocalInvocationIndex), the 4-dimensional random vector with the corresponding subscript among the 256 4-dimensional random vectors cached in SSBO 1, and uses it as its random offset (offsetUV_i).
The random offset (offsetUV_i) can be expressed as a 4-dimensional random vector [A, B, C, D], where A and B represent a first random offset, denoted offsetUV_i1, and C and D represent a second random offset, denoted offsetUV_i2.
It should be noted that the reason for using the 4-dimensional random vector in the embodiment of the present application is that two-dimensional random vectors are required to do address offset. That is, in calculating the feature vector of a pixel, two neighboring pixels are required to be taken at random for the position of the current pixel. If a neighboring pixel is taken randomly, then a two-dimensional random vector is required that is offset randomly in both the lateral and longitudinal directions. Further, if two neighboring pixels are taken at random, two-dimensional random vectors are required, which constitute a four-dimensional random vector.
For example, referring to FIG. 21, assume the 4-dimensional random vector is [2, 2, 3, 5], where [2, 2] represents the first random offset and [3, 5] represents the second random offset. The feature point is offset a first time by the random offset [2, 2], giving the offset pixel coordinate (x+2, y+2). The feature point is offset a second time by the random offset [3, 5], giving the offset pixel coordinate (x+3, y+5).
In the embodiment of the application, for each thread, two random neighbor pixels at the pixel position of the current feature point are determined, the color values of the two random neighbor pixels are obtained by sampling, then the brightness of each random neighbor pixel is calculated according to the color value of each random neighbor pixel, then the respective brightness of the two random neighbor pixels is compared, and the feature vector of the feature point is obtained according to the comparison result. The process of calculating the feature vector of the feature point will be described in detail below.
First, the first color image (Color1) is sampled using the coordinates of each thread (gl_GlobalInvocationID.xy) and the random offset offsetUV_i1, obtaining the first rgb color value of the pixel point corresponding to the thread:
Color1_i.rgba = texture(Color1, gl_GlobalInvocationID.xy + offsetUV_i1) (equation 14)
Wherein the texture () function returns a vector of the vec4 type, representing the color value of the texture at the specified location.
Color1_i.rgba denotes the first rgb color value of the pixel point; it is a 4-dimensional vector with components Color1_i.r, Color1_i.g, Color1_i.b and Color1_i.a.
Similarly, the first color image (Color1) is sampled using the coordinates of each thread (gl_GlobalInvocationID.xy) and the random offset offsetUV_i2, obtaining the second rgb color value of the pixel point corresponding to the thread:
Color1_i+1.rgba = texture(Color1, gl_GlobalInvocationID.xy + offsetUV_i2) (equation 15)
Color1_i+1.rgba denotes the second rgb color value of the pixel point; it is a 4-dimensional vector with components Color1_i+1.r, Color1_i+1.g, Color1_i+1.b and Color1_i+1.a.
Then, a first luminance value Intensity_i is calculated from the first rgb color value of the pixel point:
Intensity_i = Color1_i.r × 0.299 + Color1_i.g × 0.587 + Color1_i.b × 0.114 (equation 16)
Similarly, a second luminance value Intensity_i+1 is calculated from the second rgb color value of the pixel point:
Intensity_i+1 = Color1_i+1.r × 0.299 + Color1_i+1.g × 0.587 + Color1_i+1.b × 0.114 (equation 17)
Then, each thread compares its first luminance value Intensity_i with its second luminance value Intensity_i+1 using the atomic operation atomicOr() of the thread work group, obtaining a comparison result:
result = atomicOr(Intensity_i, Intensity_i+1) (equation 18)
For example, if Intensity_i+1 is greater than Intensity_i, result is 1; if Intensity_i+1 is less than or equal to Intensity_i, result is 0.
In the embodiment of the application, the operations are respectively carried out on 16×16 threads, so that 16×16 results can be obtained.
Then, each of the 16×16 results is output to the gl_LocalInvocationIndex-th position of the feature vector (specialVector).
In the embodiment of the application, a plurality of results can be written through an Atomic () function, and the Atomic () function can be specifically expressed as:
Atomic(specialVector[gl_LocalInvocationIndex],result)
wherein simultaneous writing of multiple results can be supported by the Atomic () function. Thus, read-write conflicts in the GPU caused by read-write locks can be avoided.
In the embodiment of the application, after all the thread calculation is completed, a feature vector of a feature point is obtained.
In this way, the feature vector is calculated for each of the N feature points of the first Color image (Color 1), resulting in N feature vectors. Wherein the feature vector of each feature point may be a 256-bit sequence of 0 or 1.
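Step 203 might be sketched as follows for a single feature point. The uFeaturePos uniform, the use of texelFetch() instead of texture() so the offsets stay in pixel units, and the packing of the 256 comparison bits into 8 uints with atomicOr() are assumptions consistent with, but not identical to, the description above; the descriptor buffer is assumed to be zero-initialised by the host before dispatch.

```glsl
#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;   // 256 threads = 256 descriptor bits

layout(binding = 0) uniform highp sampler2D uColorImage;                  // Color1 (real frame)
layout(std430, binding = 1) buffer RandomSSBO     { ivec4 randoms[256]; } ssbo1;
// Descriptor of one feature point: 256 bits packed into 8 uints (svData).
layout(std430, binding = 2) buffer DescriptorSSBO { uint svData[8]; }     descr;

uniform ivec2 uFeaturePos;   // image coordinate of the feature point (assumed uniform)

float luminance(vec3 rgb) {                  // equations 16/17
    return rgb.r * 0.299 + rgb.g * 0.587 + rgb.b * 0.114;
}

void main() {
    // Each thread fetches the random offsets matching its position index.
    ivec4 rnd  = ssbo1.randoms[gl_LocalInvocationIndex];
    ivec2 off1 = rnd.xy;                     // first random offset  [A, B]
    ivec2 off2 = rnd.zw;                     // second random offset [C, D]

    // Equations 14/15: sample two random neighbours of the feature point.
    vec3 c1 = texelFetch(uColorImage, uFeaturePos + off1, 0).rgb;
    vec3 c2 = texelFetch(uColorImage, uFeaturePos + off2, 0).rgb;

    // One descriptor bit per brightness comparison (equation 18).
    uint bit = (luminance(c2) > luminance(c1)) ? 1u : 0u;

    // Pack bit gl_LocalInvocationIndex into the 8-uint descriptor; atomicOr lets
    // all 256 threads write concurrently without read-write conflicts.
    uint word  = gl_LocalInvocationIndex / 32u;
    uint shift = gl_LocalInvocationIndex % 32u;
    atomicOr(descr.svData[word], bit << shift);
}
```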
It should be noted that, the above is exemplified by calculating the feature vector of each feature point in the first feature point sequence for the first Color image (Color 1), and the process of calculating the feature vector of each feature point in the second feature point sequence for the second Color image (Color 2) is similar to the above, and will not be repeated. Therefore, by the algorithm provided by the embodiment of the application, the feature vector of each feature point in the first feature point sequence and the feature vector of each feature point in the second feature point sequence can be calculated.
For example, referring to table 4 below, the first feature point sequence includes N feature points with the largest response values in the first color image, each of the N feature points corresponds to a feature vector, and the feature vector corresponding to each feature point is a sequence of 256 bits of 0 or 1. Similarly, the second feature point sequence includes N feature points with the largest response values in the second color image, each feature point in the N feature points corresponds to a feature vector, and the feature vector corresponding to each feature point is a sequence of 256 bits 0 or 1.
TABLE 4
Step 204, calculating feature distances between feature vectors
In step 202 above, a first feature point sequence (denoted KeyPoint1) is generated for one frame image and a second feature point sequence (denoted KeyPoint2) for the other frame image, and in step 203 the feature vector of each feature point in the first feature point sequence and in the second feature point sequence is calculated.
The two feature point sequences each comprise N feature points; each feature point corresponds to a feature vector, and each feature vector is a 256-bit sequence of 0s and 1s.
It should be noted that the int data type occupies 4 bytes, i.e. 32 bits, in the memory, so that 8 int variables can be used to represent 256 bits of data. In particular, in the scheme of the application, since the feature vector is a group of 256-bit 0 or 1 sequences, the application uses 8 int-type variable segments to store the values of the feature vector. Where each int-type variable stores a sequence of 32 bits 0 or 1.
The 8 int-type variables can be written as int svData[k], k = 0, 1, …, 7.
Any feature vector in the first feature point sequence KeyPoint1 may be expressed as KeyPoint1.svData[0], KeyPoint1.svData[1], KeyPoint1.svData[2], KeyPoint1.svData[3], KeyPoint1.svData[4], KeyPoint1.svData[5], KeyPoint1.svData[6], KeyPoint1.svData[7].
Any feature vector in the second feature point sequence KeyPoint2 may be expressed as KeyPoint2.svData[0], KeyPoint2.svData[1], KeyPoint2.svData[2], KeyPoint2.svData[3], KeyPoint2.svData[4], KeyPoint2.svData[5], KeyPoint2.svData[6], KeyPoint2.svData[7].
In the embodiment of the present application, the exclusive-or operation may be performed segment by segment on a pair of feature vectors, one from the first feature point sequence KeyPoint1 and one from the second feature point sequence KeyPoint2:
int result0 = KeyPoint1.svData[0] ^ KeyPoint2.svData[0];
int result1 = KeyPoint1.svData[1] ^ KeyPoint2.svData[1];
int result2 = KeyPoint1.svData[2] ^ KeyPoint2.svData[2];
int result3 = KeyPoint1.svData[3] ^ KeyPoint2.svData[3];
int result4 = KeyPoint1.svData[4] ^ KeyPoint2.svData[4];
int result5 = KeyPoint1.svData[5] ^ KeyPoint2.svData[5];
int result6 = KeyPoint1.svData[6] ^ KeyPoint2.svData[6];
int result7 = KeyPoint1.svData[7] ^ KeyPoint2.svData[7];
Where "≡" is the sign of the exclusive or operation. Table 5 shows the arithmetic rules of the exclusive-or operation.
TABLE 5
| Operand 1 | Operand 2 | Exclusive-or result |
|-----------|-----------|---------------------|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
In the embodiment of the application, because the feature vectors of the first feature point sequence KeyPoint1 and the second feature point sequence KeyPoint2 are stored in segments, 8 exclusive-or operations are performed on each pair of feature vectors, yielding 8 exclusive-or results. Each exclusive-or result is a 32-bit sequence of 0s and 1s.
In the embodiment of the present application, the number of 1s in each exclusive-or result may be counted with the bitCount() instruction of the GPU; the counts of the 8 exclusive-or results are summed, and the sum is used as the feature distance (svDistance) of the two feature vectors:
svDistance=bitCount(result0)+bitCount(result1)+bitCount(result2)+bitCount(result3)+bitCount(result4)+bitCount(result5)+bitCount(result6)+bitCount(result7).
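The segment-wise XOR and bitCount() accumulation can be expressed compactly as a GLSL helper; the struct name and layout below are assumptions made for illustration.

```glsl
// Feature distance between two 256-bit descriptors stored as 8 uints each
// (step 204): XOR the segments pairwise and accumulate the set bits.
struct KeyPointDesc { uint svData[8]; };

uint svDistance(KeyPointDesc a, KeyPointDesc b) {
    uint dist = 0u;
    for (int k = 0; k < 8; ++k) {
        dist += uint(bitCount(a.svData[k] ^ b.svData[k]));
    }
    return dist;   // Hamming distance in [0, 256]
}
```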
the method of calculating the feature distances of the two feature vectors is described above. The method for calculating the feature distance of each feature point in the two feature point sequences according to the embodiment of the present application is described below based on the above method for calculating the feature distances of the two feature vectors.
Referring to fig. 22A and 22B, the first feature point sequence (KeyPoint1) includes the N feature points (denoted KeyPoint1_i) with the largest response values in the first color image, each corresponding to a feature vector. Similarly, the second feature point sequence (KeyPoint2) includes the N feature points (denoted KeyPoint2_i) with the largest response values in the second color image, each corresponding to a feature vector.
In the embodiment of the application, the feature distance between every pair of feature points is calculated from the feature vector of each feature point (KeyPoint1_i) in the first feature point sequence (KeyPoint1) and the feature vectors of all feature points (KeyPoint2_i, i = 1, 2, …, N) in the second feature point sequence (KeyPoint2).
A total of N sets of computations are required, and each set of computations results in N feature distances.
In the first set of calculations, as shown in fig. 22A and table 6 below, the feature distances are calculated between the feature vector of the first feature point (KeyPoint1_1) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2). For feature point KeyPoint1_1, N feature distances are thus obtained.
In the second set of calculations, as shown in fig. 22B and table 6 below, the feature distances are calculated between the feature vector of the second feature point (KeyPoint1_2) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2). For feature point KeyPoint1_2, N feature distances are thus obtained.
As can be seen from table 6, the feature distances between the feature vector of the N-th feature point (KeyPoint1_N) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2) are calculated in the same way. For feature point KeyPoint1_N, N feature distances are obtained.
Here KeyPoint1_1, KeyPoint1_2, …, KeyPoint1_N denote the first, second, …, N-th feature points of the first feature point sequence (KeyPoint1), and KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N denote the first, second, …, N-th feature points of the second feature point sequence (KeyPoint2).
TABLE 6
Step 205, performing feature matching according to the feature distance
In the embodiment of the present application, referring to table 7 below, for each of the N sets of calculations, the minimum of the N feature distances is found and compared with a preset distance threshold (svDistance_min), and whether the feature points match is determined from the comparison result, so as to improve the accuracy of feature point matching.
TABLE 7
If the minimum distance value in the N feature distances is smaller than the preset distance threshold value, the two feature points corresponding to the minimum distance value are successfully matched. If the minimum distance value of the N feature distances is greater than or equal to the preset distance threshold value, the matching fails.
When the matching operation of the feature points in the two frames of images is completed, the feature point pairs characterizing the same object in different images may be output.
For example, as shown in fig. 23, assuming that the feature point (x1, y1) in the first color image and the feature point (x2, y2) in the second color image are two successfully matched feature points, the feature point (x1, y1) and the feature point (x2, y2) constitute one feature point pair, which may be recorded as [(x1, y1), (x2, y2)].
In actual implementation, after the feature points of the two frames of images are matched, the matched feature point pairs between the two frames of images can be determined, and then the two frames of images are used for the purposes of motion estimation, image stitching, background segmentation and the like according to the position information of the matched feature point pairs.
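Continuing the sketch above, step 205 reduces to a minimum search plus a threshold test; N, the threshold value and the function interface below are assumptions for illustration.

```glsl
// Match one feature point of the first sequence against all N feature points of
// the second sequence; accept the match only when the minimum distance is below
// the preset threshold. Returns the matched index in sequence 2, or -1 on failure.
const int  N = 8;
const uint SVDISTANCE_MIN = 64u;     // preset distance threshold (assumed value)

int matchFeature(KeyPointDesc kp1, KeyPointDesc kp2[N]) {
    uint best    = 0xFFFFFFFFu;
    int  bestIdx = -1;
    for (int j = 0; j < N; ++j) {
        uint d = svDistance(kp1, kp2[j]);
        if (d < best) { best = d; bestIdx = j; }
    }
    return (best < SVDISTANCE_MIN) ? bestIdx : -1;   // -1 means the match failed
}
```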
The following describes an application scenario of feature point identification and matching based on GPU implementation provided by the embodiment of the present application. An exemplary description will be given below taking a game scene as an example.
The feature point identification and matching based on GPU implementation provided by the embodiment of the application can be applied to inter-frame prediction of game images. Referring to fig. 24, one frame of image may be predicted from two consecutive frames of game images as an intermediate frame of game image of the two frames of game images. The continuous two frames of game images are respectively called a real frame N-1 and a real frame N+1, and the predicted intermediate frame game image is called a predicted frame N.
In the embodiment of the application, the predicted frame can be generated from two rendering instruction streams acquired one after the other. "Two rendering instruction streams acquired sequentially" means that rendering instruction stream 1 is acquired first, then rendering instruction stream 2, and no other rendering instruction stream is acquired between them. Specifically, the GPU sequentially obtains the two rendering instruction streams, each used to draw one real frame image, identifies the feature points of each real frame image, matches the feature points of the two real frame images, calculates a motion vector (including, for example, translation and/or rotation) for each pixel from the image coordinate information of the matched feature point pairs, and then offsets the 3D scene image corresponding to one of the real frames pixel by pixel based on the motion vectors to generate the final predicted frame.
Assuming that the GPU generates a real frame N-1 according to the rendering instruction stream 1 acquired first, generates a real frame n+1 according to the rendering instruction stream 2 acquired later, and generates a predicted frame N according to the rendering instruction stream 1 and the rendering instruction stream 2, the GPU may send these frames into the queue to be displayed in the order of the real frame N-1, the predicted frame N, and the real frame n+1. And then the display screen sequentially plays the real frame N-1, the predicted frame N and the real frame N+1.
It should be understood that, because the resource consumption of the predicted frame generated by adopting the inter-frame prediction technology is smaller than the resource consumption of the real frame rendered by the game application, by reducing the number of the real frames generated by the game application and generating the predicted frame by adopting the inter-frame prediction technology, the total amount of the game frames can be ensured, and the power consumption load of the chip can be reduced.
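As a loose illustration only, a per-pixel offset pass of this kind might look as follows; the motion-vector texture, the 0.5 interpolation factor and all bindings are assumptions and do not reflect the application's actual frame-prediction implementation.

```glsl
#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 0) uniform highp sampler2D uRealFrame;       // a real frame (e.g. N+1)
layout(binding = 1) uniform highp sampler2D uMotionVectors;   // per-pixel motion, in pixels (assumed)
layout(binding = 2, rgba8) writeonly uniform highp image2D uPredictedFrame;

void main() {
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    vec2 motion = texelFetch(uMotionVectors, p, 0).xy;
    ivec2 src = p - ivec2(motion * 0.5);                      // halfway between the two real frames (assumed)
    imageStore(uPredictedFrame, p, texelFetch(uRealFrame, src, 0));
}
```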
In the inter-frame prediction process, the embodiment of the application performs feature point identification and matching on the two frames of images, calculates a motion vector (including, for example, translation and/or rotation) for each pixel from the image coordinate information of the matched feature point pairs, and offsets the 3D scene image corresponding to one real frame pixel by pixel based on the motion vectors to generate the predicted frame. It should be noted that, by performing image feature recognition and matching on the GPU in real time and with low power consumption, and by using the graphics rendering unit and graphics rendering pipeline of the GPU, in particular GPU-specific functions such as the compute shader, the inter-core shared cache and the atomic operations of thread work groups, feature points of different images are recognized and matched inside the graphics rendering pipeline, which greatly improves computing efficiency, reduces the time overhead of pixel-by-pixel computation, and reduces the overhead of data copying and context switching between feature recognition/matching and the graphics rendering pipeline.
By utilizing certain characteristic functions of the GPU and logic for handling feature point matching directly in the graphics rendering pipeline, the application realizes the feature point matching operation on rendered images with high efficiency and low power consumption, and achieves a high-frame-rate image feature point matching function on mobile platforms with limited computing capacity. The scheme of the application can effectively reduce the power consumption of mobile terminal devices in feature recognition and machine vision algorithms, is widely applicable to image rendering application scenarios such as VR/AR, improves image processing efficiency and computation accuracy, and reduces computing power consumption.
In the embodiment of the present application, "greater than" may be replaced with "greater than or equal to", "less than or equal to" may be replaced with "less than", or "greater than or equal to" may be replaced with "greater than", "less than" may be replaced with "less than or equal to".
The various embodiments described herein may be separate solutions or may be combined according to inherent logic, which fall within the scope of the present application.
The foregoing describes the solution provided by the embodiments of the present application primarily from the perspective of method steps. It will be appreciated that, in order to implement the above-described functions, an electronic device implementing the method includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application also provides a chip coupled to a memory for reading and executing a computer program or instructions stored in the memory to perform the methods of the embodiments described above.
The application also provides an electronic device comprising a chip for reading and executing a computer program or instructions stored in a memory, such that the method in the embodiments is performed.
The present embodiment also provides a computer-readable storage medium having stored therein computer instructions that, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the GPU-based image feature recognition method in the above-described embodiments.
The present embodiment also provides a computer program product, in which a program code is stored on a computer readable storage medium, which when run on a computer causes the computer to perform the above-mentioned related steps to implement the GPU-based image feature recognition method in the above-mentioned embodiments.
In addition, the embodiment of the application also provides a device which can be a chip, a component or a module, and the device can comprise a processor and a memory which are connected, wherein the memory is used for storing computer execution instructions, and when the device runs, the processor can execute the computer execution instructions stored in the memory so that the chip can execute the image feature recognition method based on the GPU in the method embodiments.
The electronic device, the computer readable storage medium, the computer program product or the chip provided in this embodiment are used to execute the corresponding method provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding method provided above, and will not be described herein.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.