
CN119445147A - Image feature matching method, electronic device and storage medium based on GPU - Google Patents


Info

Publication number
CN119445147A
Authority
CN
China
Prior art keywords
feature
image
gpu
feature point
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310968444.3A
Other languages
Chinese (zh)
Inventor
刘智超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310968444.3A priority Critical patent/CN119445147A/en
Publication of CN119445147A publication Critical patent/CN119445147A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract


The present application provides a GPU-based image feature matching method, an electronic device and a storage medium, relating to the field of image processing technology. A mask image (in which feature points and response function values are marked) is loaded in a compute shader of the GPU, and four-dimensional random constants are generated. The first N feature point pixels with the largest response function values in the mask image are determined in a compute shader of the GPU. A color image is loaded in a compute shader of the GPU, and the feature vectors of the first N feature points with the largest response function values in the color image are determined based on the four-dimensional random constants. Feature distances are calculated based on pairs of feature vectors, and feature point matching is performed based on the feature distances. The present application performs feature matching by directly accessing image resources inside the graphics rendering pipeline, avoiding the operation of copying data through memory between the GPU and the CPU and reducing power consumption, while utilizing the high-concurrency multi-threaded processing capability provided by the GPU to greatly accelerate the computational efficiency of feature point matching.

Description

Image feature matching method based on GPU, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a GPU-based image feature matching method, an electronic device, and a storage medium.
Background
Currently, in image processing, machine vision, and graphics-rendering-related applications such as autonomous driving and VR/AR, computing efficiency and computing accuracy are major concerns in the industry; feature point-based image recognition, matching, and stereoscopic vision in particular have attracted great interest because of their wide range of application scenarios.
The feature point matching algorithms currently adopted require a large number of per-pixel calculations, and the higher the image resolution, the more pixels need to be calculated and the more computing resources and computation time are consumed. Therefore, how to realize feature point matching on rendered images efficiently and with low power consumption is a problem to be solved.
Disclosure of Invention
The present application provides a GPU-based image feature matching method, an electronic device, and a storage medium, which address the problems of the large computing resources and long computation time required by image feature point matching.
In a first aspect, the present application provides a GPU-based image feature matching method, the method comprising:
Obtaining feature vectors corresponding to feature points of a color image, wherein one feature point corresponds to one feature vector; when the color image is a first color image, N feature points of the first color image correspond to N feature vectors, and when the color image is a second color image, N feature points of the second color image correspond to N feature vectors;
According to N feature vectors corresponding to N feature points of the first color image and N feature vectors corresponding to N feature points of the second color image, N feature distances between each feature point of the first color image and N feature points of the second color image are determined;
If the minimum feature distance in the N feature distances is smaller than a preset distance threshold, determining the feature points of the two feature vectors of the minimum feature distance as matched feature point pairs;
The obtaining the feature vector corresponding to the feature point of the color image includes:
Loading a mask image in a first compute shader of the GPU, and generating M four-dimensional random constants according to image coordinate data of feature point pixels in the mask image, wherein the mask image is an image obtained by performing feature point recognition on the color image and marks a plurality of feature point pixels and their corresponding response function values;
Loading the mask image in a second compute shader of the GPU, sorting the marked feature point pixels in the mask image according to the magnitude of their response function values, and determining the first N feature point pixels with the largest response function values in the mask image;
And loading the color image in a third compute shader of the GPU, determining adjacent-pixel brightness differences of the feature point pixels in the color image according to the M four-dimensional random constants, and determining, according to the adjacent-pixel brightness differences of the feature point pixels in the color image, the feature vectors of the first N feature points with the largest response function values as the N feature vectors corresponding to the N feature points of the color image.
According to the GPU-based image feature matching method provided by the embodiments of the present application, a mask image (in which feature points and response function values are marked) is loaded in a compute shader of the GPU, and four-dimensional random constants are generated. The mask image is loaded in a compute shader of the GPU, and the first N feature point pixels with the largest response function values in the mask image are determined. The color image is loaded in a compute shader of the GPU, and the feature vectors of the first N feature points with the largest response function values in the color image are determined according to the four-dimensional random constants. Feature distances are calculated based on pairs of feature vectors, and feature point matching is performed according to the feature distances. Because the present application performs feature matching by directly accessing image resources inside the graphics rendering pipeline, the operation of copying data through memory between the GPU and the CPU is avoided, and the heating of mobile terminal devices caused by memory bandwidth overhead is greatly reduced; meanwhile, the highly concurrent multi-threaded processing capability provided by the GPU is utilized, so that the computational efficiency of feature point matching can be greatly accelerated, meeting the needs of high-frame-rate applications.
It should be noted that, first, feature point recognition is performed on a rendering image (or referred to as a color image) generated in a rendering pipeline, and then, image feature matching is performed based on respective feature points in two frames of images. After feature point recognition is performed on the color image, a mask image can be obtained. The mask image is marked with response function values corresponding to the image feature point pixels. Wherein the first color image corresponds to the first mask image and the second color image corresponds to the second mask image.
In some possible implementations, generating the M four-dimensional random constants according to the image coordinate data of the feature point pixels in the mask image includes: multiplying the position index value of each feature point pixel in the mask image by a specified random-number scaling factor to obtain a random number seed; and, for each thread in the local work group of the first compute shader of the GPU, operating on the random number seed with trigonometric and exponential functions to generate one four-dimensional random constant, and adjusting that four-dimensional random constant according to an offset parameter to obtain an adjusted four-dimensional random constant. In the case where the local work group includes 16×16 threads, 256 adjusted four-dimensional random constants are obtained, and M is equal to 256.
For example, in the step of calculating the random constants, the first mask image and the second mask image are loaded into a compute shader of the GPU; the size of the local work group of the specified compute pipeline is 16×16, and the size of the global work group is (1, 1, 1). That is, the global work group contains only one local work group, and the 16×16 threads form one local work group. In other words, 16×16 = 256 threads are started on the GPU.
In an embodiment of the application, a random number seed is generated in the compute shader. For example, the value of the position index (gl_LocalInvocationIndex) is multiplied by the specified random-number scale factor (scale) to obtain a random number seed r, as in equation 11: r = gl_LocalInvocationIndex × scale. Each thread in the work group then operates on the random number seed r with a combination of trigonometric and exponential functions to generate a 4-dimensional random vector, and each 4-dimensional random vector is adjusted according to an offset parameter (offset). After all 256 4-dimensional random vectors have been adjusted, the 256 adjusted 4-dimensional random vectors form a random number queue.
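By way of illustration only, the per-thread generation of one adjusted 4-dimensional random constant could be sketched as the following GLSL (OpenGL ES 3.1) compute shader. The specific trigonometric/exponential expression standing in for equation 11, the scale and offset values, the SSBO1 layout, and the final mapping to integer offsets are assumptions made for this sketch and are not taken from the original disclosure.

```glsl
#version 310 es
precision highp float;
layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in;

// SSBO1: one adjusted 4-dimensional random constant per thread (256 in total).
layout(std430, binding = 0) buffer SSBO1 {
    ivec4 randoms[256];
};

uniform float scale;   // specified random-number scale factor (value assumed)
uniform vec4  offset;  // offset parameter used for the adjustment (value assumed)

void main() {
    // Equation 11 as described in the text: seed = position index * scale.
    float r = float(gl_LocalInvocationIndex) * scale;

    // Placeholder combination of trigonometric and exponential functions;
    // the original only states that both kinds of functions are used.
    vec4 v = fract(vec4(sin(r * 12.9898), cos(r * 78.233),
                        sin(r * 39.3467), cos(r * 93.989)) * exp2(12.0));

    // Adjust by the offset parameter and store into the random-number queue.
    // Mapping to small integer pixel offsets is an assumption of this sketch.
    vec4 adjusted = (v - 0.5) + offset;
    randoms[gl_LocalInvocationIndex] = ivec4(adjusted * 31.0);
}
```

The host side would dispatch this shader with a single (1, 1, 1) global work group, matching the description above.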
In some possible implementations, each of the M four-dimensional random constants includes a first random offset and a second random offset. In this case, determining the adjacent-pixel brightness differences of the feature point pixels in the color image according to the M four-dimensional random constants includes: in the third compute shader of the GPU, for each thread of a local work group, determining a first pixel color value offset by the first random offset and a second pixel color value offset by the second random offset relative to a first feature point pixel; calculating a first pixel brightness value from the first pixel color value and a second pixel brightness value from the second pixel color value; and calculating the brightness difference between the second pixel brightness value and the first pixel brightness value.
In some possible implementations, determining the feature vector of a feature point with the largest response function value according to the adjacent-pixel brightness differences of the feature point pixels in the color image includes: outputting a result of 1 if the brightness difference is greater than 0, and outputting a result of 0 if the brightness difference is less than or equal to 0; and obtaining, from the output results, a sequence of 0s and 1s as the feature vector corresponding to the first feature point pixel.
In some possible implementations, where the local work group includes 16×16 threads, the feature vector corresponding to the first feature point pixel is a 256-bit sequence of 0s and 1s.
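A minimal sketch of the per-thread brightness comparison described above is given below, again as GLSL. The luminance weights, the output layout of 8 uints (256 bits) per feature vector, the featurePixel/featureIndex inputs, and the bit packing via atomicOr are illustrative assumptions of the sketch.

```glsl
#version 310 es
precision highp float;
layout(local_size_x = 16, local_size_y = 16) in;

layout(rgba8, binding = 0) readonly uniform highp image2D colorImage;

layout(std430, binding = 0) readonly buffer SSBO1 { ivec4 randoms[256]; };

// Assumed output layout: 8 uints (256 bits) per feature point, zero-initialized.
layout(std430, binding = 2) buffer Descriptors { uint bits[]; };

uniform ivec2 featurePixel;  // coordinates of the first feature point pixel
uniform int   featureIndex;  // index of the feature point being described

float luminance(vec4 c) {
    // Rec.601 luma weights; the original does not specify the exact formula.
    return dot(c.rgb, vec3(0.299, 0.587, 0.114));
}

void main() {
    uint tid = gl_LocalInvocationIndex;   // 0..255: one comparison bit per thread
    ivec4 rnd = randoms[tid];             // (dx1, dy1, dx2, dy2): the two random offsets

    float l1 = luminance(imageLoad(colorImage, featurePixel + rnd.xy));
    float l2 = luminance(imageLoad(colorImage, featurePixel + rnd.zw));

    // Brightness difference > 0 -> output 1, otherwise the bit stays 0.
    if (l2 - l1 > 0.0) {
        atomicOr(bits[featureIndex * 8 + int(tid / 32u)], 1u << (tid % 32u));
    }
}
```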
In the embodiments of the present application, in the process of matching the image feature points generated by graphics rendering, image resources are accessed directly inside the graphics rendering pipeline. This avoids copying data through memory between the GPU and the CPU and greatly reduces the heating of mobile terminal devices caused by memory bandwidth overhead. Meanwhile, the highly concurrent multi-threaded processing capability provided by the GPU and thread atomic-operation access to the inter-core shared cache are utilized, so that the computational efficiency of feature point matching can be greatly accelerated, meeting the needs of high-frame-rate applications.
In some possible implementations, the method further includes: storing the feature vector of one feature point pixel of the first color image in segments as S integer variables; storing the feature vector of one feature point pixel of the second color image in segments as S integer variables; performing an exclusive-OR operation on the i-th integer variable stored for the first color image and the i-th integer variable stored for the second color image, where i ranges from 1 to S; counting the number of 1s in each exclusive-OR result by using the bitCount instruction of the GPU; and taking the total count of 1s as the feature distance between the one feature point pixel of the first color image and the one feature point pixel of the second color image.
For example, the two feature point sequences each include N feature points, each feature point corresponds to a feature vector, and each feature vector is a 256-bit sequence of 0s and 1s.
In some possible implementations, the int data type occupies 4 bytes, i.e. 32 bits, in memory, so 256 bits of data can be represented using 8 int-type variables. Specifically, in the scheme of the present application, since each feature vector is a 256-bit sequence of 0s and 1s, the application stores the value of the feature vector in segments across 8 int-type variables, where each int-type variable stores a 32-bit sequence of 0s and 1s; that is, S may take the value 8. The feature vectors of the two feature point sequences are each stored in segments as 8 int-type variables, 8 exclusive-OR operations are performed on the corresponding pairs of segments, and 8 exclusive-OR results are obtained. Each exclusive-OR result is a 32-bit sequence of 0s and 1s. The number of 1s in each exclusive-OR result is counted using the bitCount() instruction of the GPU, the counts from the 8 exclusive-OR results are accumulated, and the accumulated sum is used as the feature distance between the two feature vectors.
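Under this representation (S = 8 segments of 32 bits each), the feature distance of one descriptor pair can be computed with exclusive-OR and the GPU's bitCount instruction, as sketched below; the buffer names, bindings, and the N×N distance-matrix output are assumptions of the sketch.

```glsl
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;

// Each feature vector is a 256-bit 0/1 sequence stored as 8 uint segments.
layout(std430, binding = 2) readonly buffer Desc1 { uint desc1[]; }; // N * 8, image 1
layout(std430, binding = 3) readonly buffer Desc2 { uint desc2[]; }; // N * 8, image 2
layout(std430, binding = 4) writeonly buffer Dist { uint dist[]; };  // N * N distances

uniform int N;  // number of feature points per image, e.g. 256

void main() {
    int i = int(gl_GlobalInvocationID.y);  // feature index in the first image
    int j = int(gl_GlobalInvocationID.x);  // feature index in the second image
    if (i >= N || j >= N) return;

    uint d = 0u;
    for (int k = 0; k < 8; ++k) {
        // XOR marks the differing bits; bitCount() counts the 1s of each segment.
        d += uint(bitCount(desc1[i * 8 + k] ^ desc2[j * 8 + k]));
    }
    dist[i * N + j] = d;  // accumulated count = feature distance of the pair
}
```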
In the embodiments of the present application, a first feature point sequence is generated for one frame of image and a second feature point sequence is generated for the other frame of image, and the feature vector of each feature point in the first feature point sequence and the feature vector of each feature point in the second feature point sequence are then obtained through calculation. Next, the feature distances between every two feature points are calculated according to the feature vector of each feature point in the first feature point sequence and the feature vectors of all feature points in the second feature point sequence. The minimum feature distance is compared with a preset distance threshold, and whether the feature points match is judged according to the comparison result, so as to improve the accuracy of feature point matching.
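Continuing the illustrative sketches, the minimum-distance search and the threshold test for one feature point of the first image might look as follows; the single 16×16 work group, the buffer bindings, and the use of -1 to indicate "no match" are assumptions.

```glsl
#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;  // one thread per feature of image 1

layout(std430, binding = 4) readonly buffer Dist   { uint dist[]; };  // N * N distances
layout(std430, binding = 5) writeonly buffer Match { int match[]; };  // match[i] = j, or -1

uniform int  N;              // number of feature points per image
uniform uint distThreshold;  // preset distance threshold

void main() {
    int i = int(gl_LocalInvocationIndex);  // 0..255: feature index in the first image
    if (i >= N) return;

    uint best = 0xFFFFFFFFu;
    int  bestJ = -1;
    for (int j = 0; j < N; ++j) {
        uint d = dist[i * N + j];
        if (d < best) { best = d; bestJ = j; }
    }
    // Keep the pair only when the minimum feature distance is below the threshold.
    match[i] = (best < distThreshold) ? bestJ : -1;
}
```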
In some possible implementations, the value of M is equal to the local workgroup size of the compute pipeline in the third compute shader of the GPU. For example, M takes 256, and the local workgroup size of the compute pipeline in the third compute shader of the GPU is 16×16.
In some possible implementations, the size of the local work group of the compute pipeline in the third compute shader of the GPU is equal to the size of the local work group of the compute pipeline in the first compute shader of the GPU, and the size of the global work group of the compute pipeline in the third compute shader of the GPU is equal to the size of the global work group of the compute pipeline in the first compute shader of the GPU. For example, the local work groups of the compute pipelines are all 16×16 in size, and the global work groups of the compute pipelines are all (1, 1, 1) in size.
In some possible implementations, the product of the size of the local workgroup of the compute pipeline and the size of the global workgroup in the second compute shader of the GPU is equal to the size of the mask image. The size of the local working group of the computation pipeline in the second computation shader of the GPU is equal to the size of the local working group of the computation pipeline in the third computation shader of the GPU. For example, the local working groups of the compute pipeline are each 16×16 in size.
In some possible implementations, in the first compute shader of the GPU, the local work group of the compute pipeline is 16×16 in size and the global work group is (1, 1, 1) in size.
In some possible implementations, in the second compute shader of the GPU, the local work group of the compute pipeline is 16×16 in size and the global work group is (W/16, H/16, 1) in size, where the mask image is W×H pixels in size.
In some possible implementations, in the third compute shader of the GPU, the local work group of the compute pipeline is 16×16 in size and the global work group is (1, 1, 1) in size.
In some possible implementations, after generating the M four-dimensional random constants from the image coordinate data of the feature point pixels in the mask image, the method further includes storing the M four-dimensional random constants in a first shader storage buffer (SSBO1). In an embodiment of the present application, a readable and writable SSBO of size 256 that holds ivec vectors is created. Using the shader storage buffer facilitates the highly concurrent multi-threaded processing capability provided by the GPU and thread atomic-operation access to the inter-core shared cache, so the computational efficiency of feature point matching can be greatly accelerated, meeting the needs of high-frame-rate applications.
In some possible implementations, in the second compute shader of the GPU, the global work group includes W/16×H/16 local work groups. In this case, sorting the marked feature point pixels in the mask image according to the magnitude of the response function values and determining the first N feature point pixels with the largest response function values in the mask image includes: for each of the W/16×H/16 local work groups, determining the first N feature points with the largest response function values in that local work group by using the atomic operation atomicMax of the GPU, the first N feature points being arranged from large to small according to their response function values; and, across the W/16×H/16 local work groups, determining the first N feature points with the largest response function values in the global work group from the N×W/16×H/16 candidate feature points, as the first N feature points with the largest response function values in the mask image.
In some possible implementations, determining the first N feature points with the largest response function values in the global work group from the N×W/16×H/16 feature points of the W/16×H/16 local work groups includes: sequentially storing, for the W/16×H/16 local work groups, the first N feature points with the largest response function values of each local work group in a second shader storage buffer, and determining the first N feature points with the largest response function values in the global work group from the N×W/16×H/16 feature points stored in the second shader storage buffer.
In one implementation, the whole image is divided into a plurality of local pixel areas, and the N feature points with the largest response values in each local pixel area are sorted and stored in separate segments of the second shader storage buffer. All the feature points of all the local pixel areas stored in those segments are then sorted, and the feature points with the first N largest response values in the whole image are selected.
In another implementation, the whole image is divided into a plurality of local pixel areas, the N feature points with the largest response values in each local pixel area are sorted and stored in separate segments of the first shader storage buffer, so that the feature points within each local pixel area are ordered from large to small according to their response values. The feature point with the locally largest response value is then selected in each local pixel area, all the selected locally largest feature points are sorted, and the feature points with the first N largest local response values in the whole image are thereby selected.
In some possible implementations, sequentially storing, for the W/16×H/16 local work groups, the first N feature points with the largest response function values of each local work group in the second shader storage buffer includes: for each of the W/16×H/16 local work groups, storing the first N feature points with the largest response function values of that local work group in a cache shared among the work groups according to a position index, and copying the feature points in the shared cache to the corresponding positions of the second shader storage buffer according to the position index.
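The per-work-group selection with atomicMax and the shared cache could be sketched as follows, simplified to keeping only the single largest response per 16×16 tile (the scheme described above keeps the first N); the r32f mask format, the response quantization, the packing scheme, and the SSBO2 element layout are assumptions.

```glsl
#version 310 es
precision highp float;
layout(local_size_x = 16, local_size_y = 16) in;

// Mask image holding the response function values (r32f format assumed).
layout(r32f, binding = 0) readonly uniform highp image2D maskImage;

// SSBO2: the winning feature point of each 16x16 tile (simplified from top N).
layout(std430, binding = 1) writeonly buffer SSBO2 {
    ivec4 best[];  // (x, y, quantized response, 0) per local work group
};

shared uint tileMax;  // shared cache of one local work group, updated atomically

void main() {
    if (gl_LocalInvocationIndex == 0u) tileMax = 0u;
    barrier();

    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    float response = imageLoad(maskImage, p).r;

    // Pack the quantized response into the high 16 bits and the thread index
    // into the low 16 bits, so one atomicMax selects both value and position.
    uint key = (uint(clamp(response, 0.0, 1.0) * 65535.0) << 16)
             | gl_LocalInvocationIndex;
    atomicMax(tileMax, key);
    barrier();

    if (gl_LocalInvocationIndex == 0u) {
        uint winner = tileMax & 0xFFFFu;  // local index of the winning pixel
        ivec2 w = ivec2(gl_WorkGroupID.xy) * 16
                + ivec2(int(winner % 16u), int(winner / 16u));
        uint wg = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;
        best[wg] = ivec4(w, int(tileMax >> 16), 0);
    }
}
```

Extending the sketch from the single maximum to the first N entries per work group would add a small shared array, an insertion/sorting step, and the copy into SSBO2 by position index as described above.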
In some possible implementations, the matched feature point pair includes a first feature point and a second feature point. In this case, after the feature points to which the two feature vectors of the minimum feature distance belong are determined as a matched feature point pair, the method further includes extracting the coordinate parameters of the first feature point from the mask image corresponding to the first color image, and extracting the coordinate parameters of the second feature point from the mask image corresponding to the second color image.
When the matching operation on the feature points in the two frames of images is completed, the feature point pairs characterizing the same object in different images may be output. For example, assuming that the feature point (x1, y1) in the first color image and the feature point (x2, y2) in the second color image are two successfully matched feature points, the feature point (x1, y1) and the feature point (x2, y2) constitute one feature point pair. Accordingly, the pair of feature points may be denoted as [(x1, y1), (x2, y2)].
In actual implementation, after the feature points of the two frames of images are matched, the matched feature point pairs between the two frames of images can be determined, and then the two frames of images are used for the purposes of motion estimation, image stitching, background segmentation and the like according to the position information of the matched feature point pairs.
The present application provides a technique for matching image features in real time and with low power consumption on mobile terminal devices through the GPU. Feature points of different images are matched inside the graphics rendering pipeline by means of dedicated GPU capabilities, such as the graphics rendering unit of the GPU, the graphics rendering pipeline, the compute shaders of the GPU, the inter-core shared cache, and the atomic operations of thread work groups. This greatly improves computing efficiency and reduces the time cost of pixel-by-pixel computation, while also reducing the cost of data copying and context switching between feature matching and the graphics rendering pipeline.
In a second aspect, the present application provides a GPU-based image feature matching apparatus comprising means for performing the method of the first aspect described above. The apparatus may correspond to performing the method described in the first aspect, and the relevant descriptions of the units in the apparatus are referred to the description of the first aspect, which is omitted herein for brevity.
The method described in the first aspect may be implemented by hardware, or may be implemented by executing corresponding software by hardware. The hardware or software includes one or more modules or units corresponding to the functions described above. Such as a processing module or unit, a display module or unit, etc.
In a third aspect, the present application provides an electronic device comprising a processor and a memory, the memory storing a computer program or instructions, and the processor being configured to execute the computer program or instructions such that the method of the first aspect is performed.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program (also referred to as instructions or code) for implementing the method in the first aspect. For example, the computer program, when executed by a computer, causes the computer to perform the method of the first aspect.
In a fifth aspect, the present application provides a chip comprising a processor. The processor is configured to read and execute a computer program stored in the memory to perform the method of the first aspect and any possible implementation thereof. Optionally, the chip further comprises a memory, and the memory is connected with the processor through a circuit or a wire.
In a sixth aspect, the present application provides a system-on-a-chip, comprising a processor. The processor is configured to read and execute a computer program stored in the memory to perform the method of the first aspect and any possible implementation thereof. Optionally, the chip system further comprises a memory, and the memory is connected with the processor through a circuit or a wire.
In a seventh aspect, the application provides a computer program product comprising a computer program (which may also be referred to as instructions or code) which, when executed by an electronic device, causes the electronic device to carry out the method of the first aspect.
It will be appreciated that the advantages of the second to seventh aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
FIG. 1 is a schematic diagram of a process for image feature recognition and matching in an embodiment of the present application;
FIG. 2 is a schematic diagram of coordinate system transformation in an embodiment of the present application;
FIG. 3 is a schematic diagram of a global workgroup and a local workgroup in an embodiment of the application;
FIG. 4A is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4B is a schematic structural diagram of a CPU and a GPU according to an embodiment of the present application;
FIG. 5 is a schematic software architecture diagram of an electronic device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image pyramid provided by an embodiment of the present application;
FIG. 7 is a flowchart of image feature recognition based on a GPU according to an embodiment of the present application;
FIG. 8 is a schematic diagram of calculating corner points by using an image pyramid according to an embodiment of the present application;
FIG. 9A is a schematic diagram illustrating a positional relationship between a pixel and an adjacent pixel according to an embodiment of the present application;
FIG. 9B is a schematic diagram of a relationship between corner points and adjacent pixels according to an embodiment of the present application;
FIG. 10 is a schematic diagram of obtaining a plurality of local maximum response values for a mask image according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a mask image after feature point recognition according to an embodiment of the present application;
FIG. 12A is a first schematic flowchart of feature point matching for two frames of images according to an embodiment of the present application;
FIG. 12B is a second schematic flowchart of feature point matching for two frames of images according to the embodiment of the present application;
FIG. 13 is a flowchart of a GPU-based image feature matching method according to an embodiment of the present application;
FIG. 14A is a schematic diagram of a global workgroup and a local workgroup of a designated computational pipeline when random variables are computed by a compute shader in an embodiment of the application;
FIG. 14B is a schematic diagram of storing random variables as they are calculated by a compute shader in an embodiment of the application;
FIG. 15 is a schematic diagram of a global workgroup and a local workgroup of a designated computational pipeline when feature point ordering by a compute shader in accordance with an embodiment of the application;
FIG. 16 is a schematic diagram of feature point extraction in units of local work groups when feature point ordering by a compute shader in an embodiment of the application;
FIG. 17 is a schematic diagram of storing feature points of each local work group in segments during feature point ordering by a compute shader in an embodiment of the application;
FIG. 18A is a schematic diagram of segment ordering of feature points of each local work group when feature point ordering is performed by a compute shader in an embodiment of the application;
FIG. 18B is a schematic diagram of segment ordering of feature points of each local work group when feature point ordering is performed by a compute shader in an embodiment of the application;
FIG. 19 is a schematic diagram showing that N feature points with maximum response values in a mask image are reserved after feature point ordering by a computing shader in an embodiment of the application;
FIG. 20 is a schematic diagram of calculating feature vectors for N feature points with the largest response values in a mask image according to an embodiment of the present application;
FIG. 21 is a diagram showing a feature vector calculation according to the luminance difference of the feature point offset in an embodiment of the present application;
FIG. 22A is a diagram showing feature distance calculation between feature vectors according to an embodiment of the present application;
FIG. 22B is a second diagram illustrating feature distance calculation between feature vectors according to an embodiment of the present application;
FIG. 23 is a schematic diagram of feature point matching according to feature distance in an embodiment of the present application;
FIG. 24 is a schematic diagram of an application scenario of a GPU-based image feature matching method according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The term "and/or" is used herein to describe an association relationship of associated objects, and means that there may be three relationships, for example, a and/or B, and that there may be three cases where a exists alone, while a and B exist together, and B exists alone. The symbol "/" herein indicates that the associated object is or is a relationship, e.g., A/B indicates A or B.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. In the description of the embodiments of the present application, unless otherwise specified, the meaning of "plurality" means two or more, for example, a plurality of processing units means two or more processing units and the like, and a plurality of elements means two or more elements and the like.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Currently, in image processing, machine vision, and graphics-rendering-related applications such as autonomous driving and VR/AR, computing efficiency and computing accuracy are major concerns in the industry; feature point-based image recognition, matching, and stereoscopic vision in particular have attracted great interest because of their wide range of application scenarios.
The following describes the current image feature recognition and matching process, and referring to fig. 1, the image feature recognition and matching process may include the following steps:
(1) Feature recognition, also called feature detection: feature points with salient properties, such as corner points, edge points or blobs, are found in the image. These feature points are typically located at salient structures of the image and have good stability and repeatability.
(2) Feature description: for each detected feature point, its descriptor is computed. The descriptor is a vector or matrix representing the local image information around the feature point. As shown in fig. 1, in this step, the feature vector of each feature point in the two frames of images is calculated separately.
(3) Feature matching: the most similar feature point pairs are found by comparing the descriptors of feature points in different images. Feature matching may be performed using various distance or similarity metrics, such as the Euclidean distance, the Hamming distance, or correlation measures. A threshold is typically set to filter out the matching pairs that are satisfactory. As shown in fig. 1, in this step, image feature matching is performed based on the feature vectors of the feature points of the two frames of images.
Applications such as motion estimation, image stitching, and background segmentation can be implemented based on the identified image feature points and the matching of feature points between different images. These matching algorithms all require a large number of per-pixel computations, and the higher the resolution of the image, the larger the number of pixels that need to be computed, and therefore the greater the computing resources and computation time consumed.
Image feature point matching schemes are currently computed mainly in a multi-threaded manner on the CPU side, where the degree of parallelism is low and the computation time is relatively long. In addition, because the image data in the rendering pipeline must be copied or mapped in graphics rendering applications before the CPU can access it and perform the corresponding processing, the processing time is long and the memory power consumption is high, making it difficult for mobile applications to run at a high frame rate with limited computing capability.
As more specialized computing units such as GPUs and DSPs appear in mobile devices, it is desirable to achieve feature matching on mobile devices through such computing units efficiently and with low power consumption. However, even with the hardware acceleration of such computing units, feature matching and machine vision still face efficiency and power-consumption issues on mobile devices; in particular, when feature matching and machine vision are applied to graphics rendering fields such as VR/AR applications, how to interact efficiently with the graphics rendering pipeline constitutes a major performance bottleneck.
Based on this, the embodiments of the present application provide a GPU-based image feature matching method and an electronic device. In the process of matching image feature points generated by graphics rendering, image resources are accessed directly inside the graphics rendering pipeline, so that the operation of copying the image resources from the GPU memory space to the CPU through memory (e.g. DDR) is eliminated and the heating of mobile terminal devices caused by DDR bandwidth overhead is greatly reduced. Meanwhile, the highly concurrent multi-threaded processing capability provided by the GPU and thread atomic-operation access to the inter-core shared cache are utilized, so that the computational efficiency of feature point matching can be greatly accelerated, meeting the needs of high-frame-rate applications.
Compared with the related art, the improvement of the scheme of the present application is that GPU-specific data is used directly to perform the feature point matching operation within the rendering pipeline of the graphics rendering application, thereby greatly improving matching efficiency and reducing computing power consumption while ensuring matching accuracy.
That is, the embodiments of the present application use certain characteristic functions of the GPU and logic that handles feature point matching directly in the graphics rendering pipeline to realize the feature point matching operation on rendered images with high efficiency and low power consumption, and can realize a high-frame-rate image feature point matching function even on mobile platforms with limited computing capability.
In order to facilitate understanding of the embodiments of the present application, some terms of the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
The image coordinate system is a two-dimensional coordinate system, and as shown in fig. 2, the origin (0, 0) of coordinates of the image coordinate system is located at the center, and the pixels are used as units. Based on the image coordinate system, any one pixel on the image can be located.
World coordinate system-a three-dimensional coordinate system whose origin of coordinates may be determined as appropriate, as shown in fig. 2, the world coordinate system may represent an object in space in units of length, such as millimeters (mm).
In particular, as shown in fig. 2, in the image processing process, coordinate mapping may be performed between an image coordinate system and a world coordinate system. For example, the image coordinates of the pixel points are mapped with the object vertex coordinates, and the whole texture is attached to the surface of the three-dimensional object model in the rendering process, so that a finer texture representation effect is realized.
Homogeneous coordinates - homogeneous coordinates are a very useful basic concept in computer graphics. By adding an additional dimension w to the Cartesian coordinate system, scaling, rotation, translation and perspective projection can all be expressed as matrix operations. For a point (X, Y) in the Cartesian coordinate system, the homogeneous form is (x, y, w). A homogeneous coordinate (x, y, w) is converted back to Cartesian coordinates simply by dividing x and y by w, so the Cartesian coordinates can be expressed as X = x/w and Y = y/w.
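For instance, the conversion and the reason the extra dimension w is useful (a two-dimensional translation then becomes a single matrix product) can be written as:

```latex
(X, Y) \;\longleftrightarrow\; (x, y, w), \qquad X = x/w, \quad Y = y/w,
\qquad
\begin{pmatrix} x' \\ y' \\ w \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ w \end{pmatrix}
```

Dividing x' and y' by w recovers the translated Cartesian point (X + t_x, Y + t_y).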
The compute shader is a shader that can flexibly use the high-speed computing power of the GPU to handle non-graphics tasks, i.e. a shader program that runs on the GPU outside the ordinary rendering pipeline. It is used for massively parallel general-purpose GPU algorithms and can also improve the rendering speed of some games.
A compute shader is invoked with a global work group; the global work group comprises local work groups, and each local work group comprises execution units. Illustratively, as shown in FIG. 3, the global work group size is (4, 4, 1) and the local work group size is 4×4.
The compute pipeline is the pipeline in the GPU that controls the compute shader stage. It is a single-stage pipeline, and its built-in variables can determine the position (index) of the local work group and of the execution units.
gl_LocalInvocationID denotes the position of the execution unit within the local work group.
gl_GlobalInvocationID denotes the position of the current execution unit within the global work group, and is a three-dimensional variable.
gl_LocalInvocationIndex is a one-dimensional index obtained by flattening the three-dimensional position of the execution unit, i.e. a flattened form of gl_LocalInvocationID.
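As a small GLSL illustration of these built-in variables, using the 16×16 local work group adopted throughout this application:

```glsl
#version 310 es
layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in;

void main() {
    // Position of this execution unit within its local work group: (0..15, 0..15, 0).
    uvec3 localId = gl_LocalInvocationID;

    // Position within the global work group; also depends on gl_WorkGroupID.
    uvec3 globalId = gl_GlobalInvocationID;

    // Flattened one-dimensional form of gl_LocalInvocationID:
    // index = z*16*16 + y*16 + x, i.e. 0..255 here.
    uint flatIndex = gl_LocalInvocationIndex;
}
```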
The rendering pipeline is the process of converting a three-dimensional scene model into output in screen pixel space. The graphics rendering pipeline accepts a set of three-dimensional coordinates and converts them into a colored two-dimensional image on the screen.
It should be noted that shader code can be used to instruct the GPU how to draw. The shader code determines how a particular effect is drawn, while the rendering pipeline controls what is drawn and which shaders are used for drawing.
SSBO - a shader storage buffer object (SSBO) stores byte data and uploads it to graphics memory so that shader programs can process the data. For example, an SSBO can be used to store vertex data, element data (vertex indices), uniform block data, and the like. In actual implementation, processed data can be transferred between two compute shaders by means of an SSBO.
The present solution employs two SSBOs, referred to as SSBO1 and SSBO2. For example, in the step of calculating the random numbers, the calculated random number queue is stored in SSBO1; in the feature point sorting step, the N feature point data with the largest response values are copied to SSBO2.
Multi-thread shared cache - a shared cache allows access by two or more processes or threads. Specifically, in the feature point sorting step, the first N feature points with the largest response values are screened out of each local pixel area, the N feature point data are sorted in descending order of response value, and the sorted feature point data are stored in the shared cache.
The present application provides an image-processing acceleration scheme at the software level. By using certain characteristic functions of the GPU and logic that handles feature point matching directly in the graphics rendering pipeline, the feature point matching operation on rendered images can be realized with high efficiency and low power consumption, and a high-frame-rate image feature point matching function can be realized even on mobile platforms with limited computing capability. The scheme of the present application can effectively reduce the power consumption of mobile terminal devices in feature matching and machine-vision algorithms, is widely applicable to image rendering application scenarios and services such as VR/AR, improves image processing efficiency and computing accuracy, and reduces computing power consumption.
The GPU-based image feature matching method provided by the embodiments of the present application can be applied to electronic devices with a display function. The electronic devices may include mobile phones, smart televisions, wearable devices, tablet computers (Pad), computers with wireless transceiving functionality, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, and so forth. The embodiments of the present application do not limit the specific technology and specific device form adopted by the terminal device.
First, an exemplary electronic device 100 provided by an embodiment of the present application is described. In some embodiments, the electronic device 100 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular telephone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device; the embodiments of the present application do not particularly limit the specific type of the electronic device 100.
Fig. 4A is a schematic hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 4A, the electronic device 100 may include a processor 110 and a display screen 120.
Processor 110 may include one or more processing units; for example, processor 110 may include a central processing unit (CPU) 111 and a graphics processing unit (GPU) 112. The different processing units may be separate devices or may be integrated into one or more processors.
Referring to FIG. 4B, the CPU 111 includes a controller, one or more arithmetic logic units (ALUs), and a memory unit. The controller may be the neural hub and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals to complete the control of instruction fetching and instruction execution. The ALUs are used to perform arithmetic and logical operations. The memory unit is used to store instructions and data. In some embodiments, the memory unit may be a cache, and may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory unit. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
The GPU 112 is a microprocessor for image processing and is connected to the CPU 111 and to the display screen 120. Referring to FIG. 4B, the GPU 112 is correspondingly provided with a display memory unit. The GPU 112 may receive the image data sent by the CPU, store the image data in the display memory unit, perform graphics rendering and other processing on the image data, and then send the processed image to the display memory unit; the display screen 120 can then read the processed image from the display memory unit and display it.
The display screen 120 is used to display the image rendered by the GPU. The display 120 includes a display panel. For example, the display panel may employ an organic light-emitting diode (OLED).
The electronic device 100 realizes an image display function through the CPU 111, the GPU 112, the display screen 120, and the like.
Taking a game scene as an example, the rendering of the image in the game is completed by a CPU and a GPU, wherein the CPU is responsible for processing some global tasks, such as determining an object and light which need to be provided for the GPU in a drawing instruction (Drawcall), setting a rendering pipeline of the GPU, and then notifying the GPU to render. And the GPU is responsible for the detailed task of image rendering.
In actual implementation, the CPU initiates the rendering flow. The game application uses the CPU to decide which objects need to be rendered; after the objects to be rendered are found, the graphics APIs corresponding to the system are called one by one, each object corresponding to one Drawcall, where a Drawcall is the process by which the CPU notifies the GPU to start rendering. When the GPU receives a Drawcall, the GPU draws the specified object, and the drawn two-dimensional image is then displayed on the screen.
In general, the CPU and the GPU have different advantages in the image processing process, respectively.
On the one hand, the CPU has the advantage of computing power and parallel processing power, and can rapidly process a large amount of data and perform complex computation. The CPU may also perform operations such as matrix operations and vectorization, which are important in computer vision. In addition, the CPU may perform time-consuming tasks such as sorting and searching.
On the other hand, the GPU has an advantage in that it is capable of graphics processing, and image processing and computation can be performed quickly. GPUs can also perform parallelized computations, which are particularly useful for large-scale image processing tasks. The GPU may also perform texture merging and shading, which may increase the accuracy of image matching.
In particular, in order to solve the efficiency and power-consumption problems that mobile terminal devices face in feature matching and machine vision, the embodiments of the present application access image resources directly inside the graphics rendering pipeline during the matching of image feature points generated by graphics rendering, so as to interact with the graphics rendering pipeline more efficiently. This can reduce the power consumption of mobile terminal devices in feature matching and machine-vision algorithms; for example, when applied to graphics rendering application scenarios and services such as VR/AR, the matching efficiency can be greatly improved while the matching accuracy is ensured, and the computing power consumption can be reduced.
The above is a specific description of the embodiment of the present application using the electronic device 100 as an example. It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. The electronic device 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In addition, an operating system runs on the above components, such as the iOS operating system developed by Apple Inc., the Android open-source operating system developed by Google Inc., or the Windows operating system developed by Microsoft Corporation. Applications may be installed and run on the operating system.
The operating system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 5 is a software architecture block diagram of the electronic device 100 according to an embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer (applications), the application framework layer (application framework), the Android runtime module and system libraries, and the kernel layer (kernel).
The application layer may include a series of application packages, among other things. For example, the application layer may include applications such as video applications, gaming applications, autopilot applications, AR/VR applications, and the like, to which embodiments of the present application are not limited in any way.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions. Illustratively, the application framework layer may include a window manager, a content provider, a view system, a resource manager, etc., to which embodiments of the application are not limited in any way.
The Android runtime module is responsible for the scheduling and management of the Android system. The Android runtime module includes a virtual machine and a core library. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection. The core library contains the functions that the java language needs to call.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), two-dimensional graphics engine (e.g., SGL), etc. The surface manager is used to manage the display subsystem and provides a fusion of the two-dimensional and three-dimensional layers for the plurality of applications.
The kernel layer is a layer between hardware and software. The kernel layer at least contains display drivers and sensor drivers.
For ease of illustration, the hardware layers that interact with the software architecture described above are also embodied in FIG. 5. For example, the hardware layers may include a CPU, a GPU, and a display screen.
Although the Android system is described as an example in the embodiment of the present application, the basic principle is equally applicable to electronic devices based on the iOS or Windows and other operating systems.
The execution subject of the GPU-based image feature matching method provided by the embodiment of the present application may be the above-mentioned electronic device, or a functional module and/or functional entity in the electronic device capable of implementing the GPU-based image feature matching method. The scheme of the present application may be implemented by hardware and/or software, which may be determined according to actual use requirements and is not limited by the embodiment of the present application. The GPU-based image feature matching method provided by the embodiment of the present application is described below by taking an electronic device as an example with reference to the accompanying drawings.
Before describing the GPU-based image feature matching method provided by the embodiment of the present application, the image feature point identification process on which the matching relies is described with reference to a specific embodiment.
In order to better understand the embodiments of the present application, a brief description will be given below of a flow of image feature point identification provided in the embodiments of the present application. In the embodiment of the present application, the process of implementing image feature point identification by the electronic device through the GPU may include:
Step 101, creating an image pyramid. An image pyramid is created for the image to be processed in the rendering pipeline. Referring to fig. 6, the pyramid contains N levels of image data, and the image data of one pixel at each level of the pyramid is the average value of the image data of the four pixels at the corresponding position in the level below it. This continues until the topmost level of the pyramid has only one pixel, whose value is the average of all pixels of the whole picture.
Step 102, calculating corner points of the pyramid levels. A compute shader of the GPU binds the corresponding level of the image pyramid according to the scaling coefficient parameter. For example, the level consisting of 16×16 pixel blocks in the image pyramid is bound, and the corner points in each 16×16 pixel block are calculated using the image and depth information.
Step 103, calculating a response function value (which may be simply referred to as a response value) of each corner point. The response function value is calculated for the corner pixels. If the response function value is greater than or equal to a preset response threshold, the corner point is determined to be a feature point.
Step 104, maximum value suppression of the feature points. The shared buffer variables of the thread work group of the compute shader are used to take the maximum of the response function values of all feature points in each 16×16 pixel block area, so as to obtain a local maximum and remove locally clustered similar feature points.
Step 105, outputting the feature point information. The response function value of the local maximum and its pixel coordinates are recorded, and the response function value is written to the corresponding coordinate position in the mask image.
Thus, the identification of the image feature points is completed. According to the scheme, the characteristic point identification operation is directly executed in the rendering pipeline by utilizing the data specific to the GPU in the graphic rendering application, so that the identification efficiency is greatly improved on the premise of ensuring the identification accuracy, and the calculated power consumption is reduced.
The image feature point recognition process according to the embodiment of the present application is described in detail below with reference to fig. 7.
Step 101 creating an image pyramid
An image pyramid is a collection of images made up of multiple sub-images of different resolutions of an image. The set of images is generated from a single image by continuous downsampling, the smallest image possibly having only one pixel.
In an embodiment of the application, a color image (or referred to as a rendered image) generated by the rendering pipeline is bound as the current operation image by a texture binding instruction (glBindTexture function), and then a mipmap generation instruction (glGenerateMipmap function) is called to generate the image pyramid.
The glBindTexture function is used to bind a texture; it replaces the currently bound texture object with the texture object specified in its parameters, so that subsequent operations are applied to that texture object. For example, after the color image texture 101 is bound, image processing operations may be further performed on the color image texture 101.
In addition, the embodiment of the present application creates a depth image (also referred to as a texture image) DEPTHMIPMAP in R16FG16F format and of the same size as the color image described above, binds the depth image DEPTHMIPMAP as the operation image by the texture binding instruction (glBindTexture function), and calls the mipmap generation instruction (glGenerateMipmap function) to generate its pyramid.
It should be noted that, in the related art, the image recognition is performed by the CPU and the GPU through the color image, and the improvement point of the scheme of the present application with respect to the related art is that, on the GPU side, besides binding the color image as an operation image and generating a corresponding image pyramid, binding the depth image as an operation image and generating a corresponding image pyramid may also be performed. In this way, the corner points in the image can be determined based on the image data of the color image and the depth image, the response function values of the corner points are calculated, and the characteristic point recognition is performed according to the response function values of the corner points.
In the embodiment of the present application, layer 0 of the pyramid is the original image, and the correspondingly sampled data of the depth image is saved to the R channel of the depth image DEPTHMIPMAP. In the embodiment of the present application, the original image refers to a depth image having the same size as the color image described above.
In the embodiment of the application, starting from the first layer of the pyramid, the depth value corresponding to the color image is calculated for each layer i of the image pyramid through a for loop, so that the corresponding depth value of each layer in the image pyramid can be obtained.
It should be noted that the process of calculating the image data of a pixel in the i-th layer from the image data of four pixels in the (i-1)-th layer may include: converting the depth values of the four pixels in the (i-1)-th layer of the image pyramid into the camera coordinate system, averaging the camera space coordinates of the four pixels in the camera coordinate system to obtain the camera space coordinate of the corresponding pixel in the i-th layer of the image pyramid, and then converting that camera space coordinate to obtain the pixel depth value corresponding to the i-th layer of the image pyramid.
First, by the following equation 1, the depth value of the i-1 th layer pixel point (x, y) in the image pyramid is converted into a three-dimensional camera coordinate system.
View_mip_i-1(x, y) = (x, y, Depth_mip_i-1(x, y), 1.0) * Inverse(projectionMatrix)    (equation 1)
Where Depth_mip_i-1(x, y) represents the depth value of pixel (x, y) in the (i-1)-th layer of the image pyramid, projectionMatrix denotes the projection matrix, and Inverse() denotes the matrix inverse.
View_mip_i-1(x, y) includes View_mip_i-1(x, y).x, View_mip_i-1(x, y).y, and View_mip_i-1(x, y).z. Then, homogeneous coordinate transformation is performed through equation 2 to calculate the camera space coordinates of pixel (x, y) in the (i-1)-th layer of the image pyramid:
View_mip_i-1(x, y) = (View_mip_i-1(x, y).x / View_mip_i-1(x, y).w, View_mip_i-1(x, y).y / View_mip_i-1(x, y).w, View_mip_i-1(x, y).z / View_mip_i-1(x, y).w, 1.0)    (equation 2)
Where View_mip_i-1(x, y).w represents the variable introduced by the homogeneous coordinate transformation.
Wherein View_mip_i-1 (x, y) represents the camera spatial coordinates of the pixel point (x, y) of the i-1 th level of the image pyramid in the three-dimensional camera coordinate system.
The depth value of the pixel point (x, y) is a one-dimensional coordinate, the depth value of the pixel point (x, y) is converted into a three-dimensional camera coordinate system, and camera space coordinates of the pixel point (x, y) are obtained through homogeneous coordinate transformation, and the camera space coordinates are three-dimensional coordinates.
According to the above equations 1 and 2, the depth value of pixel (x, y) in the (i-1)-th layer of the image pyramid is converted into the three-dimensional camera coordinate system, so as to obtain the camera space coordinate View_mip_i-1(x, y) corresponding to pixel (x, y) of the (i-1)-th layer of the image pyramid.
View_mip_i-1(x, y) comprises the following three components:
(1) The X component: x1 = View_mip_i-1(x, y).x / View_mip_i-1(x, y).w;
(2) The Y component: y1 = View_mip_i-1(x, y).y / View_mip_i-1(x, y).w;
(3) The Z component: z1 = View_mip_i-1(x, y).z / View_mip_i-1(x, y).w.
Similarly, according to equations 1 and 2 above, the depth value of pixel (x+1, y) of the (i-1)-th layer of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinate View_mip_i-1(x+1, y) of pixel (x+1, y) of the (i-1)-th layer in the three-dimensional camera coordinate system. View_mip_i-1(x+1, y) comprises the following three components:
(1) The X component: x2 = View_mip_i-1(x+1, y).x / View_mip_i-1(x+1, y).w;
(2) The Y component: y2 = View_mip_i-1(x+1, y).y / View_mip_i-1(x+1, y).w;
(3) The Z component: z2 = View_mip_i-1(x+1, y).z / View_mip_i-1(x+1, y).w.
Similarly, according to equations 1 and 2 above, the depth value of pixel (x, y+1) of the (i-1)-th layer of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinate View_mip_i-1(x, y+1) of pixel (x, y+1) of the (i-1)-th layer in the three-dimensional camera coordinate system. View_mip_i-1(x, y+1) comprises the following three components:
(1) The X component: x3 = View_mip_i-1(x, y+1).x / View_mip_i-1(x, y+1).w;
(2) The Y component: y3 = View_mip_i-1(x, y+1).y / View_mip_i-1(x, y+1).w;
(3) The Z component: z3 = View_mip_i-1(x, y+1).z / View_mip_i-1(x, y+1).w.
Similarly, according to equations 1 and 2 above, the depth value of pixel (x+1, y+1) of the (i-1)-th layer of the image pyramid can be converted into the three-dimensional camera coordinate system, resulting in the camera space coordinate View_mip_i-1(x+1, y+1) of pixel (x+1, y+1) of the (i-1)-th layer in the three-dimensional camera coordinate system. View_mip_i-1(x+1, y+1) comprises the following three components:
(1) The X component: x4 = View_mip_i-1(x+1, y+1).x / View_mip_i-1(x+1, y+1).w;
(2) The Y component: y4 = View_mip_i-1(x+1, y+1).y / View_mip_i-1(x+1, y+1).w;
(3) The Z component: z4 = View_mip_i-1(x+1, y+1).z / View_mip_i-1(x+1, y+1).w.
Thus, the respective camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1) and (x+1, y+1) of the i-1 th hierarchical pyramid are calculated by the above equations 1 and 2.
The average value of the camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1) and (x+1, y+1) of the (i-1)-th level of the image pyramid is calculated by the following equation 3, so that the camera space coordinate of the pixel point (x, y) of the i-th level can be obtained.
View_mip_i(x, y) = (View_mip_i-1(x, y) + View_mip_i-1(x+1, y) + View_mip_i-1(x, y+1) + View_mip_i-1(x+1, y+1)) / 4.0    (equation 3)
View_mip_i-1(x, y), View_mip_i-1(x+1, y), View_mip_i-1(x, y+1) and View_mip_i-1(x+1, y+1) represent the camera space coordinates of the four pixel points (x, y), (x+1, y), (x, y+1) and (x+1, y+1) of the (i-1)-th level of the image pyramid in the three-dimensional camera coordinate system, respectively. It will be appreciated that the camera space coordinates of the four pixels are summed and then divided by 4 to yield the camera space coordinate of the corresponding pixel in the i-th level.
View_mip_i(x, y) represents the camera space coordinate of the i-th level pixel obtained by averaging the camera space coordinates of the four pixel points of the (i-1)-th level in the three-dimensional camera coordinate system. View_mip_i(x, y) includes three components:
(1) The X component: View_mip_i(x, y).x = (x1 + x2 + x3 + x4) / 4;
(2) The Y component: View_mip_i(x, y).y = (y1 + y2 + y3 + y4) / 4;
(3) The Z component: View_mip_i(x, y).z = (z1 + z2 + z3 + z4) / 4.
Then, the camera space coordinate View_mip_i(x, y) obtained in the three-dimensional camera coordinate system is multiplied by the projection matrix to obtain the NDC coordinate as an intermediate result, by the following equation 4:
NDC_mip_i(x, y) = View_mip_i(x, y) * projectionMatrix    (equation 4)
Where NDC_mip_i(x, y) represents the NDC coordinate of the pixel of the i-th level of the image pyramid. The NDC coordinate is a four-dimensional coordinate including x, y, z and w components, i.e. NDC_mip_i(x, y).x, NDC_mip_i(x, y).y, NDC_mip_i(x, y).z and NDC_mip_i(x, y).w, where NDC_mip_i(x, y).w represents the variable introduced by the homogeneous coordinate transformation.
Then, through the following equation 5, the ratio between NDC_mip_i(x, y).z and NDC_mip_i(x, y).w is obtained through homogeneous coordinate transformation, so that the depth value corresponding to the i-th level of the image pyramid can be obtained.
Depth_mip_i(x, y) = NDC_mip_i(x, y).z / NDC_mip_i(x, y).w    (equation 5)
Where Depth_mip_i(x, y) represents the depth value corresponding to the i-th level of the image pyramid.
In the embodiment of the application, through equations 1 to 5, the depth values of four pixels of the (i-1)-th level of the image pyramid are converted into the camera coordinate system, the camera space coordinates of the four pixels are then averaged in the camera coordinate system to obtain the camera space coordinate of the corresponding pixel of the i-th level, and that camera space coordinate is then converted back to obtain the pixel depth value corresponding to the i-th level of the image pyramid.
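For ease of understanding, the following GLSL compute shader fragment sketches how equations 1 to 5 might be applied to generate one level of the depth pyramid. It is only an illustrative sketch: the binding points, image formats, uniform names and the row-vector multiplication order are assumptions made here, not details taken from the embodiment itself.

#version 310 es
// Illustrative sketch only; bindings, formats and names are assumptions.
layout(local_size_x = 16, local_size_y = 16) in;
layout(binding = 0, rgba16f) readonly  uniform highp image2D depthMipSrc;  // level i-1
layout(binding = 1, rgba16f) writeonly uniform highp image2D depthMipDst;  // level i
uniform mat4 projectionMatrix;

// Equations 1 and 2: lift one depth sample of level i-1 into camera space.
vec3 toCameraSpace(ivec2 p) {
    float d = imageLoad(depthMipSrc, p).r;
    vec4 v = vec4(vec2(p), d, 1.0) * inverse(projectionMatrix);  // equation 1
    return v.xyz / v.w;                                          // equation 2
}

void main() {
    ivec2 dst = ivec2(gl_GlobalInvocationID.xy);
    ivec2 src = dst * 2;
    // Equation 3: average the four source pixels in camera space.
    vec3 viewAvg = (toCameraSpace(src) +
                    toCameraSpace(src + ivec2(1, 0)) +
                    toCameraSpace(src + ivec2(0, 1)) +
                    toCameraSpace(src + ivec2(1, 1))) / 4.0;
    // Equations 4 and 5: project back and keep z/w as the depth value of level i.
    vec4 ndc = vec4(viewAvg, 1.0) * projectionMatrix;                  // equation 4
    imageStore(depthMipDst, dst, vec4(ndc.z / ndc.w, 0.0, 0.0, 0.0));  // equation 5
}

The inverse of the projection matrix is computed in the shader here only for brevity; in practice it would typically be passed in as a precomputed uniform.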
In equations 1 to 5 above, i takes 1, 2, …, K in order, so that the depth value corresponding to the color image at each layer of the K-layer image pyramid can be calculated.
The process of calculating the depth value corresponding to the color image for each layer of the image pyramid is described below by way of example.
Illustratively, as shown in FIG. 8, the image pyramid includes 8 layers. Starting from the first layer pyramid, a depth value corresponding to the color image is calculated for each layer of the image pyramid by a for loop.
(1) Calculating the depth value corresponding to layer 1 of the image pyramid (i is taken to be 1)
First, a color image is converted from a two-dimensional image coordinate system to a three-dimensional camera coordinate system by the following equation:
View_mip_0(x,y)=(x,y,Depth_mip_0(x,y),1.0)*Inverse(projectionMatrix)。
View_mip_0(x, y) includes View_mip_0(x, y).x, View_mip_0(x, y).y, and View_mip_0(x, y).z.
Then, camera space coordinates of the four pixel points of the 0 th layer are calculated in the three-dimensional camera coordinate system through homogeneous coordinate transformation:
View_mip_0(x,y)=
(View_mip_0(x,y).x/View_mip_0(x,y).w,
View_mip_0(x,y).y/View_mip_0(x,y).w,
View_mip_0(x,y).z/View_mip_0(x,y).w,
1.0)
View_mip_0(x+1,y)=
(View_mip_0(x+1,y).x/View_mip_0(x+1,y).w,
View_mip_0(x+1,y).y/View_mip_0(x+1,y).w,
View_mip_0(x+1,y).z/View_mip_0(x+1,y).w,
1.0)
View_mip_0(x,y+1)=
(View_mip_0(x,y+1).x/View_mip_0(x,y+1).w,
View_mip_0(x,y+1).y/View_mip_0(x,y+1).w,
View_mip_0(x,y+1).z/View_mip_0(x,y+1).w,
1.0)
View_mip_0(x+1,y+1)=
(View_mip_0(x+1,y+1).x/View_mip_0(x+1,y+1).w,
View_mip_0(x+1,y+1).y/View_mip_0(x+1,y+1).w,
View_mip_0(x+1,y+1).z/View_mip_0(x+1,y+1).w,
1.0)
View_mip_0(x, y).w, View_mip_0(x+1, y).w, View_mip_0(x, y+1).w and View_mip_0(x+1, y+1).w represent the variables introduced by the homogeneous coordinate transformation.
Then, the camera space coordinates of the four pixel points of the 0 th layer are averaged in a three-dimensional camera coordinate system to obtain the camera space coordinates of the pixel points (x, y) of the 1 st layer:
View_mip_1(x,y)=(View_mip_0(x,y)+View_mip_0(x+1,y)+View_mip_0(x,y+1)+View_mip_0(x+1,y+1))/4.0。
Wherein View_mip_1(x, y) contains three components: View_mip_1(x, y).x, View_mip_1(x, y).y and View_mip_1(x, y).z.
Then, View_mip_1(x, y) is converted from the three-dimensional camera coordinate system to the NDC coordinate system by the following equation:
NDC_mip_1(x,y)=View_mip_1(x,y)*projectionMatrix.
Wherein NDC_mip_1(x, y) comprises four components: NDC_mip_1(x, y).x, NDC_mip_1(x, y).y, NDC_mip_1(x, y).z and NDC_mip_1(x, y).w.
Homogeneous coordinate transformation is then carried out to calculate the depth value corresponding to layer 1 of the image pyramid:
Depth_mip_1(x,y)=NDC_mip_1(x,y).z/NDC_mip_1(x,y).w
From this, the depth value Depth_mip_1(x, y) corresponding to layer 1 of the image pyramid is calculated.
(2) Calculating depth value (i is 2) corresponding to layer 2 of image pyramid
First, the depth value Depth_mip_1(x, y) corresponding to layer 1 of the image pyramid is converted into the three-dimensional camera coordinate system:
View_mip_1(x,y)=(x,y,Depth_mip_1(x,y),1.0)*Inverse(projectionMatrix)。
View_mip_1(x, y) includes View_mip_1(x, y).x, View_mip_1(x, y).y, and View_mip_1(x, y).z.
Then, camera space coordinates of the four pixel points of the 1 st layer are calculated in the three-dimensional camera coordinate system through homogeneous coordinate transformation:
View_mip_1(x,y)=
(View_mip_1(x,y).x/View_mip_1(x,y).w,
View_mip_1(x,y).y/View_mip_1(x,y).w,
View_mip_1(x,y).z/View_mip_1(x,y).w,
1.0)
View_mip_1(x+1,y)=
(View_mip_1(x+1,y).x/View_mip_1(x+1,y).w,
View_mip_1(x+1,y).y/View_mip_1(x+1,y).w,
View_mip_1(x+1,y).z/View_mip_1(x+1,y).w,
1.0)
View_mip_1(x,y+1)=
(View_mip_1(x,y+1).x/View_mip_1(x,y+1).w,
View_mip_1(x,y+1).y/View_mip_1(x,y+1).w,
View_mip_1(x,y+1).z/View_mip_1(x,y+1).w,
1.0)
View_mip_1(x+1,y+1)=
(View_mip_1(x+1,y+1).x/View_mip_1(x+1,y+1).w,
View_mip_1(x+1,y+1).y/View_mip_1(x+1,y+1).w,
View_mip_1(x+1,y+1).z/View_mip_1(x+1,y+1).w,
1.0)
View_mip_1(x, y).w, View_mip_1(x+1, y).w, View_mip_1(x, y+1).w and View_mip_1(x+1, y+1).w represent the variables introduced by the homogeneous coordinate transformation.
Then, the camera space coordinates of the four pixels of the 1 st layer are averaged in a three-dimensional camera coordinate system to obtain the camera space coordinates of one pixel in the 2 nd layer image:
View_mip_2(x,y)=(View_mip_1(x,y)+View_mip_1(x+1,y)+View_mip_1(x,y+1)+View_mip_1(x+1,y+1))/4.0。
Wherein View_mip_2(x, y) contains three components: View_mip_2(x, y).x, View_mip_2(x, y).y and View_mip_2(x, y).z.
Then, View_mip_2(x, y) is converted from the three-dimensional camera coordinate system to NDC coordinates by the following equation:
NDC_mip_2(x,y)=View_mip_2(x,y)*projectionMatrix.
Wherein NDC_mip_2(x, y) comprises four components: NDC_mip_2(x, y).x, NDC_mip_2(x, y).y, NDC_mip_2(x, y).z and NDC_mip_2(x, y).w.
Homogeneous coordinate transformation is then carried out to calculate the depth value corresponding to layer 2 of the image pyramid:
Depth_mip_2(x,y)=NDC_mip_2(x,y).z/NDC_mip_2(x,y).w
From this, the depth value Depth_mip_2(x, y) corresponding to layer 2 of the image pyramid is calculated.
(3) i sequentially takes 3, 4, …, K, and the depth values corresponding to layer 3 through layer K of the image pyramid are calculated
In the above manner, the depth value Depth_mip_3(x, y) corresponding to layer 3 of the image pyramid, and so on up to the depth value Depth_mip_K(x, y) corresponding to layer K, may be calculated in sequence.
By the method provided by the embodiment of the application, the depth value corresponding to each layer of the image pyramid can be calculated. It should be noted that the depth values (or referred to as depth information) of different levels calculated herein may be applied to the following steps of calculating corner points, which will be described in detail below.
Step 102, calculating corner points of pyramid layers
Here, a corner point may be understood as a pixel around which moving a window in any direction produces a large change in the response value, where the window is a small block of pixels.
In the embodiment of the application, the point with the brightness difference larger than the threshold value can be used as the corner point by calculating the brightness difference of each pixel of the color image in the surrounding area. It should be noted that, unlike the related art, the scheme of the application not only performs brightness difference judgment based on the color image, but also performs brightness difference judgment based on the depth image, so that the accuracy of identifying the corner points can be improved.
Since the corner points are points that are significantly different from surrounding pixels, the corner points can be used as feature points in the image. By finding the corner points in the image, the characteristic points in the image can be identified according to the corner points.
In an embodiment of the present application, a Color image (Color) and a depth image (DEPTHMIPMAP) corresponding to a pyramid level may be bound in a computation shader according to specified parameters.
Wherein the specified parameters may include a scaling factor and an ID of the image.
For example, the specified parameter glBindImage (id=101, α=4) indicates that the identification of the image is 101, and the scaling coefficient α takes 4. Layer 4 in the image pyramid, the color image identified as 101, and the corresponding depth image may be bound in the computational shader according to the specified parameters.
Alternatively, different scaling factors may be preset for different types or scenes of games. For example, for a type 1 game, the corresponding scaling factor parameter is 3, indicating that the third layer in the pyramid is taken, and for a type 2 game, the corresponding scaling factor parameter is 4, indicating that the fourth layer in the pyramid is taken. The method can be specifically set according to actual use requirements, and the embodiment of the application is not limited. For convenience of explanation, in the embodiment of the present application, scaling factor of 4 is taken as an example for illustration.
For example, referring again to fig. 8, assuming that the original image is 256×256 pixels in size, an 8-layer pyramid is generated, and assuming that the scaling factor α=4, the fourth-layer image of the pyramid may be bound in the compute shader. The fourth-layer image of the pyramid is 16×16 pixels in size. The present application can determine all corner points in the 16×16 pixel block area by using the image brightness information of the 16×16 pixel block and the depth information thereof (i.e., the depth value corresponding to the fourth level).
The process of determining corner points according to the image brightness information and the depth information thereof according to the embodiment of the present application is described in detail below.
(1) Process for determining corner points from image brightness information
In the computational shader of the GPU, the image brightness information may include brightness values for individual pixels in a 16 x 16 block of pixels.
The luminance value of each pixel is calculated in the calculation shader of the GPU by the following equation 6:
Intensity = Color.r * 0.299 + Color.g * 0.587 + Color.b * 0.114    (equation 6)
Wherein Color.r, Color.g, and Color.b represent the values of a certain pixel of the color image (Color) on the R, G, and B channels, respectively, and Intensity represents the luminance value of that pixel. For convenience of description, the luminance value corresponding to the pixel point (x, y) is denoted as Intensity(x, y).
In the embodiment of the application, the brightness difference between the pixel points is calculated in the calculation shader of the GPU based on the pixel points (x, y) and a plurality of pixel points which are at a preset distance from the pixel points (x, y).
The preset distance is an empirical value of multiple tests, and can be determined according to the application scene, perspective, aperture and other factors. The preset distance may be 2 pixels, for example.
Illustratively, FIG. 9A shows the positional relationship of pixel (x, y) with pixel (x-2, y), (x+2, y), (x, y-2), and (x, y+2).
As shown in Table 1 below, the luminance values corresponding to each of the pixel points (x-2, y), (x+2, y), (x, y-2), and (x, y+2) are respectively denoted as the luminance (x-2, y), the luminance (x+2, y), the luminance (x, y-2), and the luminance (x, y+2).
Table 1
Pixel point    Luminance value
(x, y)         Intensity(x, y)
(x-2, y)       Intensity(x-2, y)
(x+2, y)       Intensity(x+2, y)
(x, y-2)       Intensity(x, y-2)
(x, y+2)       Intensity(x, y+2)
As shown in table 2 below, the absolute value of the difference between Intensity(x, y) and Intensity(x-2, y), between Intensity(x, y) and Intensity(x+2, y), between Intensity(x, y) and Intensity(x, y-2), and between Intensity(x, y) and Intensity(x, y+2) is calculated.
Table 2
Two pixel points       Absolute value of luminance difference
(x, y) and (x-2, y)    ∣Intensity(x,y)-Intensity(x-2,y)∣
(x, y) and (x+2, y)    ∣Intensity(x,y)-Intensity(x+2,y)∣
(x, y) and (x, y-2)    ∣Intensity(x,y)-Intensity(x,y-2)∣
(x, y) and (x, y+2)    ∣Intensity(x,y)-Intensity(x,y+2)∣
Then, in the compute shader of the GPU, it is judged whether each absolute luminance difference is greater than or equal to a preset brightness threshold, and the number of absolute differences that are greater than or equal to the preset brightness threshold is counted.
If the counted number is greater than or equal to the preset number (for example, 3), the initial selection condition is met and the current pixel point is judged to be a possible corner point; the next step is then executed to further determine whether the current pixel point is a corner point.
If the counted number is smaller than the preset number, judging that the current pixel point is not likely to be the corner point, and stopping processing the current pixel point.
It should be noted that, through the step (1), it is determined that a part of the pixels in the image are unlikely to be corner points in the computing shader of the GPU, so that the part of the pixels can be removed, and then the rest of the pixels which are likely to be corner points can be further determined according to the depth information in the computing shader of the GPU, so as to reduce the calculation amount and improve the recognition accuracy.
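As an aid to understanding step (1), the following GLSL fragment sketches the brightness-based initial selection. Only the 2-pixel offsets, equation 6 and the counting rule are taken from the text above; the sampler name, the concrete threshold value and the pyramid level used for sampling are assumptions made for illustration.

// Illustrative sketch of the brightness pre-selection; threshold values are assumed.
uniform highp sampler2D colorImage;           // bound color image pyramid
const float LUMA_THRESHOLD = 0.15;            // preset brightness threshold (assumed value)
const int   MIN_COUNT      = 3;               // preset number

float intensity(ivec2 p) {
    vec3 c = texelFetch(colorImage, p, 4).rgb;        // pyramid level alpha = 4 (assumed)
    return c.r * 0.299 + c.g * 0.587 + c.b * 0.114;   // equation 6
}

bool mayBeCorner(ivec2 p) {
    float center = intensity(p);
    ivec2 offsets[4] = ivec2[4](ivec2(-2, 0), ivec2(2, 0), ivec2(0, -2), ivec2(0, 2));
    int count = 0;
    for (int k = 0; k < 4; ++k) {
        // count the neighbours whose luminance differs strongly from the centre pixel
        if (abs(center - intensity(p + offsets[k])) >= LUMA_THRESHOLD) {
            count++;
        }
    }
    return count >= MIN_COUNT;   // initial selection condition of step (1)
}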
(2) Process for determining corner points from depth information
The depth information is the depth value corresponding to the level selected from the image pyramid, calculated in step 101.
The depth information corresponding to the depth image DEPTHMIPMAP may be Depth_mip_i(x, y). Illustratively, taking α as 4, the present application uses a 16×16 pixel block of the 4th layer of the image pyramid, and the depth information corresponding to the depth image DEPTHMIPMAP may be Depth_mip_4(x, y).
For the pixel points meeting the initial selection condition determined in the previous step, the depth information Depth_mip_4(x, y) corresponding to the depth image DEPTHMIPMAP is sampled according to the image coordinates (x, y) of the pixel point, and the sampling result is multiplied by the inverse of the projection matrix, thereby calculating the spatial coordinates of the pixel in the camera coordinate system:
Pos(x, y, z) = [(x, y) * 2.0 - 1.0, Depth_mip_4(x, y), 1.0] * Inverse(projectionMatrix)    (equation 7)
Where projectionMatrix denotes a projection matrix and Inverse () denotes an inversion matrix.
Pos(x, y, z) includes three components: Pos(x, y, z).x, Pos(x, y, z).y, and Pos(x, y, z).z.
Then, by homogeneous coordinate transformation, the spatial coordinates of Pos(x, y, z) can be expressed as:
Pos(x,y,z) = (Pos(x,y,z).x/Pos(x,y,z).w, Pos(x,y,z).y/Pos(x,y,z).w, Pos(x,y,z).z/Pos(x,y,z).w)    (equation 8)
Where Pos(x, y, z).w represents the variable introduced by the homogeneous coordinate transformation.
Then, the spatial coordinates Pos (x, y, z) of the pixel point are converted into two-dimensional image coordinates Pos (x, y), so that the two-dimensional image coordinates Pos (x, y) of the pixel point are calculated according to the depth information and the coordinate conversion of the pixel point.
And according to the depth information and coordinate conversion of the pixel point, calculating two-dimensional image coordinates Pos (x-2, y) of the pixel point (x-2, y).
And according to the depth information and the coordinate conversion of the pixel point, calculating two-dimensional image coordinates Pos (x+2, y) of the pixel point (x+2, y).
And according to the depth information and the coordinate conversion of the pixel point, calculating the two-dimensional image coordinate Pos (x, y-2) of the pixel point (x, y-2).
And according to the depth information and the coordinate conversion of the pixel point, calculating a two-dimensional image coordinate Pos (x, y+2) of the pixel point (x, y+2).
Then, the absolute values of the differences between the pixel brightness at Pos(x, y) and Pos(x-2, y), Pos(x, y) and Pos(x+2, y), Pos(x, y) and Pos(x, y-2), and Pos(x, y) and Pos(x, y+2) are calculated.
Next, according to an algorithm similar to step (1) above, it is determined in the compute shader of the GPU whether the absolute value of each difference is greater than or equal to a preset threshold, and the number of differences whose absolute value is greater than or equal to the preset threshold is counted. If the counted number is greater than or equal to the preset number (e.g. 3), the condition is satisfied and the pixel point may be determined to be a corner point. If the counted number is smaller than the preset number (e.g. 3), the current pixel cannot be a corner point, and the processing of the current pixel is stopped.
In the embodiment of the present application, for each pixel point in a 16×16 pixel block area in the computing shader of the GPU, according to the step (1), it is determined that a part of the pixels in the image are unlikely to be corner points according to the image brightness information in the computing shader of the GPU, the part of the pixels are removed, and then according to the step (2), further determination is made on the pixels that are likely to be the corner points remaining according to the image depth information, thereby determining all the corner point pixels in the 16×16 pixel block area.
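The depth-related part of step (2), i.e. equations 7 and 8, might look as follows in the compute shader. This is a sketch under the assumption that the input coordinate has already been normalised to [0, 1]; the sampler and uniform names are likewise assumptions.

// Illustrative sketch of equations 7 and 8 only.
uniform highp sampler2D depthMipmap;   // depth image DEPTHMIPMAP
uniform mat4 projectionMatrix;

// Normalised image coordinate plus sampled depth -> camera space position.
vec3 cameraPos(vec2 uv) {
    float d = textureLod(depthMipmap, uv, 4.0).r;                         // level alpha = 4
    vec4 pos = vec4(uv * 2.0 - 1.0, d, 1.0) * inverse(projectionMatrix);  // equation 7
    return pos.xyz / pos.w;                                               // equation 8
}

The positions obtained in this way for the centre pixel and its four offset neighbours are then compared and counted against a preset threshold in the same way as in step (1).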
Step 103, calculating the response function value of the corner pixel
The process of calculating the response function value (otherwise referred to as the response value) for the corner pixels identified in step 102 in the computation shader of the GPU is described in detail below.
In the embodiment of the application, the response function value of each corner pixel is calculated according to the brightness difference and the depth difference of each corner pixel (x, y) and four adjacent pixels (x-1, y), (x+1, y), (x, y-1) and (x, y+1). Fig. 9B shows the positional relationship of the corner pixel (x, y) with four neighboring pixels (x-1, y), (x+1, y), (x, y-1), and (x, y+1).
(1) Luminance values of four neighboring pixels (x-1, y), (x+1, y), (x, y-1), and (x, y+1) around the corner pixel (x, y) are calculated, and depth values of four neighboring pixels around the corner pixel are calculated.
The luminance values of the two adjacent pixels of the corner pixel (x, y) in the x direction are Intensity(x-1, y) and Intensity(x+1, y).
The luminance values of the two adjacent pixels of the corner pixel (x, y) in the y direction are Intensity(x, y-1) and Intensity(x, y+1).
The depth values of the two adjacent pixels of the corner pixel (x, y) in the x direction are Depth_mip(x-1, y) and Depth_mip(x+1, y).
The depth values of the two adjacent pixels of the corner pixel (x, y) in the y direction are Depth_mip(x, y-1) and Depth_mip(x, y+1).
(2) Calculating the brightness differences of the corner pixel (x, y) in the x direction and the y direction according to the luminance values obtained in (1).
Luminance difference of the corner pixel (x, y) in the x direction: Ix = Intensity(x+1, y) - Intensity(x-1, y);
Luminance difference of the corner pixel (x, y) in the y direction: Iy = Intensity(x, y+1) - Intensity(x, y-1).
(3) Calculating the depth differences of the corner pixel (x, y) in the x direction and the y direction according to the depth values obtained in (1).
Depth difference of the corner pixel (x, y) in the x direction: dx = Depth_mip(x+1, y) - Depth_mip(x-1, y);
Depth difference of the corner pixel (x, y) in the y direction: dy = Depth_mip(x, y+1) - Depth_mip(x, y-1).
(4) Using Ix, Iy, dx and dy calculated above, the response function value (denoted as response) of the corner pixel (x, y) is calculated using an exponential function.
Response function value of the corner pixel (x, y): response = exp(Ix, Iy, dx, dy) = e^(Ix, Iy, dx, dy)    (equation 9)
In the embodiment of the application, the response function value is calculated in this way for each corner point in the 16×16 pixel block area, so that the response function values of all corner pixels in the 16×16 pixel block area are obtained.
It should be noted that, in the related art, the response function value of a pixel is calculated only from the brightness differences between adjacent pixels. Unlike the related art, in the embodiment of the present application, both the brightness differences and the depth differences between adjacent pixels are calculated on the GPU side, and the response function value of each pixel is calculated on the GPU side from both, thereby improving the accuracy of the calculated response function value.
In the embodiment of the application, whether each corner pixel is a feature point can be judged by comparing the response function value response of the corner pixel with a preset response threshold. If the response function value response is greater than or equal to the preset response threshold, the current pixel point is a feature point, and step 104 is then executed. If the response function value response is smaller than the preset response threshold, the current pixel is not a feature point, and the processing of the current pixel is stopped.
Therefore, some pixel points are screened out from the corner pixels according to the response function values of the corner pixels to serve as characteristic points, the calculated amount can be reduced, and the calculation accuracy can be improved.
It should be noted that, after feature point screening is performed for each 16×16 pixel block area, some 16×16 pixel block areas contain one or more feature points, while others contain none.
Thus, all feature points and their response function values within each 16×16 pixel block area can be determined by step 103.
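As a sketch of step 103, the response value of one corner pixel might be computed as follows, reusing the intensity() helper from the sketch above. The threshold value is an assumed example, and since the text does not spell out exactly how the four differences are combined inside the exponential of equation 9, a simple sum of absolute values is assumed here.

// Illustrative sketch of equation 9; the combination of Ix, Iy, dx, dy is an assumption.
uniform highp sampler2D depthMipmap;    // depth image DEPTHMIPMAP
const float RESPONSE_THRESHOLD = 8.0;   // preset response threshold (assumed value)

float cornerResponse(ivec2 p, int lod) {
    float Ix = intensity(p + ivec2(1, 0)) - intensity(p - ivec2(1, 0));   // brightness difference in x
    float Iy = intensity(p + ivec2(0, 1)) - intensity(p - ivec2(0, 1));   // brightness difference in y
    float dx = texelFetch(depthMipmap, p + ivec2(1, 0), lod).r -
               texelFetch(depthMipmap, p - ivec2(1, 0), lod).r;           // depth difference in x
    float dy = texelFetch(depthMipmap, p + ivec2(0, 1), lod).r -
               texelFetch(depthMipmap, p - ivec2(0, 1), lod).r;           // depth difference in y
    return exp(abs(Ix) + abs(Iy) + abs(dx) + abs(dy));                    // equation 9 (assumed combination)
}

// A corner pixel is kept as a feature point when cornerResponse(p, 4) >= RESPONSE_THRESHOLD.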
Step 104, maximum value suppression of feature points
Maximum value suppression means searching local maximum values and removing similar feature points gathered locally.
Since the 16×16 pixel block areas determined by step 103 may contain locally aggregated similar feature points, and these similar feature points are redundant, it is necessary to reject them.
It should be noted that, in the related art, maximum value suppression is implemented by executing program code on the CPU through OpenCV. Unlike the related art, the present application uses shared cache variables on the GPU to implement maximum value suppression.
In the embodiment of the application, the shared buffer variable of the thread working group of the computation shader can be utilized to take the maximum value of the response function values of all the characteristic points in each 16×16 pixel block area so as to eliminate the similar characteristic points which are locally gathered.
Referring to FIG. 10, in an embodiment of the present application, assuming the original image size is W×H, then W/16 × H/16 thread work groups (alternatively referred to as local work groups) may be employed in the compute shader. Each thread work group comprises 16×16 threads, and each thread corresponds to one feature point pixel.
In the embodiment of the present application, each thread work group defines two shared (shared) variables of the shared cache, which are respectively referred to as a shared variable 1 and a shared variable 2. The shared variable 1 is used for storing the maximum response value max_response in the current working group, and the shared variable 2 is used for storing the pixel coordinate max_response_index corresponding to the maximum response value.
Referring to table 3 below, assuming w×h is 256×256, W/16×h/16=256 thread work groups are employed in the compute shader, each thread work group including 16×16 threads, each thread work group defining two shared variables of the shared cache. For each thread work group, determining the maximum response value of 16×16 threads in the thread work group, storing the maximum response value in the shared variable 1, and storing the pixel coordinate corresponding to the maximum response value in the shared variable 2.
Table 3
In the embodiment of the present application, for each thread work group, each feature point pixel invokes the atomic operation atomicMax of the compute shader, computes the maximum between its own response value response and max_response, and assigns the computed maximum to max_response. If the maximum response value max_response is equal to its own response value response, the thread assigns its own image coordinates to max_response_index.
For example, consider a thread work group 1 having 16×16 threads, one for each feature point pixel, three of which are illustrated as examples. For example, the response values of the three feature points are 1,2,1.5, respectively. The response value response of each feature point is compared with max_response, and the maximum value is assigned to max_response. The maximum response value max_response stored in shared variable 1 is initialized to 0.
First, an atomic operation atomicMax (1, 0) =1 of the compute shader is invoked, and the maximum value of the response value 1 and the initialization value 0 can be determined to be 1, and the maximum value 1 is assigned to max_response, that is, the maximum response value max_response=1.
Then, the atomic operation atomicMax (2, 1) =2 of the computation shader is called again, the maximum value can be determined to be 2, and the maximum value 2 is assigned to max_response, that is, the maximum response value max_response=2.
Then, the atomic operation atomicMax (1.5, 2) =2 of the computation shader is called again, the maximum value can be determined to be 2, and the maximum value 2 is assigned to max_response, that is, the maximum response value max_response=2.
Thus, a local maximum response value can be determined. For thread work group 1, shared variable 1 holds the maximum response value in thread work group 1, and shared variable 2 holds the pixel coordinates corresponding to the maximum response value.
Through step 104, the maximum of the response function values of all feature points in each 16×16 pixel block area is taken using the shared buffer variables of the thread work group of the compute shader, so as to eliminate the locally aggregated similar feature points.
For example, referring again to FIG. 10, for a W/16 XH/16 thread work group (also referred to as a local work group), a local maximum response value for 16X 16 threads in each thread work group may be determined according to step 104, such that W/16 XH/16 local maximum response values may be determined. And respectively recording W/16 XH/16 local maximum response values and pixel coordinates corresponding to the local maximum response values through the shared buffer variables of the thread work groups.
Step 105, outputting the characteristic point information
And writing the local maximum response values to pixel coordinates in a Mask image according to the local maximum response values and the pixel coordinates corresponding to the local maximum response values respectively recorded by the shared buffer variables of the thread work group.
In an embodiment of the present application, the G channel for each pixel in the depth image DEPTHMIPMAP is initialized to 0.0. The G channel of the pixel may be used to hold the maximum response value max_response of the thread work group. It should be noted that, the G channel is taken as an example for illustration, and in actual implementation, an R channel or a B channel may also be used, and in particular, may be set according to a use requirement, which is not limited by the embodiment of the present application.
For each feature point, the index gl_LocalInvocationIndex of the feature point pixel in its thread work group is looked up. If the index of feature point pixel 1 in thread work group 1 is gl_LocalInvocationIndex = 0, the maximum response value max_response of thread work group 1 is written to the G channel at the max_response_index position of the depth image DEPTHMIPMAP, thereby generating the image feature point information. Thus, the identification of the image feature points is completed. Here, the position index of the thread is exemplified as 0; in actual implementation, the position index of the thread may be set to other values.
The embodiment of the application takes the depth image written with the maximum response value as a mask image.
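A sketch of how steps 104 and 105 could be written with shared variables and atomicMax is given below. Because atomicMax in a compute shader operates on integer types, the non-negative response value is encoded here as its bit pattern, which preserves ordering for non-negative floats; this encoding, the binding point, and the helpers isFeaturePoint() and cornerResponse() (standing for the checks of steps 102 and 103 in the sketches above) are assumptions of the sketch rather than details of the embodiment.

#version 310 es
layout(local_size_x = 16, local_size_y = 16) in;
// DEPTHMIPMAP reused as the mask image; only the G channel is written in this sketch,
// so preserving the depth stored in the R channel is omitted for brevity.
layout(binding = 1, rgba16f) writeonly uniform highp image2D maskImage;

shared uint  max_response_bits;   // shared variable 1: maximum response, as ordered bits
shared ivec2 max_response_index;  // shared variable 2: pixel coordinate of the maximum

void main() {
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    // isFeaturePoint()/cornerResponse() stand for the checks of steps 102 and 103.
    float response = isFeaturePoint(p) ? cornerResponse(p, 4) : 0.0;

    if (gl_LocalInvocationIndex == 0u) {
        max_response_bits = 0u;
    }
    memoryBarrierShared();
    barrier();

    // Step 104: every thread races with atomicMax; the largest response wins.
    atomicMax(max_response_bits, floatBitsToUint(response));
    memoryBarrierShared();
    barrier();

    if (response > 0.0 && floatBitsToUint(response) == max_response_bits) {
        max_response_index = p;   // the winning thread records its own coordinate
    }
    memoryBarrierShared();
    barrier();

    // Step 105: one thread writes the local maximum into the G channel of the mask image.
    if (gl_LocalInvocationIndex == 0u && max_response_bits != 0u) {
        float maxResponse = uintBitsToFloat(max_response_bits);
        imageStore(maskImage, max_response_index, vec4(0.0, maxResponse, 0.0, 0.0));
    }
}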
Illustratively, as shown in fig. 11, by performing feature point recognition on a color image, 5 local maximum response values, namely, feature point 1 to feature point 5, are determined for the color image, the response values and pixel coordinates of the five feature points are stored in a shared buffer, and then the response values of the five feature points are written into corresponding pixel coordinates of a mask image, so as to obtain the mask image.
The mask image is a result of performing feature point recognition on the color image. All the identified feature points are marked in the mask image, and the response value of each feature point is marked at the pixel coordinates of the feature point.
It should be noted that, the results of feature point recognition are simply illustrated herein, and it is to be understood that, in actual implementation, more or fewer feature points may be recognized for a frame of image, and may be specifically determined according to actual situations, which is not limited by the embodiment of the present application.
According to the scheme, the characteristic point identification operation is directly executed in the rendering pipeline by utilizing the data specific to the GPU in the graphic rendering application, so that the identification efficiency is greatly improved on the premise of ensuring the identification accuracy, and the calculated power consumption is reduced.
The process of recognizing the image feature points by the GPU provided by the embodiment of the application is described above with reference to specific embodiments, and the result of feature point recognition is used for feature point matching. The image feature matching method based on the GPU provided by the embodiment of the application is described below with reference to specific embodiments.
For a better understanding of the embodiments of the present application, feature point recognition and matching will be described in association with the accompanying drawings.
After feature point recognition is carried out on each color image, a recognition result corresponding to each color image, namely a mask image, is obtained. A mask image is generated for each color image, and the feature points and their response values are marked in the mask image. Feature point matching between different color images is then performed using the corresponding mask images. For convenience of explanation, feature point recognition and matching of a first color image and a second color image are taken as an example.
For example, referring to fig. 12A and 12B, feature point recognition is performed on the first color image and the second color image, respectively, to obtain a first mask image and a second mask image, and then feature points marked in the two mask images are compared to determine matched feature point pairs in the two mask images.
Specifically, response values of the feature points are marked in the first mask image, all the feature points in the first mask image are ordered according to the magnitude of the response values, and the largest N feature points are selected as a first feature point sequence. Then, a feature vector is calculated for each feature point in the first sequence of feature points.
And similarly, marking response values of the feature points in the second mask image, sorting all the feature points in the second mask image according to the response values, and selecting the largest N feature points as a second feature point sequence. A feature vector is calculated for each feature point in the second sequence of feature points.
Then, the distance between the feature vector of each feature point in the first feature point sequence and the feature vectors of all points in the second feature point sequence is calculated; the two feature points with the smallest distance are the best matching points. A group of mutually corresponding feature point pairs [(x0, y0), (x1, y1)] is thus obtained, where (x0, y0) is the image coordinate of the feature point in the first color image and (x1, y1) is the image coordinate of the feature point in the second color image matched with (x0, y0).
As can be seen from fig. 12A and fig. 12B, in the embodiment of the present application, feature point recognition is performed on two frames of images, respective feature points of the two frames of images are recognized, and then feature point matching is performed on the respective feature points of the two frames of images, so as to determine matched feature point pairs in the two frames of images.
The following briefly describes a flow of image feature point matching provided by an embodiment of the present application. As shown in fig. 13, in an embodiment of the present application, the process of implementing image feature point matching by the electronic device through the GPU may include the following steps 201 to 205:
step 201, calculating a random vector by the GPU.
And loading the first mask image corresponding to the first color image and the second mask image corresponding to the second color image into a computing shader of the GPU. And calculating random numbers on the GPU by using the image coordinates of the pixels to generate 256 4-dimensional integer random constants.
The mask image is an image obtained by recognizing feature points of a color image (which may also be referred to as a real frame image), and features points and response values of the features points are marked in the mask image. Illustratively, referring back to fig. 11, a black region in the mask image is a non-feature point, a white region in the mask image is a feature point, and each feature point marks a response value of the feature point.
And 202, sequencing the feature points through the GPU.
Wherein the first color image and the second color image are loaded into the compute shader of the GPU. The feature points are sorted on the GPU by response value according to the feature points and response values marked in the mask images, and the N feature points with the largest response values are selected.
And 203, calculating the feature vector of each feature point through the GPU.
Wherein the global work group size of the compute shader of the GPU is set to 16×16 (i.e., 256). The feature vectors of the selected N feature points are calculated by utilizing the shared cache and atomic operations among the threads of the compute shader's work groups, in combination with the 256 random constants (ivec4) generated in step 201.
And 204, calculating the feature distance between the feature vectors in the two frames of images by using the GPU.
And calculating feature distances between every two N feature vectors of the two frames of images, and counting the minimum feature distance.
And 205, performing feature matching according to the feature distance through the GPU.
If the minimum feature distance is smaller than the preset distance threshold, the feature points to which the two feature vectors of the minimum feature distance belong are successfully matched, and the two feature points successfully matched form a feature point pair. If the minimum feature distance is greater than or equal to the preset distance threshold, then the match fails. Thereby completing the matching of the image feature points.
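To illustrate steps 204 and 205, the following sketch computes the distance between two feature vectors and applies the threshold test. The storage layout of the feature vectors (64 vec4 entries, i.e. 256 components, per feature point), the binding points and the use of a sum of absolute component differences as the feature distance are assumptions made only for this sketch; the actual construction of the feature vectors is described later.

// Illustrative sketch only; descriptor layout and distance metric are assumptions.
layout(std430, binding = 3) readonly buffer DescriptorsA { vec4 descA[]; };  // N descriptors of image 1
layout(std430, binding = 4) readonly buffer DescriptorsB { vec4 descB[]; };  // N descriptors of image 2

const uint  DESC_VEC4S     = 64u;     // 64 * 4 = 256 components per feature vector (assumed)
const float DIST_THRESHOLD = 32.0;    // preset distance threshold (assumed value)

// Step 204: feature distance between descriptor a of image 1 and descriptor b of image 2.
float featureDistance(uint a, uint b) {
    float d = 0.0;
    for (uint k = 0u; k < DESC_VEC4S; ++k) {
        vec4 diff = descA[a * DESC_VEC4S + k] - descB[b * DESC_VEC4S + k];
        d += dot(abs(diff), vec4(1.0));   // sum of absolute component differences
    }
    return d;
}

// Step 205: for a feature point a, the best candidate b minimises featureDistance(a, b);
// the pair is accepted only when that minimum distance is smaller than DIST_THRESHOLD.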
The application provides a technique for matching image features on a mobile terminal device in real time and with low power consumption through the GPU. Feature points of different images in the graphics rendering pipeline are matched by means of functions specific to the GPU, such as the graphics rendering unit of the GPU, the graphics rendering pipeline, the compute shader of the GPU, the inter-core shared cache, and the atomic operations of the thread work group. This greatly improves computing efficiency, reduces the time cost of pixel-by-pixel computation, and at the same time reduces the cost of data copying and context switching between feature matching and the graphics rendering pipeline.
In the embodiment of the application, in the process of matching the image feature points generated by graphics rendering, the image resources are accessed directly in the graphics rendering pipeline and the memory copy between the GPU and the CPU is eliminated, which greatly reduces the heating of the mobile terminal device caused by memory bandwidth overhead. At the same time, the highly concurrent multithreading capability provided by the GPU and thread atomic operations on the inter-core shared cache are utilized, so that the computing efficiency of feature point matching can be greatly accelerated and the goal of high-frame-rate applications can be achieved.
The flow of image feature point matching provided by the embodiment of the application is briefly described above, and a specific implementation manner of the image feature matching method based on the GPU provided by the embodiment of the application is described in detail below with reference to the accompanying drawings.
Step 201, calculating a random vector
In an embodiment of the present application, a readable and writable SSBO of size 256 supporting ivec4 vectors is created; an ivec4 vector is a signed-integer four-component vector. For ease of illustration, this SSBO is denoted SSBO1.
It should be noted that, after the feature point recognition is performed on the color image, a mask image may be obtained. The mask image is marked with response function values corresponding to the image feature point pixels. Wherein the first color image corresponds to the first mask image and the second color image corresponds to the second mask image.
In the step of calculating the random numbers, the first mask image and the second mask image are loaded into the compute shader of the GPU. As shown in fig. 14A, the local work group size of the compute pipeline is specified as 16×16 and the global work group size is (1, 1). That is, the global work group contains only one local work group, with 16×16 threads as one local work group; 16×16 threads, i.e. 256 threads, are started on the GPU.
In an embodiment of the application, a random number seed is generated in the compute shader. For example, the value of the position index (gl_LocalInvocationIndex) is multiplied by the scale factor (scale) specified as a parameter to obtain a random number seed (denoted as r), by the following equation 10:
r = gl_LocalInvocationIndex × scale    (equation 10)
In the embodiment of the application, each thread in the work group uses a combination of a trigonometric function and an exponential function to operate on the random number seed r to generate a 4-dimensional random vector (denoted as R).
R = e^sin(r)    (equation 11)
Wherein each component of the 4-dimensional random vector falls in the range [0.0, 1.0].
It should be noted that, the embodiment of the present application generates 256 4-dimensional random vectors in total through calculation.
In an embodiment of the present application, each 4-dimensional random vector (R) is adjusted according to an offset parameter (offset). Illustratively, the random number may be adjusted by the following equation 12:
R' = R × 20 × offset    (equation 12)
It should be noted that, in equation 12, 20 is a value that can be set according to practical situations, that is, the sampling size is defined as 20×20, and the offset parameter offset may be used to scale on the basis of the sampling size of 20×20, so that the scaled sampling size meets the requirement of sampling the image to be processed.
Illustratively, assuming that the local workgroup size employed for the image to be processed is 32×32, i.e., the image block size is 32×32, the offset parameter may take values in the range of [1.5,2.0 ]. Still further by way of example, assuming that the local workgroup size employed for the image to be processed is 16×16, i.e., the image block size is 16×16, the offset parameter may take values in the range of [0.25,0.5 ].
In actual implementation, R' = R × 20 × offset may be a decimal value; for convenience of calculation, only the integer part of R' is kept, so that after adjustment each component of the 4-dimensional random vector is a natural number.
For ease of illustration, the 4-dimensional integer vector random number may be denoted as [A, B, C, D], where the two random numbers A and B represent a first random offset and the two random numbers C and D represent a second random offset. The 4-dimensional integer vector random numbers are applied to the calculation of the feature vectors of the feature points in the following steps, and the specific calculation process will be described in detail below.
Illustratively, the adjusted 4-dimensional random vector [ a, B, C, D ] = [2,2,3,5].
In the embodiment of the application, all 256 4-dimensional random vectors are adjusted to obtain 256 adjusted 4-dimensional random vectors, and the 256 adjusted 4-dimensional random vectors form a random number queue.
As shown in fig. 14B, in the embodiment of the present application, the 256 adjusted 4-dimensional random vectors may be cached in SSBO1.
In summary, the present application loads the two frames of mask images into a compute shader of the GPU, generates a random number seed for each frame of mask image in the compute shader, operates on the random number seed with a trigonometric function and an exponential function to generate 256 four-dimensional random vectors, performs scaling adjustment on these vectors, and finally combines the 256 adjusted four-dimensional integer random vectors [A, B, C, D] into a random number queue.
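As an illustration only, the random number generation described above can be sketched as a GLSL compute shader along the following lines. The uniform names (scale, offsetParam), the per-component multipliers used to expand the scalar seed into four components, the fract() wrapping that keeps the components in [0.0, 1.0], and the buffer binding point are assumptions introduced here for readability rather than details mandated by the embodiment.

#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;

// SSBO1: 256 adjusted 4-dimensional integer random vectors (binding point assumed)
layout(std430, binding = 0) buffer SSBO1 { ivec4 randomQueue[256]; };

uniform float scale;        // specified random number scaling factor
uniform float offsetParam;  // offset parameter used to scale the 20x20 sampling size

void main() {
    // Equation 10: random number seed derived from the thread position index
    float r = float(gl_LocalInvocationIndex) * scale;

    // Equation 11: combine trigonometric and exponential functions; the component
    // multipliers and the fract() wrap are assumptions used to land in [0.0, 1.0]
    vec4 R = fract(exp(sin(r * vec4(1.0, 2.0, 3.0, 4.0))));

    // Equation 12: scale by the 20x20 sampling size and the offset parameter,
    // then keep only the integer part so that [A, B, C, D] are natural numbers
    ivec4 adjusted = ivec4(R * 20.0 * offsetParam);

    randomQueue[gl_LocalInvocationIndex] = adjusted;
}

A single dispatch of one 16×16 local work group (global work group (1, 1)) fills the whole 256-entry random number queue.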
It should be noted that, when the feature vector of a feature point is calculated later, each thread of the local work group looks up the corresponding random number in the random number queue according to its own index (gl_LocalInvocationIndex) to obtain the corresponding random offsets, samples the color image using the thread coordinates and the random offsets to obtain rgb color values of the color image, calculates brightness values from those rgb color values, and then determines the feature vector of the feature point from the brightness values.
Step 202, feature point ordering
First, the mask image obtained by recognizing the image feature points has a certain width and height. For convenience of explanation, the width of the mask image is denoted as W, and the height of the mask image is denoted as H.
In an embodiment of the present application, an SSBO of vec4 vectors of size N×(W/16)×(H/16) is created. For ease of illustration, this SSBO is denoted SSBO2. A vec4 vector is a 4-component floating point vector.
SSBO2 may include (W/16)×(H/16) segments, and each segment may store N pieces of data.
In the embodiment of the application, SSBO2 is used for segmented storage of the feature point data of each of the (W/16)×(H/16) local work groups, where each local work group corresponds to a 16×16 image block. Since a 16×16 image block contains at most 16×16 feature points, N is less than or equal to 256.
It should be noted that N may be 8 or 12, and may be set according to practical requirements, which is not limited by the embodiment of the present application. For convenience of explanation, N is taken as 8 in the examples of the embodiment of the present application. When N is 8, the number of feature points stored per segment is between 0 and 8.
Illustratively, it is assumed that the mask image has a size of 256×256, i.e., a width W and a height H of 256 pixels. As described above, in the embodiment of the present application, 16×16=256 threads are taken as one local work group, so that 256 local work groups can be divided for a mask image of 256×256 in size.
Accordingly, the size of SSBO2 may be:
N × (W/16) × (H/16) = N × (256/16) × (256/16) = 256 × N (equation 13)
That is, SSBO2 may store the data of the 256 local work groups in 256 segments, and each segment may store N pieces of data.
In the feature point ordering step, the two frames of mask images are loaded into a compute shader, the size of the local work group of the compute pipeline is specified as 16×16, and the size of the global work group is (W/16, H/16, 1). That is, the global work group contains W/16 × H/16 local work groups, and each local work group consists of 16×16 = 256 threads started on the GPU.
Illustratively, assuming that the width W and the height H of the mask image are 256 pixels, the global workgroup of the computation pipeline of the computation shader has a size (16,16,1) and the local workgroup has a size of 16×16, as shown in fig. 15.
In the embodiment of the application, feature point ordering is implemented using the shared buffer mechanism among the threads of a work group of the compute shader. Illustratively, a shared cache of vec4 vectors of size N may be defined; the shared cache can store the data of N feature points.
In an embodiment of the present application, referring to FIG. 16, the global work group includes W/16 × H/16 local work groups, and feature points are extracted for each local work group in the global work group. A local work group may contain zero, one, or more feature points. In order to improve the calculation rate, the number n of feature points kept per local work group is limited to at most N, i.e., n lies in the range [0, N].
Specifically, referring to fig. 17, for the feature points corresponding to each local work group, the response values of the feature points are extracted, the n feature points with the largest response values in the current local pixel region are determined using an atomic operation (atomicMax), and these n feature points are arranged in descending order of response value. The data of the n feature points are then stored in the shared cache. The thread with position index (gl_LocalInvocationIndex) 0 copies the data of the n feature points in the shared cache to the corresponding positions of SSBO2. The thread with index 0 is used here only as an example; in actual implementation, a thread with any other index value may be selected, as long as only a single thread in the work group performs the copy each time, so as to avoid errors caused by multiple threads executing it simultaneously.
The embodiment of the application determines N characteristic points with the largest response values on the whole image in a sectional sorting and sectional storage mode.
In one implementation, the whole image is divided into a plurality of local pixel regions, and the N feature points with the largest response values in each local pixel region are sorted and stored in the corresponding segments of SSBO2. Referring to fig. 18A, all feature points stored in all segments of SSBO2 are then sorted together, and the N feature points with the largest response values in the entire image are selected.
In another implementation, the whole image is divided into a plurality of local pixel regions, and the N feature points with the largest response values in each local pixel region are sorted in descending order of response value and stored in the corresponding segments of SSBO2. Then, the feature point with the locally largest response value is selected from each local pixel region, all of these locally largest feature points are sorted, and the N feature points with the largest local maxima over the whole image are selected.
For example, referring to fig. 18B, SSBO2 may store (W/16)×(H/16) sub-arrays each of length N, where each sub-array stores the data of N feature points arranged in descending order of response value. That is, the first position in each sub-array stores the feature point with the largest response value, i.e., the local maximum. From the (W/16)×(H/16) sub-arrays, (W/16)×(H/16) local maxima can be extracted. These local maxima are then sorted in descending order, and the N globally largest ones are selected as the N feature points with the largest response values over the whole image.
For example, assuming that N is 8, i.e., each sub-array has a length of 8, if a local pixel region contains only 3 feature points, those 3 feature points are sorted in descending order of response value into one sub-array of SSBO2, and the remaining 5 positions of the sub-array are written with 0.
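For illustration, the segmented sorting of one 16×16 image block could be sketched as follows in GLSL. The buffer layout (each feature point packed as x, y, response value, padding), the binding points, the assumption that a response value greater than 0 marks a feature point pixel, and the replacement of the atomicMax-based ranking by a simple single-thread selection of the N largest responses are simplifications introduced here; they are not the literal implementation of the embodiment.

#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;

#define N 8

// Mask image: response function values of the marked feature point pixels
layout(binding = 0) uniform highp sampler2D maskImage;

// SSBO2: one segment of N feature points per 16x16 local work group
layout(std430, binding = 1) buffer SSBO2 { vec4 featurePoints[]; };

// Shared cache of candidate feature points of the current image block
shared vec4 candidates[256];
shared uint candidateCount;

void main() {
    if (gl_LocalInvocationIndex == 0u) candidateCount = 0u;
    memoryBarrierShared();
    barrier();

    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    float response = texelFetch(maskImage, coord, 0).r;

    // Collect every marked feature point pixel of this block into the shared cache
    if (response > 0.0) {
        uint slot = atomicAdd(candidateCount, 1u);
        candidates[slot] = vec4(vec2(coord), response, 0.0);
    }
    memoryBarrierShared();
    barrier();

    // One thread keeps the N largest responses of the block, in descending order,
    // and copies them to this work group's segment of SSBO2 (0 fills empty slots)
    if (gl_LocalInvocationIndex == 0u) {
        uint seg = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;
        for (uint k = 0u; k < uint(N); ++k) {
            vec4 best = vec4(0.0);
            uint bestIdx = 0xFFFFFFFFu;
            for (uint i = 0u; i < candidateCount; ++i) {
                if (candidates[i].z > best.z) { best = candidates[i]; bestIdx = i; }
            }
            if (bestIdx != 0xFFFFFFFFu) candidates[bestIdx].z = -1.0;
            featurePoints[seg * uint(N) + k] = (bestIdx != 0xFFFFFFFFu) ? best : vec4(0.0);
        }
    }
}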
Referring to fig. 19, after feature point identification is performed on an image, a large number of feature points are identified. In order to improve the efficiency of feature point matching, the embodiment of the application screens out the globally largest N feature points by means of segmented feature point sorting; that is, only the N feature points with the largest response values are retained after sorting, and feature vectors are then calculated only for these N feature points.
Using the segmented feature point sorting provided by the application, feature point sorting is performed on the first mask image and the second mask image respectively, yielding two feature point sequences: a first feature point sequence of the first mask image and a second feature point sequence of the second mask image. For example, N may be 8, that is, the first feature point sequence includes the 8 feature points with the largest response values in the first mask image, and the second feature point sequence includes the 8 feature points with the largest response values in the second mask image.
Step 203, calculating the feature vector of the feature point
For the first and second feature point sequences generated in step 202, a feature vector (specialVector) is calculated for each feature point in the two sequences.
Referring to fig. 20, after the segment ordering of a large number of feature points in the mask image, N (e.g., 8) feature points having the largest response values are retained in the mask image. Then, a feature vector is calculated for each feature point by first determining two pixels offset by a random offset with respect to the feature point and then calculating the feature vector for the feature point based on the luminance difference of the two pixels.
The process of calculating the feature vector of each feature point will be described in detail below.
In an embodiment of the present application, the first color image and the second color image are loaded to the compute shader, specifying that the local workgroup of the compute pipeline of the compute shader is 16 x 16 in size and the global workgroup is (1, 1) in size. That is, the global workgroup contains only one local workgroup, 16×16 threads as one local workgroup. 16 x 16 threads, i.e., 256 threads, are started on the GPU.
In the compute shader of the GPU, each thread of the local work group looks up, according to its own position index (gl_LocalInvocationIndex), the 4-dimensional random vector at the corresponding subscript among the 256 4-dimensional random vectors cached in SSBO1, and uses it as its random offset (offsetUV_i).
The random offset (offsetUV_i) can be expressed as a 4-dimensional random vector [A, B, C, D], where A and B represent a first random offset, denoted offsetUV_i1, and C and D represent a second random offset, denoted offsetUV_i2.
It should be noted that the reason for using a 4-dimensional random vector in the embodiment of the present application is that two 2-dimensional random vectors are required for the address offsets. That is, when calculating the feature vector of a pixel, two neighboring pixels must be taken at random around the position of the current pixel. Taking one neighboring pixel at random requires a 2-dimensional random vector, offset randomly in both the lateral and longitudinal directions; taking two neighboring pixels at random therefore requires two 2-dimensional random vectors, which together form a 4-dimensional random vector.
For example, referring to FIG. 21, assume a 4-dimensional random vector of [2, 2, 3, 5], where [2, 2] represents the first random offset and [3, 5] represents the second random offset. The feature point is offset a first time by the random offset [2, 2], giving the offset pixel coordinates (x+2, y+2). The feature point is offset a second time by the random offset [3, 5], giving the offset pixel coordinates (x+3, y+5).
In the embodiment of the application, for each thread, two random neighbor pixels of the current feature point pixel are determined, the color values of the two random neighbor pixels are obtained by sampling, the brightness of each random neighbor pixel is calculated from its color value, the brightness values of the two random neighbor pixels are compared, and the feature vector of the feature point is obtained from the comparison results. The process of calculating the feature vector is described in detail below.
First, the first Color image (Color1) is sampled using the coordinates (gl_GlobalInvocationID.xy) of each thread and the random offset offsetUV_i1, to obtain a first rgb color value of the pixel point corresponding to the thread:
Color1_i.rgba = texture(Color1, gl_GlobalInvocationID.xy + offsetUV_i1) (equation 14)
Wherein the texture () function returns a vector of the vec4 type, representing the color value of the texture at the specified location.
Color1_i.rgba denotes the first rgb color value of the pixel point. The first rgb color value is a 4-dimensional vector composed of Color1_i.r, Color1_i.g, Color1_i.b, and Color1_i.a.
Similarly, the first Color image (Color1) is sampled using the coordinates (gl_GlobalInvocationID.xy) of each thread and the random offset offsetUV_i2, to obtain a second rgb color value of the pixel point corresponding to the thread:
Color1_i+1.rgba = texture(Color1, gl_GlobalInvocationID.xy + offsetUV_i2) (equation 15)
Color1_i+1.rgba denotes the second rgb color value of the pixel point. The second rgb color value is a 4-dimensional vector composed of Color1_i+1.r, Color1_i+1.g, Color1_i+1.b, and Color1_i+1.a.
Then, a first brightness value Intensity_i is calculated from the first rgb color value of the pixel point:
Intensity_i = Color1_i.r × 0.299 + Color1_i.g × 0.587 + Color1_i.b × 0.114 (equation 16)
Similarly, a second brightness value Intensity_i+1 is calculated from the second rgb color value of the pixel point:
Intensity_i+1 = Color1_i+1.r × 0.299 + Color1_i+1.g × 0.587 + Color1_i+1.b × 0.114 (equation 17)
Then, each thread compares its first brightness value Intensity_i with its second brightness value Intensity_i+1 using the atomic operation atomicOr() of the thread work group, obtaining a calculation result:
result = atomicOr(Intensity_i, Intensity_i+1) (equation 18)
For example, if Intensity_i+1 is greater than Intensity_i, result is 1; if Intensity_i+1 is less than or equal to Intensity_i, result is 0.
In the embodiment of the application, the above operation is carried out by each of the 16×16 threads, so that 16×16 results are obtained.
Each thread then outputs its result to the gl_LocalInvocationIndex-th position of the feature vector (specialVector).
In the embodiment of the application, a plurality of results can be written through an Atomic () function, and the Atomic () function can be specifically expressed as:
Atomic(specialVector[gl_LocalInvocationIndex],result)
The Atomic() function supports simultaneous writing of multiple results, so that read-write conflicts in the GPU caused by read-write locks are avoided.
In the embodiment of the application, after all threads have completed their calculations, the feature vector of one feature point is obtained.
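As a sketch, the per-feature-point computation of equations 14 to 18 could look as follows in GLSL. The featureCoord uniform (pixel coordinates of the feature point being described), the use of texelFetch instead of texture(), the binding points, and the packing of the 256 results into 8 × 32 bits via atomicOr are assumptions made here; the embodiment describes the write through an Atomic() function at the gl_LocalInvocationIndex-th position of the feature vector.

#version 310 es
precision highp float;
precision highp int;
layout(local_size_x = 16, local_size_y = 16) in;

// First color image and the 256 adjusted 4-dimensional random vectors in SSBO1
layout(binding = 0) uniform highp sampler2D Color1;
layout(std430, binding = 0) buffer SSBO1 { ivec4 randomQueue[256]; };

// Feature vector of the current feature point, stored as 8 x 32 = 256 bits
layout(std430, binding = 2) buffer FeatureVector { uint svData[8]; };

// Pixel coordinates of the feature point being described (assumed uniform)
uniform ivec2 featureCoord;

float luminance(vec3 rgb) {
    // Equations 16/17: weighted sum of the rgb components
    return rgb.r * 0.299 + rgb.g * 0.587 + rgb.b * 0.114;
}

void main() {
    // Each thread looks up its own 4-dimensional random vector [A, B, C, D]
    ivec4 rnd = randomQueue[gl_LocalInvocationIndex];
    ivec2 offsetUV_i1 = rnd.xy;  // first random offset
    ivec2 offsetUV_i2 = rnd.zw;  // second random offset

    // Equations 14/15: sample the two randomly offset neighbour pixels
    vec3 c1 = texelFetch(Color1, featureCoord + offsetUV_i1, 0).rgb;
    vec3 c2 = texelFetch(Color1, featureCoord + offsetUV_i2, 0).rgb;

    // Equation 18 as interpreted here: 1 when the second neighbour is brighter
    uint result = (luminance(c2) > luminance(c1)) ? 1u : 0u;

    // Write the result into bit gl_LocalInvocationIndex of the 256-bit feature vector
    uint word = gl_LocalInvocationIndex / 32u;
    uint bit = gl_LocalInvocationIndex % 32u;
    atomicOr(svData[word], result << bit);
}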
In this way, the feature vector is calculated for each of the N feature points of the first Color image (Color 1), resulting in N feature vectors. Wherein the feature vector of each feature point may be a 256-bit sequence of 0 or 1.
It should be noted that, the above is exemplified by calculating the feature vector of each feature point in the first feature point sequence for the first Color image (Color 1), and the process of calculating the feature vector of each feature point in the second feature point sequence for the second Color image (Color 2) is similar to the above, and will not be repeated. Therefore, by the algorithm provided by the embodiment of the application, the feature vector of each feature point in the first feature point sequence and the feature vector of each feature point in the second feature point sequence can be calculated.
For example, referring to table 4 below, the first feature point sequence includes N feature points with the largest response values in the first color image, each of the N feature points corresponds to a feature vector, and the feature vector corresponding to each feature point is a sequence of 256 bits of 0 or 1. Similarly, the second feature point sequence includes N feature points with the largest response values in the second color image, each feature point in the N feature points corresponds to a feature vector, and the feature vector corresponding to each feature point is a sequence of 256 bits 0 or 1.
Table 4
Step 204, calculating feature distances between feature vectors
In step 202 above, a first feature point sequence (denoted KeyPoint1) is generated for one frame image and a second feature point sequence (denoted KeyPoint2) is generated for the other frame image, and in step 203 the feature vector of each feature point in the first feature point sequence and the feature vector of each feature point in the second feature point sequence are calculated.
The two feature point sequences each include N feature points, each feature point corresponds to a feature vector, and each feature vector is a 256-bit sequence of 0s and 1s.
It should be noted that the int data type occupies 4 bytes, i.e., 32 bits, in memory, so 8 int variables can represent 256 bits of data. In the scheme of the application, since the feature vector is a 256-bit sequence of 0s and 1s, the application stores the values of the feature vector in 8 int-type variable segments, where each int-type variable stores a 32-bit sequence of 0s and 1s.
The 8 int-type variables can be represented as int svData[k], k = 0, 1, …, 7.
Any one of the feature vectors in the first feature point sequence KeyPoint1 may be expressed as KeyPoint1.svData[0], KeyPoint1.svData[1], KeyPoint1.svData[2], KeyPoint1.svData[3], KeyPoint1.svData[4], KeyPoint1.svData[5], KeyPoint1.svData[6], KeyPoint1.svData[7].
Any one of the feature vectors in the second feature point sequence KeyPoint2 may be expressed as KeyPoint2.svData[0], KeyPoint2.svData[1], KeyPoint2.svData[2], KeyPoint2.svData[3], KeyPoint2.svData[4], KeyPoint2.svData[5], KeyPoint2.svData[6], KeyPoint2.svData[7].
In the embodiment of the present application, the exclusive-or operation may be performed pairwise on the feature vectors of the first feature point sequence KeyPoint1 and the second feature point sequence KeyPoint2:
int result0 = KeyPoint1.svData[0] ^ KeyPoint2.svData[0];
int result1 = KeyPoint1.svData[1] ^ KeyPoint2.svData[1];
int result2 = KeyPoint1.svData[2] ^ KeyPoint2.svData[2];
int result3 = KeyPoint1.svData[3] ^ KeyPoint2.svData[3];
int result4 = KeyPoint1.svData[4] ^ KeyPoint2.svData[4];
int result5 = KeyPoint1.svData[5] ^ KeyPoint2.svData[5];
int result6 = KeyPoint1.svData[6] ^ KeyPoint2.svData[6];
int result7 = KeyPoint1.svData[7] ^ KeyPoint2.svData[7];
Where "^" is the exclusive-or operator. Table 5 shows the truth table of the exclusive-or operation.
Table 5

a	b	a ^ b
0	0	0
0	1	1
1	0	1
1	1	0
In the embodiment of the application, the feature vectors of the first feature point sequence KeyPoint1 and the second feature point sequence KeyPoint2 are each stored in 8 int-type variable segments, so 8 exclusive-or operations are performed per pair of feature vectors, yielding 8 exclusive-or results. Each exclusive-or result is a 32-bit sequence of 0s and 1s.
In the embodiment of the present application, the number of 1s in each exclusive-or result may be counted using the bitCount() instruction of the GPU; the counts of 1s in the 8 exclusive-or results are summed, and the sum is used as the feature distance (svDistance) of the two feature vectors:
svDistance=bitCount(result0)+bitCount(result1)+bitCount(result2)+bitCount(result3)+bitCount(result4)+bitCount(result5)+bitCount(result6)+bitCount(result7).
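A minimal GLSL sketch of this distance computation, assuming the two feature vectors are held in uint arrays of 8 segments each (the embodiment stores them in int variables; uint is used here only to keep the bit operations straightforward):

// Feature distance of two 256-bit feature vectors stored as 8 segments of 32 bits
uint svDistance(uint a[8], uint b[8]) {
    uint dist = 0u;
    for (int k = 0; k < 8; ++k) {
        // XOR the k-th segments and count the differing bits with bitCount()
        dist += uint(bitCount(a[k] ^ b[k]));
    }
    return dist;
}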
the method of calculating the feature distances of the two feature vectors is described above. The method for calculating the feature distance of each feature point in the two feature point sequences according to the embodiment of the present application is described below based on the above method for calculating the feature distances of the two feature vectors.
Referring to fig. 22A and 22B, the first feature point sequence (KeyPoint1) includes the N feature points (denoted KeyPoint1_i) having the largest response values in the first color image, each feature point corresponding to a feature vector. Similarly, the second feature point sequence (KeyPoint2) includes the N feature points (denoted KeyPoint2_i) with the largest response values in the second color image, and each feature point corresponds to a feature vector.
In the embodiment of the application, according to the feature vector of each feature point (KeyPoint1_i) in the first feature point sequence (KeyPoint1) and the feature vectors of all feature points (KeyPoint2_i, i = 1, 2, …, N) in the second feature point sequence (KeyPoint2), the feature distance between every such pair of feature points is calculated.
A total of N sets of computations are required, and each set of computations results in N feature distances.
In the first set of calculations, as shown in fig. 22A and table 6 below, the feature distances between each pair may be calculated based on the feature vector of the first feature point (KeyPoint1_1) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2). For feature point KeyPoint1_1, N feature distances are thus obtained.
In the second set of calculations, as shown in fig. 22B and table 6 below, the feature distances between each pair may be calculated based on the feature vector of the second feature point (KeyPoint1_2) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2). For feature point KeyPoint1_2, N feature distances are thus obtained.
Likewise, as shown in table 6, the feature distances between the feature vector of the Nth feature point (KeyPoint1_N) in the first feature point sequence (KeyPoint1) and the feature vectors of the N feature points KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N in the second feature point sequence (KeyPoint2) can be calculated. For feature point KeyPoint1_N, N feature distances are thus obtained.
Here, KeyPoint1_1, KeyPoint1_2, …, KeyPoint1_N denote the first, second, …, Nth feature points of the first feature point sequence (KeyPoint1), and KeyPoint2_1, KeyPoint2_2, …, KeyPoint2_N denote the first, second, …, Nth feature points of the second feature point sequence (KeyPoint2).
Table 6
Step 205, performing feature matching according to the feature distance
In the embodiment of the present application, referring to table 7 below, for each of the N sets of computations, the minimum of the N feature distances is found and compared with a preset distance threshold (svDistance_min); whether the feature points match is determined according to the comparison result, so as to improve the accuracy of feature point matching.
Table 7
If the minimum distance value in the N feature distances is smaller than the preset distance threshold value, the two feature points corresponding to the minimum distance value are successfully matched. If the minimum distance value of the N feature distances is greater than or equal to the preset distance threshold value, the matching fails.
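For illustration, the matching decision for one feature point of the first sequence could be sketched as the following GLSL helper, with N fixed to 8 and the N candidate feature vectors flattened into a single 64-element array; the function name, parameter layout, and the use of -1 as a failure marker are assumptions introduced here.

// Match one feature vector of the first sequence against the N = 8 candidates of
// the second sequence; returns the index of the match, or -1 when matching fails
int matchFeaturePoint(uint query[8], uint candidates[64], uint svDistance_min) {
    uint bestDist = 0xFFFFFFFFu;
    int bestIndex = -1;
    for (int j = 0; j < 8; ++j) {
        uint d = 0u;
        for (int k = 0; k < 8; ++k) {
            d += uint(bitCount(query[k] ^ candidates[j * 8 + k]));
        }
        if (d < bestDist) { bestDist = d; bestIndex = j; }
    }
    // Accept the match only when the minimum distance is below the preset threshold
    return (bestDist < svDistance_min) ? bestIndex : -1;
}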
When the matching operation of the feature points in the two frames of images is completed, the feature point pairs characterizing the same object in different images may be output.
For example, as shown in fig. 23, assuming that the feature point (x1, y1) in the first color image and the feature point (x2, y2) in the second color image are two successfully matched feature points, the feature point (x1, y1) and the feature point (x2, y2) constitute one feature point pair. Accordingly, the pair of feature points may be noted as [(x1, y1), (x2, y2)].
In actual implementation, after the feature points of the two frames of images are matched, the matched feature point pairs between the two frames of images can be determined, and then the two frames of images are used for the purposes of motion estimation, image stitching, background segmentation and the like according to the position information of the matched feature point pairs.
The following describes an application scenario of feature point identification and matching based on GPU implementation provided by the embodiment of the present application. An exemplary description will be given below taking a game scene as an example.
The feature point identification and matching based on GPU implementation provided by the embodiment of the application can be applied to inter-frame prediction of game images. Referring to fig. 24, one frame of image may be predicted from two consecutive frames of game images as an intermediate frame of game image of the two frames of game images. The continuous two frames of game images are respectively called a real frame N-1 and a real frame N+1, and the predicted intermediate frame game image is called a predicted frame N.
In the embodiment of the application, the predicted frame may be generated from two rendering instruction streams acquired in succession. "Two rendering instruction streams acquired in succession" means that rendering instruction stream 1 is acquired first, rendering instruction stream 2 is acquired afterwards, and the GPU acquires no other rendering instruction stream between them. Specifically, the GPU sequentially obtains the two rendering instruction streams, each of which is used to draw one real frame image. The GPU identifies the feature points of each real frame image, matches the feature points of the two real frame images, calculates a motion vector (including, for example, translation and/or rotation) for each pixel from the image coordinate information of the matched feature point pairs, and then offsets the 3D scene image corresponding to one of the real frames pixel by pixel based on the motion vectors to generate the final predicted frame.
Assuming that the GPU generates the real frame N-1 according to rendering instruction stream 1 acquired first, generates the real frame N+1 according to rendering instruction stream 2 acquired later, and generates the predicted frame N according to rendering instruction stream 1 and rendering instruction stream 2, the GPU may send these frames into the queue to be displayed in the order real frame N-1, predicted frame N, real frame N+1. The display screen then plays the real frame N-1, the predicted frame N, and the real frame N+1 in sequence.
It should be understood that, because the resource consumption of the predicted frame generated by adopting the inter-frame prediction technology is smaller than the resource consumption of the real frame rendered by the game application, by reducing the number of the real frames generated by the game application and generating the predicted frame by adopting the inter-frame prediction technology, the total amount of the game frames can be ensured, and the power consumption load of the chip can be reduced.
In the inter-frame prediction process, the embodiment of the application performs feature point identification and matching on the two frames of images, calculates a motion vector (including, for example, translation and/or rotation) of each pixel according to the image coordinate information of the matched feature point pairs, and offsets the 3D scene image corresponding to one of the real frames pixel by pixel based on the motion vectors to generate a predicted frame. It should be noted that, in the embodiment of the present application, image feature recognition and matching are performed on the GPU in real time and with low power consumption: by using the graphics rendering unit and the graphics rendering pipeline of the GPU, and in particular GPU features such as the compute shader, the shared buffer within a thread work group, and atomic operations, the feature points of different images are identified and matched directly in the graphics rendering pipeline. This greatly improves computing efficiency, reduces the time overhead of pixel-by-pixel computation, and at the same time reduces the overhead of data copying and context switching between feature recognition/matching and the graphics rendering pipeline.
The application uses characteristic functions of the GPU and logic that processes feature point matching directly in the graphics rendering pipeline to perform the feature point matching operation on rendered images with high efficiency and low power consumption, so that a high-frame-rate image feature point matching function can be realized on mobile platforms with limited computing capability. With the scheme of the application, the power consumption of mobile terminal devices for feature matching and machine vision algorithms can be effectively reduced; the scheme is widely applicable to image rendering application scenarios such as VR/AR, improving image processing efficiency and computing accuracy while reducing computing power consumption.
In the embodiment of the present application, "greater than" may be replaced with "greater than or equal to", "less than or equal to" may be replaced with "less than", or "greater than or equal to" may be replaced with "greater than", "less than" may be replaced with "less than or equal to".
The various embodiments described herein may be separate solutions or may be combined according to inherent logic, which fall within the scope of the present application.
The foregoing describes the solution provided by the embodiments of the present application primarily from the perspective of method steps. It will be appreciated that, in order to implement the above-described functions, an electronic device implementing the method includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application also provides a chip coupled to a memory for reading and executing a computer program or instructions stored in the memory to perform the methods of the embodiments described above.
The application also provides an electronic device comprising a chip for reading and executing a computer program or instructions stored in a memory, such that the method in the embodiments is performed.
The present embodiment also provides a computer-readable storage medium having stored therein computer instructions that, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the GPU-based image feature matching method in the above-described embodiments.
The present embodiment also provides a computer program product, in which a program code is stored in a computer readable storage medium, which when executed on a computer causes the computer to perform the above-mentioned related steps to implement the GPU-based image feature matching method in the above-mentioned embodiments.
In addition, the embodiment of the application also provides a device which can be a chip, a component or a module, and the device can comprise a processor and a memory which are connected, wherein the memory is used for storing computer execution instructions, and when the device runs, the processor can execute the computer execution instructions stored in the memory so that the chip can execute the GPU-based image feature matching method in the method embodiments.
The electronic device, the computer readable storage medium, the computer program product or the chip provided in this embodiment are used to execute the corresponding method provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding method provided above, and will not be described herein.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A GPU-based image feature matching method, comprising:
the method comprises the steps of obtaining feature vectors corresponding to feature points of a color image, wherein one feature point corresponds to one feature vector, N feature points of the first color image correspond to N feature vectors when the color image is a first color image, and N feature points of a second color image correspond to N feature vectors when the color image is a second color image;
According to N feature vectors corresponding to N feature points of the first color image and N feature vectors corresponding to N feature points of the second color image, N feature distances between each feature point of the first color image and N feature points of the second color image are determined;
If the minimum feature distance in the N feature distances is smaller than a preset distance threshold, determining the feature points of the two feature vectors of the minimum feature distance as matched feature point pairs;
The obtaining the feature vector corresponding to the feature point of the color image includes:
Loading a mask image in a first computing shader of the GPU, and generating M four-dimensional random constants according to image coordinate data of feature point pixels in the mask image, wherein the mask image is an image obtained by carrying out feature point recognition on the color image, and marks a plurality of feature point pixels and corresponding response function values;
Loading the mask image in a second computing shader of the GPU, sequencing a plurality of marked characteristic point pixels in the mask image according to the size of response function values, and determining the first N characteristic point pixels with the largest response function values in the mask image;
And loading the color image in a third calculation shader of the GPU, determining adjacent pixel brightness differences of the characteristic point pixels in the color image according to the M four-dimensional random constants, and determining the characteristic vectors of the characteristic points with the largest response function values according to the adjacent pixel brightness differences of the characteristic point pixels in the color image as N characteristic vectors corresponding to the N characteristic points of the color image.
2. The method of claim 1, wherein each of the M four-dimensional random constants comprises a first random offset and a second random offset;
the determining the adjacent pixel brightness difference of the feature point pixels in the color image according to the M four-dimensional random constants includes:
In a third computation shader of the GPU, for each thread of a local working set, determining a first pixel color value shifted by the first random offset and a second pixel color value shifted by the second random offset relative to a first feature point pixel;
A luminance difference between the second pixel luminance value and the first pixel luminance value is calculated.
3. The method according to claim 2, wherein the determining the feature vector of the feature point having the largest response function value from the adjacent pixel brightness differences of the feature point pixels in the color image includes:
If the brightness difference value is greater than 0, outputting a result of 1;
if the brightness difference value is smaller than or equal to 0, outputting a result of 0;
and obtaining a sequence of 0 or 1 according to the output result, and taking the sequence as the feature vector corresponding to the first feature point pixel.
4. A method according to claim 3, wherein in the case where the local work group comprises 16 x 16 threads, the feature vector corresponding to the first feature point pixel is a sequence of 0 or 1 of a group of 256 bits.
5. The method according to any one of claims 1 to 4, further comprising:
storing the feature vector of one feature point pixel of the first color image in a segmented mode according to S integer variables;
storing the feature vector of one feature point pixel of the second color image according to S integer variable segments;
performing an exclusive-or operation on the i-th integer variable stored in the segments of the first color image and the i-th integer variable stored in the segments of the second color image, i taking each value from 1 to S;
Counting the number of 1 s in an exclusive OR operation result by utilizing bitCount instructions of the GPU;
And taking the counted number of 1 as the characteristic distance between one characteristic point pixel of the first color image and one characteristic point pixel of the second color image.
6. The method of claim 5, wherein S is 8 in the case where the feature vector is a 256-bit sequence of 0s and 1s and each integer variable is 32 bits.
7. The method according to any one of claims 1 to 6, wherein the value of M is equal to the local workgroup size of a computation pipeline in a third computation shader of the GPU.
8. The method of any one of claims 1 to 7, wherein a size of a local workgroup of a computation pipeline in a third computation shader of the GPU is equal to a size of a local workgroup of a computation pipeline in a first computation shader of the GPU, and wherein a size of a global workgroup of a computation pipeline in the third computation shader of the GPU is equal to a size of a global workgroup of a computation pipeline in the first computation shader of the GPU.
9. The method according to any one of claims 1 to 8, wherein the product of the size of the local working group and the size of the global working group of the computation pipeline in the second computation shader of the GPU is equal to the size of the mask image;
The size of the local working group of the computation pipeline in the second computation shader of the GPU is equal to the size of the local working group of the computation pipeline in the third computation shader of the GPU.
10. The method according to any one of claims 1 to 9, wherein in the first compute shader of the GPU, the local workgroup size of the compute pipeline is 16 x 16 and the global workgroup size is (1, 1).
11. The method according to any one of claims 1 to 10, wherein in the second compute shader of the GPU, the local workgroup of the compute pipeline has a size of 16 x 16, the global workgroup has a size of (W/16, H/16, 1), and the mask image has a size of W x H pixels.
12. The method according to any one of claims 1 to 11, wherein in the third compute shader of the GPU, the local workgroup size of the compute pipeline is 16 x 16 and the global workgroup size is (1, 1).
13. The method according to any one of claims 1 to 12, wherein the generating M four-dimensional random constants from image coordinate data of feature point pixels in the mask image comprises:
multiplying the position index value of the feature point pixel in the mask image by a specified random number scaling coefficient to obtain a random number seed;
in a first computation shader of the GPU, for each thread in a local work group, calculating a random number seed by utilizing a trigonometric function and an exponential function to generate a four-dimensional random constant, and adjusting the four-dimensional random constant according to an offset parameter to obtain an adjusted four-dimensional random constant;
In the case where the local work group includes 16×16 threads, 256 adjusted four-dimensional random constants are obtained, M being equal to 256.
14. The method according to any one of claims 1 to 13, wherein after said generating M four-dimensional random constants from image coordinate data of pixels in the mask image, the method further comprises:
m four-dimensional random constants are stored in the first shader memory buffer.
15. The method according to any one of claims 1 to 14, wherein in the second compute shader of the GPU, a global workgroup comprises W/16 x H/16 local workgroups;
The step of sorting the marked plurality of feature point pixels in the mask image according to the response function value, and determining the first N feature point pixels with the largest response function value in the mask image includes:
For W/16 XH/16 local working groups, utilizing atomic operation atomicMax of the GPU to determine the characteristic points with the largest response function values of the first N characteristic points in each local working group, wherein the characteristic points with the largest response function values of the first N characteristic points are arranged from large to small according to the response function values;
For W/16 XH/16 local work groups, determining the characteristic point with the largest response function value of the first N in the global work group from N XW/16 XH/16 characteristic points as the characteristic point with the largest response function value of the first N in the mask image.
16. The method of claim 15, wherein for W/16 x H/16 local work groups, determining the first N feature points in the global work group with the largest response function value from the nxw/16 x H/16 feature points comprises:
For W/16 XH/16 local working groups, sequentially storing the first N characteristic points with the largest response function values in each local working group in a second shader storage buffer area;
And determining the characteristic points with the largest response function values in the first N response function values in the global working group from the N multiplied by W/16 multiplied by H/16 characteristic points stored in the second shader memory buffer.
17. The method of claim 16, wherein for W/16×h/16 local work groups, storing the first N feature points with the largest response function values in each local work group in the second shader memory buffer sequentially, includes:
For W/16 XH/16 local working groups, storing the first N characteristic points with the largest response function values in each local working group in a shared buffer area among the working groups according to the position index;
Copying the characteristic points in the shared buffer area to the corresponding positions of the second shader memory buffer area according to the position index.
18. The method according to any one of claims 1 to 17, wherein the matched pairs of feature points comprise a first feature point and a second feature point;
After the feature points to which the two feature vectors of the minimum feature distance belong are determined to be the matched feature point pairs, the method further includes:
Extracting coordinate parameters of the first feature points from mask images corresponding to the first color images;
And extracting the coordinate parameters of the second characteristic points from the mask image corresponding to the second color image.
19. An electronic device comprising a processor, a memory, and a computer program stored on the memory, the processor configured to execute the computer program to cause the electronic device to implement the method of any one of claims 1-18.
20. A chip comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor is configured to execute the computer program to implement the method of any one of claims 1 to 18.
21. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when run on an electronic device, causes the electronic device to perform the method of any one of claims 1 to 18.