Disclosure of Invention
Embodiments of the present application provide an image recognition method, apparatus, and device, which address two problems in existing detection and recognition scenarios: the high difficulty of distinguishing and recognizing 3D objects, and low training efficiency.
In a first aspect, an embodiment of the present application provides an image recognition method, where the method includes:
acquiring picture information of at least one recognition target, where each recognition target corresponds to one piece of picture information, and each piece of picture information includes color information corresponding to the recognition target;
obtaining first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target;
acquiring depth map information of the at least one recognition target, where each recognition target corresponds to one piece of depth map information;
obtaining a spatial transformation feature corresponding to each recognition target according to the picture information and the depth map information corresponding to that recognition target;
obtaining second coordinate information corresponding to each recognition target according to the first coordinate information and the spatial transformation feature corresponding to that recognition target, where the second coordinate information is applied to the depth map information; and
obtaining dimension information of the recognition target according to the second coordinate information corresponding to the recognition target.
In this way, a spatial transformation relationship is established from the picture information and the depth map information of the recognition target, so that the dimension information of the recognition target can be computed from the pixel coordinates obtained from the picture information, which reduces the difficulty of recognizing 3D targets.
In some possible embodiments, the picture information is acquired by a picture acquisition device, and the depth map information is acquired by a gray-scale map acquisition device.
In this way, the acquired picture information and depth map information are independent of each other, which facilitates the subsequent construction of the spatial transformation feature.
In some possible implementations, the first coordinate information of each recognition target in the corresponding picture information includes four pixel coordinate values of the outer contour of the recognition target in that picture information.
In this way, the rectangular region containing the outer contour of the recognition target can be obtained from the four pixel coordinate values, making it convenient to derive the second coordinate information through the spatial transformation feature.
In some possible embodiments, the spatial transformation feature is derived from a distortion correction matrix.
In some possible implementations, the second coordinate information corresponding to each recognition target includes four depth values of the outer contour of the recognition target in the corresponding gray-scale map, and the pixel coordinate values of the outer contour are in one-to-one correspondence with those depth values.
In this way, the first coordinate information can be converted into the second coordinate information through the spatial transformation feature, so that the features of the recognition target in the picture information can be represented in the gray-scale map.
In some possible embodiments, obtaining the dimension information of the recognition target according to the second coordinate information corresponding to the recognition target includes:
if the second coordinate information is greater than or equal to a preset threshold, determining that the recognition target is a three-dimensional target;
if the second coordinate information is smaller than the preset threshold, determining that the recognition target is a two-dimensional target.
In some possible embodiments, obtaining the first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target is implemented by a target detection neural network, such as an R-CNN.
In a second aspect, an embodiment of the present application further provides an image recognition apparatus, including:
a first acquisition module, configured to acquire picture information of at least one recognition target, where each recognition target corresponds to one piece of picture information, and each piece of picture information includes color information corresponding to the recognition target;
a first processing module, configured to obtain first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target;
a second acquisition module, configured to acquire depth map information of the at least one recognition target, where each recognition target corresponds to one piece of depth map information;
a feature construction module, configured to obtain a spatial transformation feature corresponding to each recognition target according to the picture information and the depth map information corresponding to that recognition target;
a second processing module, configured to obtain second coordinate information corresponding to each recognition target according to the first coordinate information and the spatial transformation feature corresponding to that recognition target, where the second coordinate information is applied to the depth map information; and
a determination module, configured to obtain dimension information of the recognition target according to the second coordinate information corresponding to the recognition target.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store executable instructions that, when executed, cause the processor to perform the image recognition method of the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium having stored therein executable instructions that, when executed, cause a computing device to perform the image recognition method of the first aspect or any of its possible implementations.
An embodiment of the present application provides a technical scheme for an image recognition method. First, picture information of at least one recognition target is acquired, where each recognition target corresponds to one piece of picture information and each piece of picture information includes color information corresponding to the recognition target. First coordinate information of each recognition target in the corresponding picture information is obtained according to that color information. Depth map information of the at least one recognition target is acquired, where each recognition target corresponds to one piece of depth map information. A spatial transformation feature corresponding to each recognition target is then obtained from the picture information and the depth map information; second coordinate information is obtained from the first coordinate information and the spatial transformation feature, where the second coordinate information is applied to the depth map information; and the dimension information of the recognition target is obtained from the second coordinate information. In this process, the color information of the recognition target is used to obtain the coordinates of the target's position in the picture information; the second spatial coordinates of the target in the gray-scale map are then computed through the spatial feature transformation between the picture information and the depth map information, and the resulting values are compared with a threshold preset for the corresponding gray-scale map, the outcome of the comparison confirming the dimension information of the recognition target. The technical scheme of the application thus converts the original process of recognizing a 3D object into a process of recognizing 2D picture information, using the picture information of the target object and the corresponding spatial feature transformation. On the one hand, this makes recognition of the target object simpler and reduces the application difficulty; on the other hand, it enables training for 3D recognition objects using 2D picture information, which improves training efficiency.
Detailed Description
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well. It should also be understood that, although the terms first, second, etc. may be used in the following embodiments to describe a class of objects, the objects are not limited by these terms; the terms are only used to distinguish one specific object from another of the same class. For example, the following embodiments may likewise use the terms first, second, etc. to describe other classes of objects, which is not repeated here.
Any of the electronic devices involved in embodiments of the present application may be, for example, a cell phone, tablet computer, wearable device (e.g., smartwatch, smart bracelet), notebook computer, desktop computer, or vehicle-mounted device, with the relevant application software pre-installed. It will be appreciated that embodiments of the present application are not limited in any way by the particular type of electronic device.
Image recognition technology refers to the technology of processing, analyzing, and understanding images with a computer in order to recognize objects of various different patterns. The recognition objects of existing image recognition technology are generally divided, by dimension, into two-dimensional (2D) objects and three-dimensional (3D) objects. In some detection and recognition scenes, a computer is generally required to recognize and distinguish 2D objects from 3D objects according to picture information.
For recognition of 2D objects, the computer is usually trained on 2D picture information combined with a corresponding learning method; this approach is mature, and the recognition process is simple. For recognition of 3D objects, depth information of the 3D object is generally acquired by multiple image acquisition devices arranged at different angles, and the corresponding spatial features of the 3D object are obtained by combining this with point cloud techniques, so that the computer can recognize the 3D object through those spatial features.
Current 3D object recognition therefore has two drawbacks. On the one hand, obtaining depth information and applying point cloud techniques is complicated, so the practical application difficulty is high; on the other hand, because the training objects are physical objects, far fewer training set resources are available than for 2D object recognition, so training efficiency is low.
The following describes, through several exemplary embodiments, the technical solutions of the embodiments of the present application and the technical effects they produce.
In a first aspect of the present application, an image recognition method is provided. Referring to fig. 1, which illustrates a flowchart of the image recognition method provided by an embodiment of the present application, the method includes the following steps:
acquiring picture information of at least one recognition target, where each recognition target corresponds to one piece of picture information, and each piece of picture information includes color information corresponding to the recognition target;
obtaining first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target;
acquiring depth map information of the at least one recognition target, where each recognition target corresponds to one piece of depth map information;
obtaining a spatial transformation feature corresponding to each recognition target according to the picture information and the depth map information corresponding to that recognition target;
obtaining second coordinate information corresponding to each recognition target according to the first coordinate information and the spatial transformation feature corresponding to that recognition target, where the second coordinate information is applied to the depth map information; and
obtaining dimension information of the recognition target according to the second coordinate information corresponding to the recognition target.
Optionally, the picture information is acquired by a picture acquisition device (e.g., a high-definition camera), and the depth map information is acquired by a depth map acquisition device (e.g., a depth camera).
Optionally, the obtained depth map information may be presented in gray-scale form. Taking a K device of a certain platform as an example, the amount of depth data obtained is the same as the number of pixels, and each value may range from 0 to 65535. Depending on the precision of the equipment in the practical application, this value range is mapped to 0 to 1 to obtain the corresponding gray-scale map.
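As an illustration of this mapping, a minimal sketch in Python is given below; the frame shape, dtype, and random placeholder data are assumptions for the example, not values fixed by the application.

```python
import numpy as np

def depth_to_gray(depth: np.ndarray, max_value: float = 65535.0) -> np.ndarray:
    """Map a raw depth frame (one value per pixel) to a gray-scale map in [0, 1]."""
    return depth.astype(np.float32) / max_value

# Placeholder 16-bit depth frame; real frames would come from the depth device.
raw_depth = np.random.randint(0, 65536, size=(480, 640), dtype=np.uint16)
gray = depth_to_gray(raw_depth)   # values now lie in the 0-to-1 range
```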
Optionally, the first coordinate information of each recognition target in the corresponding picture information includes four pixel coordinate values of the outer contour of the recognition target in that picture information.
Alternatively, the four pixel coordinate values may be the abscissa and ordinate values of two pixel points; for example, with pixel point A at coordinates (x1, y1) and pixel point B at coordinates (x2, y2), the four pixel coordinate values are x1, y1, x2, and y2.
Optionally, the four pixel coordinate values may instead be x (the abscissa of the target in the plane rectangular coordinate system), y (the ordinate of the target in the plane rectangular coordinate system), width (the horizontal width of the target), and height (the vertical height of the target).
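The two encodings describe the same rectangular region, as the following small sketch illustrates (the helper names are hypothetical, introduced only for this illustration):

```python
from typing import Tuple

def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> Tuple[float, float, float, float]:
    """Convert a corner-pair box (x1, y1), (x2, y2) to (x, y, width, height)."""
    return x1, y1, x2 - x1, y2 - y1

def xywh_to_corners(x: float, y: float, w: float, h: float) -> Tuple[float, float, float, float]:
    """Convert (x, y, width, height) back to corner-pair form."""
    return x, y, x + w, y + h

# Round-tripping a box recovers the original corner pair.
assert xywh_to_corners(*corners_to_xywh(10, 20, 110, 220)) == (10, 20, 110, 220)
```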
Optionally, the spatial transformation feature is obtained from a distortion correction matrix.
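The application does not spell out the form of this matrix. One common realization, assumed here purely for illustration, is a 3×3 matrix applied to homogeneous pixel coordinates, mapping a point in the picture information into the gray-scale map:

```python
import numpy as np

def warp_points(H: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 3x3 transformation matrix H to an Nx2 array of pixel coordinates."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    mapped = pts_h @ H.T                                    # transform each point
    return mapped[:, :2] / mapped[:, 2:3]                   # back to Cartesian

# With the identity matrix the contour coordinates are left unchanged.
H = np.eye(3)
contour = np.array([[10.0, 20.0], [110.0, 220.0]])
print(warp_points(H, contour))   # -> [[ 10.  20.] [110. 220.]]
```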
Optionally, the second coordinate information corresponding to each recognition target includes four depth values of the outer contour of the recognition target in the corresponding gray-scale map, and the pixel coordinate values of the outer contour are in one-to-one correspondence with those depth values.
Optionally, obtaining the dimension information of the recognition target according to the second coordinate information corresponding to the recognition target includes:
if the second coordinate information is greater than or equal to a preset threshold, determining that the recognition target is a three-dimensional target;
if the second coordinate information is smaller than the preset threshold, determining that the recognition target is a two-dimensional target.
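A minimal sketch of this threshold rule follows. How the sampled depth values are reduced to a single quantity for comparison (the mean is used here) is an assumption, since the application only states that the second coordinate information is compared with a preset threshold.

```python
import numpy as np

def classify_dimension(depth_values: np.ndarray, threshold: float) -> str:
    """Preset-threshold rule: at or above the threshold -> 3D, below it -> 2D."""
    return "3D" if float(np.mean(depth_values)) >= threshold else "2D"

print(classify_dimension(np.array([0.70, 0.80, 0.75, 0.72]), 0.5))  # -> 3D
print(classify_dimension(np.array([0.10, 0.20, 0.15, 0.12]), 0.5))  # -> 2D
```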
Optionally, obtaining the first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target is implemented through a target detection neural network, such as an R-CNN.
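As a hedged example of such a detector, the sketch below uses torchvision's Faster R-CNN, one member of the R-CNN family; the specific model, weights, and placeholder input are illustrative choices, not requirements of the application.

```python
import torch
import torchvision

# Pretrained Faster R-CNN detector from torchvision (weights choice is illustrative).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)            # placeholder RGB image, values in [0, 1]
with torch.no_grad():
    detections = model([image])[0]         # dict with 'boxes', 'labels', 'scores'

boxes = detections["boxes"]                # Nx4 tensor of (x1, y1, x2, y2) boxes
```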
For example, suppose a company needs to count a specific beverage on a shelf (i.e., three-dimensional recognition objects) but must distinguish these from beverage-bottle pictures on a billboard placed near the shelf (i.e., two-dimensional recognition objects), so as to avoid interference during the count; that is, the two-dimensional information and three-dimensional information of the recognition targets must be told apart.
First, the picture acquisition device and the gray-scale map acquisition device may be used to obtain color information and depth map information for the beverage-bottle picture on the billboard, yielding the color information and depth map information corresponding to the two-dimensional recognition target.
Next, by training on the color information in combination with a corresponding neural network (for example, a neural network that can locate the beverage bottle in the picture according to its color information), the pixel coordinates of the outer contour of the beverage bottle in the billboard (i.e., first coordinate information), or the four vertex coordinates of the rectangular frame around the bottle's region in the billboard (which may also serve as first coordinate information), are obtained.
From the obtained picture information and depth map information, a distortion correction matrix (i.e., the spatial transformation feature) corresponding to the beverage bottle in the billboard is obtained; through this matrix, the pixel-by-pixel correspondence between the recognition object in the picture information (i.e., the beverage bottle in the billboard) and the gray values in the gray-scale map can be established.
Further, by acquiring the depth map information of the two-dimensional recognition target (including the second coordinate information and the corresponding depth values), a threshold for distinguishing the dimension information of recognition targets can be derived and set: when the depth values of a recognition target are smaller than the threshold, the target is determined to be two-dimensional (i.e., its dimension information is a two-dimensional target), and when they are greater than or equal to the threshold, the target is determined to be three-dimensional (i.e., its dimension information is a three-dimensional target). In this process, no multiple depth image acquisition devices or point cloud methods are needed to distinguish two-dimensional from three-dimensional recognition targets, which simplifies the distinction. Moreover, the process yields a training set for the two-dimensional target, which can be used to train the system to recognize two-dimensional targets.
Likewise, the three-dimensional recognition target (i.e., the beverage bottles on the shelf) is recognized by the same process, which is not repeated here. Finally, a training set for the three-dimensional target is obtained; because the three-dimensional target is represented by picture information and gray-scale picture information, this training set is also a picture set. Training for the three-dimensional target can therefore be carried out with a picture set (a two-dimensional training set), which improves training efficiency. A sketch tying the walk-through together is given below.
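The following hedged end-to-end sketch chains the pieces sketched above (warp_points and classify_dimension from the earlier snippets); detect_bottles and the threshold value are assumptions introduced only for illustration.

```python
import numpy as np

def recognize_dimensions(color_img, gray_depth, H, threshold=0.5):
    """Classify each detected target as 2D or 3D via the gray-scale depth map."""
    boxes = detect_bottles(color_img)            # hypothetical detector: first coordinates
    results = []
    for (x1, y1, x2, y2) in boxes:
        # Map the two box corners into the gray-scale map (second coordinates).
        (u1, v1), (u2, v2) = warp_points(H, np.array([[x1, y1], [x2, y2]])).astype(int)
        patch = gray_depth[v1:v2, u1:u2]         # depth values inside the mapped box
        results.append(classify_dimension(patch, threshold))
    return results
```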
To summarize, the technical scheme of this embodiment obtains first coordinate information from the color information in the picture information, computes the second spatial coordinates of the recognition target in the gray-scale map through the spatial feature transformation between the picture information and the depth map information, and compares the result with the threshold preset for the corresponding gray-scale map to confirm the dimension information of the recognition target. The original process of recognizing a 3D object is thus converted into a process of recognizing 2D picture information. On the one hand, this makes recognition of the target object simpler and reduces the application difficulty; on the other hand, it enables training for 3D recognition objects using 2D picture information, which improves training efficiency.
The above embodiments describe implementations of the image recognition method provided by the embodiments of the present application in terms of acquiring picture information of at least one recognition target, obtaining first coordinate information, acquiring depth map information of the at least one recognition target, obtaining the spatial transformation feature, obtaining second coordinate information, obtaining dimension information of the target, and so on. It should be understood that these functions may be implemented in the embodiments of the present application in hardware, or in a combination of hardware and computer software. Whether a function is implemented as hardware or as computer-software-driven hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
For example, if the above implementation steps are implemented by software modules, each module implements the corresponding function. As shown in fig. 2, the image recognition apparatus may include a first acquisition module, a first processing module, a second acquisition module, a feature construction module, a second processing module, and a determination module. The image recognition apparatus may be configured to perform part or all of the operations of the image recognition method described above.
For example:
the first acquisition module is configured to acquire picture information of at least one recognition target, where each recognition target corresponds to one piece of picture information, and each piece of picture information includes color information corresponding to the recognition target;
the first processing module is configured to obtain first coordinate information of each recognition target in the corresponding picture information according to the color information corresponding to the recognition target;
the second acquisition module is configured to acquire depth map information of the at least one recognition target, where each recognition target corresponds to one piece of depth map information;
the feature construction module is configured to obtain a spatial transformation feature corresponding to each recognition target according to the picture information and the depth map information corresponding to that recognition target;
the second processing module is configured to obtain second coordinate information corresponding to each recognition target according to the first coordinate information and the spatial transformation feature corresponding to that recognition target, where the second coordinate information is applied to the depth map information; and
the determination module is configured to obtain dimension information of the recognition target according to the second coordinate information corresponding to the recognition target.
It can be seen that, in this way, the apparatus implements the technical scheme of the image recognition method described above: recognition of a 3D object is converted into recognition of 2D picture information through the spatial feature transformation, which reduces the application difficulty and improves training efficiency.
It will be appreciated that the functions of the above modules may be integrated into a hardware entity: for example, the acquisition modules may be integrated into a transceiver, the processing, feature construction, and determination modules may be integrated into a processor, and the programs and instructions implementing the functions of these modules may be maintained in a memory. As shown in fig. 3, an electronic device is provided that includes a processor, a transceiver, and a memory, where the transceiver performs the acquisition of picture information and depth map information in the image recognition method, and the memory stores the code to be executed by the processor. When the processor executes the code stored in the memory, the electronic device is caused to perform part or all of the operations of the image recognition method described above.
The specific procedures are described in detail in the method embodiments above and are not repeated here.
In a specific implementation, corresponding to the foregoing electronic device, an embodiment of the present application further provides a computer storage medium. The computer storage medium provided in the electronic device may store a program, and when the program is executed, part or all of the steps in each embodiment of the image recognition method described above may be implemented. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
One or more of the above modules or units may be implemented in software, hardware, or a combination of both. When any of the above modules or units is implemented in software, the software exists in the form of computer program instructions stored in a memory, and a processor can be used to execute the program instructions to implement the above method flows. The processor may include, but is not limited to, at least one of a central processing unit (CPU), microprocessor, digital signal processor (DSP), microcontroller unit (MCU), or artificial intelligence processor, each of which may include one or more cores for executing software instructions to perform operations or processing. The processor may be built into a system on a chip (SoC) or an application-specific integrated circuit (ASIC), or may be a separate semiconductor chip. In addition to the cores for executing software instructions, the processor may further include necessary hardware accelerators, such as field programmable gate arrays (FPGAs), programmable logic devices (PLDs), or logic circuits implementing dedicated logic operations.
When the above modules or units are implemented in hardware, the hardware may be any one or any combination of a CPU, microprocessor, DSP, MCU, artificial intelligence processor, ASIC, SoC, FPGA, PLD, dedicated digital circuit, hardware accelerator, or non-integrated discrete device, which may run the necessary software or operate independently of software to perform the above method flows.
Further, a bus interface may be included in fig. 3. The bus architecture may include any number of interconnected buses and bridges, linking together various circuits, in particular one or more processors represented by the processor and memories represented by the memory. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface. The transceiver provides a means for communicating with various other apparatus over a transmission medium. The processor is responsible for managing the bus architecture and general processing, and the memory may store the data used by the processor in performing operations.
When the above modules or units are implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments.
The various parts of this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively simply because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments.
While alternative embodiments of the present application have been described, those skilled in the art may make additional variations and modifications to these embodiments once they become aware of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the alternative embodiments and all such alterations and modifications as fall within the scope of the application.
The foregoing embodiments are provided only to illustrate the general principles of the present application in further detail; they are not intended to limit the scope of the application, and any modifications, equivalent replacements, improvements, and the like made on the basis of the teachings of the application shall fall within its protection scope.