CN114169497A - Feature extraction method and device based on convolutional neural network model - Google Patents
Feature extraction method and device based on convolutional neural network model
- Publication number
- CN114169497A (application number CN202111395422.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- coordinate system
- processed
- regular hexagon
- pixel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application provides a feature extraction method and device based on a convolutional neural network model. The method comprises the following steps: receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system; extracting feature data in the image to be processed by using a convolutional neural network model with a hexagonal convolution kernel and a hexagonal coordinate system, and outputting the feature data. The coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, and each pixel point in the regular hexagon pixel coordinate system is a regular hexagon. Because the distances from the central pixel point in the hexagonal convolution kernel to the surrounding pixel points are all the same, the feature data provided by the surrounding pixel points have the same influence on the central pixel point in the convolution calculation, which greatly improves the reliability of the convolution calculation result, improves its accuracy, and reduces the amount of computation required by the convolution calculation.
Description
Technical Field
The application relates to the field of computers, in particular to a feature extraction method and device based on a convolutional neural network model.
Background
The convolutional neural network model has a distinctive weight-sharing structure, which reduces the network scale and makes training easier. Convolutional neural network models are therefore widely used in the field of image processing, such as, but not limited to: face recognition, gesture recognition, traffic sign recognition, commodity recognition, and the like.
In a convolutional neural network model, the convolution operation is performed by sliding a convolution kernel over the image, and the feature of each pixel point is obtained one by one by sampling the pixel point together with its surrounding pixel points (the pixel value of a point is replaced by a weighted average of the pixel values of the surrounding points), as shown in fig. 1.
In a conventional convolutional neural network, each pixel in a picture is generally treated as a square cell, and the size, number and sliding stride of the corresponding convolution kernels are selected according to specific requirements in order to extract features from the picture. However, in the convolution calculation the result is a weighted average of the surrounding points with respect to the center point, and the closer a point is, the higher its weight. Most conventional convolution kernels have a square 3 × 3 nine-grid structure (such as the convolution kernel shown in fig. 2), in which the pixels on the diagonals receive relatively small weights, so the feature information they provide has little influence on the convolution calculation; this affects the reliability of the convolution calculation result and makes its accuracy difficult to guarantee. Therefore, it is desirable to provide a technical solution to overcome the above technical problems.
Disclosure of Invention
One of the technical problems to be solved by the present application is to provide a feature extraction method and apparatus based on a convolutional neural network model, so as to improve the reliability and accuracy of a convolutional calculation result.
According to an embodiment of the first aspect of the present application, there is provided a feature extraction method based on a convolutional neural network model, including:
receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting feature data in the image to be processed by using a convolution neural network model with a hexagonal convolution kernel, and outputting the feature data;
the pixel point distance method comprises the following steps that a coordinate system in the convolution neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and distances from a center pixel point in a hexagonal convolution kernel to all surrounding pixel points are the same.
According to an embodiment of the second aspect of the present application, there is provided a feature extraction apparatus based on a convolutional neural network model, including:
the receiving unit is used for receiving the image to be processed and placing the image to be processed in a regular hexagon pixel coordinate system;
the image processing unit is used for extracting the characteristic data in the image to be processed by utilizing a convolution neural network model with a hexagonal convolution kernel and outputting the characteristic data;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
According to an embodiment of the third aspect of the present application, there is provided a face recognition method, including:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting face feature data in the image to be processed by using a face recognition model with a hexagonal convolution kernel;
identifying according to the face feature data to obtain a face identification result matched with the object to be identified;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
According to an embodiment of the fourth aspect of the present application, there is provided a gesture recognition method including:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting attitude characteristic data in the image to be processed by utilizing an attitude identification model with a hexagonal convolution kernel;
recognizing according to the posture characteristic data to obtain a posture type matched with the object to be recognized;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
According to an embodiment of the fifth aspect of the present application, there is provided a road condition identification method, including:
receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system, wherein the image to be processed comprises a road image and an environment image;
extracting road condition characteristic data in the image to be processed by using a road condition identification model with a hexagonal convolution kernel;
identifying according to the road condition characteristic data to obtain a road condition identification result;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
According to an embodiment of the sixth aspect of the present application, there is provided an electronic device, which includes a processor and a memory, wherein the memory stores executable codes, and when the executable codes are executed by the processor, the processor is enabled to implement at least the feature extraction method based on the convolutional neural network model in the first aspect.
According to an embodiment of the seventh aspect of the present application, there is provided a computer-readable storage medium, wherein the instructions stored thereon, when executed by an electronic device, enable the electronic device to perform at least the feature extraction method based on the convolutional neural network model of the first aspect.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the convolutional neural network model-based feature extraction method in the first aspect.
In the embodiment of the application, for the image to be processed, the convolutional neural network model with the hexagonal convolutional kernel is utilized to extract the feature data in the image to be processed, and the feature data is output. Because the distances from the central pixel point to the surrounding pixel points in the hexagonal convolution kernel are the same, the influence of the characteristic data provided by the surrounding pixel points on the central pixel point in the convolution calculation is also the same, so that the problem of low reliability of the convolution calculation result caused by the fact that the characteristics of the surrounding pixel points cannot be accurately acquired is effectively solved, the reliability of the convolution calculation result is greatly improved, and the accuracy of the convolution calculation result is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of a convolutional neural network model in the related art.
Fig. 2 is a schematic diagram of the principle of a convolution kernel in the related art.
Fig. 3 is a flowchart illustrating a feature extraction method based on a convolutional neural network model according to an embodiment of the present application.
Fig. 4 to 7 are schematic diagrams of the principle of a hexagonal convolution kernel according to an embodiment of the present application.
Fig. 8 is a schematic diagram of the principle of a convolutional neural network model in the related art.
FIG. 9 is a schematic diagram of a convolution kernel transformation process according to one embodiment of the present application.
FIG. 10 is a schematic diagram of a pooling layer according to one embodiment of the present application.
FIG. 11 is a schematic diagram of an active layer according to one embodiment of the present application.
FIG. 12 is a schematic diagram of the convolution calculation principle according to one embodiment of the present application.
FIG. 13 is a schematic diagram of the convolution calculation principle according to another embodiment of the present application.
Fig. 14 is a schematic structural diagram of a feature extraction device based on a convolutional neural network model according to an embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The computer equipment comprises user equipment and network equipment. Wherein the user equipment includes but is not limited to computers, smart phones, PDAs, etc.; the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of computers or network servers, wherein Cloud Computing is one of distributed Computing, a super virtual computer consisting of a collection of loosely coupled computers. The computer equipment can be independently operated to realize the application, and can also be accessed into a network to realize the application through the interactive operation with other computer equipment in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present application, if applicable, and are included by reference.
The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present application. This application may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent to", etc.) should be interpreted in a similar manner.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The convolutional neural network model has a distinctive weight-sharing structure, which reduces the network scale and makes training easier. Convolutional neural network models are therefore widely used in the field of image processing, such as, but not limited to: face image processing, gesture recognition, traffic sign recognition, voice recognition, and the like.
In a convolutional neural network model, the convolution operation is performed by sliding a convolution kernel over the image, and the feature of each pixel point is obtained one by one by sampling the pixel point together with its surrounding pixel points (the pixel value of a point is replaced by a weighted average of the pixel values of the surrounding points), as shown in fig. 1.
In a conventional convolutional neural network, each pixel in a picture is generally treated as a square cell, and the size, number and sliding stride of the corresponding convolution kernels are selected according to specific requirements in order to extract features from the picture. In the convolution calculation, the result is a weighted average of the surrounding points with respect to the center point, and the closer a point is, the higher its weight.
Specifically, since all points in the real world are continuous, each point is correlated with its surrounding points, and the closer a surrounding point is, the larger its influence on that point. Therefore, in the convolution calculation, the result is a weighted average of the surrounding points with respect to the center point, and the closer a point is, the higher its weight. Taking the pixel points shown in fig. 2 as an example, assume that the pixel points participating in the convolution calculation include the central pixel point (the center point for short), the pixel points in the four directions above, below, to the left and to the right of the center point, and the four pixel points on the diagonals. In fig. 2, the distance from point A to the center point is smaller than the distance from point B to the center point. That is, the four pixel points in the diagonal directions have much smaller influence on the center point than the four pixel points in the up, down, left and right directions. During the convolution calculation, the weights of the four pixel points on the diagonals are relatively small, so the feature information provided by these four points has little influence on the convolution calculation.
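This distance argument can be checked with a few lines of arithmetic. The sketch below (plain Python, purely illustrative and not part of the claimed method) compares neighbour distances on a square pixel lattice with those on a regular hexagonal lattice having the same centre spacing:

```python
import math

# Square lattice, unit pixel pitch: the 8 neighbours of a centre pixel.
square_neighbours = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0)]
print(sorted({round(math.hypot(dx, dy), 4) for dx, dy in square_neighbours}))
# [1.0, 1.4142] -> the four diagonal neighbours are ~41% farther away,
# so they contribute less to a distance-based weighted average.

# Hexagonal lattice, unit centre spacing: the 6 neighbours of a centre pixel.
hex_neighbours = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
                  for k in range(6)]
print(sorted({round(math.hypot(x, y), 4) for x, y in hex_neighbours}))
# [1.0] -> all six neighbours are equidistant from the centre.
```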
In summary, in a conventional convolutional neural network the amount of feature information collected from the surrounding points for the center point is limited, which affects the reliability of the convolution calculation result and makes its accuracy difficult to guarantee.
In view of at least one of the above technical problems, the present application provides a feature extraction method and apparatus based on a convolutional neural network model, so as to improve the reliability and accuracy of a convolutional calculation result.
The core principle of the technical scheme is as follows: and for the image to be processed, extracting the characteristic data in the image to be processed by utilizing a convolution neural network model with a hexagonal convolution kernel, and outputting the characteristic data. The coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, and each pixel point in the regular hexagon pixel coordinate system is a regular hexagon. Because the distances from the central pixel point to the surrounding pixel points in the hexagonal convolution kernel are the same, the influence of the characteristic data provided by the surrounding pixel points on the central pixel point in the convolution calculation is also the same, so that the problem of low reliability of the convolution calculation result caused by the fact that the characteristics of the surrounding pixel points cannot be accurately acquired is effectively solved, the reliability of the convolution calculation result is greatly improved, and the accuracy of the convolution calculation result is improved.
Based on the core principle, the technical scheme provided by the disclosure can be applied to various image processing scenes. The image to be processed includes, but is not limited to, a picture and a video. The various image processing scenes include, for example, an advertisement material production scene, a post-production scene of a movie or television work, and a short video production scene.
The feature extraction scheme provided by the present disclosure may be executed by an electronic device, which may be a terminal device such as a smartphone, a tablet, a PC, a notebook, etc. In an optional embodiment, the electronic device may also be implemented as a service device, for example a server cluster or a cloud server, for executing the technical solution provided in the present application.
After the core principle, the application scenario, and the execution device are introduced, a method, an apparatus, a device, and a medium provided by the present disclosure are described below with reference to specific embodiments.
Fig. 3 is a flowchart illustrating a feature extraction method based on a convolutional neural network model according to an exemplary embodiment. It will be appreciated that the method may be used in a variety of image processing scenarios employing convolutional neural network models. As shown in fig. 3, the method comprises the steps of:
in 301, an image to be processed is received and placed in a regular hexagonal pixel coordinate system.
For example, the image to be processed may be one or more pictures, a video, or a video stream. In fact, the image to be processed may be acquired according to a specific scene and input into the neural network model, and the relevant parameters and the specific acquisition mode of the image to be processed are not limited in this application.
Further, in 301, after receiving the image to be processed, the image to be processed is placed in the regular hexagonal pixel coordinate system. Specifically, in the process of extracting the image to be processed, the image to be processed is input into the coordinate system of the regular hexagonal pixels.
In the embodiment of the present application, the coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and distances from a central pixel point to surrounding pixel points in a hexagonal convolutional kernel are the same.
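By way of illustration only (the following sketch, its layout and its function name are assumptions of this description, not part of the claimed method), placing an image in a regular hexagon pixel coordinate system can be pictured as assigning each pixel the Cartesian centre of a regular hexagon in an offset-row layout:

```python
import math
import numpy as np

def hex_pixel_centers(rows, cols, size=1.0):
    """Cartesian centres of a rows x cols grid of regular hexagonal pixels.

    Pointy-top hexagons with circumradius `size`; odd rows are shifted half a
    column to the right (one of several equivalent arrangements, cf. fig. 5).
    """
    col_pitch = math.sqrt(3) * size      # horizontal distance between columns
    row_pitch = 1.5 * size               # vertical distance between rows
    centers = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            x = c * col_pitch + (col_pitch / 2 if r % 2 else 0.0)
            y = r * row_pitch
            centers[r, c] = (x, y)
    return centers

# Every pair of adjacent hexagon centres is sqrt(3)*size apart, including
# centres in neighbouring rows:
centers = hex_pixel_centers(3, 3)
d_same_row = np.linalg.norm(centers[1, 1] - centers[1, 2])
d_next_row = np.linalg.norm(centers[1, 1] - centers[2, 1])
print(round(d_same_row, 4), round(d_next_row, 4))  # both ~1.7321 == sqrt(3)
```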
Further, in 302, feature data in the image to be processed is extracted using a convolutional neural network model having a hexagonal convolutional kernel, and the feature data is output.
It can be understood that the essence of the present application is to use a hexagonal convolution kernel in place of a conventional square convolution kernel (such as a convolution kernel with a 3 × 3 nine-grid structure) to execute the various computation flows in a convolutional neural network that involve the convolution kernel, so as to retain more feature information, improve the credibility and accuracy of the convolution computation, and at the same time reduce the computational complexity of the convolution kernel and improve the computation efficiency.
Since the real world is essentially continuous and there are no actual boundaries between objects, the shapes that can tile the entire plane while remaining symmetric are the equilateral triangle, the square and the regular hexagon. Based on this, and in view of the defects of the square structure described above, a regular hexagon can be used in practical applications as the basic structure of both the pixel points of the image to be processed and the convolution kernel. A convolution kernel with a regular hexagonal structure is referred to herein as a hexagonal convolution kernel.
In the embodiment of the application, the distances from the central pixel point to all the surrounding pixel points in the hexagonal convolution kernel are the same. Specifically, in the case of using a regular hexagon as a pixel structure for forming an image, assuming that a central pixel remains unchanged, a convolution kernel is formed by a central pixel and six pixels having the same distance to the central pixel, as shown in fig. 4.
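Under the same offset-row layout assumed in the sketch above, the footprint of such a kernel around pixel (r, c) can be written as six index offsets that depend on the parity of the row. The offset tables below are an illustrative assumption and would be permuted for the other arrangements of fig. 5:

```python
# Six neighbours of pixel (r, c) when odd rows are shifted half a column
# to the right (illustrative assumption, matching the layout sketched above).
NEIGHBOURS_EVEN_ROW = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
NEIGHBOURS_ODD_ROW  = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]

def hex_kernel_footprint(r, c):
    """Indices of the 7 pixels (centre + 6 neighbours) covered by the kernel."""
    offsets = NEIGHBOURS_ODD_ROW if r % 2 else NEIGHBOURS_EVEN_ROW
    return [(r, c)] + [(r + dr, c + dc) for dr, dc in offsets]

print(hex_kernel_footprint(2, 2))  # 7 pixel positions with equidistant centres
```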
It can be understood that fig. 5 provides five pixel point arrangement structures of the image to be processed. In fig. 5, it is assumed that the to-be-processed image with square pixel points is the pixel arrangement structure a in fig. 5, and accordingly, when a regular hexagon is used as the pixel point structure of the to-be-processed image, the arrangement structure of the pixel points in the to-be-processed image can be as shown in the pixel arrangement structure b-e illustrated in fig. 5.
The pixel arrangement structure in the regular hexagon pixel coordinate system comprises any one or more of the following structures: a left-side transverse arrangement structure, a right-side transverse arrangement structure, a first longitudinal arrangement structure and a second longitudinal arrangement structure.
For example, in fig. 5, the left-side horizontal arrangement is shown as pixel arrangement b, the right-side horizontal arrangement is shown as pixel arrangement c, the first vertical arrangement is shown as pixel arrangement d, and the second vertical arrangement is shown as pixel arrangement e.
And the weights of all the surrounding pixel points in the hexagonal convolution kernel are the same relative to the central pixel point. Therefore, the image characteristics obtained by performing convolution calculation by using the convolution kernel have higher credibility and accuracy.
In addition, in the convolution calculation process, the hexagonal convolution kernel needs to calculate 7 pixel points, and the square convolution kernel needs to calculate at least 9 (namely 3 × 3) pixel points, so that the complexity of the convolution kernel can be greatly reduced through the hexagonal convolution kernel, and the calculation efficiency of the convolution neural network is improved.
Optionally, before extracting the feature data in the image to be processed in 302, row pixels and/or column pixels with a hexagonal structure are filled around edge pixel points of the image to be processed. In short, a preset number of filling layers (pad) are added to the outer layer of the image to be processed, so that the image with the initial size is obtained after the convolution calculation.
For example, it is assumed that the pixel structure of the image to be processed is as shown in the pixel arrangement structure b in fig. 5. One pad may be added to the outer layer of the image to be processed, that is, 0 may be filled outside the pixel arrangement structure b in fig. 5 as shown in fig. 6. Of course, in practical applications, the convolution kernel of each row may also be replaced, as shown in fig. 7.
It should be noted that the hexagonal convolution kernel referred to in the present application may be obtained by transforming an existing 3 × 3 square convolution kernel. A specific procedure may be to fill the extra two spaces on the sides of the 3 × 3 square convolution kernel with 0 to obtain a regular hexagon. The specific transformation process is shown in fig. 9. The calculation of the hexagonal convolution kernel can be realized on the basis of the existing convolution kernel through the transformation mode, so that the reconstruction difficulty of the existing neural network is reduced.
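Assuming the same offset-row layout as in the sketches above, one possible way to realize such a transformed kernel in code is a 3 × 3 convolution in which two of the nine positions are held at zero, the two zeroed positions alternating with the row parity. The NumPy sketch below is an assumption of this description, not the patent's reference implementation:

```python
import numpy as np

# Hexagonal neighbour directions expressed as (row, col) offsets in an
# offset-row layout where odd rows are shifted half a pixel to the right
# (an illustrative assumption; other layouts from fig. 5 permute these tables).
HEX_OFFSETS_EVEN = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
HEX_OFFSETS_ODD  = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]

def hex_conv2d(image, w_center, w_neighbours):
    """One hexagonal convolution step: 1 centre weight + 6 neighbour weights.

    Equivalent to a 3 x 3 convolution in which two of the nine positions are
    fixed to zero, the two zeroed positions alternating with the row parity.
    """
    H, W = image.shape
    padded = np.pad(image, 1)                      # zero "pad" layer at the edges
    out = np.zeros((H, W), dtype=float)
    for r in range(H):
        offsets = HEX_OFFSETS_ODD if r % 2 else HEX_OFFSETS_EVEN
        for c in range(W):
            acc = w_center * padded[r + 1, c + 1]
            for (dr, dc), wk in zip(offsets, w_neighbours):
                acc += wk * padded[r + 1 + dr, c + 1 + dc]
            out[r, c] = acc
    return out

# Example: uniform averaging over the 7 hexagonal taps.
img = np.arange(25, dtype=float).reshape(5, 5)
res = hex_conv2d(img, 1.0 / 7, [1.0 / 7] * 6)
print(res.shape)                                   # (5, 5): padding preserves size
```

In this form the per-pixel cost is 7 multiply–accumulate operations instead of 9, which is the reduction in computation referred to above.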
In the embodiment of the present application, optionally, in any convolutional layer using the hexagonal convolutional kernel, the output size of the convolutional layer is one third of the input size.
Specifically, adjacent pixel points in the image to be processed generally tend to have similar values, and therefore adjacent output pixel points of a convolutional layer also tend to have similar values. This means that much of the information contained in the convolutional layer output is redundant. For this reason, the effective feature values of the image to be processed can be extracted through a pooling layer (pooling); when a conventional convolution kernel is used, the parameters W and H become 1/2 W and 1/2 H respectively after each pooling, as shown in fig. 8.
When the hexagonal structure is used as the pixel structure and the convolution kernel structure of the image to be processed, three hexagonal pixel points are merged into one hexagonal pixel point after pooling, as shown in fig. 10, so that the parameters obtained after pooling are 1/3 of the original parameters, as shown in fig. 11.
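fig. 10 does not fix a unique grouping, so the following NumPy sketch makes an illustrative assumption: each non-overlapping triangle of three mutually adjacent hexagons (two such triangles per 2 × 3 block of the offset-row layout used above) is pooled into one value, which reproduces the one-third size reduction:

```python
import numpy as np

def hex_pool(image, reduce=np.mean):
    """Pool three mutually adjacent hexagonal pixels into one value.

    Assumes the offset-row layout used above, H divisible by 2 and W by 3
    (an illustrative assumption about the exact tiling shown in fig. 10).
    Output shape: (H // 2, 2 * W // 3), i.e. one third of the input size.
    """
    H, W = image.shape
    out = np.zeros((H // 2, 2 * (W // 3)), dtype=float)
    for j in range(H // 2):
        for k in range(W // 3):
            r, c = 2 * j, 3 * k
            # Two triangles of three adjacent hexagons inside each 2x3 block.
            tri_a = [image[r, c], image[r, c + 1], image[r + 1, c]]
            tri_b = [image[r + 1, c + 1], image[r + 1, c + 2], image[r, c + 2]]
            out[j, 2 * k] = reduce(tri_a)
            out[j, 2 * k + 1] = reduce(tri_b)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
print(hex_pool(img).shape)   # (3, 4): 36 pixels pooled down to 12 (one third)
```

With this grouping an H × W input becomes an (H/2) × (2W/3) output, i.e. one third of the input size, consistent with the behaviour described for the pooling layer.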
In practical applications, in the embodiment of the present application, up-sampling (upsampling) needs to be performed when the image to be processed is to be restored from the feature values to a specified resolution. Taking a to-be-processed picture of size (544, 544, 3) as an example, a feature map of size (17, 17, 16) can be obtained after a series of convolution and pooling operations as described above. In this case, in order to compare the feature map with the original to-be-processed picture, the feature map needs to be upsampled. Here, one pixel point can be added to the pixel points of the regular hexagonal structure according to the information of the three surrounding pixel points. This upsampling process can be regarded as the inverse of pooling.
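As a rough illustration of this "inverse of pooling" relationship, the sketch below expands each pooled value back to the three hexagonal pixel positions of the grouping assumed in the pooling sketch above. Note that the embodiment describes interpolating each added pixel from three surrounding pixels; the nearest-neighbour copy used here is only a simplifying assumption to show the 1-to-3 size relationship:

```python
import numpy as np

def hex_unpool(pooled, H, W):
    """Expand each pooled value back to three hexagonal pixels.

    Nearest-neighbour upsampling over the same 2x3-block grouping assumed in
    the pooling sketch above (illustrative assumption only).
    """
    out = np.zeros((H, W), dtype=float)
    for j in range(H // 2):
        for k in range(W // 3):
            r, c = 2 * j, 3 * k
            out[r, c] = out[r, c + 1] = out[r + 1, c] = pooled[j, 2 * k]
            out[r + 1, c + 1] = out[r + 1, c + 2] = out[r, c + 2] = pooled[j, 2 * k + 1]
    return out

print(hex_unpool(np.ones((3, 4)), 6, 6).shape)   # (6, 6): three times larger
```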
In addition, in a conventional convolutional neural network, after the image to be processed passes through a convolutional layer, an activation layer and a pooling layer, its width and height each become one half of the original. In the convolutional neural network with the hexagonal convolution kernel, after the image to be processed passes through a convolutional layer, an activation layer and a pooling layer, one of its dimensions becomes two thirds of the original and the other becomes one half, so that the overall size becomes one third. Up-sampling is the reverse: the two dimensions are enlarged by factors of three halves and two, respectively. The above trend is shown in fig. 12.
In practical applications, the activation layer may be implemented using a Rectified Linear Unit (ReLU). The ReLU, also called a rectified linear unit, is an activation function commonly used in neural networks, and generally refers to the nonlinear functions represented by the ramp function and its variants.
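For reference, a minimal sketch of the ramp function that ReLU denotes (elementwise max(0, x)):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: the ramp function max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
```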
In practical applications, according to the hexagonal pixel arrangement structures described above, four coordinate systems with different orientations, corresponding to the arrangement structures shown as pixel arrangement structures b to e in fig. 5, can be obtained. A corresponding coordinate system can be selected for the convolution calculation according to actual requirements.
For example, in actual processing, a pixel point may be added according to the feature information of its three surrounding pixel points, so that the intermediate processing result diagram in the middle of fig. 13 is obtained from the leftmost original drawing in fig. 13. Further, the intermediate processing result diagram is rotated by 90° according to the coordinate system corresponding to the original, and the single-row points are shifted backward, thereby obtaining the final result diagram (i.e., the diagram located at the rightmost position in fig. 13).
By the feature extraction method provided by fig. 3, the distances from the central pixel point to the surrounding pixel points in the hexagonal convolution kernel are the same, so that the feature data provided by the surrounding pixel points have the same influence on the central pixel point in the convolution calculation, the problem of low reliability of the convolution calculation result due to the fact that the characteristic information amount of the surrounding points acquired by the central point is small is effectively solved, the reliability of the convolution calculation result is greatly improved, and the accuracy of the convolution calculation result is improved.
In the embodiment of the present application, the feature data obtained by the feature extraction method described above is applied to any one of the following image processing scenes: face image processing, attitude image processing, road condition image processing, commodity image processing and scene image processing.
Of course, these scenarios are merely examples, and in fact, this feature extraction method may also be applied to other scenarios, which are not limited herein. Three feature extraction methods are described below by taking three scenes as examples.
In an optional embodiment, a face recognition method is further provided, where the method includes:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system; extracting face feature data in the image to be processed by using a face recognition model with a hexagonal convolution kernel; and identifying according to the face feature data to obtain a face identification result matched with the object to be identified.
In the method, a coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and distances from a central pixel point to all surrounding pixel points in a hexagonal convolutional kernel are the same.
Here, the face recognition model having a hexagonal convolution kernel is obtained by training based on a convolution neural network model having a hexagonal convolution kernel according to the above method, and an output result of the face recognition model is a category of the recognized face.
The face recognition model can be used for recognizing a plurality of input images to be processed, so that the faces in the images to be processed are recognized and classified.
In another alternative embodiment, a gesture recognition method is further provided, and the method includes:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system; extracting attitude characteristic data in the image to be processed by utilizing an attitude identification model with a hexagonal convolution kernel; and identifying according to the attitude characteristic data to obtain the attitude type matched with the object to be identified.
In the method, a coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and distances from a central pixel point to all surrounding pixel points in a hexagonal convolutional kernel are the same.
Here, the posture recognition model having the hexagonal convolution kernel is trained based on the convolution neural network model having the hexagonal convolution kernel according to the above method, and the output result of the posture recognition model is the category of the recognized posture.
The gesture recognition model can be used for recognizing a plurality of input images to be processed, so that gestures in the images to be processed are recognized and classified.
In another optional embodiment, a traffic identification method is further provided, where the method includes:
receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system, wherein the image to be processed comprises a road image and an environment image; extracting road condition characteristic data in the image to be processed by using a road condition identification model with a hexagonal convolution kernel; and identifying according to the road condition characteristic data to obtain a road condition identification result.
The coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from a central pixel point in the hexagonal convolution kernel to all surrounding pixel points are the same.
Here, the road condition identification model with a hexagonal convolution kernel is obtained by training based on a convolution neural network model with a hexagonal convolution kernel according to the above method, and an output result of the road condition identification model is the identified road condition category.
The road condition identification model can identify a plurality of input images to be processed, so that the road conditions in the images to be processed are identified and classified. For example, the current road condition is in a congested state, or in an idle state.
The embodiment of the present application further provides a feature extraction device based on a convolutional neural network model, which corresponds to the feature extraction method based on a convolutional neural network model described above and performs image processing by using a convolutional neural network model based on a hexagonal convolution kernel. For the description of the convolutional neural network model with a hexagonal convolution kernel, reference may be made to the foregoing embodiments, and details are not repeated here. As shown in fig. 14, which is a schematic structural diagram of the apparatus, the apparatus mainly includes:
and the receiving unit 140 is configured to receive the image to be processed and place the image to be processed in the regular hexagonal pixel coordinate system. The receiving unit 140 receives an image to be processed, which may be a picture or a video, for example, and needs to be processed.
And the image processing unit 141 is configured to extract feature data in the image to be processed by using a convolutional neural network model with a hexagonal convolutional kernel, and output the feature data.
The coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from a central pixel point in a hexagonal convolutional kernel to all surrounding pixel points are the same.
In the embodiment of the application, the image to be processed is received and is placed in a regular hexagonal pixel coordinate system. And then, extracting the characteristic data in the image to be processed by using a convolution neural network model with a hexagonal convolution kernel, and outputting the characteristic data. The coordinate system in the convolutional neural network model is a regular hexagon pixel coordinate system, and each pixel point in the regular hexagon pixel coordinate system is a regular hexagon. Because the distances from the central pixel point to the surrounding pixel points in the hexagonal convolution kernel are the same, the influence of the characteristic data provided by the surrounding pixel points on the central pixel point in the convolution calculation is also the same, so that the problem of low reliability of the convolution calculation result caused by the fact that the characteristics of the surrounding pixel points cannot be accurately acquired is effectively solved, the reliability of the convolution calculation result is greatly improved, and the accuracy of the convolution calculation result is improved.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Claims (12)
1. A feature extraction method based on a convolutional neural network model is characterized by comprising the following steps:
receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting feature data in the image to be processed by using a convolution neural network model with a hexagonal convolution kernel, and outputting the feature data;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
2. The method of claim 1, wherein surrounding pixels in the hexagonal convolution kernel have the same weight relative to a center pixel.
3. The method of claim 1, wherein, in any convolutional layer that utilizes the hexagonal convolutional kernel, the output size of the convolutional layer is one-third of the input size.
4. The method of claim 1, wherein the placing the image to be processed in a regular hexagonal pixel coordinate system comprises:
and in the process of extracting the image to be processed, inputting the image to be processed into a coordinate system of the regular hexagon pixels.
5. The method of claim 1, wherein the pixel arrangement structure in the regular hexagonal pixel coordinate system comprises any one or more of:
the device comprises a left side transverse arrangement structure, a right side transverse arrangement structure, a first longitudinal arrangement structure and a second longitudinal arrangement structure.
6. The method of claim 1, wherein prior to extracting the feature data in the image to be processed, further comprising:
and filling row pixels and/or column pixels with hexagonal structures at the periphery of the edge pixel points of the image to be processed.
7. The method of claim 1, wherein the feature data is applied to any one of the following image processing scenarios:
face image processing, attitude image processing, road condition image processing, commodity image processing and scene image processing.
8. A feature extraction device based on a convolutional neural network model is characterized by comprising:
the receiving unit is used for receiving an image to be processed and placing the image to be processed in a regular hexagon pixel coordinate system;
the image processing unit is used for extracting the characteristic data in the image to be processed by utilizing a convolution neural network model with a hexagonal convolution kernel and outputting the characteristic data;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
9. A face recognition method, comprising:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting face feature data in the image to be processed by using a face recognition model with a hexagonal convolution kernel;
identifying according to the face feature data to obtain a face identification result matched with the object to be identified;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
10. A gesture recognition method, comprising:
receiving an image to be processed containing an object to be identified, and placing the image to be processed in a regular hexagon pixel coordinate system;
extracting attitude characteristic data in the image to be processed by utilizing an attitude identification model with a hexagonal convolution kernel;
recognizing according to the posture characteristic data to obtain a posture type matched with the object to be recognized;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
11. A road condition identification method is characterized by comprising the following steps:
receiving an image to be processed, and placing the image to be processed in a regular hexagon pixel coordinate system, wherein the image to be processed comprises a road image and an environment image;
extracting road condition characteristic data in the image to be processed by using a road condition identification model with a hexagonal convolution kernel;
identifying according to the road condition characteristic data to obtain a road condition identification result;
the coordinate system in the convolutional neural network model is the regular hexagon pixel coordinate system, each pixel point in the regular hexagon pixel coordinate system is a regular hexagon, and the distances from the central pixel point to the surrounding pixel points in the hexagonal convolutional kernel are the same.
12. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform a convolutional neural network model-based feature extraction method as defined in any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111395422.XA CN114169497A (en) | 2021-11-23 | 2021-11-23 | Feature extraction method and device based on convolutional neural network model |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111395422.XA CN114169497A (en) | 2021-11-23 | 2021-11-23 | Feature extraction method and device based on convolutional neural network model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114169497A true CN114169497A (en) | 2022-03-11 |
Family
ID=80480039
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111395422.XA Pending CN114169497A (en) | 2021-11-23 | 2021-11-23 | Feature extraction method and device based on convolutional neural network model |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114169497A (en) |
- 2021-11-23 CN CN202111395422.XA patent/CN114169497A/en active Pending
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120281891A1 (en) * | 2011-05-06 | 2012-11-08 | Siemens Medical Solutions Usa, Inc. | Systems and Methods For Processing Image Pixels in a Nuclear Medicine Imaging System |
| US20180256316A1 (en) * | 2014-01-08 | 2018-09-13 | Spy Eye, Llc | Variable resolution eye mounted displays |
| US20190086498A1 (en) * | 2017-09-15 | 2019-03-21 | Uih America, Inc. | System and method for reducing nyquist ghost artifact |
Non-Patent Citations (3)
| Title |
|---|
| EMIEL HOOGEBOOM et al.: "HEXACONV", ICLR 2018, 31 December 2018 (2018-12-31) * |
| JUNREN LUO et al.: "Hexagonal Convolutional Neural Networks for Hexagonal Grids", IEEE ACCESS, vol. 4, 1 October 2019 (2019-10-01), pages 1 - 2 * |
| YUNXIANG ZHAO et al.: "HexCNN: A Framework for Native Hexagonal Convolutional Neural Networks", Industrial Conference on Data Mining, 25 January 2021 (2021-01-25) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11222211B2 (en) | Method and apparatus for segmenting video object, electronic device, and storage medium | |
| US12165290B1 (en) | Image processing method and apparatus | |
| AU2021354030A1 (en) | Processing images using self-attention based neural networks | |
| CN113870335A (en) | Monocular depth estimation method based on multi-scale feature fusion | |
| CN115205150B (en) | Image deblurring method, device, equipment, medium and computer program product | |
| CN110544214A (en) | Image restoration method and device and electronic equipment | |
| CN113724136B (en) | Video restoration method, device and medium | |
| CN110881109A (en) | Real-time overlay placement in video for augmented reality applications | |
| CN114202648B (en) | Text image correction method, training device, electronic equipment and medium | |
| CN113850135A (en) | A method and system for dynamic gesture recognition based on time shift framework | |
| CN108520532B (en) | Method and device for identifying motion direction of object in video | |
| CN115115691A (en) | Monocular three-dimensional plane recovery method, equipment and storage medium | |
| CN113596576A (en) | Video super-resolution method and device | |
| CN117830137A (en) | Unidirectional decoding noise reduction method, device and medium based on feature mixing | |
| Hou et al. | Image inpainting via progressive decoder and gradient guidance | |
| CN116246064B (en) | A multi-scale spatial feature enhancement method and device | |
| WO2020121996A1 (en) | Image processing device, method, and program | |
| CN114169497A (en) | Feature extraction method and device based on convolutional neural network model | |
| CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
| CN112001479B (en) | Processing method and system based on deep learning model and electronic equipment | |
| CN116030256A (en) | Small object segmentation method, small object segmentation system, device and medium | |
| CN116229130A (en) | Type identification method and device for blurred image, computer equipment and storage medium | |
| CN112651351B (en) | Data processing method and device | |
| JP7107544B2 (en) | Information processing device, control method, and program | |
| CN120409558B (en) | Unmanned aerial vehicle small target detection method, deep learning network model, system and readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220311 |