Disclosure of Invention
The embodiments of the present application aim to provide a three-dimensional face reconstruction method, device, computer equipment and storage medium based on artificial intelligence, so as to solve the technical problem that existing model-free three-dimensional face reconstruction methods lose facial characteristics to a certain extent and cannot meet the requirements of high-precision, high-fidelity three-dimensional face reconstruction, resulting in low accuracy of three-dimensional face reconstruction.
In order to solve the technical problems, the embodiment of the application provides a face three-dimensional reconstruction method based on artificial intelligence, which adopts the following technical scheme:
acquiring pre-constructed two-dimensional face image data;
extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image;
extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model;
generating a corresponding first loss based on the three-dimensional face model and the face image;
generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image;
generating a comprehensive loss based on the first loss and the second loss;
optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model;
and carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
Further, the step of generating a corresponding first loss based on the three-dimensional face model and the face image specifically includes:
acquiring a three-dimensional scanning face model corresponding to the face image;
acquiring first vertex information of the three-dimensional scanning face model;
acquiring second vertex information of the three-dimensional face model;
calculating a first Euclidean distance between the first vertex information and the second vertex information;
and taking the first Euclidean distance as the first loss.
Further, the step of generating the corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image specifically includes:
performing projection processing on the three-dimensional face model to obtain corresponding target two-dimensional key points;
performing rendering processing on the three-dimensional face model to obtain a corresponding two-dimensional face image;
calculating a second Euclidean distance between the target two-dimensional key points and the two-dimensional key points, and taking the second Euclidean distance as a key point loss;
calculating a pixel loss between the two-dimensional face image and the face mask image;
calculating a similarity loss between the two-dimensional face image and the face mask image;
and generating the second loss based on the key point loss, the pixel loss and the similarity loss.
Further, the step of generating a comprehensive loss based on the first loss and the second loss specifically includes:
acquiring a first weight, a second weight, a third weight and a fourth weight corresponding respectively to the first loss, the key point loss, the pixel loss and the similarity loss;
acquiring a preset loss calculation formula;
performing calculation processing on the first loss, the key point loss, the pixel loss, the similarity loss, the first weight, the second weight, the third weight and the fourth weight based on the loss calculation formula to obtain a corresponding calculation result;
and taking the calculation result as the comprehensive loss.
Further, the step of obtaining the pre-constructed two-dimensional face image data specifically includes:
acquiring initial two-dimensional face image data acquired in advance;
performing data cleaning processing on the initial two-dimensional face image data to obtain corresponding first face image data;
performing data clipping processing on the first face image data to obtain corresponding second face image data;
normalizing the second face image data to obtain corresponding third face image data;
and taking the third face image data as the two-dimensional face image data.
Further, after the step of optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model, the method further includes:
constructing verification data based on the face image data;
performing performance verification on the target reconstruction model based on the verification data to obtain performance index data of the target reconstruction model;
performing data analysis on the performance index data to generate a performance evaluation result corresponding to the target reconstruction model;
and carrying out corresponding model adjustment processing on the target reconstruction model based on the performance evaluation result.
Further, after the step of performing three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model, the method further includes:
performing smoothing treatment on the target three-dimensional face model to obtain a corresponding first target three-dimensional face model;
performing texture mapping processing on the first target three-dimensional face model to obtain a corresponding second target three-dimensional face model;
and storing the second target three-dimensional face model.
In order to solve the above technical problems, an embodiment of the present application further provides a face three-dimensional reconstruction device based on artificial intelligence, which adopts the following technical scheme; the device comprises:
the acquisition module is used for acquiring the pre-constructed two-dimensional face image data;
the first processing module is used for extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image;
the second processing module is used for extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model;
the first generation module is used for generating a corresponding first loss based on the three-dimensional face model and the face image;
the second generation module is used for generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image;
the third generation module is used for generating a comprehensive loss based on the first loss and the second loss;
the optimization module is used for performing optimization processing on the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model;
and the reconstruction module is used for carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
In order to solve the above technical problems, an embodiment of the present application further provides a computer device, which adopts the following technical scheme: the computer device comprises a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, implement the following steps:
acquiring pre-constructed two-dimensional face image data;
extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image;
extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model;
generating a corresponding first loss based on the three-dimensional face model and the face image;
generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image;
generating a comprehensive loss based on the first loss and the second loss;
optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model;
and carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical scheme: the computer readable storage medium stores computer readable instructions which, when executed by a processor, implement the following steps:
acquiring pre-constructed two-dimensional face image data;
extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image;
extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model;
generating a corresponding first loss based on the three-dimensional face model and the face image;
generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image;
generating a comprehensive loss based on the first loss and the second loss;
optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model;
and carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
The method comprises the steps of firstly acquiring pre-constructed two-dimensional face image data; then extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image; then extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model; subsequently generating a corresponding first loss based on the three-dimensional face model and the face image, and generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image; further generating a comprehensive loss based on the first loss and the second loss, and optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model; and finally carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model. According to the application, the target reconstruction model is constructed by combining the first loss at the three-dimensional level and the second loss at the two-dimensional level for mixed supervision training, so that the target reconstruction model can be better guided to learn, and the reconstruction precision of the target reconstruction model is improved.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description herein are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the above description of the drawings are intended to cover non-exclusive inclusions. The terms "first", "second" and the like in the description and claims or in the above-described figures are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the three-dimensional face reconstruction method based on artificial intelligence provided by the embodiment of the application is generally executed by a server/terminal device, and correspondingly, the three-dimensional face reconstruction device based on artificial intelligence is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of an artificial-intelligence-based face three-dimensional reconstruction method in accordance with the present application is shown. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to various needs. The artificial-intelligence-based face three-dimensional reconstruction method provided by the embodiment of the application can be applied to any scenario requiring three-dimensional face reconstruction and to products in such scenarios, for example three-dimensional face reconstruction processing in financial applications. The method comprises the following steps:
step S201, two-dimensional face image data constructed in advance is acquired.
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the artificial-intelligence-based face three-dimensional reconstruction method operates may acquire the two-dimensional face image data through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, Wi-Fi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra wideband) connections, and other now known or later developed wireless connection means. The specific implementation process of acquiring the pre-constructed two-dimensional face image data will be described in further detail in the following specific embodiments, and is not repeated here.
Step S202, extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points from the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image.
In this embodiment, the face region of interest is extracted from the two-dimensional face image data to detect the face position in the two-dimensional face image data and obtain a corresponding face image. In addition, the face key points of the face image are detected to obtain two-dimensional key points of the target number. The target number is not particularly limited, and for example, 68 may be selected. And the resulting two-dimensional keypoints are used to construct the keypoint loss of the 2D portion. In addition, the face image is subjected to face segmentation to obtain a corresponding face mask image, and the face mask image does not contain ears and necks. The obtained face mask image is used for subsequently constructing pixel loss and similarity loss of the 2D part.
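By way of illustration only, a minimal Python sketch of this step is given below. It uses OpenCV's Haar cascade for the face region of interest; `landmark_model` and `segmentation_model` are hypothetical placeholders for any pretrained 68-point landmark detector and face parser, since the embodiment does not prescribe particular algorithms:

import cv2

def extract_face_data(image_bgr, landmark_model, segmentation_model):
    # detect the face position and crop the face region of interest
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]
    face_img = image_bgr[y:y + h, x:x + w]
    keypoints_2d = landmark_model(face_img)    # e.g. a (68, 2) landmark array
    face_mask = segmentation_model(face_img)   # binary mask excluding ears and neck
    return face_img, keypoints_2d, face_mask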
And step S203, carrying out information extraction on the face image based on a convolution neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model.
In this embodiment, the initial reconstruction model at least includes a convolutional neural network and a processing network. The input of the convolutional neural network is the face image extracted from the face region of interest; the convolutional neural network extracts characteristic information from the face image and outputs corresponding model parameters, where the model parameters include deformation parameters for controlling deformation of a three-dimensional deformable model (3D Morphable Model, 3DMM), camera parameters required by subsequent rendering, and texture parameters required by texture generation. The input of the processing network is the model parameters output by the convolutional neural network, and the processing network generates a three-dimensional face model corresponding to the face image. The generated three-dimensional face model is used for calculating the Euclidean distance between its vertices and those of a three-dimensional scanned face model obtained by scanning the person in the input face image, so as to obtain the loss of the 3D part.
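The following simplified PyTorch sketch illustrates one possible arrangement of the two networks; the layer sizes, the parameter dimensions and the zero-initialized 3DMM bases are illustrative assumptions (in practice the mean shape and bases would be loaded from a real 3DMM):

import torch
import torch.nn as nn

class ReconstructionModel(nn.Module):
    def __init__(self, n_id=80, n_exp=64, n_tex=80, n_cam=7, n_verts=35709):
        super().__init__()
        self.n_id, self.n_exp, self.n_tex = n_id, n_exp, n_tex
        # convolutional neural network: face image -> model parameters
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_id + n_exp + n_tex + n_cam))
        # processing network: fixed 3DMM bases turning parameters into a mesh
        self.register_buffer("mean_shape", torch.zeros(n_verts * 3))
        self.register_buffer("id_basis", torch.zeros(n_verts * 3, n_id))
        self.register_buffer("exp_basis", torch.zeros(n_verts * 3, n_exp))

    def forward(self, face_image):
        params = self.encoder(face_image)
        i, e, t = self.n_id, self.n_exp, self.n_tex
        alpha, beta = params[:, :i], params[:, i:i + e]   # deformation parameters
        tex_params = params[:, i + e:i + e + t]           # texture parameters
        cam_params = params[:, i + e + t:]                # camera parameters
        verts = (self.mean_shape
                 + alpha @ self.id_basis.T
                 + beta @ self.exp_basis.T)               # linear 3DMM deformation
        return verts.view(verts.shape[0], -1, 3), tex_params, cam_params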
Step S204, generating a corresponding first loss based on the three-dimensional face model and the face image.
In this embodiment, the foregoing specific implementation process of generating the corresponding first loss based on the three-dimensional face model and the face image will be described in further detail in the following specific embodiments, which will not be described herein.
Step S205, generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image.
In this embodiment, the foregoing specific implementation process of generating the corresponding second loss based on the three-dimensional face model, the two-dimensional key points, and the face mask image will be described in further detail in the following specific embodiments, which will not be described herein.
Step S206, generating a comprehensive loss based on the first loss and the second loss.
In this embodiment, the foregoing implementation process of generating the integrated loss based on the first loss and the second loss will be described in further detail in the following embodiments, which will not be described herein.
And step S207, optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model.
In this embodiment, the training process of the initial reconstruction model comprises forward propagation, namely inputting a face image into the initial reconstruction model and calculating the predicted three-dimensional face model parameters through forward propagation of the initial reconstruction model; a three-dimensional face model is then generated according to the predicted parameters, and the loss function values of the 2D level and the 3D level are calculated. The back propagation and optimization comprise calculating gradient information of the loss function with respect to the model parameters by using a back propagation algorithm, and updating the model parameters according to the gradient information to minimize the loss function value. The forward propagation and back propagation processes are repeated until a preset number of training rounds is reached or another stop condition is met, thereby completing the training and optimization process of the initial reconstruction model and obtaining the target reconstruction model.
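In condensed form, this loop can be sketched as follows; `model`, `total_loss_fn`, `train_loader` and `num_epochs` are assumed to be defined elsewhere, and the loop only illustrates the propagate / compute-loss / update cycle described above:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(num_epochs):
    for face_img, gt_kpts, face_mask, scan_verts in train_loader:
        verts, tex_params, cam_params = model(face_img)       # forward propagation
        loss = total_loss_fn(verts, scan_verts, tex_params,
                             cam_params, gt_kpts, face_mask)  # 2D- and 3D-level losses
        optimizer.zero_grad()
        loss.backward()                                       # back propagation
        optimizer.step()                                      # update model parameters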
And step S208, carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
In this embodiment, the face image to be processed is a two-dimensional image. And inputting the face image to be processed into the target reconstruction model, performing three-dimensional reconstruction processing on the face image to be processed through the target reconstruction model, and outputting a target three-dimensional face model corresponding to the face image to be processed.
The method comprises the steps of firstly acquiring pre-constructed two-dimensional face image data; then extracting a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detecting face key points of the face image to obtain corresponding two-dimensional key points, and carrying out face segmentation on the face image to obtain a corresponding face mask image; then extracting information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and processing the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model; subsequently generating a corresponding first loss based on the three-dimensional face model and the face image, and generating a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image; further generating a comprehensive loss based on the first loss and the second loss, and optimizing the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model; and finally carrying out three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model. According to the application, the target reconstruction model is constructed by combining the first loss at the three-dimensional level and the second loss at the two-dimensional level for mixed supervision training, so that the target reconstruction model can be better guided to learn, and the reconstruction precision of the target reconstruction model is improved.
In some alternative implementations, step S204 includes the steps of:
and acquiring a three-dimensional scanning face model corresponding to the face image.
In this embodiment, the three-dimensional scanned face model is three-dimensional scan data obtained by actually scanning the person corresponding to the face image.
And acquiring first vertex information of the three-dimensional scanning face model.
In this embodiment, vertex labeling processing is performed on the three-dimensional scanned face model to extract first vertex information of the three-dimensional scanned face model.
And obtaining second vertex information of the three-dimensional face model.
In this embodiment, vertex labeling processing is performed on the three-dimensional face model to extract second vertex information of the three-dimensional face model.
A first euclidean distance between the first vertex information and the second vertex information is calculated.
In this embodiment, the Euclidean distance is a common concept in mathematics and computer science, used to measure the true distance between two points in an m-dimensional space. This distance is defined as the "ordinary" (i.e. straight-line) distance between two points in Euclidean space; in two-dimensional and three-dimensional space, the Euclidean distance is simply the distance between the two points. The first Euclidean distance between the first vertex information and the second vertex information may be calculated by using the Euclidean distance formula.
And taking the first Euclidean distance as the first loss.
In this embodiment, at the 3D level, the Euclidean distance error between the vertices of the three-dimensional face model predicted by the initial reconstruction model and the vertices of the three-dimensional scanned face model obtained by real scanning is calculated and taken as the first loss; the first loss can subsequently be used to supervise the shape of the initial reconstruction model.
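Written out, for two vertices p = (p_1, ..., p_m) and q = (q_1, ..., q_m), the Euclidean distance and the resulting first loss over N corresponding vertex pairs take the form below; averaging over the vertices is a common convention and is an assumption here, since the embodiment does not fix the aggregation:

d(p, q) = \sqrt{\sum_{k=1}^{m} (p_k - q_k)^2},
\qquad
L_{\mathrm{3D}} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert v_i^{\mathrm{scan}} - \hat{v}_i \right\rVert_2,

where v_i^scan is taken from the first vertex information and \hat{v}_i from the second vertex information.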
The method comprises the steps of obtaining a three-dimensional scanning face model corresponding to the face image, obtaining first vertex information of the three-dimensional scanning face model, obtaining second vertex information of the three-dimensional face model, and subsequently calculating a first Euclidean distance between the first vertex information and the second vertex information, the first Euclidean distance being taken as the first loss. By calculating the first Euclidean distance between the first vertex information of the three-dimensional scanning face model corresponding to the face image and the second vertex information of the three-dimensional face model, the application generates the first loss corresponding to the initial reconstruction model rapidly and conveniently, improving the efficiency of generating the first loss.
In some alternative implementations of the present embodiment, step S205 includes the steps of:
and carrying out projection processing on the three-dimensional face model to obtain a corresponding target two-dimensional key point.
In this embodiment, the three-dimensional face model may be projected onto a two-dimensional plane and rendered to generate a corresponding 2D picture. In the projection process, factors such as illumination, viewing angle and camera parameters are considered to ensure that the rendered 2D picture is as close as possible to the real situation. Then the target number of key points, namely the target two-dimensional key points, are predicted on the rendered 2D picture by using deep learning or other algorithms. These key points typically include the locations of facial feature points such as the eyes, nose and mouth, which are critical to tasks such as face recognition and expression analysis. The target number is not particularly limited; for example, 68 may be selected.
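As a hedged illustration, a weak-perspective camera is one simple way to realize this projection; the camera model, and the names s, R and t for scale, rotation and translation, are assumptions, since the embodiment only requires some projection onto the image plane:

import numpy as np

def project_keypoints(verts_3d, s, R, t):
    """verts_3d: (K, 3) selected model vertices; R: (3, 3) rotation; t: (2,) translation."""
    rotated = verts_3d @ R.T           # pose the vertices with the camera rotation
    kpts_2d = s * rotated[:, :2] + t   # drop depth, then scale and translate
    return kpts_2d                     # (K, 2) target two-dimensional key points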
And rendering the three-dimensional face model to obtain a corresponding two-dimensional face image.
In this embodiment, the three-dimensional face model is projected onto a two-dimensional plane and rendered to generate a corresponding 2D picture, and the 2D picture is used as the two-dimensional face image.
And calculating a second Euclidean distance between the target two-dimensional key point and the two-dimensional key point, and taking the second Euclidean distance as a key point loss.
In this embodiment, the second euclidean distance between the target two-dimensional keypoint and the two-dimensional keypoint may be calculated by using a euclidean distance formula.
And calculating pixel loss between the two-dimensional face image and the face mask image.
In this embodiment, the pixel loss may be obtained by calculating the pixel difference (color information error at the pixel level) between the rendered two-dimensional face image and the real image, that is, the face mask image.
And calculating the similarity loss between the two-dimensional face image and the face mask image.
In this embodiment, the similarity between the rendered two-dimensional face image and the real image, that is, the face mask image, may be calculated by using a face recognition algorithm, so as to obtain the similarity loss.
The second loss is generated based on the keypoint loss, the pixel loss, and the similarity loss.
In this embodiment, the key point loss, the pixel loss and the similarity loss are integrated to obtain integrated data, and the integrated data is used as the second loss. The method comprises the steps of calculating a key point loss between the target two-dimensional key point and the two-dimensional key point on a 2D layer, calculating a pixel loss between the two-dimensional face image and the face mask image, calculating a similarity loss between the two-dimensional face image and the face mask image, and constructing a second loss, wherein the second loss can be used for supervising the similarity of the initial reconstruction model on a 2D original image.
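A minimal sketch of the three 2D-level terms follows. The key point loss is a mean Euclidean distance, the pixel loss an L1 color error restricted to the face region by the mask, and cosine similarity of face-recognition embeddings stands in for the similarity term; the embedding network `embed`, and reading the face mask image as a region constraint on the real image, are assumptions:

import torch

def second_loss_terms(proj_kpts, gt_kpts, rendered, real_img, mask, embed):
    # key point loss: mean distance between projected and detected key points
    kpt_loss = torch.norm(proj_kpts - gt_kpts, dim=-1).mean()
    # pixel loss: L1 color error computed only inside the masked face region
    pix_loss = (mask * (rendered - real_img).abs()).sum() / mask.sum().clamp(min=1)
    # similarity loss: 1 - cosine similarity of face-recognition embeddings
    sim_loss = 1.0 - torch.cosine_similarity(
        embed(rendered), embed(real_img), dim=-1).mean()
    return kpt_loss, pix_loss, sim_loss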
The method comprises the steps of obtaining a corresponding target two-dimensional key point through projection processing of the three-dimensional face model, obtaining a corresponding two-dimensional face image through rendering processing of the three-dimensional face model, then calculating a second Euclidean distance between the target two-dimensional key point and the two-dimensional key point, taking the second Euclidean distance as key point loss, calculating pixel loss between the two-dimensional face image and the face mask image, calculating similarity loss between the two-dimensional face image and the face mask image, and generating the second loss based on the key point loss, the pixel loss and the similarity loss. According to the method, the corresponding key point loss, pixel loss and similarity loss are obtained through calculation based on the three-dimensional face model, the two-dimensional key points and the face mask image, so that the second loss corresponding to the initial reconstruction model is quickly and conveniently generated based on the key point loss, the pixel loss and the similarity loss, and the generation efficiency of the obtained second loss is improved.
In some alternative implementations, step S206 includes the steps of:
And acquiring a first weight, a second weight, a third weight and a fourth weight which respectively correspond to the first loss, the key point loss, the pixel loss and the similarity loss.
In this embodiment, the numerical selection of the first weight, the second weight, the third weight, and the fourth weight is not specifically limited, and may be set according to the actual service usage requirement. In addition, a weight generation algorithm may be further used to generate a first weight, a second weight, a third weight, and a fourth weight corresponding to the first loss, the keypoint loss, the pixel loss, and the similarity loss, respectively, so as to improve accuracy of the generated weight value.
And acquiring a preset loss calculation formula.
In this embodiment, the preset loss calculation formula is specifically S = a×H + b×I + c×J + d×K, where H is the first loss, a is the first weight, I is the key point loss, b is the second weight, J is the pixel loss, c is the third weight, K is the similarity loss, and d is the fourth weight.
And calculating the first loss, the key point loss, the pixel loss, the similarity loss, the first weight, the second weight, the third weight and the fourth weight based on the loss calculation formula to obtain a corresponding calculation result.
In this embodiment, the first loss, the keypoint loss, the pixel loss, the similarity loss, the first weight, the second weight, the third weight, and the fourth weight are substituted into corresponding positions in the above-mentioned loss calculation formula to perform calculation processing, so as to obtain a corresponding calculation result.
And taking the calculation result as the comprehensive loss.
According to the method, based on the obtained first weight, second weight, third weight and fourth weight which correspond to the first loss, the key point loss, the pixel loss and the similarity loss respectively, the first loss, the key point loss, the pixel loss and the similarity loss are calculated by using a preset loss calculation formula, so that corresponding comprehensive losses are generated rapidly and accurately, the generation efficiency of the comprehensive losses is improved, and the accuracy of the generated comprehensive losses is guaranteed.
In some alternative implementations, step S201 includes the steps of:
and acquiring initial two-dimensional face image data acquired in advance.
In this embodiment, the initial two-dimensional face image data is two-dimensional face image data for different scenes acquired in advance.
And performing data cleaning processing on the initial two-dimensional face image data to obtain corresponding first face image data.
In this embodiment, the above data cleaning refers to removing outliers, repeated items, or low-quality images in the initial two-dimensional face image data.
And performing data clipping processing on the first face image data to obtain corresponding second face image data.
In this embodiment, the data clipping processing refers to clipping the first face image data with a preset size to obtain second face image data meeting the input size requirement of the convolutional neural network in the initial reconstruction model. The numerical selection of the preset size can be set according to actual model construction requirements.
And carrying out normalization processing on the second face image data to obtain corresponding third face image data.
In this embodiment, the normalization process is to normalize the pixel value of the second face image data, and scale it to a range of [0,1] or [ -1,1] in general.
And taking the third face image data as the two-dimensional face image data.
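A compact sketch of this preprocessing chain is given below; the 224x224 target size and the [0, 1] scaling are illustrative choices, and resizing stands in for the size-conforming crop, since the embodiment leaves both open:

import cv2
import numpy as np

def preprocess(images, size=224):
    cleaned = [im for im in images if im is not None and im.size > 0]   # data cleaning
    cropped = [cv2.resize(im, (size, size)) for im in cleaned]          # data clipping
    return [im.astype(np.float32) / 255.0 for im in cropped]            # normalization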
According to the application, the pre-acquired initial two-dimensional face image data is acquired, and then the data cleaning treatment, the data clipping treatment and the normalization treatment are carried out on the initial two-dimensional face image data, so that the pre-processing of the initial two-dimensional face image data is rapidly and intelligently completed, the corresponding third face image data is obtained, the generation efficiency of the third face image data is effectively improved, and the data normalization of the obtained third face image data is improved.
In some optional implementations of this embodiment, after step S207, the electronic device may further perform the following steps:
And constructing verification data based on the face image data.
In this embodiment, data of a specified ratio may be randomly selected from the face image data and used as the verification data. The value of the specified ratio is not particularly limited and may be determined according to actual use requirements; for example, it may be set to 0.3.
And performing performance verification on the target reconstruction model based on the verification data to obtain performance index data of the target reconstruction model.
In this embodiment, the verification data are respectively input into the target reconstruction model, so that the three-dimensional reconstruction processing is performed on the verification data through the target reconstruction model, and then the performance verification processing is performed on the target reconstruction model according to the reconstruction result output by the target reconstruction model, that is, the performance evaluation is performed on the target reconstruction model to obtain the performance index data of the target reconstruction model. The performance index data may include calculated accuracy values of the target reconstruction model for evaluating the performance of the model, such as distance errors between vertices, pixel differences, and facial similarity.
And carrying out data analysis on the performance index data to generate a performance evaluation result corresponding to the target reconstruction model.
In this embodiment, the performance index data specifically include the distance error between vertices, the pixel difference, and the face similarity. The calculated distance error between vertices is compared with a preset error threshold, the pixel difference is compared with a preset difference threshold, and the face similarity is compared with a preset similarity threshold. If the distance error between vertices is smaller than the corresponding error threshold, the pixel difference is smaller than the corresponding difference threshold, and the face similarity is larger than the corresponding similarity threshold, it is determined that the target reconstruction model passes the performance verification, and a first performance evaluation result indicating that the target reconstruction model passes the performance verification is generated. If at least one of the performance index data fails to meet its corresponding threshold condition, it is determined that the target reconstruction model fails the performance verification, and a second performance evaluation result indicating that the target reconstruction model fails the performance verification is generated. The values of the error threshold, the difference threshold and the similarity threshold are not particularly limited and may be set according to actual model verification requirements.
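The threshold comparison itself reduces to a few inequalities, as in the sketch below (all threshold values are placeholders to be set per the actual verification requirements):

def evaluate_performance(vertex_error, pixel_diff, face_similarity,
                         err_thresh=2.0, diff_thresh=0.05, sim_thresh=0.9):
    passed = (vertex_error < err_thresh
              and pixel_diff < diff_thresh
              and face_similarity > sim_thresh)
    return "pass" if passed else "fail"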
And carrying out corresponding model adjustment processing on the target reconstruction model based on the performance evaluation result.
In this embodiment, if the content of the performance evaluation result is that the target reconstruction model passes the performance verification, the target reconstruction model is determined to be a model meeting the required performance requirement, and then no adjustment is required for the target reconstruction model. In addition, if the content of the performance evaluation result is that the target reconstruction model fails performance verification, a preset model improvement strategy can be adopted to carry out improvement treatment on the target reconstruction model. The model improvement strategy can comprise strategies such as adjusting model parameters or training parameters, replacing a layer structure of a model, modifying a model framework, and enhancing data strategies.
The method comprises the steps of constructing verification data based on the face image data, performing performance verification on the target reconstruction model based on the verification data to obtain performance index data of the target reconstruction model, performing data analysis on the performance index data to generate a performance evaluation result corresponding to the target reconstruction model, and performing corresponding model adjustment processing on the target reconstruction model based on the performance evaluation result. After the construction of the target reconstruction model is completed, the performance of the target reconstruction model is intelligently verified by using verification data constructed based on the face image data, and the target reconstruction model is subjected to corresponding model adjustment processing according to the performance evaluation result after the analysis of the performance index data of the obtained target reconstruction model, so that the finally obtained target reconstruction model can meet the corresponding performance requirement expectation, the processing accuracy of the three-dimensional reconstruction processing of the input face image to be processed by using the target reconstruction model is ensured, and the accuracy of the generated target three-dimensional face model corresponding to the face image to be processed is improved.
In some optional implementations of this embodiment, after step S208, the electronic device may further perform the following steps:
and carrying out smoothing treatment on the target three-dimensional face model to obtain a corresponding first target three-dimensional face model.
In this embodiment, the smoothing process is performed on the target three-dimensional face model to eliminate noise and unnecessary details in the target three-dimensional face model, so as to obtain a corresponding first target three-dimensional face model.
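One common realization of this step is Laplacian smoothing, which pulls each vertex toward the centroid of its neighbors; the choice of Laplacian smoothing, the factor 0.5 and the iteration count are assumptions, since the embodiment only requires some smoothing:

import numpy as np

def laplacian_smooth(verts, neighbors, lam=0.5, iters=3):
    """verts: (N, 3) vertex array; neighbors: list of neighbor-index lists, one per vertex."""
    v = verts.copy()
    for _ in range(iters):
        centroids = np.stack([v[n].mean(axis=0) for n in neighbors])
        v = v + lam * (centroids - v)   # move each vertex toward its neighborhood mean
    return v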
And performing texture mapping processing on the first target three-dimensional face model to obtain a corresponding second target three-dimensional face model.
In this embodiment, the texture mapping process refers to mapping texture information on a two-dimensional face image to be processed onto the first target three-dimensional face model, so as to improve the reality of the first target three-dimensional face model, and obtain a corresponding second target three-dimensional face model.
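In the simplest per-vertex form, texture mapping amounts to projecting each mesh vertex into the two-dimensional face image to be processed and reading back its color, as sketched below (nearest-neighbor sampling is an assumption; real pipelines typically sample a UV texture map bilinearly):

import numpy as np

def sample_vertex_colors(verts_2d, image):
    """verts_2d: (N, 2) projected vertex positions; image: (H, W, 3) source image."""
    h, w = image.shape[:2]
    xs = np.clip(np.round(verts_2d[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(verts_2d[:, 1]).astype(int), 0, h - 1)
    return image[ys, xs]   # (N, 3) per-vertex colors mapped onto the mesh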
And storing the second target three-dimensional face model.
In this embodiment, the storage manner of the second target three-dimensional face model is not specifically limited, and for example, a storage manner such as blockchain storage, local database storage, cloud server storage, etc. may be used.
The method comprises the steps of carrying out smoothing treatment on the target three-dimensional face model to obtain a corresponding first target three-dimensional face model, carrying out texture mapping treatment on the first target three-dimensional face model to obtain a corresponding second target three-dimensional face model, and subsequently storing the second target three-dimensional face model. According to the application, after the input face image to be processed is subjected to three-dimensional reconstruction processing based on the target reconstruction model to obtain the target three-dimensional face model, the target three-dimensional face model is further subjected to smoothing processing and texture mapping processing to obtain the second target three-dimensional face model, so that the optimization processing of the target three-dimensional face model is realized, and the authenticity of the second target three-dimensional face model is improved. In addition, the second target three-dimensional face model is stored, so that the privacy and the safety of the second target three-dimensional face model can be ensured.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation process of the embodiments of the present invention in any way.
It is emphasized that, to further ensure the privacy and security of the target reconstruction model, the target reconstruction model may also be stored in a blockchain node.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. The blockchain (Blockchain), essentially a decentralized database, is a string of data blocks generated in association using cryptographic methods; each data block contains information from a batch of network transactions and is used for verifying the validity (anti-counterfeiting) of its information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by computer readable instructions stored in a computer readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a face three-dimensional reconstruction apparatus based on artificial intelligence, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the artificial intelligence-based face three-dimensional reconstruction device 300 according to the present embodiment includes an acquisition module 301, a first processing module 302, a second processing module 303, a first generating module 304, a second generating module 305, a third generating module 306, an optimizing module 307, and a reconstruction module 308. Wherein:
an acquisition module 301, configured to acquire the pre-constructed two-dimensional face image data;
a first processing module 302, configured to extract a face region of interest from the two-dimensional face image data to obtain a corresponding face image, detect face key points of the face image to obtain corresponding two-dimensional key points, and perform face segmentation on the face image to obtain a corresponding face mask image;
a second processing module 303, configured to extract information from the face image based on a convolutional neural network in a preset initial reconstruction model to obtain corresponding model parameters, and process the model parameters based on a processing network in the initial reconstruction model to obtain a corresponding three-dimensional face model;
a first generation module 304, configured to generate a corresponding first loss based on the three-dimensional face model and the face image;
a second generation module 305, configured to generate a corresponding second loss based on the three-dimensional face model, the two-dimensional key points and the face mask image;
a third generation module 306, configured to generate a comprehensive loss based on the first loss and the second loss;
an optimization module 307, configured to perform optimization processing on the initial reconstruction model based on the comprehensive loss to obtain a corresponding target reconstruction model;
and a reconstruction module 308, configured to perform three-dimensional reconstruction processing on the input face image to be processed based on the target reconstruction model to obtain a corresponding target three-dimensional face model.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some alternative implementations of the present embodiment, the first generating module 304 includes:
the first acquisition sub-module is used for acquiring a three-dimensional scanning face model corresponding to the face image;
the second acquisition sub-module is used for acquiring first vertex information of the three-dimensional scanning face model;
the third acquisition sub-module is used for acquiring second vertex information of the three-dimensional face model;
the first calculation sub-module is used for calculating a first Euclidean distance between the first vertex information and the second vertex information;
and the first determining sub-module is used for taking the first Euclidean distance as the first loss.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some alternative implementations of the present embodiment, the second generating module 305 includes:
the projection sub-module is used for carrying out projection processing on the three-dimensional face model to obtain corresponding target two-dimensional key points;
the rendering sub-module is used for rendering the three-dimensional face model to obtain a corresponding two-dimensional face image;
the second calculation sub-module is used for calculating a second Euclidean distance between the target two-dimensional key points and the two-dimensional key points, and taking the second Euclidean distance as a key point loss;
the third calculation sub-module is used for calculating a pixel loss between the two-dimensional face image and the face mask image;
the fourth calculation sub-module is used for calculating a similarity loss between the two-dimensional face image and the face mask image;
and the generation sub-module is used for generating the second loss based on the key point loss, the pixel loss and the similarity loss.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some optional implementations of this embodiment, the third generating module 306 includes:
the fourth acquisition sub-module is used for acquiring a first weight, a second weight, a third weight and a fourth weight corresponding respectively to the first loss, the key point loss, the pixel loss and the similarity loss;
the fifth acquisition sub-module is used for acquiring a preset loss calculation formula;
the fifth calculation sub-module is used for performing calculation processing on the first loss, the key point loss, the pixel loss, the similarity loss, the first weight, the second weight, the third weight and the fourth weight based on the loss calculation formula to obtain a corresponding calculation result;
and the second determining sub-module is used for taking the calculation result as the comprehensive loss.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some optional implementations of this embodiment, the acquiring module 301 includes:
the sixth acquisition sub-module is used for acquiring initial two-dimensional face image data acquired in advance;
the cleaning sub-module is used for performing data cleaning processing on the initial two-dimensional face image data to obtain corresponding first face image data;
the clipping sub-module is used for performing data clipping processing on the first face image data to obtain corresponding second face image data;
the normalization sub-module is used for performing normalization processing on the second face image data to obtain corresponding third face image data;
and the third determining sub-module is used for taking the third face image data as the two-dimensional face image data.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some optional implementations of this embodiment, the artificial intelligence based face three-dimensional reconstruction device further includes:
the construction module is used for constructing verification data based on the face image data;
the verification module is used for performing performance verification on the target reconstruction model based on the verification data to obtain performance index data of the target reconstruction model;
the fourth generation module is used for performing data analysis on the performance index data to generate a performance evaluation result corresponding to the target reconstruction model;
and the adjusting module is used for performing corresponding model adjustment processing on the target reconstruction model based on the performance evaluation result.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In some optional implementations of this embodiment, the artificial intelligence based face three-dimensional reconstruction device further includes:
the smoothing module is used for performing smoothing processing on the target three-dimensional face model to obtain a corresponding first target three-dimensional face model;
the mapping module is used for performing texture mapping processing on the first target three-dimensional face model to obtain a corresponding second target three-dimensional face model;
and the storage module is used for storing the second target three-dimensional face model.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based face three-dimensional reconstruction method in the foregoing embodiment, and are not described herein again.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of the artificial-intelligence-based face three-dimensional reconstruction method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the artificial-intelligence-based face three-dimensional reconstruction method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely, a computer readable storage medium, where computer readable instructions are stored, where the computer readable instructions are executable by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence based face three-dimensional reconstruction method as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, although in many cases the former is preferable. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The above-described embodiments are only some embodiments of the present application, not all of them, and the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.