WO2020155908A1 - Method and apparatus for generating information


Info

Publication number: WO2020155908A1
Authority: WO (WIPO PCT)
Prior art keywords: face, face image, point, map, dimensional grid
Application number: PCT/CN2019/126382
Other languages: French (fr), Chinese (zh)
Inventor: 郭冠军 (Guo Guanjun)
Original Assignee: 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2020155908A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating information.
  • Three-dimensional face reconstruction is a process of directly regressing the three-dimensional mesh information (3D mesh) of a face from the pixel information of a given two-dimensional face image.
  • Typically, a two-dimensional face image used for three-dimensional face reconstruction is obtained by photographing a face from a certain angle with an electronic device that includes a camera (such as a mobile phone or a camera).
  • Embodiments of the present disclosure propose a method and apparatus for generating information.
  • An embodiment of the present disclosure provides a method for generating information, the method including: acquiring a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generating a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In some embodiments, determining the three-dimensional coordinates of the face key point corresponding to the point includes: determining the depth value of the face key point corresponding to the point based on the pixel value of the point; determining the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some embodiments, determining the depth value of the face key point corresponding to the point includes: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point further includes: in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
  • In some embodiments, generating the resulting three-dimensional mesh corresponding to the target face includes: establishing a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; extracting, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional mesh; and splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and aggregating the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • An embodiment of the present disclosure further provides an apparatus for generating information. The apparatus includes: an image acquiring unit configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; a first generating unit configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and a second generating unit configured to generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In some embodiments, the first generating unit is further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some embodiments, the first generating unit is further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some embodiments, the first generating unit is further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • In some embodiments, the second generating unit includes: an establishing module configured to establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module configured to extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional mesh; and a splicing module configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and aggregating the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • An embodiment of the present disclosure further provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the foregoing methods for generating information.
  • Embodiments of the present disclosure further provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements any one of the foregoing methods for generating information.
  • The method and apparatus for generating information provided by the embodiments of the present disclosure acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; then, for each face image of the left face image and the right face image, perform the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • Since the left face image and the right face image record facial features at different angles, the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image can be used to generate a more accurate three-dimensional mesh corresponding to the target face, which helps improve the accuracy of three-dimensional face reconstruction.
  • Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • Fig. 2 is a flowchart of an embodiment of a method for generating information according to the present disclosure;
  • Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to an embodiment of the present disclosure;
  • Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present disclosure;
  • Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present disclosure;
  • Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for generating information or the apparatus for generating information of the present disclosure can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on.
  • Various communication client applications such as image processing software, video playback software, web browser applications, search applications, instant messaging tools, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, 103 are hardware, they can be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
  • When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • The server 105 may be a server that provides various services, for example, an image processing server that processes the left face image and the right face image, obtained by photographing a target face, uploaded by the terminal devices 101, 102, 103.
  • the image processing server can analyze and process the received data such as the left face image and the right face image, and obtain the processing result (for example, the resultant three-dimensional grid corresponding to the target face).
  • It should be noted that the method for generating information provided by the embodiments of the present disclosure can be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for generating information can be provided in the server 105 or in the terminal devices 101, 102, 103.
  • the server can be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • In some cases, the above system architecture may not include a network, and may include only the terminal devices or the server.
  • With reference to Fig. 2, a flow of an embodiment of the method for generating information according to the present disclosure is shown. The method for generating information includes the following steps:
  • Step 201: Obtain a left face image and a right face image obtained by photographing a target face.
  • In this embodiment, the execution subject of the method for generating information may acquire, through a wired connection or a wireless connection, the left face image and the right face image obtained by photographing the target face.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • After the resulting three-dimensional mesh corresponding to the target face is generated, operations such as rendering the three-dimensional mesh can be performed to realize the reconstruction of the three-dimensional face.
  • the left face image and the right face image are binocular vision images.
  • In practice, the above execution subject may obtain the left face image and the right face image pre-stored locally, or may obtain the left face image and the right face image sent by a communicatively connected electronic device (such as the terminal devices shown in Fig. 1). It should be noted that both the left face image and the right face image are two-dimensional face images.
  • In practice, various devices that include a binocular camera can be used to photograph the target face to obtain the left face image and the right face image corresponding to the target face.
  • A binocular camera usually consists of two cameras arranged in a horizontal direction.
  • In general, the camera arranged on the left can be regarded as the left camera, and the image it captures is the left image (corresponding to the left face image); correspondingly, the camera arranged on the right is regarded as the right camera, and the image it captures is the right image (corresponding to the right face image).
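  • For illustration only, a minimal capture sketch is given below; the device indices and the use of OpenCV are assumptions, since the patent does not prescribe a capture API.

```python
import cv2

# Illustrative binocular capture: two horizontally arranged cameras
# exposed as separate video devices (indices 0 and 1 are assumptions).
left_camera = cv2.VideoCapture(0)
right_camera = cv2.VideoCapture(1)

ok_left, left_face_image = left_camera.read()     # left image
ok_right, right_face_image = right_camera.read()  # right image
if not (ok_left and ok_right):
    raise RuntimeError("failed to grab a stereo image pair")

left_camera.release()
right_camera.release()
```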
  • Step 202: For each face image of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the above-mentioned execution subject may perform the following steps:
  • Step 2021: Input the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point can be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • the map generation model can be used to characterize the correspondence between the face image and the map corresponding to the face image.
  • As an example, the map generation model may be a correspondence table pre-made by technicians based on statistics over a large number of face images and maps corresponding to the face images, storing multiple face images and their corresponding maps; it may also be a model obtained by training an initial model (such as a neural network) using a machine learning method based on preset training samples.
  • It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
  • the map generation model may be obtained by the above-mentioned execution subject or other electronic equipment through training in the following steps:
  • Step 20211: Obtain a training sample set.
  • Here, a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image.
  • the sample face image is a two-dimensional face image.
  • various methods can be used to obtain the training sample set.
  • As an example, the training samples in the training sample set can be generated through the following steps. First, a depth map acquisition device can be used to collect a face depth map of a sample face, and the face image corresponding to the face depth map is obtained. Then, face key point detection is performed on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image. Finally, the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map are aggregated into a training sample.
  • the depth map acquisition device may be various image acquisition devices that can acquire a depth map.
  • the face depth map is an image containing depth information (that is, the distance information between the viewpoint and the surface of the scene object).
  • the face image corresponding to the face depth map is an RGB (Red Green Blue) three-channel color image without depth information corresponding to the face depth map. Furthermore, by removing the depth information of the face depth map, a face image corresponding to the face depth map (ie, a sample face image) can be obtained.
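  • As a minimal sketch of separating the color image from the depth information, assuming the acquisition device yields an aligned four-channel RGB-D array (the array layout is an assumption for illustration, not from the patent):

```python
import numpy as np

# Hypothetical aligned RGB-D frame: H x W x 4, where the first three
# channels are RGB and the fourth channel is the per-pixel depth.
rgbd = np.zeros((480, 640, 4), dtype=np.float32)  # placeholder frame

face_image = rgbd[:, :, :3].astype(np.uint8)  # RGB image without depth info
depth_map = rgbd[:, :, 3]                     # per-pixel depth values

# The depth value of a face key point at image coordinates (x, y)
# can then be read directly from the depth map.
x, y = 100, 50                                # illustrative key point
depth_value = depth_map[y, x]
```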
  • various face key point detection methods can be used to perform face key point detection on the face image corresponding to the face depth map.
  • the face image can be input to a pre-trained face key point detection model to obtain the detection result.
  • the face key point detection model can be used to detect the position of the face key point in the face image.
  • As an example, the face key point detection model can be obtained by performing supervised training on an existing convolutional neural network using a machine learning method, based on a sample set that includes face images and labels indicating the positions of the face key points.
  • the convolutional neural network can use various existing structures, such as DenseBox, VGGNet, ResNet, SegNet, etc.
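  • For illustration only, since the patent does not prescribe a specific detector, a sketch using dlib's off-the-shelf 68-point landmark predictor (the model file path is an assumption) might look like:

```python
import dlib
import numpy as np

# dlib's 68-point predictor is one of many possible face key point
# detection models, not the patent's own model.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_keypoints(image: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of (x, y) face key point coordinates."""
    faces = detector(image, 1)           # upsample once for small faces
    if not faces:
        return np.empty((0, 2), dtype=np.int64)
    shape = predictor(image, faces[0])   # landmarks of the first face
    return np.array([(p.x, p.y) for p in shape.parts()])
```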
  • Step 20212: For each training sample in the training sample set, determine, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed; determine, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed; and use the determined mapping positions and the pixel values at those mapping positions to construct the map corresponding to the sample face image in the training sample.
  • In this step, the above execution subject or another electronic device may first determine, based on the coordinates of the face key points in the training sample, the mapping positions of those face key points in the map to be constructed.
  • the coordinates of the mapping position of the face key point in the map to be constructed can be determined based on a pre-established mapping relationship or based on an existing mapping principle.
  • As an example, the principle of UV mapping can be used to determine the coordinates of the mapping position of a face key point in the map to be constructed.
  • Specifically, UV refers to two-dimensional texture coordinates; UV is used to define a two-dimensional texture coordinate system, called "UV texture space".
  • The UV texture space uses the letters U and V to denote the axes of the two-dimensional space.
  • In practice, UV mapping can convert texture information into plane information.
  • Here, the UV coordinates obtained by the mapping can be used to indicate the mapping position in the map to be constructed.
  • Specifically, the mapped UV coordinates can be used as the coordinates of the mapping position in the map to be constructed.
  • Then, the pixel value of the corresponding mapping position in the map to be constructed can be determined based on the depth value of the face key point.
  • Specifically, a correspondence between pixel values and depth values may be determined in advance; the pixel value of the corresponding mapping position in the map to be constructed is then determined based on this correspondence and the depth value of the face key point.
  • As an example, suppose the predetermined correspondence between pixel values and depth values is "the pixel value is equal to the depth value", the coordinates of a face key point in the sample face image are (100, 50), the mapping position corresponding to that face key point has coordinates (50, 25), and the depth value of the face key point is 30. Then the pixel value at coordinates (50, 25) in the map is 30.
  • It should be noted that the size of the map may be preset; for example, it may be the same as the size of the sample face image, or, as in the example above, one half of the size of the sample face image.
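  • Putting the worked example into code: a minimal sketch of map construction, under the assumptions used above (the pixel value equals the depth value, and map coordinates are image coordinates scaled by one half; the function name is illustrative):

```python
import numpy as np

def build_map(keypoints, depths, image_size, scale=0.5):
    """Construct a map whose pixel at each mapped key point position
    stores that key point's depth value.

    keypoints:  (N, 2) array of (x, y) coordinates in the face image.
    depths:     (N,) array of depth values of the face key points.
    image_size: (width, height) of the sample face image.
    scale:      assumed mapping principle: map coords = image coords * scale.
    """
    w, h = int(image_size[0] * scale), int(image_size[1] * scale)
    map_image = np.zeros((h, w), dtype=np.float32)
    for (x, y), z in zip(keypoints, depths):
        u, v = int(x * scale), int(y * scale)  # mapping position
        map_image[v, u] = z                    # pixel value = depth value
    return map_image

# Worked example from the text: key point (100, 50) with depth 30
# maps to position (50, 25) with pixel value 30 in a half-size map.
m = build_map(np.array([[100, 50]]), np.array([30.0]), (200, 200))
assert m[25, 50] == 30.0
```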
  • Step 20213: Using a machine learning method, take the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, and train to obtain the map generation model.
  • Specifically, the above execution subject or another electronic device can use a machine learning method to take the sample face images included in the training samples of the above training sample set as the input of an initial model and the maps corresponding to the input sample face images as the expected output of the initial model, train the initial model, and finally obtain the map generation model.
  • As an example, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within a coverage range; it performs excellently in image processing. Therefore, a convolutional neural network can be used as the initial model to process the sample face images in the training samples.
  • It should be noted that the above execution subject or other electronic devices can also use other models with image processing functions as the initial model; the initial model is not limited to a CNN. The specific model structure can be set according to actual needs and is not limited here. It should also be pointed out that machine learning methods are well-known technologies that are currently widely researched and applied, and details are not repeated here.
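  • As a sketch of step 20213, assuming a small fully convolutional network trained by MSE regression and a map of the same size as the face image (the architecture, loss, and map size are illustrative choices; the patent only requires training an initial model with sample face images as input and the constructed maps as desired output):

```python
import torch
import torch.nn as nn

# Illustrative initial model: a small fully convolutional network that
# regresses a one-channel map from a three-channel face image.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(face_images: torch.Tensor, target_maps: torch.Tensor) -> float:
    """face_images: (B, 3, H, W); target_maps: (B, 1, H, W) built as above."""
    optimizer.zero_grad()
    predicted_maps = model(face_images)   # desired output: the target map
    loss = loss_fn(predicted_maps, target_maps)
    loss.backward()
    optimizer.step()
    return loss.item()
```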
  • Step 2022: For a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
  • the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
  • Specifically, for a point in the map, the above execution subject can determine the three-dimensional coordinates of the face key point corresponding to the point through the following steps. First, the execution subject can determine the depth value of the face key point corresponding to the point based on the pixel value of the point. Then, the execution subject can determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map. Finally, the execution subject can determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the mapping relationship or the mapping principle corresponding to the map generation model, and the coordinates of the point in the map. It is understandable that when training the map generation model, due to the use of a predetermined mapping relationship or mapping principle, the mapping position of the face key point in the map to be constructed can be determined based on the coordinates of the face key point (refer to Step 20212). Therefore, here, for a certain point in the map, a reverse process can be used to determine the coordinates of the key point of the face corresponding to the point in the face image.
  • the above-mentioned execution subject may use various methods to determine the depth value of the face key point corresponding to the point based on the pixel value of the point.
  • the pixel value of the point can be directly determined as the depth value of the key point of the face corresponding to the point.
  • the above-mentioned execution subject may, in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point .
  • the preset threshold may be a predetermined value, such as "1".
  • In practice, points with very low pixel values are usually points where the prediction is inaccurate. Therefore, in this implementation, a preset threshold can be set to screen out mispredicted points, which helps determine more accurate three-dimensional coordinates of the face key points.
  • In some optional implementations, the above execution subject may, in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • It can be understood that, after obtaining the coordinates of a face key point in the face image (which can be expressed as (x, y)) and the depth value of the face key point (which can be expressed as z), the above execution subject can directly compose the three-dimensional coordinates of the face key point (which can be expressed as (x, y, z)).
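  • A minimal sketch of this reverse process, assuming the half-size scaling from the earlier example and treating zero-valued map positions as empty (both assumptions for illustration):

```python
import numpy as np

def map_to_keypoints_3d(map_image, threshold=1.0, scale=0.5):
    """Recover (x, y, z) face key point coordinates from a map.

    Assumes the inverse of the mapping principle sketched earlier:
    image coords = map coords / scale. Zero-valued positions are
    treated as empty map positions and skipped.
    """
    points = []
    for (v, u) in zip(*np.nonzero(map_image)):
        pixel = map_image[v, u]
        # Threshold rule from the text: pixel values below the preset
        # threshold are replaced by the threshold as the depth value.
        z = pixel if pixel >= threshold else threshold
        x, y = u / scale, v / scale            # coords in the face image
        points.append((x, y, z))
    return np.array(points)
```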
  • Step 2023: Generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • In this embodiment, the three-dimensional mesh corresponding to the face image is a three-dimensional mesh with the face key points as vertices. Therefore, based on the determined three-dimensional coordinates of each face key point in the face image, the above execution subject can generate the three-dimensional mesh corresponding to the face image.
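  • The patent only requires a mesh with the face key points as vertices; one common illustrative choice is to triangulate the key points over the image plane and lift the triangles to 3D:

```python
import numpy as np
from scipy.spatial import Delaunay

def keypoints_to_mesh(points_3d: np.ndarray):
    """points_3d: (N, 3) array of (x, y, z) face key point coordinates.

    Triangulate over the (x, y) projections so each face key point
    becomes a mesh vertex; return (vertices, triangle index array).
    """
    triangulation = Delaunay(points_3d[:, :2])
    return points_3d, triangulation.simplices
```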
  • Step 203: Generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In this embodiment, the above execution subject can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image by performing step 202. Furthermore, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, the execution subject can generate the resulting three-dimensional mesh corresponding to the target face.
  • Here, the resulting three-dimensional mesh is a three-dimensional mesh on which operations such as rendering are to be performed to realize the three-dimensional face reconstruction of the target face.
  • In this embodiment, the execution subject may use various methods to generate the resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. For example, the head posture of the three-dimensional mesh corresponding to the left face image and the head posture of the three-dimensional mesh corresponding to the right face image can be detected, and the three-dimensional mesh whose head posture indicates the smaller head rotation angle can be determined as the resulting three-dimensional mesh corresponding to the target face. It can be understood that, in practice, the smaller the head rotation angle, the fewer facial features are occluded, and the more accurate the generated three-dimensional mesh.
  • In this way, a three-dimensional mesh with a smaller head rotation angle is selected as the resulting three-dimensional mesh corresponding to the target face, which helps improve the accuracy of the resulting three-dimensional mesh.
  • the method for detecting the head posture is a well-known technology that is currently widely studied and applied, and will not be repeated here.
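  • As one hedged illustration of this selection rule (the patent leaves the head posture detector open), the head rotation of each mesh can be approximated by the angle between the normal of a best-fit plane through its vertices and the camera axis, keeping the mesh with the smaller angle:

```python
import numpy as np

def rotation_magnitude(vertices: np.ndarray) -> float:
    """Rough head-rotation estimate: angle (degrees) between the normal
    of the best-fit plane through the mesh vertices and the camera z-axis."""
    centered = vertices - vertices.mean(axis=0)
    # The last right singular vector is the normal of the best-fit plane.
    normal = np.linalg.svd(centered)[2][-1]
    cos_angle = abs(normal @ np.array([0.0, 0.0, 1.0]))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def pick_result_mesh(left_mesh: np.ndarray, right_mesh: np.ndarray) -> np.ndarray:
    """Keep the mesh whose estimated head rotation angle is smaller."""
    if rotation_magnitude(left_mesh) <= rotation_magnitude(right_mesh):
        return left_mesh
    return right_mesh
```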
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment.
  • In the application scenario of Fig. 3, the server 301 can first obtain the left face image 303 and the right face image 304, obtained by photographing a target face, sent by the terminal device 302, where the left face image 303 and the right face image 304 are binocular vision images.
  • Then, for the left face image 303, the server 301 can input the left face image 303 into the pre-trained map generation model 305 to obtain the map 306 corresponding to the left face image 303, where the points in the map 306 correspond to the face key points in the left face image 303; for a point in the map 306, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map 306 and the pixel value of the point; and, based on the determined three-dimensional coordinates of the face key points in the left face image 303, generate a three-dimensional mesh 307 corresponding to the left face image 303. For the right face image 304, the server 301 can input the right face image 304 into the map generation model 305 to obtain the map 308 corresponding to the right face image 304, where the points in the map 308 correspond to the face key points in the right face image 304; for a point in the map 308, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map 308 and the pixel value of the point; and, based on the determined three-dimensional coordinates of the face key points in the right face image 304, generate a three-dimensional mesh 309 corresponding to the right face image 304.
  • the server 301 may generate a resultant three-dimensional mesh 310 corresponding to the target face based on the three-dimensional mesh 307 corresponding to the left face image 303 and the three-dimensional mesh 309 corresponding to the right face image 304.
  • The method provided by the above embodiment of the present disclosure acquires a left face image and a right face image obtained by photographing a target face, and then, for each face image of the left face image and the right face image, performs the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generates a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • Since the left face image and the right face image record facial features at different angles, the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image can be used to generate a more accurate three-dimensional mesh corresponding to the target face, which helps improve the accuracy of three-dimensional face reconstruction.
  • FIG. 4 shows a flow 400 of another embodiment of a method for generating information.
  • the process 400 of the method for generating information includes the following steps:
  • Step 401: Obtain a left face image and a right face image obtained by photographing a target face.
  • In this embodiment, the execution subject of the method for generating information may acquire, through a wired connection or a wireless connection, the left face image and the right face image obtained by photographing the target face.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • operations such as rendering the three-dimensional mesh can be performed to realize the reconstruction of the three-dimensional face.
  • the left face image and the right face image are binocular vision images.
  • Step 402: For each face image of the left face image and the right face image, perform the following steps: input the face image into the pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the above-mentioned execution subject may perform the following steps:
  • Step 4021: Input the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point may be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • the map generation model can be used to characterize the correspondence between the face image and the map corresponding to the face image.
  • Step 4022: For a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
  • the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
  • Step 4023: Generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • In this embodiment, the three-dimensional mesh corresponding to the face image is a three-dimensional mesh with the face key points as vertices. Therefore, based on the determined three-dimensional coordinates of each face key point in the face image, the above execution subject can generate the three-dimensional mesh corresponding to the face image.
  • It should be noted that step 401 and step 402 are respectively consistent with step 201 and step 202 in the foregoing embodiment; the above description of step 201 and step 202 also applies to step 401 and step 402, and is not repeated here.
  • Step 403: Establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, respectively.
  • In this embodiment, the above execution subject can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image by executing step 402. Furthermore, the above execution subject may establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, respectively.
  • Here, the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts.
  • It should be noted that the two parts divided by the reference plane may be two symmetrical parts (in this case, the reference plane passes through the symmetry axis of the human face) or two asymmetrical parts.
  • Specifically, the reference plane can pass through points on the face indicated by the three-dimensional mesh; furthermore, the reference plane can divide the face indicated by the three-dimensional mesh into a part located on the left side of the reference plane and a part located on the right side of the reference plane.
  • It should be noted that the points that the reference plane established for the three-dimensional mesh corresponding to the left face image passes through and the points that the reference plane established for the three-dimensional mesh corresponding to the right face image passes through indicate the same points on the face (for example, both pass through the point corresponding to the tip of the nose).
  • Step 404: Extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh; and extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh.
  • In this embodiment, the execution subject can extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh, and extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh.
  • It should be noted that the three-dimensional mesh on the left side of the reference plane refers to the part of the mesh on the left side of the reference plane when facing the face contour indicated by the three-dimensional mesh for which the reference plane is established; likewise, the three-dimensional mesh on the right side of the reference plane refers to the part on the right side of the reference plane when facing that face contour.
  • Step 405: Splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • the above-mentioned execution subject may splice the left 3D mesh and the right 3D mesh to generate a resultant 3D mesh corresponding to the target face .
  • It can be understood that, since the reference plane passes through the center line of the three-dimensional mesh, and the points that the reference plane established for the three-dimensional mesh corresponding to the left face image passes through and the points that the reference plane established for the three-dimensional mesh corresponding to the right face image passes through indicate the same points on the face, the part of the three-dimensional mesh corresponding to the left face image located on the left side of the reference plane and the part of the three-dimensional mesh corresponding to the right face image located on the right side of the reference plane can be spliced into a complete three-dimensional mesh corresponding to the face; likewise, the part of the three-dimensional mesh corresponding to the left face image located on the right side of the reference plane and the part of the three-dimensional mesh corresponding to the right face image located on the left side of the reference plane can be spliced into a complete three-dimensional mesh corresponding to the face.
  • In general, the left face image records more of the facial features corresponding to the three-dimensional mesh on the left side of the reference plane, and the right face image records more of the facial features corresponding to the three-dimensional mesh on the right side of the reference plane. Therefore, in this embodiment, the left three-dimensional mesh is extracted from the three-dimensional mesh corresponding to the left face image, the right three-dimensional mesh is extracted from the three-dimensional mesh corresponding to the right face image, and the extracted left three-dimensional mesh and right three-dimensional mesh are spliced to generate the resulting three-dimensional mesh corresponding to the target face; this allows the generated resulting three-dimensional mesh to characterize the facial features of the target face more accurately and improves the accuracy of the resulting three-dimensional mesh, as sketched below.
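  • A simplified sketch of the split-and-splice step referenced above, assuming a vertical reference plane at the mesh center-line x-position and vertex arrays as the mesh representation (re-triangulating across the seam is omitted):

```python
import numpy as np

def split_by_plane(vertices: np.ndarray, center_x: float):
    """Split mesh vertices by a vertical reference plane x = center_x
    (a simplified stand-in for a plane through the mesh center line)."""
    left = vertices[vertices[:, 0] <= center_x]
    right = vertices[vertices[:, 0] > center_x]
    return left, right

def splice_meshes(left_vertices: np.ndarray, right_vertices: np.ndarray):
    """Concatenate the kept halves into one vertex set; a full
    implementation would also re-triangulate along the seam."""
    return np.concatenate([left_vertices, right_vertices], axis=0)

# Illustrative use: keep the left half of the left-image mesh and the
# right half of the right-image mesh, then splice them together.
left_mesh = np.random.rand(468, 3)    # placeholder vertices
right_mesh = np.random.rand(468, 3)   # placeholder vertices
kept_left, _ = split_by_plane(left_mesh, left_mesh[:, 0].mean())
_, kept_right = split_by_plane(right_mesh, right_mesh[:, 0].mean())
result_mesh = splice_meshes(kept_left, kept_right)
```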
  • It can be seen that the flow 400 of the method for generating information in this embodiment highlights extracting the left three-dimensional mesh from the three-dimensional mesh corresponding to the left face image and extracting the right three-dimensional mesh from the three-dimensional mesh corresponding to the right face image, and then splicing the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • Since the left face image records more of the facial features corresponding to the three-dimensional mesh on the left side of the reference plane, and the right face image records more of the facial features corresponding to the three-dimensional mesh on the right side of the reference plane, the solution described in this embodiment uses the left three-dimensional mesh corresponding to the left face image and the right three-dimensional mesh corresponding to the right face image to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps further improve the accuracy of three-dimensional face reconstruction.
  • With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied to various electronic devices.
  • As shown in Fig. 5, the apparatus 500 for generating information in this embodiment includes: an image acquiring unit 501, a first generating unit 502, and a second generating unit 503.
  • In this embodiment, the image acquiring unit 501 is configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; the first generating unit 502 is configured to perform the following steps for each face image of the left face image and the right face image: input the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and the second generating unit 503 is configured to generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • the image acquisition unit 501 of the apparatus 500 for generating information may acquire the left face image and the right face image obtained by photographing the target face through a wired connection or a wireless connection.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • the left face image and the right face image are binocular vision images.
  • In this embodiment, for each face image of the left face image and the right face image, the first generating unit 502 may perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point may be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • The map generation model can be used to characterize the correspondence between a face image and the map corresponding to the face image. It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
  • In this embodiment, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image generated by the first generating unit 502, the second generating unit 503 can generate the resulting three-dimensional mesh corresponding to the target face.
  • the resultant three-dimensional mesh is a three-dimensional mesh that is to be subjected to operations such as rendering to realize the three-dimensional face reconstruction of the target face.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • In some optional implementations of this embodiment, the second generating unit 503 may include: an establishing module (not shown in the figure) configured to establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module (not shown in the figure) configured to extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh; and a splicing module (not shown in the figure) configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some optional implementations of this embodiment, the map generation model can be obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • the training samples in the training sample set can be generated by the following steps: use a depth-map acquisition device to collect a face depth map of a sample face, and obtain the face image corresponding to the face depth map; perform face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assemble the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • the apparatus 500 provided by the above embodiment of the present disclosure acquires the left face image and the right face image obtained by photographing the target face, and then, for each of the left face image and the right face image, performs the following steps: input the face image into the pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points. Finally, it generates the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
  • because of occlusion and shooting angle, the left face image and the right face image record facial features from different angles; therefore, using the three-dimensional grid corresponding to the left face image together with the three-dimensional grid corresponding to the right face image can generate a more accurate resultant three-dimensional grid for the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 (for example, the server or terminal device in FIG. 1) suitable for implementing embodiments of the present disclosure.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the terminal device or server shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 such as a liquid crystal display (LCD), speakers, and vibrators; storage devices 608 such as magnetic tapes and hard disks; and a communication device 609.
  • the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device, or may represent multiple devices as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium described in the embodiment of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
  • the computer program code used to perform the operations of the embodiments of the present disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and any combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described units may also be provided in a processor, which may, for example, be described as: a processor including an image acquisition unit, a first generation unit, and a second generation unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves.
  • for example, the image acquisition unit may also be described as "a unit for acquiring a face image."

Abstract

Disclosed are a method and apparatus for generating information. The method comprises: acquiring a left face image and a right face image obtained by photographing a target face (201); performing the following steps for each of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point on the basis of the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional mesh corresponding to the face image on the basis of the determined three-dimensional coordinates of the face key points in the face image (202); and generating a resultant three-dimensional mesh corresponding to the target face on the basis of the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image (203). The present invention facilitates improving the accuracy of three-dimensional face reconstruction.

Description

Method and device for generating information
Cross-reference to related applications
This application is filed based on a Chinese patent application with an application number of 201910100632.8 and an application date of January 31, 2019, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to methods and devices for generating information.
Background
With the popularity of mobile phone video applications, various face special-effect functions have also been widely used. As an effective face representation technology, three-dimensional face reconstruction has broad application prospects.
Three-dimensional face reconstruction is the process of directly regressing the three-dimensional mesh information (3D mesh) of a face from the pixel information of a given two-dimensional face image. Generally, a two-dimensional face image used for three-dimensional face reconstruction is obtained by photographing a face from a certain angle with an electronic device that includes a camera (such as a mobile phone or a camera).
Summary of the invention
The embodiments of the present disclosure propose methods and apparatuses for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; for each of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generating the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
In some embodiments, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point includes: determining the depth value of the face key point corresponding to the point based on the pixel value of the point; determining the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point includes: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point further includes: in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
In some embodiments, generating the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image includes: establishing reference planes through the center line of the three-dimensional grid corresponding to the left face image and through the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the corresponding three-dimensional grid along its center line and divides the grid into two parts; extracting, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional grid, and extracting, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional grid; and splicing the extracted left three-dimensional grid and right three-dimensional grid to generate the resultant three-dimensional grid corresponding to the target face.
In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, wherein each training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing the map corresponding to the sample face image in the training sample using the determined mapping positions and their pixel values; and, using a machine learning method, taking the sample face images of the training samples as input and the maps corresponding to the input sample face images as desired output, training to obtain the map generation model.
In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth-map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assembling the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: an image acquisition unit configured to acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; a first generation unit configured to perform the following steps for each of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and a second generation unit configured to generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
In some embodiments, the first generation unit is further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some embodiments, the first generation unit is further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
In some embodiments, the first generation unit is further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
In some embodiments, the second generation unit includes: a building module configured to establish reference planes through the center line of the three-dimensional grid corresponding to the left face image and through the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the corresponding three-dimensional grid along its center line and divides the grid into two parts; an extraction module configured to extract, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional grid, and to extract, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional grid; and a splicing module configured to splice the extracted left three-dimensional grid and right three-dimensional grid to generate the resultant three-dimensional grid corresponding to the target face.
In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, wherein each training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing the map corresponding to the sample face image in the training sample using the determined mapping positions and their pixel values; and, using a machine learning method, taking the sample face images of the training samples as input and the maps corresponding to the input sample face images as desired output, training to obtain the map generation model.
In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth-map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assembling the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the above method for generating information.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any embodiment of the above method for generating information.
The method and apparatus for generating information provided by the embodiments of the present disclosure acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; then, for each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image. It can be understood that, due to occlusion, shooting angle, and other factors, the left face image and the right face image record facial features from different angles; therefore, using the three-dimensional grids corresponding to both images makes it possible to generate a more accurate resultant three-dimensional grid corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
Description of the drawings
By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes, and advantages of the present disclosure will become more apparent:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
Fig. 2 is a flowchart of an embodiment of a method for generating information according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of another embodiment of a method for generating information according to the present disclosure;
Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present disclosure;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
Detailed description
The present disclosure will be further described in detail below in conjunction with the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that the embodiments in the present disclosure and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the present disclosure will be described in detail with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for generating information or the apparatus for generating information of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on. Various communication client applications, such as image processing software, video playback software, web browser applications, search applications, instant messaging tools, and social platform software, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and so on. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example, an image processing server that processes the left face image and the right face image obtained by the terminal devices 101, 102, 103 photographing a target face. The image processing server may analyze and otherwise process received data such as the left face image and the right face image, and obtain a processing result (for example, the resultant three-dimensional grid corresponding to the target face).
It should be noted that the method for generating information provided by the embodiments of the present disclosure may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for generating information may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. When the data used in generating the resultant three-dimensional grid corresponding to the target face does not need to be obtained remotely, the above system architecture may include no network, and only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of an embodiment of the method for generating information according to the present disclosure is shown. The method for generating information includes the following steps:
Step 201: Acquire a left face image and a right face image obtained by photographing a target face.
In this embodiment, the execution subject of the method for generating information (for example, the server shown in FIG. 1) may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. The target face is the face for which the corresponding three-dimensional grid is to be generated. In practice, after the three-dimensional grid of a face has been generated, operations such as rendering can be performed on it to realize three-dimensional face reconstruction.
In this embodiment, the left face image and the right face image are binocular vision images. Specifically, the execution subject may acquire a left face image and a right face image stored locally in advance, or acquire a left face image and a right face image sent by a communicatively connected electronic device (for example, the terminal device shown in FIG. 1). It should be noted that both the left face image and the right face image are two-dimensional face images.
In practice, various devices including binocular cameras (for example, a binocular camera) may be used to photograph the target face to obtain the left face image and the right face image corresponding to the target face. It should be noted that a binocular camera usually consists of two cameras arranged in the horizontal direction. When shooting with a binocular camera, the camera arranged on the left may be determined as the left camera, and the image it captures is the left image (corresponding to the left face image); correspondingly, the camera arranged on the right is determined as the right camera, and the image it captures is the right image (corresponding to the right face image).
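As an illustrative sketch only (the device indices are assumptions, and stereo calibration and rectification are omitted), the binocular pair might be acquired with OpenCV as follows:

```python
# Illustrative sketch only: grab one binocular pair with OpenCV, assuming the
# left and right cameras enumerate as video devices 0 and 1.
import cv2

left_cam, right_cam = cv2.VideoCapture(0), cv2.VideoCapture(1)
ok_left, left_face_image = left_cam.read()     # frame from the left camera
ok_right, right_face_image = right_cam.read()  # frame from the right camera
if not (ok_left and ok_right):
    raise RuntimeError("failed to grab a binocular image pair")
left_cam.release()
right_cam.release()
```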
Step 202: For each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
In this embodiment, for each of the left face image and the right face image obtained in step 201, the execution subject may perform the following steps:
Step 2021: Input the face image into a pre-trained map generation model to obtain the map corresponding to the face image.
The map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the key point in the face image and the depth value of the key point. The depth value of a face key point may be the distance from the key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically points that affect the facial contour or the shapes of the facial features.
In this embodiment, the map generation model can be used to characterize the correspondence between a face image and the map corresponding to it. Specifically, as an example, the map generation model may be a correspondence table that stores multiple face images and their corresponding maps, formulated in advance by technicians based on statistics of a large number of face images and corresponding maps; it may also be a model obtained by training an initial model (for example, a neural network) with a machine learning method based on preset training samples.
It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the mapping positions, in the map output by the model, of the face key points in the face image input to the model.
In some optional implementations of this embodiment, the map generation model may be obtained by the execution subject or another electronic device through training in the following steps:
Step 20211: Obtain a training sample set.
A training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image. Here, the sample face image is a two-dimensional face image. In practice, various methods may be used to obtain the training sample set.
In some optional implementations of this embodiment, the training samples in the training sample set may be generated through the following steps. First, a depth-map acquisition device may be used to collect a face depth map of a sample face, and the face image corresponding to the face depth map is obtained. Then, face key point detection is performed on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image. Finally, the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map are assembled into a training sample.
Here, the depth-map acquisition device may be any of various image acquisition devices capable of collecting depth maps, such as a binocular camera or a depth camera. A face depth map is an image containing depth information (that is, information on the distance from the viewpoint to the surfaces of scene objects). The face image corresponding to the face depth map is the RGB (Red Green Blue) three-channel color image, without depth information, that corresponds to the face depth map. Thus, by removing the depth information of the face depth map, the face image corresponding to it (that is, the sample face image) can be obtained.
Here, various face key point detection methods may be used to perform face key point detection on the face image corresponding to the face depth map. For example, the face image may be input to a pre-trained face key point detection model to obtain a detection result. The face key point detection model can be used to detect the positions of face key points in a face image. Here, the face key point detection model may be obtained by supervised training of an existing convolutional neural network based on a sample set (containing face images and annotations indicating the positions of face key points) using a machine learning method. The convolutional neural network may use various existing structures, such as DenseBox, VGGNet, ResNet, and SegNet.
In addition, it should be noted that the above method of determining the depth values of face key points based on a face depth map is a well-known technology widely researched and applied at present, and will not be repeated here.
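As an illustrative sketch only, one training sample could be assembled as follows, under two assumptions: the depth-map acquisition device yields an RGB image pixel-aligned with the depth map, and dlib's 68-point landmark predictor (whose model file path here is an assumption) stands in for the face key point detection model described above:

```python
# Illustrative sketch only: assemble one training sample. Assumptions: the
# depth-map acquisition device yields an RGB image pixel-aligned with the
# depth map, and dlib's 68-point landmark predictor (model file path is an
# assumption) stands in for the face key point detection model.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def make_training_sample(rgb: np.ndarray, depth_map: np.ndarray) -> dict:
    face_rect = detector(rgb, 1)[0]                 # locate the sample face
    shape = predictor(rgb, face_rect)               # 68 face key points
    coords = [(p.x, p.y) for p in shape.parts()]    # key point coords in the image
    depths = [float(depth_map[y, x]) for x, y in coords]  # depth value per key point
    return {"image": rgb, "coords": coords, "depths": depths}
```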
Step 20212: For each training sample in the training sample set, based on the coordinates of the face key points in the training sample, determine the mapping positions of those key points in the map to be constructed; based on the depth values of the face key points in the training sample, determine the pixel values at the corresponding mapping positions in the map to be constructed; and use the determined mapping positions and their pixel values to construct the map corresponding to the sample face image in the training sample.
Here, for a training sample in the training sample set, the execution subject or another electronic device may first determine, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in the map to be constructed. For a given face key point, the coordinates of its mapping position in the map to be constructed may be determined based on a pre-established mapping relationship or an existing mapping principle. As an example, the principle of UV mapping may be used to determine the coordinates of the mapping position of the face key point in the map to be constructed. In practice, UV refers to two-dimensional texture coordinates. UV is used to define a two-dimensional texture coordinate system, called "UV texture space", which uses the letters U and V to denote the axes of the two-dimensional space. In three-dimensional modeling, UV mapping can convert texture information into planar information. Here, the mapped UV coordinates can be used to indicate a mapping position in the map to be constructed, and may serve as the coordinates of that mapping position.
In this implementation, after the mapping position of a face key point in the map to be constructed has been determined, the pixel value at that mapping position may be determined based on the depth value of the key point. Specifically, a correspondence between pixel values and depth values may be determined in advance, and then the pixel value at the corresponding mapping position in the map to be constructed is determined based on this correspondence and the depth value of the face key point. As an example, suppose the predetermined correspondence is "the pixel value equals the depth value". Then, for a face key point whose coordinates in the sample face image are (100, 50), whose corresponding mapping position is at coordinates (50, 25), and whose depth value is 30, the pixel value at coordinates (50, 25) in the map is 30.
It should be noted that a correspondence between the size of the map and the size of the sample face image may be established in advance. As an example, the map may be preset to have the same size as the sample face image, or the map may be preset to be one half the size of the sample face image.
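As an illustrative sketch only, the map for one training sample could be constructed as follows, under two assumed conventions consistent with the examples above: the map is half the size of the sample face image, so a key point at (x, y) maps to position (x // 2, y // 2), and the pixel value equals the depth value:

```python
# Illustrative sketch only: build the map for one training sample. Assumed
# conventions: the map is half the image size, so key point (x, y) maps to
# (x // 2, y // 2), and the pixel value equals the depth value (key point
# (100, 50) with depth 30 yields pixel 30 at map position (50, 25)).
import numpy as np

def build_target_map(image_height: int, image_width: int, coords, depths) -> np.ndarray:
    target = np.zeros((image_height // 2, image_width // 2), dtype=np.float32)
    for (x, y), depth in zip(coords, depths):
        target[y // 2, x // 2] = depth   # mapped position stores the depth value
    return target
```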
Step 20213: Using a machine learning method, take the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as desired output, and train to obtain the map generation model.
Here, the execution subject or another electronic device may use a machine learning method to train an initial model, taking the sample face images included in the training samples of the training sample set as the input of the initial model and the maps corresponding to the input sample face images as the desired output of the initial model, and finally obtain the map generation model through training.
Here, various existing convolutional neural network structures (such as DenseBox, VGGNet, ResNet, and SegNet) may be used as the initial model for training. In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage and which performs excellently in image processing; therefore, a convolutional neural network may be used to process the sample face images in the training samples.
It should be noted that the execution subject or other electronic devices may also use other models with image processing functions as the initial model; the choice is not limited to CNNs, and the specific model structure may be set according to actual needs, which is not limited here. It should be pointed out that machine learning methods are well-known technologies that are currently widely researched and applied, and will not be repeated here.
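As an illustrative sketch only (the disclosure does not fix an architecture), the supervised training could look like the following in PyTorch, with a toy convolutional network standing in for the initial model and half-resolution target maps assumed as in the earlier sketches:

```python
# Illustrative sketch only: supervised training of the map generation model in
# PyTorch. The tiny network below is a stand-in for the initial model (the
# disclosure allows DenseBox, VGGNet, ResNet, SegNet, etc.); its first
# convolution strides by 2 to match the half-resolution target maps assumed
# in the earlier sketches.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # predicts a one-channel map
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()   # regress the predicted map toward the constructed map

def train_step(sample_images: torch.Tensor, target_maps: torch.Tensor) -> float:
    """sample_images: (B, 3, H, W); target_maps: (B, 1, H // 2, W // 2)."""
    optimizer.zero_grad()
    predicted = model(sample_images)        # input: sample face images
    loss = loss_fn(predicted, target_maps)  # desired output: constructed maps
    loss.backward()
    optimizer.step()
    return loss.item()
```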
步骤2022,对于映射图中的点,基于该点在映射图中的坐标和该点的像素值,确定该点对应的人脸关键点的三维坐标。Step 2022: For a point in the map, based on the coordinates of the point in the map and the pixel value of the point, determine the three-dimensional coordinates of the key point of the face corresponding to the point.
在这里,对于映射图中的点,上述执行主体可以采用各种方法基于该点在映射图中的坐标和该点的像素值,确定该点对应的人脸关键点的三维坐标。Here, for a point in the map, the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
在本实施例的一些可选的实现方式中,对于映射图中的点,上述执行主体可以通过以下步骤确定该点对应的人脸关键点的三维坐标:首先,上述执行主体可以基于该点的像素值,确定该点对应的人脸关键点的深度值。然后,上述执行主体可以基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标。最后,上述执行主体可以基于该点对应的人脸关键点的深度值和该点对应的人脸关键点在该人脸图像中的坐标,确定该点对应的人脸关键点的三维坐标。In some optional implementations of this embodiment, for a point in the map, the above-mentioned executive body can determine the three-dimensional coordinates of the key point of the face corresponding to the point by the following steps: First, the above-mentioned executive body can be based on the point The pixel value determines the depth value of the key point of the face corresponding to the point. Then, the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map. Finally, the execution subject may determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
具体的,上述执行主体可以基于映射图生成模型所对应的映射关系或映射原理,基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标。可以理解的是,在训练映射图生成模型时,由于利用预先确定的映射关系或映射原理,可以基于人脸关键点的坐标,确定人脸关键点在待构建的映射图中的映射位置(参考步骤20212)。因而,此处,对于映射图中的某个点,可以采用逆向过程,确定该点对应的人脸关键点在人脸图像中的坐标。Specifically, the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the mapping relationship or the mapping principle corresponding to the map generation model, and the coordinates of the point in the map. It is understandable that when training the map generation model, due to the use of a predetermined mapping relationship or mapping principle, the mapping position of the face key point in the map to be constructed can be determined based on the coordinates of the face key point (refer to Step 20212). Therefore, here, for a certain point in the map, a reverse process can be used to determine the coordinates of the key point of the face corresponding to the point in the face image.
另外,在本实现方式中,上述执行主体可以基于该点的像素值,采用各种方法确定该点对应的人脸关键点的深度值。例如,可以直接将该点的像素值确定为该点对应的人脸关键点的深度值。In addition, in this implementation manner, the above-mentioned execution subject may use various methods to determine the depth value of the face key point corresponding to the point based on the pixel value of the point. For example, the pixel value of the point can be directly determined as the depth value of the key point of the face corresponding to the point.
在本实施例的一些可选的实现方式中,上述执行主体可以响应于确定该点的像素值大于等于预设阈值,将该点的像素值确定为该点对应的人脸关键点的深度值。预设阈值可以为预先确定的值,例如“1”。实践中,像素值很低的点通常为预测失误的点,因此在本实现方式中,通过设置预设阈值可以去除预测失误的点,有助于确定出更为准确的人脸关键点的三维坐标。In some optional implementations of this embodiment, the above-mentioned execution subject may, in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point . The preset threshold may be a predetermined value, such as "1". In practice, the points with very low pixel values are usually the points where the prediction is incorrect. Therefore, in this implementation, the preset threshold can be set to remove the points with the prediction error, which helps to determine more accurate three-dimensional face key points. coordinate.
In some optional implementations of this embodiment, in response to determining that the pixel value of the point is less than the preset threshold, the executing body may further determine the preset threshold as the depth value of the face key point corresponding to the point.
It should be noted that after the coordinates of a face key point in the face image and the depth value of the face key point have been determined, the executing body may directly combine the coordinates of the face key point (which may be expressed as (x, y)) with the depth value of the face key point (which may be expressed as z) to form the three-dimensional coordinates of the face key point (which may be expressed as (x, y, z)).
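As an illustrative sketch (not part of the disclosure), the coordinate-recovery step described above can be expressed in Python as follows. It assumes a simple mapping principle that the text leaves open: the map has the same resolution as the face image, a point's coordinates in the map equal the corresponding key point's coordinates in the face image, and map positions carrying no key point are zero-valued.

```python
import numpy as np

def map_to_keypoints(pred_map: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Recover (x, y, z) face key points from a predicted map.

    Assumption: nonzero map pixels carry key points, and a point's map
    coordinates equal the key point's coordinates in the face image.
    """
    ys, xs = np.nonzero(pred_map)
    keypoints = []
    for x, y in zip(xs, ys):
        pixel = float(pred_map[y, x])
        # Pixel values below the preset threshold are treated as likely
        # mispredictions and, following the optional implementation above,
        # are replaced by the threshold itself.
        z = pixel if pixel >= threshold else threshold
        keypoints.append((float(x), float(y), z))
    return np.asarray(keypoints, dtype=np.float32)
```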
Step 2023: generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In practice, the three-dimensional mesh corresponding to a face image is a three-dimensional mesh whose vertices are the face key points. Therefore, here, based on the determined three-dimensional coordinates of the face key points in the face image, the executing body may generate the three-dimensional mesh corresponding to the face image.
It should be noted that generating a three-dimensional mesh based on the three-dimensional coordinates of its vertices is a well-known technique that is currently widely studied and applied, and is not described in detail here.
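Although the disclosure leaves the meshing method open, one common instance of such a well-known technique (an assumption here, not a prescribed method) is to triangulate the key points in the image plane and lift the triangles to 3D:

```python
import numpy as np
from scipy.spatial import Delaunay

def keypoints_to_mesh(keypoints: np.ndarray):
    """Build a triangle mesh whose vertices are the face key points.

    Assumed meshing choice: Delaunay-triangulate the (x, y) projections
    of the vertices, then reuse the triangle connectivity in 3D.
    """
    tri = Delaunay(keypoints[:, :2])   # triangulate in the image plane
    faces = tri.simplices              # (M, 3) vertex indices per triangle
    return keypoints, faces            # vertex array and connectivity
```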
Step 203: generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
In this embodiment, by performing step 202, the executing body can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. Then, based on these two three-dimensional meshes, the executing body can generate the resulting three-dimensional mesh corresponding to the target face. Here, the resulting three-dimensional mesh is the three-dimensional mesh on which operations such as rendering are to be performed in order to realize three-dimensional face reconstruction of the target face.
Specifically, the executing body may use various methods to generate the resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. For example, the head pose of the three-dimensional mesh corresponding to the left face image and the head pose of the three-dimensional mesh corresponding to the right face image may be detected, and the three-dimensional mesh corresponding to the face image whose head pose indicates the smaller head rotation angle may then be determined as the resulting three-dimensional mesh corresponding to the target face. It can be understood that, in practice, the smaller the head rotation angle, the fewer facial features are occluded and the more accurate the generated three-dimensional mesh. Therefore, selecting, from the two three-dimensional meshes, the one whose head rotation angle is smaller as the resulting three-dimensional mesh corresponding to the target face helps to improve the accuracy of the resulting three-dimensional mesh.
It should be noted that detecting a head pose is a well-known technique that is currently widely studied and applied, and is not described in detail here.
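The selection rule in the example above can be sketched as follows; `estimate_rotation` is a hypothetical stand-in for any off-the-shelf head-pose detector and is not named by the disclosure:

```python
def select_result_mesh(left_mesh, right_mesh, estimate_rotation):
    """Pick the mesh whose head pose indicates the smaller rotation angle.

    `estimate_rotation` (hypothetical) returns the absolute head rotation
    angle, e.g. in degrees, of the face represented by a mesh.
    """
    left_angle = abs(estimate_rotation(left_mesh))
    right_angle = abs(estimate_rotation(right_mesh))
    # At smaller rotation angles fewer facial features are occluded,
    # so the corresponding mesh is taken as the resulting mesh.
    return left_mesh if left_angle <= right_angle else right_mesh
```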
Continuing to refer to FIG. 3, which is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of FIG. 3, the server 301 may first acquire a left face image 303 and a right face image 304 that are sent by the terminal device 302 and were obtained by photographing a target face, where the left face image 303 and the right face image 304 are binocular vision images.
Then, for the left face image 303, the server 301 may input the left face image 303 into a pre-trained map generation model 305 to obtain a map 306 corresponding to the left face image 303, where the points in the map 306 correspond to the face key points in the left face image 303; for a point in the map 306, determine, based on the coordinates of the point in the map 306 and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generate, based on the determined three-dimensional coordinates of the face key points in the left face image 303, a three-dimensional mesh 307 corresponding to the left face image 303. For the right face image 304, the server 301 may input the right face image 304 into the map generation model 305 to obtain a map 308 corresponding to the right face image 304, where the points in the map 308 correspond to the face key points in the right face image 304; for a point in the map 308, determine, based on the coordinates of the point in the map 308 and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generate, based on the determined three-dimensional coordinates of the face key points in the right face image 304, a three-dimensional mesh 309 corresponding to the right face image 304.
Finally, the server 301 may generate a resulting three-dimensional mesh 310 corresponding to the target face based on the three-dimensional mesh 307 corresponding to the left face image 303 and the three-dimensional mesh 309 corresponding to the right face image 304.
In the method provided by the above embodiment of the present disclosure, a left face image and a right face image obtained by photographing a target face are acquired, and then, for each face image of the left face image and the right face image, the following steps are performed: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. Finally, a resulting three-dimensional mesh corresponding to the target face is generated based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. It can be understood that, due to occlusion, shooting angles, and other reasons, the left face image and the right face image can record facial features at different angles. Therefore, using the three-dimensional mesh corresponding to the left face image together with the three-dimensional mesh corresponding to the right face image makes it possible to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
With further reference to FIG. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: acquiring a left face image and a right face image obtained by photographing a target face.
In this embodiment, the executing body of the method for generating information (for example, the server shown in FIG. 1) may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. Here, the target face is the face for which the corresponding three-dimensional mesh is to be generated. In practice, after the three-dimensional mesh of the face is generated, operations such as rendering may be performed on the mesh to realize three-dimensional face reconstruction. The left face image and the right face image are binocular vision images.
Step 402: for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In this embodiment, for each of the left face image and the right face image obtained in step 401, the executing body may perform the following steps:
Step 4021: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
Here, the map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the face key point in the face image and the depth value of the face key point. The depth value of a face key point may be the distance from the face key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically, points that affect the face contour or the shapes of the facial features. The map generation model may be used to characterize the correspondence between a face image and the map corresponding to the face image.
Step 4022: for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point.
Here, for a point in the map, the executing body may use various methods to determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
Step 4023: generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In practice, the three-dimensional mesh corresponding to a face image is a three-dimensional mesh whose vertices are the face key points. Therefore, here, based on the determined three-dimensional coordinates of the face key points in the face image, the executing body may generate the three-dimensional mesh corresponding to the face image.
The above step 401 and step 402 are consistent with step 201 and step 202 in the foregoing embodiment, respectively. The above descriptions of step 201 and step 202 also apply to step 401 and step 402 and are not repeated here.
Step 403: establishing reference planes passing respectively through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image.
In this embodiment, by performing step 402, the executing body can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. Then, the executing body may establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and a reference plane through the center line of the three-dimensional mesh corresponding to the right face image, respectively. Here, a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts. Specifically, the two parts divided by the reference plane may be symmetrical (in which case the reference plane passes through the symmetry axis of the face) or asymmetrical. It should be made clear, however, that the reference plane passes through a point on the face indicated by the three-dimensional mesh. The reference plane thus divides the face indicated by the three-dimensional mesh into a face part located on the left side of the reference plane and a face part located on the right side of the reference plane. In addition, the point on the face image through which the reference plane established for the mesh corresponding to the left face image passes and the point on the face image through which the reference plane established for the mesh corresponding to the right face image passes indicate the same point on the face (for example, both indicate the point corresponding to the tip of the nose).
Step 404: extracting, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh.
In this embodiment, based on the two reference planes established in step 403, the executing body may extract, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as the left three-dimensional mesh, and extract, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as the right three-dimensional mesh.
It should be noted that the mesh located on the left side of the reference plane is the mesh on the left of the reference plane when facing the face contour in the mesh for which the reference plane has been established; likewise, the mesh located on the right side of the reference plane is the mesh on the right of the reference plane when facing that face contour.
Step 405: splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
In this embodiment, based on the left three-dimensional mesh and the right three-dimensional mesh obtained in step 404, the executing body may splice the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
It should be noted that since the reference plane passes through the center line of the mesh, and the point on the face image through which the reference plane established for the mesh corresponding to the left face image passes and the point on the face image through which the reference plane established for the mesh corresponding to the right face image passes indicate the same point on the face, the mesh on the left side of the reference plane in the mesh corresponding to the left face image and the mesh on the right side of the reference plane in the mesh corresponding to the right face image can be spliced into a three-dimensional mesh corresponding to a complete face; similarly, the mesh on the right side of the reference plane in the mesh corresponding to the left face image and the mesh on the left side of the reference plane in the mesh corresponding to the right face image can be spliced into a three-dimensional mesh corresponding to a complete face.
In practice, the left face image can record more of the facial features corresponding to the mesh on the left side of the reference plane, and the right face image can record more of the facial features corresponding to the mesh on the right side of the reference plane. Therefore, in this embodiment, extracting the left three-dimensional mesh from the mesh corresponding to the left face image and the right three-dimensional mesh from the mesh corresponding to the right face image, and splicing the extracted left and right three-dimensional meshes to generate the resulting three-dimensional mesh corresponding to the target face, enables the generated resulting mesh to characterize the facial features of the target face more accurately and improves the accuracy of the generated resulting mesh.
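Under simplifying assumptions not fixed by the disclosure (both meshes expressed in one common coordinate frame, the reference plane being the vertical plane x = plane_x through a shared landmark such as the nose tip, and smaller x lying on the left when facing the face contour), the split-and-stitch step might look like this at the vertex level:

```python
import numpy as np

def split_and_stitch(left_vertices: np.ndarray,
                     right_vertices: np.ndarray,
                     plane_x: float) -> np.ndarray:
    """Sketch of step 404 and step 405 on vertex arrays of shape (N, 3).

    Keeps, from each mesh, the half that its source image records best,
    then concatenates the two halves into the vertex set of the
    resulting mesh; connectivity would be rebuilt by re-triangulation.
    """
    left_half = left_vertices[left_vertices[:, 0] < plane_x]
    right_half = right_vertices[right_vertices[:, 0] >= plane_x]
    return np.vstack([left_half, right_half])
```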
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of extracting the left three-dimensional mesh from the mesh corresponding to the left face image, extracting the right three-dimensional mesh from the mesh corresponding to the right face image, and then splicing the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face. It can be understood that, due to occlusion and other reasons, the left face image can record more of the facial features corresponding to the mesh on the left side of the reference plane, while the right face image can record more of the facial features corresponding to the mesh on the right side of the reference plane. Therefore, by using the left three-dimensional mesh corresponding to the left face image and the right three-dimensional mesh corresponding to the right face image, the solution described in this embodiment can generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to further improve the accuracy of three-dimensional face reconstruction.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating information in this embodiment includes an image acquisition unit 501, a first generation unit 502, and a second generation unit 503. The image acquisition unit 501 is configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images. The first generation unit 502 is configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. The second generation unit 503 is configured to generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
In this embodiment, the image acquisition unit 501 of the apparatus 500 for generating information may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. Here, the target face is the face for which the corresponding three-dimensional mesh is to be generated. The left face image and the right face image are binocular vision images.
In this embodiment, for each of the left face image and the right face image obtained by the image acquisition unit 501, the first generation unit 502 may perform the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
Here, the map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the face key point in the face image and the depth value of the face key point. The depth value of a face key point may be the distance from the face key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically, points that affect the face contour or the shapes of the facial features.
In this embodiment, the map generation model may be used to characterize the correspondence between a face image and the map corresponding to the face image. It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the mapping positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
In this embodiment, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image obtained by the first generation unit 502, the second generation unit 503 may generate the resulting three-dimensional mesh corresponding to the target face. Here, the resulting three-dimensional mesh is the three-dimensional mesh on which operations such as rendering are to be performed in order to realize three-dimensional face reconstruction of the target face.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: determine, based on the pixel value of the point, the depth value of the face key point corresponding to the point; determine, based on the coordinates of the point in the map, the coordinates of the face key point corresponding to the point in the face image; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
In some optional implementations of this embodiment, the second generation unit 503 may include: an establishing module (not shown in the figure) configured to establish reference planes passing respectively through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, where a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module (not shown in the figure) configured to extract, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh; and a splicing module (not shown in the figure) configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
In some optional implementations of this embodiment, the map generation model may be obtained by training through the following steps: acquiring a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for a training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at the mapping positions, a map corresponding to the sample face image in the training sample; and using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as expected output, and training to obtain the map generation model.
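As a minimal sketch of the map-construction step (the concrete mapping principle is an assumption; the disclosure only requires that key point coordinates determine mapping positions and depth values determine pixel values), the training target for one sample might be built as follows. A network would then be trained, by ordinary supervised regression, to output this target map given the sample face image.

```python
import numpy as np

def build_target_map(keypoints, depths, height: int, width: int) -> np.ndarray:
    """Construct the target map for one sample face image.

    Assumed mapping principle: a key point at image coordinates (x, y)
    maps to map position (x, y), and the pixel value at that position
    is set to the key point's depth value.
    """
    target = np.zeros((height, width), dtype=np.float32)
    for (x, y), z in zip(keypoints, depths):
        target[int(round(y)), int(round(x))] = float(z)
    return target
```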
In some optional implementations of this embodiment, the training samples in the training sample set may be generated through the following steps: collecting a face depth map of a sample face with a depth map acquisition device, and acquiring the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
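A sketch of assembling one such training sample is given below; `detect_landmarks` is a hypothetical face key point detector standing in for whatever detection method is used, and the face depth map is assumed to be pixel-aligned with the face image:

```python
import numpy as np

def make_training_sample(depth_map: np.ndarray, face_image: np.ndarray,
                         detect_landmarks):
    """Assemble one training sample from a depth-camera capture.

    The (x, y) key point coordinates come from the detector; each key
    point's depth value is read from the aligned face depth map.
    """
    keypoints = detect_landmarks(face_image)                # (N, 2) coords
    depths = [float(depth_map[int(y), int(x)]) for x, y in keypoints]
    return {"image": face_image, "keypoints": keypoints, "depths": depths}
```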
It can be understood that the units recorded in the apparatus 500 correspond to the steps in the method described with reference to FIG. 2. Therefore, the operations and features described above for the method, as well as the beneficial effects produced, are also applicable to the apparatus 500 and the units contained therein, and are not repeated here.
The apparatus 500 provided by the above embodiment of the present disclosure acquires a left face image and a right face image obtained by photographing a target face, and then performs, for each face image of the left face image and the right face image, the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. Finally, the apparatus generates a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. It can be understood that, due to occlusion, shooting angles, and other reasons, the left face image and the right face image can record facial features at different angles. Therefore, using the three-dimensional mesh corresponding to the left face image together with the three-dimensional mesh corresponding to the right face image makes it possible to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
Reference is now made to FIG. 6, which shows a schematic structural diagram of an electronic device (such as the server or terminal device in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device or server shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing apparatus (such as a central processing unit or a graphics processing unit) 601, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data. Although FIG. 6 shows the electronic device 600 having various apparatuses, it should be understood that it is not required to implement or provide all of the illustrated apparatuses. More or fewer apparatuses may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one apparatus or may represent multiple apparatuses as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each face image of the left face image and the right face image, perform the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
The computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, the programming languages including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on a user computer, partly on a user computer, as an independent software package, partly on a user computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, the program segment, or the part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from the order marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an image acquisition unit, a first generation unit, and a second generation unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves. For example, the image acquisition unit may also be described as "a unit for acquiring face images".
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (16)

  1. A method for generating information, comprising:
    acquiring a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images;
    for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, wherein points in the map correspond to face key points in the face image; for a point in the map, determining, based on coordinates of the point in the map and a pixel value of the point, three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and
    generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
  2. The method according to claim 1, wherein the determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point comprises:
    determining, based on the pixel value of the point, a depth value of the face key point corresponding to the point;
    determining, based on the coordinates of the point in the map, coordinates of the face key point corresponding to the point in the face image; and
    determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
  3. The method according to claim 2, wherein the determining, based on the pixel value of the point, the depth value of the face key point corresponding to the point comprises:
    in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
  4. The method according to claim 3, wherein the determining, based on the pixel value of the point, the depth value of the face key point corresponding to the point further comprises:
    in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
  5. The method according to claim 1, wherein the generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, the resulting three-dimensional mesh corresponding to the target face comprises:
    establishing reference planes passing respectively through a center line of the three-dimensional mesh corresponding to the left face image and through a center line of the three-dimensional mesh corresponding to the right face image, wherein a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts;
    extracting, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh; and
    splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  6. The method according to any one of claims 1-5, wherein the map generation model is obtained by training through the following steps:
    acquiring a training sample set, wherein a training sample comprises a sample face image and coordinates and depth values of face key points in the sample face image;
    for a training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at the mapping positions, a map corresponding to the sample face image in the training sample; and
    using a machine learning method, taking sample face images of the training samples in the training sample set as input and maps corresponding to the input sample face images as expected output, and training to obtain the map generation model.
  7. The method according to claim 6, wherein the training samples in the training sample set are generated through the following steps:
    collecting a face depth map of a sample face with a depth map acquisition device, and acquiring a face image corresponding to the face depth map;
    performing face key point detection on the face image corresponding to the face depth map to determine coordinates of face key points in the face image corresponding to the face depth map; and
    combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and depth values of the face key points determined based on the face depth map into a training sample.
  8. An apparatus for generating information, comprising:
    an image acquisition unit configured to acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images;
    a first generation unit configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, wherein points in the map correspond to face key points in the face image; for a point in the map, determining, based on coordinates of the point in the map and a pixel value of the point, three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and
    a second generation unit configured to generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
  9. 根据权利要求8所述的装置,其中,所述第一生成单元进一步被配置成:The apparatus according to claim 8, wherein the first generating unit is further configured to:
    基于该点的像素值,确定该点对应的人脸关键点的深度值;Based on the pixel value of the point, determine the depth value of the key point of the face corresponding to the point;
    基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标;Determine the coordinates of the key point of the face corresponding to the point in the face image based on the coordinates of the point in the map;
    基于该点对应的人脸关键点的深度值和该点对应的人脸关键点在该人脸图像中的坐标,确定该点对应的人脸关键点的三维坐标。Based on the depth value of the face key point corresponding to the point and the coordinate of the face key point corresponding to the point in the face image, the three-dimensional coordinates of the face key point corresponding to the point are determined.
  10. 根据权利要求9所述的装置,其中,所述第一生成单元进一步被配置成:The device according to claim 9, wherein the first generating unit is further configured to:
    响应于确定该点的像素值大于等于预设阈值,将该点的像素值确定为该点对应的人脸关键点的深度值。In response to determining that the pixel value of the point is greater than or equal to the preset threshold, the pixel value of the point is determined as the depth value of the key point of the face corresponding to the point.
  11. 根据权利要求10所述的装置,其中,所述第一生成单元进一步被配置成:The device according to claim 10, wherein the first generating unit is further configured to:
    响应于确定该点的像素值小于所述预设阈值,将所述预设阈值确定为该点对应的人脸关键点的深度值。In response to determining that the pixel value of the point is less than the preset threshold, the preset threshold is determined as the depth value of the key point of the face corresponding to the point.
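For illustration only: taken together, claims 9-11 amount to a depth clamp followed by a coordinate lookup, roughly as below; the linear map-to-image coordinate scaling is an assumed convention, not specified by the claims.

    def keypoint_3d(map_x, map_y, pixel_value, threshold, scale=1.0):
        # Claims 10-11: pixel values below the preset threshold are
        # replaced by the threshold itself, clamping the depth from below.
        depth = pixel_value if pixel_value >= threshold else threshold
        # Claim 9: recover the key point's image coordinates from its
        # position in the map (here a simple scaling between the grids).
        img_x, img_y = map_x * scale, map_y * scale
        return (img_x, img_y, depth)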
  12. The apparatus according to claim 8, wherein the second generation unit comprises:
    an establishment module configured to establish reference planes through the center line of the three-dimensional grid corresponding to the left face image and the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the three-dimensional grid along the center line and divides the three-dimensional grid into two parts;
    an extraction module configured to extract, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional grid, and to extract, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional grid;
    a splicing module configured to splice the extracted left three-dimensional grid and right three-dimensional grid to generate the resulting three-dimensional grid corresponding to the target face.
  13. The apparatus according to any one of claims 8-12, wherein the map generation model is obtained by training through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample face image and the coordinates and depth values of face key points in the sample face image;
    for a training sample in the training sample set: determining, based on the coordinates of the face key points in the training sample, the mapping positions of those face key points in a map to be constructed; determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed; and constructing, using the determined mapping positions and their pixel values, the map corresponding to the sample face image in the training sample;
    training the map generation model using a machine learning method, with the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as desired output.
  14. The apparatus according to claim 13, wherein the training samples in the training sample set are generated through the following steps:
    collecting a face depth map of a sample face using a depth map acquisition device, and acquiring the face image corresponding to the face depth map;
    performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image;
    combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  15. An electronic device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
  16. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2019/126382 2019-01-31 2019-12-18 Method and apparatus for generating information WO2020155908A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100632.8 2019-01-31
CN201910100632.8A CN109816791B (en) 2019-01-31 2019-01-31 Method and apparatus for generating information

Publications (1)

Publication Number Publication Date
WO2020155908A1 (en)

Family

ID=66606329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126382 WO2020155908A1 (en) 2019-01-31 2019-12-18 Method and apparatus for generating information

Country Status (2)

Country Link
CN (1) CN109816791B (en)
WO (1) WO2020155908A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816791B (en) * 2019-01-31 2020-04-28 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111652022B (en) * 2019-06-26 2023-09-05 广州虎牙科技有限公司 Image data display method, image data live broadcast device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866497A (en) * 2010-06-18 2010-10-20 北京交通大学 Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system
US8861800B2 (en) * 2010-07-19 2014-10-14 Carnegie Mellon University Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction
US20170032565A1 (en) * 2015-07-13 2017-02-02 Shenzhen University Three-dimensional facial reconstruction method and system
US10489973B2 (en) * 2015-08-17 2019-11-26 Cubic Corporation 3D face reconstruction from gate camera
CN108921926B (en) * 2018-07-02 2020-10-09 云从科技集团股份有限公司 End-to-end three-dimensional face reconstruction method based on single image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914806A (en) * 2013-01-09 2014-07-09 三星电子株式会社 Display apparatus and control method for adjusting the eyes of a photographed user
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN109118579A (en) * 2018-08-03 2019-01-01 北京微播视界科技有限公司 The method, apparatus of dynamic generation human face three-dimensional model, electronic equipment
CN109272543A (en) * 2018-09-21 2019-01-25 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109816791A (en) * 2019-01-31 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating information

Also Published As

Publication number Publication date
CN109816791B (en) 2020-04-28
CN109816791A (en) 2019-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913091

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 06/12/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19913091

Country of ref document: EP

Kind code of ref document: A1