WO2020155908A1 - Method and apparatus for generating information


Info

Publication number: WO2020155908A1
Authority: WO (WIPO PCT)
Prior art keywords: face, face image, point, map, dimensional grid
Application number: PCT/CN2019/126382
Other languages: French (fr), Chinese (zh)
Inventor: 郭冠军 (Guo Guanjun)
Original Assignee: 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2020155908A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating information.
  • Three-dimensional face reconstruction is a process of directly regressing the three-dimensional mesh information (3D mesh) of a face from the pixel information of a given two-dimensional face image.
  • Typically, a two-dimensional face image used for three-dimensional face reconstruction is obtained by photographing a face from a certain angle with an electronic device that includes a camera (such as a mobile phone or a camera).
  • Embodiments of the present disclosure propose a method and apparatus for generating information.
  • An embodiment of the present disclosure provides a method for generating information, the method including: acquiring a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generating a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In some embodiments, determining the three-dimensional coordinates of the face key point corresponding to the point includes: determining the depth value of the face key point corresponding to the point based on the pixel value of the point; determining the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some embodiments, determining the depth value of the face key point corresponding to the point includes: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point further includes: in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
  • In some embodiments, generating the resulting three-dimensional mesh corresponding to the target face includes: establishing a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; extracting, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional mesh; and splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and aggregating the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • An embodiment of the present disclosure further provides an apparatus for generating information. The apparatus includes: an image acquiring unit configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; a first generating unit configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and a second generating unit configured to generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In some embodiments, the first generating unit is further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some embodiments, the first generating unit is further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some embodiments, the first generating unit is further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • In some embodiments, the second generating unit includes: an establishing module configured to establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module configured to extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional mesh; and a splicing module configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and aggregating the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • An embodiment of the present disclosure further provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the foregoing methods for generating information.
  • Embodiments of the present disclosure further provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements any one of the foregoing methods for generating information.
  • The method and apparatus for generating information provided by the embodiments of the present disclosure acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; then, for each face image of the left face image and the right face image, perform the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • Since the left face image and the right face image record facial features at different angles, the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image can be used to generate a more accurate three-dimensional mesh corresponding to the target face, which helps improve the accuracy of three-dimensional face reconstruction.
  • Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • Fig. 2 is a flowchart of an embodiment of a method for generating information according to the present disclosure;
  • Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to an embodiment of the present disclosure;
  • Fig. 4 is a flowchart of another embodiment of the method for generating information according to the present disclosure;
  • Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present disclosure;
  • Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for generating information or the apparatus for generating information of the present disclosure can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on.
  • Various communication client applications such as image processing software, video playback software, web browser applications, search applications, instant messaging tools, and social platform software, may be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, 103 are hardware, they can be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
  • When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • The server 105 may be a server that provides various services, for example, an image processing server that processes the left face image and the right face image, obtained by photographing a target face, uploaded by the terminal devices 101, 102, 103.
  • the image processing server can analyze and process the received data such as the left face image and the right face image, and obtain the processing result (for example, the resultant three-dimensional grid corresponding to the target face).
  • It should be noted that the method for generating information provided by the embodiments of the present disclosure can be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for generating information can be provided in the server 105 or in the terminal devices 101, 102, 103.
  • the server can be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • In some cases, the above system architecture may not include a network, and may include only the terminal devices or the server.
  • With reference to Fig. 2, a flow of an embodiment of the method for generating information according to the present disclosure is shown. The method for generating information includes the following steps:
  • Step 201: Obtain a left face image and a right face image obtained by photographing a target face.
  • In this embodiment, the execution subject of the method for generating information may acquire, through a wired connection or a wireless connection, the left face image and the right face image obtained by photographing the target face.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • After the resulting three-dimensional mesh corresponding to the target face is generated, operations such as rendering the three-dimensional mesh can be performed to realize the reconstruction of the three-dimensional face.
  • the left face image and the right face image are binocular vision images.
  • In practice, the above execution subject may obtain the left face image and the right face image pre-stored locally, or may obtain the left face image and the right face image sent by a communicatively connected electronic device (such as the terminal devices shown in Fig. 1). It should be noted that both the left face image and the right face image are two-dimensional face images.
  • In practice, various devices that include a binocular camera can be used to photograph the target face to obtain the left face image and the right face image corresponding to the target face.
  • A binocular camera usually consists of two cameras arranged in a horizontal direction.
  • In general, the camera arranged on the left can be regarded as the left camera, and the image it captures is the left image (corresponding to the left face image); correspondingly, the camera arranged on the right is regarded as the right camera, and the image it captures is the right image (corresponding to the right face image).
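  • For illustration only, a minimal capture sketch is given below; the device indices and the use of OpenCV are assumptions, since the patent does not prescribe a capture API.

```python
import cv2

# Illustrative binocular capture: two horizontally arranged cameras
# exposed as separate video devices (indices 0 and 1 are assumptions).
left_camera = cv2.VideoCapture(0)
right_camera = cv2.VideoCapture(1)

ok_left, left_face_image = left_camera.read()     # left image
ok_right, right_face_image = right_camera.read()  # right image
if not (ok_left and ok_right):
    raise RuntimeError("failed to grab a stereo image pair")

left_camera.release()
right_camera.release()
```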
  • Step 202: For each face image of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the above-mentioned execution subject may perform the following steps:
  • Step 2021: Input the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point can be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • the map generation model can be used to characterize the correspondence between the face image and the map corresponding to the face image.
  • As an example, the map generation model may be a correspondence table pre-made by technicians based on statistics over a large number of face images and maps corresponding to the face images, storing multiple face images and their corresponding maps; it may also be a model obtained by training an initial model (such as a neural network) using a machine learning method based on preset training samples.
  • It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
  • the map generation model may be obtained by the above-mentioned execution subject or other electronic equipment through training in the following steps:
  • Step 20211: Obtain a training sample set.
  • Here, a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image.
  • the sample face image is a two-dimensional face image.
  • various methods can be used to obtain the training sample set.
  • As an example, the training samples in the training sample set can be generated through the following steps. First, a depth map acquisition device can be used to collect a face depth map of a sample face, and the face image corresponding to the face depth map is obtained. Then, face key point detection is performed on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image. Finally, the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map are aggregated into a training sample.
  • the depth map acquisition device may be various image acquisition devices that can acquire a depth map.
  • the face depth map is an image containing depth information (that is, the distance information between the viewpoint and the surface of the scene object).
  • the face image corresponding to the face depth map is an RGB (Red Green Blue) three-channel color image without depth information corresponding to the face depth map. Furthermore, by removing the depth information of the face depth map, a face image corresponding to the face depth map (ie, a sample face image) can be obtained.
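  • As a minimal sketch of separating the color image from the depth information, assuming the acquisition device yields an aligned four-channel RGB-D array (the array layout is an assumption for illustration, not from the patent):

```python
import numpy as np

# Hypothetical aligned RGB-D frame: H x W x 4, where the first three
# channels are RGB and the fourth channel is the per-pixel depth.
rgbd = np.zeros((480, 640, 4), dtype=np.float32)  # placeholder frame

face_image = rgbd[:, :, :3].astype(np.uint8)  # RGB image without depth info
depth_map = rgbd[:, :, 3]                     # per-pixel depth values

# The depth value of a face key point at image coordinates (x, y)
# can then be read directly from the depth map.
x, y = 100, 50                                # illustrative key point
depth_value = depth_map[y, x]
```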
  • various face key point detection methods can be used to perform face key point detection on the face image corresponding to the face depth map.
  • the face image can be input to a pre-trained face key point detection model to obtain the detection result.
  • the face key point detection model can be used to detect the position of the face key point in the face image.
  • As an example, the face key point detection model can be obtained by performing supervised training on an existing convolutional neural network using a machine learning method, based on a sample set that includes face images and labels indicating the positions of the face key points.
  • the convolutional neural network can use various existing structures, such as DenseBox, VGGNet, ResNet, SegNet, etc.
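  • For illustration only, since the patent does not prescribe a specific detector, a sketch using dlib's off-the-shelf 68-point landmark predictor (the model file path is an assumption) might look like:

```python
import dlib
import numpy as np

# dlib's 68-point predictor is one of many possible face key point
# detection models, not the patent's own model.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_keypoints(image: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of (x, y) face key point coordinates."""
    faces = detector(image, 1)           # upsample once for small faces
    if not faces:
        return np.empty((0, 2), dtype=np.int64)
    shape = predictor(image, faces[0])   # landmarks of the first face
    return np.array([(p.x, p.y) for p in shape.parts()])
```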
  • Step 20212: For each training sample in the training sample set, determine, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed; determine, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed; and use the determined mapping positions and the pixel values at those mapping positions to construct the map corresponding to the sample face image in the training sample.
  • In this step, the above execution subject or another electronic device may first determine, based on the coordinates of the face key points in the training sample, the mapping positions of those face key points in the map to be constructed.
  • the coordinates of the mapping position of the face key point in the map to be constructed can be determined based on a pre-established mapping relationship or based on an existing mapping principle.
  • As an example, the principle of UV mapping can be used to determine the coordinates of the mapping position of a face key point in the map to be constructed.
  • Specifically, UV refers to two-dimensional texture coordinates; UV is used to define a two-dimensional texture coordinate system, called "UV texture space".
  • The UV texture space uses the letters U and V to denote the axes of the two-dimensional space.
  • In practice, UV mapping can convert texture information into plane information.
  • Here, the UV coordinates obtained by the mapping can be used to indicate the mapping position in the map to be constructed.
  • Specifically, the mapped UV coordinates can be used as the coordinates of the mapping position in the map to be constructed.
  • Then, the pixel value of the corresponding mapping position in the map to be constructed can be determined based on the depth value of the face key point.
  • Specifically, a correspondence between pixel values and depth values may be determined in advance; the pixel value of the corresponding mapping position in the map to be constructed is then determined based on this correspondence and the depth value of the face key point.
  • As an example, suppose the predetermined correspondence between pixel values and depth values is "the pixel value is equal to the depth value", the coordinates of a face key point in the sample face image are (100, 50), the mapping position corresponding to that face key point has coordinates (50, 25), and the depth value of the face key point is 30. Then the pixel value at coordinates (50, 25) in the map is 30.
  • It should be noted that the size of the map may be preset; for example, it may be the same as the size of the sample face image, or, as in the example above, one half of the size of the sample face image.
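  • Putting the worked example into code: a minimal sketch of map construction, under the assumptions used above (the pixel value equals the depth value, and map coordinates are image coordinates scaled by one half; the function name is illustrative):

```python
import numpy as np

def build_map(keypoints, depths, image_size, scale=0.5):
    """Construct a map whose pixel at each mapped key point position
    stores that key point's depth value.

    keypoints:  (N, 2) array of (x, y) coordinates in the face image.
    depths:     (N,) array of depth values of the face key points.
    image_size: (width, height) of the sample face image.
    scale:      assumed mapping principle: map coords = image coords * scale.
    """
    w, h = int(image_size[0] * scale), int(image_size[1] * scale)
    map_image = np.zeros((h, w), dtype=np.float32)
    for (x, y), z in zip(keypoints, depths):
        u, v = int(x * scale), int(y * scale)  # mapping position
        map_image[v, u] = z                    # pixel value = depth value
    return map_image

# Worked example from the text: key point (100, 50) with depth 30
# maps to position (50, 25) with pixel value 30 in a half-size map.
m = build_map(np.array([[100, 50]]), np.array([30.0]), (200, 200))
assert m[25, 50] == 30.0
```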
  • Step 20213: Using a machine learning method, take the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, and train to obtain the map generation model.
  • Specifically, the above execution subject or another electronic device can use a machine learning method to take the sample face images included in the training samples of the above training sample set as the input of an initial model and the maps corresponding to the input sample face images as the expected output of the initial model, train the initial model, and finally obtain the map generation model.
  • As an example, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within a coverage range; it performs excellently in image processing. Therefore, a convolutional neural network can be used as the initial model to process the sample face images in the training samples.
  • It should be noted that the above execution subject or other electronic devices can also use other models with image processing functions as the initial model; the initial model is not limited to a CNN. The specific model structure can be set according to actual needs and is not limited here. It should also be pointed out that machine learning methods are well-known technologies that are currently widely researched and applied, and details are not repeated here.
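  • As a sketch of step 20213, assuming a small fully convolutional network trained by MSE regression and a map of the same size as the face image (the architecture, loss, and map size are illustrative choices; the patent only requires training an initial model with sample face images as input and the constructed maps as desired output):

```python
import torch
import torch.nn as nn

# Illustrative initial model: a small fully convolutional network that
# regresses a one-channel map from a three-channel face image.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(face_images: torch.Tensor, target_maps: torch.Tensor) -> float:
    """face_images: (B, 3, H, W); target_maps: (B, 1, H, W) built as above."""
    optimizer.zero_grad()
    predicted_maps = model(face_images)   # desired output: the target map
    loss = loss_fn(predicted_maps, target_maps)
    loss.backward()
    optimizer.step()
    return loss.item()
```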
  • Step 2022: For a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
  • the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
  • Specifically, for a point in the map, the above execution subject can determine the three-dimensional coordinates of the face key point corresponding to the point through the following steps. First, the execution subject can determine the depth value of the face key point corresponding to the point based on the pixel value of the point. Then, the execution subject can determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map. Finally, the execution subject can determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the mapping relationship or the mapping principle corresponding to the map generation model, and the coordinates of the point in the map. It is understandable that when training the map generation model, due to the use of a predetermined mapping relationship or mapping principle, the mapping position of the face key point in the map to be constructed can be determined based on the coordinates of the face key point (refer to Step 20212). Therefore, here, for a certain point in the map, a reverse process can be used to determine the coordinates of the key point of the face corresponding to the point in the face image.
  • the above-mentioned execution subject may use various methods to determine the depth value of the face key point corresponding to the point based on the pixel value of the point.
  • the pixel value of the point can be directly determined as the depth value of the key point of the face corresponding to the point.
  • the above-mentioned execution subject may, in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point .
  • the preset threshold may be a predetermined value, such as "1".
  • In practice, points with very low pixel values are usually points where the prediction is inaccurate. Therefore, in this implementation, a preset threshold can be set to screen out mispredicted points, which helps determine more accurate three-dimensional coordinates of the face key points.
  • In some optional implementations, the above execution subject may, in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • It can be understood that, after obtaining the coordinates of a face key point in the face image (which can be expressed as (x, y)) and the depth value of the face key point (which can be expressed as z), the above execution subject can directly compose the three-dimensional coordinates of the face key point (which can be expressed as (x, y, z)).
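  • A minimal sketch of this reverse process, assuming the half-size scaling from the earlier example and treating zero-valued map positions as empty (both assumptions for illustration):

```python
import numpy as np

def map_to_keypoints_3d(map_image, threshold=1.0, scale=0.5):
    """Recover (x, y, z) face key point coordinates from a map.

    Assumes the inverse of the mapping principle sketched earlier:
    image coords = map coords / scale. Zero-valued positions are
    treated as empty map positions and skipped.
    """
    points = []
    for (v, u) in zip(*np.nonzero(map_image)):
        pixel = map_image[v, u]
        # Threshold rule from the text: pixel values below the preset
        # threshold are replaced by the threshold as the depth value.
        z = pixel if pixel >= threshold else threshold
        x, y = u / scale, v / scale            # coords in the face image
        points.append((x, y, z))
    return np.array(points)
```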
  • Step 2023: Generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • In this embodiment, the three-dimensional mesh corresponding to the face image is a three-dimensional mesh with the face key points as vertices. Therefore, based on the determined three-dimensional coordinates of each face key point in the face image, the above execution subject can generate the three-dimensional mesh corresponding to the face image.
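  • The patent only requires a mesh with the face key points as vertices; one common illustrative choice is to triangulate the key points over the image plane and lift the triangles to 3D:

```python
import numpy as np
from scipy.spatial import Delaunay

def keypoints_to_mesh(points_3d: np.ndarray):
    """points_3d: (N, 3) array of (x, y, z) face key point coordinates.

    Triangulate over the (x, y) projections so each face key point
    becomes a mesh vertex; return (vertices, triangle index array).
    """
    triangulation = Delaunay(points_3d[:, :2])
    return points_3d, triangulation.simplices
```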
  • Step 203: Generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • In this embodiment, the above execution subject can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image by performing step 202. Furthermore, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, the execution subject can generate the resulting three-dimensional mesh corresponding to the target face.
  • Here, the resulting three-dimensional mesh is a three-dimensional mesh on which operations such as rendering are to be performed to realize the three-dimensional face reconstruction of the target face.
  • In this embodiment, the execution subject may use various methods to generate the resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. For example, the head posture of the three-dimensional mesh corresponding to the left face image and the head posture of the three-dimensional mesh corresponding to the right face image can be detected, and the three-dimensional mesh whose head posture indicates the smaller head rotation angle can be determined as the resulting three-dimensional mesh corresponding to the target face. It can be understood that, in practice, the smaller the head rotation angle, the fewer facial features are occluded, and the more accurate the generated three-dimensional mesh.
  • In this way, a three-dimensional mesh with a smaller head rotation angle is selected as the resulting three-dimensional mesh corresponding to the target face, which helps improve the accuracy of the resulting three-dimensional mesh.
  • the method for detecting the head posture is a well-known technology that is currently widely studied and applied, and will not be repeated here.
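  • As one hedged illustration of this selection rule (the patent leaves the head posture detector open), the head rotation of each mesh can be approximated by the angle between the normal of a best-fit plane through its vertices and the camera axis, keeping the mesh with the smaller angle:

```python
import numpy as np

def rotation_magnitude(vertices: np.ndarray) -> float:
    """Rough head-rotation estimate: angle (degrees) between the normal
    of the best-fit plane through the mesh vertices and the camera z-axis."""
    centered = vertices - vertices.mean(axis=0)
    # The last right singular vector is the normal of the best-fit plane.
    normal = np.linalg.svd(centered)[2][-1]
    cos_angle = abs(normal @ np.array([0.0, 0.0, 1.0]))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def pick_result_mesh(left_mesh: np.ndarray, right_mesh: np.ndarray) -> np.ndarray:
    """Keep the mesh whose estimated head rotation angle is smaller."""
    if rotation_magnitude(left_mesh) <= rotation_magnitude(right_mesh):
        return left_mesh
    return right_mesh
```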
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment.
  • In the application scenario of Fig. 3, the server 301 can first obtain the left face image 303 and the right face image 304, obtained by photographing a target face, sent by the terminal device 302, where the left face image 303 and the right face image 304 are binocular vision images.
  • Then, for the left face image 303, the server 301 can input the left face image 303 into the pre-trained map generation model 305 to obtain the map 306 corresponding to the left face image 303, where the points in the map 306 correspond to the face key points in the left face image 303; for a point in the map 306, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map 306 and the pixel value of the point; and, based on the determined three-dimensional coordinates of the face key points in the left face image 303, generate a three-dimensional mesh 307 corresponding to the left face image 303. For the right face image 304, the server 301 can input the right face image 304 into the map generation model 305 to obtain the map 308 corresponding to the right face image 304, where the points in the map 308 correspond to the face key points in the right face image 304; for a point in the map 308, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map 308 and the pixel value of the point; and, based on the determined three-dimensional coordinates of the face key points in the right face image 304, generate a three-dimensional mesh 309 corresponding to the right face image 304.
  • the server 301 may generate a resultant three-dimensional mesh 310 corresponding to the target face based on the three-dimensional mesh 307 corresponding to the left face image 303 and the three-dimensional mesh 309 corresponding to the right face image 304.
  • The method provided by the above embodiment of the present disclosure acquires a left face image and a right face image obtained by photographing a target face, and then, for each face image of the left face image and the right face image, performs the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generates a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • Since the left face image and the right face image record facial features at different angles, the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image can be used to generate a more accurate three-dimensional mesh corresponding to the target face, which helps improve the accuracy of three-dimensional face reconstruction.
  • FIG. 4 shows a flow 400 of another embodiment of a method for generating information.
  • the process 400 of the method for generating information includes the following steps:
  • Step 401: Obtain a left face image and a right face image obtained by photographing a target face.
  • In this embodiment, the execution subject of the method for generating information may acquire, through a wired connection or a wireless connection, the left face image and the right face image obtained by photographing the target face.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • operations such as rendering the three-dimensional mesh can be performed to realize the reconstruction of the three-dimensional face.
  • the left face image and the right face image are binocular vision images.
  • Step 402: For each face image of the left face image and the right face image, perform the following steps: input the face image into the pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the above-mentioned execution subject may perform the following steps:
  • Step 4021: Input the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point may be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • the map generation model can be used to characterize the correspondence between the face image and the map corresponding to the face image.
  • Step 4022: For a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
  • the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
  • Step 4023: Generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • In this embodiment, the three-dimensional mesh corresponding to the face image is a three-dimensional mesh with the face key points as vertices. Therefore, based on the determined three-dimensional coordinates of each face key point in the face image, the above execution subject can generate the three-dimensional mesh corresponding to the face image.
  • It should be noted that step 401 and step 402 are respectively consistent with step 201 and step 202 in the foregoing embodiment; the above description of step 201 and step 202 also applies to step 401 and step 402, and is not repeated here.
  • Step 403: Establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, respectively.
  • In this embodiment, the above execution subject can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image by executing step 402. Furthermore, the above execution subject may establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, respectively.
  • Here, the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts.
  • It should be noted that the two parts divided by the reference plane may be two symmetrical parts (in this case, the reference plane passes through the symmetry axis of the human face) or two asymmetrical parts.
  • Specifically, the reference plane can pass through points on the face indicated by the three-dimensional mesh; furthermore, the reference plane can divide the face indicated by the three-dimensional mesh into a part located on the left side of the reference plane and a part located on the right side of the reference plane.
  • It should be noted that the points that the reference plane established for the three-dimensional mesh corresponding to the left face image passes through and the points that the reference plane established for the three-dimensional mesh corresponding to the right face image passes through indicate the same points on the face (for example, both pass through the point corresponding to the tip of the nose).
  • Step 404: Extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh; and extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh.
  • In this embodiment, the execution subject can extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh, and extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh.
  • It should be noted that the three-dimensional mesh on the left side of the reference plane refers to the part of the mesh on the left side of the reference plane when facing the face contour indicated by the three-dimensional mesh for which the reference plane is established; likewise, the three-dimensional mesh on the right side of the reference plane refers to the part on the right side of the reference plane when facing that face contour.
  • Step 405: Splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • the above-mentioned execution subject may splice the left 3D mesh and the right 3D mesh to generate a resultant 3D mesh corresponding to the target face .
  • It can be understood that, since the reference plane passes through the center line of the three-dimensional mesh, and the points that the reference plane established for the three-dimensional mesh corresponding to the left face image passes through and the points that the reference plane established for the three-dimensional mesh corresponding to the right face image passes through indicate the same points on the face, the part of the three-dimensional mesh corresponding to the left face image located on the left side of the reference plane and the part of the three-dimensional mesh corresponding to the right face image located on the right side of the reference plane can be spliced into a complete three-dimensional mesh corresponding to the face; likewise, the part of the three-dimensional mesh corresponding to the left face image located on the right side of the reference plane and the part of the three-dimensional mesh corresponding to the right face image located on the left side of the reference plane can be spliced into a complete three-dimensional mesh corresponding to the face.
  • In general, the left face image records more of the facial features corresponding to the three-dimensional mesh on the left side of the reference plane, and the right face image records more of the facial features corresponding to the three-dimensional mesh on the right side of the reference plane. Therefore, in this embodiment, the left three-dimensional mesh is extracted from the three-dimensional mesh corresponding to the left face image, the right three-dimensional mesh is extracted from the three-dimensional mesh corresponding to the right face image, and the extracted left three-dimensional mesh and right three-dimensional mesh are spliced to generate the resulting three-dimensional mesh corresponding to the target face; this allows the generated resulting three-dimensional mesh to characterize the facial features of the target face more accurately and improves the accuracy of the resulting three-dimensional mesh, as sketched below.
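  • A simplified sketch of the split-and-splice step referenced above, assuming a vertical reference plane at the mesh center-line x-position and vertex arrays as the mesh representation (re-triangulating across the seam is omitted):

```python
import numpy as np

def split_by_plane(vertices: np.ndarray, center_x: float):
    """Split mesh vertices by a vertical reference plane x = center_x
    (a simplified stand-in for a plane through the mesh center line)."""
    left = vertices[vertices[:, 0] <= center_x]
    right = vertices[vertices[:, 0] > center_x]
    return left, right

def splice_meshes(left_vertices: np.ndarray, right_vertices: np.ndarray):
    """Concatenate the kept halves into one vertex set; a full
    implementation would also re-triangulate along the seam."""
    return np.concatenate([left_vertices, right_vertices], axis=0)

# Illustrative use: keep the left half of the left-image mesh and the
# right half of the right-image mesh, then splice them together.
left_mesh = np.random.rand(468, 3)    # placeholder vertices
right_mesh = np.random.rand(468, 3)   # placeholder vertices
kept_left, _ = split_by_plane(left_mesh, left_mesh[:, 0].mean())
_, kept_right = split_by_plane(right_mesh, right_mesh[:, 0].mean())
result_mesh = splice_meshes(kept_left, kept_right)
```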
  • It can be seen that the flow 400 of the method for generating information in this embodiment highlights extracting the left three-dimensional mesh from the three-dimensional mesh corresponding to the left face image and extracting the right three-dimensional mesh from the three-dimensional mesh corresponding to the right face image, and then splicing the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • Since the left face image records more of the facial features corresponding to the three-dimensional mesh on the left side of the reference plane, and the right face image records more of the facial features corresponding to the three-dimensional mesh on the right side of the reference plane, the solution described in this embodiment uses the left three-dimensional mesh corresponding to the left face image and the right three-dimensional mesh corresponding to the right face image to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps further improve the accuracy of three-dimensional face reconstruction.
  • With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied to various electronic devices.
  • As shown in Fig. 5, the apparatus 500 for generating information in this embodiment includes: an image acquiring unit 501, a first generating unit 502, and a second generating unit 503.
  • In this embodiment, the image acquiring unit 501 is configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; the first generating unit 502 is configured to perform the following steps for each face image of the left face image and the right face image: input the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where points in the map correspond to the face key points in the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and the second generating unit 503 is configured to generate a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image.
  • the image acquisition unit 501 of the apparatus 500 for generating information may acquire the left face image and the right face image obtained by photographing the target face through a wired connection or a wireless connection.
  • the target face is the face whose corresponding three-dimensional mesh is to be generated.
  • the left face image and the right face image are binocular vision images.
  • In this embodiment, for each face image of the left face image and the right face image, the first generating unit 502 may perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for a point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate a three-dimensional mesh corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
  • the map is an image used to determine the three-dimensional coordinates of the key points of the face in the face image.
  • the three-dimensional coordinates of the face key points are composed of the position coordinates of the face key points in the face image and the depth value of the face key points.
  • the depth value of the face key point may be the distance from the face key point to the imaging plane when the face image is collected.
  • the points in the map correspond to the key points of the face in the face image.
  • Face key points may be salient points in a human face, specifically points that affect the contour of the face or the shape of the facial features.
  • The map generation model can be used to characterize the correspondence between a face image and the map corresponding to the face image. It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
  • In this embodiment, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image generated by the first generating unit 502, the second generating unit 503 can generate the resulting three-dimensional mesh corresponding to the target face.
  • the resultant three-dimensional mesh is a three-dimensional mesh that is to be subjected to operations such as rendering to realize the three-dimensional face reconstruction of the target face.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of that face key point in the face image.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
  • In some optional implementations of this embodiment, the first generating unit 502 may be further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
  • In some optional implementations of this embodiment, the second generating unit 503 may include: an establishing module (not shown in the figure) configured to establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image respectively, where the reference plane penetrates the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module (not shown in the figure) configured to extract, from the three-dimensional mesh corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional mesh; and a splicing module (not shown in the figure) configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  • In some optional implementations of this embodiment, the map generation model can be obtained by training through the following steps: obtaining a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at those positions, the map corresponding to the sample face image in the training sample; and, using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as the desired output, training to obtain the map generation model.
  • the training samples in the training sample set can be generated by the following steps: use a depth-map acquisition device to collect a face depth map of a sample face, and obtain the face image corresponding to the face depth map; perform face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assemble the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  • the apparatus 500 provided by the above embodiment of the present disclosure acquires the left face image and the right face image obtained by photographing the target face, and then, for each of the left face image and the right face image, performs the following steps: input the face image into the pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points. Finally, it generates the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
  • because of occlusion and shooting angle, the left face image and the right face image record facial features from different angles; therefore, using the three-dimensional grid corresponding to the left face image together with the three-dimensional grid corresponding to the right face image can generate a more accurate resultant three-dimensional grid for the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 (for example, the server or terminal device in FIG. 1) suitable for implementing embodiments of the present disclosure.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the terminal device or server shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 such as a liquid crystal display (LCD), speakers, and vibrators; storage devices 608 such as magnetic tapes and hard disks; and a communication device 609.
  • the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device, or may represent multiple devices as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium described in the embodiment of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
  • the computer program code used to perform the operations of the embodiments of the present disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and any combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described units may also be provided in a processor, which may, for example, be described as: a processor including an image acquisition unit, a first generation unit, and a second generation unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves.
  • for example, the image acquisition unit may also be described as "a unit for acquiring a face image."

Abstract

Disclosed are a method and apparatus for generating information. The method comprises: acquiring a left face image and a right face image obtained by photographing a target face (201); performing the following steps for each of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point on the basis of the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional mesh corresponding to the face image on the basis of the determined three-dimensional coordinates of the face key points in the face image (202); and generating a resultant three-dimensional mesh corresponding to the target face on the basis of the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image (203). The present invention facilitates improving the accuracy of three-dimensional face reconstruction.

Description

Method and device for generating information
Cross-reference to related applications
This application is filed based on a Chinese patent application with an application number of 201910100632.8 and an application date of January 31, 2019, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to methods and devices for generating information.
Background
With the popularity of mobile phone video applications, various face special-effect functions have also been widely used. As an effective face representation technology, three-dimensional face reconstruction has broad application prospects.
Three-dimensional face reconstruction is the process of directly regressing the three-dimensional mesh information (3D mesh) of a face from the pixel information of a given two-dimensional face image. Generally, a two-dimensional face image used for three-dimensional face reconstruction is obtained by photographing a face from a certain angle with an electronic device that includes a camera (such as a mobile phone or a camera).
Summary of the invention
The embodiments of the present disclosure propose methods and apparatuses for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; for each of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and generating the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
In some embodiments, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point includes: determining the depth value of the face key point corresponding to the point based on the pixel value of the point; determining the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point includes: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
In some embodiments, determining the depth value of the face key point corresponding to the point based on the pixel value of the point further includes: in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
In some embodiments, generating the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image includes: establishing reference planes through the center line of the three-dimensional grid corresponding to the left face image and through the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the corresponding three-dimensional grid along its center line and divides the grid into two parts; extracting, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional grid, and extracting, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional grid; and splicing the extracted left three-dimensional grid and right three-dimensional grid to generate the resultant three-dimensional grid corresponding to the target face.
In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, wherein each training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing the map corresponding to the sample face image in the training sample using the determined mapping positions and their pixel values; and, using a machine learning method, taking the sample face images of the training samples as input and the maps corresponding to the input sample face images as desired output, training to obtain the map generation model.
In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth-map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assembling the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: an image acquisition unit configured to acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; a first generation unit configured to perform the following steps for each of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determining the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generating the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and a second generation unit configured to generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image.
In some embodiments, the first generation unit is further configured to: determine the depth value of the face key point corresponding to the point based on the pixel value of the point; determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some embodiments, the first generation unit is further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
In some embodiments, the first generation unit is further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
In some embodiments, the second generation unit includes: a building module configured to establish reference planes through the center line of the three-dimensional grid corresponding to the left face image and through the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the corresponding three-dimensional grid along its center line and divides the grid into two parts; an extraction module configured to extract, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as the left three-dimensional grid, and to extract, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as the right three-dimensional grid; and a splicing module configured to splice the extracted left three-dimensional grid and right three-dimensional grid to generate the resultant three-dimensional grid corresponding to the target face.
In some embodiments, the map generation model is obtained by training through the following steps: obtaining a training sample set, wherein each training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for each training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing the map corresponding to the sample face image in the training sample using the determined mapping positions and their pixel values; and, using a machine learning method, taking the sample face images of the training samples as input and the maps corresponding to the input sample face images as desired output, training to obtain the map generation model.
In some embodiments, the training samples in the training sample set are generated through the following steps: collecting a face depth map of a sample face using a depth-map acquisition device, and obtaining the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image; and assembling the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the above method for generating information.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any embodiment of the above method for generating information.
The method and apparatus for generating information provided by the embodiments of the present disclosure acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images; then, for each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image, wherein the points in the map correspond to the face key points in the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image; and finally generate the resultant three-dimensional grid corresponding to the target face based on the three-dimensional grid corresponding to the left face image and the three-dimensional grid corresponding to the right face image. It can be understood that, due to occlusion, shooting angle, and other factors, the left face image and the right face image record facial features from different angles; therefore, using the three-dimensional grids corresponding to both images makes it possible to generate a more accurate resultant three-dimensional grid corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
Description of the drawings
By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes, and advantages of the present disclosure will become more apparent:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
Fig. 2 is a flowchart of an embodiment of a method for generating information according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of another embodiment of a method for generating information according to the present disclosure;
Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present disclosure;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
Detailed description
The present disclosure will be further described in detail below in conjunction with the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that the embodiments in the present disclosure and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the present disclosure will be described in detail with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for generating information or the apparatus for generating information of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on. Various communication client applications, such as image processing software, video playback software, web browser applications, search applications, instant messaging tools, and social platform software, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and so on. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example, an image processing server that processes the left face image and the right face image obtained by the terminal devices 101, 102, 103 photographing a target face. The image processing server may analyze and otherwise process received data such as the left face image and the right face image, and obtain a processing result (for example, the resultant three-dimensional grid corresponding to the target face).
It should be noted that the method for generating information provided by the embodiments of the present disclosure may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for generating information may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. When the data used in generating the resultant three-dimensional grid corresponding to the target face does not need to be obtained remotely, the above system architecture may include no network, and only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of an embodiment of the method for generating information according to the present disclosure is shown. The method for generating information includes the following steps:
Step 201: Acquire a left face image and a right face image obtained by photographing a target face.
In this embodiment, the execution subject of the method for generating information (for example, the server shown in FIG. 1) may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. The target face is the face for which the corresponding three-dimensional grid is to be generated. In practice, after the three-dimensional grid of a face has been generated, operations such as rendering can be performed on it to realize three-dimensional face reconstruction.
In this embodiment, the left face image and the right face image are binocular vision images. Specifically, the execution subject may acquire a left face image and a right face image stored locally in advance, or acquire a left face image and a right face image sent by a communicatively connected electronic device (for example, the terminal device shown in FIG. 1). It should be noted that both the left face image and the right face image are two-dimensional face images.
In practice, various devices including binocular cameras (for example, a binocular camera) may be used to photograph the target face to obtain the left face image and the right face image corresponding to the target face. It should be noted that a binocular camera usually consists of two cameras arranged in the horizontal direction. When shooting with a binocular camera, the camera arranged on the left may be determined as the left camera, and the image it captures is the left image (corresponding to the left face image); correspondingly, the camera arranged on the right is determined as the right camera, and the image it captures is the right image (corresponding to the right face image).
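As an illustrative sketch only (the device indices are assumptions, and stereo calibration and rectification are omitted), the binocular pair might be acquired with OpenCV as follows:

```python
# Illustrative sketch only: grab one binocular pair with OpenCV, assuming the
# left and right cameras enumerate as video devices 0 and 1.
import cv2

left_cam, right_cam = cv2.VideoCapture(0), cv2.VideoCapture(1)
ok_left, left_face_image = left_cam.read()     # frame from the left camera
ok_right, right_face_image = right_cam.read()  # frame from the right camera
if not (ok_left and ok_right):
    raise RuntimeError("failed to grab a binocular image pair")
left_cam.release()
right_cam.release()
```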
Step 202: For each of the left face image and the right face image, perform the following steps: input the face image into a pre-trained map generation model to obtain the map corresponding to the face image; for each point in the map, determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point; and generate the three-dimensional grid corresponding to the face image based on the determined three-dimensional coordinates of the face key points in the face image.
In this embodiment, for each of the left face image and the right face image obtained in step 201, the execution subject may perform the following steps:
Step 2021: Input the face image into a pre-trained map generation model to obtain the map corresponding to the face image.
The map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the key point in the face image and the depth value of the key point. The depth value of a face key point may be the distance from the key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically points that affect the facial contour or the shapes of the facial features.
In this embodiment, the map generation model can be used to characterize the correspondence between a face image and the map corresponding to it. Specifically, as an example, the map generation model may be a correspondence table that stores multiple face images and their corresponding maps, formulated in advance by technicians based on statistics of a large number of face images and corresponding maps; it may also be a model obtained by training an initial model (for example, a neural network) with a machine learning method based on preset training samples.
It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the mapping positions, in the map output by the model, of the face key points in the face image input to the model.
In some optional implementations of this embodiment, the map generation model may be obtained by the execution subject or another electronic device through training in the following steps:
Step 20211: Obtain a training sample set.
A training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image. Here, the sample face image is a two-dimensional face image. In practice, various methods may be used to obtain the training sample set.
In some optional implementations of this embodiment, the training samples in the training sample set may be generated through the following steps. First, a depth-map acquisition device may be used to collect a face depth map of a sample face, and the face image corresponding to the face depth map is obtained. Then, face key point detection is performed on the face image corresponding to the face depth map to determine the coordinates of the face key points in that image. Finally, the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map are assembled into a training sample.
Here, the depth-map acquisition device may be any of various image acquisition devices capable of collecting depth maps, such as a binocular camera or a depth camera. A face depth map is an image containing depth information (that is, information on the distance from the viewpoint to the surfaces of scene objects). The face image corresponding to the face depth map is the RGB (Red Green Blue) three-channel color image, without depth information, that corresponds to the face depth map. Thus, by removing the depth information of the face depth map, the face image corresponding to it (that is, the sample face image) can be obtained.
Here, various face key point detection methods may be used to perform face key point detection on the face image corresponding to the face depth map. For example, the face image may be input to a pre-trained face key point detection model to obtain a detection result. The face key point detection model can be used to detect the positions of face key points in a face image. Here, the face key point detection model may be obtained by supervised training of an existing convolutional neural network based on a sample set (containing face images and annotations indicating the positions of face key points) using a machine learning method. The convolutional neural network may use various existing structures, such as DenseBox, VGGNet, ResNet, and SegNet.
In addition, it should be noted that the above method of determining the depth values of face key points based on a face depth map is a well-known technology widely researched and applied at present, and will not be repeated here.
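As an illustrative sketch only, one training sample could be assembled as follows, under two assumptions: the depth-map acquisition device yields an RGB image pixel-aligned with the depth map, and dlib's 68-point landmark predictor (whose model file path here is an assumption) stands in for the face key point detection model described above:

```python
# Illustrative sketch only: assemble one training sample. Assumptions: the
# depth-map acquisition device yields an RGB image pixel-aligned with the
# depth map, and dlib's 68-point landmark predictor (model file path is an
# assumption) stands in for the face key point detection model.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def make_training_sample(rgb: np.ndarray, depth_map: np.ndarray) -> dict:
    face_rect = detector(rgb, 1)[0]                 # locate the sample face
    shape = predictor(rgb, face_rect)               # 68 face key points
    coords = [(p.x, p.y) for p in shape.parts()]    # key point coords in the image
    depths = [float(depth_map[y, x]) for x, y in coords]  # depth value per key point
    return {"image": rgb, "coords": coords, "depths": depths}
```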
Step 20212: For each training sample in the training sample set, based on the coordinates of the face key points in the training sample, determine the mapping positions of those key points in the map to be constructed; based on the depth values of the face key points in the training sample, determine the pixel values at the corresponding mapping positions in the map to be constructed; and use the determined mapping positions and their pixel values to construct the map corresponding to the sample face image in the training sample.
Here, for a training sample in the training sample set, the execution subject or another electronic device may first determine, based on the coordinates of the face key points in the training sample, the mapping positions of those key points in the map to be constructed. For a given face key point, the coordinates of its mapping position in the map to be constructed may be determined based on a pre-established mapping relationship or an existing mapping principle. As an example, the principle of UV mapping may be used to determine the coordinates of the mapping position of the face key point in the map to be constructed. In practice, UV refers to two-dimensional texture coordinates. UV is used to define a two-dimensional texture coordinate system, called "UV texture space", which uses the letters U and V to denote the axes of the two-dimensional space. In three-dimensional modeling, UV mapping can convert texture information into planar information. Here, the mapped UV coordinates can be used to indicate a mapping position in the map to be constructed, and may serve as the coordinates of that mapping position.
In this implementation, after the mapping position of a face key point in the map to be constructed has been determined, the pixel value at that mapping position may be determined based on the depth value of the key point. Specifically, a correspondence between pixel values and depth values may be determined in advance, and then the pixel value at the corresponding mapping position in the map to be constructed is determined based on this correspondence and the depth value of the face key point. As an example, suppose the predetermined correspondence is "the pixel value equals the depth value". Then, for a face key point whose coordinates in the sample face image are (100, 50), whose corresponding mapping position is at coordinates (50, 25), and whose depth value is 30, the pixel value at coordinates (50, 25) in the map is 30.
It should be noted that a correspondence between the size of the map and the size of the sample face image may be established in advance. As an example, the map may be preset to have the same size as the sample face image, or the map may be preset to be one half the size of the sample face image.
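As an illustrative sketch only, the map for one training sample could be constructed as follows, under two assumed conventions consistent with the examples above: the map is half the size of the sample face image, so a key point at (x, y) maps to position (x // 2, y // 2), and the pixel value equals the depth value:

```python
# Illustrative sketch only: build the map for one training sample. Assumed
# conventions: the map is half the image size, so key point (x, y) maps to
# (x // 2, y // 2), and the pixel value equals the depth value (key point
# (100, 50) with depth 30 yields pixel 30 at map position (50, 25)).
import numpy as np

def build_target_map(image_height: int, image_width: int, coords, depths) -> np.ndarray:
    target = np.zeros((image_height // 2, image_width // 2), dtype=np.float32)
    for (x, y), depth in zip(coords, depths):
        target[y // 2, x // 2] = depth   # mapped position stores the depth value
    return target
```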
Step 20213: Using a machine learning method, take the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as desired output, and train to obtain the map generation model.
Here, the execution subject or another electronic device may use a machine learning method to train an initial model, taking the sample face images included in the training samples of the training sample set as the input of the initial model and the maps corresponding to the input sample face images as the desired output of the initial model, and finally obtain the map generation model through training.
Here, various existing convolutional neural network structures (such as DenseBox, VGGNet, ResNet, and SegNet) may be used as the initial model for training. In practice, a convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage and which performs excellently in image processing; therefore, a convolutional neural network may be used to process the sample face images in the training samples.
It should be noted that the execution subject or other electronic devices may also use other models with image processing functions as the initial model; the choice is not limited to CNNs, and the specific model structure may be set according to actual needs, which is not limited here. It should be pointed out that machine learning methods are well-known technologies that are currently widely researched and applied, and will not be repeated here.
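As an illustrative sketch only (the disclosure does not fix an architecture), the supervised training could look like the following in PyTorch, with a toy convolutional network standing in for the initial model and half-resolution target maps assumed as in the earlier sketches:

```python
# Illustrative sketch only: supervised training of the map generation model in
# PyTorch. The tiny network below is a stand-in for the initial model (the
# disclosure allows DenseBox, VGGNet, ResNet, SegNet, etc.); its first
# convolution strides by 2 to match the half-resolution target maps assumed
# in the earlier sketches.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # predicts a one-channel map
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()   # regress the predicted map toward the constructed map

def train_step(sample_images: torch.Tensor, target_maps: torch.Tensor) -> float:
    """sample_images: (B, 3, H, W); target_maps: (B, 1, H // 2, W // 2)."""
    optimizer.zero_grad()
    predicted = model(sample_images)        # input: sample face images
    loss = loss_fn(predicted, target_maps)  # desired output: constructed maps
    loss.backward()
    optimizer.step()
    return loss.item()
```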
步骤2022,对于映射图中的点,基于该点在映射图中的坐标和该点的像素值,确定该点对应的人脸关键点的三维坐标。Step 2022: For a point in the map, based on the coordinates of the point in the map and the pixel value of the point, determine the three-dimensional coordinates of the key point of the face corresponding to the point.
在这里,对于映射图中的点,上述执行主体可以采用各种方法基于该点在映射图中的坐标和该点的像素值,确定该点对应的人脸关键点的三维坐标。Here, for a point in the map, the above-mentioned execution subject can use various methods to determine the three-dimensional coordinates of the key point of the face corresponding to the point in the map based on the coordinates of the point in the map and the pixel value of the point.
在本实施例的一些可选的实现方式中,对于映射图中的点,上述执行主体可以通过以下步骤确定该点对应的人脸关键点的三维坐标:首先,上述执行主体可以基于该点的像素值,确定该点对应的人脸关键点的深度值。然后,上述执行主体可以基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标。最后,上述执行主体可以基于该点对应的人脸关键点的深度值和该点对应的人脸关键点在该人脸图像中的坐标,确定该点对应的人脸关键点的三维坐标。In some optional implementations of this embodiment, for a point in the map, the above-mentioned executive body can determine the three-dimensional coordinates of the key point of the face corresponding to the point by the following steps: First, the above-mentioned executive body can be based on the point The pixel value determines the depth value of the key point of the face corresponding to the point. Then, the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the coordinates of the point in the map. Finally, the execution subject may determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
具体的,上述执行主体可以基于映射图生成模型所对应的映射关系或映射原理,基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标。可以理解的是,在训练映射图生成模型时,由于利用预先确定的映射关系或映射原理,可以基于人脸关键点的坐标,确定人脸关键点在待构建的映射图中的映射位置(参考步骤20212)。因而,此处,对于映射图中的某个点,可以采用逆向过程,确定该点对应的人脸关键点在人脸图像中的坐标。Specifically, the above-mentioned execution subject may determine the coordinates of the face key point corresponding to the point in the face image based on the mapping relationship or the mapping principle corresponding to the map generation model, and the coordinates of the point in the map. It is understandable that when training the map generation model, due to the use of a predetermined mapping relationship or mapping principle, the mapping position of the face key point in the map to be constructed can be determined based on the coordinates of the face key point (refer to Step 20212). Therefore, here, for a certain point in the map, a reverse process can be used to determine the coordinates of the key point of the face corresponding to the point in the face image.
另外,在本实现方式中,上述执行主体可以基于该点的像素值,采用各种方法确定该点对应的人脸关键点的深度值。例如,可以直接将该点的像素值确定为该点对应的人脸关键点的深度值。In addition, in this implementation manner, the above-mentioned execution subject may use various methods to determine the depth value of the face key point corresponding to the point based on the pixel value of the point. For example, the pixel value of the point can be directly determined as the depth value of the key point of the face corresponding to the point.
在本实施例的一些可选的实现方式中,上述执行主体可以响应于确定该点的像素值大于等于预设阈值,将该点的像素值确定为该点对应的人脸关键点的深度值。预设阈值可以为预先确定的值,例如“1”。实践中,像素值很低的点通常为预测失误的点,因此在本实现方式中,通过设置预设阈值可以去除预测失误的点,有助于确定出更为准确的人脸关键点的三维坐标。In some optional implementations of this embodiment, the above-mentioned execution subject may, in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point . The preset threshold may be a predetermined value, such as "1". In practice, the points with very low pixel values are usually the points where the prediction is incorrect. Therefore, in this implementation, the preset threshold can be set to remove the points with the prediction error, which helps to determine more accurate three-dimensional face key points. coordinate.
In some optional implementations of this embodiment, in response to determining that the pixel value of the point is less than the preset threshold, the executing body may further determine the preset threshold as the depth value of the face key point corresponding to the point.
It should be noted that after the coordinates of a face key point in the face image and the depth value of the face key point have been determined, the executing body may directly combine the coordinates of the face key point (which may be expressed as (x, y)) with the depth value of the face key point (which may be expressed as z) to form the three-dimensional coordinates of the face key point (which may be expressed as (x, y, z)).
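As an illustrative sketch (not part of the disclosure), the coordinate-recovery step described above can be expressed in Python as follows. It assumes a simple mapping principle that the text leaves open: the map has the same resolution as the face image, a point's coordinates in the map equal the corresponding key point's coordinates in the face image, and map positions carrying no key point are zero-valued.

```python
import numpy as np

def map_to_keypoints(pred_map: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Recover (x, y, z) face key points from a predicted map.

    Assumption: nonzero map pixels carry key points, and a point's map
    coordinates equal the key point's coordinates in the face image.
    """
    ys, xs = np.nonzero(pred_map)
    keypoints = []
    for x, y in zip(xs, ys):
        pixel = float(pred_map[y, x])
        # Pixel values below the preset threshold are treated as likely
        # mispredictions and, following the optional implementation above,
        # are replaced by the threshold itself.
        z = pixel if pixel >= threshold else threshold
        keypoints.append((float(x), float(y), z))
    return np.asarray(keypoints, dtype=np.float32)
```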
Step 2023: generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In practice, the three-dimensional mesh corresponding to a face image is a three-dimensional mesh whose vertices are the face key points. Therefore, here, based on the determined three-dimensional coordinates of the face key points in the face image, the executing body may generate the three-dimensional mesh corresponding to the face image.
It should be noted that generating a three-dimensional mesh based on the three-dimensional coordinates of its vertices is a well-known technique that is currently widely studied and applied, and is not described in detail here.
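Although the disclosure leaves the meshing method open, one common instance of such a well-known technique (an assumption here, not a prescribed method) is to triangulate the key points in the image plane and lift the triangles to 3D:

```python
import numpy as np
from scipy.spatial import Delaunay

def keypoints_to_mesh(keypoints: np.ndarray):
    """Build a triangle mesh whose vertices are the face key points.

    Assumed meshing choice: Delaunay-triangulate the (x, y) projections
    of the vertices, then reuse the triangle connectivity in 3D.
    """
    tri = Delaunay(keypoints[:, :2])   # triangulate in the image plane
    faces = tri.simplices              # (M, 3) vertex indices per triangle
    return keypoints, faces            # vertex array and connectivity
```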
Step 203: generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
In this embodiment, by performing step 202, the executing body can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. Then, based on these two three-dimensional meshes, the executing body can generate the resulting three-dimensional mesh corresponding to the target face. Here, the resulting three-dimensional mesh is the three-dimensional mesh on which operations such as rendering are to be performed in order to realize three-dimensional face reconstruction of the target face.
Specifically, the executing body may use various methods to generate the resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. For example, the head pose of the three-dimensional mesh corresponding to the left face image and the head pose of the three-dimensional mesh corresponding to the right face image may be detected, and the three-dimensional mesh corresponding to the face image whose head pose indicates the smaller head rotation angle may then be determined as the resulting three-dimensional mesh corresponding to the target face. It can be understood that, in practice, the smaller the head rotation angle, the fewer facial features are occluded and the more accurate the generated three-dimensional mesh. Therefore, selecting, from the two three-dimensional meshes, the one whose head rotation angle is smaller as the resulting three-dimensional mesh corresponding to the target face helps to improve the accuracy of the resulting three-dimensional mesh.
It should be noted that detecting a head pose is a well-known technique that is currently widely studied and applied, and is not described in detail here.
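The selection rule in the example above can be sketched as follows; `estimate_rotation` is a hypothetical stand-in for any off-the-shelf head-pose detector and is not named by the disclosure:

```python
def select_result_mesh(left_mesh, right_mesh, estimate_rotation):
    """Pick the mesh whose head pose indicates the smaller rotation angle.

    `estimate_rotation` (hypothetical) returns the absolute head rotation
    angle, e.g. in degrees, of the face represented by a mesh.
    """
    left_angle = abs(estimate_rotation(left_mesh))
    right_angle = abs(estimate_rotation(right_mesh))
    # At smaller rotation angles fewer facial features are occluded,
    # so the corresponding mesh is taken as the resulting mesh.
    return left_mesh if left_angle <= right_angle else right_mesh
```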
Continuing to refer to FIG. 3, which is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of FIG. 3, the server 301 may first acquire a left face image 303 and a right face image 304 that are sent by the terminal device 302 and were obtained by photographing a target face, where the left face image 303 and the right face image 304 are binocular vision images.
Then, for the left face image 303, the server 301 may input the left face image 303 into a pre-trained map generation model 305 to obtain a map 306 corresponding to the left face image 303, where the points in the map 306 correspond to the face key points in the left face image 303; for a point in the map 306, determine, based on the coordinates of the point in the map 306 and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generate, based on the determined three-dimensional coordinates of the face key points in the left face image 303, a three-dimensional mesh 307 corresponding to the left face image 303. For the right face image 304, the server 301 may input the right face image 304 into the map generation model 305 to obtain a map 308 corresponding to the right face image 304, where the points in the map 308 correspond to the face key points in the right face image 304; for a point in the map 308, determine, based on the coordinates of the point in the map 308 and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generate, based on the determined three-dimensional coordinates of the face key points in the right face image 304, a three-dimensional mesh 309 corresponding to the right face image 304.
Finally, the server 301 may generate a resulting three-dimensional mesh 310 corresponding to the target face based on the three-dimensional mesh 307 corresponding to the left face image 303 and the three-dimensional mesh 309 corresponding to the right face image 304.
In the method provided by the above embodiment of the present disclosure, a left face image and a right face image obtained by photographing a target face are acquired, and then, for each face image of the left face image and the right face image, the following steps are performed: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. Finally, a resulting three-dimensional mesh corresponding to the target face is generated based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. It can be understood that, due to occlusion, shooting angles, and other reasons, the left face image and the right face image can record facial features at different angles. Therefore, using the three-dimensional mesh corresponding to the left face image together with the three-dimensional mesh corresponding to the right face image makes it possible to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
With further reference to FIG. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: acquiring a left face image and a right face image obtained by photographing a target face.
In this embodiment, the executing body of the method for generating information (for example, the server shown in FIG. 1) may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. Here, the target face is the face for which the corresponding three-dimensional mesh is to be generated. In practice, after the three-dimensional mesh of the face is generated, operations such as rendering may be performed on the mesh to realize three-dimensional face reconstruction. The left face image and the right face image are binocular vision images.
Step 402: for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In this embodiment, for each of the left face image and the right face image obtained in step 401, the executing body may perform the following steps:
Step 4021: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image.
Here, the map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the face key point in the face image and the depth value of the face key point. The depth value of a face key point may be the distance from the face key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically, points that affect the face contour or the shapes of the facial features. The map generation model may be used to characterize the correspondence between a face image and the map corresponding to the face image.
Step 4022: for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point.
Here, for a point in the map, the executing body may use various methods to determine the three-dimensional coordinates of the face key point corresponding to the point based on the coordinates of the point in the map and the pixel value of the point.
Step 4023: generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
In practice, the three-dimensional mesh corresponding to a face image is a three-dimensional mesh whose vertices are the face key points. Therefore, here, based on the determined three-dimensional coordinates of the face key points in the face image, the executing body may generate the three-dimensional mesh corresponding to the face image.
The above step 401 and step 402 are consistent with step 201 and step 202 in the foregoing embodiment, respectively. The above descriptions of step 201 and step 202 also apply to step 401 and step 402 and are not repeated here.
Step 403: establishing reference planes passing respectively through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image.
In this embodiment, by performing step 402, the executing body can generate the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. Then, the executing body may establish a reference plane through the center line of the three-dimensional mesh corresponding to the left face image and a reference plane through the center line of the three-dimensional mesh corresponding to the right face image, respectively. Here, a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts. Specifically, the two parts divided by the reference plane may be symmetrical (in which case the reference plane passes through the symmetry axis of the face) or asymmetrical. It should be made clear, however, that the reference plane passes through a point on the face indicated by the three-dimensional mesh. The reference plane thus divides the face indicated by the three-dimensional mesh into a face part located on the left side of the reference plane and a face part located on the right side of the reference plane. In addition, the point on the face image through which the reference plane established for the mesh corresponding to the left face image passes and the point on the face image through which the reference plane established for the mesh corresponding to the right face image passes indicate the same point on the face (for example, both indicate the point corresponding to the tip of the nose).
Step 404: extracting, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh.
In this embodiment, based on the two reference planes established in step 403, the executing body may extract, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as the left three-dimensional mesh, and extract, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as the right three-dimensional mesh.
It should be noted that the mesh located on the left side of the reference plane is the mesh on the left of the reference plane when facing the face contour in the mesh for which the reference plane has been established; likewise, the mesh located on the right side of the reference plane is the mesh on the right of the reference plane when facing that face contour.
Step 405: splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
In this embodiment, based on the left three-dimensional mesh and the right three-dimensional mesh obtained in step 404, the executing body may splice the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
It should be noted that since the reference plane passes through the center line of the mesh, and the point on the face image through which the reference plane established for the mesh corresponding to the left face image passes and the point on the face image through which the reference plane established for the mesh corresponding to the right face image passes indicate the same point on the face, the mesh on the left side of the reference plane in the mesh corresponding to the left face image and the mesh on the right side of the reference plane in the mesh corresponding to the right face image can be spliced into a three-dimensional mesh corresponding to a complete face; similarly, the mesh on the right side of the reference plane in the mesh corresponding to the left face image and the mesh on the left side of the reference plane in the mesh corresponding to the right face image can be spliced into a three-dimensional mesh corresponding to a complete face.
In practice, the left face image can record more of the facial features corresponding to the mesh on the left side of the reference plane, and the right face image can record more of the facial features corresponding to the mesh on the right side of the reference plane. Therefore, in this embodiment, extracting the left three-dimensional mesh from the mesh corresponding to the left face image and the right three-dimensional mesh from the mesh corresponding to the right face image, and splicing the extracted left and right three-dimensional meshes to generate the resulting three-dimensional mesh corresponding to the target face, enables the generated resulting mesh to characterize the facial features of the target face more accurately and improves the accuracy of the generated resulting mesh.
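Under simplifying assumptions not fixed by the disclosure (both meshes expressed in one common coordinate frame, the reference plane being the vertical plane x = plane_x through a shared landmark such as the nose tip, and smaller x lying on the left when facing the face contour), the split-and-stitch step might look like this at the vertex level:

```python
import numpy as np

def split_and_stitch(left_vertices: np.ndarray,
                     right_vertices: np.ndarray,
                     plane_x: float) -> np.ndarray:
    """Sketch of step 404 and step 405 on vertex arrays of shape (N, 3).

    Keeps, from each mesh, the half that its source image records best,
    then concatenates the two halves into the vertex set of the
    resulting mesh; connectivity would be rebuilt by re-triangulation.
    """
    left_half = left_vertices[left_vertices[:, 0] < plane_x]
    right_half = right_vertices[right_vertices[:, 0] >= plane_x]
    return np.vstack([left_half, right_half])
```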
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of extracting the left three-dimensional mesh from the mesh corresponding to the left face image, extracting the right three-dimensional mesh from the mesh corresponding to the right face image, and then splicing the left three-dimensional mesh and the right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face. It can be understood that, due to occlusion and other reasons, the left face image can record more of the facial features corresponding to the mesh on the left side of the reference plane, while the right face image can record more of the facial features corresponding to the mesh on the right side of the reference plane. Therefore, by using the left three-dimensional mesh corresponding to the left face image and the right three-dimensional mesh corresponding to the right face image, the solution described in this embodiment can generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to further improve the accuracy of three-dimensional face reconstruction.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating information in this embodiment includes an image acquisition unit 501, a first generation unit 502, and a second generation unit 503. The image acquisition unit 501 is configured to acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images. The first generation unit 502 is configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. The second generation unit 503 is configured to generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
In this embodiment, the image acquisition unit 501 of the apparatus 500 for generating information may acquire, through a wired or wireless connection, the left face image and the right face image obtained by photographing the target face. Here, the target face is the face for which the corresponding three-dimensional mesh is to be generated. The left face image and the right face image are binocular vision images.
In this embodiment, for each of the left face image and the right face image obtained by the image acquisition unit 501, the first generation unit 502 may perform the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image.
Here, the map is an image used to determine the three-dimensional coordinates of the face key points in the face image. The three-dimensional coordinates of a face key point consist of the position coordinates of the face key point in the face image and the depth value of the face key point. The depth value of a face key point may be the distance from the face key point to the imaging plane when the face image is captured. The points in the map correspond to the face key points in the face image. In practice, face key points may be key points in a face, specifically, points that affect the face contour or the shapes of the facial features.
In this embodiment, the map generation model may be used to characterize the correspondence between a face image and the map corresponding to the face image. It should be noted that the map generation model corresponds to a predetermined mapping relationship or mapping principle, which is used to determine the mapping positions, in the map output by the map generation model, of the face key points in the face image input to the map generation model.
In this embodiment, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image obtained by the first generation unit 502, the second generation unit 503 may generate the resulting three-dimensional mesh corresponding to the target face. Here, the resulting three-dimensional mesh is the three-dimensional mesh on which operations such as rendering are to be performed in order to realize three-dimensional face reconstruction of the target face.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: determine, based on the pixel value of the point, the depth value of the face key point corresponding to the point; determine, based on the coordinates of the point in the map, the coordinates of the face key point corresponding to the point in the face image; and determine the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determine the pixel value of the point as the depth value of the face key point corresponding to the point.
In some optional implementations of this embodiment, the first generation unit 502 may be further configured to: in response to determining that the pixel value of the point is less than the preset threshold, determine the preset threshold as the depth value of the face key point corresponding to the point.
In some optional implementations of this embodiment, the second generation unit 503 may include: an establishing module (not shown in the figure) configured to establish reference planes passing respectively through the center line of the three-dimensional mesh corresponding to the left face image and through the center line of the three-dimensional mesh corresponding to the right face image, where a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts; an extraction module (not shown in the figure) configured to extract, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and to extract, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh; and a splicing module (not shown in the figure) configured to splice the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
In some optional implementations of this embodiment, the map generation model may be obtained by training through the following steps: acquiring a training sample set, where a training sample includes a sample face image and the coordinates and depth values of the face key points in the sample face image; for a training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, the mapping positions of the face key points in the map to be constructed, determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at the mapping positions, a map corresponding to the sample face image in the training sample; and using a machine learning method, taking the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as expected output, and training to obtain the map generation model.
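As a minimal sketch of the map-construction step (the concrete mapping principle is an assumption; the disclosure only requires that key point coordinates determine mapping positions and depth values determine pixel values), the training target for one sample might be built as follows. A network would then be trained, by ordinary supervised regression, to output this target map given the sample face image.

```python
import numpy as np

def build_target_map(keypoints, depths, height: int, width: int) -> np.ndarray:
    """Construct the target map for one sample face image.

    Assumed mapping principle: a key point at image coordinates (x, y)
    maps to map position (x, y), and the pixel value at that position
    is set to the key point's depth value.
    """
    target = np.zeros((height, width), dtype=np.float32)
    for (x, y), z in zip(keypoints, depths):
        target[int(round(y)), int(round(x))] = float(z)
    return target
```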
In some optional implementations of this embodiment, the training samples in the training sample set may be generated through the following steps: collecting a face depth map of a sample face with a depth map acquisition device, and acquiring the face image corresponding to the face depth map; performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image; and combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
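A sketch of assembling one such training sample is given below; `detect_landmarks` is a hypothetical face key point detector standing in for whatever detection method is used, and the face depth map is assumed to be pixel-aligned with the face image:

```python
import numpy as np

def make_training_sample(depth_map: np.ndarray, face_image: np.ndarray,
                         detect_landmarks):
    """Assemble one training sample from a depth-camera capture.

    The (x, y) key point coordinates come from the detector; each key
    point's depth value is read from the aligned face depth map.
    """
    keypoints = detect_landmarks(face_image)                # (N, 2) coords
    depths = [float(depth_map[int(y), int(x)]) for x, y in keypoints]
    return {"image": face_image, "keypoints": keypoints, "depths": depths}
```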
It can be understood that the units recorded in the apparatus 500 correspond to the steps in the method described with reference to FIG. 2. Therefore, the operations and features described above for the method, as well as the beneficial effects produced, are also applicable to the apparatus 500 and the units contained therein, and are not repeated here.
The apparatus 500 provided by the above embodiment of the present disclosure acquires a left face image and a right face image obtained by photographing a target face, and then performs, for each face image of the left face image and the right face image, the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image. Finally, the apparatus generates a resulting three-dimensional mesh corresponding to the target face based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image. It can be understood that, due to occlusion, shooting angles, and other reasons, the left face image and the right face image can record facial features at different angles. Therefore, using the three-dimensional mesh corresponding to the left face image together with the three-dimensional mesh corresponding to the right face image makes it possible to generate a more accurate resulting three-dimensional mesh corresponding to the target face, which helps to improve the accuracy of three-dimensional face reconstruction.
Reference is now made to FIG. 6, which shows a schematic structural diagram of an electronic device (such as the server or terminal device in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device or server shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing apparatus (such as a central processing unit or a graphics processing unit) 601, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data. Although FIG. 6 shows the electronic device 600 having various apparatuses, it should be understood that it is not required to implement or provide all of the illustrated apparatuses. More or fewer apparatuses may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one apparatus or may represent multiple apparatuses as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a left face image and a right face image obtained by photographing a target face, where the left face image and the right face image are binocular vision images; for each face image of the left face image and the right face image, perform the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, where the points in the map correspond to the face key points in the face image; for a point in the map, determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
The computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, the programming languages including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on a user computer, partly on a user computer, as an independent software package, partly on a user computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, the program segment, or the part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from the order marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an image acquisition unit, a first generation unit, and a second generation unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves. For example, the image acquisition unit may also be described as "a unit for acquiring face images".
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (16)

  1. A method for generating information, comprising:
    acquiring a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images;
    for each face image of the left face image and the right face image, performing the following steps: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, wherein points in the map correspond to face key points in the face image; for a point in the map, determining, based on coordinates of the point in the map and a pixel value of the point, three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and
    generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
  2. The method according to claim 1, wherein the determining, based on the coordinates of the point in the map and the pixel value of the point, the three-dimensional coordinates of the face key point corresponding to the point comprises:
    determining, based on the pixel value of the point, a depth value of the face key point corresponding to the point;
    determining, based on the coordinates of the point in the map, coordinates of the face key point corresponding to the point in the face image; and
    determining the three-dimensional coordinates of the face key point corresponding to the point based on the depth value of the face key point corresponding to the point and the coordinates of the face key point corresponding to the point in the face image.
  3. The method according to claim 2, wherein the determining, based on the pixel value of the point, the depth value of the face key point corresponding to the point comprises:
    in response to determining that the pixel value of the point is greater than or equal to a preset threshold, determining the pixel value of the point as the depth value of the face key point corresponding to the point.
  4. The method according to claim 3, wherein the determining, based on the pixel value of the point, the depth value of the face key point corresponding to the point further comprises:
    in response to determining that the pixel value of the point is less than the preset threshold, determining the preset threshold as the depth value of the face key point corresponding to the point.
  5. The method according to claim 1, wherein the generating, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, the resulting three-dimensional mesh corresponding to the target face comprises:
    establishing reference planes passing respectively through a center line of the three-dimensional mesh corresponding to the left face image and through a center line of the three-dimensional mesh corresponding to the right face image, wherein a reference plane runs through the three-dimensional mesh along the center line and divides the three-dimensional mesh into two parts;
    extracting, from the three-dimensional mesh corresponding to the left face image, the mesh located on the left side of the reference plane as a left three-dimensional mesh, and extracting, from the three-dimensional mesh corresponding to the right face image, the mesh located on the right side of the reference plane as a right three-dimensional mesh; and
    splicing the extracted left three-dimensional mesh and right three-dimensional mesh to generate the resulting three-dimensional mesh corresponding to the target face.
  6. The method according to any one of claims 1-5, wherein the map generation model is obtained by training through the following steps:
    acquiring a training sample set, wherein a training sample comprises a sample face image and coordinates and depth values of face key points in the sample face image;
    for a training sample in the training sample set, determining, based on the coordinates of the face key points in the training sample, mapping positions of the face key points in a map to be constructed, determining, based on the depth values of the face key points in the training sample, pixel values at the corresponding mapping positions in the map to be constructed, and constructing, using the determined mapping positions and the pixel values at the mapping positions, a map corresponding to the sample face image in the training sample; and
    using a machine learning method, taking sample face images of the training samples in the training sample set as input and maps corresponding to the input sample face images as expected output, and training to obtain the map generation model.
  7. The method according to claim 6, wherein the training samples in the training sample set are generated through the following steps:
    collecting a face depth map of a sample face with a depth map acquisition device, and acquiring a face image corresponding to the face depth map;
    performing face key point detection on the face image corresponding to the face depth map to determine coordinates of face key points in the face image corresponding to the face depth map; and
    combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and depth values of the face key points determined based on the face depth map into a training sample.
  8. An apparatus for generating information, comprising:
    an image acquisition unit configured to acquire a left face image and a right face image obtained by photographing a target face, wherein the left face image and the right face image are binocular vision images;
    a first generation unit configured to perform the following steps for each face image of the left face image and the right face image: inputting the face image into a pre-trained map generation model to obtain a map corresponding to the face image, wherein points in the map correspond to face key points in the face image; for a point in the map, determining, based on coordinates of the point in the map and a pixel value of the point, three-dimensional coordinates of the face key point corresponding to the point; and generating, based on the determined three-dimensional coordinates of the face key points in the face image, a three-dimensional mesh corresponding to the face image; and
    a second generation unit configured to generate, based on the three-dimensional mesh corresponding to the left face image and the three-dimensional mesh corresponding to the right face image, a resulting three-dimensional mesh corresponding to the target face.
  9. 根据权利要求8所述的装置,其中,所述第一生成单元进一步被配置成:The apparatus according to claim 8, wherein the first generating unit is further configured to:
    基于该点的像素值,确定该点对应的人脸关键点的深度值;Based on the pixel value of the point, determine the depth value of the key point of the face corresponding to the point;
    基于该点在映射图中的坐标确定该点对应的人脸关键点在该人脸图像中的坐标;Determine the coordinates of the key point of the face corresponding to the point in the face image based on the coordinates of the point in the map;
    基于该点对应的人脸关键点的深度值和该点对应的人脸关键点在该人脸图像中的坐标,确定该点对应的人脸关键点的三维坐标。Based on the depth value of the face key point corresponding to the point and the coordinate of the face key point corresponding to the point in the face image, the three-dimensional coordinates of the face key point corresponding to the point are determined.
  10. 根据权利要求9所述的装置,其中,所述第一生成单元进一步被配置成:The device according to claim 9, wherein the first generating unit is further configured to:
    响应于确定该点的像素值大于等于预设阈值,将该点的像素值确定为该点对应的人脸关键点的深度值。In response to determining that the pixel value of the point is greater than or equal to the preset threshold, the pixel value of the point is determined as the depth value of the key point of the face corresponding to the point.
  11. 根据权利要求10所述的装置,其中,所述第一生成单元进一步被配置成:The device according to claim 10, wherein the first generating unit is further configured to:
    响应于确定该点的像素值小于所述预设阈值,将所述预设阈值确定为该点对应的人脸关键点的深度值。In response to determining that the pixel value of the point is less than the preset threshold, the preset threshold is determined as the depth value of the key point of the face corresponding to the point.
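For illustration only: taken together, claims 9-11 amount to a depth clamp followed by a coordinate lookup, roughly as below; the linear map-to-image coordinate scaling is an assumed convention, not specified by the claims.

    def keypoint_3d(map_x, map_y, pixel_value, threshold, scale=1.0):
        # Claims 10-11: pixel values below the preset threshold are
        # replaced by the threshold itself, clamping the depth from below.
        depth = pixel_value if pixel_value >= threshold else threshold
        # Claim 9: recover the key point's image coordinates from its
        # position in the map (here a simple scaling between the grids).
        img_x, img_y = map_x * scale, map_y * scale
        return (img_x, img_y, depth)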
  12. The apparatus according to claim 8, wherein the second generation unit comprises:
    an establishment module configured to establish reference planes through the center line of the three-dimensional grid corresponding to the left face image and the center line of the three-dimensional grid corresponding to the right face image, respectively, wherein each reference plane penetrates the three-dimensional grid along the center line and divides the three-dimensional grid into two parts;
    an extraction module configured to extract, from the three-dimensional grid corresponding to the left face image, the part located on the left side of the reference plane as a left three-dimensional grid, and to extract, from the three-dimensional grid corresponding to the right face image, the part located on the right side of the reference plane as a right three-dimensional grid;
    a splicing module configured to splice the extracted left three-dimensional grid and right three-dimensional grid to generate the resulting three-dimensional grid corresponding to the target face.
  13. The apparatus according to any one of claims 8-12, wherein the map generation model is obtained by training through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample face image and the coordinates and depth values of face key points in the sample face image;
    for a training sample in the training sample set: determining, based on the coordinates of the face key points in the training sample, the mapping positions of those face key points in a map to be constructed; determining, based on the depth values of the face key points in the training sample, the pixel values at the corresponding mapping positions in the map to be constructed; and constructing, using the determined mapping positions and their pixel values, the map corresponding to the sample face image in the training sample;
    training the map generation model using a machine learning method, with the sample face images of the training samples in the training sample set as input and the maps corresponding to the input sample face images as desired output.
  14. The apparatus according to claim 13, wherein the training samples in the training sample set are generated through the following steps:
    collecting a face depth map of a sample face using a depth map acquisition device, and acquiring the face image corresponding to the face depth map;
    performing face key point detection on the face image corresponding to the face depth map to determine the coordinates of the face key points in that face image;
    combining the face image corresponding to the face depth map, the determined coordinates of the face key points, and the depth values of the face key points determined based on the face depth map into a training sample.
  15. An electronic device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
  16. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2019/126382 2019-01-31 2019-12-18 Method and apparatus for generating information WO2020155908A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100632.8 2019-01-31
CN201910100632.8A CN109816791B (en) 2019-01-31 2019-01-31 Method and apparatus for generating information

Publications (1)

Publication Number Publication Date
WO2020155908A1 (en)

Family

ID=66606329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126382 WO2020155908A1 (en) 2019-01-31 2019-12-18 Method and apparatus for generating information

Country Status (2)

Country Link
CN (1) CN109816791B (en)
WO (1) WO2020155908A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816791B (en) * 2019-01-31 2020-04-28 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111652022B (en) * 2019-06-26 2023-09-05 广州虎牙科技有限公司 Image data display method, image data live broadcast device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866497A (en) * 2010-06-18 2010-10-20 北京交通大学 Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system
US8861800B2 (en) * 2010-07-19 2014-10-14 Carnegie Mellon University Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction
US20170032565A1 (en) * 2015-07-13 2017-02-02 Shenzhen University Three-dimensional facial reconstruction method and system
US10489973B2 (en) * 2015-08-17 2019-11-26 Cubic Corporation 3D face reconstruction from gate camera
CN108921926B (en) * 2018-07-02 2020-10-09 云从科技集团股份有限公司 End-to-end three-dimensional face reconstruction method based on single image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914806A (en) * 2013-01-09 2014-07-09 三星电子株式会社 Display apparatus and control method for adjusting the eyes of a photographed user
CN106910222A (en) * 2017-02-15 2017-06-30 中国科学院半导体研究所 Face three-dimensional rebuilding method based on binocular stereo vision
CN109118579A (en) * 2018-08-03 2019-01-01 北京微播视界科技有限公司 The method, apparatus of dynamic generation human face three-dimensional model, electronic equipment
CN109272543A (en) * 2018-09-21 2019-01-25 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109816791A (en) * 2019-01-31 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating information

Also Published As

Publication number Publication date
CN109816791B (en) 2020-04-28
CN109816791A (en) 2019-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913091

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 06/12/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19913091

Country of ref document: EP

Kind code of ref document: A1