WO2020019591A1 - Method and device used for generating information - Google Patents

Method and device used for generating information

Info

Publication number
WO2020019591A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
face image
recognition model
extracted
information
Prior art date
Application number
PCT/CN2018/116182
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Riwei (陈日伟)
Original Assignee
Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co., Ltd.
Publication of WO2020019591A1 publication Critical patent/WO2020019591A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for generating information.
  • The key points of a face refer to points in the face that have obvious semantic discrimination, such as the points corresponding to the nose and the points corresponding to the eyes.
  • Face key point detection usually refers to detecting such key points in a face image.
  • Based on the detected key points, functions such as adding special effects, 3D modeling of the face, and beautified photography can be realized.
  • the embodiments of the present application provide a method and device for generating information.
  • an embodiment of the present application provides a method for generating information.
  • The method includes: acquiring an image to be identified, where the image to be identified includes a face image; extracting a face image from the image to be identified, and inputting the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image.
  • A candidate recognition model matching the obtained recognition result is selected from a candidate recognition model set as the second recognition model.
  • the candidate recognition model in the candidate recognition model set is a pre-trained model for identifying different classes of faces to generate key point information.
  • the face image is input to the selected second recognition model to obtain key point information corresponding to the extracted face image, wherein the key point information is used to characterize the position of the key point in the face image in the face image.
  • In some embodiments, inputting the extracted face image into the pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image includes: inputting the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference key point information corresponding to the extracted face image, where the reference key point information is used to characterize the positions, within the face image, of reference key points in the face image; and inputting the extracted face image into the selected second recognition model to obtain the key point information corresponding to the extracted face image includes: inputting the extracted face image and the obtained reference key point information into the selected second recognition model to obtain the key point information corresponding to the extracted face image.
  • In some embodiments, extracting a face image from the image to be identified includes: inputting the image to be identified into a pre-trained third recognition model to obtain position information for characterizing the position, within the image to be identified, of the face image in the image to be identified; and, based on the obtained position information, extracting the face image from the image to be identified.
  • In some embodiments, inputting the extracted face image into the selected second recognition model to obtain the key point information corresponding to the extracted face image includes: inputting the extracted face image into the selected second recognition model to obtain the key point information and matching information corresponding to the extracted face image, where the matching information includes a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  • obtaining the image to be identified includes: selecting an image to be identified from an image sequence corresponding to the target video, where the target video is a video obtained by photographing a face.
  • In some embodiments, the method further includes: selecting, from the image sequence, an image that is located after the image to be recognized and adjacent to the image to be recognized as a candidate image to be recognized; extracting a face image from the candidate image to be recognized as a candidate face image, and determining the face image extracted from the image to be recognized as a reference face image; determining whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition; and, in response to determining that it does, inputting the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain key point information and matching information corresponding to the extracted candidate face image.
  • an embodiment of the present application provides an apparatus for generating information.
  • The apparatus includes: an image acquiring unit configured to acquire an image to be identified, where the image to be identified includes a face image; a first input unit configured to extract a face image from the image to be identified and input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; a model selection unit configured to select, from a candidate recognition model set, a candidate recognition model matching the obtained recognition result as the second recognition model, where the candidate recognition models in the candidate recognition model set are pre-trained models for identifying different categories of faces to generate key point information; and a second input unit configured to input the extracted face image into the selected second recognition model to obtain key point information corresponding to the extracted face image, where the key point information is used to characterize the positions, within the face image, of key points in the face image.
  • In some embodiments, the first input unit is further configured to: input the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference key point information corresponding to the extracted face image, where the reference key point information is used to characterize the positions, within the face image, of reference key points in the face image; and the second input unit is further configured to: input the extracted face image and the obtained reference key point information into the selected second recognition model to obtain the key point information corresponding to the extracted face image.
  • In some embodiments, the first input unit includes: a first input module configured to input the image to be identified into a pre-trained third recognition model to obtain position information for characterizing the position, within the image to be identified, of the face image in the image to be identified; and an image extraction module configured to extract the face image from the image to be identified based on the obtained position information.
  • In some embodiments, the second input unit is further configured to input the extracted face image into the selected second recognition model to obtain key point information and matching information corresponding to the extracted face image, where the matching information includes a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  • the image acquisition unit is further configured to select an image to be identified from an image sequence corresponding to the target video, where the target video is a video obtained by photographing a face.
  • In some embodiments, the apparatus further includes: an image selection unit configured to select, from the image sequence, an image located after the image to be identified and adjacent to the image to be identified as a candidate image to be identified; an image determination unit configured to extract a face image from the candidate image to be identified as a candidate face image, and determine the face image extracted from the image to be identified as a reference face image; a condition determining unit configured to determine whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition; and a third input unit configured to, in response to determining that it does, input the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain key point information and matching information corresponding to the extracted candidate face image.
  • In a further aspect, an embodiment of the present application provides an electronic device including: one or more processors; and a storage device storing one or more programs thereon, where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of any one of the foregoing methods for generating information.
  • an embodiment of the present application provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method of any one of the foregoing methods for generating information.
  • The method and device for generating information provided by the embodiments of the present application acquire an image to be identified, extract a face image from the image to be identified, and input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; a candidate recognition model matching the obtained recognition result is then selected from the candidate recognition model set as the second recognition model.
  • Because pre-trained candidate recognition models for identifying different categories of faces can be used to recognize face images and generate key point information, face images of different categories can be recognized, improving the comprehensiveness of information generation; moreover, performing recognition with a candidate recognition model that matches the category corresponding to the face image can improve the accuracy of the generated information.
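The overall flow described above can be sketched as follows. Every function, model name, and return value here is a toy stand-in assumed for illustration; the patent does not specify these interfaces.

```python
# Toy stand-ins for the three models in the described pipeline;
# all names and signatures are illustrative assumptions.

def third_model(image):
    """Stand-in face detector: returns position info as an (x, y, w, h) box."""
    return (0, 0, 2, 2)

def first_model(face):
    """Stand-in classifier: returns a recognition result (face category)."""
    return "cat"

# Candidate recognition model set: one key point model per face category.
CANDIDATES = {
    "cat": lambda face: [(0, 1), (1, 1)],
    "dog": lambda face: [(0, 0), (1, 0)],
}

def generate_information(image):
    x, y, w, h = third_model(image)                  # locate the face image
    face = [row[x:x + w] for row in image[y:y + h]]  # extract the face image
    result = first_model(face)                       # recognition result
    second_model = CANDIDATES[result]                # select the matching model
    return second_model(face)                        # key point information

image = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]            # dummy 3x3 "image"
print(generate_information(image))                   # [(0, 1), (1, 1)]
```

The point of the structure is that only the final step changes per face category: the detector and classifier are shared, while the key point model is looked up from the candidate set.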
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for generating information according to the present application
  • FIG. 3 is a schematic diagram of an application scenario of a method for generating information according to an embodiment of the present application
  • FIG. 4 is a flowchart of still another embodiment of a method for generating information according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for generating information or an apparatus for generating information to which the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various types of connections, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as image processing applications, photo beautification software, web browser applications, search applications, and social platform software.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and so on.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, software modules used to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
  • the server 105 may be a server that provides various services, for example, an image processing server that processes an image to be identified sent by the terminal devices 101, 102, and 103.
  • the image processing server may perform analysis and processing on the received data such as the image to be identified, and obtain a processing result (for example, key point information).
  • the server may be hardware or software.
  • the server can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • the server can be implemented as multiple software or software modules (for example, multiple software or software modules used to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the above-mentioned system architecture may not include a network, but only a terminal device or a server.
  • a flowchart 200 of one embodiment of a method for generating information according to the present application is shown.
  • the method for generating information includes the following steps:
  • Step 201: Obtain an image to be identified.
  • In this embodiment, the execution subject of the method for generating information (for example, the server shown in FIG. 1) may obtain an image to be identified.
  • the image to be identified may include a face image.
  • the face image included in the image to be identified may include an animal face image, and may also include a human face image.
  • the animal face corresponding to the animal face image may be various types of animal faces, such as a dog face, a cat face, and the like.
  • The execution subject may obtain an image to be identified stored locally in advance, or may obtain an image to be identified sent by an electronic device (such as a terminal device shown in FIG. 1) communicatively connected to it.
  • the execution subject may select an image to be identified from an image sequence corresponding to a target video, where the target video is a video that can be obtained by shooting a face. Specifically, the execution subject may first obtain an image sequence corresponding to the target video from a local or electronic device connected thereto, and then select an image from the image sequence as an image to be identified.
  • the above-mentioned executing subject may select the image to be identified from the above-mentioned image sequence in various ways, for example, a random selection manner may be adopted; or, the first-ranked image may be selected from the image sequence.
  • a video is essentially an image sequence arranged in chronological order, so any video can correspond to an image sequence.
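Given that correspondence between a video and its image sequence, the selection step can be sketched as below; the two strategies mirror the example selection manners mentioned above (random selection, or taking the first-ranked image), and the frame values are stand-ins.

```python
import random

def select_image_to_identify(image_sequence, strategy="first"):
    """Pick one frame from the target video's image sequence.

    Mirrors the two example selection manners in the text:
    random selection, or selecting the first-ranked image."""
    if not image_sequence:
        raise ValueError("empty image sequence")
    if strategy == "random":
        return random.choice(image_sequence)
    return image_sequence[0]

frames = ["frame0", "frame1", "frame2"]  # stand-ins for decoded frames
print(select_image_to_identify(frames))                      # frame0
print(select_image_to_identify(frames, "random") in frames)  # True
```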
  • Step 202: A face image is extracted from the image to be recognized, and the extracted face image is input into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image.
  • the execution subject may first extract a face image from the to-be-recognized image, and then input the extracted face image into a pre-trained first recognition model to obtain the extracted The recognition result corresponding to the face image.
  • the recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, and audio.
  • the recognition result may be used to characterize the category of the face corresponding to the face image.
  • the execution subject may extract a face image from the image to be identified in various ways.
  • a threshold segmentation method in an image segmentation technique may be used to segment a face image in an image to be identified from other image regions, and then extract a face image from the image to be identified.
  • the image segmentation technology is a well-known technology that is widely studied and applied at present, and will not be repeated here.
  • the above-mentioned execution subject may also extract a face image from the image to be identified by the following steps:
  • Step 2021: Input the image to be recognized into a pre-trained third recognition model to obtain position information for characterizing the position, within the image to be recognized, of the face image in the image to be recognized.
  • the location information may include, but is not limited to, at least one of the following: text, numbers, symbols, and images.
  • the position information may be a quadrilateral image in which a face image is frame-selected in the image to be identified.
  • the third recognition model may be used to characterize the correspondence between the image to be identified including the face image and the position information used to characterize the position of the face image in the image to be identified.
  • the third recognition model may be a model obtained by training an initial model (such as a Convolutional Neural Network (CNN), a residual network (ResNet), etc.) based on training samples and using a machine learning method.
  • The third recognition model can be obtained by training as follows: first, a training sample set is obtained, where a training sample may include a sample image to be recognized that includes a sample face image, and sample position information pre-labeled for the sample face image in that sample image, where the sample position information can be used to characterize the position, within the sample image to be recognized, of the sample face image.
  • Then, training samples can be selected from the training sample set and the following training steps performed: input the sample image to be recognized of the selected training sample into the initial model to obtain position information corresponding to the sample face image; take the sample position information corresponding to the input sample image as the expected output of the initial model and, based on the obtained position information and the sample position information, adjust the parameters of the initial model; determine whether there are unselected training samples in the training sample set; and, in response to there being no unselected training samples, determine the adjusted initial model as the third recognition model.
  • Optionally, the training may further include the following step: in response to determining that there are unselected training samples, reselect a training sample from the unselected training samples, use the most recently adjusted initial model as the new initial model, and continue to perform the training steps described above.
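The loop described in the training steps above can be sketched with a deliberately tiny stand-in model: a single offset parameter plays the role of the initial model's parameters, and it is adjusted from the gap between the predicted position and the labeled sample position. The model form, learning rate, and data are all illustrative assumptions, not the patent's training procedure.

```python
def train_position_model(samples, lr=0.1, epochs=50):
    """Toy stand-in for the described training loop: fit a single offset
    so that (feature + offset) approximates the labeled position."""
    offset = 0.0  # the "initial model" parameter
    for _ in range(epochs):
        for feature, labeled_pos in samples:   # select each training sample
            predicted = feature + offset       # model output (position info)
            error = predicted - labeled_pos    # compare with expected output
            offset -= lr * error               # adjust the model parameter
    return offset                              # the trained "model"

# Each sample: (image summary feature, pre-labeled face position);
# here every label is feature + 2, so the offset should converge to 2.
samples = [(1.0, 3.0), (2.0, 4.0), (5.0, 7.0)]
offset = train_position_model(samples)
print(round(offset, 2))  # 2.0
```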
  • the execution subject of the steps used to generate the model may be the same as or different from the execution subject of the method used to generate the information. If they are the same, the execution subject of the step for generating the model can store the trained model locally after the model is trained. If they are different, the execution body of the step for generating the model may send the trained model to the execution body of the method for generating information after training to obtain the model.
  • Step 2022: A face image is extracted from the image to be identified based on the obtained position information.
  • the above-mentioned executing subject may extract a face image from the image to be identified in various ways.
  • a to-be-recognized image may be cropped based on the obtained position information to obtain a face image.
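The cropping step can be illustrated as follows, assuming (as a simplification) that the position information is an (x, y, w, h) bounding box and that the image is represented as nested lists of pixels.

```python
def crop_face(image, position):
    """Crop the face region from the image to be identified, using
    position information assumed here to be an (x, y, w, h) box.
    `image` is a list of pixel rows, standing in for an array."""
    x, y, w, h = position
    return [row[x:x + w] for row in image[y:y + h]]

# A 6x6 "image" whose pixels record their own (row, col) coordinates.
image = [[(r, c) for c in range(6)] for r in range(6)]
face = crop_face(image, (1, 2, 3, 2))  # box at x=1, y=2, width 3, height 2
print(len(face), len(face[0]))         # 2 3
```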
  • the execution subject may generate a recognition result corresponding to the face image.
  • the image to be recognized may include at least one face image.
  • For a face image extracted from the image to be recognized, the execution subject may input the face image into the pre-trained first recognition model to obtain the recognition result corresponding to the face image.
  • the first recognition model may be used to characterize a correspondence between a face image and a recognition result corresponding to the face image.
  • the first recognition model may be a model obtained by training an initial model (such as a convolutional neural network, a residual network, and the like) based on training samples and using a machine learning method.
  • the execution subject used to obtain the first recognition model may be trained to obtain the first recognition model by using a training method similar to the training method of the third recognition model, and specific training steps are not described herein again.
  • the training samples in the corresponding training sample set may include sample face images and sample recognition results pre-labeled for the sample face images, where the sample recognition results may be used for characterization The category of the face corresponding to the sample face image.
  • Step 203: Select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
  • the execution entity may select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
  • the candidate recognition models in the candidate recognition model set may be pre-trained models for identifying different classes of faces to generate key point information.
  • The execution body may use various methods to select, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as the second recognition model. For example, a technician may preset in the execution body a correspondence relationship (for example, a correspondence table) between recognition results and the candidate recognition models in the candidate recognition model set; the execution body may then look up the correspondence relationship using the obtained recognition result to determine the candidate recognition model that matches the obtained recognition result as the second recognition model.
  • a technician can preset category information corresponding to the candidate recognition model, where the category information can be used to characterize the types of faces that the candidate recognition model can recognize.
  • the category information may include but is not limited to at least one of the following: numbers, text, symbols, pictures.
  • The above-mentioned execution body may match the obtained recognition result with the category information corresponding to the candidate recognition models in the candidate recognition model set (for example, by performing a similarity calculation) to determine a candidate recognition model that matches the obtained recognition result (for example, a candidate recognition model whose similarity calculation result is greater than or equal to a preset threshold) as the second recognition model.
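Both selection manners just described, the preset correspondence table and similarity matching against category information with a preset threshold, can be sketched as below. The category names, model names, and the use of a string-similarity measure are illustrative assumptions.

```python
from difflib import SequenceMatcher

# (a) Preset correspondence table: recognition result -> candidate model.
CORRESPONDENCE = {"cat face": "cat_model", "dog face": "dog_model"}

def select_by_table(recognition_result):
    """Look up the matching candidate model; None if no entry matches."""
    return CORRESPONDENCE.get(recognition_result)

# (b) Similarity matching against each candidate's category information,
# keeping the best candidate whose score clears a preset threshold.
CATEGORY_INFO = {"cat_model": "cat face", "dog_model": "dog face"}

def select_by_similarity(recognition_result, threshold=0.8):
    best_model, best_score = None, 0.0
    for model, info in CATEGORY_INFO.items():
        score = SequenceMatcher(None, recognition_result, info).ratio()
        if score >= threshold and score > best_score:
            best_model, best_score = model, score
    return best_model

print(select_by_table("dog face"))       # dog_model
print(select_by_similarity("cat face"))  # cat_model
```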
  • the candidate recognition model may be used to characterize a correspondence between a face image and keypoint information corresponding to the face image.
  • the keypoint information can be used to characterize the position of keypoints in the face image in the face image.
  • the key points of the face image may be points with obvious semantic discrimination, such as points used to characterize the nose, points used to characterize the eyes, and so on.
  • the candidate recognition model may be a model obtained by training an initial model (such as a convolutional neural network, a residual network, etc.) based on training samples and using a machine learning method.
  • the execution subject used for training to obtain the candidate recognition model may be trained to obtain the candidate recognition model by using a training method similar to the training method of the third recognition model, and specific training steps are not described herein again.
  • For a candidate recognition model, the training samples in the corresponding training sample set may include sample face images and sample key point information pre-labeled for the sample face images, where the categories of faces corresponding to the sample face images may be the same (for example, all cat faces or all dog faces).
  • the sample keypoint information can be used to characterize the position of keypoints in the sample face image in the sample face image.
  • the execution subject may extract at least one face image, and for each face image in the extracted face image, the execution subject may obtain a recognition result, and further, for the obtained For each recognition result in the recognition result, based on step 203, the execution subject may select a candidate recognition model as the second recognition model corresponding to the recognition result.
  • Step 204: Input the extracted face image into the selected second recognition model to obtain key point information corresponding to the extracted face image.
  • the execution subject may input the extracted face image into the selected second recognition model to obtain key point information corresponding to the extracted face image. It should be noted that, for the face image in the extracted face image, the execution subject may input the face image into a second recognition model corresponding to the face image to obtain key point information corresponding to the face image. It can be understood that the correspondence between the face image and the second recognition model can be determined by the correspondence between the recognition result corresponding to the face image and the second recognition model.
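The per-face correspondence just described (each extracted face image is routed to the second recognition model selected for its own recognition result) can be sketched as follows; the classifier, the models, and the face values are stand-in callables assumed for illustration.

```python
def keypoints_for_faces(face_images, classify, candidate_models):
    """For each extracted face image, use its recognition result to pick
    its own second recognition model, then run that model on the face."""
    results = []
    for face in face_images:
        category = classify(face)            # first recognition model
        model = candidate_models[category]   # matching second model
        results.append(model(face))          # key point information
    return results

models = {"cat": lambda f: "cat-keypoints", "dog": lambda f: "dog-keypoints"}
classify = lambda f: "cat" if f == "face-a" else "dog"
faces = ["face-a", "face-b"]
print(keypoints_for_faces(faces, classify, models))
```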
  • In some optional implementations of this embodiment, the above-mentioned executing subject may also determine, by backward inference, the key point information corresponding to the image to be identified, where the key point information corresponding to the image to be identified may be used to characterize the positions, within the image to be identified, of key points in the image to be identified (that is, the key points included in the face image within the image to be identified).
  • In some embodiments, the candidate recognition model may be used to characterize the correspondence between a face image and the key point information and matching information corresponding to the face image, where the matching information may include, but is not limited to, at least one of the following: numbers, text, symbols, images, and audio.
  • the matching information may include a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  • the size of the matching index and the level of matching may have a corresponding relationship.
  • the corresponding relationship may be that the larger the matching index, the higher the matching degree; or the smaller the matching index, the higher the matching degree.
  • For such a candidate recognition model, the training samples in the corresponding training sample set may include a sample face image, together with sample key point information and sample matching information pre-labeled for the sample face image, where the sample key point information can be used to characterize the positions, within the sample face image, of key points in the sample face image.
  • the sample matching information may include a sample matching index.
  • the sample matching index can be used to characterize the degree of sample matching between the category of the face corresponding to the input sample face image and the category of the face predetermined for the candidate recognition model.
  • the correspondence between the sample matching index and the degree of sample matching can be set in advance by a technician. For example, it can be set that the larger the sample matching index, the higher the degree of sample matching.
  • In this implementation, the execution subject may input the extracted face image into the second recognition model to obtain key point information and matching information corresponding to the extracted face image. Furthermore, through the above-mentioned second recognition model, the degree of matching between the input face image and the second recognition model can be determined and matching information generated, so that subsequent operations based on the matching information (such as re-selecting the second recognition model) can further improve the accuracy of information processing.
  • In some embodiments, when the image to be identified is an image selected from the image sequence corresponding to the target video, after the extracted face image is input into the selected second recognition model to obtain the corresponding key point information and matching information, the execution subject may further perform the following information generation steps:
  • an image located after the image to be identified and adjacent to the image to be identified is selected from the image sequence as a candidate to-be-recognized image.
  • a face image is extracted from the candidate to-be-recognized image as a candidate face image, and the face image extracted from the to-be-recognized image is determined as a reference face image.
  • the execution subject may use the above-mentioned method for extracting a face image for an image to be identified to extract a face image from a candidate image to be identified as a candidate face image, and details are not described herein again.
  • the preset condition may be used to limit the degree of matching between the category of the face corresponding to the reference face image and the category of the face corresponding to the second recognition model into which the reference face image was input.
  • a technician can set a matching threshold in advance.
  • when the correspondence between the matching index and the degree of matching is that the larger the matching index, the higher the degree of matching, the preset condition may be that the matching index is greater than or equal to the matching threshold; when the correspondence is that the smaller the matching index, the higher the degree of matching, the preset condition may be that the matching index is less than or equal to the matching threshold.
  • the extracted candidate face image is input into the second recognition model into which the determined reference face image was input, to obtain the keypoint information and matching information corresponding to the extracted candidate face image.
  • for an image in the sequence, keypoint information can be generated by using the second recognition model and the matching information corresponding to the preceding image, following the method described in this implementation manner.
  • the specific steps can be referred to the above information generation steps, which will not be repeated here.
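  The frame-to-frame reuse logic above — keep using the current second recognition model for the next frame while its matching index still satisfies the preset condition, otherwise re-select — can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: `process_sequence`, `MATCH_THRESHOLD`, and the toy classifier/model stand-ins are all assumed names, and the assumed correspondence is "larger index means better match".

```python
# Hedged sketch of frame-to-frame model reuse. All names are illustrative
# assumptions; the "models" below are toy stand-ins, not trained networks.

MATCH_THRESHOLD = 0.5  # assumed: larger matching index means a better match


def process_sequence(frames, classify, select_model):
    """Reuse the previous second recognition model while its matching index
    meets the preset condition; otherwise re-select for the next frame."""
    results = []
    model = None
    for face in frames:
        if model is None:
            # First frame, or the previous matching index failed the condition:
            # classify with the first model and pick a matching second model.
            model = select_model(classify(face))
        keypoints, match_index = model(face)
        results.append((keypoints, match_index))
        if match_index < MATCH_THRESHOLD:
            model = None  # force re-selection on the next frame
    return results


def classify(face):
    # Stand-in for the first recognition model (category classifier).
    return face["category"]


def select_model(category):
    # Stand-in for selecting a second recognition model from the candidate set.
    def model(face):
        match_index = 1.0 if face["category"] == category else 0.0
        return [(0, 0)], match_index  # dummy keypoints plus matching index
    return model


frames = [{"category": "cat"}, {"category": "cat"},
          {"category": "dog"}, {"category": "dog"}]
out = process_sequence(frames, classify, select_model)
```

  Here the third frame is still fed to the "cat" model selected earlier; its low matching index then triggers re-selection, so the fourth frame is handled by a "dog" model, matching the behavior described for candidate to-be-recognized images.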
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment.
  • the server 301 first obtains an image to be identified 303 sent by the terminal device 302, where the image to be identified 303 includes face images 3031 and 3032. Then, the server 301 extracts the face image 3031 and the face image 3032 from the to-be-recognized image 303, inputs the face image 3031 and the face image 3032 into the pre-trained first recognition model 304 respectively, and obtains the recognition result "cat" 3051 corresponding to the face image 3031 and the recognition result "dog" 3052 corresponding to the face image 3032.
  • the server 301 may obtain a candidate recognition model set 306, where the candidate recognition model set 306 includes candidate recognition models 3061, 3062, and 3063.
  • the technician presets the correspondence between the candidate recognition models and the recognition results as follows: the candidate recognition model 3061 corresponds to the recognition result "cat"; the candidate recognition model 3062 corresponds to the recognition result "dog"; the candidate recognition model 3063 corresponds to the recognition result "person".
  • the server 301 may select, from the candidate recognition model set 306, the candidate recognition model 3061 that matches the obtained recognition result "cat" 3051 as the second recognition model 3071 corresponding to the face image 3031, and select, from the candidate recognition model set 306, the candidate recognition model 3062 that matches the obtained recognition result "dog" 3052 as the second recognition model 3072 corresponding to the face image 3032.
  • the server 301 may input the face image 3031 into the second recognition model 3071 to obtain the keypoint information 3081 corresponding to the face image 3031, and input the face image 3032 into the second recognition model 3072 to obtain the keypoint information 3082 corresponding to the face image 3032.
  • the key point information can be used to characterize the position of key points in the face image in the face image.
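  The scenario of FIG. 3 — classify each face with the first recognition model, select the matching candidate model as the second recognition model, then run it to obtain keypoint information — can be condensed into a toy sketch. Everything here is an assumed stand-in (dictionary lookup instead of trained networks, fabricated keypoint coordinates), meant only to show the control flow:

```python
# Toy stand-ins for the candidate recognition model set of FIG. 3.
# Coordinates and model internals are fabricated for illustration.
candidate_models = {
    "cat": lambda img: [(10, 12), (30, 12)],
    "dog": lambda img: [(8, 15), (28, 15)],
    "person": lambda img: [(12, 20), (32, 20)],
}


def first_recognition_model(face_image):
    # Stand-in classifier; a real first recognition model would be trained.
    return face_image["label"]


def generate_keypoints(face_image):
    result = first_recognition_model(face_image)   # e.g. "cat"
    second_model = candidate_models[result]        # model matching the result
    return second_model(face_image)                # keypoint positions


kp = generate_keypoints({"label": "cat"})
```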
  • the method provided by the foregoing embodiments of the present application acquires an image to be identified, extracts a face image from the image to be identified, and inputs the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; then selects, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as a second recognition model; and finally inputs the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the position of the keypoints in the face image.
  • in this way, face images can be recognized using pre-trained candidate recognition models for identifying different categories of faces to generate keypoint information, so that different categories of face images can be identified, which improves the comprehensiveness of information generation; and because the candidate recognition model matching the category corresponding to the face image is used for recognition, the accuracy of information generation can be improved.
  • a flowchart 400 of yet another embodiment of a method for generating information is shown.
  • the process 400 of the method for generating information includes the following steps:
  • Step 401 Obtain an image to be identified.
  • an execution subject of the method for generating information (for example, the server shown in FIG. 1) may obtain the image to be identified.
  • the image to be identified may include a face image.
  • the face image included in the image to be identified may include an animal face image, and may also include a human face image.
  • the animal face corresponding to the animal face image may be various types of animal faces, such as a dog face, a cat face, and the like.
  • step 401 may be implemented in a manner similar to step 201 in the foregoing embodiment.
  • the relevant description of step 201 is also applicable to step 401 of this embodiment, and details are not described herein again.
  • Step 402 A face image is extracted from the image to be recognized, and the extracted face image is input into a pre-trained first recognition model to obtain a recognition result and reference keypoint information corresponding to the extracted face image.
  • the execution subject may first extract a face image from the to-be-recognized image, and then input the extracted face image into a pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image.
  • the recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, and audio.
  • the recognition result can be used to characterize the category of the face corresponding to the face image.
  • the reference keypoint information may likewise include, but is not limited to, at least one of the following: text, numbers, symbols, and images.
  • the reference keypoint information can be used to characterize the position of the reference keypoints in the face image.
  • the reference key point may be a point used to determine a key point in the face image, for example, a point where the nose tip of the nose is located, a point where the corner of the mouth is located, and the like.
  • the above-mentioned execution subject may use the face image extraction method in the embodiment corresponding to FIG. 2 to extract the face image from the image to be identified, and details are not described herein again.
  • the first recognition model in this embodiment may be used to characterize the correspondence between the recognition result corresponding to the face image and the face image and the reference keypoint information.
  • the training samples in the corresponding training sample set may include sample face images, sample recognition results and sample reference keypoint information pre-labeled for the sample face images.
  • the reference keypoint information of the sample can be used to characterize the position of the reference keypoint in the sample face image in the sample face image.
  • Step 403 Select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
  • the execution entity may select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
  • the candidate recognition model in the candidate recognition model set may be a pre-trained model for recognizing faces of different classes to generate key point information.
  • the candidate recognition model can be used to characterize the correspondence between the face image and the reference keypoint information of the face image and the keypoint information of the face image.
  • the candidate recognition model may be a model obtained by training an initial model (such as a convolutional neural network, a residual network, etc.) based on training samples and using a machine learning method.
  • the candidate recognition model can be obtained by training as follows. First, a training sample set is obtained, where a training sample may include a sample face image and sample keypoint information pre-labeled for the sample face image. Then, training samples can be selected from the training sample set, and the following model training steps are performed: extracting the sample reference keypoint information corresponding to the sample face image of the selected training sample, where the sample reference keypoint information can be used to characterize the position of the reference keypoints in the sample face image; inputting the sample face image of the selected training sample and the extracted sample reference keypoint information into the initial model to obtain keypoint information; taking the sample keypoint information corresponding to the input sample face image as the expected output of the initial model, and adjusting the parameters of the initial model based on the obtained keypoint information and the sample keypoint information; determining whether there are unselected training samples in the training sample set; and, in response to determining that there are no unselected training samples, determining the adjusted initial model as the candidate recognition model.
  • the execution subject used to perform the above model training step can extract sample reference keypoint information corresponding to the sample face image in various ways.
  • for example, the sample face image can be input into the first recognition model in step 402 of this embodiment to obtain the sample reference keypoint information corresponding to the sample face image; alternatively, the sample face image can be output, and sample reference keypoint information labeled by a user for the sample face image can be received.
  • the method may further include the following steps: in response to determining that there are unselected training samples, reselecting training samples from the unselected training samples, and using the most recently adjusted initial model as the new initial model, and continuing Perform the model training steps described above.
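  The training procedure described above (select samples, extract sample reference keypoint information, run the initial model, adjust parameters toward the labeled sample keypoint information, repeat until no samples remain unselected) might be outlined as below. The single-parameter "model" and the update rule are deliberately simplified assumptions standing in for a neural network and its optimizer:

```python
# Hedged sketch of the candidate-model training loop. The one-parameter
# "model" and the update rule are toy assumptions, not the patent's method.

def make_model(offset):
    # The "initial model" adds a learned offset to the reference keypoints.
    def model(image, reference):
        return [(x + offset, y + offset) for (x, y) in reference]
    model.offset = offset
    return model


def extract_reference(image):
    # Stand-in for obtaining sample reference keypoint information
    # (e.g. from the first recognition model, or labeled by a user).
    return image["reference"]


def adjust(model, predicted, expected, lr=0.5):
    # Move the single parameter toward reducing the average x error.
    err = sum(ex - px for (px, _), (ex, _) in zip(predicted, expected)) / len(predicted)
    return make_model(model.offset + lr * err)


def train_candidate_model(samples, initial_model):
    model = initial_model
    for image, sample_keypoints in samples:           # until none are unselected
        reference = extract_reference(image)          # sample reference keypoints
        predicted = model(image, reference)           # forward pass
        model = adjust(model, predicted, sample_keypoints)  # parameter update
    return model  # the adjusted model becomes the candidate recognition model


samples = [({"reference": [(0, 0), (2, 2)]}, [(1, 1), (3, 3)])] * 10
trained = train_candidate_model(samples, make_model(0.0))
```

  With these toy labels (every sample keypoint is the reference point shifted by one), the learned offset converges toward 1.0 over the ten samples.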
  • the selection of the second recognition model in this embodiment may be implemented in a manner similar to the method for selecting the second recognition model in the embodiment corresponding to FIG. 2, and is not described herein again.
  • Step 404 Input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain keypoint information corresponding to the extracted face image.
  • the execution subject may input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
  • it may be noted that, for each face image among the extracted face images, the execution subject may input the face image and the reference keypoint information corresponding to the face image into the second recognition model corresponding to that face image, to obtain the keypoint information corresponding to the face image. It can be understood that the correspondence between the face image and the second recognition model can be determined by the correspondence between the recognition result corresponding to the face image and the second recognition model.
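  Step 404's two-stage inference — the first recognition model supplies a recognition result plus reference keypoint information, and the second recognition model refines those reference points into final keypoint information — might look like this toy sketch. The functions and coordinates are assumptions for illustration only:

```python
def first_model(face_image):
    # Stand-in: returns a recognition result plus coarse reference keypoints
    # (e.g. nose tip, mouth corners) for the face image.
    return face_image["label"], [(10, 10), (20, 20)]


def second_model(face_image, reference_keypoints):
    # Stand-in refinement: a real second recognition model would be a trained
    # network using the reference points as a prior; here we merely shift them.
    return [(x + 1, y + 1) for (x, y) in reference_keypoints]


face = {"label": "cat"}
result, reference = first_model(face)
keypoints = second_model(face, reference)  # keypoint info for the face image
```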
  • compared with the embodiment corresponding to FIG. 2, the process 400 of the method for generating information in this embodiment highlights the step of generating reference keypoint information corresponding to the face image and the step of generating keypoint information corresponding to the face image based on the generated reference keypoint information. Therefore, the solution described in this embodiment can use the reference keypoint information as a reference to generate more accurate keypoint information, which further improves the accuracy of information generation.
  • this application provides an embodiment of an apparatus for generating information.
  • the apparatus embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 500 for generating information in this embodiment includes an image acquisition unit 501, a first input unit 502, a model selection unit 503, and a second input unit 504.
  • the image acquisition unit 501 is configured to acquire an image to be identified, where the image to be identified includes a face image;
  • the first input unit 502 is configured to extract a face image from the image to be identified and input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image;
  • the model selection unit 503 is configured to select, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as the second recognition model, where the candidate recognition models in the candidate recognition model set are pre-trained models for identifying different categories of faces to generate keypoint information;
  • the second input unit 504 is configured to input the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the position of the keypoints in the face image.
  • the image acquisition unit 501 of the apparatus 500 for generating information may acquire an image to be identified in a wired connection manner or a wireless connection manner.
  • the image to be identified may include a face image.
  • the face image included in the image to be identified may include an animal face image, and may also include a human face image.
  • the animal face corresponding to the animal face image may be various types of animal faces, such as a dog face, a cat face, and the like.
  • the first input unit 502 may first extract a face image from the to-be-recognized image, and then input the extracted face image into a pre-trained first recognition model to obtain Recognition result corresponding to the extracted face image.
  • the recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, and audio.
  • the recognition result can be used to characterize the category of the face corresponding to the face image.
  • the image to be recognized may include at least one face image.
  • the first input unit 502 may input the face image into a pre-trained first recognition model to obtain a recognition result corresponding to the face image.
  • the first recognition model may be used to characterize a correspondence between a face image and a recognition result corresponding to the face image.
  • the model selection unit 503 may select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
  • the candidate recognition model in the candidate recognition model set may be a pre-trained model for recognizing faces of different classes to generate key point information.
  • the candidate recognition model may be used to characterize a correspondence between a face image and keypoint information corresponding to the face image.
  • the keypoint information can be used to characterize the position of keypoints in the face image in the face image.
  • the key points of the face image may be points with obvious semantic discrimination, such as points used to characterize the nose, points used to characterize the eyes, and so on.
  • the second input unit 504 may input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
  • it may be noted that, for each face image among the extracted face images, the second input unit 504 may input the face image into the second recognition model corresponding to that face image to obtain the keypoint information corresponding to the face image. It can be understood that the correspondence between the face image and the second recognition model can be determined by the correspondence between the recognition result corresponding to the face image and the second recognition model.
  • the first input unit 502 may be further configured to: input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image And reference keypoint information, where the reference keypoint information may be used to characterize the position of the reference keypoint in the face image in the face image; and the second input unit 504 may be further configured to: compare the extracted face image with the The obtained reference keypoint information is input into the selected second recognition model, and keypoint information corresponding to the extracted face image is obtained.
  • the first input unit 502 may include a first input module (not shown in the figure) configured to input an image to be identified into a pre-trained third recognition model to obtain Position information for characterizing the position of a face image in the image to be identified in the image to be identified; an image extraction module (not shown in the figure) configured to extract a face image from the image to be identified based on the obtained position information .
  • the second input unit 504 may be further configured to: input the extracted face image into the selected second recognition model to obtain the key points corresponding to the extracted face image Information and matching information, where the matching information may include a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  • the image acquisition unit 501 may be further configured to: select an image to be identified from an image sequence corresponding to a target video, where the target video may be a video obtained by photographing a face.
  • the apparatus 500 may further include: an image selecting unit (not shown in the figure), configured to select, from the image sequence, an image located after the image to be identified and adjacent to the image to be identified as a candidate to-be-recognized image; an image determination unit (not shown in the figure), configured to extract a face image from the candidate to-be-recognized image as a candidate face image, and determine the face image extracted from the to-be-recognized image as a reference face image; a condition determination unit (not shown in the figure), configured to determine whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition; and a third input unit (not shown in the figure), configured to, in response to determining that it does, input the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain keypoint information and matching information corresponding to the extracted candidate face image.
  • in the apparatus 500 provided by the foregoing embodiment of the present application, the image acquisition unit 501 acquires an image to be identified; the first input unit 502 extracts a face image from the image to be identified and inputs the extracted face image into a pre-trained first recognition model to obtain a recognition result; the model selection unit 503 selects, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as the second recognition model; and the second input unit 504 inputs the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the position of the keypoints in the face image.
  • in this way, face images can be recognized using pre-trained candidate recognition models for identifying different categories of faces to generate keypoint information, so that different categories of face images can be identified, which improves the comprehensiveness of information generation; and because the candidate recognition model matching the category corresponding to the face image is used for recognition, the accuracy of information generation can be improved.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing an electronic device (such as the terminal device / server shown in FIG. 1) in the embodiment of the present application.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I / O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I / O interface 605 as necessary.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more functions to implement a specified logical function Executable instructions.
  • the functions noted in the blocks may also occur in a different order than those marked in the drawings. For example, two successively represented boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described unit may also be provided in a processor, for example, it may be described as: a processor includes an image acquisition unit, a first input unit, a model selection unit, and a second input unit.
  • a processor includes an image acquisition unit, a first input unit, a model selection unit, and a second input unit.
  • the names of these units do not constitute a limitation on the unit itself in some cases, for example, the image acquisition unit may also be described as a “unit for acquiring an image to be identified”.
  • the present application also provides a computer-readable medium, which may be included in the electronic device described in the foregoing embodiments; or may exist alone without being assembled into the electronic device in.
  • the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: obtains an image to be identified, where the image to be identified includes a face image; extracts a face image from the image to be identified, and inputs the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; selects, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as the second recognition model, where the candidate recognition models in the candidate recognition model set are pre-trained models for identifying different categories of faces to generate keypoint information; and inputs the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the position of the keypoints in the face image.

Abstract

Disclosed by embodiments of the present application are a method and device used for generating information. A specific embodiment of the method comprises: acquiring an image to be identified; extracting a facial image from the image to be identified, and inputting the extracted facial image into a pre-trained first identification model to obtain an identification result corresponding to the extracted facial image, wherein the identification result is used for characterizing the face type corresponding to the facial image; choosing a candidate identification model that matches the obtained identification result from a candidate identification model set to serve as a second identification model, wherein the candidate identification models in the candidate identification model set are pre-trained models used for identifying different types of faces so as to generate key point information; and inputting the extracted facial image into the second identification model to obtain the key point information corresponding to the extracted facial image, wherein the key point information is used for characterizing the position in the facial image of a key point in the facial image. This embodiment improves the comprehensiveness and accuracy of information generation.

Description

用于生成信息的方法和装置Method and device for generating information
本专利申请要求于2018年7月27日提交的、申请号为201810846313.7、申请人为北京字节跳动网络技术有限公司、发明名称为“用于生成信息的方法和装置”的中国专利申请的优先权,该申请的全文以引用的方式并入本申请中。This patent application claims the priority of a Chinese patent application filed on July 27, 2018, with application number 201810846313.7, the applicant being Beijing BYTE Network Technology Co., Ltd., and the invention name "Method and Device for Generating Information" The entire application is incorporated herein by reference.
技术领域Technical field
本申请实施例涉及计算机技术领域,尤其涉及用于生成信息的方法和装置。Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for generating information.
背景技术Background technique
脸的关键点指的是脸中具有明显语义区分度的点,例如鼻子所对应的点、眼睛所对应的点等。The key points of the face refer to the points in the face that have obvious semantic discrimination, such as the points corresponding to the nose and the points corresponding to the eyes.
目前，对于脸的关键点的检测通常指的是针对人脸关键点的检测。通过对人脸关键点进行检测，可以实现特效添加、人脸三维建模、美颜拍照等功能。At present, detection of face key points usually refers to detection of key points on human faces. By detecting the key points of a human face, functions such as adding special effects, three-dimensional face modeling, and beauty-filter photography can be realized.
发明内容Summary of the Invention
本申请实施例提出了用于生成信息的方法和装置。The embodiments of the present application provide a method and device for generating information.
第一方面，本申请实施例提供了一种用于生成信息的方法，该方法包括：获取待识别图像，其中，待识别图像包括脸图像；从待识别图像中提取脸图像，以及将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果，其中，识别结果用于表征脸图像所对应的脸的类别；从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型，其中，候选识别模型集合中的候选识别模型为预先训练的、用于识别不同类别的脸以生成关键点信息的模型；将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息，其中，关键点信息用于表征脸图像中的关键点在脸图像中的位置。In a first aspect, an embodiment of the present application provides a method for generating information. The method includes: acquiring an image to be recognized, where the image to be recognized includes a face image; extracting the face image from the image to be recognized, and inputting the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; selecting, from a set of candidate recognition models, a candidate recognition model that matches the obtained recognition result as a second recognition model, where the candidate recognition models in the set are pre-trained models for recognizing faces of different categories to generate key point information; and inputting the extracted face image into the selected second recognition model to obtain the key point information corresponding to the extracted face image, where the key point information is used to characterize the positions of key points within the face image.
在一些实施例中，将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果，包括：将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果和基准关键点信息，其中，基准关键点信息用于表征脸图像中的基准关键点在脸图像中的位置；以及将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息，包括：将所提取的脸图像和所获得的基准关键点信息输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息。In some embodiments, inputting the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image includes: inputting the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference key point information corresponding to the extracted face image, where the reference key point information is used to characterize the positions of reference key points within the face image; and inputting the extracted face image into the selected second recognition model to obtain the key point information corresponding to the extracted face image includes: inputting the extracted face image and the obtained reference key point information into the selected second recognition model to obtain the key point information corresponding to the extracted face image.
在一些实施例中，从待识别图像中提取脸图像，包括：将待识别图像输入预先训练的第三识别模型，获得用于表征待识别图像中的脸图像在待识别图像中的位置的位置信息；基于所获得的位置信息，从待识别图像中提取脸图像。In some embodiments, extracting the face image from the image to be recognized includes: inputting the image to be recognized into a pre-trained third recognition model to obtain position information used to characterize the position of the face image within the image to be recognized; and extracting the face image from the image to be recognized based on the obtained position information.
在一些实施例中，将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息，包括：将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息和匹配信息，其中，匹配信息包括用于表征所输入的脸图像所对应的脸的类别与第二识别模型所对应的脸的类别的匹配程度的匹配指数。In some embodiments, inputting the extracted face image into the selected second recognition model to obtain the key point information corresponding to the extracted face image includes: inputting the extracted face image into the selected second recognition model to obtain key point information and matching information corresponding to the extracted face image, where the matching information includes a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
在一些实施例中,获取待识别图像,包括:从目标视频所对应的图像序列中选取图像作为待识别图像,其中,目标视频为对脸进行拍摄所获得的视频。In some embodiments, obtaining the image to be identified includes: selecting an image to be identified from an image sequence corresponding to the target video, where the target video is a video obtained by photographing a face.
在一些实施例中，在将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息和匹配信息之后，该方法还包括：从图像序列中选取位于待识别图像之后且与待识别图像相邻的图像作为候选待识别图像；从候选待识别图像中提取脸图像作为候选脸图像，以及将所提取的、待识别图像中的脸图像确定为基准脸图像；确定所确定的基准脸图像所对应的匹配信息中的匹配指数是否符合预设条件；响应于确定是，将所提取的候选脸图像输入所确定的基准脸图像所输入的第二识别模型，获得所提取的候选脸图像所对应的关键点信息和匹配信息。In some embodiments, after inputting the extracted face image into the selected second recognition model to obtain the key point information and matching information corresponding to the extracted face image, the method further includes: selecting, from the image sequence, an image that follows and is adjacent to the image to be recognized as a candidate image to be recognized; extracting a face image from the candidate image to be recognized as a candidate face image, and determining the extracted face image in the image to be recognized as a reference face image; determining whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition; and in response to determining that it does, inputting the extracted candidate face image into the second recognition model to which the determined reference face image was input, to obtain key point information and matching information corresponding to the extracted candidate face image.
第二方面，本申请实施例提供了一种用于生成信息的装置，该装置包括：图像获取单元，被配置成获取待识别图像，其中，待识别图像包括脸图像；第一输入单元，被配置成从待识别图像中提取脸图像，以及将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果，其中，识别结果用于表征脸图像所对应的脸的类别；模型选取单元，被配置成从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型，其中，候选识别模型集合中的候选识别模型为预先训练的、用于识别不同类别的脸以生成关键点信息的模型；第二输入单元，被配置成将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息，其中，关键点信息用于表征脸图像中的关键点在脸图像中的位置。In a second aspect, an embodiment of the present application provides an apparatus for generating information. The apparatus includes: an image acquisition unit configured to acquire an image to be recognized, where the image to be recognized includes a face image; a first input unit configured to extract the face image from the image to be recognized, and input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; a model selection unit configured to select, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as a second recognition model, where the candidate recognition models in the set are pre-trained models for recognizing faces of different categories to generate key point information; and a second input unit configured to input the extracted face image into the selected second recognition model to obtain key point information corresponding to the extracted face image, where the key point information is used to characterize the positions of key points within the face image.
在一些实施例中，第一输入单元进一步被配置成：将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果和基准关键点信息，其中，基准关键点信息用于表征脸图像中的基准关键点在脸图像中的位置；以及第二输入单元进一步被配置成：将所提取的脸图像和所获得的基准关键点信息输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息。In some embodiments, the first input unit is further configured to: input the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference key point information corresponding to the extracted face image, where the reference key point information is used to characterize the positions of reference key points within the face image; and the second input unit is further configured to: input the extracted face image and the obtained reference key point information into the selected second recognition model to obtain the key point information corresponding to the extracted face image.
在一些实施例中，第一输入单元包括：第一输入模块，被配置成将待识别图像输入预先训练的第三识别模型，获得用于表征待识别图像中的脸图像在待识别图像中的位置的位置信息；图像提取模块，被配置成基于所获得的位置信息，从待识别图像中提取脸图像。In some embodiments, the first input unit includes: a first input module configured to input the image to be recognized into a pre-trained third recognition model to obtain position information used to characterize the position of the face image within the image to be recognized; and an image extraction module configured to extract the face image from the image to be recognized based on the obtained position information.
在一些实施例中，第二输入单元进一步被配置成：将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息和匹配信息，其中，匹配信息包括用于表征所输入的脸图像所对应的脸的类别与第二识别模型所对应的脸的类别的匹配程度的匹配指数。In some embodiments, the second input unit is further configured to input the extracted face image into the selected second recognition model to obtain key point information and matching information corresponding to the extracted face image, where the matching information includes a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
在一些实施例中,图像获取单元进一步被配置成:从目标视频所 对应的图像序列中选取图像作为待识别图像,其中,目标视频为对脸进行拍摄所获得的视频。In some embodiments, the image acquisition unit is further configured to select an image to be identified from an image sequence corresponding to the target video, where the target video is a video obtained by photographing a face.
在一些实施例中，该装置还包括：图像选取单元，被配置成从图像序列中选取位于待识别图像之后且与待识别图像相邻的图像作为候选待识别图像；图像确定单元，被配置成从候选待识别图像中提取脸图像作为候选脸图像，以及将所提取的、待识别图像中的脸图像确定为基准脸图像；条件确定单元，被配置成确定所确定的基准脸图像所对应的匹配信息中的匹配指数是否符合预设条件；第三输入单元，被配置成响应于确定是，将所提取的候选脸图像输入所确定的基准脸图像所输入的第二识别模型，获得所提取的候选脸图像所对应的关键点信息和匹配信息。In some embodiments, the apparatus further includes: an image selection unit configured to select, from the image sequence, an image that follows and is adjacent to the image to be recognized as a candidate image to be recognized; an image determination unit configured to extract a face image from the candidate image to be recognized as a candidate face image, and determine the extracted face image in the image to be recognized as a reference face image; a condition determination unit configured to determine whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition; and a third input unit configured to, in response to determining that it does, input the extracted candidate face image into the second recognition model to which the determined reference face image was input, to obtain key point information and matching information corresponding to the extracted candidate face image.
第三方面，本申请实施例提供了一种电子设备，包括：一个或多个处理器；存储装置，其上存储有一个或多个程序，当一个或多个程序被一个或多个处理器执行，使得一个或多个处理器实现上述用于生成信息的方法中任一实施例的方法。In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs thereon, which, when executed by the one or more processors, cause the one or more processors to implement the method of any of the above embodiments of the method for generating information.
第四方面,本申请实施例提供了一种计算机可读介质,其上存储有计算机程序,该程序被处理器执行时实现上述用于生成信息的方法中任一实施例的方法。In a fourth aspect, an embodiment of the present application provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method of any one of the foregoing methods for generating information.
本申请实施例提供的用于生成信息的方法和装置，通过获取待识别图像，然后从待识别图像中提取脸图像，以及将所提取的脸图像输入预先训练的第一识别模型，获得所提取的脸图像所对应的识别结果，其中，识别结果用于表征脸图像所对应的脸的类别，接着从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型，最后将所提取的脸图像输入所选取的第二识别模型，获得所提取的脸图像所对应的关键点信息，其中，关键点信息用于表征脸图像中的关键点在脸图像中的位置，从而可以利用预先训练的、用于识别不同类别的脸的候选识别模型对脸图像进行识别，以生成关键点信息，进而可以识别不同类别的脸图像，提高了信息生成的全面性，并且，利用与脸图像所对应的类别相匹配的候选识别模型进行识别，可以提供信息生成的准确性。The method and apparatus for generating information provided by the embodiments of the present application acquire an image to be recognized, extract a face image from it, and input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; then select, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as a second recognition model; and finally input the extracted face image into the selected second recognition model to obtain key point information corresponding to the extracted face image, where the key point information is used to characterize the positions of key points within the face image. Face images can thus be recognized with pre-trained candidate recognition models for different categories of faces to generate key point information, so faces of different categories can be handled, improving the comprehensiveness of information generation; moreover, performing recognition with a candidate recognition model matching the category of the face image improves the accuracy of information generation.
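The flow summarized above (extract a face image, obtain its category from the first recognition model, pick the matching candidate model as the second recognition model, then generate key point information) can be sketched in a few lines. All model objects, category labels, and key point values below are illustrative stand-ins, not part of the disclosure:

```python
def generate_keypoint_info(image, extract_face, first_model, candidate_models):
    """Minimal sketch of the disclosed pipeline; every callable is a stand-in."""
    face = extract_face(image)                 # extract the face image
    category = first_model(face)               # recognition result: face category
    second_model = candidate_models[category]  # select the matching candidate model
    return second_model(face)                  # key point information

# Toy stand-ins that only illustrate the control flow:
extract = lambda img: img["face"]
classify = lambda face: "cat_face" if face.startswith("cat") else "human_face"
models = {
    "cat_face": lambda face: [("left_eye", 10, 12), ("nose", 20, 25)],
    "human_face": lambda face: [("left_eye", 30, 40), ("nose", 50, 55)],
}
result = generate_keypoint_info({"face": "cat_01"}, extract, classify, models)
print(result)  # → [('left_eye', 10, 12), ('nose', 20, 25)]
```

The design point of the pipeline is that the category-specific second model, rather than one universal model, produces the key points, which is what the disclosure credits for the accuracy gain.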
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
通过阅读参照以下附图所作的对非限制性实施例所作的详细描述,本申请的其它特征、目的和优点将会变得更明显:Other features, objects, and advantages of the present application will become more apparent by reading the detailed description of the non-limiting embodiments with reference to the following drawings:
图1是本申请的一个实施例可以应用于其中的示例性系统架构图;FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied; FIG.
图2是根据本申请的用于生成信息的方法的一个实施例的流程图;2 is a flowchart of an embodiment of a method for generating information according to the present application;
图3是根据本申请实施例的用于生成信息的方法的一个应用场景的示意图;3 is a schematic diagram of an application scenario of a method for generating information according to an embodiment of the present application;
图4是根据本申请的用于生成信息的方法的又一个实施例的流程图;4 is a flowchart of still another embodiment of a method for generating information according to the present application;
图5是根据本申请的用于生成信息的装置的一个实施例的结构示意图;5 is a schematic structural diagram of an embodiment of an apparatus for generating information according to the present application;
图6是适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
具体实施方式Detailed Description
下面结合附图和实施例对本申请作进一步的详细说明。可以理解的是,此处所描述的具体实施例仅仅用于解释相关发明,而非对该发明的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与有关发明相关的部分。The following describes the present application in detail with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described herein are only used to explain the related invention, rather than limiting the invention. It should also be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.
需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本申请。It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The application will be described in detail below with reference to the drawings and embodiments.
图1示出了可以应用本申请的用于生成信息的方法或用于生成信息的装置的实施例的示例性系统架构100。FIG. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for generating information or an apparatus for generating information to which the present application can be applied.
如图1所示,系统架构100可以包括终端设备101、102、103,网络104和服务器105。网络104用以在终端设备101、102、103和服务器105之间提供通信链路的介质。网络104可以包括各种连接类 型,例如有线、无线通信链路或者光纤电缆等等。As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various types of connections, such as wired, wireless communication links, or fiber optic cables, and so on.
用户可以使用终端设备101、102、103通过网络104与服务器105交互,以接收或发送消息等。终端设备101、102、103上可以安装有各种通讯客户端应用,例如图像处理类应用、美图软件、网页浏览器应用、搜索类应用、社交平台软件等。The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as image processing applications, Meitu software, web browser applications, search applications, and social platform software.
终端设备101、102、103可以是硬件，也可以是软件。当终端设备101、102、103为硬件时，可以是具有显示屏的各种电子设备，包括但不限于智能手机、平板电脑、电子书阅读器、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、膝上型便携计算机和台式计算机等等。当终端设备101、102、103为软件时，可以安装在上述所列举的电子设备中。其可以实现成多个软件或软件模块(例如用来提供分布式服务的多个软件或软件模块)，也可以实现成单个软件或软件模块。在此不做具体限定。The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and so on. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
服务器105可以是提供各种服务的服务器,例如对终端设备101、102、103发送的待识别图像进行处理的图像处理服务器。图像处理服务器可以对接收到的待识别图像等数据进行分析等处理,并获得处理结果(例如关键点信息)。The server 105 may be a server that provides various services, for example, an image processing server that processes an image to be identified sent by the terminal devices 101, 102, and 103. The image processing server may perform analysis and processing on the received data such as the image to be identified, and obtain a processing result (for example, key point information).
需要说明的是,服务器可以是硬件,也可以是软件。当服务器为硬件时,可以实现成多个服务器组成的分布式服务器集群,也可以实现成单个服务器。当服务器为软件时,可以实现成多个软件或软件模块(例如用来提供分布式服务的多个软件或软件模块),也可以实现成单个软件或软件模块。在此不做具体限定。It should be noted that the server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server. When the server is software, it can be implemented as multiple software or software modules (for example, multiple software or software modules used to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
应该理解,图1中的终端设备、网络和服务器的数目仅仅是示意性的。根据实现需要,可以具有任意数目的终端设备、网络和服务器。在待识别图像或者生成关键点信息的过程中所使用的数据不需要从远程获取的情况下,上述系统架构可以不包括网络,而只包括终端设备或服务器。It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers. In the case that the data used in the process of identifying an image or generating keypoint information does not need to be obtained from a remote place, the above-mentioned system architecture may not include a network, but only a terminal device or a server.
继续参考图2,示出了根据本申请的用于生成信息的方法的一个实施例的流程200。该用于生成信息的方法,包括以下步骤:With continued reference to FIG. 2, a flowchart 200 of one embodiment of a method for generating information according to the present application is shown. The method for generating information includes the following steps:
步骤201,获取待识别图像。Step 201: Obtain an image to be identified.
在本实施例中,用于生成信息的方法的执行主体(例如图1所示的服务器)可以通过有线连接方式或者无线连接方式获取待识别图像。其中,待识别图像可以包括脸图像。具体的,待识别图像所包括的脸图像可以包括动物脸图像,也可以包括人脸图像。动物脸图像所对应的动物脸可以为各种类别的动物脸,例如狗脸、猫脸等。In this embodiment, an execution subject (for example, a server shown in FIG. 1) of the method for generating information may obtain an image to be identified through a wired connection method or a wireless connection method. The image to be identified may include a face image. Specifically, the face image included in the image to be identified may include an animal face image, and may also include a human face image. The animal face corresponding to the animal face image may be various types of animal faces, such as a dog face, a cat face, and the like.
需要说明的是,上述执行主体可以获取本地预先存储的待识别图像,也可以获取与之通信连接的电子设备(例如图1所示的终端设备)发送的待识别图像。It should be noted that the above-mentioned execution subject may obtain the image to be identified stored in advance locally, or may obtain the image to be identified sent by an electronic device (such as the terminal device shown in FIG. 1) communicatively connected thereto.
在本实施例的一些可选的实现方式中,上述执行主体可以从目标视频所对应的图像序列中选取图像作为待识别图像,其中,目标视频为可以为对脸进行拍摄所获得的视频。具体的,上述执行主体可以首先从本地或与之通信连接的电子设备获取目标视频所对应的图像序列,然后从图像序列中选取图像作为待识别图像。在这里,需要说明的是,上述执行主体可以采用各种方式从上述图像序列中选取待识别图像,例如,可以采用随机选取的方式;或者,可以从图像序列中选取排序在第一位的图像作为待识别图像。另外,可以理解的是,视频实质上是一个按照时间的先后顺序排列的图像序列,故任意一个视频都可以对应一个图像序列。In some optional implementation manners of this embodiment, the execution subject may select an image to be identified from an image sequence corresponding to a target video, where the target video is a video that can be obtained by shooting a face. Specifically, the execution subject may first obtain an image sequence corresponding to the target video from a local or electronic device connected thereto, and then select an image from the image sequence as an image to be identified. Here, it should be noted that the above-mentioned executing subject may select the image to be identified from the above-mentioned image sequence in various ways, for example, a random selection manner may be adopted; or, the first-ranked image may be selected from the image sequence. As the image to be identified. In addition, it can be understood that a video is essentially an image sequence arranged in chronological order, so any video can correspond to an image sequence.
步骤202,从待识别图像中提取脸图像,以及将所提取的脸图像输入预先训练的第一识别模型,获得所提取的脸图像所对应的识别结果。In step 202, a face image is extracted from the image to be recognized, and the extracted face image is input to a first trained first recognition model to obtain a recognition result corresponding to the extracted face image.
在本实施例中,基于步骤201中得到的待识别图像,上述执行主体可以首先从待识别图像中提取脸图像,而后将所提取的脸图像输入预先训练的第一识别模型,获得所提取的脸图像所对应的识别结果。其中,识别结果可以包括但不限于以下至少一项:文字、数字、符号、图像、音频,识别结果可以用于表征脸图像所对应的脸的类别。In this embodiment, based on the to-be-recognized image obtained in step 201, the execution subject may first extract a face image from the to-be-recognized image, and then input the extracted face image into a pre-trained first recognition model to obtain the extracted The recognition result corresponding to the face image. The recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, and audio. The recognition result may be used to characterize the category of the face corresponding to the face image.
具体的,上述执行主体可以通过各种方式从待识别图像中提取脸 图像。例如,可以采用图像分割技术中的阈值分割方法将待识别图像中的脸图像与其他图像区域分割开来,进而从待识别图像中提取出脸图像。需要说明的是,图像分割技术是目前广泛研究和应用的公知技术,此处不再赘述。Specifically, the execution subject may extract a face image from the image to be identified in various ways. For example, a threshold segmentation method in an image segmentation technique may be used to segment a face image in an image to be identified from other image regions, and then extract a face image from the image to be identified. It should be noted that the image segmentation technology is a well-known technology that is widely studied and applied at present, and will not be repeated here.
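As a rough illustration of the threshold-segmentation idea mentioned above, the toy sketch below treats pixels brighter than a fixed threshold as the face region and crops its bounding box. Plain Python lists stand in for a grayscale image; a real system would use an image-segmentation library and far more robust logic:

```python
def extract_by_threshold(gray, thresh):
    """Return the bounding-box crop of all pixels brighter than `thresh`.

    A toy stand-in for threshold segmentation: pixels above the threshold
    are treated as the face region, everything else as background.
    """
    coords = [(y, x) for y, row in enumerate(gray)
              for x, v in enumerate(row) if v > thresh]
    if not coords:
        return []
    top = min(y for y, _ in coords)
    bottom = max(y for y, _ in coords) + 1
    left = min(x for _, x in coords)
    right = max(x for _, x in coords) + 1
    return [row[left:right] for row in gray[top:bottom]]

img = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
print(extract_by_threshold(img, 5))  # → [[9, 8], [7, 9]]
```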
在本实施例的一些可选的实现方式中,上述执行主体也可以通过如下步骤从待识别图像中提取脸图像:In some optional implementations of this embodiment, the above-mentioned execution subject may also extract a face image from the image to be identified by the following steps:
步骤2021,将待识别图像输入预先训练的第三识别模型,获得用于表征待识别图像中的脸图像在待识别图像中的位置的位置信息。Step 2021: Input the to-be-recognized image into a pre-trained third recognition model, and obtain position information for characterizing a position of a face image in the to-be-recognized image in the to-be-recognized image.
其中,位置信息可以包括但不限于以下至少一项:文字、数字、符号、图像。作为示例,位置信息可以为在待识别图像中框选出脸图像的四边形图像。The location information may include, but is not limited to, at least one of the following: text, numbers, symbols, and images. As an example, the position information may be a quadrilateral image in which a face image is frame-selected in the image to be identified.
第三识别模型可以用于表征包括脸图像的待识别图像与用于表征脸图像在待识别图像中的位置的位置信息的对应关系。具体的,第三识别模型可以为基于训练样本,利用机器学习方法,对初始模型(例如卷积神经网络(Convolutional Neural Network,CNN)、残差网络(ResNet)等)进行训练后得到的模型。The third recognition model may be used to characterize the correspondence between the image to be identified including the face image and the position information used to characterize the position of the face image in the image to be identified. Specifically, the third recognition model may be a model obtained by training an initial model (such as a Convolutional Neural Network (CNN), a residual network (ResNet), etc.) based on training samples and using a machine learning method.
作为示例，上述第三识别模型可以通过如下步骤训练得到：首先，获取训练样本集，其中，训练样本可以包括包括样本脸图像的样本待识别图像，以及针对样本待识别图像中的样本脸图像预先标注的样本位置信息，其中，样本位置信息可以用于表征样本待识别图像中的样本脸图像在样本待识别图像中的位置。然后，可以从训练样本集中选取训练样本，并执行以下训练步骤：将所选取的训练样本的样本待识别图像输入初始模型，获得样本待识别图像中的样本脸图像所对应的位置信息；将所输入的样本待识别图像所对应的样本位置信息作为初始模型的期望输出，基于所获得的位置信息和样本位置信息，调整初始模型的参数；确定训练样本集中是否存在未被选取的训练样本；响应于不存在未被选取的训练样本，将调整后的初始模型确定为第三识别模型。As an example, the third recognition model may be trained as follows. First, a training sample set is obtained, where each training sample may include a sample image to be recognized that contains a sample face image, and sample position information pre-labeled for the sample face image in that image, the sample position information being used to characterize the position of the sample face image within the sample image to be recognized. Then, a training sample may be selected from the training sample set, and the following training steps performed: inputting the sample image to be recognized of the selected training sample into the initial model to obtain position information corresponding to the sample face image in it; taking the sample position information corresponding to the input image as the expected output of the initial model, and adjusting the parameters of the initial model based on the obtained position information and the sample position information; determining whether unselected training samples remain in the training sample set; and in response to determining that none remain, determining the adjusted initial model as the third recognition model.
在该示例中，还可以包括以下步骤：响应于确定存在未被选取的训练样本，从未被选取的训练样本中重新选取训练样本，以及将最近一次调整的初始模型作为新的初始模型，继续执行上述训练步骤。In this example, the following steps may also be included: in response to determining that unselected training samples remain, reselecting a training sample from the unselected training samples, taking the most recently adjusted initial model as a new initial model, and continuing to perform the above training steps.
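The training loop described in the example above (draw samples without replacement, compare the model output against the labeled sample position information, adjust parameters, repeat until no unselected samples remain) can be sketched as follows. The scalar "model" and the half-step update rule are illustrative assumptions only; a real implementation would run gradient updates on a neural network:

```python
import random

def train_third_model(params, predict, update, training_set, seed=0):
    """Sketch of the described loop: pop training samples at random until
    none remain unselected, adjusting the parameters after each sample.
    `predict` and `update` are hypothetical stand-ins for the initial
    model's forward pass and its parameter-adjustment step."""
    rng = random.Random(seed)
    remaining = list(training_set)
    while remaining:  # unselected training samples still exist
        sample_image, sample_position = remaining.pop(rng.randrange(len(remaining)))
        predicted = predict(params, sample_image)
        update(params, predicted, sample_position)
    return params  # the adjusted model becomes the third recognition model

# Toy model: predicts a constant position; update nudges it halfway to the label.
params = {"pos": 0.0}
predict = lambda p, img: p["pos"]
update = lambda p, pred, label: p.__setitem__("pos", p["pos"] + 0.5 * (label - pred))
trained = train_third_model(params, predict, update, [("a", 10.0), ("b", 10.0)])
print(trained["pos"])  # → 7.5
```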
需要说明的是,实践中,用于生成模型的步骤的执行主体可以与用于生成信息的方法的执行主体相同或者不同。如果相同,则用于生成模型的步骤的执行主体可以在训练得到模型后将训练好的模型存储在本地。如果不同,则用于生成模型的步骤的执行主体可以在训练得到模型后将训练好的模型发送给用于生成信息的方法的执行主体。It should be noted that, in practice, the execution subject of the steps used to generate the model may be the same as or different from the execution subject of the method used to generate the information. If they are the same, the execution subject of the step for generating the model can store the trained model locally after the model is trained. If they are different, the execution body of the step for generating the model may send the trained model to the execution body of the method for generating information after training to obtain the model.
步骤2022,基于所获得的位置信息,从待识别图像中提取脸图像。In step 2022, a face image is extracted from the image to be identified based on the obtained position information.
在这里,基于所获得的位置信息,上述执行主体可以采用各种方式从待识别图像中提取脸图像。例如,可以基于所获得的位置信息,对待识别图像进行裁剪,获得脸图像。Here, based on the obtained position information, the above-mentioned executing subject may extract a face image from the image to be identified in various ways. For example, a to-be-recognized image may be cropped based on the obtained position information to obtain a face image.
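Assuming the position information is encoded as an axis-aligned box `(top, left, bottom, right)` (this tuple layout is an assumption, not specified by the application), cropping the face image out of the image to be recognized reduces to simple slicing:

```python
def crop_face(image, box):
    # box = (top, left, bottom, right): an assumed encoding of the
    # position information produced by the third recognition model
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(crop_face(frame, (0, 1, 2, 3)))  # → [[2, 3], [5, 6]]
```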
对于提取出的脸图像，上述执行主体可以生成脸图像所对应的识别结果。需要说明的是，在本实施例中，待识别图像可以包括至少一个脸图像，对于至少一个脸图像中的脸图像，上述执行主体可以将该脸图像输入预先训练的第一识别模型，获得该脸图像所对应的识别结果。其中，第一识别模型可以用于表征脸图像与脸图像所对应的识别结果的对应关系。For the extracted face image, the execution subject may generate a recognition result corresponding to it. It should be noted that, in this embodiment, the image to be recognized may include at least one face image, and for each face image among them, the execution subject may input that face image into the pre-trained first recognition model to obtain the corresponding recognition result, where the first recognition model may be used to characterize the correspondence between a face image and its recognition result.
具体的,第一识别模型可以为基于训练样本,利用机器学习方法,对初始模型(例如卷积神经网络、残差网络等)进行训练后得到的模型。Specifically, the first recognition model may be a model obtained by training an initial model (such as a convolutional neural network, a residual network, and the like) based on training samples and using a machine learning method.
可以理解的是，用于训练获得上述第一识别模型的执行主体可以采用与上述第三识别模型的训练方式相似的训练方式训练获得上述第一识别模型，具体的训练步骤此处不再赘述。需要特别说明的是，对于第一识别模型的训练，其所对应的训练样本集中的训练样本可以包括样本脸图像和针对样本脸图像预先标注的样本识别结果，其中，样本识别结果可以用于表征样本脸图像所对应的脸的类别。It can be understood that the execution subject that trains the first recognition model may train it in a manner similar to that of the third recognition model, so the specific training steps are not repeated here. It should be noted in particular that, for the training of the first recognition model, each training sample in the corresponding training sample set may include a sample face image and a sample recognition result pre-labeled for it, where the sample recognition result may be used to characterize the category of the face corresponding to the sample face image.
步骤203,从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型。Step 203: Select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model.
在本实施例中,基于步骤202中得到的识别结果,上述执行主体可以从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型。其中,候选识别模型集合中的候选识别模 型可以为预先训练的、用于识别不同类别的脸以生成关键点信息的模型。In this embodiment, based on the recognition result obtained in step 202, the execution entity may select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model. Among them, the candidate recognition models in the candidate recognition model set may be pre-trained models for identifying different classes of faces to generate key point information.
具体的,上述执行主体可以采用各种方法从候选识别模型集合中选取与所获得的识别结果相匹配的候选识别模型作为第二识别模型。例如,技术人员可以在上述执行主体中预先设置识别结果与候选识别模型集合中的候选识别模型的对应关系(例如对应关系表),进而,上述执行主体可以利用所获得的识别结果,查找上述对应关系,以确定出与所获得的识别结果相匹配的候选识别模型作为第二识别模型。Specifically, the execution body may use various methods to select a candidate recognition model that matches the obtained recognition result from the candidate recognition model set as the second recognition model. For example, a technician may preset a correspondence relationship between the recognition result and the candidate recognition model in the candidate recognition model set (for example, a correspondence table) in the execution body, and further, the execution body may use the obtained recognition result to find the correspondence. Relationship to determine a candidate recognition model that matches the obtained recognition result as a second recognition model.
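The correspondence-table approach described above can be as simple as a dictionary keyed by the recognition result; the category labels and model stand-ins below are hypothetical:

```python
# Hypothetical category labels; each value stands in for a pre-trained
# candidate recognition model specialized for that face category.
candidate_model_set = {
    "human_face": lambda face: "human keypoints",
    "cat_face": lambda face: "cat keypoints",
    "dog_face": lambda face: "dog keypoints",
}

def select_second_model(recognition_result):
    """Look up the candidate model matching the recognition result."""
    return candidate_model_set[recognition_result]

second_model = select_second_model("cat_face")
print(second_model("some face image"))  # → cat keypoints
```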
Alternatively, for each candidate recognition model in the candidate recognition model set, a technician may preset category information corresponding to that candidate recognition model, where the category information may be used to characterize the category of face that the candidate recognition model can recognize. The category information may include, but is not limited to, at least one of the following: numbers, text, symbols, pictures. The execution body may then match the obtained recognition result against the category information corresponding to the candidate recognition models (for example, by performing a similarity calculation), and determine a candidate recognition model that matches the obtained recognition result (for example, one whose similarity score is greater than or equal to a preset threshold) as the second recognition model.
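The similarity-based alternative might be sketched as follows. The use of `difflib` string similarity and the 0.8 threshold are assumptions made for illustration; the patent leaves the similarity calculation unspecified.

```python
from difflib import SequenceMatcher

def select_by_similarity(recognition_result, candidates, threshold=0.8):
    # candidates: iterable of (category_info, model) pairs preset by a
    # technician. Return the model whose category information is most
    # similar to the recognition result, provided the similarity score
    # is greater than or equal to the preset threshold; otherwise None.
    best_model, best_score = None, threshold
    for category_info, model in candidates:
        score = SequenceMatcher(None, recognition_result, category_info).ratio()
        if score >= best_score:
            best_model, best_score = model, score
    return best_model
```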
In this embodiment, a candidate recognition model may be used to characterize the correspondence between a face image and the keypoint information corresponding to the face image. The keypoint information may be used to characterize the positions of the keypoints of the face image within the face image. The keypoints of a face image may be points with clear semantic distinctiveness, for example, points characterizing the nose or points characterizing the eyes.
Specifically, a candidate recognition model may be a model obtained by training an initial model (for example, a convolutional neural network or a residual network) on training samples using a machine learning method.
It can be understood that the execution body used to train the candidate recognition models may obtain them using a training method similar to that of the third recognition model described above; the specific training steps are not repeated here. It should be particularly noted that, for the training of each candidate recognition model in the candidate recognition model set, the training samples in the corresponding training sample set may include sample face images and sample keypoint information pre-labeled for the sample face images, where the categories of the faces corresponding to the sample face images may be the same (for example, all cat faces, or all dog faces). The sample keypoint information may be used to characterize the positions of the keypoints in the sample face images.
It should also be noted that, based on step 202, the execution body may extract at least one face image, and for each extracted face image the execution body may obtain one recognition result; further, for each obtained recognition result, based on step 203, the execution body may select one candidate recognition model as the second recognition model corresponding to that recognition result.
Step 204: input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
In this embodiment, based on the second recognition model selected in step 203, the execution body may input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image. It should be noted that, for each extracted face image, the execution body may input that face image into the second recognition model corresponding to it to obtain the keypoint information corresponding to that face image. It can be understood that the correspondence between a face image and a second recognition model may be determined by the correspondence between the recognition result of the face image and the second recognition model.
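Steps 202–204 taken together can be sketched as the following per-face pipeline. The `extract_faces` helper and the model callables are hypothetical stand-ins for the trained models described above, not part of the disclosure.

```python
def generate_keypoint_info(image, extract_faces, first_model, candidate_models):
    # For each face image extracted from the image to be recognized:
    # classify it with the first recognition model, select the matching
    # second recognition model, and run that model to get keypoint info.
    results = []
    for face_image in extract_faces(image):
        recognition_result = first_model(face_image)          # step 202
        second_model = candidate_models[recognition_result]   # step 203
        results.append(second_model(face_image))              # step 204
    return results
```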
In practice, after generating the keypoint information corresponding to the face image, the execution body may further determine, by back-projection, the keypoint information corresponding to the image to be recognized, where the keypoint information corresponding to the image to be recognized may be used to characterize the positions, in the image to be recognized, of its keypoints (that is, the keypoints included in the face image within the image to be recognized).
In some optional implementations of this embodiment, a candidate recognition model (second recognition model) may be used to characterize the correspondence between a face image and both the keypoint information and matching information corresponding to the face image, where the matching information may include, but is not limited to, at least one of the following: numbers, text, symbols, images, audio. Specifically, the matching information may include a matching index used to characterize the degree of matching between the category of the face corresponding to the input face image and the category of face corresponding to the second recognition model.
Here, the magnitude of the matching index may correspond to the degree of matching. Specifically, the correspondence may be that the larger the matching index, the higher the degree of matching; or that the smaller the matching index, the higher the degree of matching.
It should be noted that, for a candidate recognition model (second recognition model) in this implementation, the training samples in the corresponding training sample set may include sample face images together with sample keypoint information and sample matching information pre-labeled for the sample face images, where the sample keypoint information may be used to characterize the positions of the keypoints in the sample face images. The sample matching information may include a sample matching index, which may be used to characterize the degree of matching between the category of the face corresponding to the input sample face image and the category of face predetermined for the candidate recognition model. Here, the correspondence between the sample matching index and the degree of matching may be preset by a technician; for example, it may be set so that the larger the sample matching index, the higher the degree of matching.
In this implementation, based on the second recognition model, the execution body may input the extracted face image into the second recognition model to obtain the keypoint information and matching information corresponding to the extracted face image. The second recognition model can thus determine the degree of matching between the input face image and the model and generate matching information, so that subsequent operations (for example, re-selecting the second recognition model) can be performed based on the matching information, further improving the accuracy of information processing.
In some optional implementations of this embodiment, when the image to be recognized is an image selected from the image sequence corresponding to a target video, after inputting the extracted face image into the selected second recognition model and obtaining the keypoint information and matching information corresponding to the extracted face image, the execution body may further perform the following information generation steps:
First, select, from the image sequence, the image that follows and is adjacent to the image to be recognized as a candidate image to be recognized.
Then, extract a face image from the candidate image to be recognized as a candidate face image, and determine the extracted face image in the image to be recognized as a reference face image.
Here, the execution body may extract a face image from the candidate image to be recognized as the candidate face image using the face image extraction method described above for the image to be recognized, which is not repeated here.
Next, determine whether the matching index in the matching information corresponding to the determined reference face image meets a preset condition.
The preset condition may be used to constrain the degree of matching between the category of the face corresponding to the reference face image and the category of face corresponding to the second recognition model into which the reference face image was input. Specifically, a technician may preset a matching threshold. When the correspondence between the matching index and the degree of matching is that a larger matching index means a higher degree of matching, the preset condition may be that the matching index is greater than or equal to the matching threshold; when the correspondence is that a smaller matching index means a higher degree of matching, the preset condition may be that the matching index is less than or equal to the matching threshold.
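The two forms of the preset condition can be expressed directly; the parameter names below are assumptions introduced for illustration.

```python
def meets_preset_condition(matching_index, matching_threshold, larger_means_better=True):
    # When a larger matching index means a higher degree of matching,
    # require the index to be greater than or equal to the threshold;
    # otherwise require it to be less than or equal to the threshold.
    if larger_means_better:
        return matching_index >= matching_threshold
    return matching_index <= matching_threshold
```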
Finally, in response to determining that the matching index in the matching information corresponding to the determined reference face image meets the preset condition, input the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain the keypoint information and matching information corresponding to the extracted candidate face image.
It can be understood that, for any image in the image sequence that follows the image to be recognized, the matching information and second recognition model corresponding to the preceding image can be used, with the method described in this implementation, to generate the keypoint information and matching information corresponding to the face image in that image; for the specific steps, refer to the information generation steps above, which are not repeated here.
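The frame-to-frame propagation above can be sketched as follows. This is a hypothetical sketch assuming each second model returns a (keypoints, matching_index) pair, a larger index means a better match, and the helper callables stand in for the trained models.

```python
def process_sequence(frames, extract_face, first_model, candidate_models, threshold):
    # Reuse the previous frame's second recognition model while its
    # matching index still meets the preset condition; otherwise
    # re-select a model via the first recognition model.
    outputs, second_model = [], None
    for frame in frames:
        face_image = extract_face(frame)
        if second_model is None:
            second_model = candidate_models[first_model(face_image)]
        keypoints, matching_index = second_model(face_image)
        outputs.append(keypoints)
        if matching_index < threshold:
            second_model = None  # force re-selection on the next frame
    return outputs
```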
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for generating information according to this embodiment. In the application scenario of FIG. 3, the server 301 first obtains an image to be recognized 303 sent by the terminal device 302, where the image to be recognized 303 includes face images 3031 and 3032. Then, the server 301 extracts the face image 3031 and the face image 3032 from the image to be recognized 303, and inputs each of them into the pre-trained first recognition model 304, obtaining the recognition result "cat" 3051 corresponding to the face image 3031 and the recognition result "dog" 3052 corresponding to the face image 3032. Next, the server 301 may obtain a candidate recognition model set 306, which includes candidate recognition models 3061, 3062, and 3063. A technician has preset the correspondence between the candidate recognition models and recognition results as follows: the candidate recognition model 3061 corresponds to the recognition result "cat"; the candidate recognition model 3062 corresponds to the recognition result "dog"; the candidate recognition model 3063 corresponds to the recognition result "human". The server 301 may then select, from the candidate recognition model set 306, the candidate recognition model 3061 that matches the obtained recognition result "cat" 3051 as the second recognition model 3071 corresponding to the face image 3031, and the candidate recognition model 3062 that matches the obtained recognition result "dog" 3052 as the second recognition model 3072 corresponding to the face image 3032. Finally, the server 301 may input the face image 3031 into the second recognition model 3071 to obtain the keypoint information 3081 corresponding to the face image 3031, and input the face image 3032 into the second recognition model 3072 to obtain the keypoint information 3082 corresponding to the face image 3032, where the keypoint information may be used to characterize the positions of the keypoints in the face images.
The method provided by the above embodiments of the present application obtains an image to be recognized; extracts a face image from the image to be recognized and inputs the extracted face image into a pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image; selects, from a candidate recognition model set, a candidate recognition model that matches the obtained recognition result as a second recognition model; and finally inputs the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the positions of the keypoints in the face image. Face images can thus be recognized by pre-trained candidate recognition models for recognizing faces of different categories to generate keypoint information, so that face images of different categories can be recognized, improving the comprehensiveness of information generation; moreover, performing recognition with a candidate recognition model that matches the category of the face image improves the accuracy of information generation.
With further reference to FIG. 4, a flow 400 of yet another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: obtain an image to be recognized.
In this embodiment, the execution body of the method for generating information (for example, the server shown in FIG. 1) may obtain the image to be recognized through a wired or wireless connection. The image to be recognized may include a face image. Specifically, the face image included in the image to be recognized may include an animal face image or a human face image. The animal face corresponding to an animal face image may be of various categories, for example, a dog face or a cat face.
It should be noted that step 401 may be implemented in a manner similar to step 201 in the foregoing embodiment. Accordingly, the description of step 201 above also applies to step 401 of this embodiment and is not repeated here.
Step 402: extract a face image from the image to be recognized, and input the extracted face image into a pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image.
In this embodiment, based on the image to be recognized obtained in step 401, the execution body may first extract a face image from the image to be recognized, and then input the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image. The recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, audio. The recognition result may be used to characterize the category of the face corresponding to the face image. The reference keypoint information may include, but is not limited to, at least one of the following: text, numbers, symbols, images. The reference keypoint information may be used to characterize the positions of the reference keypoints in the face image. A reference keypoint may be a point used to determine the keypoints in the face image, for example, the point at the tip of the nose or a point at a corner of the mouth.
In this embodiment, the execution body may extract the face image from the image to be recognized using the face image extraction method of the embodiment corresponding to FIG. 2, which is not repeated here.
It should be noted that the first recognition model in this embodiment may be used to characterize the correspondence between a face image and both the recognition result and the reference keypoint information corresponding to the face image. Accordingly, for the training of the first recognition model in this embodiment, the training samples in the corresponding training sample set may include sample face images together with sample recognition results and sample reference keypoint information pre-labeled for the sample face images, where the sample reference keypoint information may be used to characterize the positions of the reference keypoints in the sample face images.
Step 403: select, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as a second recognition model.
In this embodiment, based on the recognition result obtained in step 402, the execution body may select, from the candidate recognition model set, a candidate recognition model that matches the obtained recognition result as the second recognition model.
The candidate recognition models in the candidate recognition model set may be pre-trained models for recognizing faces of different categories to generate keypoint information. A candidate recognition model may be used to characterize the correspondence between a face image together with its reference keypoint information and the keypoint information of the face image. Specifically, a candidate recognition model may be a model obtained by training an initial model (for example, a convolutional neural network or a residual network) on training samples using a machine learning method.
As an example, a candidate recognition model may be trained as follows. First, obtain a training sample set, where a training sample may include a sample face image and sample keypoint information pre-labeled for the sample face image. Then, select a training sample from the training sample set and perform the following model training steps: extract the sample reference keypoint information corresponding to the sample face image of the selected training sample, where the sample reference keypoint information may be used to characterize the positions of the reference keypoints in the sample face image; input the sample face image of the selected training sample and the extracted sample reference keypoint information into the initial model to obtain keypoint information; take the sample keypoint information corresponding to the input sample face image as the desired output of the initial model, and adjust the parameters of the initial model based on the obtained keypoint information and the sample keypoint information; determine whether any unselected training samples remain in the training sample set; and, in response to there being no unselected training samples, determine the adjusted initial model as the candidate recognition model.
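The model training steps above can be sketched with a deliberately toy model. The `ToyKeypointModel` (a single scalar weight trained by gradient steps on squared error) is purely an illustration of "adjust the parameters based on the obtained and sample keypoint information"; the patent's initial model would be a convolutional or residual network.

```python
class ToyKeypointModel:
    # Minimal stand-in for the initial model: predicts one scalar
    # "keypoint position" as w * reference_keypoint.
    def __init__(self, w=0.0, lr=0.1):
        self.w, self.lr = w, lr

    def forward(self, reference):
        return self.w * reference

    def adjust(self, reference, target):
        # One gradient step on squared error between the obtained
        # keypoint and the labelled sample keypoint.
        pred = self.forward(reference)
        self.w -= self.lr * 2 * (pred - target) * reference


def train_candidate_model(model, samples, extract_reference, epochs=200):
    # samples: list of (sample_face_image, sample_keypoint) pairs; the
    # reference keypoint info is extracted from the face image, as in
    # the model training steps described in the text.
    for _ in range(epochs):
        for face_image, sample_keypoint in samples:
            reference = extract_reference(face_image)
            model.adjust(reference, sample_keypoint)
    return model  # the adjusted model becomes the candidate model
```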
It should be noted here that the execution body performing the above model training steps may extract the sample reference keypoint information corresponding to a sample face image in various ways. For example, the sample face image may be input into the first recognition model of step 402 of this embodiment to obtain the sample reference keypoint information corresponding to the sample face image; alternatively, the sample face image may be output, and the sample reference keypoint information labeled by a user for the sample face image may be obtained.
In this example, the following step may also be included: in response to determining that unselected training samples remain, reselect a training sample from the unselected training samples, take the most recently adjusted initial model as the new initial model, and continue performing the above model training steps.
In addition, it should be noted that the selection of the second recognition model in this embodiment may be implemented in a manner similar to the selection method of the second recognition model in the embodiment corresponding to FIG. 2, which is not repeated here.
Step 404: input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
In this embodiment, based on the second recognition model selected in step 403, the execution body may input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image. It should be noted that, for each extracted face image, the execution body may input that face image and the reference keypoint information corresponding to it into the second recognition model corresponding to that face image to obtain the corresponding keypoint information. It can be understood that the correspondence between a face image and a second recognition model may be determined by the correspondence between the recognition result of the face image and the second recognition model.
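The flow of FIG. 4 differs from FIG. 2 in that the second model receives two inputs. A minimal sketch, with all callables as hypothetical stand-ins for the trained models:

```python
def generate_keypoints_with_reference(face_image, first_model, candidate_models):
    # The first recognition model now yields both the recognition result
    # and the reference keypoint information (step 402); the selected
    # second model takes the face image together with that reference
    # information (step 404).
    recognition_result, reference_keypoints = first_model(face_image)
    second_model = candidate_models[recognition_result]   # step 403
    return second_model(face_image, reference_keypoints)
```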
As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the method for generating information in this embodiment highlights the steps of generating the reference keypoint information corresponding to the face image and of generating the keypoint information corresponding to the face image based on that reference keypoint information. The solution described in this embodiment can therefore use the reference keypoint information as a reference to generate more accurate keypoint information, further improving the accuracy of information generation.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 5, the apparatus 500 for generating information in this embodiment includes an image acquisition unit 501, a first input unit 502, a model selection unit 503, and a second input unit 504. The image acquisition unit 501 is configured to obtain an image to be recognized, where the image to be recognized includes a face image. The first input unit 502 is configured to extract the face image from the image to be recognized and input the extracted face image into a pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image, where the recognition result is used to characterize the category of the face corresponding to the face image. The model selection unit 503 is configured to select, from a candidate recognition model set, a candidate recognition model that matches the obtained recognition result as a second recognition model, where the candidate recognition models in the candidate recognition model set are pre-trained models for recognizing faces of different categories to generate keypoint information. The second input unit 504 is configured to input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image, where the keypoint information is used to characterize the positions of the keypoints in the face image.
In this embodiment, the image acquisition unit 501 of the apparatus 500 for generating information may obtain the image to be recognized through a wired or wireless connection. The image to be recognized may include a face image. Specifically, the face image included in the image to be recognized may include an animal face image or a human face image. The animal face corresponding to an animal face image may be of various categories, for example, a dog face or a cat face.
In this embodiment, based on the image to be recognized obtained by the image acquisition unit 501, the first input unit 502 may first extract the face image from the image to be recognized, and then input the extracted face image into the pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image. The recognition result may include, but is not limited to, at least one of the following: text, numbers, symbols, images, audio. The recognition result may be used to characterize the category of the face corresponding to the face image.
It should be noted that, in this embodiment, the image to be recognized may include at least one face image. For each face image among the at least one face image, the first input unit 502 may input that face image into the pre-trained first recognition model to obtain the recognition result corresponding to that face image, where the first recognition model may be used to characterize the correspondence between a face image and the recognition result corresponding to the face image.
In this embodiment, based on the recognition result obtained by the first input unit 502, the model selection unit 503 may select, from a set of candidate recognition models, a candidate recognition model that matches the obtained recognition result as the second recognition model. The candidate recognition models in the set may be pre-trained models for recognizing faces of different categories to generate keypoint information.
In this embodiment, a candidate recognition model may be used to characterize the correspondence between a face image and the keypoint information corresponding to the face image. The keypoint information may be used to characterize the positions of the keypoints of the face image within the face image. The keypoints of a face image may be points with clear semantic distinctiveness, such as points characterizing the nose or points characterizing the eyes.
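The selection step performed by the model selection unit 503 can be sketched as a lookup from the recognition result to a category-specific keypoint model. The per-category stub models and the coordinate values below are hypothetical; a real candidate model would be a trained network.

```python
# Sketch of the candidate recognition model set: one pre-trained keypoint
# model per face category. The stubs return fixed keypoint information
# (pixel positions within the face image) purely for illustration.

def dog_keypoint_model(face_image):
    return {"nose": (52, 80), "left_eye": (30, 40), "right_eye": (74, 40)}

def cat_keypoint_model(face_image):
    return {"nose": (50, 70), "left_eye": (28, 38), "right_eye": (72, 38)}

CANDIDATE_MODELS = {"dog": dog_keypoint_model, "cat": cat_keypoint_model}

def select_second_model(recognition_result):
    """Pick the candidate model matching the obtained recognition result."""
    return CANDIDATE_MODELS[recognition_result]

second_model = select_second_model("dog")
print(second_model({})["nose"])
```

Keeping one specialized model per category is what lets each second model focus on the keypoint layout of a single kind of face (dog, cat, human, and so on).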
In this embodiment, based on the second recognition model selected by the model selection unit 503, the second input unit 504 may input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image. It should be noted that, for each extracted face image, the second input unit 504 may input that face image into the second recognition model corresponding to that face image to obtain the keypoint information corresponding to that face image. It can be understood that the correspondence between a face image and a second recognition model may be determined by the correspondence between the recognition result of the face image and the second recognition model.
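Putting units 502 through 504 together, the overall inference path can be sketched as follows. Every function here is a hypothetical stub, since the patent specifies the data flow between the units but not the models themselves.

```python
# End-to-end sketch: extract the face image, let the first model produce
# a recognition result, select the matching second model, and obtain the
# keypoint information from it.

def extract_face_image(image):
    return image["face_region"]            # stand-in for real cropping

def first_recognition_model(face_image):
    return face_image["category_hint"]     # stand-in for a classifier

CANDIDATE_MODELS = {
    "cat": lambda face: {"nose": (50, 70)},
    "dog": lambda face: {"nose": (52, 80)},
}

def generate_keypoint_info(image):
    face = extract_face_image(image)
    category = first_recognition_model(face)    # recognition result
    second_model = CANDIDATE_MODELS[category]   # model selection
    return second_model(face)                   # keypoint information

print(generate_keypoint_info({"face_region": {"category_hint": "cat"}}))
```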
In some optional implementations of this embodiment, the first input unit 502 may be further configured to input the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image, where the reference keypoint information may be used to characterize the positions of the reference keypoints of the face image within the face image; and the second input unit 504 may be further configured to input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
In some optional implementations of this embodiment, the first input unit 502 may include: a first input module (not shown), configured to input the image to be recognized into a pre-trained third recognition model to obtain position information characterizing the position of the face image within the image to be recognized; and an image extraction module (not shown), configured to extract the face image from the image to be recognized based on the obtained position information.
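This optional extraction path can be sketched as a detector that returns the position information as a bounding box, followed by a crop. Nested lists stand in for image data, and the fixed box returned by the stub detector is invented for the example.

```python
# Hypothetical third recognition model plus image extraction module.

def third_recognition_model(image):
    """Return position information (x, y, width, height) of the face."""
    return (1, 1, 2, 2)   # a trained detector would compute this box

def extract_face_image(image, position):
    """Crop the face region described by `position` out of `image`."""
    x, y, w, h = position
    return [row[x:x + w] for row in image[y:y + h]]

image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
face = extract_face_image(image, third_recognition_model(image))
print(face)
```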
In some optional implementations of this embodiment, the second input unit 504 may be further configured to input the extracted face image into the selected second recognition model to obtain the keypoint information and matching information corresponding to the extracted face image, where the matching information may include a matching index characterizing the degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
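One way to read this is that the second recognition model returns a (keypoint information, matching information) pair. The matching index below is a made-up confidence value in [0, 1]; the patent does not fix its representation, and the stub model is hypothetical.

```python
# Sketch of a category-specific second recognition model that also emits
# matching information alongside the keypoint information.

def dog_second_model(face_image):
    keypoint_info = {"nose": (52, 80), "left_eye": (30, 40)}
    # Matching index: degree of match between the input face's category
    # and this model's category ("dog"). Values are illustrative.
    match_index = 0.92 if face_image["category"] == "dog" else 0.15
    return keypoint_info, {"match_index": match_index}

keypoints, matching_info = dog_second_model({"category": "dog"})
print(matching_info["match_index"])
```

A low matching index signals that the face fed to this model probably belongs to a different category, which is exactly the signal the tracking optimization below relies on.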
In some optional implementations of this embodiment, the image acquisition unit 501 may be further configured to select an image from the image sequence corresponding to a target video as the image to be recognized, where the target video may be a video obtained by filming a face.
In some optional implementations of this embodiment, the apparatus 500 may further include: an image selection unit (not shown), configured to select, from the image sequence, the image that follows and is adjacent to the image to be recognized as a candidate image to be recognized; an image determination unit (not shown), configured to extract a face image from the candidate image to be recognized as a candidate face image, and to determine the face image extracted from the image to be recognized as a reference face image; a condition determination unit (not shown), configured to determine whether the matching index in the matching information corresponding to the determined reference face image satisfies a preset condition; and a third input unit (not shown), configured to, in response to determining that it does, input the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain the keypoint information and matching information corresponding to the extracted candidate face image.
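The frame-to-frame shortcut described above can be sketched as a loop over adjacent video frames: when the reference face's matching index meets the preset condition, the next frame's face is fed straight to the same second model, skipping re-classification. The threshold value and the stub model are invented for the example.

```python
# Sketch of reusing a second recognition model across adjacent frames.

MATCH_THRESHOLD = 0.8  # hypothetical preset condition

def second_model(face_image):
    # Returns (keypoint_info, match_index); the index is stubbed from the
    # input so the control flow below is observable.
    return {"nose": (50, 70)}, face_image["match"]

def count_model_reuses(frames):
    """Count adjacent frames processed without re-running the first model."""
    reused = 0
    for reference, candidate in zip(frames, frames[1:]):
        _, match_index = second_model(reference)
        if match_index >= MATCH_THRESHOLD:
            second_model(candidate)   # reuse the same second model
            reused += 1
        # else: fall back to classify-then-select for `candidate` (omitted)
    return reused

frames = [{"match": 0.9}, {"match": 0.85}, {"match": 0.4}, {"match": 0.9}]
print(count_model_reuses(frames))
```

Skipping the first model on high-confidence frames is a latency optimization: in a video of a single face, most frames can reuse the already-selected category model.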
It can be understood that the units recorded in the apparatus 500 correspond to the respective steps of the method described with reference to FIG. 2. Accordingly, the operations, features, and beneficial effects described above for the method also apply to the apparatus 500 and the units included therein, and are not repeated here.
In the apparatus 500 provided by the above embodiment of the present application, the image acquisition unit 501 acquires an image to be recognized; the first input unit 502 then extracts a face image from the image to be recognized and inputs the extracted face image into a pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image, where the recognition result characterizes the category of the face corresponding to the face image; the model selection unit 503 then selects, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as the second recognition model; and finally the second input unit 504 inputs the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image, where the keypoint information characterizes the positions of the keypoints of the face image within the face image. In this way, face images can be recognized using pre-trained candidate recognition models for faces of different categories to generate keypoint information, so that faces of different categories can be recognized, improving the comprehensiveness of information generation; moreover, performing recognition with the candidate recognition model that matches the category of the face image improves the accuracy of the generated information.
Reference is now made to FIG. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing an electronic device (for example, the terminal device/server shown in FIG. 1) of the embodiments of the present application. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code carried on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an image acquisition unit, a first input unit, a model selection unit, and a second input unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the image acquisition unit may also be described as "a unit for acquiring an image to be recognized".
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an image to be recognized, where the image to be recognized includes a face image; extract a face image from the image to be recognized, and input the extracted face image into a pre-trained first recognition model to obtain the recognition result corresponding to the extracted face image, where the recognition result characterizes the category of the face corresponding to the face image; select, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as a second recognition model, where the candidate recognition models in the set are pre-trained models for recognizing faces of different categories to generate keypoint information; and input the extracted face image into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image, where the keypoint information characterizes the positions of the keypoints of the face image within the face image.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (14)

  1. A method for generating information, comprising:
    acquiring an image to be recognized, wherein the image to be recognized includes a face image;
    extracting a face image from the image to be recognized, and inputting the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, wherein the recognition result is used to characterize a category of a face corresponding to the face image;
    selecting, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as a second recognition model, wherein the candidate recognition models in the set are pre-trained models for recognizing faces of different categories to generate keypoint information;
    inputting the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image, wherein the keypoint information is used to characterize positions of keypoints of the face image within the face image.
  2. The method according to claim 1, wherein the inputting the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image comprises:
    inputting the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image, wherein the reference keypoint information is used to characterize positions of reference keypoints of the face image within the face image; and
    the inputting the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image comprises:
    inputting the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
  3. The method according to claim 1 or 2, wherein the extracting a face image from the image to be recognized comprises:
    inputting the image to be recognized into a pre-trained third recognition model to obtain position information characterizing a position of the face image within the image to be recognized;
    extracting the face image from the image to be recognized based on the obtained position information.
  4. The method according to claim 3, wherein the inputting the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image comprises:
    inputting the extracted face image into the selected second recognition model to obtain the keypoint information and matching information corresponding to the extracted face image, wherein the matching information includes a matching index characterizing a degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  5. The method according to claim 4, wherein the acquiring an image to be recognized comprises:
    selecting an image from an image sequence corresponding to a target video as the image to be recognized, wherein the target video is a video obtained by filming a face.
  6. The method according to claim 5, wherein after the inputting the extracted face image into the selected second recognition model to obtain the keypoint information and matching information corresponding to the extracted face image, the method further comprises:
    selecting, from the image sequence, an image that follows and is adjacent to the image to be recognized as a candidate image to be recognized;
    extracting a face image from the candidate image to be recognized as a candidate face image, and determining the face image extracted from the image to be recognized as a reference face image;
    determining whether the matching index in the matching information corresponding to the determined reference face image satisfies a preset condition;
    in response to determining that it does, inputting the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain keypoint information and matching information corresponding to the extracted candidate face image.
  7. An apparatus for generating information, comprising:
    an image acquisition unit configured to acquire an image to be recognized, wherein the image to be recognized includes a face image;
    a first input unit configured to extract a face image from the image to be recognized, and to input the extracted face image into a pre-trained first recognition model to obtain a recognition result corresponding to the extracted face image, wherein the recognition result is used to characterize a category of a face corresponding to the face image;
    a model selection unit configured to select, from a set of candidate recognition models, a candidate recognition model matching the obtained recognition result as a second recognition model, wherein the candidate recognition models in the set are pre-trained models for recognizing faces of different categories to generate keypoint information;
    a second input unit configured to input the extracted face image into the selected second recognition model to obtain keypoint information corresponding to the extracted face image, wherein the keypoint information is used to characterize positions of keypoints of the face image within the face image.
  8. The apparatus according to claim 7, wherein the first input unit is further configured to:
    input the extracted face image into the pre-trained first recognition model to obtain the recognition result and reference keypoint information corresponding to the extracted face image, wherein the reference keypoint information is used to characterize positions of reference keypoints of the face image within the face image; and
    the second input unit is further configured to:
    input the extracted face image and the obtained reference keypoint information into the selected second recognition model to obtain the keypoint information corresponding to the extracted face image.
  9. The apparatus according to claim 7 or 8, wherein the first input unit comprises:
    a first input module configured to input the image to be recognized into a pre-trained third recognition model to obtain position information characterizing a position of the face image within the image to be recognized;
    an image extraction module configured to extract the face image from the image to be recognized based on the obtained position information.
  10. The apparatus according to claim 9, wherein the second input unit is further configured to:
    input the extracted face image into the selected second recognition model to obtain the keypoint information and matching information corresponding to the extracted face image, wherein the matching information includes a matching index characterizing a degree of matching between the category of the face corresponding to the input face image and the category of the face corresponding to the second recognition model.
  11. The apparatus according to claim 10, wherein the image acquisition unit is further configured to:
    select an image from an image sequence corresponding to a target video as the image to be recognized, wherein the target video is a video obtained by filming a face.
  12. The apparatus according to claim 11, wherein the apparatus further comprises:
    an image selection unit configured to select, from the image sequence, an image that follows and is adjacent to the image to be recognized as a candidate image to be recognized;
    an image determination unit configured to extract a face image from the candidate image to be recognized as a candidate face image, and to determine the face image extracted from the image to be recognized as a reference face image;
    a condition determination unit configured to determine whether the matching index in the matching information corresponding to the determined reference face image satisfies a preset condition;
    a third input unit configured to, in response to determining that it does, input the extracted candidate face image into the second recognition model into which the determined reference face image was input, to obtain keypoint information and matching information corresponding to the extracted candidate face image.
  13. An electronic device, comprising:
    one or more processors;
    a storage device on which one or more programs are stored,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
  14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
PCT/CN2018/116182 2018-07-27 2018-11-19 Method and device used for generating information WO2020019591A1 (en)

Applications Claiming Priority (2)

CN201810846313.7A (CN109034069B), priority date 2018-07-27, filing date 2018-07-27: Method and apparatus for generating information
CN201810846313.7, priority date 2018-07-27

Publications (1)

Publication Number Publication Date
WO2020019591A1 2020-01-30

Family

ID=64647253


Country Status (2)

Country Link
CN (1) CN109034069B (en)
WO (1) WO2020019591A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740567A (en) * 2019-01-18 2019-05-10 北京旷视科技有限公司 Key point location model training method, localization method, device and equipment
CN109919244B (en) * 2019-03-18 2021-09-07 北京字节跳动网络技术有限公司 Method and apparatus for generating a scene recognition model
CN110347134A (en) * 2019-07-29 2019-10-18 南京图玩智能科技有限公司 A kind of AI intelligence aquaculture specimen discerning method and cultivating system
CN110688894A (en) * 2019-08-22 2020-01-14 平安科技(深圳)有限公司 Palm key point extraction method and device
CN115240230A (en) * 2022-09-19 2022-10-25 星宠王国(北京)科技有限公司 Canine face detection model training method and device, and detection method and device

Citations (5)

Publication number Priority date Publication date Assignee Title
US20070183665A1 (en) * 2006-02-06 2007-08-09 Mayumi Yuasa Face feature point detecting device and method
CN104715227A (en) * 2013-12-13 2015-06-17 北京三星通信技术研究有限公司 Method and device for locating key points of human face
CN105512627A (en) * 2015-12-03 2016-04-20 腾讯科技(深圳)有限公司 Key point positioning method and terminal
CN105760836A (en) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method based on deep learning and system thereof and photographing terminal
CN106295591A (en) * 2016-08-17 2017-01-04 乐视控股(北京)有限公司 Gender identification method based on facial image and device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN1266642C (en) * 2003-10-09 2006-07-26 重庆大学 Multi-category-based human face classifying and identifying method
WO2008139093A2 (en) * 2007-04-06 2008-11-20 France Telecom Determination of a model of image category
CN107103269A (en) * 2016-02-23 2017-08-29 芋头科技(杭州)有限公司 One kind expression feedback method and intelligent robot
CN107491726B (en) * 2017-07-04 2020-08-04 重庆邮电大学 Real-time expression recognition method based on multichannel parallel convolutional neural network
CN108197644A (en) * 2017-12-27 2018-06-22 深圳市大熊动漫文化有限公司 A kind of image-recognizing method and device

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111461352A (en) * 2020-04-17 2020-07-28 支付宝(杭州)信息技术有限公司 Model training method, business node identification method, device and electronic equipment
CN111461352B (en) * 2020-04-17 2023-05-09 蚂蚁胜信(上海)信息技术有限公司 Model training method, service node identification device and electronic equipment
CN112241709A (en) * 2020-10-21 2021-01-19 北京字跳网络技术有限公司 Image processing method, and training method and device of beard transformation network
CN112926479A (en) * 2021-03-08 2021-06-08 新疆爱华盈通信息技术有限公司 Cat face identification method and system, electronic device and storage medium
CN113221767A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Method for training living body face recognition model and method for recognizing living body face and related device
CN113221767B (en) * 2021-05-18 2023-08-04 北京百度网讯科技有限公司 Method for training living body face recognition model and recognizing living body face and related device
CN113808044A (en) * 2021-09-17 2021-12-17 北京百度网讯科技有限公司 Encryption mask determining method, device, equipment and storage medium
CN113808044B (en) * 2021-09-17 2022-11-01 北京百度网讯科技有限公司 Encryption mask determining method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109034069B (en) 2021-04-09
CN109034069A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
WO2020019591A1 (en) Method and device used for generating information
CN109726624B (en) Identity authentication method, terminal device and computer readable storage medium
CN107492379B (en) Voiceprint creating and registering method and device
KR102394756B1 (en) Method and apparatus for processing video
WO2020006961A1 (en) Image extraction method and device
CN108509915B (en) Method and device for generating face recognition model
WO2019242222A1 (en) Method and device for use in generating information
WO2020024484A1 (en) Method and device for outputting data
WO2020000876A1 (en) Model generating method and device
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
WO2020029466A1 (en) Image processing method and apparatus
US11126827B2 (en) Method and system for image identification
CN108549848B (en) Method and apparatus for outputting information
WO2020006964A1 (en) Image detection method and device
CN110363220B (en) Behavior class detection method and device, electronic equipment and computer readable medium
JP7394809B2 (en) Methods, devices, electronic devices, media and computer programs for processing video
WO2021083069A1 (en) Method and device for training face swapping model
WO2020029608A1 (en) Method and apparatus for detecting burr of electrode sheet
CN109582825B (en) Method and apparatus for generating information
KR20200109239A (en) Image processing method, device, server and storage medium
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
CN109947971B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN112507090A (en) Method, apparatus, device and storage medium for outputting information
CN108399401B (en) Method and device for detecting face image
WO2020007191A1 (en) Method and apparatus for living body recognition and detection, and medium and electronic device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18927263

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 18.05.2021)

122 Ep: PCT application non-entry in European phase

Ref document number: 18927263

Country of ref document: EP

Kind code of ref document: A1