CN110956129A - Method, apparatus, device and medium for generating face feature vector - Google Patents

Method, apparatus, device and medium for generating face feature vector

Info

Publication number
CN110956129A
Authority
CN
China
Prior art keywords
feature vector
face
face feature
sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911190173.3A
Other languages
Chinese (zh)
Inventor
周学武
康珮珮
张韵东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co Ltd
Original Assignee
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co Ltd
Priority to CN201911190173.3A
Publication of CN110956129A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose methods, apparatuses, devices, and media for generating face feature vectors. One embodiment of the method comprises: performing feature extraction on a face image to obtain a face feature vector corresponding to the face image; and inputting the face feature vector into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector. This embodiment makes the output produced for a face image consistent across face recognition networks. At the same time, the pre-trained first face recognition network makes face feature vectors output by different face recognition networks compatible with one another. Computational effort and time are thus saved, the user experience is improved, and convenience is brought to people's lives.

Description

Method, apparatus, device and medium for generating face feature vector
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for generating a face feature vector.
Background
Face recognition is a biometric technique that identifies a person based on facial feature information. Practical face recognition applications often involve a large face library, and face features must be extracted for every face in the library to build a face feature library for subsequent face verification. Because feature vectors produced by different algorithms, or by different versions of the same algorithm, lie in different feature spaces and cannot be compared directly, a user who wants to switch to a face recognition algorithm from a different manufacturer, or to update the face recognition model version from the same manufacturer, faces the problem that the existing face feature vectors are incompatible.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose methods, apparatuses, devices, and media for generating face feature vectors to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method for generating a face feature vector, the method including: performing feature extraction on a face image to obtain a face feature vector corresponding to the face image; inputting the face feature vector into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector, wherein the first face recognition network is obtained by training through the following steps: acquiring a training sample set, wherein each training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector; and taking the sample face feature vectors of the training samples in the training sample set as input, taking the sample first face feature vectors corresponding to the input sample face feature vectors as expected output, and training to obtain the first face recognition network.
In some embodiments, the above method further comprises: performing feature extraction on the face image to obtain a second face feature vector corresponding to the face image; inputting the second face feature vector into a pre-trained second face recognition network to obtain a third face feature vector corresponding to the second face feature vector; comparing the similarity between the first face feature vector and the third face feature vector to obtain a comparison value; determining whether the value is greater than a preset threshold; and in response to determining that the value is greater than the preset threshold, determining that the face of the face image corresponding to the first face feature vector is the same as the face of the face image corresponding to the third face feature vector.
In some embodiments, the training sample is obtained by: acquiring a first face image from the face data set, wherein the face data set is used for storing the face image and a corresponding face feature vector; inputting the first face image into a second face recognition network and a third face recognition network respectively to obtain a fourth face feature vector and a fifth face feature vector respectively; processing the fourth face feature vector and the fifth face feature vector to obtain a sixth face feature vector; and inputting the sixth face feature vector into a pre-trained fusion network to obtain a fusion face feature vector, taking the fourth face feature vector as the sample face feature vector, and taking the sixth face feature vector as a sample first face feature vector corresponding to the sample face feature vector.
In some embodiments, the architecture of the above fusion network is an encoder-decoder architecture.
In some embodiments, the fusion network is obtained by the following steps: acquiring a training sample set, wherein each training sample comprises a sample sixth face feature vector and the sample face feature vector of the face image corresponding to the sample sixth face feature vector in the face data set; and taking the sample sixth face feature vectors of the training samples in the training sample set as input, taking the sample face feature vectors of the face images corresponding to the input sample sixth face feature vectors in the face data set as expected output, and training to obtain the fusion network.
In some embodiments, the training with the sample sixth face feature vector of the training samples in the training sample set as input and the sample face feature vector of the face image corresponding to the input sample sixth face feature vector in the face data set as expected output to obtain the fusion network includes: selecting a training sample from the training sample set, and executing the following training steps: inputting the sample sixth face feature vector of the selected training sample into an initial fusion network to obtain a face feature vector corresponding to the sample sixth face feature vector; analyzing the face feature vector and the corresponding sample face feature vector to determine a first loss value; comparing the first loss value with a preset threshold; determining whether training of the initial fusion network is finished according to the comparison result; and in response to determining that training of the initial fusion network is complete, determining the initial fusion network as the fusion network.
In some embodiments, the above method further comprises: in response to determining that training of the initial fusion network is not complete, adjusting relevant parameters in the initial fusion network, reselecting a training sample from the training sample set, and continuing to execute the training steps using the adjusted initial fusion network as the initial fusion network.
In some embodiments, the above method further comprises: matching the first face feature vector against a pre-stored face trajectory set to obtain a matching value; determining whether the matching value is greater than a preset value; in response to determining that the matching value is greater than the preset value, generating information indicating that the face of the face image corresponding to the first face feature vector is consistent with the face corresponding to the matched face trajectory; and outputting, according to the information, a control signal for controlling a target device to perform a target operation.
In a second aspect, some embodiments of the present disclosure provide an apparatus for generating a face feature vector, the apparatus comprising: an extraction unit configured to perform feature extraction on a face image to obtain a face feature vector corresponding to the face image; and a generating unit configured to input the face feature vector into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector, wherein the first face recognition network is trained by the following steps: acquiring a training sample set, wherein each training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector; and taking the sample face feature vectors of the training samples in the training sample set as input, taking the sample first face feature vectors corresponding to the input sample face feature vectors as expected output, and training to obtain the first face recognition network.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as in any one of the first aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method as in any one of the first aspect.
One of the above embodiments of the present disclosure has the following beneficial effects: by performing feature extraction on a face image, the face feature vector corresponding to the face image can be obtained. The face feature vector is then input into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector. In this way, the pre-trained first face recognition network makes the output produced for a face image consistent across face recognition networks, and makes face feature vectors output by different face recognition networks compatible with one another. The face image can therefore be recognized quickly in a face recognition network, saving computational effort and time. Furthermore, the user experience is improved, and convenience is brought to people's lives.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of a method for generating face feature vectors according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a method for generating face feature vectors according to the present disclosure;
FIG. 3 is a flow diagram of further embodiments of a method for generating face feature vectors according to the present disclosure;
FIG. 4 is a schematic block diagram of some embodiments of an apparatus for generating face feature vectors according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1A-1B are schematic diagrams of an application scenario of a method for generating face feature vectors according to some embodiments of the present disclosure.
As shown in fig. 1A, the executing body may be a server (e.g., the server 101 shown in fig. 1). When the executing body 101 receives the face image 102, it performs feature extraction on the face image 102 to obtain the face feature vector 1021 corresponding to the face image 102.
As shown in fig. 1B, the face feature vector 1021 is input into the pre-trained first face recognition network to obtain the first face feature vector 1022 corresponding to the face feature vector 1021. The first face feature vector 1022 may be obtained by processing the face feature vector 1021 through the first face recognition network. The first face recognition network may be obtained by training on a training sample set. Each training sample in the training sample set may include a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector. The sample face feature vectors of the training samples are taken as input, the sample first face feature vectors corresponding to the input sample face feature vectors are taken as expected output, and the first face recognition network is obtained through training.
It is understood that the method for generating a face feature vector may be executed by the server 101 or by a terminal device; the executing body of the method may also be a device formed by integrating the server 101 and a terminal device through a network, or the method may be executed by various software programs. The executing body may likewise be embodied as hardware or as software. When the executing body is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When the executing body is software, it may be installed in the electronic devices listed above, and may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or a single software module. No specific limitation is made here.
It should be understood that the number of servers in fig. 1 is merely illustrative. There may be any number of servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a method for generating face feature vectors according to the present disclosure is shown. The method for generating the face feature vector comprises the following steps:
step 201, extracting the features of the face image to obtain a face feature vector corresponding to the face image.
In some embodiments, the executing body of the method for generating a face feature vector (e.g., the server 101 shown in fig. 1) may receive the face image from the user's terminal through a wired or wireless connection. The face image may also be selected from a locally stored face image set, or downloaded from a network. The executing body may input the face image into a second face recognition network to obtain the face feature vector corresponding to the face image. The face feature vector may be a vector for characterizing the facial features of the face image. The second face recognition network may be configured to represent the correspondence between a face image and the face feature vector corresponding to that face image. As an example, the electronic device may count a large number of face images together with their corresponding face feature vectors, generate a correspondence table recording these pairs, and use the correspondence table as the second face recognition network.
It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (Ultra WideBand) connection, and other wireless connections now known or developed in the future.
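To make step 201 concrete, the following is a minimal sketch of a second face recognition network viewed as an image-to-vector embedding model. It is written in PyTorch purely for illustration; the tiny convolutional backbone, the 512-dimensional output, and the 112×112 input size are all assumptions, since the disclosure does not fix a concrete network topology.

```python
import torch
import torch.nn as nn

class SecondFaceRecognitionNetwork(nn.Module):
    """Stand-in for the network of step 201: maps an aligned face image
    to a 512-d face feature vector. Any CNN embedding model fills this
    role; the layers below are deliberately minimal."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.backbone(image)

second_net = SecondFaceRecognitionNetwork()
face_image = torch.randn(1, 3, 112, 112)      # placeholder face crop
face_feature_vector = second_net(face_image)  # shape (1, 512)
```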
Step 202, inputting the face feature vector into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector.
In some embodiments, based on the face feature vector obtained in step 201, the executing body may input the face feature vector into the pre-trained first face recognition network, and the first face recognition network analyzes the face feature vector to obtain the first face feature vector corresponding to the face feature vector. The first face recognition network may be configured to represent the correspondence between a face feature vector and the first face feature vector corresponding to that face feature vector.
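The first face recognition network is thus a vector-to-vector mapper. The sketch below shows one possible such mapper; PyTorch and the 512-dimensional width are assumptions rather than anything prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class FirstFaceRecognitionNetwork(nn.Module):
    """Vector-to-vector mapper sketch: takes the face feature vector
    emitted by one vendor's network and produces the first face feature
    vector in the common, compatible space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mapper = nn.Sequential(
            nn.Linear(dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, dim),
        )

    def forward(self, face_feature_vector: torch.Tensor) -> torch.Tensor:
        return self.mapper(face_feature_vector)

# Usage: embeddings from different vendor networks, once mapped, can be
# compared with one another directly.
net = FirstFaceRecognitionNetwork()
face_feature_vector = torch.randn(1, 512)             # stand-in embedding
first_face_feature_vector = net(face_feature_vector)  # shape (1, 512)
```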
In some embodiments, the first face recognition network may be trained by:
Step 1, a training sample set is obtained.
In some embodiments, the executing body that trains the first face recognition network may be the same as the executing body of the method (e.g., the server 101 shown in fig. 1) or may be a different device. The executing body may obtain training samples from a user terminal through a wired or wireless connection, or may obtain existing training samples stored on a server, so as to obtain the training sample set. Each training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector.
In some optional implementations of some embodiments, the training samples may be obtained as follows. First, a first face image is acquired from a face data set, where the face data set is used to store face images and their corresponding face feature vectors; the face data set may be created by the user or downloaded over a network. Then, the first face image is input into a second face recognition network and a third face recognition network, respectively, to obtain a fourth face feature vector and a fifth face feature vector; for example, the fourth and fifth face feature vectors may each be 512-dimensional. Next, the fourth face feature vector and the fifth face feature vector are processed to obtain a sixth face feature vector, for example a 1024-dimensional vector; the processing may include combination and concatenation. Finally, the sixth face feature vector is input into a pre-trained fusion network to obtain a fused face feature vector, the fourth face feature vector is taken as the sample face feature vector, and the sixth face feature vector is taken as the sample first face feature vector corresponding to the sample face feature vector. The fused face feature vector may be a 512-dimensional vector.
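The sample-construction step can be sketched as follows. Here second_net, third_net, and fusion_net are placeholders for the three pretrained networks named above, and the 512/1024 dimensions follow the example in the text.

```python
import torch

def build_training_sample(face_image, second_net, third_net, fusion_net):
    """Sketch of the training-sample construction described above."""
    fourth = second_net(face_image)             # 512-d face feature vector
    fifth = third_net(face_image)               # 512-d face feature vector
    sixth = torch.cat([fourth, fifth], dim=-1)  # 1024-d concatenation
    fused = fusion_net(sixth)                   # 512-d fused face feature vector
    # Mirroring the description: the fused vector is produced here, but the
    # stored pair is (fourth, sixth), i.e. the sample face feature vector
    # and its sample first face feature vector.
    return fourth, sixth
```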
In some optional implementations of some embodiments, the architecture of the fusion network described above is an encoder-decoder architecture. As an example, a commonly used structure of this kind is the sparse autoencoder (Sparse Autoencoder), which can automatically learn features from unlabeled data and can give a better feature description than the raw data. Optionally, the executing body that trains the fusion network may be the server 101 shown in fig. 1. The fusion network can be obtained through the following steps (a code sketch of the network and its training loop follows the steps below):
Step 11, acquiring a training sample set, wherein each training sample comprises a sample sixth face feature vector and the sample face feature vector of the face image corresponding to the sample sixth face feature vector in the face data set;
in some embodiments, the specific implementation of step 11 may refer to training step 1 of the first face recognition network, which is not described herein again.
Step 12, taking the sample sixth face feature vectors of the training samples in the training sample set as input, taking the sample face feature vectors of the face images corresponding to the input sample sixth face feature vectors in the face data set as expected output, and training to obtain the fusion network.
Optionally, the fusion network may be obtained by performing the following training steps based on the training sample set: firstly, inputting the sample sixth face feature vector of a selected training sample into an initial fusion network to obtain a face feature vector corresponding to the sample sixth face feature vector; then, analyzing the face feature vector and the corresponding sample face feature vector through a loss function to determine a first loss value, where the loss function may be a mean squared error loss function; then, comparing the first loss value with a preset threshold to obtain a comparison result, where the preset threshold may be set manually; then, determining whether training of the initial fusion network is finished according to the comparison result; then, in response to determining that training of the initial fusion network is complete, determining the initial fusion network as the fusion network; and finally, in response to determining that training of the initial fusion network is not complete, adjusting relevant parameters in the initial fusion network, reselecting a training sample from the training sample set, and continuing to execute the training steps using the adjusted initial fusion network as the initial fusion network.
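Below is the promised sketch of an encoder-decoder fusion network together with the loss-threshold training loop just described. PyTorch, the hidden width of 256, and the concrete threshold are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn

class FusionNetwork(nn.Module):
    """Encoder-decoder sketch: compresses the 1024-d sixth face feature
    vector and decodes it back to a 512-d face feature vector."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(1024, 256), nn.ReLU())
        self.decoder = nn.Linear(256, 512)

    def forward(self, sixth_vec):
        return self.decoder(self.encoder(sixth_vec))

def train_fusion(samples, threshold=1e-3, lr=1e-3, max_steps=10_000):
    """Loss-threshold loop mirroring the steps above: compute a mean
    squared error loss, compare it with the preset threshold, stop when
    the loss falls below it, otherwise adjust the parameters and
    reselect a training sample."""
    net = FusionNetwork()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(max_steps):
        sixth_vec, target_vec = random.choice(samples)  # reselect a sample
        loss = loss_fn(net(sixth_vec), target_vec)      # first loss value
        if loss.item() < threshold:                     # compare with threshold
            break                                       # training is finished
        optimizer.zero_grad()
        loss.backward()                                 # adjust relevant parameters
        optimizer.step()
    return net
```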
Step 2, taking the sample face feature vectors of the training samples in the training sample set as input, taking the sample first face feature vectors corresponding to the input sample face feature vectors as expected output, and training to obtain the first face recognition network.
In some embodiments, the following training steps are performed based on the training sample set: inputting the sample face feature vectors of at least one training sample in the training sample set into an initial machine learning model to obtain a predicted first face feature vector for each sample face feature vector; comparing the predicted first face feature vector for each sample face feature vector with the corresponding sample first face feature vector; determining the prediction accuracy of the initial machine learning model according to the comparison results; determining whether the prediction accuracy is greater than a preset accuracy threshold; in response to determining that the accuracy is greater than the preset accuracy threshold, taking the initial machine learning model as the trained first face recognition network; and in response to determining that the accuracy is not greater than the preset accuracy threshold, adjusting the parameters of the initial machine learning model, forming a training sample set from unused training samples, taking the adjusted initial machine learning model as the initial machine learning model, and executing the training steps again.
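A possible reading of the accuracy check in this loop is sketched below; counting a prediction as correct when its cosine similarity to the target exceeds a margin is an assumption, since the text only states that the comparison results yield a prediction accuracy.

```python
import torch
import torch.nn.functional as F

def prediction_accuracy(model, inputs, targets, match_threshold=0.9):
    """Fraction of predictions deemed correct: a prediction counts as
    correct when its cosine similarity to the corresponding sample first
    face feature vector exceeds match_threshold (an assumed criterion)."""
    with torch.no_grad():
        preds = model(inputs)                           # (N, 512) predictions
    sims = F.cosine_similarity(preds, targets, dim=-1)  # (N,) similarities
    return (sims > match_threshold).float().mean().item()
```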
As an example, the electronic device may count a large number of face feature vectors together with their corresponding first face feature vectors, generate a correspondence table recording these pairs, and use the correspondence table as the first face recognition network.
In some optional implementations of some embodiments, the executing body may match the first face feature vector against a pre-stored face trajectory set to obtain a matching value, determine whether the matching value is greater than a preset value, and, in response to determining that it is, generate information indicating that the face of the face image corresponding to the first face feature vector is consistent with the face corresponding to the matched face trajectory. According to this information, the executing body may output a control signal for controlling a target device to perform a target operation. The face trajectory set may be created by the user. The preset value may be set manually. The target device may be an electronic device having a camera and a display screen, such as a mobile phone. The target operation may include a face recognition operation.
The preset value can also be set by a technician according to the actual application scenario. For example, in the security and financial fields the required precision is high, and the preset value can be raised accordingly. The control signal may carry prompt information for informing the user to act in time. As an example, during a security check, when the executing body confirms that the matching value is greater than the preset value, it may output authenticated information to prompt the user that passage is allowed and to pass in time. When the executing body confirms that the matching value is less than the preset value, it may output authentication-failure information together with an alarm tone, informing the user that passage is denied and reminding technicians and related personnel to take corresponding measures in time. In this way, the verification result is obtained simply by comparing the matching value with the preset value, saving computational effort and time, and the control signal gives the user an intuitive reminder so that the user can react quickly, improving passage efficiency.
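A minimal sketch of this verification gate follows; cosine similarity as the matching score, the (M, 512) trajectory tensor, and the signal strings are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def verify_and_signal(first_vec, trajectory_vectors, preset_value=0.8):
    """Match a first face feature vector against the pre-stored face
    trajectory set and emit a control signal. trajectory_vectors is an
    (M, 512) tensor standing in for the stored trajectory features."""
    sims = F.cosine_similarity(first_vec.unsqueeze(0), trajectory_vectors, dim=-1)
    matching_value = sims.max().item()
    if matching_value > preset_value:
        return "PASS"   # authenticated: prompt the user to pass in time
    return "ALARM"      # authentication failed: sound the alarm prompt tone
```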
In the method for generating a face feature vector disclosed in some embodiments of the present disclosure, the face feature vector corresponding to a face image is obtained by performing feature extraction on the face image, and the face feature vector is then input into a pre-trained first face recognition network to obtain the corresponding first face feature vector. The pre-trained first face recognition network thereby makes the output produced for a face image consistent across face recognition networks, and makes face feature vectors output by different face recognition networks compatible with one another. The face image can therefore be recognized quickly in a face recognition network, saving computational effort and time. Furthermore, the user experience is improved, and convenience is brought to people's lives.
With further reference to fig. 3, a flow 300 of further embodiments of a method for generating face feature vectors is shown. The process 300 of the method for generating a face feature vector includes the following steps:
step 301, extracting features of the face image to obtain a face feature vector corresponding to the face image.
Step 302, inputting the face feature vector into a pre-trained first face recognition network, and obtaining a first face feature vector corresponding to the face feature vector.
In some embodiments, the detailed implementation and technical effects of the steps 301 and 302 can refer to the steps 201 and 202 in the embodiments corresponding to fig. 2, which are not described herein again.
Step 303, performing feature extraction on the face image to obtain a second face feature vector corresponding to the face image.
In some embodiments, the executing body of the method for generating a face feature vector (e.g., the server shown in fig. 1) may first input the face image into a third face recognition network to obtain the second face feature vector corresponding to the face image. The third face recognition network may be configured to represent the correspondence between a face image and the face feature vector corresponding to that face image. As an example, the electronic device may count a large number of face images together with their corresponding second face feature vectors, generate a correspondence table recording these pairs, and use the correspondence table as the third face recognition network.
Step 304, inputting the second face feature vector into a pre-trained second face recognition network to obtain a third face feature vector corresponding to the second face feature vector.
In some embodiments, the executing body may input the second face feature vector into the pre-trained second face recognition network, which processes the second face feature vector to obtain the third face feature vector. As an example, a 512-dimensional vector characterizing the facial feature "round face" may be input into the pre-trained second face recognition network to obtain the corresponding 512-dimensional third face feature vector.
Step 305, comparing the similarity of the first face feature vector and the third face feature vector to obtain a comparison value.
In some embodiments, the executing body may compare the similarity between the first face feature vector and the third face feature vector to obtain a comparison value. As an example, comparing the two 512-dimensional vectors may yield a percentage similarity, e.g., 99%.
Step 306, determining whether the value is greater than a preset threshold.
In some embodiments, the executing body may determine whether the value is greater than a preset threshold. The preset threshold may be set manually or according to actual needs.
Step 307, in response to determining that the value is greater than the preset threshold, determining that the face of the face image corresponding to the first face feature vector is the same as the face of the face image corresponding to the third face feature vector.
In some embodiments, in response to determining that the comparison value is greater than the preset threshold, the executing body may determine that the face of the face image corresponding to the first face feature vector is the same as the face of the face image corresponding to the third face feature vector.
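Steps 305 to 307 condense into the following sketch; cosine similarity is an assumed choice of metric, suggested by the percentage figure above but not mandated by the disclosure.

```python
import torch.nn.functional as F

def same_face(first_vec, third_vec, preset_threshold=0.99):
    """Compare two face feature vectors and decide whether they belong
    to the same face (steps 305-307)."""
    value = F.cosine_similarity(first_vec, third_vec, dim=-1).item()  # comparison value
    return value > preset_threshold
```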
In the method for generating a face feature vector disclosed in some embodiments of the present disclosure, feature extraction is performed on the face image to obtain the corresponding face feature vector, which is input into the pre-trained first face recognition network to obtain the first face feature vector. Feature extraction is then performed on the face image again to obtain the second face feature vector, which is input into the pre-trained second face recognition network to obtain the third face feature vector. By comparing the similarity between the first face feature vector and the third face feature vector, it can be determined whether the face in the face image corresponding to the first face feature vector is consistent with the face in the face image corresponding to the third face feature vector. At the same time, the pre-trained first and second face recognition networks make the output face feature vectors consistent, so the face image can be recognized quickly, saving computational effort and time. Furthermore, the user experience is improved, and convenience is brought to people's lives.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an apparatus for generating face feature vectors, which correspond to the method embodiments shown in fig. 2; the apparatus may be applied in various electronic devices.
As shown in fig. 4, the apparatus 400 for generating a face feature vector according to some embodiments includes: an extraction unit 401 and a generation unit 402. The extraction unit 401 is configured to perform feature extraction on a face image to obtain a face feature vector corresponding to the face image; the generating unit 402 is configured to input the facial feature vector into a first pre-trained facial recognition network, and obtain a first facial feature vector corresponding to the facial feature vector, where the first facial recognition network is trained by the following steps: acquiring a training sample set, wherein the training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector; and taking the sample face feature vector of the training sample in the training sample set as input, taking the sample first face feature vector corresponding to the input sample face feature vector as expected output, and training to obtain the first face recognition network.
In an optional implementation of some embodiments, the apparatus 400 for generating a face feature vector further includes: a determining unit configured to determine, in a face data set, a second face feature vector similar to the face feature vector, wherein the face data set is used to store face images and corresponding sample face feature vectors; a first generating unit configured to input the second face feature vector into a pre-trained first face recognition network to obtain a third face feature vector; a comparison unit configured to compare the similarity between the first face feature vector and the third face feature vector to obtain a comparison value; a first determination unit configured to determine whether the value is greater than a preset threshold; and a second determination unit configured to determine, in response to determining that the value is greater than the preset threshold, that the face of the face image corresponding to the first face feature vector is the same as the face of the face image corresponding to the third face feature vector.
In an alternative implementation manner of some embodiments, the generating unit 402 of the apparatus 400 for generating a face feature vector is further configured to acquire a first face image from the face data set, wherein the face data set is used for storing the face image and a corresponding sample face feature vector; inputting the first face image into a second face recognition network and a third face recognition network respectively to obtain a fourth face feature vector and a fifth face feature vector respectively; processing the fourth face feature vector and the fifth face feature vector to obtain a sixth face feature vector; and inputting the sixth face feature vector into a pre-trained fusion network to obtain a fusion face feature vector, taking the fourth face feature vector as the sample face feature vector, and taking the sixth face feature vector as a sample first face feature vector corresponding to the sample face feature vector.
In an optional implementation of some embodiments, the architecture of the fusion network is an encoder-decoder architecture.
In an optional implementation manner of some embodiments, the generating unit 402 of the apparatus 400 for generating a face feature vector is further configured to obtain a set of training samples, where a training sample includes a sample sixth face feature vector and a sample face feature vector of a face image corresponding to the sample sixth face feature vector in the face data set; and taking a sample sixth face feature vector of the training sample in the training sample set as an input, taking a sample face feature vector of the face image corresponding to the input sample sixth face feature vector in the face data set as an expected output, and training to obtain the fusion network.
In an optional implementation of some embodiments, the generating unit 402 of the apparatus 400 for generating a face feature vector is further configured to select a training sample from the training sample set and perform the following training steps: inputting the sample sixth face feature vector of the selected training sample into an initial fusion network to obtain a face feature vector corresponding to the sample sixth face feature vector; analyzing the face feature vector and the corresponding sample face feature vector to determine a first loss value; comparing the first loss value with a preset threshold; determining whether training of the initial fusion network is finished according to the comparison result; and in response to determining that training of the initial fusion network is complete, determining the initial fusion network as the fusion network.
In an optional implementation of some embodiments, the apparatus 400 for generating a face feature vector further includes: an adjusting unit configured to, in response to determining that training of the initial fusion network is not complete, adjust relevant parameters in the initial fusion network, reselect a training sample from the training sample set, and continue to execute the training steps using the adjusted initial fusion network as the initial fusion network.
Some embodiments of the present disclosure disclose an apparatus for generating a face feature vector, which obtains the face feature vector corresponding to a face image by performing feature extraction on the face image, and then inputs the face feature vector into a pre-trained first face recognition network to obtain the corresponding first face feature vector. The pre-trained first face recognition network thereby makes the output produced for a face image consistent across face recognition networks, and makes face feature vectors output by different face recognition networks compatible with one another. The face image can therefore be recognized quickly in a face recognition network, saving computational effort and time. Furthermore, the user experience is improved, and convenience is brought to people's lives.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., the server of fig. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting the features of the face image to obtain a face feature vector corresponding to the face image; inputting the face feature vector into a first face recognition network trained in advance to obtain a first face feature vector corresponding to the face feature vector, wherein the first face recognition network is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector; and taking the sample face feature vector of the training sample in the training sample set as input, taking the sample first face feature vector corresponding to the input sample face feature vector as expected output, and training to obtain the first face recognition network.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes an extraction unit and a generation unit. The names of these units do not limit the unit itself in some cases, and for example, the extraction unit may also be described as a unit that performs feature extraction on a face image to obtain a face feature vector corresponding to the face image.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is merely of preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method for generating a face feature vector, comprising:
extracting the features of the face image to obtain a face feature vector corresponding to the face image;
inputting the face feature vector into a first pre-trained face recognition network to obtain a first face feature vector corresponding to the face feature vector, wherein the first face recognition network is obtained by training through the following steps:
acquiring a training sample set, wherein the training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector;
and taking the sample face feature vector of the training sample in the training sample set as input, taking the sample first face feature vector corresponding to the input sample face feature vector as expected output, and training to obtain the first face recognition network.
2. The method of claim 1, wherein the method further comprises:
extracting the features of the face image to obtain a second face feature vector corresponding to the face image;
inputting the second face feature vector into a pre-trained second face recognition network to obtain a third face feature vector corresponding to the second face feature vector;
carrying out similarity comparison on the first face feature vector and the third face feature vector to obtain a comparison numerical value;
determining whether the value is greater than a preset threshold value;
and in response to determining that the comparison value is greater than the preset threshold, determining that the face of the face image corresponding to the first face feature vector is the same as the face of the face image corresponding to the third face feature vector.
3. The method of claim 1, wherein the training samples are obtained by:
acquiring a first face image from a face data set, wherein the face data set is used for storing the face image and a corresponding face feature vector;
inputting the first face image into a second face recognition network and a third face recognition network respectively to obtain a fourth face feature vector and a fifth face feature vector respectively;
processing the fourth face feature vector and the fifth face feature vector to obtain a sixth face feature vector;
inputting the sixth face feature vector into a pre-trained fusion network to obtain a fusion face feature vector, taking a fourth face feature vector as the sample face feature vector, and taking the sixth face feature vector as a sample first face feature vector corresponding to the sample face feature vector.
4. The method of claim 3, wherein the structure of the fusion network is an encoder-decoder structure.
5. The method of claim 3, wherein the fusion network is obtained by the following steps:
acquiring a training sample set, wherein the training sample comprises a sample sixth face feature vector and a sample face feature vector of a face image corresponding to the sample sixth face feature vector in the face data set;
and taking a sample sixth face feature vector of the training sample in the training sample set as an input, taking a sample face feature vector of the face image corresponding to the input sample sixth face feature vector in the face data set as an expected output, and training to obtain the fusion network.
6. The method of claim 5, wherein the training with the sample sixth face feature vectors of the training samples in the training sample set as input and the sample face feature vectors of the face images in the face data set corresponding to the input sample sixth face feature vectors as expected output, to obtain the fusion network, comprises:
selecting a training sample from the training sample set, and performing the following training steps: inputting the sample sixth face feature vector of the selected training sample into an initial fusion network to obtain a face feature vector corresponding to the sample sixth face feature vector; analyzing the obtained face feature vector against the corresponding sample face feature vector to determine a first loss value; comparing the first loss value with a preset threshold; determining, according to the comparison result, whether training of the initial fusion network is complete; and in response to determining that training of the initial fusion network is complete, determining the initial fusion network as the fusion network.
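
A sketch of the training loop of claim 6, assuming a mean-squared-error first loss value, an Adam optimizer, and an illustrative preset threshold; the claim prescribes none of these.

    import torch
    import torch.nn as nn

    def train_fusion_net(fusion_net, sixth_vecs, target_vecs,
                         loss_threshold=1e-3, max_steps=1000, lr=1e-3):
        optimizer = torch.optim.Adam(fusion_net.parameters(), lr=lr)
        loss_fn = nn.MSELoss()  # assumed choice of loss
        for _ in range(max_steps):
            optimizer.zero_grad()
            # First loss value: output of the initial fusion network compared
            # against the corresponding sample face feature vectors.
            loss = loss_fn(fusion_net(sixth_vecs), target_vecs)
            if loss.item() < loss_threshold:  # comparison with the preset threshold
                break                         # training of the initial network is complete
            loss.backward()
            optimizer.step()
        return fusion_net                     # the initial network becomes the fusion network
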
7. The method according to any one of claims 1-6, further comprising:
matching the first face feature vector against a pre-stored face track set to obtain a matching value;
determining whether the matching value is greater than a preset value;
in response to determining that the matching value is greater than the preset value, obtaining information indicating that the face in the face image corresponding to the first face feature vector is consistent with the face corresponding to the matched face track; and
outputting, according to the information, a control signal for controlling a target device to perform a target operation.
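
A sketch of the matching and control step of claim 7. The face track set is assumed to be a mapping from track identifiers to feature vectors, the matching value is assumed to be cosine similarity, and the control-signal format (including the "open_gate" command) is invented for illustration.

    import numpy as np

    def match_and_control(first_vec, face_track_set, preset_value=0.7):
        first_vec = np.asarray(first_vec, dtype=np.float64)
        best_track, best_value = None, -1.0
        # Match against every pre-stored face track; keep the best matching value.
        for track_id, track_vec in face_track_set.items():
            v = np.asarray(track_vec, dtype=np.float64)
            value = first_vec @ v / (np.linalg.norm(first_vec) * np.linalg.norm(v))
            if value > best_value:
                best_track, best_value = track_id, value
        if best_value > preset_value:
            info = {"track": best_track, "matching_value": best_value}
            # Hypothetical control signal instructing a target device to
            # perform a target operation.
            return {"command": "open_gate", "info": info}
        return None
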
8. An apparatus for generating a face feature vector, comprising:
an extraction unit configured to perform feature extraction on a face image to obtain a face feature vector corresponding to the face image; and
a generation unit configured to input the face feature vector into a pre-trained first face recognition network to obtain a first face feature vector corresponding to the face feature vector, wherein the first face recognition network is trained through the following steps:
acquiring a training sample set, wherein each training sample comprises a sample face feature vector and a sample first face feature vector corresponding to the sample face feature vector; and
training with the sample face feature vectors of the training samples in the training sample set as input and the sample first face feature vectors corresponding to the input sample face feature vectors as expected output, to obtain the first face recognition network.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911190173.3A CN110956129A (en) 2019-11-28 2019-11-28 Method, apparatus, device and medium for generating face feature vector

Publications (1)

Publication Number Publication Date
CN110956129A (en) 2020-04-03

Family

ID=69978809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911190173.3A Pending CN110956129A (en) 2019-11-28 2019-11-28 Method, apparatus, device and medium for generating face feature vector

Country Status (1)

Country Link
CN (1) CN110956129A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706872A (en) * 2009-11-26 2010-05-12 上海交通大学 Universal open type face identification system
CN105426860A (en) * 2015-12-01 2016-03-23 北京天诚盛业科技有限公司 Human face identification method and apparatus
CN109426763A (en) * 2017-08-22 2019-03-05 上海荆虹电子科技有限公司 A kind of iris identification device and method
CN108597566A (en) * 2018-04-17 2018-09-28 广东南海鹰视通达科技有限公司 Mobile electron medical records system based on recognition of face and implementation method
CN109583584A (en) * 2018-11-14 2019-04-05 中山大学 The CNN with full articulamentum can be made to receive the method and system of indefinite shape input
CN110119746A (en) * 2019-05-08 2019-08-13 北京市商汤科技开发有限公司 A kind of characteristic recognition method and device, computer readable storage medium
CN110287202A (en) * 2019-05-16 2019-09-27 北京百度网讯科技有限公司 Data-updating method, device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537028A (en) * 2021-07-09 2021-10-22 中星电子股份有限公司 Control method, apparatus, device and medium for face recognition system
CN113537028B (en) * 2021-07-09 2022-11-08 中星电子股份有限公司 Control method, apparatus, device and medium for face recognition system

Similar Documents

Publication Publication Date Title
US11882118B2 (en) Identity verification and management system
CN108805091B (en) Method and apparatus for generating a model
WO2019242222A1 (en) Method and device for use in generating information
CN109993150B (en) Method and device for identifying age
CN109684047A (en) Event-handling method, device, equipment and computer storage medium
CN109981787B (en) Method and device for displaying information
CN111539903B (en) Method and device for training face image synthesis model
CN109871807B (en) Face image processing method and device
CN110084317B (en) Method and device for recognizing images
CN112434620A (en) Scene character recognition method, device, equipment and computer readable medium
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN110956127A (en) Method, apparatus, electronic device, and medium for generating feature vector
CN110956129A (en) Method, apparatus, device and medium for generating face feature vector
CN116110159B (en) User authentication method, device and medium based on CFCA authentication standard
CN110335237B (en) Method and device for generating model and method and device for recognizing image
CN112102836A (en) Voice control screen display method and device, electronic equipment and medium
CN114639072A (en) People flow information generation method and device, electronic equipment and computer readable medium
CN115757933A (en) Recommendation information generation method, device, equipment, medium and program product
US11676599B2 (en) Operational command boundaries
CN111784567B (en) Method, apparatus, electronic device, and computer-readable medium for converting image
CN111754984B (en) Text selection method, apparatus, device and computer readable medium
CN114373465A (en) Voiceprint recognition method and device, electronic equipment and computer readable medium
CN111062995A (en) Method and device for generating face image, electronic equipment and computer readable medium
CN112365046A (en) User information generation method and device, electronic equipment and computer readable medium
CN111709784A (en) Method, apparatus, device and medium for generating user retention time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination