WO2023016007A1 - Training method and apparatus for a face recognition model, and computer program product - Google Patents

Training method and apparatus for a face recognition model, and computer program product

Info

Publication number
WO2023016007A1
WO2023016007A1 · PCT/CN2022/092647 · CN2022092647W
Authority
WO
WIPO (PCT)
Prior art keywords
fully connected layer
target fully connected layer
face image
gradient
Prior art date
2021-08-13
Application number
PCT/CN2022/092647
Other languages
English (en)
French (fr)
Inventor
李弼
彭楠
希滕
张刚
Original Assignee
北京百度网讯科技有限公司
Priority date: 2021-08-13 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2022-05-13
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023016007A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a training method and apparatus for a face recognition model, an electronic device, a storage medium, and a computer program product, relating to the field of artificial intelligence, in particular to computer vision and deep learning technology, and applicable to face recognition scenarios. A specific implementation is: obtaining a training sample set, where the training samples in the training sample set include sample face images and category labels; and using a machine learning method, taking a sample face image as input and taking the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, training to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder. The present disclosure improves the recognition accuracy of the face recognition model.

Description

Training method and apparatus for a face recognition model, and computer program product
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to Chinese Patent Application No. 202110940012.2, filed on August 13, 2021 and entitled "Training method and apparatus for a face recognition model, and computer program product", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning technology, and especially to a training method and apparatus for a face recognition model, an electronic device, a storage medium, and a computer program product, which can be used in face recognition scenarios.
BACKGROUND
In recent years, with the development of deep learning technology, the accuracy of face recognition based on deep learning has improved substantially. Face recognition has important applications in many scenarios, such as comparing a person against an ID document in transfer scenarios such as airports and train stations, face-based access control in private-domain management scenarios, and real-person verification in financial scenarios. In the application of a face recognition model, however, a mismatch between the training process and the application process degrades the model's recognition accuracy.
SUMMARY
The present disclosure provides a training method and apparatus for a face recognition model, an electronic device, a storage medium, and a computer program product.
According to a first aspect, a training method for a face recognition model is provided, including: obtaining a training sample set, where the training samples in the training sample set include sample face images and category labels; and using a machine learning method, taking a sample face image as input and taking the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, training to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder.
According to a second aspect, a face recognition method is provided, including: obtaining an image to be recognized; and recognizing the image to be recognized with a pre-trained face recognition model to obtain a face recognition result, where the face recognition model is trained by any implementation of the first aspect.
According to a third aspect, a training apparatus for a face recognition model is provided, including: a first acquisition unit configured to obtain a training sample set, where the training samples in the training sample set include sample face images and category labels; and a training unit configured to use a machine learning method, take a sample face image as input, take the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, and train to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder.
According to a fourth aspect, a face recognition apparatus is provided, including: a second acquisition unit configured to obtain an image to be recognized; and a recognition unit configured to recognize the image to be recognized with a pre-trained face recognition model to obtain a face recognition result, where the face recognition model is trained by any implementation of the first aspect.
According to a fifth aspect, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in any implementation of the first aspect or the second aspect.
According to a sixth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform the method described in any implementation of the first aspect or the second aspect.
According to a seventh aspect, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in any implementation of the first aspect or the second aspect.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
FIG. 2 is a flowchart of an embodiment of a training method for a face recognition model according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of the training method for a face recognition model according to this embodiment;
FIG. 4 is a flowchart of another embodiment of the training method for a face recognition model according to the present disclosure;
FIG. 5 is a schematic flowchart of an embodiment of a face recognition method according to the present disclosure;
FIG. 6 is a structural diagram of an embodiment of a training apparatus for a face recognition model according to the present disclosure;
FIG. 7 is a structural diagram of an embodiment of a face recognition apparatus according to the present disclosure;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing embodiments of the present disclosure.
DETAILED DESCRIPTION
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
FIG. 1 shows an exemplary architecture 100 to which the training method and apparatus for a face recognition model and the face recognition method and apparatus of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections among the terminal devices 101, 102, 103 constitute a topological network, and the network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
The terminal devices 101, 102, 103 may be hardware devices or software supporting network connections for data interaction and data processing. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting functions such as network connection, information acquisition, interaction, display, and processing, including but not limited to monitoring devices, smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example, a backend server that receives training requests sent by users through the terminal devices 101, 102, 103 and trains a face recognition model using a machine learning method. During training, face images that include an occluder and face images that do not include an occluder are modeled separately by the two target fully connected layers in the face recognition model. After the pre-trained face recognition model is obtained, the server may also receive an image to be recognized sent by a user through a terminal device, perform face recognition, and obtain a face recognition result. As an example, the server 105 may be a cloud server.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should also be noted that the training method for a face recognition model and the face recognition method provided by the embodiments of the present disclosure may be executed by the server, by a terminal device, or by the server and a terminal device in cooperation with each other. Correspondingly, the parts (for example, the units) included in the training apparatus for a face recognition model and the face recognition apparatus may all be provided in the server, may all be provided in the terminal device, or may be provided in the server and the terminal device respectively.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers as required by the implementation. When the electronic device on which the training method for a face recognition model and the face recognition method run does not need to transmit data with other electronic devices, the system architecture may include only the electronic device (for example, a server or a terminal device) on which the training method for a face recognition model and the face recognition method run.
Referring to FIG. 2, FIG. 2 is a flowchart of a training method for a face recognition model provided by an embodiment of the present disclosure, where the process 200 includes the following steps:
Step 201: obtain a training sample set.
In this embodiment, the execution body of the training method for a face recognition model (for example, the terminal device or the server in FIG. 1) may obtain the training sample set remotely or locally through a wired or wireless network connection.
The training samples in the training sample set include sample face images and category labels. A sample face image contains a face object, and the category label represents the identity information or classification information of the face object in the sample face image corresponding to that category label.
The face object in a sample face image may or may not include an occluder. The occluder may be, for example, any object that occludes the face object in the face image, such as a mask, a hat, or glasses.
The training sample set may be obtained through data collection. As an example, in transfer scenarios such as airports and railway stations, it is generally necessary to compare a passenger image collected on site with the passenger's ID-document image to verify whether the passenger and the person represented by the ID image are the same person. In this scenario, the execution body may take the passenger images collected on site as sample face images and take the identity information represented by the ID image of the same person as the corresponding category labels, thereby obtaining the training data set, for instance as in the sketch below.
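As a concrete illustration of such a training sample set, the following minimal Python sketch organizes each sample as an image path, a category label, and an occlusion flag. All names here (FaceSample, build_training_set, the file paths) are illustrative assumptions, not part of the disclosure.

```python
# A minimal sketch of assembling the training sample set described above.
# FaceSample and build_training_set are hypothetical names for illustration.
from dataclasses import dataclass

@dataclass
class FaceSample:
    image_path: str      # on-site passenger capture used as the sample face image
    category_label: int  # identity taken from the matching ID-document record
    has_occluder: bool   # whether a mask/hat/glasses occludes the face

def build_training_set(records):
    """records: iterable of (image_path, id_identity, has_occluder) tuples."""
    return [FaceSample(p, identity, occ) for p, identity, occ in records]

samples = build_training_set([
    ("captures/gate3/0001.jpg", 17, True),   # masked passenger
    ("captures/gate3/0002.jpg", 42, False),  # unoccluded passenger
])
```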
The initial face recognition model may be any deep learning model with a face recognition function, including but not limited to network models such as recurrent neural networks, convolutional neural networks, and residual networks.
Step 202: using a machine learning method, take a sample face image as input, take the category label corresponding to the input sample face image as the expected output of the two target fully connected layers in the initial face recognition model, and train to obtain the face recognition model.
In this embodiment, the execution body may, using a machine learning method, take a sample face image as input and take the category label corresponding to the input sample face image as the expected output of the two target fully connected layers in the initial face recognition model, and train to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder.
The two target fully connected layers in the initial face recognition model may be located at the end of the initial face recognition model and are used to output recognition results based on the extracted feature information of the sample face image.
Each row or column of parameters in the parameter matrix of a target fully connected layer represents the vector representation corresponding to one of the learned categories. During model training, the parameter matrix of the target fully connected layer is updated according to the training results. Specifically, first, feature extraction is performed on the input sample face image through the feature extraction network in the face recognition model to obtain a feature vector; then, vector multiplication is performed between the feature vector and the vector representation corresponding to each category in the target fully connected layer to determine the probability that the face object in the input sample face image belongs to each category, which gives the actual output of the initial face recognition model; then, the classification loss between the actual output and the category label corresponding to the input sample face image is computed, the gradient is computed from the classification loss, and the parameters of the initial face recognition model are updated based on methods such as gradient descent or stochastic gradient descent. In response to determining that a preset end condition is reached, the face recognition model is obtained. The preset end condition may be, for example, that the training time exceeds a preset time threshold, that the number of training iterations exceeds a preset count threshold, or that the classification loss converges.
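This classification head can be illustrated with a short PyTorch sketch: the weight matrix of a fully connected layer holds one learned vector per category, the logits are the dot products between the feature vector and those category vectors, and the parameters are updated by stochastic gradient descent on the classification loss. This is a hedged illustration with made-up dimensions, not the patent's reference implementation.

```python
# Sketch: a fully connected layer whose weight rows are category vectors.
import torch
import torch.nn.functional as F

num_classes, feat_dim = 1000, 512
fc = torch.nn.Linear(feat_dim, num_classes, bias=False)  # one weight row per category
optimizer = torch.optim.SGD(fc.parameters(), lr=0.1)

features = torch.randn(8, feat_dim)              # stand-in for backbone output
labels = torch.randint(0, num_classes, (8,))     # category labels

logits = fc(features)                  # feature · category-vector products
loss = F.cross_entropy(logits, labels) # classification loss vs. category labels
optimizer.zero_grad()
loss.backward()                        # gradient computed from the loss
optimizer.step()                       # (stochastic) gradient descent update
```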
It should be noted that the two target fully connected layers modeling face images that include an occluder and face images that do not include an occluder separately means the following: the first of the two target fully connected layers models sample face images that include an occluder, so during model training, when the input sample face image is a face image including an occluder, the first target fully connected layer is updated according to the resulting classification loss while the second target fully connected layer is not updated; the second of the two target fully connected layers models sample face images that do not include an occluder, so during model training, when the input sample face image is a face image not including an occluder, the second target fully connected layer is updated according to the resulting classification loss while the first target fully connected layer is not updated.
Although the two target fully connected layers model face images including an occluder and face images not including an occluder separately, both target fully connected layers output a recognition result for the input sample face image regardless of whether it includes an occluder.
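A possible shape for such an initial face recognition model, assuming a PyTorch implementation, is a shared feature extraction network followed by the two target fully connected layers. The class and attribute names (DualHeadFaceModel, fc_occluded, fc_clean) and the tiny backbone are illustrative assumptions.

```python
# Illustrative sketch (not the patent's reference code) of an initial face
# recognition model: one shared feature extraction network plus two target
# fully connected layers, fc_occluded for occluded faces, fc_clean for the rest.
import torch
import torch.nn as nn

class DualHeadFaceModel(nn.Module):
    def __init__(self, feat_dim=512, num_classes=1000):
        super().__init__()
        # Any backbone works; a tiny CNN keeps the sketch self-contained.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        self.fc_occluded = nn.Linear(feat_dim, num_classes, bias=False)
        self.fc_clean = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, images):
        feats = self.backbone(images)
        # Both heads always produce a recognition result, occluded input or not.
        return self.fc_occluded(feats), self.fc_clean(feats)
```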
In some optional implementations of this embodiment, the execution body may perform step 202 as follows:
In response to determining that the input sample face image is a face image including an occluder, perform the following operations:
First, extract the feature information of the input sample face image through the feature extraction network in the initial face recognition model, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer in the initial face recognition model.
Here, the first target fully connected layer models face images including an occluder, and the second target fully connected layer models face images not including an occluder.
Second, update the first target fully connected layer based on the first classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
Third, update the feature extraction network based on the second classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
Both updating the first target fully connected layer based on the first classification loss and updating the feature extraction network based on the second classification loss may be done by computing a gradient from the classification loss and updating the parameters based on methods such as gradient descent or stochastic gradient descent.
For the case where the input sample face image is a face image including an occluder, this implementation provides a method for updating the first target fully connected layer and the feature extraction network, which can further improve the recognition accuracy of the trained face recognition model.
In some optional implementations of this embodiment, the execution body may perform the second step above as follows:
First, obtain the first gradient from the first classification loss; then, back-propagate the first gradient to the first target fully connected layer, so as to update the first target fully connected layer according to the first gradient.
In this implementation, the execution body may perform the third step above as follows:
First, obtain the second gradient from the second classification loss; then, back-propagate the second gradient to the feature extraction network, so as to update the feature extraction network according to the second gradient.
Specifically, the execution body may update the first target fully connected layer using methods such as gradient descent or stochastic gradient descent.
This implementation provides a concrete method for updating the first target fully connected layer and the feature extraction network, improving the flexibility and efficiency of the training process.
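One way to realize this gradient routing, under the same illustrative PyTorch model as above (an assumption, not mandated by the patent text), is sketched below: the first classification loss reaches only the first target fully connected layer (detaching the features blocks its gradient from the backbone), while the second classification loss reaches only the feature extraction network (the second head is temporarily frozen). Here opt_fc1 is assumed to be an optimizer over model.fc_occluded only, and opt_backbone over model.backbone only.

```python
# Sketch of the occluded-image branch: first loss -> first target FC layer,
# second loss -> feature extraction network only.
import torch.nn.functional as F

def occluded_step(model, images, labels, opt_fc1, opt_backbone):
    feats = model.backbone(images)

    # First loss -> first gradient -> first target fully connected layer only.
    # detach() stops this gradient from reaching the backbone.
    loss1 = F.cross_entropy(model.fc_occluded(feats.detach()), labels)
    opt_fc1.zero_grad()
    loss1.backward()
    opt_fc1.step()

    # Second loss -> second gradient -> feature extraction network only;
    # freezing fc_clean keeps the second target FC layer unchanged here.
    for p in model.fc_clean.parameters():
        p.requires_grad_(False)
    loss2 = F.cross_entropy(model.fc_clean(feats), labels)
    opt_backbone.zero_grad()
    loss2.backward()
    opt_backbone.step()
    for p in model.fc_clean.parameters():
        p.requires_grad_(True)
    return loss1.item(), loss2.item()
```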
In some optional implementations of this embodiment, the execution body may also perform step 202 as follows:
In response to determining that the input sample face image is a face image not including an occluder, perform the following operations:
First, extract the feature information of the input sample face image through the feature extraction network, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer.
Second, update the second target fully connected layer based on the third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
Third, update the feature extraction network based on the fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
For the case where the input sample face image is a face image not including an occluder, this implementation provides a method for updating the second target fully connected layer and the feature extraction network, which can further improve the recognition accuracy of the trained face recognition model.
In some optional implementations of this embodiment, the execution body may perform the second step above as follows:
First, obtain the third gradient from the third classification loss; then, back-propagate the third gradient to the second target fully connected layer, so as to update the second target fully connected layer according to the third gradient.
In this implementation, the execution body may perform the third step above as follows:
First, obtain the fourth gradient from the fourth classification loss; then, back-propagate the fourth gradient to the feature extraction network, so as to update the feature extraction network according to the fourth gradient.
Specifically, the execution body may update the second target fully connected layer using methods such as gradient descent or stochastic gradient descent.
This implementation provides a concrete method for updating the second target fully connected layer and the feature extraction network, improving the flexibility and efficiency of the training process.
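The non-occluded branch mirrors the occluded_step sketch above with the roles of the two heads swapped: the third classification loss updates the second target fully connected layer and the fourth updates the feature extraction network. A sketch under the same assumptions (opt_fc2 is assumed to optimize model.fc_clean only):

```python
# Sketch of the non-occluded branch: third loss -> second target FC layer,
# fourth loss -> feature extraction network only.
import torch.nn.functional as F

def clean_step(model, images, labels, opt_fc2, opt_backbone):
    feats = model.backbone(images)

    # Third loss -> third gradient -> second target fully connected layer only.
    loss3 = F.cross_entropy(model.fc_clean(feats.detach()), labels)
    opt_fc2.zero_grad()
    loss3.backward()
    opt_fc2.step()

    # Fourth loss -> fourth gradient -> feature extraction network only;
    # the first target fully connected layer is frozen and stays unchanged.
    for p in model.fc_occluded.parameters():
        p.requires_grad_(False)
    loss4 = F.cross_entropy(model.fc_occluded(feats), labels)
    opt_backbone.zero_grad()
    loss4.backward()
    opt_backbone.step()
    for p in model.fc_occluded.parameters():
        p.requires_grad_(True)
    return loss3.item(), loss4.item()
```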
Continuing to refer to FIG. 3, FIG. 3 is a schematic diagram 300 of an application scenario of the training method for a face recognition model according to this embodiment. In the application scenario of FIG. 3, the server first obtains a training sample set 301, where the training samples in the training sample set include sample face images and category labels. After obtaining the training sample set 301, the server uses a machine learning method to take a sample face image as input and take the category label corresponding to the input sample face image as the expected output of the two target fully connected layers in the initial face recognition model 302, and trains to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder. Specifically, the recognition results for the input sample face image are output through the two target fully connected layers; in response to determining that the input image is a sample face image 3011 that includes an occluder, the first target fully connected layer 3021 used to model sample face images including an occluder is updated through the classification loss between the actually output recognition result and the category label; in response to determining that the input image is a sample face image 3012 that does not include an occluder, the second target fully connected layer 3022 used to model sample face images not including an occluder is updated through the classification loss between the recognition result and the category label.
In this embodiment, during the training of the face recognition model, the two target fully connected layers model face images that include an occluder and face images that do not include an occluder separately, which matches the application scenarios of the face recognition model more closely and improves the recognition accuracy of the face recognition model.
Continuing to refer to FIG. 4, a schematic flow 400 of another embodiment of the training method for a face recognition model according to the present disclosure is shown, where the process 400 includes the following steps:
Step 401: obtain a training sample set.
Here, the training samples in the training sample set include sample face images and category labels.
Step 402: using a machine learning method, perform the following training operations until the face recognition model is obtained:
Step 4021: in response to determining that the input sample face image is a face image including an occluder, perform the following operations:
Step 40211: extract the feature information of the input sample face image through the feature extraction network in the initial face recognition model, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer in the initial face recognition model.
Here, the first target fully connected layer models face images including an occluder, and the second target fully connected layer models face images not including an occluder.
Step 40212: update the first target fully connected layer based on the first classification loss between the actual output of the first target fully connected layer, which models sample face images including an occluder in the initial face recognition model, and the category label corresponding to the input sample face image.
Step 40213: update the feature extraction network based on the second classification loss between the actual output of the second target fully connected layer, which models sample face images not including an occluder in the initial face recognition model, and the category label corresponding to the input sample face image.
Step 4022: in response to determining that the input sample face image is a face image not including an occluder, perform the following operations:
Step 40221: extract the feature information of the input sample face image through the feature extraction network, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer.
Step 40222: update the second target fully connected layer based on the third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
Step 40223: update the feature extraction network based on the fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
As can be seen from this embodiment, compared with the embodiment corresponding to FIG. 2, the process 400 of the training method for a face recognition model in this embodiment specifically illustrates the training process in the case where the input image is a sample face image that includes an occluder and the training process in the case where the input image is a sample face image that does not include an occluder, improving the recognition accuracy of the face recognition model.
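Putting the two branches together, the training operation of process 400 can be sketched as a loop that dispatches on the occlusion flag of each batch, using the occluded_step and clean_step sketches above, and stops when a preset end condition is met. The iteration budget and convergence tolerance below are arbitrary illustrative values, not values given in the disclosure.

```python
# Sketch of the overall training operation: dispatch on the occlusion flag,
# stop on a preset end condition (step budget or loss convergence).
def train(model, batches, opt_fc1, opt_fc2, opt_backbone,
          max_steps=10_000, tol=1e-4):
    prev = float("inf")
    for step, (images, labels, batch_has_occluder) in enumerate(batches):
        if batch_has_occluder:
            _, loss = occluded_step(model, images, labels, opt_fc1, opt_backbone)
        else:
            _, loss = clean_step(model, images, labels, opt_fc2, opt_backbone)
        # Preset end conditions from the description: iteration budget reached
        # or the classification loss has stopped changing (convergence proxy).
        if step + 1 >= max_steps or abs(prev - loss) < tol:
            break
        prev = loss
    return model
```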
Referring to FIG. 5, FIG. 5 is a flowchart of a face recognition method provided by an embodiment of the present disclosure, where the process 500 includes the following steps:
Step 501: obtain an image to be recognized.
In this embodiment, the execution body of the face recognition method (for example, the terminal device or the server in FIG. 1) may obtain the image to be recognized remotely or locally through a wired or wireless network connection.
The image to be recognized may be any image. As an example, the image to be recognized is each frame of a video captured by a monitoring device.
Step 502: recognize the image to be recognized with the pre-trained face recognition model to obtain a face recognition result.
In this embodiment, the execution body may recognize the image to be recognized with the pre-trained face recognition model to obtain a face recognition result. The face recognition result is used to represent the identity information of the face object in the image to be recognized.
Here, the face recognition model is obtained based on the training methods shown in the foregoing embodiments 200 and 400.
In this implementation, the face recognition result of the image to be recognized is obtained through the face recognition model, which improves the accuracy of the face recognition result.
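For completeness, a hedged sketch of this recognition step with the illustrative model above: which head's output to report (or whether to fuse the two) is a deployment choice the description leaves open; the sketch reads the non-occluded head.

```python
# Sketch: run the pre-trained model on an image to be recognized and report
# the most likely identity with its confidence.
import torch

@torch.no_grad()
def recognize(model, image):  # image: (3, H, W) float tensor
    model.eval()
    logits_occ, logits_clean = model(image.unsqueeze(0))
    probs = torch.softmax(logits_clean, dim=1)  # pick the head matching deployment
    conf, identity = probs.max(dim=1)
    return identity.item(), conf.item()
```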
Continuing to refer to FIG. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a training apparatus for a face recognition model. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 6, the training apparatus for a face recognition model includes: a first acquisition unit 601 configured to obtain a training sample set, where the training samples in the training sample set include sample face images and category labels; and a training unit 602 configured to use a machine learning method, take a sample face image as input, take the category label corresponding to the input sample face image as the expected output of the two target fully connected layers in an initial face recognition model, and train to obtain the face recognition model, where the two target fully connected layers respectively model sample face images that include an occluder and sample face images that do not include an occluder.
In some optional implementations of this embodiment, the training unit 602 is further configured to: in response to determining that the input sample face image is a face image including an occluder, perform the following operations: extract the feature information of the input sample face image through the feature extraction network in the initial face recognition model, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer in the initial face recognition model, where the first target fully connected layer models face images including an occluder and the second target fully connected layer models face images not including an occluder; update the first target fully connected layer based on the first classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image; and update the feature extraction network based on the second classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
In some optional implementations of this embodiment, the training unit 602 is further configured to: obtain the first gradient according to the first classification loss; back-propagate the first gradient to the first target fully connected layer, so as to update the first target fully connected layer according to the first gradient; and obtain the second gradient according to the second classification loss; back-propagate the second gradient to the feature extraction network, so as to update the feature extraction network according to the second gradient.
In some optional implementations of this embodiment, the training unit 602 is further configured to: in response to determining that the input sample face image is a face image not including an occluder, perform the following operations: extract the feature information of the input sample face image through the feature extraction network, and according to the extracted feature information, obtain the actual outputs respectively through the first target fully connected layer and the second target fully connected layer; update the second target fully connected layer based on the third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image; and update the feature extraction network based on the fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
In some optional implementations of this embodiment, the training unit 602 is further configured to: obtain the third gradient according to the third classification loss; back-propagate the third gradient to the second target fully connected layer, so as to update the second target fully connected layer according to the third gradient; and obtain the fourth gradient according to the fourth classification loss; back-propagate the fourth gradient to the feature extraction network, so as to update the feature extraction network according to the fourth gradient.
In this embodiment, during the training of the face recognition model, the two target fully connected layers model face images that include an occluder and face images that do not include an occluder separately, which matches the application scenarios of the face recognition model more closely and improves the recognition accuracy of the face recognition model.
Continuing to refer to FIG. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a face recognition apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 5, and the apparatus may be applied to various electronic devices.
As shown in FIG. 7, the face recognition apparatus includes: a second acquisition unit 701 configured to obtain an image to be recognized; and a recognition unit 702 configured to recognize the image to be recognized with a pre-trained face recognition model to obtain a face recognition result, where the face recognition model is obtained through the training of embodiments 200 and 400.
In this implementation, the face recognition result of the image to be recognized is obtained through the face recognition model, which improves the accuracy of the face recognition result.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to implement the training method for a face recognition model and the face recognition method described in any of the above embodiments.
According to embodiments of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions, where the computer instructions are used to enable a computer to implement the training method for a face recognition model and the face recognition method described in any of the above embodiments.
An embodiment of the present disclosure provides a computer program product which, when executed by a processor, implements the training method for a face recognition model and the face recognition method described in any of the above embodiments.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 may also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; the storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 801 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, and the like. The computing unit 801 performs the various methods and processes described above, such as the training method for a face recognition model and the face recognition method. For example, in some embodiments, the training method for a face recognition model and the face recognition method may be implemented as computer software programs tangibly contained in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method for a face recognition model and the face recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured in any other appropriate manner (for example, by means of firmware) to perform the training method for a face recognition model and the face recognition method.
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and can receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a backend component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a frontend component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend, middleware, or frontend components. The components of the system may be interconnected by digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship between the client and the server arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical hosts and virtual private server (VPS) services; it may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solutions of the embodiments of the present disclosure, during the training of the face recognition model, the two target fully connected layers model face images that include an occluder and face images that do not include an occluder separately, which matches the application scenarios of the face recognition model more closely and improves the recognition accuracy of the face recognition model.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recorded in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions provided by the present disclosure can be achieved, and no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (15)

  1. A training method for a face recognition model, comprising:
    obtaining a training sample set, wherein the training samples in the training sample set comprise sample face images and category labels;
    using a machine learning method, taking a sample face image as input and taking the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, training to obtain the face recognition model, wherein the two target fully connected layers respectively model sample face images that comprise an occluder and sample face images that do not comprise an occluder.
  2. The method according to claim 1, wherein the using a machine learning method, taking a sample face image as input and taking the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, training to obtain the face recognition model comprises:
    in response to determining that the input sample face image is a face image comprising an occluder, performing the following operations:
    extracting feature information of the input sample face image through a feature extraction network in the initial face recognition model, and according to the extracted feature information, obtaining actual outputs respectively through a first target fully connected layer and a second target fully connected layer in the initial face recognition model, wherein the first target fully connected layer models face images comprising an occluder, and the second target fully connected layer models face images not comprising an occluder;
    updating the first target fully connected layer based on a first classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image;
    updating the feature extraction network based on a second classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
  3. The method according to claim 2, wherein the updating the first target fully connected layer based on a first classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image comprises:
    obtaining a first gradient according to the first classification loss;
    back-propagating the first gradient to the first target fully connected layer, so as to update the first target fully connected layer according to the first gradient; and
    the updating the feature extraction network based on a second classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image comprises:
    obtaining a second gradient according to the second classification loss;
    back-propagating the second gradient to the feature extraction network, so as to update the feature extraction network according to the second gradient.
  4. The method according to claim 2, wherein the using a machine learning method, taking a sample face image as input and taking the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, training to obtain the face recognition model further comprises:
    in response to determining that the input sample face image is a face image not comprising an occluder, performing the following operations:
    extracting feature information of the input sample face image through the feature extraction network, and according to the extracted feature information, obtaining actual outputs respectively through the first target fully connected layer and the second target fully connected layer;
    updating the second target fully connected layer based on a third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image;
    updating the feature extraction network based on a fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
  5. The method according to claim 4, wherein the updating the second target fully connected layer based on a third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image comprises:
    obtaining a third gradient according to the third classification loss;
    back-propagating the third gradient to the second target fully connected layer, so as to update the second target fully connected layer according to the third gradient; and
    the updating the feature extraction network based on a fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image comprises:
    obtaining a fourth gradient according to the fourth classification loss;
    back-propagating the fourth gradient to the feature extraction network, so as to update the feature extraction network according to the fourth gradient.
  6. A face recognition method, comprising:
    obtaining an image to be recognized;
    recognizing the image to be recognized through a pre-trained face recognition model to obtain a face recognition result, wherein the face recognition model is trained by the method according to any one of claims 1-5.
  7. A training apparatus for a face recognition model, comprising:
    a first acquisition unit configured to obtain a training sample set, wherein the training samples in the training sample set comprise sample face images and category labels;
    a training unit configured to use a machine learning method, take a sample face image as input, take the category label corresponding to the input sample face image as the expected output of two target fully connected layers in an initial face recognition model, and train to obtain the face recognition model, wherein the two target fully connected layers respectively model sample face images that comprise an occluder and sample face images that do not comprise an occluder.
  8. The apparatus according to claim 7, wherein the training unit is further configured to:
    in response to determining that the input sample face image is a face image comprising an occluder, perform the following operations: extract feature information of the input sample face image through a feature extraction network in the initial face recognition model, and according to the extracted feature information, obtain actual outputs respectively through a first target fully connected layer and a second target fully connected layer in the initial face recognition model, wherein the first target fully connected layer models face images comprising an occluder, and the second target fully connected layer models face images not comprising an occluder; update the first target fully connected layer based on a first classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image; and update the feature extraction network based on a second classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image.
  9. The apparatus according to claim 8, wherein the training unit is further configured to:
    obtain a first gradient according to the first classification loss; back-propagate the first gradient to the first target fully connected layer, so as to update the first target fully connected layer according to the first gradient; and
    obtain a second gradient according to the second classification loss; back-propagate the second gradient to the feature extraction network, so as to update the feature extraction network according to the second gradient.
  10. The apparatus according to claim 8, wherein the training unit is further configured to:
    in response to determining that the input sample face image is a face image not comprising an occluder, perform the following operations: extract feature information of the input sample face image through the feature extraction network, and according to the extracted feature information, obtain actual outputs respectively through the first target fully connected layer and the second target fully connected layer; update the second target fully connected layer based on a third classification loss between the actual output of the second target fully connected layer and the category label corresponding to the input sample face image; and update the feature extraction network based on a fourth classification loss between the actual output of the first target fully connected layer and the category label corresponding to the input sample face image.
  11. The apparatus according to claim 10, wherein the training unit is further configured to:
    obtain a third gradient according to the third classification loss; back-propagate the third gradient to the second target fully connected layer, so as to update the second target fully connected layer according to the third gradient; and
    obtain a fourth gradient according to the fourth classification loss; back-propagate the fourth gradient to the feature extraction network, so as to update the feature extraction network according to the fourth gradient.
  12. A face recognition apparatus, comprising:
    a second acquisition unit configured to obtain an image to be recognized;
    a recognition unit configured to recognize the image to be recognized through a pre-trained face recognition model to obtain a face recognition result, wherein the face recognition model is trained by the apparatus according to any one of claims 7-11.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.
  15. A computer program product, comprising: a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
PCT/CN2022/092647 2021-08-13 2022-05-13 Training method and apparatus for a face recognition model, and computer program product WO2023016007A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110940012.2 2021-08-13
CN202110940012.2A CN113657269A (zh) 2021-08-13 Training method and apparatus for a face recognition model, and computer program product

Publications (1)

Publication Number Publication Date
WO2023016007A1 (zh)

Family

ID=78479358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092647 WO2023016007A1 (zh) 2021-08-13 2022-05-13 人脸识别模型的训练方法、装置及计算机程序产品

Country Status (2)

Country Link
CN (1) CN113657269A (zh)
WO (1) WO2023016007A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657269A (zh) 2021-08-13 2021-11-16 北京百度网讯科技有限公司 Training method and apparatus for a face recognition model, and computer program product
CN114519378B (zh) * 2021-12-24 2023-05-30 浙江大华技术股份有限公司 Training method for feature extraction unit, face recognition method, and apparatus
CN114663980B (zh) * 2022-04-01 2023-04-18 北京百度网讯科技有限公司 Behavior recognition method, and training method and apparatus for deep learning model
CN115100717A (zh) * 2022-06-29 2022-09-23 腾讯科技(深圳)有限公司 Training method for feature extraction model, and method and apparatus for recognizing cartoon objects
CN115622730A (zh) * 2022-08-25 2023-01-17 支付宝(杭州)信息技术有限公司 Training method for face attack detection model, and face attack detection method and apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200042775A1 (en) * 2019-09-10 2020-02-06 Lg Electronics Inc. Artificial intelligence server and method for de-identifying face area of unspecific person from image file
CN112052789A (zh) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Face recognition method and apparatus, electronic device, and storage medium
CN112115866A (zh) * 2020-09-18 2020-12-22 北京澎思科技有限公司 Face recognition method and apparatus, electronic device, and computer-readable storage medium
CN112734641A (zh) * 2020-12-31 2021-04-30 百果园技术(新加坡)有限公司 Training method and apparatus for object detection model, computer device, and medium
CN112767329A (zh) * 2021-01-08 2021-05-07 北京安德医智科技有限公司 Image processing method and apparatus, and electronic device
CN113221732A (zh) * 2021-05-10 2021-08-06 精点视界(深圳)科技有限公司 Method for accurately producing smart ID cards from big data based on face recognition
CN113657269A (zh) * 2021-08-13 2021-11-16 北京百度网讯科技有限公司 Training method and apparatus for a face recognition model, and computer program product


Also Published As

Publication number Publication date
CN113657269A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2023016007A1 (zh) Training method and apparatus for a face recognition model, and computer program product
WO2022105118A1 (zh) Image-based health state recognition method, apparatus, device, and storage medium
WO2022105117A1 (zh) Image quality assessment method and apparatus, computer device, and storage medium
US20230080230A1 Method for generating federated learning model
US20220004928A1 Method and apparatus for incrementally training model
US20220139096A1 Character recognition method, model training method, related apparatus and electronic device
US20230036338A1 Method and apparatus for generating image restoration model, medium and program product
WO2022247343A1 (zh) Recognition model training method, recognition method, apparatus, device, and storage medium
CN113450759A (zh) Speech generation method and apparatus, electronic device, and storage medium
WO2022213717A1 (zh) Model training method, pedestrian re-identification method, apparatus, and electronic device
CN113177449B (zh) Face recognition method and apparatus, computer device, and storage medium
WO2023005253A1 (zh) Training method, apparatus, and system for a text recognition model framework
CN114494784A (zh) Training method for deep learning model, image processing method, and object recognition method
JP2023040100A (ja) Multi-task recognition method and apparatus, training method and apparatus, electronic device, storage medium, and computer program
CN114092759A (zh) Training method and apparatus for image recognition model, electronic device, and storage medium
CN113221771A (zh) Living face recognition method, apparatus, device, storage medium, and program product
US20220308816A1 Method and apparatus for augmenting reality, device and storage medium
CN113011309A (zh) Image recognition method, apparatus, device, medium, and program product
CN113627361B (zh) Training method and apparatus for a face recognition model, and computer program product
US11366984B1 Verifying a target object based on confidence coefficients generated by trained models
CN114120413A (zh) Model training method, image synthesis method, apparatus, device, and program product
KR20170057118A (ko) Object recognition method and apparatus, and recognition model training method and apparatus
CN115393488B (zh) Method and apparatus for driving a virtual character's expression, electronic device, and storage medium
US20230115765A1 Method and apparatus of transferring image, and method and apparatus of training image transfer model
US20220327803A1 Method of recognizing object, electronic device and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE