CN116012918A - Training method and device for face recognition model and storage medium - Google Patents

Training method and device for face recognition model and storage medium

Info

Publication number
CN116012918A
CN116012918A
Authority
CN
China
Prior art keywords
model
training
data set
face recognition
data sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310025749.0A
Other languages
Chinese (zh)
Inventor
陈杞城
张松水
苏志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Dnake Intelligent Technology Co ltd
Original Assignee
Xiamen Dnake Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Dnake Intelligent Technology Co ltd filed Critical Xiamen Dnake Intelligent Technology Co ltd
Priority to CN202310025749.0A priority Critical patent/CN116012918A/en
Publication of CN116012918A publication Critical patent/CN116012918A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a training method and device for a face recognition model, and a storage medium. The method comprises the following steps: S1, acquiring a training data set; S2, splitting the training data set into N data sets; S3, inputting each of the N data sets into a backbone network model to obtain a face backbone model feature value for each of the N data sets; S4, inputting the face backbone model feature value of each data set into a preset classification model to obtain a classification feature value for each data set; S5, performing back propagation according to the loss function value of the classification model and updating the classification feature value of each data set; and S6, after the classification feature values have been updated, combining the loss function values of the N data sets and updating the backbone model feature value for each data set. This scheme reduces the complexity of cleaning and ID merging for large-scale face data, effectively improves model training accuracy, and improves training efficiency.

Description

Training method and device for face recognition model and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a training method and apparatus for a face recognition model, and a storage medium.
Background
In recent years, with the development of hardware resources, deep learning techniques have developed rapidly and achieved good results in many fields, particularly computer vision. Face recognition is an important research direction in the field of computer vision. Face recognition is a biometric technology that performs identity recognition based on facial feature information. It commonly refers to a series of related techniques in which a camera or webcam captures images or video streams containing faces, the faces in the images are automatically detected and tracked, and recognition is then performed on the detected faces.
Most existing face recognition training methods are single-data-set, multi-GPU parallel training methods, such as the ArcFace training method in the open-source InsightFace. Fig. 1 is a schematic diagram of a face recognition model training method in the prior art. As shown in fig. 1, the prior-art training method includes the following steps: a training data set is input, and multiple GPUs extract the feature value FC1 of the data set through a shared backbone network model; the feature value FC1 extracted through the backbone model is input into a classification model containing Linear + Softmax layers, which outputs the classification feature value FC2 for face-ID conversion; and a loss function Loss is calculated, back propagation is performed according to the calculated loss function value, the backbone model feature value FC1 and the classification feature value FC2 are updated, and training of the face recognition model continues until the training stop condition is reached.
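As a concrete illustration of the Linear + Softmax classification stage and the Loss computation in the prior-art pipeline above, the following is a minimal, framework-free Python sketch; the function names and the plain-list representation of features are illustrative, not taken from the patent:

```python
import math

# Illustrative stand-ins for the prior-art pipeline: a backbone produces a
# feature vector (FC1), a Linear + Softmax head turns it into per-face-ID
# class probabilities (FC2), and a cross-entropy Loss drives back propagation.

def linear(features, weights, bias):
    # FC1 -> logits: one output per face-ID class
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, label):
    # the Loss computed on the classification output FC2
    return -math.log(probs[label])
```

For example, a two-class head applied to a two-dimensional FC1 vector: `cross_entropy(softmax(linear(fc1, W, b)), label)` yields the scalar Loss that is then back-propagated through both the head and the shared backbone.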
When training on a single data set, it must be ensured that face IDs in the data set are not duplicated, since the higher the duplication rate, the lower the training accuracy of the algorithm. A single data set with millions of face IDs or more is therefore difficult to sort and clean: it consumes enormous manpower and material resources, and training efficiency is low. If an algorithm is used for automatic cleaning instead, the accuracy requirement on the cleaning algorithm is very high, and the accuracy of the finally trained model cannot exceed that of the cleaning algorithm.
Disclosure of Invention
The embodiments of the invention provide a training method and device for a face recognition model, and a storage medium, which improve the training efficiency of the face recognition model on large-scale data sets while maintaining a certain level of model training accuracy.
In order to achieve the above object, in one aspect, a training method of a face recognition model is provided, including:
S1, acquiring a training data set for face recognition;
S2, splitting the acquired training data set into N data sets, wherein N is a natural number greater than 1;
S3, inputting each of the N data sets into a backbone network model in a preset face recognition model respectively to obtain a face backbone model feature value for each of the N data sets;
S4, inputting the obtained face backbone model feature value of each data set into a preset classification model respectively to obtain a classification feature value for each data set;
S5, performing back propagation according to the loss function value of the classification model, and updating the classification feature value of each data set respectively;
and S6, after the classification feature value of each data set has been updated, combining the loss function values of the N data sets, performing back propagation according to the combined loss function value, and updating the backbone model feature value of each data set.
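Steps S2 and S6 above can be sketched in a few lines of framework-free Python. This is an illustration under the assumptions that the split is even (an interleaved split is used here for brevity) and that the losses are merged by averaging, as in the preferred embodiment; all names are hypothetical:

```python
def split_dataset(samples, n):
    # S2: split the training set into n mutually independent subsets
    # (interleaved slicing is one simple way to get a roughly even split)
    return [samples[i::n] for i in range(n)]

def merge_losses(per_dataset_losses):
    # S6: combine the N per-dataset loss values by averaging before
    # back-propagating the merged loss into the shared backbone
    return sum(per_dataset_losses) / len(per_dataset_losses)
```

Each subset returned by `split_dataset` would be fed through the backbone and its own classification head (S3-S5) before `merge_losses` produces the single value used for the backbone update.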
Preferably, in the training method, in step S2 the training data set is split evenly into N data sets by data volume.
Preferably, in the training method, the classification model comprises a Linear layer and a Softmax layer.
Preferably, in the training method, combining the loss function values of the N data sets in step S6 comprises:
averaging the loss function values of the N data sets to obtain the combined loss function value.
Preferably, in the training method, the learning rate of the backbone model is set larger than the learning rate of the classification model, wherein the multiple by which the backbone learning rate exceeds the classification-model learning rate is related to the value of N.
Preferably, in the training method, in step S2 the training data set is split according to the different tasks to be completed, with different tasks using different data sets.
In another aspect, there is provided a face recognition device comprising a memory and a processor, the memory storing at least one program, the at least one program being executable by the processor to implement a method as described in any one of the above.
In yet another aspect, a face recognition device is provided, comprising a memory and a processor, the memory storing at least one program, the at least one program being executed by the processor to implement a face recognition model trained using a training method as described in any one of the above.
In yet another aspect, a computer readable storage medium having at least one program stored therein is provided, the at least one program being executed by a processor to implement a training method as described in any of the above.
In yet another aspect, a computer readable storage medium having at least one program stored therein for execution by a processor to implement a face recognition model trained using a training method as described in any of the above.
The technical scheme has the following technical effects:
according to the technical scheme, a single face ID training data set is split into a plurality of mutually independent data sets, and Loss calculation and back propagation are decomposed according to the plurality of independent data sets, so that Loss back propagation of each data set is not interfered with each other, the purposes that FC2 and Loss calculation is limited in each data set are achieved, noise of the split and smaller independent data set is only required to be ensured in the training process, the problem of noise of massive global data is not required to be considered, the complexity of cleaning and ID merging of large-scale face data is reduced, and model training accuracy is effectively improved;
further, compared with the existing scheme, the technical scheme of the embodiment of the invention has the following advantages:
1. The face data may be divided into small data sets. For example, 5 million unlabeled face IDs can be divided into five data sets of 1 million face IDs each, so that only the 1 million face IDs within each data set need to be labeled; compared with fully labeling all 5 million at once, the difficulty of the work is significantly reduced;
2. Because the separated data sets do not interfere with each other, face IDs whose duplication status is unknown can be placed repeatedly across the data sets, which increases the collision probability among face IDs; model accuracy can therefore be improved with GPU computation even when data are limited;
3. During training, the FC2 of each sub-data set can first be trained independently and quickly, and the Losses are then combined for joint training, which improves training efficiency.
Drawings
FIG. 1 is a schematic diagram of a face recognition model training method of the prior art;
fig. 2 is a schematic diagram of a training method of a face recognition model according to an embodiment of the present invention;
fig. 3 is a flowchart of a training method of a face recognition model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a face recognition device according to an embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate the embodiments and, together with the description, serve to explain their principles. With reference to these materials, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention. The components in the figures are not drawn to scale, and like reference numerals generally designate like components.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
Fig. 2 is a schematic diagram of a training method of a face recognition model according to an embodiment of the present invention. In this embodiment, as shown in fig. 2, the face IDs of a single training data set are split into multiple mutually independent data sets, and the loss function Loss computation and back propagation are decomposed across the split data sets, so that the Loss back propagation of each data set does not interfere with the others. Once the face recognition model has been trained, only the backbone model feature value of the backbone network model, i.e. the FC1 feature value, is used at inference time; the classification feature value, i.e. the FC2 value, is only of use during the training stage. The training method of this embodiment therefore isolates the FC2 back propagation of each data set, so that the face IDs of different data sets do not interfere with each other.
Fig. 3 is a flowchart of a training method of a face recognition model according to an embodiment of the present invention. As shown in fig. 3, the method of this embodiment includes the steps of:
s1, acquiring a training data set for face recognition;
s2, splitting the acquired training data set into N data sets, wherein N is a natural number greater than 1;
preferably, in this step the training data set is split evenly into N data sets by data volume;
preferably, the training data set is split according to the different tasks to be completed, with different tasks using different data sets;
each split data set is mutually independent;
preferably, duplicate face photos do not appear across different data sets;
S3, inputting each of the N data sets into a backbone network model in a preset face recognition model respectively to obtain the face backbone model feature value, i.e. the FC1 feature value, of each of the N data sets;
S4, inputting the obtained face backbone model feature value of each data set into a preset classification model respectively to obtain the classification feature value, i.e. the FC2 feature value, of each data set;
preferably, the classification model comprises a Linear layer and a Softmax layer;
S5, performing back propagation according to the loss function value of the classification model, and updating the classification feature value of each data set respectively;
S6, after the classification feature value of each data set has been updated, combining the loss function values of the N data sets, performing back propagation according to the combined loss function value, and updating the backbone model feature value of each data set; preferably, combining the loss function values of the N data sets in step S6 comprises averaging the loss function values of the N data sets to obtain the combined loss function value.
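The two-phase update in steps S5 and S6 can be sketched with toy scalar parameters. Everything here is illustrative: each FC2 head is updated only from its own data set's gradient, while the shared backbone is updated once from the averaged gradient corresponding to the merged loss:

```python
def update_heads(head_params, head_grads, lr_head):
    # S5: per-dataset FC2 updates, isolated from each other -- head i sees
    # only the gradient computed on data set i
    return [w - lr_head * g for w, g in zip(head_params, head_grads)]

def update_backbone(backbone_param, backbone_grads, lr_backbone):
    # S6: average the N gradients (matching the averaged loss) and apply a
    # single update to the shared backbone parameter
    avg = sum(backbone_grads) / len(backbone_grads)
    return backbone_param - lr_backbone * avg
```

In a real implementation the parameters and gradients would be tensors, but the isolation structure is the same: N independent head updates, then one merged backbone update.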
The inventors found, when implementing the invention, that because the data sets are separated, each FC2 learns only from its own data set while the backbone network model is updated globally, so the backbone may be under-trained relative to the FC2 feature-value updates. Preferably, therefore, in the method of an embodiment of the present invention, the learning rate of the backbone model is set larger than the learning rate of the classification model, where the multiple by which the backbone learning rate exceeds the classification-model learning rate is related to the value of N. That is, during training the learning rate of the backbone is larger than that of the FC2 feature values of the classification model, so that the loss function value is reduced as much as possible by updating the backbone. For example, for a training data set of 3 million IDs divided into three separate data sets of 1 million IDs each, the learning rate of the backbone may be set to 2-3 times the FC2 learning rate.
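The learning-rate relationship described above can be captured in a small helper. The exact multiple is a design choice tied to N and is not fully specified by the text, so the hypothetical function below merely enforces that the backbone rate exceeds the FC2 rate, with the caller supplying the multiple (e.g. 2-3 when N = 3, as in the example):

```python
def backbone_learning_rate(head_lr, multiple):
    # The backbone lr is a multiple of the FC2 (classification-head) lr;
    # the multiple is chosen in relation to N (e.g. 2-3 for N == 3).
    if multiple <= 1:
        raise ValueError("backbone lr should exceed the FC2 lr")
    return head_lr * multiple
```

In a framework such as PyTorch, the same effect would typically be achieved with per-parameter-group learning rates in the optimizer, one group for the backbone and one per classification head.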
In summary, the training method of the face recognition model of this embodiment confines the computation of the classification feature value FC2 and the loss function value output by the classification model to each group, so that only the noise within each group's data set needs to be controlled and the noise of the global data need not be considered. For training face data sets with millions of face IDs or more, the technical scheme of the embodiment therefore has clear advantages in data requirements: data sets from different channels can be treated as independent tasks during training, avoiding global face-ID noise pollution, reducing the complexity of cleaning and ID merging for large-scale face data, and effectively improving model training accuracy.
Embodiment two:
the present invention also provides a face recognition device, as shown in fig. 4, where the device includes a processor 401, a memory 402, a bus 403, and a computer program stored in the memory 402 and capable of running on the processor 401, where the processor 401 includes one or more processing cores, the memory 402 is connected to the processor 401 through the bus 403, and the memory 402 is used to store program instructions, where the processor implements steps in the training method embodiment of the first embodiment of the present invention when executing the computer program or implements a face recognition model trained by using the training method embodiment of the first embodiment of the present invention when executing the computer program.
Further, as an executable scheme, the face recognition device may be a computer unit, and the computer unit may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, and the like. The computer unit may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the constituent structures of the computer unit described above are merely examples of the computer unit and are not limiting, and may include more or fewer components than those described above, or may combine certain components, or different components. For example, the computer unit may further include an input/output device, a network access device, a bus, etc., which is not limited by the embodiment of the present invention.
Further, as an implementation, the processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor; it is the control center of the computer unit, connecting the various parts of the entire computer unit using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the computer unit by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other solid-state storage device.
Embodiment III:
the present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.
The invention also provides a computer readable storage medium in which at least one program is stored; when executed by a processor, the program implements a face recognition model trained using the method steps of the embodiment of the present invention.
If the modules/units integrated in the computer unit are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment through a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for training a face recognition model, comprising:
S1, acquiring a training data set for face recognition;
S2, splitting the acquired training data set into N data sets, wherein N is a natural number greater than 1;
S3, inputting each of the N data sets into a backbone network model in a preset face recognition model respectively to obtain a face backbone model feature value of each of the N data sets;
S4, inputting the obtained face backbone model feature value of each data set into a preset classification model respectively to obtain a classification feature value of each data set;
S5, performing back propagation according to the loss function value of the classification model, and updating the classification feature value of each data set respectively;
and S6, after the classification feature value of each data set has been updated, combining the loss function values of the N data sets, performing back propagation according to the combined loss function value, and updating the backbone model feature value of each data set.
2. The training method according to claim 1, characterized in that in step S2 the training data set is split evenly into the N data sets by data volume.
3. The training method of claim 1, wherein the classification model comprises a Linear layer and a Softmax layer.
4. The training method according to claim 1, characterized in that combining the loss function values of the N data sets in step S6 comprises:
averaging the loss function values of the N data sets to obtain the combined loss function value.
5. The training method according to claim 1, characterized in that the learning rate of the backbone model is set larger than the learning rate of the classification model, wherein the multiple by which the backbone learning rate exceeds the classification-model learning rate is related to the value of N.
6. The training method according to claim 1, characterized in that in step S2 the training data set is split according to the different tasks to be completed, with different tasks using different data sets.
7. A face recognition device comprising a memory and a processor, the memory storing at least one program, the at least one program being executable by the processor to implement the training method of any one of claims 1 to 6.
8. A face recognition device comprising a memory and a processor, the memory storing at least one program, the at least one program being executable by the processor to implement a face recognition model trained using the training method of any one of claims 1 to 6.
9. A computer readable storage medium, characterized in that at least one program is stored in the storage medium, which is executed by a processor to implement the training method according to any of claims 1 to 6.
10. A computer readable storage medium having stored therein at least one program that is executed by a processor to implement a face recognition model trained using the training method of any of claims 1 to 6.
CN202310025749.0A 2023-01-09 2023-01-09 Training method and device for face recognition model and storage medium Pending CN116012918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310025749.0A CN116012918A (en) 2023-01-09 2023-01-09 Training method and device for face recognition model and storage medium


Publications (1)

Publication Number Publication Date
CN116012918A true CN116012918A (en) 2023-04-25

Family

ID=86028288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310025749.0A Pending CN116012918A (en) 2023-01-09 2023-01-09 Training method and device for face recognition model and storage medium

Country Status (1)

Country Link
CN (1) CN116012918A (en)

Similar Documents

Publication Publication Date Title
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
US11062455B2 (en) Data filtering of image stacks and video streams
CN110427802B (en) AU detection method and device, electronic equipment and storage medium
US11694477B1 (en) Efficient distributed trainer with gradient accumulation on sampled weight for deep neural networks in facial recognition
CN108197656A (en) A kind of attribute reduction method based on CUDA
CN112132279A (en) Convolutional neural network model compression method, device, equipment and storage medium
CN112861659A (en) Image model training method and device, electronic equipment and storage medium
US11276249B2 (en) Method and system for video action classification by mixing 2D and 3D features
CN114398473A (en) Enterprise portrait generation method, device, server and storage medium
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
CN113705598A (en) Data classification method and device and electronic equipment
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
CN114333062A (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN113705215A (en) Meta-learning-based large-scale multi-label text classification method
CN114723652A (en) Cell density determination method, cell density determination device, electronic apparatus, and storage medium
CN111191584A (en) Face recognition method and device
CN107967496B (en) Image feature matching method based on geometric constraint and GPU cascade hash
CN116311455A (en) Expression recognition method based on improved Mobile-former
CN116012918A (en) Training method and device for face recognition model and storage medium
CN114638845A (en) Quantum image segmentation method and device based on double thresholds and storage medium
CN112804446B (en) Big data processing method and device based on cloud platform big data
CN111091198B (en) Data processing method and device
CN114332561A (en) Super-resolution model training method, device, equipment and medium
CN115587297A (en) Method, apparatus, device and medium for constructing image recognition model and image recognition
CN113724261A (en) Fast image composition method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination