CN114511897A - Identity recognition method, system, storage medium and server - Google Patents

Identity recognition method, system, storage medium and server

Info

Publication number
CN114511897A
CN114511897A (application CN202111576261.4A)
Authority
CN
China
Prior art keywords
face image
identity
identity recognition
trained
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111576261.4A
Other languages
Chinese (zh)
Inventor
刘跃
孙胜
王孝明
穆涛涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Terminus Technology Group Co Ltd
Original Assignee
Terminus Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Terminus Technology Group Co Ltd filed Critical Terminus Technology Group Co Ltd
Priority to CN202111576261.4A priority Critical patent/CN114511897A/en
Publication of CN114511897A publication Critical patent/CN114511897A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The invention discloses an identity recognition method, system, storage medium and server, applied to an edge server. The identity recognition method includes: receiving a face image to be queried that an intelligent camera sends to the edge server; inputting the face image into a pre-trained identity recognition model and outputting a target feature vector corresponding to the face image, where the model parameters of the pre-trained identity recognition model are adjusted based on the face image; and determining the identity of the face image based on the target feature vector. Because the face images sent by the intelligent camera are used to construct triplets for adjusting the model parameters online, the model completes adaptive learning without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes.

Description

Identity recognition method, system, storage medium and server
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an identity recognition method, an identity recognition system, a storage medium and a server.
Background
In recent years, identity recognition algorithms based on deep learning have diversified and are widely applied in fields such as face recognition and pedestrian re-identification. Taking face recognition as an example, algorithms from the earlier FaceNet to the currently popular InsightFace are trained and tested on public data sets. Despite the continuous improvement of benchmark performance, face recognition models trained and tested on public data sets often suffer accuracy drops of varying degrees in actual use. This is usually because the lighting conditions, degree of blur, lens angle, and other conditions of the actual deployment scene differ greatly from the public training set, so the robustness of face recognition is insufficient and the model needs to be retrained.
In the prior art, retraining a model requires workers to statically collect data on targets in the deployment scene and label the data item by item before the model can be optimized. Data collection and labeling are difficult, require professionals, and therefore consume considerable manpower and material resources; manual labeling is also error-prone, and collecting data in public scenes raises privacy and safety concerns. As a result, model robustness suffers.
Disclosure of Invention
The embodiment of the application provides an identity recognition method, an identity recognition system, a storage medium and a server. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides an identity recognition method, which is applied to an edge server, and the method includes:
receiving a face image to be queried that an intelligent camera sends to the edge server;
inputting the face image into a pre-trained identity recognition model, and outputting a target feature vector corresponding to the face image, wherein the model parameters of the pre-trained identity recognition model are adjusted based on triplets generated from the face image;
and determining the identity of the face image based on the target feature vector.
Optionally, determining the identity of the face image based on the target feature vector includes:
calculating the cosine value between the target feature vector and each feature vector in a pre-generated image feature library to generate a set of cosine values;
and when a cosine value smaller than a preset threshold exists in the set, determining the identity of the library image corresponding to that cosine value as the identity of the face image.
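The comparison described above can be sketched as follows (a minimal illustration only: the feature library contents, the threshold value, and the interpretation of the "cosine value" as a cosine distance, where smaller means more similar, are assumptions, not details taken from this disclosure):

```python
import math

def cosine_distance(a, b):
    # Cosine distance: 1 - cos(theta); smaller values mean more similar vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def identify(target_vec, feature_library, threshold=0.4):
    """Return the identity whose stored vector is closest to target_vec,
    or None when no distance falls below the (assumed) threshold."""
    distances = {identity: cosine_distance(target_vec, vec)
                 for identity, vec in feature_library.items()}
    best_id, best_dist = min(distances.items(), key=lambda kv: kv[1])
    return best_id if best_dist < threshold else None

# Toy 3-dimensional "feature library" keyed by identity ID (illustrative only).
library = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
}
print(identify([0.9, 0.1, 0.0], library))  # person_a
```

A query vector that is not close enough to any stored vector would return `None`, which would correspond to an unknown face.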
Optionally, the image feature library is generated according to the following steps:
creating a database;
acquiring a set of face images to be compared, inputting the face images one by one into a pre-trained feature extraction network, and outputting a feature vector for each face image;
and mapping each face image's feature vector to its corresponding face image, and storing the association in the database to generate the image feature library.
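The library-building steps above can be sketched roughly as follows (the `extract_features` callable stands in for the pre-trained feature extraction network, and the dictionary-backed "database" is an assumption for illustration):

```python
def build_feature_library(face_images, extract_features):
    """Map each face image to its feature vector so that images and
    vectors stay associated, forming the image feature library."""
    library = {}
    for image_id, image in face_images.items():
        library[image_id] = extract_features(image)
    return library

# Stand-in extractor: a real deployment would call the trained network here.
def dummy_extractor(image):
    total = sum(image)
    return [pixel / total for pixel in image]  # crude normalised "feature"

images = {"img_001": [2, 2, 4], "img_002": [1, 3, 4]}
library = build_feature_library(images, dummy_extractor)
print(library["img_001"])  # [0.25, 0.25, 0.5]
```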
Optionally, the method further comprises:
initializing a pre-trained identity recognition model;
constructing a plurality of triplets based on the face image;
inputting the plurality of triplets into the pre-trained identity recognition model, and outputting a loss value;
and determining whether to adjust model parameters of the pre-trained identity recognition model based on the loss value.
Optionally, constructing a plurality of triplets based on the face image includes:
inputting the face images into a pre-trained identity recognition model one by one, and outputting a feature vector of each face image;
determining the identity ID of each face according to the feature vector of each face;
determining an anchor sample, a positive sample and a negative sample according to the identity ID of each face, wherein the identity IDs of the anchor sample and the positive sample are the same, and the identity ID of the negative sample differs from that of the anchor sample and the positive sample;
and constructing a plurality of triplets from the anchor samples, positive samples and negative samples.
Optionally, determining whether to adjust a model parameter of the pre-trained identity recognition model based on the loss value includes:
and when the loss value has not reached its minimum, adjusting the model parameters of the pre-trained identity recognition model, and continuing to execute the step of receiving the face image to be queried that the intelligent camera sends to the edge server, until the loss value reaches its minimum.
The method further comprises the following steps:
when the loss value reaches its minimum, the model parameters of the pre-trained identity recognition model are not adjusted.
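The loss-driven adjustment logic can be illustrated as follows (a sketch under the assumption that "the loss reaches its minimum" is detected as the loss no longer decreasing; `compute_loss` and `adjust_params` are hypothetical hooks into the real training step):

```python
def fine_tune(compute_loss, adjust_params, max_rounds=100, tol=1e-6):
    """Adjust model parameters until the loss stops decreasing -- a practical
    stand-in for the disclosure's 'loss value reaches the minimum' test."""
    prev_loss = float("inf")
    for round_idx in range(max_rounds):
        loss = compute_loss()
        if prev_loss - loss < tol:  # no meaningful improvement: keep parameters
            return round_idx, loss
        adjust_params()             # otherwise adjust and try another round
        prev_loss = loss
    return max_rounds, prev_loss

# Toy loss that halves per adjustment, to show the stopping behaviour.
state = {"loss": 1.0}
rounds, final = fine_tune(lambda: state["loss"],
                          lambda: state.update(loss=state["loss"] * 0.5))
print(rounds, final)  # stops once successive losses differ by < 1e-6
```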
In a second aspect, an embodiment of the present application provides an identity recognition system, which is applied to an edge server, and the system includes:
the face image receiving module is used for receiving a face image to be queried that the intelligent camera sends to the edge server;
the image input module is used for inputting the face image into a pre-trained identity recognition model and outputting a target feature vector corresponding to the face image, wherein the model parameters of the pre-trained identity recognition model are adjusted based on triplets generated from the face image;
and the identity identification determining module is used for determining the identity identification of the face image based on the target feature vector.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a server, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, the edge server first receives a face image to be queried that the intelligent camera sends to the edge server, then inputs the face image into a pre-trained identity recognition model and outputs a target feature vector corresponding to the face image, where the model parameters of the pre-trained identity recognition model are adjusted based on the face image; finally, the identity of the face image is determined based on the target feature vector. Because the face images sent by the intelligent camera are used to construct triplets for adjusting the model parameters online, the model completes adaptive learning without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of an identity recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a triplet construction provided in an embodiment of the present application;
fig. 3 is a schematic diagram of an identity recognition process provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another identity recognition method provided in the embodiment of the present application;
fig. 5 is a schematic flowchart of another identity recognition method provided in the embodiment of the present application;
fig. 6 is a system architecture design diagram of identity recognition provided in an embodiment of the present application;
fig. 7 is a schematic resource allocation diagram provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an identification system according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 10 is a schematic diagram of a storage medium according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific context. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The present application provides an identity recognition method, system, storage medium and server to solve the above-mentioned problems in the related art. In the technical scheme provided by the application, a plurality of triplets are constructed from the face images sent by the intelligent camera so that the model parameters can be adjusted online; the model can therefore learn adaptively without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes. The following exemplary embodiments describe this in detail.
The identity recognition method provided by the embodiment of the present application will be described in detail below with reference to fig. 1 to 7. The method may be implemented by a computer program running on a von Neumann-based identity recognition system. The computer program may be integrated into an application or may run as a separate tool-like application.
Referring to fig. 1, a schematic flow chart of an identity recognition method applied to an edge server is provided in the embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
s101, receiving a face image to be inquired sent by an intelligent camera aiming at an edge server;
the intelligent camera is a monitoring device capable of performing target detection, target alignment and target tracking in real time.
Generally, the intelligent camera first acquires a video sequence of the current monitoring scene in real time, then performs face detection and face alignment on the video sequence to obtain processed target face images, tracks the processed face images through a tracking algorithm to generate a face image set for each target face, and finally sends the face images in the set to the edge server in sequence.
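The camera-side flow just described might be sketched as follows (the detector, aligner, and tracker are placeholder callables; a real smart camera would use its own detection, alignment, and tracking implementations):

```python
def camera_pipeline(video_frames, detect_faces, align_face, tracker):
    """Per-frame pipeline on the camera: detect, align, then group aligned
    faces into per-target sequences via the tracker; the returned sequences
    are what would be sent on to the edge server."""
    sequences = {}
    for frame in video_frames:
        for box in detect_faces(frame):
            face = align_face(frame, box)
            track_id = tracker(box)  # assumed hook into the tracking algorithm
            sequences.setdefault(track_id, []).append(face)
    return sequences

# Minimal stand-ins so the flow is runnable end to end.
frames = [{"faces": [(0, "a")]}, {"faces": [(0, "a2"), (1, "b")]}]
detect = lambda f: f["faces"]
align = lambda f, box: box[1].upper()   # "alignment" as a trivial transform
track = lambda box: box[0]              # track ID carried in the box itself
result = camera_pipeline(frames, detect, align, track)
print(result)  # {0: ['A', 'A2'], 1: ['B']}
```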
In a possible implementation manner, the edge server receives, in real time, the face images that the intelligent camera sends to it.
S102, inputting the face image into a pre-trained identity recognition model, and outputting a target feature vector corresponding to the face image;
the model parameters of the pre-trained identity recognition model are adjusted based on the triples, and the triples are generated according to the face images.
In the embodiment of the application, when the parameters of the pre-trained identity recognition model are adjusted, the model is first initialized; then a plurality of triplets are constructed based on the face image; next, the triplets are input into the pre-trained identity recognition model and a loss value is output; finally, whether to adjust the model parameters of the pre-trained identity recognition model is determined based on the loss value.
Specifically, the pre-trained identity recognition model is trained on the cloud server; after training, the cloud server sends the model to the edge server, and the edge server receives and deploys the pre-trained identity recognition model sent by the cloud server.
In the embodiment of the application, when the cloud server trains the model, it first constructs an identity recognition model and then acquires the MegaFace face data set, which contains one million pictures representing 690,000 unique people. The MegaFace data set is input into the identity recognition model for training; after training finishes, the pre-trained identity recognition model is obtained, and the cloud server sends it to the edge server.
Further, when a plurality of triplets are constructed based on the face images, the face images are first input one by one into the pre-trained identity recognition model, which outputs a feature vector for each face image; the identity ID of each face is then determined from its feature vector; next, anchor samples, positive samples and negative samples are determined according to the identity IDs, where the anchor sample and the positive sample share the same identity ID and the negative sample's identity ID differs from both; finally, a plurality of triplets are constructed from the anchor, positive and negative samples.
Specifically, a triplet is a group of three pictures containing target objects: an anchor sample, a positive sample, and a negative sample. The anchor sample is any target-object picture; the positive sample shares the anchor's ID, while the negative sample belongs to a different ID. Triplet-Loss (that is, the model's loss function is a triplet loss) is used during training to shorten the distance between the output feature vectors of the anchor and positive samples while increasing the feature-vector distance between the anchor and negative samples.
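A minimal version of the triplet loss described here can be written as follows (using squared Euclidean distance and an assumed margin of 0.2; the disclosure does not state the distance metric or margin):

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: push the anchor-positive distance below the
    anchor-negative distance by at least `margin`; zero once satisfied."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

# Well-separated triplet: loss is zero.
print(triplet_loss([0, 0], [0.1, 0], [1, 0]))       # 0.0
# Violating triplet: negative closer than positive, so loss is positive.
print(triplet_loss([0, 0], [1, 0], [0.1, 0]) > 0)   # True
```

Minimizing this quantity over many triplets is what pulls same-ID feature vectors together and pushes different-ID vectors apart.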
As shown in fig. 2, the present application exploits the temporal and spatial continuity of video: target sequences obtained by tracking across different frames belong to the same ID, while targets at different positions within the same frame belong to different IDs. From these two priors, many triplets can be generated from a video sequence containing target objects. To improve efficiency, after selecting an anchor sample, the object whose feature vector is farthest from the anchor within the same tracked sequence is chosen as the positive sample, and the object whose feature vector is closest to the anchor within the same frame is chosen as the negative sample, thereby forming a triplet.
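The hardest-example mining described above could look roughly like this (an illustrative reconstruction, not the disclosed implementation; the data layout — per-track sequences of (frame, feature) pairs and per-frame lists of (track, feature) pairs — is assumed):

```python
def mine_triplets(sequences, frames, distance):
    """Per anchor: farthest same-sequence sample as the positive (same ID by
    tracking continuity), closest same-frame other-track sample as negative."""
    triplets = []
    for track_id, seq in sequences.items():
        for a_idx, (frame_idx, anchor) in enumerate(seq):
            # Hardest positive: same tracked sequence, maximum feature distance.
            positives = [v for j, (_, v) in enumerate(seq) if j != a_idx]
            # Hardest negative: same frame, different track, minimum distance.
            negatives = [v for tid, v in frames.get(frame_idx, [])
                         if tid != track_id]
            if not positives or not negatives:
                continue
            positive = max(positives, key=lambda v: distance(anchor, v))
            negative = min(negatives, key=lambda v: distance(anchor, v))
            triplets.append((anchor, positive, negative))
    return triplets

# Two tracked targets seen in two frames; 1-D "features" keep the math obvious.
sequences = {0: [(0, (0.0,)), (1, (0.2,))],
             1: [(0, (1.0,)), (1, (1.1,))]}
frames = {0: [(0, (0.0,)), (1, (1.0,))],
          1: [(0, (0.2,)), (1, (1.1,))]}
triplets = mine_triplets(sequences, frames, lambda a, b: abs(a[0] - b[0]))
print(len(triplets))  # 4
```

Each tracked detection yields one triplet here, since every anchor has a same-sequence positive and a same-frame negative available.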
Further, when determining whether to adjust the model parameters of the pre-trained identity recognition model based on the loss value: if the loss value has not reached its minimum, the model parameters are adjusted, and the step of receiving the face image to be queried that the intelligent camera sends to the edge server continues to be executed until the loss value reaches its minimum.
Alternatively, when the loss value reaches its minimum, the model parameters of the pre-trained identity recognition model are not adjusted; by default, the precision of the pre-trained identity recognition model has then reached its highest level, so the parameters are left unchanged.
In a possible implementation manner, after a face image sent by an intelligent camera is received, the face image is input into a pre-trained identity recognition model, and a target feature vector corresponding to the face image is output.
For example, as shown in fig. 3, fig. 3 is a schematic diagram of an identity recognition process provided in the present application. First, the existing MegaFace face data set is input into the recognition network on the cloud server for training to obtain a pre-trained identity recognition model, which the cloud server sends to the edge server for deployment. Second, the intelligent camera acquires a video sequence, processes it into a face image sequence, and sends the sequence to the edge server. Third, the edge server inputs the face image sequence into the deployed identity recognition model for recognition and extracts feature vectors. Fourth, triplets are generated from the recognition results and feature vectors, and the model is fine-tuned with these triplets.
And S103, determining the identity of the face image based on the target feature vector.
In a possible implementation manner, when the identity of the face image is determined based on the target feature vector, the cosine value between the target feature vector and each feature vector in a pre-generated image feature library is first calculated to generate a set of cosine values; when a cosine value smaller than a preset threshold exists in the set, the identity of the library image corresponding to that cosine value is determined as the identity of the face image.
Further, when the image feature library is generated, a database is first created; then a set of face images to be compared is acquired and input one by one into a pre-trained feature extraction network, which outputs a feature vector for each face image; finally, each feature vector is mapped to its corresponding face image and stored in the database to generate the image feature library.
In the embodiment of the application, the edge server first receives a face image to be queried that the intelligent camera sends to the edge server, then inputs the face image into a pre-trained identity recognition model and outputs a target feature vector corresponding to the face image, where the model parameters of the pre-trained identity recognition model are adjusted based on the face image; finally, the identity of the face image is determined based on the target feature vector. Because the face images sent by the intelligent camera are used to construct triplets for adjusting the model parameters online, the model completes adaptive learning without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes.
Referring to fig. 4, a schematic flow chart of another identity recognition method applied to an intelligent camera is provided in the embodiment of the present application. As shown in fig. 4, the method of the embodiment of the present application may include the following steps:
s201, acquiring images in a current monitoring scene in real time to obtain video image frames;
s202, detecting a human face from a video image frame, aligning the human face and obtaining a processed human face image;
and S203, tracking the processed face image through a tracking algorithm to generate a face image sequence of the target face image.
In the embodiment of the application, the edge server first receives a face image to be queried that the intelligent camera sends to the edge server, then inputs the face image into a pre-trained identity recognition model and outputs a target feature vector corresponding to the face image, where the model parameters of the pre-trained identity recognition model are adjusted based on the face image; finally, the identity of the face image is determined based on the target feature vector. Because the face images sent by the intelligent camera are used to construct triplets for adjusting the model parameters online, the model completes adaptive learning without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes.
Referring to fig. 5, a schematic flow chart of another identity recognition method applied to a cloud server is provided in the embodiment of the present application. As shown in fig. 5, the method of the embodiment of the present application may include the following steps:
s301, constructing an identity recognition model;
s302, acquiring a MegaFace face data set;
s303, inputting the MegaFace face data set into an identity recognition model for training, and obtaining a pre-trained identity recognition model after training is finished;
s304, sending the identity recognition model trained in advance to the edge server.
For example, as shown in fig. 6, fig. 6 is a design diagram of a system architecture for identity recognition provided in the present application, which designs a three-tier edge computing system. The first part, deployed on the cloud server, is responsible for pre-training the model on a large-scale public data set; the pre-trained model is sent through the core network to each local edge server as initial parameters. The second part, deployed on the edge servers, is responsible for identity recognition and online model fine-tuning; the edge servers are distributed, and each edge server handles identity recognition and online fine-tuning for one or more scenes. The third part, at the bottom layer of the schematic diagram, performs real-time target detection, alignment, and tracking on the intelligent cameras and transmits the acquired target data to the connected edge server.
For example, as shown in fig. 7, the present application further implements a scheduling algorithm on the edge server to arbitrate the resource occupation of online recognition and online model fine-tuning, improving the effective number of fine-tuning rounds while keeping forward-inference delay low. Because the hardware resources of the GPU server within the edge server are limited and model fine-tuning consumes substantial computing resources, the application first guarantees low recognition delay and then seeks higher fine-tuning efficiency. To this end, a context-aware scheduling strategy is designed for resource allocation. As shown in fig. 7, the fine-tuning training and the forward identity recognition computation belong to two different processes that occupy the same GPU hardware resources. Since the number of targets in the monitoring scene generally changes over time, when the number of targets the intelligent camera sends to the edge server is greater than a preset threshold, the edge server reduces the fine-tuning batch size (i.e., processes small batches) and allocates as many resources as possible to recognition; when the number of targets is smaller than the threshold, the fine-tuning batch size is increased and resources are allocated to fine-tuning. Through this dynamic scheduling, fine-tuning efficiency is increased while low recognition delay is guaranteed.
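The batch-size arbitration described here reduces to a simple rule (the threshold and batch-size bounds below are illustrative assumptions, not values from the disclosure):

```python
def schedule_batch_size(num_targets, threshold, min_batch=4, max_batch=64):
    """Context-aware resource split: many live targets -> shrink the
    fine-tuning batch so recognition gets the GPU; few targets -> grow it."""
    return min_batch if num_targets > threshold else max_batch

print(schedule_batch_size(num_targets=30, threshold=10))  # busy scene: 4
print(schedule_batch_size(num_targets=3, threshold=10))   # quiet scene: 64
```

In a real scheduler this value would be re-evaluated continuously as the camera reports the current target count.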
In the embodiment of the application, the edge server first receives a face image to be queried that the intelligent camera sends to the edge server, then inputs the face image into a pre-trained identity recognition model and outputs a target feature vector corresponding to the face image, where the model parameters of the pre-trained identity recognition model are adjusted based on the face image; finally, the identity of the face image is determined based on the target feature vector. Because the face images sent by the intelligent camera are used to construct triplets for adjusting the model parameters online, the model completes adaptive learning without manually labeled samples, which improves model robustness and avoids the privacy risks of collecting data in public scenes.
The following are embodiments of systems of the present invention that may be used to perform embodiments of methods of the present invention. For details which are not disclosed in the embodiments of the system of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 8, a schematic structural diagram of an identification system according to an exemplary embodiment of the present invention is shown. The identification system may be implemented as all or part of the terminal in software, hardware or a combination of both. The system 1 comprises a face image receiving module 10, an image input module 20 and an identity identification determining module 30.
The face image receiving module 10 is used for receiving a face image to be queried that the intelligent camera sends to the edge server;
the image input module 20 is configured to input a face image into a pre-trained identity recognition model and output a target feature vector corresponding to the face image, wherein the model parameters of the pre-trained identity recognition model are adjusted based on triplets generated from the face image;
and the identity identification determining module 30 is used for determining the identity identification of the face image based on the target feature vector.
It should be noted that when the identity recognition system provided in the foregoing embodiment executes the identity recognition method, the division into the above functional modules is merely illustrative; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the identity recognition system and the identity recognition method provided by the above embodiments belong to the same concept; for the detailed implementation process, refer to the method embodiments, which are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The present invention also provides a computer readable medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the identity recognition method provided by the above-mentioned method embodiments.
The present invention also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of identification of the various method embodiments described above.
Please refer to fig. 9, which provides a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 9, the server includes a processor, a medium, a memory, and a network interface connected through a system bus. The server medium stores an operating system, a database, and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, cause the processor to implement an identity recognition method. The processor of the server provides computing and control capabilities to support the operation of the entire device. The memory of the server may store computer readable instructions which, when executed by the processor, cause the processor to perform the identity recognition method. The network interface of the server is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only part of the configuration associated with the present application and does not limit the devices to which the present application applies; a particular device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components. The medium is a readable storage medium. The processor in the server, when executing the computer program, implements the following steps:
receiving a face image to be queried, sent by an intelligent camera for an edge server;
inputting the face image into a pre-trained identity recognition model, and outputting a target feature vector corresponding to the face image, wherein the model parameters of the pre-trained identity recognition model are adjusted based on triples, and the triples are generated according to the face image;
and determining the identity of the face image based on the target feature vector.
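The three steps above can be sketched end to end. The function and parameter names below (`identify`, `model`, `match`) are illustrative assumptions, not names from the patent; `model` stands in for the pre-trained identity recognition model and `match` for the feature-library lookup described next.

```python
def identify(face_image, model, match):
    """End-to-end sketch of the edge-server pipeline: extract a target
    feature vector with the pre-trained identity recognition model, then
    resolve the identity from that vector via a library lookup."""
    target_vector = model(face_image)  # step 2: output the target feature vector
    return match(target_vector)        # step 3: determine the identity
```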
In one embodiment, when determining the identity of the face image based on the target feature vector, the processor specifically performs the following operations:
calculating cosine values between the target feature vector and each feature vector in a pre-generated image feature library to generate a cosine value set;
and when a cosine value smaller than the preset threshold exists in the cosine value set, determining the identity of the face image corresponding to that cosine value as the identity of the face image to be queried.
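A minimal sketch of this matching step follows. Because the patent treats a cosine value *below* the preset threshold as a match, the value is interpreted here as a cosine distance (one minus cosine similarity); the library layout, function name, and threshold value are illustrative assumptions.

```python
import numpy as np

def match_identity(target_vec, feature_library, threshold=0.3):
    """Match a target feature vector against a pre-generated feature library.

    `feature_library` maps identity IDs to feature vectors (an assumed
    layout). Returns the identity whose cosine distance to the target is
    smallest, provided it falls below the preset threshold, else None.
    """
    best_id, best_dist = None, float("inf")
    for identity, vec in feature_library.items():
        cos_sim = np.dot(target_vec, vec) / (
            np.linalg.norm(target_vec) * np.linalg.norm(vec)
        )
        cos_dist = 1.0 - cos_sim  # smaller means more similar
        if cos_dist < threshold and cos_dist < best_dist:
            best_id, best_dist = identity, cos_dist
    return best_id
```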
In one embodiment, the processor performs the following operations in generating the image feature library:
creating a database;
acquiring a set of face images to be compared, inputting them one by one into a pre-trained feature extraction network, and outputting a feature vector for each face image;
and mapping and associating the feature vector of each face image with the face image corresponding to the feature vector, and storing the feature vector into a database to generate an image feature library.
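The library-generation steps above can be sketched as follows. The SQLite schema, the `build_feature_library` name, and the `extract_features` callable (standing in for the pre-trained feature extraction network) are all illustrative assumptions rather than details from the patent.

```python
import sqlite3
import numpy as np

def build_feature_library(face_images, extract_features, db_path=":memory:"):
    """Create a database, run each face image through the feature
    extraction network, and store the image-to-vector mapping."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features (image_id TEXT PRIMARY KEY, vector BLOB)"
    )
    for image_id, image in face_images.items():
        vec = np.asarray(extract_features(image), dtype=np.float32)
        # associate the feature vector with its source image
        conn.execute(
            "INSERT OR REPLACE INTO features VALUES (?, ?)",
            (image_id, vec.tobytes()),
        )
    conn.commit()
    return conn
```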
In one embodiment, the processor further performs the following:
initializing a pre-trained identity recognition model;
constructing a plurality of triples based on the face image;
inputting a plurality of triples into a pre-trained identity recognition model, and outputting a loss value;
and determining whether to adjust model parameters of the pre-trained identity recognition model based on the loss value.
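The loss computed over the triples is, in standard face-recognition practice, a triplet loss; the sketch below assumes that formulation and an assumed margin value, and mirrors the adjust-or-not decision on the loss value.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward the positive sample
    and push it away from the negative sample by at least `margin`
    (the margin value is an illustrative assumption)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

def should_adjust(loss_value, tolerance=1e-6):
    """Adjust model parameters only while the loss has not reached its
    minimum (zero for triplet loss), as in the embodiment above."""
    return bool(loss_value > tolerance)
```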
In one embodiment, the processor, when constructing the plurality of triples based on the face image, specifically performs the following operations:
inputting the face images into a pre-trained identity recognition model one by one, and outputting a feature vector of each face image;
determining the identity ID of each face according to the feature vector of each face;
determining an anchor sample, a positive sample and a negative sample according to the identity ID of each face, wherein the anchor sample and the positive sample have the same identity ID, the anchor sample and the negative sample have different identity IDs, and the positive sample and the negative sample have different identity IDs;
and constructing a plurality of triples according to the anchor sample, the positive sample and the negative sample.
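The construction steps above can be sketched as follows. `samples` is an assumed structure mapping each pseudo-labelled identity ID (assigned from feature-vector similarity) to the face images given that ID; the function name and the random choice of negatives are illustrative assumptions.

```python
import itertools
import random

def construct_triplets(samples):
    """Build (anchor, positive, negative) triples: anchor and positive
    share an identity ID, while the negative carries a different ID."""
    triplets = []
    ids = list(samples)
    for identity in ids:
        images = samples[identity]
        # all images belonging to any other identity are negative candidates
        others = [img for other in ids if other != identity for img in samples[other]]
        if len(images) < 2 or not others:
            continue  # need two same-ID images and at least one different-ID image
        for anchor, positive in itertools.permutations(images, 2):
            negative = random.choice(others)
            triplets.append((anchor, positive, negative))
    return triplets
```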
In one embodiment, the processor, when determining whether to adjust the model parameters of the pre-trained identity recognition model based on the loss value, specifically performs the following operations:
and when the loss value has not reached the minimum value, adjusting the model parameters of the pre-trained identity recognition model, and continuing to execute the step of receiving the face image to be queried sent by the intelligent camera for the edge server, until the loss value reaches the minimum value.
In one embodiment, the processor further performs the following:
when the loss value reaches the minimum value, the model parameters of the identity recognition model trained in advance are not adjusted.
Referring to fig. 10, the computer-readable storage medium is an optical disc 30 on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the identity recognition method provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the identity recognition method provided by the embodiment of the present application have the same beneficial effects as the method adopted, operated or implemented by the application program stored in the computer-readable storage medium.
The above description is only for the preferred embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing associated hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims (10)

1. An identity recognition method applied to an edge server, the method comprising:
receiving a face image to be queried, sent by an intelligent camera for the edge server;
inputting the face image into a pre-trained identity recognition model, and outputting a target feature vector corresponding to the face image; the model parameters of the pre-trained identity recognition model are adjusted based on triples, and the triples are generated according to the face image;
and determining the identity of the face image based on the target feature vector.
2. The method of claim 1, wherein the determining the identity of the face image based on the target feature vector comprises:
calculating cosine values between the target feature vector and each feature vector in a pre-generated image feature library to generate a cosine value set;
and when the cosine value smaller than the preset threshold value exists in the cosine value set, determining the identity of the face image corresponding to the cosine value smaller than the preset threshold value as the identity of the face image.
3. The method of claim 2, wherein generating the image feature library comprises:
creating a database;
acquiring a face image set to be compared, inputting the face image set into a pre-trained feature extraction network one by one, and outputting a feature vector of each face image;
and mapping and associating the feature vector of each face image with the face image corresponding to the feature vector, and storing the feature vector into the database to generate an image feature library.
4. The method of claim 1, further comprising:
initializing a pre-trained identity recognition model;
constructing a plurality of triples based on the face image;
inputting the triples into a pre-trained identity recognition model, and outputting a loss value;
determining whether to adjust model parameters of the pre-trained identity recognition model based on the loss value.
5. The method of claim 4, wherein constructing the plurality of triples based on the face image comprises:
inputting the face images into a pre-trained identity recognition model one by one, and outputting a feature vector of each face image;
determining the identity ID of each face according to the feature vector of each face;
determining an anchor sample, a positive sample and a negative sample according to the identity ID of each face; wherein the anchor sample and the positive sample have the same identity ID, the anchor sample and the negative sample have different identity IDs, and the positive sample and the negative sample have different identity IDs;
and constructing a plurality of triples according to the anchor point sample, the positive sample and the negative sample.
6. The method of claim 4, wherein the determining whether to adjust model parameters of the pre-trained identity recognition model based on the loss value comprises:
and when the loss value has not reached the minimum value, adjusting the model parameters of the pre-trained identity recognition model, and continuing to execute the step of receiving the face image to be queried sent by the intelligent camera for the edge server, until the loss value reaches the minimum value.
7. The method of claim 6, further comprising:
and when the loss value reaches the minimum value, the model parameters of the pre-trained identity recognition model are not adjusted.
8. An identity recognition system, applied to an edge server, the system comprising:
the face image receiving module is used for receiving a face image to be inquired sent by the intelligent camera aiming at the edge server;
the image input module is used for inputting the face image into a pre-trained identity recognition model and outputting a target feature vector corresponding to the face image; the model parameters of the pre-trained identity recognition model are adjusted based on triples, and the triples are generated according to the face image;
and the identity identification determining module is used for determining the identity identification of the face image based on the target feature vector.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-7.
10. A server, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202111576261.4A 2021-12-21 2021-12-21 Identity recognition method, system, storage medium and server Pending CN114511897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111576261.4A CN114511897A (en) 2021-12-21 2021-12-21 Identity recognition method, system, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111576261.4A CN114511897A (en) 2021-12-21 2021-12-21 Identity recognition method, system, storage medium and server

Publications (1)

Publication Number Publication Date
CN114511897A true CN114511897A (en) 2022-05-17

Family

ID=81548131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111576261.4A Pending CN114511897A (en) 2021-12-21 2021-12-21 Identity recognition method, system, storage medium and server

Country Status (1)

Country Link
CN (1) CN114511897A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129227A (en) * 2023-04-12 2023-05-16 合肥的卢深视科技有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN116129227B (en) * 2023-04-12 2023-09-01 合肥的卢深视科技有限公司 Model training method, device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10559062B2 (en) Method for automatic facial impression transformation, recording medium and device for performing the method
CN108921782B (en) Image processing method, device and storage medium
US11200404B2 (en) Feature point positioning method, storage medium, and computer device
US10936911B2 (en) Logo detection
CN112889108B (en) Speech classification using audiovisual data
CN112232293B (en) Image processing model training method, image processing method and related equipment
US8855369B2 (en) Self learning face recognition using depth based tracking for database generation and update
CN107798354B (en) Image clustering method and device based on face image and storage equipment
CN109035246B (en) Face image selection method and device
KR20180105876A (en) Method for tracking image in real time considering both color and shape at the same time and apparatus therefor
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN110533694B (en) Image processing method, device, terminal and storage medium
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN111783997B (en) Data processing method, device and equipment
CN110648289A (en) Image denoising processing method and device
CN110473227A (en) Method for tracking target, device, equipment and storage medium
JP6903117B2 (en) Face recognition methods, facial recognition devices, and computer-readable non-temporary media
CN110149476A (en) A kind of time-lapse photography method, apparatus, system and terminal device
CN114511897A (en) Identity recognition method, system, storage medium and server
CN112700568B (en) Identity authentication method, equipment and computer readable storage medium
CN114399424A (en) Model training method and related equipment
CN111079535B (en) Human skeleton action recognition method and device and terminal
CN111050027B (en) Lens distortion compensation method, device, equipment and storage medium
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination