CN112307234A - Face bottom library synthesis method, system, device and storage medium - Google Patents
Face base library synthesis method, system, device and storage medium
Info
- Publication number: CN112307234A
- Application number: CN202011210340.9A
- Authority
- CN
- China
- Prior art keywords
- face
- base
- library
- synthesizing
- updating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
Abstract
The invention relates to a method, system, device and storage medium for synthesizing a face base library, the method comprising: step S1: collecting face data of community users; step S2: judging, for each user, whether the face base library update condition is met; step S3: synthesizing the collected face data into base face data through a convolutional neural network model; step S4: updating the face base library. By collecting pictures of the same person from different devices over different periods and synthesizing a face through a deep learning convolutional neural network model to serve as the comparison base image, the method can optimize the update and selection strategy of the face base library, and greatly improves the accuracy of the face algorithm across different devices in complex scenes without degrading the original speed or increasing complexity.
Description
[ technical field ]
The invention belongs to the field of computer technology, and particularly relates to a face base library synthesis method, system, device and storage medium.
[ background of the invention ]
As a key organic unit of social and economic development, the community is a society in miniature. The intelligent security community combines artificial intelligence, big data, Internet of Things and other technologies to create a multi-dimensionally perceptive, multi-dimensionally linked community security system; it has become a new mode for exploring social governance in many places and is in line with the development trend of modern science and technology. Face recognition is a well-established artificial intelligence technology and occupies a crucial position in various intelligent community solutions. In a people-oriented society, wherever there are people, their daily lives must be considered, and the intelligent community is built to provide community residents with a safer, more comfortable and more convenient modern living environment. This requires new-generation information technologies, of which face recognition is one. At present, face recognition plays an important role in video monitoring of community entrances and exits (including gates, unit doors, parking lot entrances and exits, and elevators), community visitor systems, community perimeters, open passages, buildings and the like.
Face recognition is an important task of artificial intelligence in the field of intelligent perception and has great practical value. In recent years, with the rapid development of artificial intelligence and the gradual popularization of video monitoring systems, face recognition research has become highly significant in the security field. As one of the important problems of computer vision, face recognition is a biometric technology that authenticates identity based on a person's facial features. Face recognition under constrained scenes has reached commercial maturity, but in the varied natural scenes of a community, for example under monitoring, a face is affected by many factors such as expression, angle, illumination, resolution and age change, which makes recognition difficult.
Recognizing face images acquired in complex, varied scenes remains challenging. To promote high-quality recognition and monitoring by face recognition cameras during face snapshot, comparison and similar processes, many face recognition manufacturers have invested much effort in product software, hardware and algorithm strategy. Device manufacturers have launched front-end intelligent video products such as adaptive dynamic exposure, starlight night vision, ultra-low-illumination face snapshot, ultra-strong backlight processing and automatic face tracking. On the algorithm side, large manufacturers use transfer learning to train, for different scenes, multiple models adapted to blur, low illumination and complex angles, or one general large model covering all scenes. Scene application engineers may optimize or transform the faces collected at the front end according to the actual scene, and some software implementation engineers enter multiple base faces of the same person for the different actual scenes. Intelligent community face recognition mainly compares and retrieves the faces captured by different devices against a pre-stored base face library, so the quality of the base pictures has a great influence on the recognition accuracy, the false recognition rate, and the search and comparison speed.
In existing intelligent community face recognition projects, hardware manufacturers launch corresponding products for complex scenes, which solves the quality problem of complex-scene video images to a certain extent, but purchasing new equipment to retrofit old communities is not the optimal scheme. Training migrated face models for different scenes, processing the collected faces with front-end algorithms, or entering base images of the same person for different scenes makes the project more complicated in an actual community, prolongs the actual processing time and reduces efficiency. The invention provides a face base library updating method, device and storage medium: by collecting photos of the same person from different devices over different periods and synthesizing a face through a deep learning convolutional neural network model to serve as the comparison base image, the update and selection strategy of the face base library can be optimized, and the accuracy of the face algorithm across different devices in complex scenes is greatly improved without reducing the original performance speed or increasing complexity. A method of storing base library images based on ranking values is also provided, which reserves a storage position for the images truly closest to the user during comparison and avoids updating the synthesized image every time, so that the base library keeps the face images most relevant to time, device, user change and environment change while reducing the number of comparisons.
[ summary of the invention ]
In order to solve the above problems in the prior art, the present invention provides a face base library synthesis method and system, the method comprising:
step S1: collecting face data of community users;
step S2: judging, for each user, whether the face base library update condition is met;
step S3: synthesizing the collected face data into base face data through a convolutional neural network model;
step S4: updating the face base library.
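The four steps above can be sketched as a minimal pipeline. Everything here is illustrative rather than from the patent: the function names, the toy two-value "images", the 0.65 thresholds (borrowed from the preferred embodiment described later), and the pixel average standing in for the convolutional synthesis model.

```python
import statistics

def collect_faces(user_id):
    # Step S1 stand-in: in practice, face crops come from different
    # cameras at different times; here we return fixed toy "images".
    return [[0.0, 1.0], [1.0, 0.0]]

def update_condition_met(similarities, mean_thresh=0.65, std_thresh=0.65):
    # Step S2: trigger an update when recent snapshot similarities are
    # low on average or widely spread (thresholds are illustrative).
    return (statistics.fmean(similarities) < mean_thresh
            or statistics.pstdev(similarities) > std_thresh)

def synthesize(faces):
    # Step S3 placeholder: the patent uses a two-channel adversarial CNN;
    # a pixel-wise average stands in for it here.
    return [sum(px) / len(faces) for px in zip(*faces)]

def update_base_library(library, user_id, face):
    # Step S4: append the synthesized face, keeping the association
    # with the user identifier instead of overwriting old entries.
    library.setdefault(user_id, []).append(face)

# One pass of the pipeline for a single user:
library = {}
faces = collect_faces("u1")
if update_condition_met([0.5, 0.6, 0.55]):
    update_base_library(library, "u1", synthesize(faces))
```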
Further: the face base library is updated when, over n snapshot comparisons for the user, the expectation of the similarity values falls below a certain threshold or the standard deviation of the similarity values exceeds a certain threshold.
Further: step S4 specifically includes: additionally storing the synthesized face data in the face base library, in association with the user identification.
Further: the method further includes step S0: constructing a deep learning convolutional neural network model.
Further: step S0 may be performed when the system is idle or before acquisition.
A face base library synthesis system, wherein a face base library is stored in a server of the community, the face base library comprising the base face pictures of all users in the community and the face features corresponding to those pictures; a face synthesis module: for merging input face images to generate a base face picture; a base library updating module: for updating the base face pictures and extracting their features in real time.
Further: the system further comprises a comparison record statistics module: for acquiring the snapshot comparison records of all users in the community and generating comparison similarity values.
Further: the deep learning convolutional neural network model is a two-channel fusion generative adversarial network model comprising a two-channel generation module, a face synthesis module and a discriminator module.
A face base library synthesis device, comprising: a processor for executing the above face base library synthesis method.
A storage medium, comprising: the storage medium stores a computer readable program which is executed by a processor so as to realize the above face base library synthesis method.
The beneficial effects of the invention include: by collecting photos of the same person from different devices over different periods and synthesizing a face through a deep learning convolutional neural network model to serve as the comparison base image, the update and selection strategy of the face base library can be optimized, and the accuracy of the face algorithm across different devices in complex scenes is greatly improved without reducing the original performance speed or increasing complexity. A method of storing base library images based on ranking values is provided, which reserves a storage position for the images truly closest to the user during comparison and avoids updating the synthesized image every time, so that the base library keeps the face images most relevant to time, device, user change and environment change while reducing the number of comparisons.
[ description of the drawings ]
The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this application; they are not to be considered limiting of the invention. In the drawings:
fig. 1 is a schematic diagram of the face base library updating method according to the present invention.
FIG. 2 is a schematic diagram of a deep learning convolutional neural network model according to the present invention.
[ detailed description of the embodiments ]
The present invention will now be described in detail with reference to the drawings and specific embodiments, wherein the exemplary embodiments and descriptions are provided only for the purpose of illustrating the present invention and are not to be construed as limiting the present invention.
Example 1
As shown in fig. 1, the face base library updating method of the present invention comprises the following steps:
step S1: collecting face data of community users; specifically: acquiring face images of community users at different times through different video devices;
face recognition can be carried out with an existing face recognition engine, and the similarity distribution of the multiple acquired images is judged; through continuous distribution judgment over many images, drops in subsequent recognition capability caused by the external environment, the user's posture or changes in the user can be discovered;
step S2: judging, for each user, whether the face base library update condition is met; specifically: the face base library is updated when, over n snapshot comparisons for the user, the expectation of the similarity values falls below a certain threshold or the standard deviation of the similarity values exceeds a certain threshold.
step S3: synthesizing the collected face data into base face data through a convolutional neural network model;
step S4: updating the face base library; specifically: the synthesized face data is additionally stored in the face base library, in association with the user identification;
preferably: the method further includes step S0: constructing a deep learning convolutional neural network model; this step may be performed when the system is idle or before acquisition;
as shown in fig. 2, the deep learning convolutional neural network model two-channel fusion confrontation network deep learning model comprises a two-channel generating module, a face synthesis module and a discriminator module. In the figure, Gm represents an mth channel network generation model, Dm represents an mth channel discriminator, and the invention adopts two channels, so that m is 1 or 2. Xi is the ith input image, Y is the final composite image, f (Xi) is the image feature for extracting Xi, and Wi is the trust value corresponding to the input image Xi.
Each image is input into a different channel and passed through an independent network generation model; the weighted outputs of the same channel are combined into one channel feature;
the two-channel generation module comprises two groups of independent network generators and discriminator models; one of the two channels is dedicated to inferring the global topological structure and the other to inferring local texture, yielding two groups of feature maps respectively. As shown in fig. 2, the first channel uses G1 to extract the global features of the face and the second channel uses G2 to extract the local features; the outputs of the two channels are weighted and combined into Y.
Preferably: the two-channel generation module comprises two channels, each channel comprising one or more network generation models G and a discrimination model D;
preferably: the number of network generation models G in each channel equals the number of images to be synthesized;
the face synthesis module synthesizes the faces generated by the network generation modules; the discriminator module judges the difference between the synthesized face and all real faces, and also calculates the loss function (Loss), combining multiple loss functions to preserve the prominent features of the face.
Example 2
Taking a first time period as the cycle, the N face snapshots I of a user captured by the multiple devices in the community area are aggregated once every first time period;
preferably: the first time period is 7 days;
obtaining a comparison similarity value Bi (i is 1-N), counting N times of similarity values, and obtaining similarity value expectation shown in a formula (1) and a standard deviation formula (2); wherein:for similarity value expectation, s is the standard deviation; n is the number of the snapshots;
preferably: expect similar valuesAnd if the standard deviation s is less than 0.65 and/or greater than 0.65, the condition for updating the face bottom library is taken.
The two-channel network structure comprises two channels, each channel comprising a network generation model G and a discrimination model D. G and D carry out a max-min competition over the same optimization problem: D tries to distinguish real pictures from generated ones as well as possible, while G finally generates a realistic-looking picture that is output as the deceptive image. The relationship between the two is shown in formula (3):

min_G max_D V(D, G) = E_{X~pd(X)}[log D(X)] + E_{z~pz(z)}[log(1 - D(G(z)))]    formula (3);

wherein: X denotes the input image vector, pz(z) is the noise distribution used for generation, pd(X) is the real data distribution, and D(X) denotes the probability that X originates from the real data. G(z) is the sample generated after the noise passes through the network generation model, and D(G(z)) is the probability that the classifier considers the generated sample to belong to the real samples. V(D, G) is the evaluation function; the larger the sum of the two expectations, the better the recognition ability. The outer min_G ultimately minimizes V(D, G), while the inner nested max_D makes D maximize V(D, G) for a given G, i.e. for a given network generation model the recognition capability is maximized. E_{X~pd(X)} denotes the expectation taken over the real data distribution pd(X).
That is, the network generation model G and the discriminator model D are trained alternately, D by maximization in formula (4) and G by minimization in formula (5):

max_D V(D, G) = E_{X~pd(X)}[log D(X)] + E_{z~pz(z)}[log(1 - D(G(z)))]    formula (4);

min_G V(D, G) = E_{z~pz(z)}[log(1 - D(G(z)))]    formula (5);

For a fixed network generation model G, formula (4) seeks the optimal solution of the evaluation function with respect to D; wherein E_{z~pz(z)} denotes the expectation taken over the noise distribution pz(z). For a fixed discriminator model D, formula (5) seeks the optimal solution with respect to G.
Besides the extracted features F(Xi), the network generation models G1 and G2 of the two channels each carry trust values W that predict the quality of the learned features. When n pictures are input, they jointly generate one channel feature as the trust-weighted sum shown in formula (6):

fm = Σ_{i=1..n} Wi·F(Xi)    formula (6);

the two channels G1 and G2 share one set of parameters.
Preferably: the trust values W start from random initial parameters and are updated to their optimal values through iterative network training.
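Formula (6) amounts to a trust-weighted sum of per-image features. A small sketch; the normalisation by the total trust is my assumption, since the text only states a weighted sum with learned Wi:

```python
def fuse_channel(features, trust):
    # features: list of per-image feature vectors F(Xi); trust: learned Wi.
    # Channel feature fm = sum_i Wi * F(Xi), normalised here by sum(Wi).
    total = sum(trust)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(trust, features)) / total
            for k in range(dim)]
```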
The face synthesis module synthesizes the image data of the two channels; specifically, formula (7) is used:

f = f1 + f2    formula (7);
the discriminator module calculates a Loss function (Loss), combining multiple Loss functions to preserve the salient features of the face. The loss functions to be synthesized include: pixel loss, symmetry loss, countermeasure loss, identity preservation loss;
wherein: the pixel loss is the fusion of the global output and the output of the calibration network. The multi-scale output can be increased for convenient deep supervision, and although the synthetic illumination is over-balanced, the multi-scale output is still an important part for precision optimization and excellent performance;
preferably, the pixel loss is calculated using equation (8).
Wherein: w, H, C are the width, height and image channel number of the input image, X is the face image, and (X, y) is the position of the pixel in the image;to generate a value for the pixel at position (x, y). Xpred=G(I),Is the pixel at position (x, y).
The symmetry loss is calculated using formula (9). Symmetry is an inherent characteristic of the human face; introducing a symmetry constraint on the synthesized image effectively relieves the occlusion problem and improves performance under large poses:

Lsym = (1/((W/2)·H·C))·Σ_{x=1..W/2} Σ_{y=1..H} |Xpred(x, y) - Xpred(W - (x - 1), y)|    formula (9);

The adversarial loss is calculated using formula (10):

Ladv = (1/N)·Σ_{n=1..N} -log D_θD(G_θG(I_n))    formula (10);

wherein: N is the number of iterative computations, I_n is the non-linearly transformed image produced by the nth iteration, G_θG is the generation model function with parameters θG, D_θD is the decision function with parameters θD, and θ is a distribution parameter.
The identity preservation loss is calculated using formula (11) and measures the difference between the features extracted from the original face and those extracted from the synthesized image:

Lip = Σ_{i} ||F_i(I_pred) - F_i(I_P)||    formula (11);

wherein: I_P is the original face image, I_pred is the predicted generated face, and F_i() extracts the face features at the ith neural network layer; I_pred = G(I) is the image data produced by the generation function module;
preferably: the loss is extracted at the last two layers of the neural network, i = 1~2;
the final total objective function is the weighted sum of the four loss functions and is calculated by adopting a formula (12);
Lsyn=Lpixel+λ1Lsym+λ2Ladv+λ3Lipformula (12);
a two-channel network training stage: collecting 50000 people times of the community owner, taking 100 pictures of the average person as a training sample, deducting a face image through an algorithm, compressing the face image into 224 × 244 pixels as network input, and weighing a total objective function: lambda 1 is 0.5, lambda 2 is 0.5, lambda 3 is 0.8, and the trained network weight is used as a face synthesis tool.
Example 3
To solve the technical problem, the invention adopts another technical scheme: a face base library updating device is provided, wherein a face base library database is stored in a server of the community, the database comprising the base face pictures and face features of all users in the community, and the device comprises: a comparison record statistics module: for acquiring the snapshot comparison records of all users in the community and generating comparison score values; a face synthesis module: for merging input faces to generate a base face picture; a base library updating module: for updating the base face pictures and extracting their features in real time;
preferably: the base library updating module stores the face images and their corresponding image features, and stores each person's face images and features in association with the user identification;
preferably: the base library updating module stores a plurality of base images for the same user, including synthesized images, acquired images, snapshot images and the like, each image stored in association with its image features; the ranking value of each image is calculated in real time, and when a new acquired or synthesized image is recorded, the image with the lowest ranking value is deleted in a rolling, queue-like manner to make room for it; for example: if the number is 2, two base images can be saved for the same user;
after a new snapshot image is obtained, it is compared against each of the plurality of base images to obtain similarity value expectations, and the ranking values are updated based on them, such that the smaller the similarity value expectation, the higher the ranking value; for example, the ranking value of the base image with the smallest expected similarity value is increased by 2, that of the base image with the largest expected similarity value by 0, and so on. In this way the ordering of the plurality of base images for each user reflects how the user's face, or the equipment, changes over time; this is a slow process, and when a base image is finally replaced because its ranking value is the lowest, a newly acquired or synthesized image is supplemented in its place;
here, updating the face base library means putting the latest acquired or synthesized image into the plurality of base images associated with the user and setting its ranking value to the highest value, where the highest value is either a default value or the highest ranking value among the current base images. Generally, the base images are arranged by ranking value and stored in a sorted queue;
by this method, a storage position can be reserved in the comparison process for the image that is truly closest to the user, and the synthesized image does not have to be updated every time. Meanwhile, a storage space is reserved for the acquired images; since an acquired image is both particular and carries all the typical characteristics of the face, retaining it is necessary.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program to instruct associated hardware to perform the steps, and the program may be stored in a computer readable storage medium, which is referred to herein as a storage medium, such as: ROM/RAM, magnetic disk, optical disk, etc.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for synthesizing a face base library, characterized by comprising the following steps:
step S1: collecting face data of community users;
step S2: judging, for each user, whether a face base library updating condition is satisfied;
step S3: synthesizing face base library data from the collected face data through a convolutional neural network model;
step S4: updating the face base library.
2. The method for synthesizing a face base library according to claim 1, wherein the updating condition is satisfied when, over n snapshot comparisons of the user, the expected value of the similarity values is lower than a threshold value or the standard deviation of the similarity values is greater than a threshold value.
3. The method for synthesizing a face base library according to claim 2, wherein the step S4 specifically comprises: additionally storing the synthesized face data in the face base library, in association with the user identification.
4. The method for synthesizing a face base library according to claim 3, wherein the method further comprises step S0: constructing a deep learning convolutional neural network model.
5. The method for synthesizing a face base library according to claim 4, wherein the step S0 is performed when the system is idle or before collection begins.
6. A face base library synthesis system based on the face base library synthesis method of any one of claims 1 to 5, wherein the face base library is stored in a server of the community and comprises the face base pictures of all users in the community and the face features corresponding to the face base pictures; the system comprises: a face synthesis module, configured to merge input face images to generate a face base picture; and a base library updating module, configured to update the face pictures in the base library and extract their features in real time.
7. The face base library synthesis system according to claim 6, wherein the system further comprises: a comparison record statistics module, configured to acquire the snapshot comparison records of all users in the community and generate comparison similarity values.
8. The face base library synthesis system according to claim 6, wherein the convolutional neural network model is a two-channel fusion adversarial network deep learning model comprising a two-channel generating module, a face synthesis module, and a discriminator module.
9. A face base library synthesizing apparatus, characterized by comprising: a processor configured to perform the face base library synthesis method of any one of claims 1 to 5.
10. A storage medium having stored thereon a computer readable program which, when executed by a processor, implements the face base library synthesis method according to any one of claims 1 to 5.
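The claimed flow (steps S1 to S4 of claim 1, with the updating condition of claim 2) can be sketched as a simple per-user pipeline. Every function name, module, and threshold below is a hypothetical placeholder, and the CNN synthesis of step S3 is stubbed out rather than implemented:

```python
from statistics import mean, stdev

def synthesize_base_image(face_images):
    """Stub for step S3: in the claims this is a two-channel fusion
    adversarial CNN that merges captured faces into one base image.
    Here we simply return the most recent capture as a placeholder."""
    return face_images[-1]

def run_base_library_update(user_id, face_library, captured, similarities,
                            mean_thr=0.8, std_thr=0.1):
    """Steps S1-S4 of claim 1 for a single user (hypothetical sketch).

    face_library maps a user identification to the list of base images
    additionally stored for that user (claim 3)."""
    # S2: check the updating condition of claim 2
    if len(similarities) < 2:
        return face_library
    if mean(similarities) >= mean_thr and stdev(similarities) <= std_thr:
        return face_library                          # no update needed
    # S3: synthesize new base data from the captured face data
    new_base = synthesize_base_image(captured)
    # S4: additionally store it, associated with the user identification
    face_library.setdefault(user_id, []).append(new_base)
    return face_library

# Example: low mean similarity over recent snapshots triggers an update
lib = run_base_library_update("user42", {}, ["cap1", "cap2"], [0.9, 0.5, 0.7])
```

The sketch only mirrors the control flow of the claims; the actual synthesis quality depends entirely on the adversarial network model of claim 8, which is outside its scope.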
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011210340.9A CN112307234A (en) | 2020-11-03 | 2020-11-03 | Face bottom library synthesis method, system, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112307234A | 2021-02-02
Family
ID=74333760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011210340.9A Pending CN112307234A (en) | 2020-11-03 | 2020-11-03 | Face bottom library synthesis method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307234A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256459A (en) * | 2018-01-10 | 2018-07-06 | 北京博睿视科技有限责任公司 | Library algorithm is built in detector gate recognition of face and face based on multiple-camera fusion automatically |
CN109377452A (en) * | 2018-08-31 | 2019-02-22 | 西安电子科技大学 | Facial image restorative procedure based on VAE and production confrontation network |
CN109829436A (en) * | 2019-02-02 | 2019-05-31 | 福州大学 | Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network |
CN109886873A (en) * | 2019-01-22 | 2019-06-14 | 华中科技大学 | A kind of simulated portrait generation method and device based on deep learning |
CN110866418A (en) * | 2018-08-27 | 2020-03-06 | 阿里巴巴集团控股有限公司 | Image base generation method, device, equipment, system and storage medium |
CN111241928A (en) * | 2019-12-30 | 2020-06-05 | 新大陆数字技术股份有限公司 | Face recognition base optimization method, system, equipment and readable storage medium |
CN111291669A (en) * | 2020-01-22 | 2020-06-16 | 武汉大学 | Two-channel depression angle human face fusion correction GAN network and human face fusion correction method |
CN111814570A (en) * | 2020-06-12 | 2020-10-23 | 深圳禾思众成科技有限公司 | Face recognition method, system and storage medium based on dynamic threshold |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109558811B (en) | Motion recognition method based on motion foreground attention and unsupervised key frame extraction | |
CN112949535B (en) | Face data identity de-identification method based on generative confrontation network | |
CN111797683A (en) | Video expression recognition method based on depth residual error attention network | |
CN108537818B (en) | Crowd trajectory prediction method based on cluster pressure LSTM | |
CN112312087B (en) | Method and system for quickly positioning event occurrence time in long-term monitoring video | |
CN110889450B (en) | Super-parameter tuning and model construction method and device | |
CN113239801B (en) | Cross-domain action recognition method based on multi-scale feature learning and multi-level domain alignment | |
CN113449660A (en) | Abnormal event detection method of space-time variation self-coding network based on self-attention enhancement | |
WO2020099854A1 (en) | Image classification, generation and application of neural networks | |
CN111008991A (en) | Background perception related filtering target tracking method | |
CN112329784A (en) | Correlation filtering tracking method based on space-time perception and multimodal response | |
CN114842553A (en) | Behavior detection method based on residual shrinkage structure and non-local attention | |
Pang et al. | Federated Learning for Crowd Counting in Smart Surveillance Systems | |
CN115205903A (en) | Pedestrian re-identification method for generating confrontation network based on identity migration | |
CN113297936B (en) | Volleyball group behavior identification method based on local graph convolution network | |
CN112528077B (en) | Video face retrieval method and system based on video embedding | |
CN113888638A (en) | Pedestrian trajectory prediction method based on attention mechanism and through graph neural network | |
CN110490057B (en) | Self-adaptive identification method and system based on human face big data artificial intelligence clustering | |
CN112307234A (en) | Face bottom library synthesis method, system, device and storage medium | |
CN117058235A (en) | Visual positioning method crossing various indoor scenes | |
CN111738059A (en) | Non-sensory scene-oriented face recognition method | |
CN111291785A (en) | Target detection method, device, equipment and storage medium | |
CN116452472A (en) | Low-illumination image enhancement method based on semantic knowledge guidance | |
ELBAŞI et al. | Control charts approach for scenario recognition in video sequences | |
CN114120198A (en) | Method, system and storage medium for detecting forged video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||