CN113591681A - Face detection and protection method and device, electronic equipment and storage medium

Info

Publication number: CN113591681A
Application number: CN202110860024.4A
Authority: CN
Inventors: 马兆远; 朱善玮; 殷小雷; 韩德伟; 李康; 董利健; 徐健; 毕东柱; 刘军河; 李锦文
Original assignee: Beijing Langdaheshun Technology Co ltd
Current assignee: Sima Motor Tuo (Shenzhen) Intelligent Systems Co.,Ltd.
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-11-02

Abstract

The present invention provides a face detection and protection method, device, electronic device and storage medium. The method includes: obtaining picture images decomposed from a video to be detected; inputting each picture image into a pre-trained face detection model to obtain the position information of the face region on the picture image, and obtaining a face image containing the target face based on that position information; and inputting the face image into a pre-trained face protection model to obtain an occluded face image. The method provides both detection and privacy protection, and detects both static targets in the face region and moving targets in the video images. The face detection accuracy is high, and the speed of face detection and occlusion is greatly improved.

Description

Face detection and protection method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of video data processing, in particular to a face detection and protection method, a face detection and protection device, electronic equipment and a storage medium.

Background

In recent years, video monitoring technology has become increasingly widespread, and many scenes involve monitoring pictures of people, such as monitoring inside enterprises, property monitoring in parks, and monitoring of public places such as hospitals and markets. In some scenarios, however, personal privacy must be protected by blocking or masking the facial features of the protected person in the monitoring picture. This involves detecting a face in the monitoring picture and then applying occlusion processing to the detected face.

Conventionally, face detection methods such as RetinaFace and DSFD are adopted. These methods run detection on complete RGB images of the monitored video; although both moving and static targets can be detected, the monitoring video must be completely decoded, which consumes a very large amount of computing resources.

Other approaches include a face target detection method based on a vector field and a face target detection method based on DCT residual coefficients. Both determine a moving object from the similarity or residual calculated for each macroblock, so both can detect only moving objects and cannot detect a stationary object in the face target region. Both methods also detect redundant areas, which wastes computation. In addition, although the vector-field-based method is fast, it depends heavily on the quality of the motion vector (MV) classification model; if that model is poorly fitted, the detection effect is very poor. The method based on DCT residual coefficients improves accuracy, but it requires more computation and is easily affected by coding noise.

Disclosure of Invention

The invention provides a face detection and protection method, a face detection and protection device, electronic equipment and a storage medium, which are used for overcoming the defects of large calculation amount, poor modeling precision and poor detection effect of a face detection method in the prior art and realizing efficient detection and privacy protection of a face in a video picture.

The invention provides a face detection and protection method, which comprises the following steps:

acquiring a picture image after the decomposition of a video to be detected;

inputting the picture image into a pre-trained face detection model to obtain the position information of a face area on the picture image, and obtaining a face image with a target face based on the position information of the face area on the picture image;

and inputting the face image into a face protection model trained in advance to obtain a face shielding image subjected to shielding processing.

The face detection and protection method provided by the invention further comprises the following steps:

and inputting the face shielding image to the face protection model again to obtain a face restoration image subjected to restoration processing.

According to the face detection and protection method provided by the invention, the pre-training process of the face detection model comprises the following steps:

carrying out face region labeling on each frame of picture image after the video to be detected is decomposed;

dividing the labeled picture images into a plurality of picture image groups, wherein each group comprises one frame of full-scale image and several frames of differential images;

for each picture image group, decoding the full-scale image to obtain an RGB image, taking the RGB image as a training data set, and training by combining a first neural network to obtain a first detection output image;

for each picture image group, fusing a current frame residual image and a current frame motion vector image of each differential image and a corresponding accumulated residual image and accumulated motion vector image, then superposing the fused images and the full-scale images in the same picture image group to obtain a synthetic image, taking the synthetic image as a training data set, and training by combining a second neural network to obtain a second detection output image;

and constructing the face detection model based on the first detection output image and the second detection output image which are obtained under each picture image group.

According to the face detection and protection method provided by the invention, the face region marking refers to marking the position of a face region on a picture image, and marking values at least comprise the abscissa and ordinate of a reference point of the face region and the width and height of the face region.

According to the face detection and protection method provided by the invention, the first neural network and the second neural network each adopt a convolutional neural network, a recurrent neural network, or a combination thereof.

According to the face detection and protection method provided by the invention, the pre-training process of the face protection model comprises the following steps:

inputting the face image into a variational autoencoder for encoding processing to obtain a face shielding image;

returning the face shielding image to the variational autoencoder for decoding processing to obtain a face restoration image;

and constructing the face protection model based on the face shielding image and the face restoration image.

The invention provides a face detection and protection device, comprising:

the acquisition module is used for acquiring a picture image after the decomposition of the video to be detected;

the detection module is used for inputting the picture image into a pre-trained face detection model, acquiring the position information of a face area on the picture image, and acquiring a face image with a target face based on the position information of the face area on the picture image;

and the shielding module is used for inputting the face image into a face protection model trained in advance to obtain a face shielding image subjected to shielding processing.

The face detection and protection device provided by the invention further comprises:

and the restoring module is used for inputting the face shielding image to the face protection model again to obtain a face restoring image subjected to restoring processing.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the processor executes the computer program, all or part of the steps of the human face detection and protection method are realized.

A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out all or part of the steps of a face detection and protection method according to any one of the above.

The invention provides a face detection and protection method, a device, electronic equipment and a storage medium. The method uses a neural network face detection model and a face protection model to detect a target face in the video compression domain and to shield the face region of the detected target face in the image. It thereby provides both detection and privacy protection, and detects both static targets in the face region and moving targets in the video image.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a face detection and protection method provided by the present invention;

FIG. 2 is a second schematic flow chart of the face detection and protection method provided by the present invention;

FIG. 3 is a schematic diagram of a pre-training process of a face detection model in the face detection and protection method provided by the present invention;

FIG. 4 is a schematic diagram of a pre-training process of a face protection model in the face detection and protection method provided by the present invention;

FIG. 5 is a schematic structural diagram of a face detection and protection apparatus provided in the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided in the present invention.

Reference numerals:

510: an acquisition module; 520: a detection module; 530: a shielding module; 610: a processor; 620: a communication interface; 630: a memory; 640: a communication bus.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be described in detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The function of protecting the privacy of faces in a video is mainly realized by two parts: face detection and face region occlusion. In the various embodiments of the present invention, both the face detection function and the face region occlusion function are implemented by neural network models.

The following describes in detail a face detection and protection method, apparatus, electronic device and storage medium provided by the present invention with reference to fig. 1 to 6.

The invention provides a face detection and protection method, fig. 1 is one of the flow diagrams of the face detection and protection method provided by the invention, as shown in fig. 1, the method comprises the following steps:

300. acquiring a picture image after the decomposition of a video to be detected;

400. inputting the picture image into a pre-trained face detection model to obtain the position information of a face area on the picture image, and obtaining a face image with a target face based on the position information of the face area on the picture image;

500. and inputting the face image into a face protection model trained in advance to obtain a face shielding image subjected to shielding processing.

For various monitoring videos, a video file needs to be split to be decomposed into multiple frames of picture images, and then the frames of picture images are correspondingly processed.

Each frame of picture image obtained by decoding and decomposing the current video to be detected is acquired, and detection based on the face detection model is performed on each frame so as to identify the face images containing a target face among the frames; there may be one or more such face images. When a picture image is input into the face detection model, the position information of the face region on the picture image is obtained first, and the face image containing the target face is then obtained based on that position information. The position information of the face region may be determined from the relative coordinates of the face region on the picture image, for example as given by manual labeling; this may be chosen according to actual conditions and is not specifically limited herein.

The one or more face images are then input in turn into the pre-trained face protection model, which shields the face region of each target face requiring privacy protection and yields the corresponding face shielding image, for example a face shielding image with a mosaic added.
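The flow of steps 300 to 500 can be sketched roughly as follows. This is only an illustrative outline: `detect_faces` and `occlude_face` are hypothetical callables standing in for the pre-trained face detection model and face protection model, and plain nested lists stand in for image arrays.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, w, h) with an upper-left reference point
Image = List[List[int]]           # grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the face region described by box out of a picture image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def protect_video(
    frames: Iterable[Image],
    detect_faces: Callable[[Image], List[Box]],   # hypothetical detection model
    occlude_face: Callable[[Image], Image],       # hypothetical protection model
) -> List[List[Image]]:
    """For every frame: detect faces, crop them, and occlude each crop."""
    results = []
    for frame in frames:                                   # step 300
        boxes = detect_faces(frame)                        # step 400: positions
        faces = [crop(frame, box) for box in boxes]        # step 400: face images
        results.append([occlude_face(f) for f in faces])   # step 500
    return results
```

A frame with no detected face simply yields an empty list, so the output keeps one entry per input frame.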

The human face detection model and the human face protection model adopt a neural network learning model.

The face detection and protection method provided by the invention uses the neural network face detection model and the face protection model to detect the target face in the video compression domain and to shield the face region of the detected target face in the image. It thereby provides both detection and privacy protection, and detects both static targets in the face region and moving targets in the video image.

According to the face detection and protection method provided by the present invention, fig. 2 is a second schematic flow chart of the face detection and protection method provided by the present invention, as shown in fig. 2, the method further includes, on the basis of the embodiment shown in fig. 1:

600. and inputting the face shielding image to the face protection model again to obtain a face restoration image subjected to restoration processing.

In some application scenarios, an image that has undergone face region occlusion processing must also be restored to recover the face image. In this case, the face shielding image may be input into the face protection model again, and the face restoration image is obtained through the restoration process of the face protection model, which is the reverse of the occlusion process.

According to the face detection and protection method provided by the present invention, fig. 3 is a schematic diagram of a pre-training process of a face detection model in the face detection and protection method provided by the present invention, as shown in fig. 3, the method further includes a training step of the face detection model, and the pre-training process of the face detection model in step 100 includes:

110. carrying out face region labeling on each frame of picture image after the video to be detected is decomposed;

120. dividing the labeled picture images into a plurality of picture image groups, wherein each group comprises one frame of full-scale image and several frames of differential images;

130. for each picture image group, decoding the full-scale image to obtain an RGB image, taking the RGB image as a training data set, and training by combining a first neural network to obtain a first detection output image;

140. for each picture image group, fusing a current frame residual image and a current frame motion vector image of each differential image and a corresponding accumulated residual image and accumulated motion vector image, then superposing the fused images and the full-scale images in the same picture image group to obtain a synthetic image, taking the synthetic image as a training data set, and training by combining a second neural network to obtain a second detection output image;

150. and constructing the face detection model based on the first detection output image and the second detection output image which are obtained under each picture image group.

Specifically, a large number of collected monitoring videos are decoded to obtain their frames of picture images; the processing is described below taking the current video to be detected as an example. After the video to be detected is decomposed, a plurality of frames of picture images are obtained, such as several I frame picture images and several P frame picture images. The face region of each frame of picture image is labeled. The labeling uses a rectangular target box (bbox for short) to identify the position range occupied by the face region in the picture image. Face region labeling means labeling the position of the face region on the picture image, and the labeled values at least comprise the abscissa and ordinate of a reference point of the face region and the width and height of the face region. That is, the target box of a face region consists of four labeled values: the abscissa and ordinate of the reference point, which is either the upper-left corner point (x, y) or the center point (cx, cy) of the target in the image, together with the width w and the height h of the face region.
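The two reference-point conventions for a target box are related by a half-width and half-height shift. A minimal sketch of the conversion, with function names chosen here purely for illustration:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def corner_to_center(x: float, y: float, w: float, h: float) -> Box:
    """Convert an upper-left box (x, y, w, h) to center form (cx, cy, w, h)."""
    return (x + w / 2.0, y + h / 2.0, w, h)


def center_to_corner(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a center box (cx, cy, w, h) back to upper-left form (x, y, w, h)."""
    return (cx - w / 2.0, cy - h / 2.0, w, h)
```

Whichever convention is chosen, it has to be used consistently between the label data and the detection output, since the four values are otherwise indistinguishable.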

Note: the labeling operation forms a data set in which only the face region in each frame of picture image is labeled, and no other region or target in the image is labeled. The resulting face region labeling data set in the video compressed domain is used as the training data of the face detection model.

The labeled picture images are divided into a plurality of picture image groups, each comprising one frame of full-scale image and several frames of differential images, because the labeled picture images still comprise a number of I frame picture images and a number of P frame picture images. An I frame picture image is a full-scale image that can be decoded on its own, and generally requires a larger neural network model for training and inference. A P frame picture image, also called a differential image, holds only the motion vectors and the prediction residual relative to the previous frame; it contains less information, so a smaller neural network model can be selected for training and inference. Among the picture images of each decoded video, P frame picture images far outnumber I frame picture images. A video can thus be understood as a sequence of groups of pictures (GOPs): all labeled picture images of the video are divided into several groups, and each group comprises one I frame picture image followed by several P frame picture images, forming a pattern such as IPPP or IPPPP.
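The grouping into I-led picture image groups can be illustrated with a short sketch. It assumes the frame types are already known as a sequence of "I" and "P" markers; the function name is hypothetical.

```python
from typing import List, Sequence


def split_into_groups(frame_types: Sequence[str]) -> List[List[int]]:
    """Return lists of frame indices, each list starting at an I frame."""
    groups: List[List[int]] = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I":
            groups.append([index])            # an I frame opens a new group
        elif frame_type == "P":
            if not groups:
                raise ValueError("a P frame appeared before any I frame")
            groups[-1].append(index)          # a P frame joins the current group
        else:
            raise ValueError(f"unsupported frame type: {frame_type!r}")
    return groups
```

For the sequence IPPPIPP this yields two groups, one with three P frames and one with two.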

And respectively taking the data of each group of picture images as a group of training data sets to carry out model training. And dividing the face detection model into two networks according to different picture images for respective training to obtain corresponding face detection networks respectively, and combining the two networks to form the face detection model.

Decoding a full image (I frame image) in the image group to obtain an RGB image, taking the RGB image as training data, training by combining a first neural network, obtaining a first detection output image output by the first neural network, then calculating a loss value together with corresponding label data, then carrying out back propagation according to the obtained loss value, updating parameters of the first neural network according to the loss value, and finishing the training.

For the differential images (the P frame picture images) in the same picture image group, decoding is not needed. The current frame residual image and the current frame motion vector image obtained from the video compression domain are fused with the corresponding accumulated residual image and accumulated motion vector image; the fused images are then superposed with the full-scale image (the I frame picture image) of the same picture image group to obtain a synthetic image. The synthetic image is used as training data and trained with a second neural network to obtain a second detection output image output by the second neural network. A loss value is then calculated from the second detection output image and the corresponding label data, back propagation is performed according to the loss value, and the parameters of the second neural network are updated. This training process is performed for each P frame picture image.

Note: when a P frame picture image is trained, if it is the first P frame picture image in the picture image group, its motion vector image and residual image are directly superposed with the I frame picture image of the group, and the superposed synthetic image is sent to the second neural network for training. If it is not the first P frame picture image of the group, the motion vector image and residual image obtained from it are accumulated with the motion vector images and residual images of all preceding P frame picture images respectively; the accumulated motion vector image and accumulated residual image are then superposed with the I frame picture image of the group, and the superposed synthetic image is sent to the second neural network for training. Training ends when all P frame picture images of the group have been trained.
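The accumulation over the P frames of one picture image group might be sketched as below. Two points are assumptions made here and are not fixed by the text: accumulation is taken to be a plain element-wise running sum, and "superposing" is taken to mean stacking the I frame with the accumulated maps as separate channels. Single-channel nested lists stand in for the real motion vector and residual images.

```python
from typing import List, Optional, Tuple

Map = List[List[float]]   # one 2-D channel, a hypothetical stand-in for an image


def add_maps(a: Map, b: Map) -> Map:
    """Element-wise sum of two equally sized 2-D maps."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def composite_inputs(
    i_frame: Map, motion_vectors: List[Map], residuals: List[Map]
) -> List[Tuple[Map, Map, Map]]:
    """Build one (I frame, accumulated MV, accumulated residual) stack per P frame."""
    composites = []
    acc_mv: Optional[Map] = None
    acc_res: Optional[Map] = None
    for mv, res in zip(motion_vectors, residuals):
        # First P frame: use its maps directly. Later P frames: add to the sums.
        acc_mv = mv if acc_mv is None else add_maps(acc_mv, mv)
        acc_res = res if acc_res is None else add_maps(acc_res, res)
        composites.append((i_frame, acc_mv, acc_res))
    return composites
```

Each returned tuple corresponds to one synthetic image that would be fed to the second neural network; the I frame is shared by every P frame of the group.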

And then, the face detection model is comprehensively constructed according to each first detection output image and each second detection output image which are respectively obtained under each picture image group.

Therefore, the face detection model constructed in this way has high detection precision and a good detection effect. The first neural network and the second neural network in the constructed face detection model perform detection on the I frame picture images and the P frame picture images of the current video to be detected respectively, so that face images containing the target face are detected more quickly and detection efficiency is improved. In essence, the method detects faces directly in the video compression domain: both static and moving targets can be detected by decoding only the I frame picture images and not the P frame picture images. In other words, the video to be detected does not need to be completely decoded, which greatly reduces the computation of the whole detection; and for P frame picture images, a smaller second neural network can be used for face detection, which further reduces the consumption of computing resources. Moreover, because a deep neural network learns features automatically, the distribution characteristics of the macroblock vector fields of target and background can be learned better and a better model established, which improves the accuracy of target face detection.
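The split of work between the two networks at detection time can be pictured as a simple dispatch on frame type. In this sketch `first_network` and `second_network` are hypothetical callables standing in for the trained networks, and the inputs are assumed to be prepared already (a decoded RGB image for an I frame, a synthetic image for a P frame).

```python
from typing import Any, Callable, List, Sequence


def detect_by_frame_type(
    frame_types: Sequence[str],
    inputs: Sequence[Any],
    first_network: Callable[[Any], Any],    # larger network, I frames
    second_network: Callable[[Any], Any],   # smaller network, P frames
) -> List[Any]:
    """Send each prepared input to the network that matches its frame type."""
    detections = []
    for frame_type, data in zip(frame_types, inputs):
        if frame_type == "I":
            detections.append(first_network(data))
        elif frame_type == "P":
            detections.append(second_network(data))
        else:
            raise ValueError(f"unsupported frame type: {frame_type!r}")
    return detections
```

Because P frames far outnumber I frames, most calls go to the smaller network, which is where the saving in computation comes from.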

According to the face detection and protection method provided by the invention, specifically, the first neural network and the second neural network can each adopt a convolutional neural network, a recurrent neural network, or a combination thereof. Generally, the second neural network may be smaller than the first neural network. In addition, whichever of these networks is adopted, the model training process may incorporate an algorithm such as Non-Maximum Suppression (NMS), which may be set according to the actual training scenario.
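For reference, a minimal sketch of greedy Non-Maximum Suppression over (x, y, w, h) boxes is given here. It shows the commonly used form of the algorithm; how NMS is wired into the training process is not specified in the text, and the default threshold of 0.5 is an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x, y, w, h), upper-left reference point


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Overlapping candidate boxes for the same face collapse to the single highest-scoring one, while boxes for distinct faces are all retained.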

According to the face detection and protection method provided by the present invention, fig. 4 is a schematic diagram of a pre-training process of a face protection model in the face detection and protection method provided by the present invention, as shown in fig. 4, the method further includes a training step of the face protection model, and the pre-training process of the face protection model in step 200 includes:

210. inputting the face image into a variational autoencoder for encoding processing to obtain a face shielding image;

220. returning the face shielding image to the variational autoencoder for decoding processing to obtain a face restoration image;

230. and constructing the face protection model based on the face shielding image and the face restoration image.

A large number of face images containing target faces, detected by the face detection model, are acquired. Each original face image is first given an occlusion preprocessing based on mean-value pixel substitution: the face part is divided into small cell areas, the mean pixel value of each cell area is calculated, and the original pixel values in each cell area are replaced with that mean, yielding a pre-occlusion image y1 with a mosaic. The pre-occlusion image y1 and the original face image y2 are used as a pair of reference data.
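The mean-value pixel substitution can be sketched as follows for a non-empty grayscale image held as nested lists. The cell size parameter and the use of an integer mean are choices made for this illustration only.

```python
from typing import List

Image = List[List[int]]


def mosaic(image: Image, cell: int) -> Image:
    """Replace every cell-by-cell block with its integer mean pixel value."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]           # leave the original untouched
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            rows = range(top, min(top + cell, height))
            cols = range(left, min(left + cell, width))
            block = [image[r][c] for r in rows for c in cols]
            mean = sum(block) // len(block)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out
```

Applied to an original face image y2, the result plays the role of the pre-occlusion image y1; a larger cell size hides more facial detail.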

Further, each face image is input to the encoding part of the variational autoencoder for encoding processing, and the output encoded image is taken as the actual face occlusion image x1. The actual face occlusion image x1 is then passed to the decoding part of the variational autoencoder for reverse decoding processing, and the output decoded image is taken as the face restoration image x2 of the actual face occlusion image x1.

The network loss is calculated from the mosaic pre-occlusion image y1 and the actual face occlusion image x1, and from the face restoration image x2 and the original face image y2; the result is back-propagated to update the network parameters of the variational autoencoder. The face protection model is then constructed from the face occlusion image x1, the face restoration image x2, the pre-occlusion image y1 and the original face image y2.
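One plausible form of this network loss is sketched below. The text only states which image pairs are compared; the use of mean squared error, the weighted sum of the two terms, and the weight parameters are assumptions of this sketch, and any regularization term a variational autoencoder would normally carry is left out.

```python
from typing import List

Image = List[List[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def protection_loss(
    x1: Image, y1: Image, x2: Image, y2: Image,
    occlusion_weight: float = 1.0,      # hypothetical weighting
    restoration_weight: float = 1.0,    # hypothetical weighting
) -> float:
    """Compare x1 with the mosaic target y1 and x2 with the original face y2."""
    occlusion_term = mse(x1, y1)        # encoder output should look like the mosaic
    restoration_term = mse(x2, y2)      # decoder output should recover the face
    return occlusion_weight * occlusion_term + restoration_weight * restoration_term
```

The first term pushes the encoder toward producing the mosaic, and the second pushes the decoder toward recovering the original face from it.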

The face protection model constructed in the way can shield and protect the target face in a scene needing privacy protection, and can restore the face hidden in the face shielding image into original information in the scene needing checking of the specific face.

The face detection and protection device provided by the present invention is described below. The device can be understood as a device for implementing the face detection and protection method described above; the principles of the method and the device are consistent and can be referred to each other, and are not described again here.

The invention provides a face detection and protection device, fig. 5 is a schematic structural diagram of the face detection and protection device provided by the invention, as shown in fig. 5, the device comprises: an acquisition module 510, a detection module 520, and an occlusion module 530, wherein,

the acquisition module 510 is configured to acquire a picture image after the video to be detected is decomposed;

the detection module 520 is configured to input the picture image into a pre-trained face detection model, obtain position information of a face region on the picture image, and obtain a face image with a target face based on the position information of the face region on the picture image;

the occlusion module 530 is configured to input the face image into a pre-trained face protection model, and obtain a face occlusion image subjected to occlusion processing.

The face detection and protection device comprises the acquisition module 510, the detection module 520 and the shielding module 530, which cooperate with one another. The device serves as a hardware platform on which the neural network face detection model and the face protection model are deployed, so that the target face is detected in the video compression domain and the face region of the detected target face is shielded in the image. The device thus provides both detection and privacy protection, with high face detection precision and greatly improved speed of face detection and shielding.

According to the face detection and protection device provided by the invention, on the basis of the embodiment shown in fig. 5, the device further comprises a restoring module, wherein,

and the restoring module is used for inputting the face shielding image to the face protection model again to obtain a restored face image after restoring processing.

The restoring module is applied to a scene needing to check a specific face so as to restore the face hidden in the face shielding image into original information.

Fig. 6 is a schematic structural diagram of the electronic device provided in the present invention. As shown in fig. 6, the electronic device may include a processor 610, a communication interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform all or part of the steps of the face detection and protection method, which comprises:

acquiring a picture image after the decomposition of a video to be detected;

In another aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the face detection and protection method provided by the above embodiments, the method including:

acquiring a picture image after the decomposition of a video to be detected;

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements all or part of the steps of the face detection and protection method according to the above embodiments, the method including:

acquiring a picture image obtained by decomposing a video to be detected;

The apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement this without inventive effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solutions may be embodied, in whole or in part, in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the face detection and protection method according to the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A face detection and protection method, characterized by comprising the following steps:

acquiring a picture image obtained by decomposing a video to be detected;

2. The face detection and protection method of claim 1, further comprising:

3. The face detection and protection method according to claim 1 or 2, wherein the pre-training process of the face detection model comprises:

4. The face detection and protection method of claim 3, wherein the face region labeling marks the position of the face region on the picture image, and the labeling value includes at least the abscissa and ordinate of a reference point of the face region and the width and height of the face region.

5. The face detection and protection method according to claim 3, wherein the first neural network and the second neural network each adopt any one of a convolutional neural network, a recurrent neural network, or a combination thereof.

6. The face detection and protection method according to any one of claims 1 to 3, wherein the pre-training process of the face protection model comprises:

7. A face detection and protection apparatus, comprising:

8. The face detection and protection device of claim 7, further comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements all or part of the steps of the face detection and protection method according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out all or part of the steps of the face detection and protection method according to any one of claims 1 to 6.
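
The labeling value described in claim 4 (reference-point coordinates plus width and height) can be represented as a small record, as in the minimal sketch below. The name `FaceLabel`, the choice of the top-left corner as the reference point, and the sample picture are assumptions made for illustration; the claims do not fix which point of the region serves as the reference.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # grayscale picture image: rows of pixel values


@dataclass(frozen=True)
class FaceLabel:
    """One face-region annotation: reference point plus size, in pixels."""
    x: int       # abscissa of the reference point (assumed: top-left corner)
    y: int       # ordinate of the reference point
    width: int
    height: int

    def crop(self, image: Image) -> Image:
        """Return the face region this label marks on `image`."""
        return [row[self.x:self.x + self.width]
                for row in image[self.y:self.y + self.height]]


picture = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
label = FaceLabel(x=2, y=1, width=2, height=2)
face = label.crop(picture)
```

The same four values are sufficient both as a training target for the detection model and as the crop window that yields the face image.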