CN108932465B - Method and device for reducing false detection rate of face detection and electronic equipment - Google Patents

Info

Publication number
CN108932465B
CN108932465B (application CN201711462162.7A)
Authority
CN
China
Prior art keywords
target block
frame difference
value
image
target
Prior art date
Legal status
Active
Application number
CN201711462162.7A
Other languages
Chinese (zh)
Other versions
CN108932465A (en)
Inventor
余永龙
李聪廷
陈航锋
黄攀
汪辉
Current Assignee
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN201711462162.7A priority Critical patent/CN108932465B/en
Publication of CN108932465A publication Critical patent/CN108932465A/en
Application granted granted Critical
Publication of CN108932465B publication Critical patent/CN108932465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method, an apparatus and an electronic device for reducing the false detection rate of face detection. The method comprises the following steps: acquiring a current frame image to be detected and the previous frame image of the current frame image; obtaining a plurality of first target blocks from the current frame image, and obtaining, from the previous frame image, the second target block whose coordinates correspond to each first target block; judging whether each first target block is a static object according to the frame-difference relationship between the first target block and its corresponding second target block; and deleting every first target block judged to be a static object from the current frame image. Through these steps, the frame-difference relationship accurately distinguishes the moving objects among the captured targets, the interference of environmental diversity on detection is reduced, and the judgment accuracy is improved.

Description

Method and device for reducing false detection rate of face detection and electronic equipment
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method, an apparatus and an electronic device for reducing the false detection rate of face detection.
Background
Face detection technology plays an indispensable role in the age of intelligent information. First, face detection is a key link in automatic face recognition systems. Second, face detection has important application value in content retrieval, digital image processing, video detection, security monitoring and the like. With the development of intelligent technology, face detection methods have been continuously innovated, from traditional methods based on key-point detection and matching to deep learning methods based on CNNs, and detection accuracy and timeliness have improved accordingly. However, the problem of false detection exists in both the traditional methods and the methods based on deep network models. This problem is particularly harmful for applications such as face capture machines, because the capture system reports not only the captured faces but also the falsely detected objects, especially fixed, falsely detected objects in the static background in front of the lens.
At present, solving the problem of face false detection has become one of the main challenges for the series of methods based on deep network models, especially when a face detection system is used against a complex environmental background. Because of the diversity of environmental factors such as the use scenes and illumination of a face detection system, overcoming false detection merely by adding negative samples is clearly not a complete strategy. Therefore, a method that reduces the face false detection rate without affecting the capture accuracy and performance of the system is urgently needed.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus and an electronic device for reducing a false detection rate of face detection to solve the above problem.
The preferred embodiment of the present invention provides a method for reducing false detection rate of face detection, the method comprising:
acquiring a current frame image to be detected and a previous frame image of the current frame image;
obtaining a plurality of first target blocks from the current frame image, and obtaining, from the previous frame image, the second target block whose coordinates correspond to each first target block;
judging whether each first target block is a static object or not according to the frame difference relation between each first target block and a second target block corresponding to the first target block;
and deleting, from the current frame image, each first target block judged to be a static object.
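For illustration, the four claimed steps can be sketched in Python as follows. This is a minimal sketch, not part of the claimed invention: grayscale frames are assumed to be NumPy arrays, and `is_static` is a caller-supplied predicate standing in for the frame-difference judgment described later; all names are hypothetical.

```python
import numpy as np

def filter_static_candidates(cur, prev, boxes, is_static):
    """Drop candidate target blocks that a predicate judges to be static.

    cur, prev : current and previous grayscale frames (2-D arrays)
    boxes     : candidate regions as (x, y, w, h) tuples
    is_static : predicate over the two co-located blocks (assumed interface)
    """
    kept = []
    for (x, y, w, h) in boxes:
        first = cur[y:y + h, x:x + w]    # first target block, current frame
        second = prev[y:y + h, x:x + w]  # second target block, same coordinates
        if not is_static(first, second):
            kept.append((x, y, w, h))    # keep only moving candidates
    return kept
```

The remaining boxes would then be passed on to the face classifier described below.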
Further, after the step of deleting the first target block determined as a static object from the current frame image, the method further includes:
and inputting the first target blocks remaining in the current frame image after the deletion processing into an established face classifier, so as to screen out face objects from the remaining first target blocks.
Further, the step of judging whether each first target block is a static object according to the frame-difference relationship between the first target block and its corresponding second target block includes:
performing a difference operation on each first target block and its corresponding second target block to obtain a frame-difference map for each pair of blocks;
for each frame-difference map, comparing the pixel value of each pixel point in the map with a first preset threshold, and obtaining the frame-difference equivalent value of each pixel point according to the comparison result;
obtaining the total frame-difference value between the first target block and the second target block corresponding to the frame-difference map according to the frame-difference equivalent values of the pixel points;
and judging whether the first target block is a static object according to the total frame-difference value.
Further, the steps of comparing the pixel value of each pixel point in the frame-difference map with the first preset threshold, obtaining the frame-difference equivalent value of each pixel point according to the comparison result, and obtaining the total frame-difference value between the corresponding first target block and second target block according to those frame-difference equivalent values include:
dividing the frame difference map into a plurality of sub-blocks;
aiming at each sub-block, comparing the pixel value of each pixel point in the sub-block with the first preset threshold value respectively;
setting the frame-difference value of each pixel point whose pixel value is greater than the first preset threshold to 1, and setting the frame-difference value of each pixel point whose pixel value is less than or equal to the first preset threshold to 0;
counting, for each sub-block, the sum of the frame-difference values of the pixel points it contains;
and accumulating the frame-difference sums of all the sub-blocks to obtain the total frame-difference value between the first target block and the second target block.
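The binarize-then-sum procedure above can be sketched in NumPy as follows. The threshold of 10 and the 4 x 4 grid are the example values given later in the description; both are tunable assumptions, and the function name is hypothetical.

```python
import numpy as np

def frame_diff_sums(first_block, second_block, t1=10, grid=(4, 4)):
    """Return (per-sub-block frame-difference sums, their accumulated total).

    Pixels of |first - second| above t1 get frame-difference value 1, else 0;
    the binary map is split into grid[0] x grid[1] sub-blocks and summed.
    """
    diff = np.abs(first_block.astype(np.int32) - second_block.astype(np.int32))
    binary = (diff > t1).astype(np.int32)  # frame-difference equivalent values
    sums = [int(col.sum())
            for row in np.array_split(binary, grid[0], axis=0)
            for col in np.array_split(row, grid[1], axis=1)]
    return sums, sum(sums)
```

Each entry of `sums` is the count of changed pixels in one sub-block; the second return value is the total frame-difference value of the pair.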
Further, the step of determining whether the first target block is a static object according to the total frame difference value includes:
detecting whether the frame difference total value is smaller than a second preset threshold value, and if so, judging that the first target block is a static object;
if the total frame-difference value is greater than or equal to the second preset threshold, selecting, from the plurality of sub-blocks, a preset number of sub-blocks whose frame-difference sums are the largest;
accumulating the sum of the frame difference values of the selected sub-blocks, and calculating the proportion between the accumulated value and the total frame difference value;
and detecting whether the ratio is greater than a third preset threshold, and if so, determining that the first target block is a static object.
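The two-stage decision above can be sketched as follows. The second and third thresholds and the "preset number" of sub-blocks are not specified in the text, so the defaults here are purely illustrative assumptions.

```python
def is_static_block(subblock_sums, total, t2=30, t3=0.7, top_k=3):
    """Two-stage static-object test over sub-block frame-difference sums.

    Stage 1: a total below t2 means almost no change between frames -> static.
    Stage 2: otherwise, if the top_k largest sub-block sums account for more
    than fraction t3 of the total, the change is concentrated in a few
    sub-blocks (e.g. local noise) rather than spread over a moving object,
    so the block is also judged static.
    """
    if total < t2:
        return True
    top = sorted(subblock_sums, reverse=True)[:top_k]
    return sum(top) / total > t3
```

A block that fails both tests is treated as a moving object and retained for classification.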
Further, the face classifier is obtained by:
constructing a classifier network architecture based on a convolutional neural network;
and respectively inputting the obtained positive and negative samples of face images into the classifier network architecture for training, so as to obtain the face classifier.
Further, the step of inputting the first target blocks remaining in the current frame image after the deletion processing into the established face classifier to screen out face objects from the remaining first target blocks includes:
inputting the first target blocks remaining after the deletion processing into the established face classifier;
detecting a first fitting degree between each first target block and the positive samples of face images trained in the face classifier, and a second fitting degree between the first target block and the negative samples trained in the face classifier;
and comparing the first fitting degree and the second fitting degree of the first target block, and judging that the first target block is a face object if the first fitting degree is greater than the second fitting degree.
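The comparison of the two fitting degrees can be sketched as follows. The softmax normalization is an assumption (the patent only requires that the two degrees be comparable), and the function names are hypothetical.

```python
import math

def fitting_degrees(pos_score, neg_score):
    """Softmax over the classifier's two raw outputs -> (first, second)
    fitting degrees, i.e. the fit to positive and negative samples."""
    m = max(pos_score, neg_score)        # subtract max for numerical stability
    ep = math.exp(pos_score - m)
    en = math.exp(neg_score - m)
    return ep / (ep + en), en / (ep + en)

def is_face(pos_score, neg_score):
    """A remaining first target block is judged a face object when its first
    fitting degree exceeds its second fitting degree."""
    first, second = fitting_degrees(pos_score, neg_score)
    return first > second
```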
Further, before the step of obtaining a plurality of first target blocks from the current frame image and obtaining a second target block in the previous frame image with coordinates corresponding to each of the first target blocks, the method further includes:
and scaling the current frame image and the previous frame image in the same proportion.
Another preferred embodiment of the present invention further provides an apparatus for reducing a false detection rate of face detection, the apparatus comprising:
the image acquisition module is used for acquiring a current frame image to be detected and a previous frame image of the current frame image;
a target block obtaining module, configured to obtain a plurality of first target blocks from the current frame image, and to obtain, from the previous frame image, the second target block whose coordinates correspond to each first target block;
the judging module is used for judging whether each first target block is a static object or not according to the frame difference relation between each first target block and a second target block corresponding to the first target block;
and a deleting module, configured to delete, from the current frame image, each first target block judged to be a static object.
Another preferred embodiment of the present invention further provides an electronic device, including:
a memory;
a processor; and
and the above apparatus for reducing the false detection rate of face detection, the apparatus comprising one or more software function modules stored in the memory and executed by the processor.
An embodiment of the invention provides a method, an apparatus and an electronic device for reducing the false detection rate of face detection. A plurality of first target blocks are obtained from the current frame image, and the second target blocks whose coordinates correspond to the first target blocks are obtained from the previous frame image; whether each first target block is a static object is judged according to the frame-difference relationship between the first target block and its corresponding second target block, and each first target block judged to be a static object is deleted from the current frame image. Through these steps, the frame-difference relationship accurately distinguishes the moving objects among the captured targets, the interference of environmental diversity on detection is reduced, and the judgment accuracy is improved. In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a block diagram of an electronic device according to a preferred embodiment of the invention.
Fig. 2 is a flowchart of a method for reducing false detection rate of face detection according to a preferred embodiment of the present invention.
Fig. 3 is a flowchart of the substeps of step S105 in fig. 2.
Fig. 4 is a flowchart of sub-steps of step S1053 in fig. 3.
Fig. 5 is a flowchart of sub-steps of step S1055 in fig. 3.
Fig. 6 is another flowchart of a method for reducing false detection rate of face detection according to a preferred embodiment of the present invention.
Fig. 7 is a flowchart of a method for establishing a face classifier according to a preferred embodiment of the present invention.
FIG. 8 is a diagram of a classifier network architecture constructed in accordance with a preferred embodiment of the present invention.
Fig. 9 is a flowchart of the substeps of step S109 in fig. 6.
Fig. 10 is a functional block diagram of an apparatus for reducing a false detection rate of face detection according to a preferred embodiment of the present invention.
Reference numerals: 100 - electronic device; 110 - apparatus for reducing the false detection rate of face detection; 111 - image acquisition module; 112 - target block acquisition module; 113 - judgment module; 114 - deletion module; 115 - screening module; 120 - processor; 130 - memory.
Detailed Description
The inventor finds that the following method is often adopted in the prior art to realize target screening after face snapshot:
(1) One approach: after the face capture link of the face detection system, all captured targets are screened by a skin-color filtering method, so as to filter out the targets the system considers to be non-faces.
In this approach, foreground judgment is first performed on all captured targets, and a trained skin-color foreground classifier then scores and screens the captured targets judged to be foreground. A target that satisfies the set threshold condition is judged to be a face target and output for subsequent applications such as face recognition; a target that does not satisfy the condition is deleted.
(2) The other approach: after the face capture link, some face detection systems match and screen all captured targets by template matching or similar methods, so as to filter out the targets the system considers to be non-faces.
In this approach, face contour or partial key-point localization is first performed on all captured targets, and the targets are then matched, key point by key point, against one or more types of pre-built face templates. A target that satisfies the set threshold condition is judged to be a face and output for subsequent applications such as face recognition; a target that does not satisfy the condition is deleted.
For the first approach: in engineering practice, the scenes of face detection applications are diverse and complex, for example outdoor scenes at night and other environments with disordered ambient light, which greatly restricts screening strategies based on a color space such as skin color. In these special scenes the faces to be captured are strongly affected by the ambient light, so their skin color cannot meet the preset color-space-based screening condition. In addition, a skin-color filtering strategy cannot be applied to gray-scale images, so its application range is relatively narrow.
For the second approach, likewise, in engineering practice the diversity and complexity of the environment around the front-end devices of a face detection system make it very difficult to extract the contour or key points of a captured target. Moreover, in actual operation the pedestrians to be captured move with great randomness, and head turning, head lowering, occlusion and the like occur easily, so a large part of the captured faces are side faces; because of the influence of ambient light, even for a genuine face target the key points are difficult to extract accurately. These factors greatly reduce the accuracy of template matching. Meanwhile, template matching is time-consuming and is not suitable for real-time capture in engineering practice.
Based on the above research, the embodiment of the present invention provides a scheme for reducing a false detection rate of face detection, which can screen a moving object from a plurality of target blocks in a current frame image by using a frame difference relationship between the current frame in an image to be processed and a previous frame image thereof, thereby greatly reducing interference caused by environmental diversity on judgment and reducing the false detection rate.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that like reference numbers and letters refer to like items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," and "connected" are to be construed broadly, e.g., as fixed connection, detachable connection, or integral connection; as mechanical or electrical connection; as direct connection, indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Referring to fig. 1, a schematic block diagram of an electronic device 100 according to an embodiment of the invention is shown. In this embodiment, the electronic device 100 may be an image capturing device, a photographing device, or the like, and the electronic device 100 has an image capturing function. As shown in fig. 1, the electronic device 100 may include a memory 130, a processor 120, and a computer program stored on the memory 130 and executable on the processor 120, and the processor 120 executes the computer program to enable the electronic device 100 to implement the method for reducing the false detection rate of face detection according to the present invention.
The memory 130 and the processor 120 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 130 stores software functional modules stored in the memory 130 in the form of software or Firmware (Firmware), and the processor 120 executes various functional applications and data processing by running software programs and modules stored in the memory 130, such as the device 110 for reducing the false detection rate of face detection in the embodiment of the present invention, so as to implement the method for reducing the false detection rate of face detection in the embodiment of the present invention.
It is to be understood that the configuration shown in fig. 1 is merely exemplary, and that the electronic device 100 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for reducing a false detection rate of face detection applied to the electronic device 100 shown in fig. 1, and the steps included in the method will be described in detail below.
Step S101, obtaining a current frame image to be detected and a previous frame image of the current frame image.
Step S103, obtaining a plurality of first target blocks from the current frame image, and obtaining a second target block of the previous frame image corresponding to the coordinates of each of the first target blocks.
The method for reducing the false detection rate of face detection provided by this embodiment is applied after the capture link of the system and before applications such as face recognition. Considering that, in engineering practice, the pedestrians to be captured by a face capture system have motion attributes, this embodiment first designs a screening mechanism for moving objects, so as to obtain the moving objects from the captured images.
In this embodiment, the current frame image and its previous frame image are acquired from the captured images. The two images may be preprocessed with a neural network algorithm so as to keep the multiple regions in which face images may exist. For details of preprocessing images with neural network algorithms, reference may be made to the prior art; they are not repeated in this embodiment.
However, because the preprocessing process is affected by environmental factors, real-time constraints and the like, the regions obtained after preprocessing generally contain false detections: a preprocessed region may contain a face, or may contain something else, for example a static object. Therefore, the frame-difference strategy provided in this embodiment is adopted to screen out the moving objects that may be face objects, thereby reducing the false detection rate.
In this embodiment, a plurality of first target blocks are obtained from the preprocessed current frame image, where the first target blocks are the regions possibly including the face image. Each of the first target blocks has a respective coordinate value in the current frame image, the coordinate value may be a two-dimensional coordinate value, and a coordinate value of a center point of each of the first target blocks may be used as an overall coordinate value of each of the first target blocks. Optionally, a block having the same coordinate value as each of the first target blocks is obtained from a previous frame of image of the current frame of image, and is used as a second target block corresponding to each of the first target blocks. And then, judging whether each first target block is a moving object or a static object according to the frame difference relation between each first target block and the second target block corresponding to the coordinate of the first target block.
In this embodiment, in order to reduce the processing amount of the image, the current frame image and the previous frame image may be scaled in the same scale to reduce the current frame image and the previous frame image to appropriate sizes. And then the subsequent treatment is carried out, so that the treatment efficiency is improved and the treatment time is saved.
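The equal-proportion reduction can be sketched as follows, using block averaging as a stand-in for the unspecified scaling method (the concrete interpolation is an assumption, as is the function name); the same factor must be applied to both frames so that block coordinates stay aligned.

```python
import numpy as np

def downscale(img, factor):
    """Shrink a 2-D grayscale frame by an integer factor via block averaging.

    The frame is first cropped to a multiple of the factor, then reshaped so
    that each factor x factor block can be averaged into one output pixel.
    """
    h, w = img.shape
    h, w = h - h % factor, w - w % factor      # crop to a multiple of factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

Both the current frame and the previous frame would be passed through this reduction with the same factor before the frame-difference computation.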
Step S105, determining whether each first target block is a static object according to a frame difference relationship between each first target block and a second target block corresponding to the first target block.
Referring to fig. 3, in the present embodiment, the step S105 may include three substeps, namely step S1051, step S1053 and step S1055.
In step S1051, a difference operation is performed on each first target block and its corresponding second target block to obtain a frame-difference map for each pair of blocks.
Step S1053, aiming at each frame difference image, comparing the pixel value of each pixel point in the frame difference image with a first preset threshold value, obtaining the frame difference equivalent value corresponding to each pixel point according to the comparison result, and obtaining the frame difference total value between a first target block and a second target block corresponding to the frame difference image according to the frame difference equivalent value of each pixel point.
In this embodiment, to avoid the time overhead of computing the frame difference over the whole image, the frame-difference calculation may be performed only on the first target blocks in the size-reduced current frame image and the corresponding second target blocks in the previous frame image. Optionally, for each first target block, a difference operation may be performed between the first target block and its corresponding second target block, so as to obtain the frame-difference map of the pair.
For each frame-difference map, the pixel value of each pixel point in the map can be obtained and compared with the first preset threshold, and whether the first target block corresponding to the map is a static object is judged according to the comparison results of all the pixel points. The first preset threshold may be set to 10 or another suitable value.
Referring to fig. 4, in the present embodiment, the step S1053 may include five substeps, namely, step S10531, step S10533, step S10535, step S10537 and step S10539.
In step S10531, the frame difference map is divided into a plurality of sub-blocks.
Step S10533, comparing the pixel values of the pixels in the sub-blocks with the first preset threshold, respectively, for each of the sub-blocks.
Step S10535, setting the value of the frame difference of the pixel point whose pixel value is greater than the first preset threshold value to 1, and setting the value of the frame difference of the pixel point whose pixel value is less than or equal to the first preset threshold value to 0.
Step S10537, the sum of the frame difference values of the pixel points included in the sub-block is counted.
Step S10539, accumulating the sum of the values of the frame difference corresponding to each of the sub-blocks to obtain the total value of the frame difference between the first target block and the second target block.
In this embodiment, when judging whether each first target block is a static object, the frame-difference map corresponding to the first target block may be divided into blocks to improve the accuracy of the judgment, and the judgment is then made from the pixel values within each block. Optionally, each frame-difference map may be divided into a plurality of sub-blocks, for example into 4 × 4 = 16 sub-blocks, or into another number of sub-blocks; this embodiment does not limit the number. When dividing, it need only be ensured that each sub-block contains at least a few pixel points, for example 2 or 3, for the convenience of subsequent processing. The frame-difference statistics are then computed for each sub-block.
Optionally, in this embodiment, for each sub-block, the pixel points contained in the sub-block are obtained, and the pixel value of each pixel point is compared with the first preset threshold. In practical operation, for convenience of statistics, the frame difference value of a pixel point whose pixel value is greater than the first preset threshold may be set to 1, and the frame difference value of a pixel point whose pixel value is less than or equal to the first preset threshold may be set to 0; this 0/1 value serves as the frame difference equivalent value of the pixel point. Thus, the sum of the frame difference values of the pixel points in a sub-block is the number of pixel points in the sub-block whose pixel values are greater than the first preset threshold.
After obtaining the sum of the frame difference values of each sub-block, these sums can be written as Wn,1, Wn,2, …, Wn,16, where n denotes the number of the frame difference map to which the sub-blocks belong. The sums of all sub-blocks can be accumulated to obtain the total value of the frame difference map, i.e. the total frame difference value between the first target block and the second target block corresponding to the frame difference map, denoted Wsum.
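The binarization and accumulation of steps S10531 to S10539 can be sketched as follows. This is a minimal illustration, assuming 8-bit grayscale target blocks and the example values from the text (first preset threshold 10, a 4 × 4 grid of 16 sub-blocks); the function name and signature are illustrative:

```python
import numpy as np

def frame_diff_total(block_cur, block_prev, t1=10, grid=4):
    """Binarize the frame difference map of one target block pair and
    accumulate per-sub-block counts into a total (steps S10531-S10539).
    t1 and the 4x4 grid follow the example values in the text."""
    # Frame difference map: absolute per-pixel difference of the two blocks.
    diff = np.abs(block_cur.astype(np.int16) - block_prev.astype(np.int16))
    # Frame difference equivalent value: 1 if above the threshold, else 0.
    binary = (diff > t1).astype(np.int32)
    h, w = binary.shape
    sub_h, sub_w = h // grid, w // grid
    # Sum of the 0/1 values inside each of the grid*grid sub-blocks.
    sub_sums = [int(binary[i*sub_h:(i+1)*sub_h, j*sub_w:(j+1)*sub_w].sum())
                for i in range(grid) for j in range(grid)]
    return sub_sums, sum(sub_sums)
```

For an aligned pair of target blocks, the returned list corresponds to Wn,1 through Wn,16 and the returned total to Wsum.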
Step S1055, determining whether the first target block is a static object according to the frame difference total value.
Referring to fig. 5, in the present embodiment, step S1055 may include five substeps, namely step S10551, step S10552, step S10553, step S10554, and step S10555.
In this embodiment, when determining whether each first target block in the current frame image is a static object, the determination may be performed according to the total frame difference value between the first target block and its corresponding second target block alone, or by further combining that total with the frame difference sums of the individual sub-blocks in the frame difference map between the first target block and its corresponding second target block.
Step S10551, detecting whether the total frame difference value is smaller than a second preset threshold, if so, executing the following step S10552, and if not, executing the following step S10553.
In step S10552, it is determined that the first target tile is a static object.
In step S10553, a preset number of sub-blocks with the largest frame difference sums are selected from the plurality of sub-blocks.
Step S10554, accumulating the sum of the values of the frame difference of the selected sub-blocks, and calculating the ratio between the accumulated value and the total value of the frame difference.
Step S10555, detecting whether the ratio is greater than a third preset threshold, and if so, executing step S10552.
In this embodiment, for each first target block, it is detected whether the total frame difference value between the first target block and the corresponding second target block is smaller than a second preset threshold, where the second preset threshold may be set to 30 or another value, that is:
Wsum < 30
If the total value is smaller than the second preset threshold, the difference between the first target block and the corresponding second target block is small, and the object corresponding to the first target block is highly likely to be a static object. Conversely, if the total frame difference value is greater than or equal to the second preset threshold, the first target block may be a dynamic object. In this case, however, some other objects may be mistakenly classified as dynamic: for example, for a block containing a ground image, a pedestrian passing over the ground may generate enough motion that the block's total frame difference value exceeds the second preset threshold, so the ground image would be falsely detected as a dynamic object. Therefore, when the total frame difference value is greater than or equal to the second preset threshold, the uniformity of the distribution of the dynamic sub-blocks within the first target block can be considered to further reduce the false detection rate.
Optionally, in this embodiment, when the total frame difference value between the first target block and its corresponding second target block is greater than or equal to the second preset threshold, a preset number of sub-blocks with the largest frame difference sums are selected from the plurality of sub-blocks divided from the frame difference map corresponding to the first target block; for example, four sub-blocks or another number of sub-blocks may be selected, which is not limited in this embodiment.
The frame difference sums of the selected sub-blocks are accumulated, and the accumulated result is divided by the total frame difference value corresponding to the first target block to obtain a ratio between the two. The obtained ratio is compared with a third preset threshold; for example, the third preset threshold may be 0.75, or another value. If the obtained ratio is greater than the third preset threshold, that is, if the following formula is satisfied, it may be determined that the object corresponding to the first target block is a static object:
(Wmax1 + Wmax2 + Wmax3 + Wmax4) / Wsum > 0.75
where Wsum is the total frame difference value between the first target block and the corresponding second target block, and Wmax1, Wmax2, Wmax3 and Wmax4 are the frame difference sums of the four sub-blocks with the largest sums in the frame difference map corresponding to the first target block.
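Putting the two conditions together, the static-object decision of step S1055 can be sketched as below. The function name and signature are illustrative; the second preset threshold 30, the third preset threshold 0.75 and the choice of four sub-blocks are the example values from the text:

```python
def is_static(sub_sums, total, t2=30, t3=0.75, top_k=4):
    """Static-object decision from the per-sub-block frame difference sums.

    A block is static if Wsum < t2, or if the top_k largest sub-block
    sums account for more than fraction t3 of Wsum -- i.e. the motion is
    concentrated in a few sub-blocks, as with a pedestrian crossing a
    ground block, rather than spread over the whole block.
    """
    if total < t2:                       # step S10551 -> S10552
        return True
    top = sorted(sub_sums, reverse=True)[:top_k]   # step S10553
    return sum(top) / total > t3         # steps S10554 -> S10555
```

A uniform motion pattern (every sub-block contributing a little) fails the ratio test and is kept as a dynamic candidate, while concentrated motion is filtered out as static.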
In step S107, the first target block determined as the static object is deleted from the current frame image.
It should be noted that, since a stationary individual face that is normally captured exhibits almost no motion, its total frame difference value falls below the second preset threshold and the corresponding block is deleted. However, since the capture system captures continuously, practice shows that this situation has essentially no effect on the overall capture rate.
In this embodiment, the problem of static false detection is largely solved by the above false-detection strategy, but the filtering effect on dynamic false detection is not ideal. On this basis, this embodiment therefore designs a classifier trained by deep learning to screen out face objects from the dynamic targets. Referring to fig. 6, the method for reducing the false detection rate of face detection according to the present embodiment further includes the following steps:
step S109, inputting the remaining first target block in the current frame image after the deletion process into the established face classifier, so as to screen out the face object from the remaining first target block.
Alternatively, after the above steps are performed, a static object may be detected from the captured image, and the first target block determined as the static object may be deleted, so as to avoid interference with subsequent processing. And inputting the first target block left after deletion into the established face classifier so as to screen out the face object from the remaining first target block. It should be noted that the first target block left after the deletion may be one first target block or a plurality of first target blocks, which is not limited in this embodiment.
Referring to fig. 7, in the present embodiment, the face classifier can be built by the following steps:
step S201, a classifier network architecture based on a convolutional neural network is constructed.
Step S203, respectively inputting the obtained positive samples and negative samples of the plurality of face images into the classifier network architecture to train the positive samples and the negative samples of the face images so as to obtain the face classifier.
In this embodiment, a classifier network architecture based on a convolutional neural network is constructed. The architecture constructed in this embodiment includes three convolutional layers, four pooling layers and two fully-connected layers, and its specific structure is shown in fig. 8. The numbers of convolution kernels of convolutional layer one, convolutional layer two, convolutional layer three and fully-connected layers one and two are respectively 8, 16, 32 and 2 in sequence. The convolution kernel size is 3 × 3, except for fully-connected layer two, which is 1 × 1. The stride of each of these layers is 1. In this network architecture, the kernel sizes of pooling layers one through four are all 2 × 2, with a stride of 2.
In this embodiment, in order to make convergence faster, the ReLU function is used as the activation function throughout the network. In addition, to prevent overfitting, a dropout mechanism is added after the last downsampling layer, i.e. pooling layer four, which randomly disables some hidden-layer weights.
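A sketch of this architecture in PyTorch is given below. The input size (32 × 32 RGB), the width of fully-connected layer one (64) and the dropout rate (0.5) are not specified in the text and are assumptions for illustration; the layer counts, kernel counts (8, 16, 32, with 2 final outputs), 3 × 3 kernels, 2 × 2 stride-2 pooling, ReLU activations and dropout after pooling layer four follow the description above:

```python
import torch
import torch.nn as nn

class FaceClassifier(nn.Module):
    """Sketch of the classifier described in the text: three 3x3 conv
    layers (8, 16, 32 kernels), four 2x2 stride-2 pooling layers, ReLU
    activations, dropout after pooling layer four, and two
    fully-connected layers ending in 2 outputs (face / non-face)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(8, 16, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.MaxPool2d(2, 2),      # pooling layer four (last downsampling)
            nn.Dropout(0.5),         # dropout mechanism against overfitting
        )
        self.fc1 = nn.Linear(32 * 2 * 2, 64)   # width 64 is an assumption
        self.fc2 = nn.Linear(64, 2)            # face / non-face scores

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc2(torch.relu(self.fc1(x)))
```

The fully-connected layers are written here as `nn.Linear`; the 1 × 1 kernel mentioned for fully-connected layer two corresponds to implementing it as a 1 × 1 convolution, which is mathematically equivalent.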
In this embodiment, after the classifier network architecture is constructed, a plurality of face image positive samples and a plurality of negative samples, that is, non-face image samples, which are used for training, are respectively input into the established classifier network architecture, and the plurality of face image positive samples and the plurality of negative samples are trained in the network architecture, so as to obtain the face classifier, which is used as a subsequent criterion for determining a face image.
Referring to fig. 9, in the present embodiment, the step S109 may include three substeps, i.e., a step S1091, a step S1093, and a step S1095.
Step S1091, inputting the first target block left after deletion into the established face classifier.
Step S1093, detecting a first fitting degree between the first target block and the positive sample of the face image trained in the face classifier, and a second fitting degree between the first target block and the negative sample trained in the face classifier.
Step S1095, comparing the first fitting degree and the second fitting degree corresponding to the first target block, and determining that the first target block is the human face object if the first fitting degree is greater than the second fitting degree.
In this embodiment, the plurality of first target blocks remaining after the static object is deleted are input into the established face classifier, so as to classify each first target block in the face classifier.
In this embodiment, for each first target block input into the face classifier, a first fitting degree between the block and the trained face image positive samples and a second fitting degree between the block and the trained negative samples are detected; the first and second fitting degrees are both values between 0 and 1, and their sum is 1. It is then detected whether the first fitting degree of each first target block is greater than its second fitting degree. If so, the block fits the face image positive samples more closely, and the object corresponding to the first target block is determined to be a face image. A first target block confirmed as a face image is reported to the subsequent face application stage; otherwise, it is treated as a false detection and deleted. This completes the face screening stage.
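The fitting-degree comparison can be sketched as follows, assuming the two fitting degrees are produced by a softmax over the classifier's two output scores, which is consistent with their description as values in (0, 1) that sum to 1; the function name is illustrative:

```python
import numpy as np

def face_decision(scores):
    """Turn the classifier's two output scores into the first / second
    fitting degrees (a softmax over the face and non-face outputs) and
    keep the block only when the face fit exceeds the non-face fit."""
    e = np.exp(scores - np.max(scores))   # numerically stable softmax
    fit_face, fit_nonface = e / e.sum()   # two values in (0,1), summing to 1
    return fit_face > fit_nonface
```

Blocks for which this returns False are treated as false detections and deleted, as described above.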
Referring to fig. 10, a functional block diagram of an apparatus 110 for reducing a false detection rate of face detection applied to the electronic device 100 according to an embodiment of the present invention is shown. The device comprises an image acquisition module 111, a target block acquisition module 112, a judgment module 113, a deletion module 114 and a screening module 115.
The image obtaining module 111 is configured to obtain a current frame image to be detected and a previous frame image of the current frame image. The image obtaining module 111 may be configured to perform step S101 shown in fig. 2, and the detailed description of step S101 may be referred to for a specific operation method.
The target block obtaining module 112 is configured to obtain a plurality of first target blocks from the current frame image, and obtain a second target block of the previous frame image with coordinates corresponding to each of the first target blocks. The target block obtaining module 112 may be configured to execute step S103 shown in fig. 2, and the detailed description of step S103 may be referred to for a specific operation method.
The determining module 113 is configured to determine whether each first target block is a static object according to a frame difference relationship between each first target block and a second target block corresponding to the first target block. The determining module 113 may be configured to execute step S105 shown in fig. 2, and the detailed description of step S105 may be referred to for a specific operation method.
The deleting module 114 is configured to delete the first target block determined as the static object from the current frame image when the first target block is determined as the static object. The deleting module 114 may be configured to execute step S107 shown in fig. 2, and the detailed description of step S107 may be referred to for a specific operation method.
The screening module 115 is configured to input the remaining first target block in the current frame image subjected to the deletion processing into the established face classifier, so as to screen a face object from the remaining first target block. The screening module 115 may be configured to perform step S109 shown in fig. 6, and a detailed description of the step S109 may be referred to for a specific operation method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the method according to the implementation scenarios of the present invention.
In summary, the embodiments of the present invention provide a method, an apparatus, and an electronic device 100 for reducing a false detection rate of face detection, in which a plurality of first target blocks are obtained from a current frame image, a plurality of second target blocks having corresponding coordinates to the first target blocks are obtained from a previous frame image of the current frame image, whether each first target block is a static object is determined according to a frame difference relationship between each first target block and the corresponding second target block, and the corresponding first target block is deleted when the first target block is determined as the static object. On the basis, the rest first target blocks are input into the established face classifier so as to screen out the face image from the rest first target blocks. Through the steps, the moving object in the captured target can be accurately judged by utilizing the frame difference relation, the interference of the environment diversity on the detection is reduced, and the judgment accuracy is improved. And further, a face classifier combined with deep learning is used for screening out face images from the moving object, and further false detection is filtered.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (9)

1. A method for reducing false detection rate of face detection is characterized in that the method comprises the following steps:
acquiring a current frame image to be detected and a previous frame image of the current frame image;
obtaining a plurality of first target blocks from the current frame image, and obtaining second target blocks of coordinates corresponding to the first target blocks in the previous frame image;
judging whether each first target block is a static object or not according to the frame difference relation between each first target block and a second target block corresponding to the first target block;
if a first target block is determined to be a static object, deleting the first target block determined as a static object from the current frame image;
the step of determining whether each first target block is a static object according to the frame difference relationship between each first target block and the second target block corresponding to the first target block includes:
carrying out a difference operation on each first target block and the second target block corresponding to the first target block to obtain a frame difference map of each first target block and its corresponding second target block;
comparing the pixel value of each pixel point in the frame difference image with a first preset threshold value aiming at each frame difference image, obtaining the frame difference equivalent value corresponding to each pixel point according to the comparison result, and obtaining the frame difference total value between a first target block and a second target block corresponding to the frame difference image according to the frame difference equivalent value of each pixel point;
when the total frame difference value is greater than or equal to a second preset threshold, selecting a preset number of sub-blocks with the largest frame difference sums from the plurality of sub-blocks contained in the first target block, accumulating the frame difference sums of the selected sub-blocks, and calculating the ratio between the accumulated value and the total frame difference value;
and detecting whether the ratio is greater than a third preset threshold, and if so, determining that the first target block is a static object.
2. The method according to claim 1, wherein after the step of deleting the first target block determined as a static object from the current frame image, the method further comprises:
and inputting the first target block left in the current frame image subjected to deletion processing into the established face classifier so as to screen out the face object from the first target block left.
3. The method according to claim 1, wherein the step of comparing the pixel value of each pixel point in the frame difference map with a first preset threshold, obtaining the frame difference equivalent value corresponding to each pixel point according to the comparison result, and obtaining the frame difference total value between the first target block and the second target block corresponding to the frame difference map according to the frame difference equivalent value of each pixel point comprises:
dividing the frame difference map into a plurality of sub-blocks;
aiming at each sub-block, comparing the pixel value of each pixel point in the sub-block with the first preset threshold value respectively;
setting the value of the frame difference of the pixel points with the pixel values larger than the first preset threshold value as 1, and setting the value of the frame difference of the pixel points with the pixel values smaller than or equal to the first preset threshold value as 0;
counting the sum of the frame difference values of the pixel points contained in the sub-blocks;
and accumulating the sum of the values of the frame difference corresponding to each sub-block to obtain the total value of the frame difference between the first target block and the second target block.
4. The method according to claim 3, wherein the step of determining whether the first target block is a static object according to the total frame difference value comprises:
and detecting whether the frame difference total value is smaller than a second preset threshold value, and if so, judging that the first target block is a static object.
5. The method for reducing the false detection rate of human face detection according to claim 2, wherein the human face classifier is obtained by the following steps:
constructing a classifier network architecture based on a convolutional neural network;
and respectively inputting the obtained positive samples and negative samples of the face images into the classifier network architecture to train the positive samples and the negative samples of the face images so as to obtain the face classifier.
6. The method according to claim 5, wherein the step of inputting the remaining first target block in the current frame image after the deletion process into the established face classifier to screen the face object from the remaining first target block comprises:
inputting the first target block left after deletion into the established face classifier;
detecting a first fitting degree between the first target block and a positive sample of the face image trained in the face classifier, and a second fitting degree between the first target block and a negative sample trained in the face classifier;
and comparing the first fitting degree and the second fitting degree corresponding to the first target block, and if the first fitting degree is greater than the second fitting degree, judging that the first target block is the human face object.
7. The method according to claim 1, wherein before the step of obtaining a plurality of first target blocks from the current frame image and obtaining a second target block in the previous frame image with coordinates corresponding to each of the first target blocks, the method further comprises:
and scaling the current frame image and the previous frame image in the same proportion.
8. An apparatus for reducing false detection rate of face detection, the apparatus comprising:
the image acquisition module is used for acquiring a current frame image to be detected and a previous frame image of the current frame image;
a target block obtaining module, configured to obtain a plurality of first target blocks from the current frame image, and obtain a second target block of the previous frame image, where the second target block corresponds to a coordinate of each first target block;
the judging module is used for judging whether each first target block is a static object or not according to the frame difference relation between each first target block and a second target block corresponding to the first target block;
a deleting module, configured to delete the first target block determined as the static object from the current frame image when the first target block is determined as the static object;
the judging module is configured to judge whether each first target block is a static object by:
carrying out a difference operation on each first target block and the second target block corresponding to the first target block to obtain a frame difference map of each first target block and its corresponding second target block;
comparing the pixel value of each pixel point in the frame difference image with a first preset threshold value aiming at each frame difference image, obtaining the frame difference equivalent value corresponding to each pixel point according to the comparison result, and obtaining the frame difference total value between a first target block and a second target block corresponding to the frame difference image according to the frame difference equivalent value of each pixel point;
when the total frame difference value is greater than or equal to a second preset threshold, selecting a preset number of sub-blocks with the largest frame difference sums from the plurality of sub-blocks contained in the first target block, accumulating the frame difference sums of the selected sub-blocks, and calculating the ratio between the accumulated value and the total frame difference value;
and detecting whether the ratio is greater than a third preset threshold, and if so, determining that the first target block is a static object.
9. An electronic device, comprising:
a memory;
a processor; and
the apparatus for reducing false positive rates of face detection of claim 8, comprising one or more software functional modules stored in the memory and executed by the processor.
CN201711462162.7A 2017-12-28 2017-12-28 Method and device for reducing false detection rate of face detection and electronic equipment Active CN108932465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711462162.7A CN108932465B (en) 2017-12-28 2017-12-28 Method and device for reducing false detection rate of face detection and electronic equipment

Publications (2)

Publication Number Publication Date
CN108932465A CN108932465A (en) 2018-12-04
CN108932465B true CN108932465B (en) 2021-02-02

Family

ID=64448331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711462162.7A Active CN108932465B (en) 2017-12-28 2017-12-28 Method and device for reducing false detection rate of face detection and electronic equipment

Country Status (1)

Country Link
CN (1) CN108932465B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414444A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Face identification method and device
CN111528652B (en) * 2020-07-09 2021-11-09 北京每日优鲜电子商务有限公司 Method and device for identifying commodities in intelligent container
CN113554008B (en) * 2021-09-18 2021-12-31 深圳市安软慧视科技有限公司 Method and device for detecting static object in area, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129572A (en) * 2011-02-25 2011-07-20 杭州海康威视软件有限公司 Face detection method and device adopting cascade classifier
CN102855496A (en) * 2012-08-24 2013-01-02 苏州大学 Method and system for authenticating shielded face
CN103577838A (en) * 2013-11-25 2014-02-12 苏州大学 Face recognition method and device
CN104616006A (en) * 2015-03-11 2015-05-13 湖南智慧平安科技有限公司 Surveillance video oriented bearded face detection method
CN104866843A (en) * 2015-06-05 2015-08-26 中国人民解放军国防科学技术大学 Monitoring-video-oriented masked face detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6564284B2 (en) * 2015-09-04 2019-08-21 キヤノン株式会社 Image processing apparatus and image processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hu Jianhua, "Motion detection based on block difference and background subtraction" (基于分块差分与背景减除的运动检测), Electronic Measurement Technology (《电子测量技术》), 2007-10-15, pp. 50-51 *
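The cited non-patent work describes motion detection by block-wise frame differencing, the same family of technique the patent applies to filter face false detections. A minimal sketch of that idea follows; the function name, block size, and threshold are illustrative assumptions, not values taken from the patent or the cited paper:

```python
import numpy as np

def block_frame_difference(prev_frame, curr_frame, block_size=16, threshold=12.0):
    """Flag moving blocks by mean absolute inter-frame difference.

    prev_frame, curr_frame: 2-D uint8 grayscale arrays of equal shape.
    Returns a 2-D boolean array with one entry per block; True marks motion.
    """
    h, w = curr_frame.shape
    # Widen to int16 before subtracting to avoid uint8 wrap-around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    rows, cols = h // block_size, w // block_size
    motion = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            block = diff[r * block_size:(r + 1) * block_size,
                         c * block_size:(c + 1) * block_size]
            # A block is "moving" if its mean frame difference exceeds the threshold.
            motion[r, c] = block.mean() > threshold
    return motion
```

A detector could then discard face candidates whose blocks show no motion, on the assumption that a static region in a surveillance feed is unlikely to be a newly appearing face.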

Also Published As

Publication number Publication date
CN108932465A (en) 2018-12-04

Similar Documents

Publication Publication Date Title
CN109858461B (en) Method, device, equipment and storage medium for counting dense population
CN108830145B (en) People counting method based on deep neural network and storage medium
US9158985B2 (en) Method and apparatus for processing image of scene of interest
CN106651774B (en) License plate super-resolution model reconstruction method and device
CN111062273B (en) Method for tracing, detecting and alarming remaining articles
CN109800682B (en) Driver attribute identification method and related product
CN108932465B (en) Method and device for reducing false detection rate of face detection and electronic equipment
KR102476022B1 (en) Face detection method and apparatus thereof
CN110659659A (en) Method and system for intelligently identifying and early warning pests
KR20140028809A (en) Adaptive image processing apparatus and method in image pyramid
CN110879982A (en) Crowd counting system and method
WO2023273011A9 (en) Method, apparatus and device for detecting object thrown from height, and computer storage medium
CN103093198A (en) Crowd density monitoring method and device
CN111814690B (en) Target re-identification method, device and computer readable storage medium
CN113763424B (en) Real-time intelligent target detection method and system based on embedded platform
Yang et al. Anomaly detection in moving crowds through spatiotemporal autoencoding and additional attention
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN111582032A (en) Pedestrian detection method and device, terminal equipment and storage medium
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN110619255B (en) Target detection method and device
CN113409360A (en) High altitude parabolic detection method and device, equipment and computer storage medium
CN117132768A (en) License plate and face detection and desensitization method and device, electronic equipment and storage medium
CN112149596A (en) Abnormal behavior detection method, terminal device and storage medium
CN110321808B (en) Method, apparatus and storage medium for detecting carry-over and stolen object
CN108038872B (en) Dynamic and static target detection and real-time compressed sensing tracking research method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant