CN112712004B

CN112712004B - Face detection system, face detection method and device and electronic equipment

Info

Publication number: CN112712004B
Application number: CN202011564556.5A
Authority: CN
Inventors: 朱才志; 向声宁; 周晓; 郑银强; 孙耀晖
Original assignee: Hefei Intelingda Information Technology Co ltd; Intelingda Information Technology Shenzhen Co ltd
Current assignee: Hefei Intelingda Information Technology Co ltd; Intelingda Information Technology Shenzhen Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2023-09-12
Anticipated expiration: 2040-12-25
Also published as: CN112712004A

Abstract

Embodiments of the disclosure provide a face detection system, a face detection method, a face detection device and electronic equipment. The face detection system comprises: an image collector, configured to collect an original RAW picture, wherein the RAW picture comprises at least one face image; an image signal processing device, connected to the image collector and configured to acquire the RAW picture and perform image processing on the RAW picture to obtain a standard red, green and blue (sRGB) picture; a first processor, connected to the image collector and configured to acquire the RAW picture and input the RAW picture into a preset face detection model in the first processor to obtain first position information of the at least one face image; a second processor, connected to the image signal processing device and the first processor and configured to acquire the sRGB picture and the first position information and to label the sRGB picture according to the first position information; and a display, connected to the second processor and configured to acquire the labeled sRGB picture and display a picture with a face frame according to the labeled sRGB picture.

Description

Face detection system, face detection method and device and electronic equipment

Technical Field

The disclosure relates to the technical field of computer vision, and in particular relates to a face detection system, a face detection method, a face detection device and electronic equipment.

Background

Face detection technology is an important field in computer vision today, and is applied to various aspects of social life, such as identity authentication and face-scanning payment. In the related art, a face detection system obtains a picture in the sRGB (standard Red Green Blue) format through an imaging module, and obtains the position information of the face in that picture by analyzing and computing on the sRGB picture, so as to achieve face detection.

However, existing face detection systems are strongly affected by the environment. When the ambient light where the face detection system is located is dim, the sRGB picture suffers from problems such as blurring, darkening and artifacts, so the accuracy with which the face detection system detects faces is low.

Disclosure of Invention

An object of an embodiment of the present disclosure is to provide a face detection system and a face detection method, so as to reduce an influence of environmental factors on the face detection system and improve detection accuracy of the face detection system. The specific technical scheme is as follows:

To achieve the above object, an embodiment of the present disclosure provides a face detection system, including:

an image collector, configured to collect an original RAW picture, wherein the RAW picture comprises at least one face image;

an image signal processing device, connected to the image collector and configured to acquire the RAW picture and perform image processing on the RAW picture to obtain a standard red, green and blue (sRGB) picture;

a first processor, connected to the image collector and configured to acquire the RAW picture and input the RAW picture into a preset face detection model in the first processor to obtain first position information of the at least one face image; the preset face detection model is a model obtained by training a preset neural network according to a preset training set, wherein the preset training set comprises a plurality of sample RAW pictures and first position information of at least one face image in each sample RAW picture;

a second processor, connected to the image signal processing device and the first processor and configured to acquire the sRGB picture and the first position information and to label the sRGB picture according to the first position information;

and a display, connected to the second processor and configured to acquire the labeled sRGB picture and display a picture with a face frame according to the labeled sRGB picture.

In some embodiments, the second processor is specifically configured to determine a target area in the sRGB picture corresponding to the first location information, and label the target area in the sRGB picture, to obtain a labeled sRGB picture.

In some embodiments, the face detection system further comprises a denoising device;

the denoising device is connected with the image collector and the first processor and is used for acquiring the RAW picture, denoising the RAW picture, and sending the denoised RAW picture to the first processor.

In some embodiments, the denoising device is specifically configured to divide all pixels in the RAW image into a plurality of groups with the same number as that of the plurality of color classes according to a standard correspondence between the pixels and the plurality of color classes, and perform denoising processing on each pixel in each group, where the plurality of color classes include red, green, and blue, or the plurality of color classes include red, a first area green, a second area green, and blue, the pixels corresponding to the first area green are located in a first preset area, and the pixels corresponding to the second area green are located in a second preset area.

In some embodiments, the second processor is further configured to perform histogram equalization normalization preprocessing on the denoised RAW picture before inputting the RAW picture into a preset face detection model in the first processor.

In order to achieve the above object, an embodiment of the present disclosure further provides a face detection method, where the face detection method includes:

acquiring first position information sent by a first processor, wherein the first position information is obtained after the first processor inputs a RAW picture into a preset face detection model, and the RAW picture comprises position information of at least one face image;

acquiring an sRGB picture sent by an image signal processing device;

and marking the sRGB picture according to the first position information, and sending the marked sRGB picture to a display, so that the display displays the picture with the face picture frame according to the marked sRGB picture.

In some embodiments, the step of labeling the sRGB picture according to the first location information includes:

determining a target area corresponding to the first position information in the sRGB picture;

and marking the target area in the sRGB picture.

To achieve the above object, an embodiment of the present disclosure further provides a face detection apparatus, including:

the first acquisition module is used for acquiring first position information sent by the first processor, wherein the first position information is obtained after the first processor inputs a RAW picture into a preset face detection model, and the RAW picture comprises position information of at least one face image;

the second acquisition module is used for acquiring the sRGB picture sent by the image signal processing device;

the labeling module is used for labeling the sRGB picture according to the first position information, and sending the labeled sRGB picture to a display, so that the display displays the picture with the face frame according to the labeled sRGB picture.

In some embodiments, the labeling module is specifically configured to:

determine a target area corresponding to the first position information in the sRGB picture;

and label the target area in the sRGB picture.

To achieve the above object, an embodiment of the present disclosure further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing the steps of the face detection method when executing the program stored in the memory.

To achieve the above object, the embodiments of the present disclosure further provide a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the steps of the face detection method.

The beneficial effects of the embodiment of the disclosure are that:

Embodiments of the disclosure provide a face detection system, a face detection method, a face detection device and electronic equipment. After the image collector collects a RAW picture, it sends the RAW picture to the first processor and to the image signal processing device. The image signal processing device receives the RAW picture, performs image processing on it to obtain an sRGB picture, and sends the sRGB picture to the second processor. After the first processor acquires the RAW picture, it inputs the RAW picture into a preset face detection model, obtains first position information corresponding to the face image in the RAW picture, and sends the first position information to the second processor. After receiving the sRGB picture and the first position information, the second processor labels the sRGB picture according to the first position information and sends the labeled sRGB picture to the display, so that the display displays the labeled sRGB picture. Because each pixel in a RAW picture carries more color information than the corresponding pixel in an sRGB picture, the RAW picture can still provide rich detail under low illumination. Therefore, the face position is first determined in the RAW picture, and the face image in the sRGB picture is then located according to that position, which reduces the influence of environmental factors on the face detection system and improves its detection accuracy.

Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and other embodiments may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.

FIG. 1 is a block diagram of a face detection system in an embodiment of the present disclosure;

FIG. 2 is a flowchart of a training method of a preset face detection model in an embodiment of the disclosure;

FIG. 3 is a schematic diagram of a labeled sRGB picture in an embodiment of the disclosure;

FIG. 4 is an interaction diagram of the devices in the face detection system according to an embodiment of the disclosure;

FIG. 5 is another block diagram of a face detection system in an embodiment of the present disclosure;

FIG. 6 is a flowchart of a face detection method according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of a face detection apparatus according to an embodiment of the present disclosure;

FIG. 8 is a block diagram of an electronic device in an embodiment of the disclosure.

Detailed Description

The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.

In order to reduce the influence of environmental factors on a face detection system and improve the detection accuracy of the face detection system, the embodiment of the disclosure provides a face detection system, a face detection method, a face detection device and electronic equipment, and the detailed description will be given below with reference to the accompanying drawings.

As shown in fig. 1, a face detection system provided in an embodiment of the present disclosure includes:

an image collector 101, configured to collect an original RAW image, where the RAW image includes at least one face image;

an image signal processing device 102, connected to the image collector 101, for acquiring the RAW picture and performing image processing on the RAW picture to obtain a standard red, green and blue (sRGB) picture;

the first processor 103 is connected with the image collector 101 and is used for acquiring a RAW picture, inputting the RAW picture into a preset face detection model in the first processor 103 and obtaining first position information of at least one face image; the preset face detection model is a model obtained by training a preset neural network according to a preset training set, wherein the preset training set comprises a plurality of sample RAW pictures and first position information of at least one face image in each sample RAW picture;

The second processor 104 is connected with the image signal processing device 102 and the first processor 103, and is used for acquiring the sRGB picture and the first position information, and labeling the sRGB picture according to the first position information;

and the display 105 is connected with the second processor 104 and is used for acquiring the labeled sRGB picture and displaying the picture with the face frame according to the labeled sRGB picture.

In the embodiment of the present disclosure, the image collector 101 may be a device for collecting images, such as a camera or a mobile phone. The image collector 101 is connected to the image signal processing device 102 and to the first processor 103. It is configured to collect a RAW picture and, after collection, may send the RAW picture to the image signal processing device 102 and the first processor 103. The RAW picture is an unprocessed original picture file.

In the embodiment of the present disclosure, the image signal processing device 102 is capable of acquiring the RAW picture sent by the image collector 101 and performing image processing on the acquired RAW picture. The image signal processing device 102 may acquire the RAW picture in various ways. In one example, the image signal processing device 102 sends an acquisition request for the RAW picture to the image collector 101; after receiving the request, the image collector 101 sends the RAW picture to the image signal processing device 102, which thereby acquires the RAW picture. In another example, after collecting the RAW picture, the image collector 101 sends it directly to the image signal processing device 102, which thereby acquires the RAW picture. The image signal processing device 102 may also acquire the RAW picture in other manners, which is not particularly limited in the embodiments of the present disclosure.

After acquiring a RAW picture, the image signal processing device 102 performs image processing on it. This image processing includes, but is not limited to, 3D denoising, nonlinear processing, linear correction, dead pixel removal, interpolation, white balance, automatic exposure control, and the like. The image signal processing device 102 processes the RAW picture to obtain an sRGB picture and outputs the sRGB picture to the second processor 104. The sRGB picture is obtained by processing the original picture file, and its colors conform to the sRGB color standard.

In the embodiment of the present disclosure, the image signal processing device 102 may be an ISP (Image Signal Processing) device, in which case the image processing described above is the ISP processing that the device performs on the RAW picture.

In the embodiment of the present disclosure, the first processor 103 is connected to the image collector 101, and is capable of acquiring a RAW image. There are various ways in which the first processor 103 obtains the RAW picture. In one example, the first processor 103 may send an acquisition request for a RAW picture to the image collector 101, so that the image collector 101 sends the RAW picture to the first processor 103 according to the acquisition request after receiving the acquisition request, and the first processor 103 acquires the RAW picture. In another example, after the image collector 101 collects the RAW image, the collected RAW image is directly sent to the first processor 103, and the first processor 103 obtains the RAW image. The first processor 103 may also obtain the RAW picture in other manners, which is not specifically limited in the embodiments of the present disclosure.

After the first processor 103 obtains the RAW picture, it inputs the RAW picture into the preset face detection model. The preset face detection model processes the RAW picture and outputs first position information of at least one face image in the RAW picture. That is, when the input RAW picture includes a plurality of face images, the preset face detection model may output a plurality of pieces of first position information, one for each face image. The preset face detection model may be a face detection model obtained by training a convolutional neural network, or may be another type of face detection model, which is not particularly limited in the embodiment of the present disclosure.

In an embodiment of the present disclosure, the preset face detection model may be a model obtained by training a preset neural network based on a preset training set, where the preset training set may include a plurality of sample RAW pictures and first location information of at least one face image in each sample RAW picture.

Based on this, as shown in fig. 2, the preset face detection model may be obtained by training the following steps:

Step 201, acquiring a first preset training set, where the first preset training set includes a plurality of first sample RAW pictures and the labeled first position information of at least one face image included in each first sample RAW picture.

Step 202, inputting each first sample RAW picture into a preset neural network, so as to obtain first predicted position information of at least one face image in each first sample RAW picture.

Step 203, determining a first loss value according to the first predicted position information and the labeled position information, and determining whether the preset neural network converges according to the first loss value; if yes, executing step 204; if not, executing step 205.

Step 204, finishing training, and determining the current preset neural network as the preset face detection model.

Step 205, adjusting the parameters of the preset neural network, and returning to step 202 to start a new training round.
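The loop formed by steps 201 to 205 can be sketched as follows. This is only an illustrative sketch under strong simplifying assumptions: the "preset neural network" is replaced by a toy linear model with one weight and one bias per box coordinate, each sample RAW picture is stood in for by a single scalar feature, and the names `predict`, `train`, `loss_threshold` and `max_rounds` are hypothetical and do not come from the disclosure.

```python
def predict(params, feature):
    """Toy stand-in for the preset neural network: one linear unit per box coordinate."""
    return [w * feature + b for w, b in params]


def mean_squared_error(predicted, labeled):
    """Average squared difference between predicted and labeled coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predicted, labeled)) / len(labeled)


def train(training_set, loss_threshold=1e-4, max_rounds=5000, learning_rate=0.5):
    """training_set: list of (feature, labeled_box) pairs, labeled_box = [x, y, w, h]."""
    params = [(0.0, 0.0) for _ in range(4)]  # (weight, bias) per coordinate
    n = len(training_set)
    loss = float("inf")
    for round_index in range(max_rounds):
        # Step 202: run every sample through the current network.
        predictions = [predict(params, feature) for feature, _ in training_set]
        # Step 203: compute the loss and test for convergence.
        loss = sum(
            mean_squared_error(p, box) for p, (_, box) in zip(predictions, training_set)
        ) / n
        if loss < loss_threshold:
            # Step 204: converged, keep the current parameters as the model.
            return params, loss, round_index
        # Step 205: adjust parameters by one gradient-descent step, then loop to step 202.
        scale = 2.0 / (4 * n)  # derivative of the averaged squared error
        new_params = []
        for k, (w, b) in enumerate(params):
            grad_w = scale * sum(
                (p[k] - box[k]) * feature
                for p, (feature, box) in zip(predictions, training_set)
            )
            grad_b = scale * sum(
                p[k] - box[k] for p, (_, box) in zip(predictions, training_set)
            )
            new_params.append((w - learning_rate * grad_w, b - learning_rate * grad_b))
        params = new_params
    # Iteration limit reached: treated here as the alternative stopping rule.
    return params, loss, max_rounds
```

In a real system the forward pass and the parameter update would be carried out by a deep learning framework; the sketch only mirrors the control flow of the five steps.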

In step 201, the first preset training set may be obtained from an existing sRGB training set. Specifically, an sRGB training set including a plurality of sRGB sample pictures is obtained first. Each sRGB sample picture is then input into a preset picture processing model to obtain the first sample RAW picture corresponding to that sRGB sample picture, and the first position information of at least one face image in each first sample RAW picture is acquired. The first preset training set is then formed from the first sample RAW pictures and their corresponding first position information. The preset picture processing model is used for restoring an sRGB picture into a RAW picture: it takes the sRGB picture as input, processes it, and outputs the corresponding RAW picture.

In one embodiment, the preset picture processing model may be a model based on an autoencoder deep learning model. The deep learning network may be trained on a training set consisting of a plurality of sRGB sample pictures and the RAW picture corresponding to each sRGB sample picture. Specifically, the sRGB sample pictures are normalized and then input into a preset deep learning network with a batch size of 1. Whether the preset deep learning network has converged is judged according to a mean squared error loss function, and when it has converged, the current deep learning network is taken as the preset picture processing model. Convergence may also be judged according to other loss functions, such as a root mean squared error loss function or a mean absolute error loss function, which is not particularly limited in the embodiments of the disclosure.
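For reference, the three loss functions named above have the following standard definitions. The sketch below applies them to flat lists of pixel values; the function names and the flat-list representation are illustrative assumptions, not part of the disclosure.

```python
import math


def mean_squared_error(predicted, target):
    """MSE: average of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def root_mean_squared_error(predicted, target):
    """RMSE: square root of the MSE, in the same units as the pixel values."""
    return math.sqrt(mean_squared_error(predicted, target))


def mean_absolute_error(predicted, target):
    """MAE: average of absolute differences, less sensitive to large outliers than MSE."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)
```

Here `predicted` would be the RAW pixel values output by the picture processing model and `target` the pixel values of the true RAW picture for the same sRGB sample.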

In step 203, the first loss value may be the error between the first predicted position information and the labeled position information. For example, the accuracy rate and the error rate of face detection may be computed from the first predicted position information and the labeled position information, and either the error rate or the accuracy rate may be regarded as the first loss value. Taking the error rate as the first loss value, whether the preset neural network converges is judged as follows: when the first loss value is smaller than a preset error rate threshold, the network is determined to have converged; when the first loss value is greater than or equal to the preset error rate threshold, the network is determined not to have converged. Other ways of determining whether the preset neural network converges may also be used, which is not limited herein. For example, the preset neural network may be determined to have converged when the number of iterations reaches a preset iteration threshold.

In an embodiment of the present disclosure, the preset training set may instead include a plurality of sample RAW pictures, second position information in each sample RAW picture, and the face confidence corresponding to each piece of second position information. The face confidence corresponding to a piece of second position information may be understood as the probability that the region of the sample RAW picture corresponding to that second position information includes a face image. The higher the face confidence, the higher the probability that the region occupied by the second position information in the sample RAW picture includes a face image; the lower the face confidence, the lower the probability that the region occupied by the second position information in the sample RAW picture includes a face image.

Based on the above, the preset face detection model can be obtained through training by the following steps:

Step one, acquiring a second preset training set, where the second preset training set includes a plurality of second sample RAW pictures, the labeled second position information in each second sample RAW picture, and the labeled face confidence corresponding to each piece of second position information; the face confidence corresponding to a piece of second position information represents the probability that the corresponding region of the second sample RAW picture includes a face image.

Step two, inputting each second sample RAW picture into a preset neural network to obtain second predicted position information of the face image in each second sample RAW picture and the predicted face confidence corresponding to the second predicted position information.

Step three, determining a second loss value according to the second predicted position information, the predicted face confidence, the labeled second position information and the labeled face confidence, and judging whether the preset neural network converges according to the second loss value; if yes, executing step four; if not, executing step five.

Step four, finishing training, and determining the current preset neural network as the preset face detection model.

Step five, adjusting the parameters of the preset neural network, and returning to step two to start a new training round.

For details of step one to step five, refer to the descriptions of steps 201 to 205 above, which are not repeated here.

Based on the above, when a plurality of pieces of second position information in the RAW picture and the face confidence corresponding to each piece are obtained, the first processor is further configured to screen the pieces of second position information according to face confidence, and to send to the second processor those pieces whose face confidence is greater than or equal to a preset confidence threshold. The preset confidence threshold may be set according to the practical situation, for example 0.8, 0.85 or 0.9, which is not limited in particular in the embodiment of the disclosure.

For example, take a preset confidence threshold of 0.85. Suppose that, after a RAW picture is input into the preset face detection model, the model outputs position information a, b and c, with face confidences of 0.8, 0.9 and 0.95 respectively. Since 0.9 and 0.95 are greater than 0.85 while 0.8 is less than 0.85, only position information b and position information c meet the preset confidence threshold, and the first processor 103 therefore sends position information b and position information c to the second processor 104.
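The screening in this example can be sketched in a few lines. The function name `select_confident_positions` and the representation of each detection as a `(position, confidence)` pair are assumptions made for illustration only.

```python
def select_confident_positions(detections, confidence_threshold=0.85):
    """Keep positions whose face confidence is at least the preset threshold.

    detections: list of (position, confidence) pairs output by the model.
    """
    return [
        position
        for position, confidence in detections
        if confidence >= confidence_threshold
    ]


# The worked example from the text: a, b, c with confidences 0.8, 0.9, 0.95.
detections = [("a", 0.8), ("b", 0.9), ("c", 0.95)]
forwarded = select_confident_positions(detections)  # positions b and c
```

Only the positions in `forwarded` would be sent on to the second processor for labeling.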

In the embodiment of the disclosure, after acquiring the first location information sent by the first processor 103 and the sRGB picture sent by the image signal processing apparatus 102, the second processor 104 marks the sRGB picture according to the first location information, and obtains the marked sRGB picture.

In some embodiments, the second processor 104 is specifically configured to determine a target area in the sRGB picture corresponding to the first location information, and mark the target area in the sRGB picture.

In the embodiment of the disclosure, processing a RAW picture into the corresponding sRGB picture does not change the position of any object in the picture; each object occupies the same coordinates in the sRGB picture as in the RAW picture. Based on this, after the second processor 104 obtains the first position information from the RAW picture and the sRGB picture corresponding to that RAW picture, it may locate the position in the sRGB picture that has the same coordinates as the first position information, determine the target area corresponding to that position, and mark the target area. Since the first position information is the position of the face image in the RAW picture, the position with the same coordinates in the sRGB picture is the position of the face image in the sRGB picture; that is, the target area in the sRGB picture contains the face image.

Marking the target area in the sRGB picture may be understood as enclosing the target area with a rectangular frame or a circular frame, as shown in FIG. 3. The target area may also be marked by other means, such as an arrow pointing at the target area, which is not specifically limited in the embodiments of the present disclosure.
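A minimal sketch of the rectangular-frame labeling follows. It assumes that the sRGB picture is a list of rows of `(r, g, b)` tuples, that the RAW and sRGB pictures share the same pixel coordinates (as stated above), and that the first position information is a `(left, top, right, bottom)` box. The function name `label_target_area` and these representations are illustrative choices, not taken from the disclosure.

```python
def label_target_area(srgb_picture, box, color=(255, 0, 0)):
    """Return a copy of the picture with a one-pixel rectangular frame around box.

    srgb_picture: list of rows, each row a list of (r, g, b) tuples.
    box: (left, top, right, bottom), inclusive pixel coordinates detected in the RAW picture.
    """
    labeled = [list(row) for row in srgb_picture]  # leave the input untouched
    height, width = len(labeled), len(labeled[0])
    left, top, right, bottom = box
    # Clamp the box to the picture so a detection touching the border is still drawn.
    left, right = max(0, left), min(width - 1, right)
    top, bottom = max(0, top), min(height - 1, bottom)
    for x in range(left, right + 1):  # top and bottom edges
        labeled[top][x] = color
        labeled[bottom][x] = color
    for y in range(top, bottom + 1):  # left and right edges
        labeled[y][left] = color
        labeled[y][right] = color
    return labeled
```

Because the box coordinates are reused unchanged, no coordinate transform is needed between the RAW detection and the sRGB picture under the stated assumption.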

In an embodiment of the present disclosure, the display 105 may include a display screen, and is configured to display the labeled sRGB picture on the display screen after acquiring it. A face frame is shown in the sRGB picture displayed on the screen, and the region enclosed by the face frame is the face image in the sRGB picture.

When face detection is performed by the face detection system described above, as shown in fig. 4, the image collector 101 sends a RAW picture to the first processor 103 and the image signal processing device 102 after collecting the RAW picture. The image signal processing apparatus 102 performs image processing on the RAW picture after receiving the RAW picture, obtains an sRGB picture, and sends the obtained sRGB picture to the second processor 104. After the first processor 103 obtains the RAW image, the RAW image is input into a preset face detection model, first position information corresponding to a face image in the RAW image is obtained, and the first position information is sent to the second processor 104. After receiving the sRGB picture and the first location information, the second processor 104 marks the sRGB picture according to the first location information, and then sends the marked sRGB picture to the display 105, so that the display 105 displays the marked sRGB picture.
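The interaction described above can be summarized as a small data-flow sketch. Each component is modeled as a plain callable; the class name `FaceDetectionSystem`, its field names and the `run_once` method are hypothetical and serve only to show which component consumes which output.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FaceDetectionSystem:
    collect_raw: Callable[[], Any]    # image collector 101
    isp: Callable[[Any], Any]         # image signal processing device 102
    detect: Callable[[Any], Any]      # first processor 103 (preset face detection model)
    label: Callable[[Any, Any], Any]  # second processor 104
    show: Callable[[Any], None]       # display 105

    def run_once(self):
        raw = self.collect_raw()
        # The same RAW picture feeds two independent branches.
        srgb = self.isp(raw)            # branch 1: RAW -> sRGB for viewing
        positions = self.detect(raw)    # branch 2: detection runs on RAW, not on sRGB
        labeled = self.label(srgb, positions)
        self.show(labeled)
        return labeled
```

The two branches are written sequentially here for clarity; since they are separate devices in the described system, they could equally run concurrently.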

Because each pixel in a RAW picture carries more color information than the corresponding pixel in an sRGB picture, the RAW picture can still provide rich detail under low illumination. Therefore, the face position is first determined in the RAW picture, and the face image in the sRGB picture is then located according to that position, which reduces the influence of environmental factors on the face detection system and improves its detection accuracy.

In some embodiments, the face detection system provided in the embodiments of the present disclosure further includes a denoising device 106, as shown in fig. 5. The denoising device 106 is connected to the image collector 101 and the first processor 103, and is configured to obtain a RAW image, denoise the RAW image, and send the denoised RAW image to the first processor 103.

In the embodiment of the disclosure, the denoising device 106 is connected with the image collector 101 and the first processor 103. After the image collector 101 obtains the RAW picture, the RAW picture is not sent directly to the first processor 103; instead, it is first sent to the denoising device 106, which removes noise from the RAW picture and then sends the denoised RAW picture to the first processor 103. Noise in the RAW picture includes, but is not limited to, salt-and-pepper noise, Gaussian noise, and the like. Denoising the RAW picture reduces interference in the RAW picture and enhances the brightness of the RAW picture, so that the first processor 103 can detect the face image in the RAW picture more accurately, and the detection accuracy of the face detection system is further improved.

In the embodiment of the present disclosure, the denoising device 106 may perform denoising processing on the RAW picture by using a median filtering method. The median filtering method can be understood as follows: the RAW picture is divided into a plurality of small areas; the gray values of the pixel points in each small area are sorted from large to small to obtain a gray value sequence; the median of the gray value sequence is determined; and the gray values of all the pixel points in the small area are replaced with the median, thereby completing image denoising.
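As an illustrative sketch only, and not the implementation of the disclosure, the block-wise median replacement described above could be written as follows. The function name block_median_denoise, the tile size parameter block, the nested-list picture representation, and the choice of the lower middle value when a small area holds an even number of pixel points are all assumptions made for this example. The sort direction does not affect the median, so the library median is used directly.

```python
from statistics import median_low


def block_median_denoise(image, block=2):
    """Replace every pixel in each block x block area with that area's median.

    `image` is a list of rows of integer gray values (hypothetical layout).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            # Median of the small area; median_low keeps an existing pixel value
            # when the area holds an even number of pixels (an assumption).
            med = median_low(image[r][c] for r in rows for c in cols)
            for r in rows:
                for c in cols:
                    out[r][c] = med
    return out
```

With this sketch, a small area containing the values 10, 12, 11 and 255 is replaced entirely by 11, so the isolated bright value 255 no longer appears in the output.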

In the embodiment of the present disclosure, the denoising device 106 may also perform denoising processing on the RAW image by using other methods, such as an average filtering method, an adaptive wiener filtering method, and the like, which is not limited in particular in the embodiment of the present disclosure.

In some embodiments, the denoising device 106 is specifically configured to divide all pixels in the RAW picture into a plurality of groups having the same number as the plurality of color categories according to a standard correspondence between the pixels and the plurality of color categories, and perform denoising processing on each pixel in each group, where the plurality of color categories include red, green, and blue, or the plurality of color categories include red, first area green, second area green, and blue, the pixels corresponding to the first area green are located in a first preset area, and the pixels corresponding to the second area green are located in a second preset area.

In the embodiment of the disclosure, a photosensitive chip is disposed in the image collector 101, and an optical filter is disposed in front of each pixel of the photosensitive chip, and the color of the optical filter in front of the pixel is the color of the pixel. Based on this, the standard correspondence between the pixel points and the plurality of color categories in the RAW image acquired by the image acquirer 101 may be: each pixel point in the RAW picture has a color to which it belongs. The number of the plurality of color categories may be set according to actual situations, and in one example, the plurality of color categories may be three primary colors, that is, red, blue, and green. According to the colors of the pixels, the pixels in the RAW picture can be divided into a red group, a blue group and a green group, and then noise reduction processing is respectively carried out on the pixels in the red group, the blue group and the green group.

In another example, the proportion of green information in the RAW picture may be relatively large, for example, up to 50%. Therefore, the more numerous green pixel points can be further divided, for example, according to their positions. Based on this, the plurality of color categories may be red, first area green, second area green, and blue. The first area green may be the green pixel points located in a first preset area, and the second area green may be the green pixel points located in a second preset area. The RAW picture may be divided into the first preset area and the second preset area by using a center line of the RAW picture as the dividing line, or the RAW picture may be divided in other manners, for example, by using a diagonal line of the RAW picture as the dividing line, which is not particularly limited in the embodiments of the present disclosure.

When the plurality of color categories are red, first area green, second area green, and blue, the pixel points in the RAW picture are grouped according to the color categories as follows: if the color of a pixel point is blue or red, the pixel point is directly classified into the blue group or the red group; if the color of a pixel point is green, the pixel point is further judged according to its position in the RAW picture, and is classified into the first area green group if it is located in the first preset area, or into the second area green group if it is located in the second preset area. The color categories may be set according to actual needs, for example, the color categories may be set to first red, second red, green, blue, etc., which is not particularly limited in the embodiments of the disclosure.

In the embodiment of the disclosure, the pixel points are grouped according to their colors in the RAW picture, and the pixel points contained in each group are then denoised group by group. This avoids the situation in which, during denoising of the RAW picture, a pixel point of one color (such as blue) that is surrounded by pixel points of another, markedly different color (such as green) is treated as noise and removed. Useful color information is thus prevented from being removed in the denoising process, and the denoising effect on the RAW picture is improved.
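The grouping and per-group denoising described above can be sketched as follows. This is a minimal example under stated assumptions rather than the method of the disclosure: an RGGB filter layout is assumed (red at even row and even column, blue at odd row and odd column, green elsewhere), the first and second preset areas are assumed to be the left and right halves separated by the vertical center line, and each pixel point is denoised with a median taken only over pixel points of the same category inside a small window. The names color_category, denoise_by_color_group and radius are hypothetical.

```python
from statistics import median_low


def color_category(row, col, width):
    """Category of a pixel under an assumed RGGB filter layout."""
    if row % 2 == 0 and col % 2 == 0:
        return "R"
    if row % 2 == 1 and col % 2 == 1:
        return "B"
    # Green pixel: split by the vertical center line (assumed preset areas).
    return "G1" if col < width // 2 else "G2"


def denoise_by_color_group(raw, radius=2):
    """Median-filter each pixel using only neighbours of its own category."""
    height, width = len(raw), len(raw[0])
    out = [row[:] for row in raw]
    for r in range(height):
        for c in range(width):
            cat = color_category(r, c, width)
            same_group = [
                raw[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
                if color_category(rr, cc, width) == cat
            ]
            # The pixel itself is always in same_group, so it is never empty.
            out[r][c] = median_low(same_group)
    return out
```

Because a blue pixel point is compared only with other blue pixel points, its value is kept even when every adjacent pixel point is green, while an isolated abnormal value within one color group is still replaced. The result is written back to the original pixel positions, so the output remains a single-channel picture.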

In some embodiments, the first processor 103 is further configured to perform normalization preprocessing on the RAW picture after denoising before inputting the RAW picture into a preset face detection model in the first processor 103.

In the embodiment of the disclosure, when the first processor 103 receives the denoised RAW picture, the value range of the pixel values of the picture needs to be reduced to a preset interval, such as 0-1, so that the preset face detection model can extract the position information of the face image from the RAW picture more accurately. To this end, normalization preprocessing is performed on the RAW picture. The normalization preprocessing includes, but is not limited to, maximum-minimum normalization preprocessing, and the like.
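For illustration, maximum-minimum normalization to the interval 0-1 can be sketched as below. The function name min_max_normalize and the nested-list picture representation are assumptions of this example, and mapping a constant picture to all zeros is a design choice made here to avoid division by zero, not something stated in the disclosure.

```python
def min_max_normalize(image):
    """Scale pixel values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    flat = [value for row in image for value in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # Constant picture: no range to stretch (assumed handling).
        return [[0.0 for _ in row] for row in image]
    span = highest - lowest
    return [[(value - lowest) / span for value in row] for row in image]
```

For a picture with pixel values 64, 128, 192 and 320, this sketch yields 0.0, 0.25, 0.5 and 1.0 respectively.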

In the embodiment of the disclosure, before the RAW picture is input to the preset face detection model, histogram equalization preprocessing may further be performed on the RAW picture. The histogram equalization preprocessing may be understood as performing nonlinear stretching on the RAW picture and reassigning the pixel values of the RAW picture, so that the number of pixel points in each gray-level range is approximately equal. The RAW picture provided by the embodiments of the present disclosure may also be preprocessed in other manners, which are not specifically limited in the embodiments of the present disclosure.
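A minimal sketch of histogram equalization is given below, using the common cumulative-histogram mapping; the disclosure does not specify a particular formula, so this mapping is an assumption. The function name equalize_histogram and the parameter levels (the number of gray levels, for example 256 for 8-bit data) are hypothetical, and pixel values are assumed to be integers in the range 0 to levels - 1.

```python
def equalize_histogram(image, levels=256):
    """Remap integer gray values so the cumulative histogram becomes roughly linear."""
    flat = [value for row in image for value in row]
    hist = [0] * levels
    for value in flat:
        hist[value] += 1
    # Cumulative distribution of gray values.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = len(flat) - cdf_min
    if denom == 0:
        return [row[:] for row in image]  # constant picture: nothing to stretch
    # Integer arithmetic with round-half-up; clamp levels below the darkest pixel to 0.
    lut = [
        (2 * max(c - cdf_min, 0) * (levels - 1) + denom) // (2 * denom)
        for c in cdf
    ]
    return [[lut[value] for value in row] for row in image]
```

In this sketch, a low-contrast picture whose values are 100, 100, 101 and 102 is stretched to 0, 0, 128 and 255, so the gray levels in use are spread over the full range.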

It should be noted that, because the preset face detection model is a single-channel input model, when the pixel points of the RAW picture have been divided into a plurality of groups equal in number to the plurality of color categories and each group has been denoised, the first processor 103 needs to unify the plurality of groups of denoised pixel points into one single-channel input.

In order to reduce the influence of environmental factors on a face detection system and improve the detection accuracy of the face detection system, the embodiment of the disclosure also provides a face detection method, which is applied to the second processor in the face detection system, and the face detection method comprises the following steps, as shown in fig. 6:

Step 601, first position information sent by a first processor is acquired, where the first position information is position information of at least one face image included in a RAW picture, and is obtained after the first processor inputs the RAW picture into a preset face detection model.

Step 602, an sRGB picture transmitted by an image signal processing apparatus is acquired.

Step 603, the sRGB picture is labeled according to the first position information, and the labeled sRGB picture is sent to a display, so that the display displays the picture with the face frame according to the labeled sRGB picture.

In some embodiments, step 603 may be refined as: and determining a target area corresponding to the first position information in the sRGB picture, and marking the target area in the sRGB picture.

For the description of steps 601-603, refer to the description of the face detection system above; details are not repeated here.
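The refinement of step 603 can be sketched as follows. This is an example under stated assumptions only: the first position information is assumed to be a rectangle (x, y, w, h) in RAW picture coordinates, the target area is assumed to be obtained by scaling that rectangle to the size of the sRGB picture, and the labeling is assumed to be a one-pixel frame. The names mark_face_frame, raw_size and color are hypothetical.

```python
def mark_face_frame(srgb, box, raw_size, color=(255, 0, 0)):
    """Return a copy of `srgb` with a one-pixel frame drawn around the target area.

    srgb     -- list of rows of (r, g, b) tuples
    box      -- (x, y, w, h) in RAW coordinates (assumed form of the position information)
    raw_size -- (width, height) of the RAW picture
    """
    height, width = len(srgb), len(srgb[0])
    raw_w, raw_h = raw_size
    x, y, w, h = box
    # Determine the target area: scale the RAW rectangle to sRGB coordinates.
    left = min(width - 1, x * width // raw_w)
    top = min(height - 1, y * height // raw_h)
    right = max(left, min(width - 1, (x + w) * width // raw_w - 1))
    bottom = max(top, min(height - 1, (y + h) * height // raw_h - 1))
    # Label the target area by drawing its border on a copy of the picture.
    out = [row[:] for row in srgb]
    for c in range(left, right + 1):
        out[top][c] = color
        out[bottom][c] = color
    for r in range(top, bottom + 1):
        out[r][left] = color
        out[r][right] = color
    return out
```

If the RAW picture and the sRGB picture have the same size, the scaling leaves the rectangle unchanged; the scaling step is included only because the disclosure does not state that the two pictures share one resolution.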

When the face detection is performed by the face detection method, the image signal processing equipment performs image processing on the RAW picture to obtain an sRGB picture, and then sends the obtained sRGB picture to the second processor. The first processor inputs the RAW picture into a preset face detection model, and sends the first position information to the second processor after obtaining the first position information corresponding to the face image in the RAW picture. After receiving the sRGB picture and the first position information, the second processor marks the sRGB picture according to the first position information, and then sends the marked sRGB picture to the display, so that the display displays the marked sRGB picture.

In order to reduce the influence of environmental factors on a face detection system and improve the detection accuracy of the face detection system, an embodiment of the disclosure further provides a face detection device, as shown in fig. 7, where the face detection device includes:

A first acquisition module 701, configured to acquire first position information sent by the first processor, where the first position information is position information of at least one face image included in a RAW picture, and is obtained after the first processor inputs the RAW picture into a preset face detection model.

A second acquisition module 702, configured to acquire the sRGB picture sent by the image signal processing apparatus.

The labeling module 703 is configured to label the sRGB picture according to the first location information, and send the labeled sRGB picture to the display, so that the display displays the picture with the face frame according to the labeled sRGB picture.

In some embodiments, the labeling module 703 is specifically configured to:

determine a target area corresponding to the first position information in the sRGB picture, and mark the target area in the sRGB picture.

The embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804;

the memory 803 is configured to store a computer program;

the processor 801 is configured to implement the steps of any one of the face detection methods described above when executing the program stored in the memory 803.

The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.

In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the face detection methods described above.

In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the face detection methods described in the above embodiments is also provided.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the method, apparatus, electronic device, and computer-readable storage medium, the description is relatively simple, as it is substantially similar to the system embodiments, with reference to the portions of the system embodiments that are relevant.

The foregoing description is only of the preferred embodiments of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure are included in the protection scope of the present disclosure.

Claims

1. A face detection system, comprising:

the first processor is connected with the image collector and used for acquiring the RAW picture, inputting the RAW picture into a preset face detection model in the first processor and obtaining first position information of the at least one face image; the preset face detection model is a model obtained by training a preset neural network according to a preset training set, wherein the preset training set comprises a plurality of sample RAW pictures and first position information of at least one face image in each sample RAW picture; the preset training set is obtained based on the existing sRGB training set in the following mode: inputting each sRGB sample picture into a preset picture processing model to obtain a first RAW sample picture corresponding to each sRGB sample picture; acquiring first position information of at least one face image in each first RAW sample picture; constructing a preset training set according to the first RAW sample picture and the first position information corresponding to the first RAW sample picture;

The second processor is connected with the image signal processing equipment and the first processor and is used for acquiring the sRGB picture and the first position information and labeling the sRGB picture according to the first position information; the second processor is specifically configured to determine a target area corresponding to the first position information in the sRGB picture, and label the target area in the sRGB picture, to obtain a labeled sRGB picture;

the display is connected with the second processor and used for acquiring the labeled sRGB picture and displaying the picture with the face picture frame according to the labeled sRGB picture;

the denoising device is connected with the image collector and the first processor and is used for acquiring the RAW picture, denoising the RAW picture and sending the denoised RAW picture to the first processor;

the denoising device is specifically configured to divide all pixels in the RAW picture into a plurality of groups with the same number as that of the plurality of color categories according to a standard correspondence between the pixels and the plurality of color categories, and perform denoising processing on each pixel in each group, where the plurality of color categories include red, green and blue, or the plurality of color categories include red, a first area green, a second area green and blue, the pixels corresponding to the first area green are located in a first preset area, and the pixels corresponding to the second area green are located in a second preset area.

2. The face detection system of claim 1, wherein,

the first processor is further configured to perform normalization preprocessing on the denoised RAW picture before inputting the RAW picture into a preset face detection model in the first processor.

3. A face detection method, comprising:

acquiring first position information sent by a first processor, wherein the first position information is position information of at least one face image included in a RAW picture, and is obtained after the first processor inputs the RAW picture into a preset face detection model; the preset face detection model is a model obtained by training a preset neural network according to a preset training set, wherein the preset training set comprises a plurality of sample RAW pictures and first position information of at least one face image in each sample RAW picture; the preset training set is obtained based on the existing sRGB training set in the following mode: inputting each sRGB sample picture into a preset picture processing model to obtain a first RAW sample picture corresponding to each sRGB sample picture; acquiring first position information of at least one face image in each first RAW sample picture; constructing a preset training set according to the first RAW sample picture and the first position information corresponding to the first RAW sample picture; the RAW picture is obtained by: dividing all pixel points in the RAW picture into a plurality of groups with the same number as that of the plurality of color categories according to the standard correspondence between the pixel points and the plurality of color categories, and denoising each pixel point in each group, wherein the plurality of color categories comprise red, green and blue, or the plurality of color categories comprise red, first region green, second region green and blue, the pixel point corresponding to the first region green is positioned in a first preset region, and the pixel point corresponding to the second region green is positioned in a second preset region;

Acquiring an sRGB picture sent by an image signal processing device;

marking the sRGB picture according to the first position information, determining a target area corresponding to the first position information in the sRGB picture, marking the target area in the sRGB picture, and sending the marked sRGB picture to a display, so that the display displays the picture with the face frame according to the marked sRGB picture.

4. A face detection apparatus, comprising:

the first acquisition module is used for acquiring first position information sent by the first processor, wherein the first position information is position information of at least one face image included in a RAW picture, and is obtained after the first processor inputs the RAW picture into a preset face detection model; the preset face detection model is a model obtained by training a preset neural network according to a preset training set, wherein the preset training set comprises a plurality of sample RAW pictures and first position information of at least one face image in each sample RAW picture; the preset training set is obtained based on the existing sRGB training set in the following mode: inputting each sRGB sample picture into a preset picture processing model to obtain a first RAW sample picture corresponding to each sRGB sample picture; acquiring first position information of at least one face image in each first RAW sample picture; constructing a preset training set according to the first RAW sample picture and the first position information corresponding to the first RAW sample picture; the RAW picture is obtained by: dividing all pixel points in the RAW picture into a plurality of groups with the same number as that of the plurality of color categories according to the standard correspondence between the pixel points and the plurality of color categories, and denoising each pixel point in each group, wherein the plurality of color categories comprise red, green and blue, or the plurality of color categories comprise red, first region green, second region green and blue, the pixel point corresponding to the first region green is positioned in a first preset region, and the pixel point corresponding to the second region green is positioned in a second preset region;

5. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of claim 3 when executing a program stored on a memory.

6. A computer-readable storage medium, characterized in that it has stored therein a computer program which, when executed by a processor, implements the method steps of claim 3.