CN117716400A - Face region detection and local reshaping enhancement - Google Patents


Info

Publication number: CN117716400A
Application number: CN202280052829.9A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 黃琮玮, 苏冠铭
Original and current assignee: Dolby Laboratories Licensing Corp
Application filed by Dolby Laboratories Licensing Corp
Priority claimed from PCT/US2022/038249, published as WO2023009469A1
Abstract

A method of processing a face region, and a corresponding system, are disclosed. The described method includes providing a face bounding box and a face confidence level, generating histograms of the faces and of all pixels, generating a face probability, and generating a face probability map. Facial contrast adjustment and facial saturation adjustment may be applied to the facial probability map.

Description

Face region detection and local reshaping enhancement
Cross Reference to Related Applications
The present application claims priority from U.S. provisional patent application No. 63/226,938, filed on July 29, 2021, and European application No. 21188517.3, filed on July 29, 2021, both of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to video image processing. In particular, the present disclosure relates to face region detection and local reshaping enhancement.
Background
Face detection methods have been used in a variety of applications to identify faces in images and/or videos. In some existing face region detection methods, the face region may be detected by skin color. Some graph-cut or graphical-model-based methods may use the bounding box of a face to predict the segmentation of the face in the image. Based on recently developed techniques, deep convolutional neural networks for semantic and instance segmentation tasks can also be used for face region detection.
Disclosure of Invention
The disclosed method and apparatus provide an efficient framework to detect a face region in an image given a face bounding box, and to apply different adjustments to the face region in local reshaping. The detection of the face region is based on a histogram analysis of the face and can be efficiently extended to successive frames in a video segment. When the detected face region is used in local reshaping, the contrast and saturation of the face can be adjusted separately from other image content to avoid excessive enhancement of facial details (e.g., wrinkles or spots).
One embodiment of the present invention is a method for face region detection in an input image including one or more faces, the method comprising: providing a face bounding box and a confidence level for each of the one or more faces; generating a histogram of all pixels based on the input image; generating a histogram of the one or more faces based on the input image and the face bounding box; generating a face probability based on the histogram of all pixels and the histogram of the one or more faces; and generating a face probability map based on the face probability. Another embodiment of the present invention uses the face region detection of the previous embodiment to apply local reshaping by: applying a facial saturation adjustment and a facial contrast adjustment to the facial probability map to generate an adjusted facial probability map; and generating a reshaped image based on the adjusted facial probability map and one or more selected reshaping functions.
In some embodiments, the method may be computer implemented. For example, the method may be implemented at least in part via a control system including one or more processors and one or more non-transitory storage media.
Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, and the like. Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in non-transitory media having software stored thereon. For example, the software may be executed by one or more components of a control system (e.g., those disclosed herein). For example, the software may include instructions for performing one or more of the methods disclosed herein.
At least some aspects of the disclosure may be implemented via one or more devices. For example, one or more devices may be configured to at least partially perform the methods disclosed herein. In some embodiments, an apparatus may include an interface system and a control system. The interface system may include one or more network interfaces, one or more interfaces between the control system and the storage system, one or more interfaces between the control system and another device, and/or one or more external device interfaces. The control system may include at least one of a general purpose single or multi-chip processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Thus, in some implementations, a control system may include one or more processors and one or more non-transitory storage media operatively coupled to the one or more processors.
The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the figures below may not be drawn to scale. Like reference numbers and designations in the various drawings generally indicate like elements, but different reference numbers do not necessarily indicate different elements between different drawings.
Drawings
Fig. 1 illustrates an example diagram of face region detection and local reshaping with face adjustment according to an embodiment of the present disclosure.
Fig. 2 shows an example diagram of a face region detection process according to an embodiment of the present disclosure.
FIG. 3 illustrates an example diagram of generating a global generic histogram according to an embodiment of the disclosure.
Fig. 4 illustrates an image with a detected face according to an embodiment of the present disclosure.
Fig. 5 illustrates an example diagram of generating an individual histogram of a face in an image according to an embodiment of the present disclosure.
Fig. 6 shows an example graph of calculating an initial probability of a face according to an embodiment of the present disclosure.
Fig. 7 illustrates an example diagram of an adaptive ordering and probability propagation process according to an embodiment of the disclosure.
Fig. 8A-8D illustrate example charts and histograms related to the present disclosure. Fig. 8A shows an example graph of facial probabilities, and fig. 8B shows an example non-facial histogram according to an embodiment of the present disclosure. Fig. 8C shows an exemplary histogram of a real non-face, and fig. 8D shows an exemplary graph of updated probabilities of non-faces according to an embodiment of the present disclosure.
Fig. 9 shows an example diagram illustrating details of a local post-processing step according to an embodiment of the present disclosure.
Fig. 10 illustrates an example diagram of local reshaping according to an embodiment of the present disclosure.
Detailed Description
Previous face recognition methods for image processing have drawbacks for video. For example, skin tone detection does not generalize well, because skin tone varies from person to person and under different lighting conditions. For video, the computational cost of predicting segmentation is high. Neural network methods may further create flickering artifacts in video due to missed detections and temporal inconsistency. The systems and methods provided herein avoid these drawbacks.
As used herein, a "face bounding box" refers to an imaginary (not drawn) rectangle that serves as a reference point for a face detected by a face detection algorithm.
As used herein, "face histogram" refers to grouping data for detected face images.
As used herein, a "face probability map" refers to a pixel map of the probability of an image to each pixel individually as part of a face.
As used herein, a "basic face shape" or "basic face shape model" refers to a shape (e.g., an ellipse) that generally represents the size and shape of a detected face, and a "basic face shape map" refers to a pixel map of basic face shapes in an image.
As used herein, "face probability" and "non-face probability" refer to the calculated probabilities of pixels being in the face or not being in the face, respectively.
As used herein, "soft morphology operation" refers to a nonlinear operation related to the shape or morphology of a feature in an image, where the maximum and minimum operations used in standard gray scale morphology are replaced by weighted order statistics.
As used herein, "facial adjustment" refers to applying a shaping operation to a detected face region of an image.
As shown in the exemplary embodiment of fig. 1, the disclosed method includes face region detection (100). Given an input image (11) and pre-detected face bounding boxes (10), the histogram (12) attributes are analyzed and a face probability (13) is predicted for each bin in the histogram. Local post-processing (14) is then applied to improve smoothness and remove small noise in the generated facial probability map (15).
Local reshaping (100') processing may then be applied. Using the facial probability map (15), different local reshaping (17) operations are applied to the facial region. The contrast and saturation of the facial region are adjusted (16) so that the face appears natural and visually pleasing in the reshaped image (18). In one embodiment, a local reshaping method may be used, such as those disclosed in U.S. provisional application No. 63/086,699, entitled "Adaptive Local Reshaping For SDR-To-HDR Up-Conversion," filed by the applicant of the present disclosure in October 2020, which is incorporated herein by reference in its entirety. In this method, the contrast and saturation of each pixel can be easily adjusted.
With continued reference to fig. 1, the disclosed method may also be integrated with an existing linear encoding architecture for local reshaping (see, e.g., U.S. provisional application No. 63/086,699 referenced above) to address real-world conversion scenarios. The proposed method exploits the sliding window in the linear encoding architecture to enhance the temporal stability of the final video quality.
A. Face region detection
Fig. 2 shows an example diagram of a face region detection process according to an embodiment of the present disclosure. This process is based on analyzing the histogram of the face in the YUV color space, considering that the color of a face is likely to differ from the color of other content in the same image. Given an input image (201) and pre-detected face bounding boxes (200), as shown in the histogram analysis step (220), a generic histogram (202) of the faces and of all pixels in the YUV color space and an individual histogram (204) of each detected face are first calculated. The basic face shape model (203) is used to generate a basic shape map, which is an initial guess of the face region in the input image (201) used to calculate the histogram. As part of step (230), an initial probability (205) of the face in the YUV color space is calculated from the generic histogram (202). The adaptive ordering (206) may then be used to refine the initial probability of the face based on the generic histogram (202) and the individual histogram (204) of each face. The probability of the face is then iteratively updated and propagated in the YUV color space (207).
With further reference to fig. 2, given the refined probability of the face, as shown in the local post-processing step (240), local smoothing (208) is first performed to avoid artifacts due to abrupt probability changes in the image. Soft morphology operations (209) are then applied iteratively to remove small noise from the final facial probability map (215) of the input image (201). The pre-detected face bounding boxes (200) may be derived from any type of face detector that predicts a face bounding box and a corresponding detection score. Details of the individual steps shown in the embodiment of fig. 2 are described below.
A.1 Histogram analysis
In accordance with the teachings of the present disclosure, as part of the histogram analysis, a face shape model is used to generate an initial guess of the face region for use in computing a generic histogram of the face. To capture the color diversity of different faces in the same image, an individual histogram for each face is also calculated.
A.1.1 Global generic histogram
FIG. 3 illustrates an example diagram of generating a global generic histogram according to an embodiment of the disclosure. The generic histogram refers to either the histogram of all faces or the histogram of all pixels. To calculate a generic histogram of the faces in an input image (31), a face region is first defined. Based on the face bounding boxes (30) that have been detected, each bounding box may be filled with an average, i.e., basic, face shape (32) to obtain an initial guess of the face region. Consider an input image (31) S of size W×H containing N_face detected faces with bounding boxes (c_k, x_k, y_k, w_k, h_k), k = 0, ..., N_face − 1, where c_k is a detection score between 0 and 1, (x_k, y_k) are the coordinates (integer or floating point) of the upper-left corner of the bounding box, and (w_k, h_k) is the size (integer or floating point) of the bounding box of the k-th detected face. From these, a basic shape map (33) M_Q can be generated. This basic shape map is an initial guess of the face region, obtained using a predefined or pre-trained basic face shape (32) model, denoted Q. The basic face shape (32) model is a probability map of the face within the detected bounding box; it can also be regarded as an average face shape. As an example, the basic face shape (32) model Q may be a solid inscribed ellipse of the bounding box, i.e., 1 inside the ellipse and 0 outside the ellipse. As another example, the basic face shape (32) model may be learned from training data for face segmentation. In general, the basic face shape (32) model may be stored as a map of size W_Q × H_Q and resized for each detected face.
With further reference to FIG. 3, for the k-th detected face, the face shape model is resized and shifted to fit the bounding box (30), yielding a probability map M_Q,k of that face. The probability map is then multiplied by the detection score c_k to reduce the impact of false-positive detections, which typically have lower detection scores. The probability maps of all detected faces are then accumulated into the basic shape map (33) M_Q. If bounding boxes overlap, M_Q may be clipped to 1. If any letterbox exists, it is excluded from M_Q. Given a probability map M_L of inactive regions (filled black regions of arbitrary shape, e.g., letterbox, pillarbox, circle, or any other shape) obtained from an inactive-region detector, such as that described in U.S. provisional application No. 63/209,602, entitled "Surround Area Detection And Blending For Image Filtering," filed by the applicant of the present disclosure on June 11, 2021, the entire contents of which are incorporated herein by reference, the probability map of the region of interest (ROI) may be defined as M_ROI = 1 − M_L, which then multiplies M_Q. Thus, the final M_Q can be expressed as:
M_Q = M_ROI .* min( Σ_k c_k · M_Q,k , 1 )    (1)
where the operator .* denotes element-wise multiplication. To further clarify the teachings of the above disclosure, reference is made to fig. 4, which shows an image (400) in which four faces have been detected. The image (400) includes an image main area (401) and a letterbox (402). The basic shape maps (403) and face bounding boxes (404) associated with the four faces are also shown.
Referring back to fig. 3, since the bounding box (30) derived from the face detector may not always be perfect, the actual face area may extend outside the bounding box (30). Thus, the probability that the face is outside the bounding box may not be 0. In this case, scaling factors f_box,x and f_box,y in the x and y directions may be used to enlarge the bounding box about its center before fitting the basic face shape model. The following pseudo-code shows an example of how a basic shape map can be generated from a basic face shape model of an inscribed ellipse:
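In place of the elided pseudo-code, the construction above can be approximated with a short sketch. The following Python illustration (not the patent's reference implementation; helper names and the integer-coordinate simplification are assumptions) builds a basic shape map from an inscribed-ellipse face shape model and combines the detected faces per equation (1):

```python
def ellipse_face_shape(w, h):
    """Basic face shape model Q as a solid inscribed ellipse:
    1.0 inside the ellipse of the w-by-h box, 0.0 outside."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ax, ay = w / 2.0, h / 2.0
    return [[1.0 if ((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2 <= 1.0 else 0.0
             for x in range(w)] for y in range(h)]

def basic_shape_map(W, H, boxes, m_roi=None):
    """Accumulate c_k-weighted face shapes into M_Q, clip to 1,
    and optionally mask by the ROI map (1 - letterbox), per Eq. (1).
    boxes: list of (c_k, x_k, y_k, w_k, h_k) with integer coordinates."""
    m_q = [[0.0] * W for _ in range(H)]
    for c, x0, y0, w, h in boxes:
        q = ellipse_face_shape(w, h)
        for j in range(h):
            for i in range(w):
                yy, xx = y0 + j, x0 + i
                if 0 <= yy < H and 0 <= xx < W:
                    m_q[yy][xx] = min(m_q[yy][xx] + c * q[j][i], 1.0)
    if m_roi is not None:
        for j in range(H):
            for i in range(W):
                m_q[j][i] *= m_roi[j][i]
    return m_q
```

Resizing a pre-trained W_Q × H_Q shape model into each box would replace the on-the-fly ellipse here; the accumulation and clipping steps are unchanged.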
with continued reference to fig. 3, given the face region defined in the basic shape map (32), a generic histogram (35) of the face and a generic histogram (34) of all pixels may be calculated. According to an embodiment of the present disclosure, a generic histogram (35) of faces is calculated as a weighted count of pixels, where the weights are from a base shape graph (32). On the other hand, the common histogram (34) for all pixels is a histogram that counts all pixels in the ROI. For computational efficiency, a sub-sampling factor s may be used during counting hist The pixels are sub-sampled. As an example, s hist Can be set as s hist =2. A histogram of an input image (31) of size h×w may be calculated in YUV color space. YUV channels of the input image are denoted S respectively Y 、S U And S is V And the number of intervals of each channel is expressed asAnd->For input bit depth B S The section width of each channel is calculated as +.>And +.>B SAnd->An example value of (B) S =10>For different YUV input formats, corresponding pixel locations in each channel may be required. For YUV420 input, the Y channel may be saved as a W H array and the U and V channels may be saved as W half ×H half An array, wherein W half =w/2 and H half =h/2. Thus, the first and second substrates are bonded together,and->Are used to represent the downsampled U and V channels, respectively. For the purpose of calculation efficiency, S Y The pixel position (i, j) in (1) can be matched +.>And->Is->For other YUV formats, adjustments may be made accordingly. The following pseudocode is an example of calculating a generic histogram for faces and all pixels entered by YUV 420:
A.1.2 Local individual histograms of faces
In addition to the global generic histogram of all faces, a local individual histogram of each face is also considered, to capture the variation of each face. This is illustrated by the example diagram shown in fig. 5. For each face, the probabilities within the face bounding box (50) are found based on the input image (51) using the basic face shape (52) model, in the same way as the basic shape map (33) of FIG. 3 is constructed, and a weighted count is then made. However, if there are many faces in a frame, storing all the individual histograms (54) may take a lot of memory, and the situation becomes worse if histograms from multiple frames are stored. Thus, to save memory, the individual histogram (54) of each face is pruned (53) in advance while preserving as many pixel counts as possible. An exemplary pruning process is described in more detail below.
With further reference to FIG. 5, for the k-th face, the original histogram hist_face,k is pruned to a smaller histogram of fixed, reduced size, consisting of a contiguous range of bins of hist_face,k recorded together with the starting bin index of the range.
in addition, the retention ratio r of the post-clipping histogram can be recorded keep,k I.e. the ratio of the total number of pixels before and after clipping for future use. Such a ratio may be obtained as follows:
to improve the results, to prune the histogram, one can find a size where the sum of the histograms is maximumIs a continuous interval of (a). However, since the histogram is 3-D, the calculation result may be large. Thus, the histogram may be trimmed in one channel at a time in the order Y, U and V channels. Examples of parameters are for all facetsFurthermore, most faces may have a retention ratio of, for example, greater than 90%.
Continuing the pruning process disclosed above, and in view of possible memory constraints, a maximum number of faces N_face,max may be set for storing individual histograms. Thus, when N_face > N_face,max, only the N_face,max most important faces are kept. Since larger faces in an image typically draw more attention, the size of the bounding box can be used as a measure of importance. In addition, the detection score of the bounding box may be considered, to avoid false detections. Thus, the importance of each face can be defined in terms of its area and detection score, as shown in the following equation:

importance_k = c_k · min( w_k·h_k·N_face,max / (W·H), 1 )

where the face area w_k·h_k is normalized by W·H/N_face,max and clipped to 1, because a sufficiently large face is considered important. The term N_face,max appears in the normalization because the more faces that can be kept, the smaller the faces that can be considered. The top N_face,max faces with the highest importance are selected. An example value is N_face,max = 16.
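A minimal sketch of the importance-based selection described above (assuming the importance formula as reconstructed; all names are illustrative):

```python
def select_important_faces(boxes, W, H, n_face_max=16):
    """Rank faces by importance = c_k * min(w*h*N_face_max/(W*H), 1)
    and keep the top N_face_max faces.
    boxes: list of (c_k, x_k, y_k, w_k, h_k)."""
    def importance(box):
        c, _, _, w, h = box
        return c * min(w * h * n_face_max / (W * H), 1.0)
    return sorted(boxes, key=importance, reverse=True)[:n_face_max]
```

A large face with a moderate score can outrank a small face with a high score, which matches the intent that bigger faces draw more attention while low-score (likely false) detections are suppressed.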
Referring to fig. 5, the following pseudo-code illustrates an example of how the individual histogram (54) of each face is calculated, using a basic face shape (52) model of an inscribed ellipse, for YUV420 input:
a.2 probability adaptation
With the histograms generated as disclosed previously, the face probability of each bin can be defined. In general, if a color has a higher value in the face histogram, it is more likely to be part of a face. Thus, the initial probability of the face can be estimated directly from the generic histograms of the faces and of all pixels. However, since the face histogram is estimated from the basic shape map, which is only an initial guess of the face region, it may be necessary to further refine the initial probability by adapting it to the local histograms in the YUV color space. To this end, iterative adaptive ordering and probability propagation, based on the individual histogram of each face and the generic histogram of non-faces, can be applied. Details of the initial probability estimation, adaptive ordering, and probability propagation are presented in the example diagrams of figs. 6-8 and described in the following sections.
A.2.1 Initial probability
Fig. 6 shows an example diagram of calculating the initial probability of the face according to an embodiment of the present disclosure. First, the ratio between the face histogram (62) and the generic histogram (61) of all pixels is calculated as follows:
r_face(b) = G(hist_face)(b) / G(hist_all)(b)    (5)

where G(·) denotes 3-D Gaussian filtering (63) with standard deviation σ_hist, and the operator / is element-wise division (64). To avoid division by zero, r_face(b) may be set to 0 for any bin b at which G(hist_all)(b) is 0. The purpose of the Gaussian filtering is to reduce noise in the histograms. The standard deviation σ_hist can be set to, e.g., σ_hist = 0.25 (in bins). Scaling and thresholding are then applied to the ratio (65) to obtain the initial probability of the face (66); the larger the ratio, the higher the probability. For each bin b, the following applies:

p_face,init(b) = clip3((r_face(b) − r_0)/(r_1 − r_0), 0, 1)    (6)

where r_0 and r_1 are thresholds on the histogram ratio. As can be seen from the above formula, p_face,init = 0 when r_face < r_0, and p_face,init = 1 when r_face > r_1. The thresholds can be set to, e.g., r_0 = 0.1 and r_1 = 0.5. In addition, the non-face histogram (68) may be defined as the difference (67) between the two histograms: hist_nonface = hist_all − hist_face. As will be seen later, the non-face histogram (68) is used in the adaptive ordering process, described in detail in the next section.
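The initial-probability computation of equation (6) can be sketched as follows (the 3-D Gaussian pre-filtering of both histograms, σ_hist = 0.25 bins, is omitted for brevity; the dictionary histogram representation and names are assumptions):

```python
def clip3(x, lo, hi):
    """Clamp x to [lo, hi], as in Eq. (6)."""
    return max(lo, min(hi, x))

def initial_face_probability(hist_face, hist_all, r0=0.1, r1=0.5):
    """p_face,init per bin: ratio of (unfiltered) face to all-pixel
    counts, scaled and thresholded per Eq. (6). Bins where hist_all
    is 0 get ratio 0 to avoid division by zero."""
    p = {}
    for b, total in hist_all.items():
        r = hist_face.get(b, 0.0) / total if total > 0 else 0.0
        p[b] = clip3((r - r0) / (r1 - r0), 0.0, 1.0)
    return p
```

With the example thresholds, a bin where 60% of the pixels came from face regions maps to probability 1, while a bin at exactly 10% maps to 0.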
A.2.2 Adaptive ordering
Fig. 7 shows an example diagram of the adaptive ordering (700) described in this section and the probability propagation (701) process described in the next section. Assuming that the majority of the basic shape map (33) of fig. 3 is correct, only minor adjustments may be required. More specifically, assume that, of the pixels counted in hist_nonface, at least a fraction θ_nonface are truly non-face. Likewise, assume that for each k, of the pixels counted in hist_face,k, at least a fraction θ_face are truly face. Thus, the face probability is first initialized to the initial probability: p_face ← p_face,init. Then, the probabilities of the bins with the lowest probability are updated to 0 until the cumulative pixel count reaches θ_nonface of the total pixel count of the histogram. In other words, the updated probability from non-faces (74), p̃_nonface, is obtained as follows:
p̃_nonface(b) = 0 for b ∈ B_nonface, and p̃_nonface(b) = p_face(b) otherwise

where B_nonface is the set of bins whose probabilities are updated to 0: the bins are visited in order of increasing probability p_face and added to B_nonface until their cumulative pixel count in hist_nonface reaches θ_nonface of the total pixel count of hist_nonface. The method disclosed above is illustrated in figs. 8A-8D for the case of a one-dimensional histogram. Given the face probability (81) p_face and the non-face histogram (82) hist_nonface, the bins with the lowest probability are updated to 0 until the sum of the pixel counts of these bins reaches θ_nonface of the total pixel count in the histogram. As a result, the histogram (84) of true non-faces and the updated probability from non-faces (83) p̃_nonface are obtained.
Referring back to FIG. 7, and similarly to the updated probability from non-faces (74) p̃_nonface, the probabilities of the bins with the highest probability are updated to 1 until the cumulative pixel count reaches θ_face of the total pixel count of each face histogram. In other words, the updated probability (73) for the k-th face is obtained as follows:

p̃_face,k(b) = 1 for b ∈ B_face,k, and p̃_face,k(b) = p_face(b) otherwise

where B_face,k is the set of bins whose probabilities are updated to 1: the bins are visited in order of decreasing probability p_face and added to B_face,k until their cumulative pixel count in hist_face,k reaches θ_face of the total pixel count of hist_face,k. The updated probability from all faces (75), p̃_face, may then be obtained by considering the updates of all faces together, e.g., as the element-wise maximum over the per-face updates.
in practice, only a pruned histogram may be availableIn addition, in such a pruned histogram, only hist may be retained face,k R of middle pixel count keep,k Part(s). Thus, the cumulative pixel count may need to be reachedθ of the sum of (2) face /r keep,k . In addition, when θ face /r keep,k At > 1, all probabilities for all bins in the pruned histogram may be set to 1. Parameter θ nonface And theta face The value of (2) may be determined empirically. By way of example, θ nonface =0.9, and θ face =0.75。
The following pseudo code shows an example of how probabilities derived from non-faces can be calculated:
the following pseudo code shows an example of how the probability from the face can be calculated:
A.2.3 Probability propagation
With further reference to fig. 7, because a bin may appear in both face and non-face regions, the updates from non-faces and from faces are performed separately and then combined. Given the updated probability from non-faces (74) p̃_nonface and the updated probability from faces (75) p̃_face, the post-update probability (77) p'_face is a weighted sum of these two updated probabilities, weighted by the histogram counts:

p'_face(b) = ( hist_nonface(b)·p̃_nonface(b) + hist_face(b)·p̃_face(b) ) / hist_all(b)

To avoid division by zero, p'_face may be set to 0 at bins where hist_all is 0. Furthermore, because the probability is updated based on sorting indices, it may change drastically between adjacent bins. Therefore, Gaussian filtering (78) may be applied over the three-dimensional bins to smooth the face probability (79) p_face and avoid potential artifacts in later stages of processing. The standard deviation σ_prop of the Gaussian filter can be set to, e.g., σ_prop = 0.25.
With continued reference to fig. 7, in accordance with the teachings of the present disclosure, the adaptive ordering (700) and probability propagation (701) may be iterated n_probada times to gradually adapt the probability to the local histograms in the YUV color space. The number of iterations can be set to, e.g., n_probada = 3.
A.3 Local post-processing
Referring to fig. 7, the face probability has been refined in the YUV color space, but the spatial relationships between pixels have not been considered. According to embodiments of the present disclosure, the face probability may be further refined in the spatial domain. Fig. 9 is an example diagram showing details of the local post-processing step (240) disclosed with respect to the embodiment of fig. 2. As shown, the post-processing step includes local smoothing (900) to avoid visual artifacts and soft morphology operations (901) to remove small noise.
A.3.1 Local smoothing
With further reference to fig. 9, the input image (91) and the face probability (90) p_face are first used to obtain an initial probability map (92) M_face,init of the face. The following pseudo-code is an example of how the probability map (92) may be obtained for YUV420 input:
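A minimal sketch of the bin-lookup step just described for YUV420 input (nearest-bin lookup without the optional inter-bin interpolation; names and example bin counts are assumptions):

```python
def probability_map(sy, su, sv, p_face, n_bins=(8, 8, 8), bit_depth=10):
    """Initial face probability map M_face,init: each pixel looks up
    the probability of its (Y, U, V) bin, with U/V taken from the
    half-resolution planes of YUV420 input."""
    H, W = len(sy), len(sy[0])
    wy = 2 ** bit_depth / n_bins[0]
    wu = 2 ** bit_depth / n_bins[1]
    wv = 2 ** bit_depth / n_bins[2]
    return [[p_face.get((int(sy[j][i] / wy),
                         int(su[j // 2][i // 2] / wu),
                         int(sv[j // 2][i // 2] / wv)), 0.0)
             for i in range(W)] for j in range(H)]
```

With few bins, interpolating between neighboring bin probabilities (rather than this hard lookup) reduces quantization steps before the guided-filter smoothing.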
referring back to fig. 9, because the face probability (90) is quantized into intervals, if the intervals are very few, the probability between intervals can be interpolated for each pixel. However, since the face probability (90) does not yet contain spatial information, there may be abrupt changes between adjacent pixels in the initial probability map (92). If this occurs in a smooth region of the input image (91), false edges and banding artifacts will occur in the following partial shaping operation. In order to smooth the probability map in the region of the input image smoothing, it is possible to achieve, as described in reference 1, the entire contents of which are incorporated herein by reference]The guided image filtering (93) described in (2) using a probability map (92) as input and using a normalized Y-channel of the input imageAs a guide. Implementation details may be seen, for example, in U.S. provisional application No. 63/086,699, the entire contents of which are incorporated herein by reference, as described above. Acting as To guide the results of the image filtering (93), a smoothed graph (94) is obtained. For a range of [0,1]And a normalized input image (91) of size 1920 x 1080, exemplary parameter values that may be used to guide image filtering (93) are smoothness 0.01 and kernel size 51. For images of different sizes, the kernel size may scale with the image size:
because the pilot image filtering (93) is based on ridge regression and may produce noise due to outliers, the output of the pilot image filtering (93) may be clipped at [0,1 ]]Between them. Furthermore, the probability map of the ROI may be applied such that the face region is inside the ROI, i.e
A.3.2 Soft morphology operation
Referring back to fig. 9, since the face region is generally continuous and has smooth boundaries, it may be desirable to remove small noise in the probability map (92). The small noise may be small holes or unconnected small points in the probability map (92). Conventionally, small noise can be removed by morphological operations of closing, opening, and the like. However, such operations may also change the boundaries of the face region, which may be undesirable in some applications. Soft morphology operations (901) may be used to remove this type of small noise in accordance with the teachings of the present disclosure.
The soft morphology operation (901) of fig. 9 essentially weights each pixel by the importance of its surroundings. Given an input probability map (92) M_face, the soft morphology operation (901) is defined as:
parameters for controlling the soft morphology operations (901) include sigma morph The standard deviation of the gaussian filter (95),a) morph It is decided whether to expand the scaling factor of the face area. Operator is element-wise multiplication. As can be seen from the above definition, each pixel is multiplied by the weighted average of its surrounding pixelsAs part of the scaling and thresholding (97) step, for M face Pixels at > 0, if +.>After this operation the pixel values will be amplified. On the other hand, if->The pixel value will decrease after this operation. In other words, a pixel is preserved only when the value around the pixel is high. In addition, the operation may be repeated for n softmarph The iteration is performed to gradually refine the probability map (92) as follows:
where the exponent n_softmorph means that the operation is repeated n_softmorph times. Furthermore, the ROI probability map can be applied such that the face region is located inside the ROI. The parameters σ_morph, a_morph, and n_softmorph can be set, for example, to σ_morph = 25, a_morph = 3, and n_softmorph = 2.
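A minimal numpy sketch of the soft morphology operation as described above (the separable Gaussian blur implementation and the clipping back to [0, 1] are assumptions): each iteration multiplies the map element-wise by a_morph times its Gaussian-smoothed version, so isolated specks decay toward 0 while pixels inside large high-probability regions survive.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalized 1-D Gaussian kernel.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    # Separable 2-D Gaussian blur with edge padding.
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, ((radius, radius), (0, 0)), mode="edge")
    tmp = np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, pad)
    pad = np.pad(tmp, ((0, 0), (radius, radius)), mode="edge")
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)

def soft_morphology(m_face, sigma_morph=25, a_morph=3, n_softmorph=2):
    # Each iteration scales every pixel by a_morph times the Gaussian-weighted
    # average of its surroundings, then clips back to [0, 1]; pixels whose
    # neighbourhood average falls below 1/a_morph shrink, others grow.
    m = m_face.astype(float)
    for _ in range(n_softmorph):
        m = np.clip(m * (a_morph * gaussian_blur(m, sigma_morph)), 0.0, 1.0)
    return m
```

A single isolated pixel is suppressed after two iterations, while the interior of an 11 x 11 block of ones remains close to 1, which is the intended "remove specks, keep regions" behavior.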
B. Local shaping with facial adjustment
When performing local shaping, different shaping functions may be applied locally to different pixels.
The shaping function may control and enhance image properties such as contrast, saturation, or other visual characteristics; see, for example, U.S. provisional application No. 63/086,699, incorporated herein by reference in its entirety. For most image content, higher contrast and saturation give the average person a better viewing experience. However, for faces in the image, higher contrast and saturation are not always better. People may dislike enhanced details on a face, such as wrinkles or spots. In addition, a less saturated face may be preferred over a face with oversaturated skin tones, which looks unnatural, i.e., the skin tone appears changed. Local shaping with facial adjustment according to the teachings of the present disclosure may be applied to address such issues. Referring to fig. 9, after a facial probability map (98) is obtained based on the previous disclosure, different shaping functions may be applied to the face regions than to the other, non-face regions in the image to adjust contrast and saturation.
Fig. 10 illustrates an example diagram of local shaping (110) according to an embodiment of the disclosure. A contrast adjustment (103) for each pixel in the input image (101) is determined based on the face probability map (102). In addition, a saturation adjustment (104) for each pixel in the input image (101) is also determined. The contrast and saturation adjustments are then applied to the shaping function selection (105). A shaping operation (106) is performed based on the selected shaping functions (105) to generate a shaped image (107). The elements of the local shaping (110) are described in detail next.
B.1 local shaping function selection
With further reference to fig. 10, the local shaping method (110) may be based on local shaping function selection, as described in detail in the above-mentioned U.S. provisional application No. 63/086,699, the entire contents of which are incorporated herein by reference. In other words, for each pixel, a separate shaping function (105) selected from a family of shaping functions is applied in the shaping operation (106) for each channel. Given an input image S with YUV channels S_Y, S_U, and S_V, and a shaped image V with YUV channels V_Y, V_U, and V_V, the local shaping operation for the i-th pixel may be defined as:
where s_i^Y, s_i^U, s_i^V and v_i^Y, v_i^U, v_i^V are the i-th pixels of S_Y, S_U, S_V and V_Y, V_U, V_V, respectively. B, MMR^U, and MMR^V are the families of shaping functions for the Y, U, and V channels, and l_i^Y, l_i^U, and l_i^V are the corresponding indices of the shaping functions selected for the i-th pixel. For simplicity, the indices of all pixels are represented as index maps L_Y, L_U, and L_V. Thus, given an input image and the corresponding index maps, the local shaping operation for each pixel may be performed accordingly.
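To make the index-map mechanism concrete, here is a toy sketch in which the family B of Y-channel shaping functions is a set of gamma curves stored as lookup tables; the gamma-curve family, its size, and the LUT resolution are all hypothetical, and the actual family (and the MMR chroma functions) are defined in the referenced application:

```python
import numpy as np

def build_family(n_funcs=16, n_levels=256):
    """Hypothetical family of Y-channel shaping functions: gamma curves
    whose exponent varies with the function index."""
    gammas = np.linspace(0.5, 2.0, n_funcs)
    x = np.linspace(0.0, 1.0, n_levels)
    return np.stack([x ** g for g in gammas])  # shape (n_funcs, n_levels)

def local_reshape_y(s_y, index_map, family):
    """Apply, for each pixel i, the shaping function B[l_i] selected by the
    per-pixel index map: v_i = B[l_i](s_i)."""
    n_funcs, n_levels = family.shape
    levels = np.minimum((s_y * (n_levels - 1)).round().astype(int), n_levels - 1)
    # Fancy indexing: one function row per pixel, one LUT entry per pixel.
    return family[index_map, levels]
```

Two pixels with the same input value but different indices thus receive different tone curves, which is exactly what lets the face region be shaped differently from the rest of the image.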
With a carefully designed family of shaping functions, brightness, contrast, saturation, or other visual characteristics in the shaped image can be changed by adjusting the index maps. For example, as described in the above-mentioned U.S. provisional application No. 63/086,699, the entire contents of which are incorporated herein by reference, local detail and contrast enhancement may be achieved by using the following formulas:
or equivalently,
L_Y = L_U = L_V = L^(g) + f_SL(ΔL^(l))
and
where S̃_Y is the Y-channel of the normalized input image, e.g. within the range [0,1], and the other term is the corresponding edge-preserving filtered image. α is the per-pixel enhancement intensity map: the larger α, the stronger the enhancement. f_SL(·) is a pixel-wise nonlinear function for further adjusting the enhancement according to pixel brightness. L^(g) is a constant global index for the whole image that controls the overall appearance of the shaped image, such as brightness and saturation. Furthermore, when α = 0, all pixels use the same shaping function; this is called global shaping, and means there is no local contrast and detail enhancement. As an example, 4096 shaping functions per channel may be considered in the family of shaping functions. The parameters used may be default settings, e.g. α = 3.8 × c_1 for all pixels, where c_1 is a model parameter and can be set to, for example, c_1 = 2687.1.
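The enhancement-index computation described above can be sketched as follows; the exact forms of ΔL^(l) and f_SL are defined in the referenced application, so the detail-layer formulation (α times the difference between the Y channel and its edge-preserving filtered version) and the identity default for f_SL are assumptions for illustration:

```python
import numpy as np

def local_enhancement_index(y_norm, y_filtered, alpha, l_global, f_sl=None):
    """Compute the index map L_Y = L^(g) + f_SL(ΔL^(l)).

    ΔL^(l) is taken here as alpha times the detail layer (Y channel minus
    its edge-preserving filtered version) -- an assumed form consistent
    with the description.  f_sl defaults to the identity.
    """
    delta_l = alpha * (y_norm - y_filtered)  # per-pixel local offset
    if f_sl is None:
        f_sl = lambda d: d  # identity: no brightness-dependent adjustment
    return l_global + f_sl(delta_l)
```

With α = 0 the index map collapses to the constant L^(g), i.e. global shaping, matching the statement above.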
With continued reference to fig. 10, and in view of the above disclosure, given a facial probability map (102), the appearance of the face in the shaped image (107) may be changed by adjusting the indices in the face region of the index map. In the following sections, the face contrast adjustment (103) and the face saturation adjustment (104) are described in more detail.
B.2 facial contrast adjustment
In some applications, it may be undesirable to enhance facial details, such as wrinkles or spots, in the same way as other image content. Therefore, when performing detail and contrast enhancement, it may be necessary to reduce the enhancement intensity in the face region. The adjusted index map L_Y can be defined as:
L_Y = L^(g) + f_SL(ΔL^(l) + ΔL_face,c)
where
where r_face is the face contrast reduction ratio. It can be seen that for pixel i, if M_face(i) = 1, ΔL_face,c(i) becomes −r_face · ΔL^(l)(i), and the term ΔL^(l)(i) + ΔL_face,c(i) in formula (22) can be written as (1 − r_face) · ΔL^(l)(i). Compared with formulas (20) and (21), this decreases the enhancement strength from α(i) to (1 − r_face) · α(i). Thus, for 0 < r_face ≤ 1, ΔL_face,c reduces the contrast on the face. When r_face = 0, there is no adjustment. When r_face = 1, the face enhancement intensity becomes 0. Empirically, if the enhancement intensity on the face is 0, the face may look too smooth compared to the surrounding image content enhanced at the original intensity. For example, r_face can be set to r_face = 0.5.
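The contrast reduction can be sketched as a face-probability-weighted blend. Using M_face(i) ∈ [0, 1] as a soft weight generalizes the M_face(i) = 1 case discussed above; the soft blend for intermediate probabilities is an assumption:

```python
import numpy as np

def face_contrast_adjust(delta_l, m_face, r_face=0.5):
    """Apply ΔL_face,c = -r_face * M_face ⊙ ΔL^(l).

    Inside the face (M_face = 1) the enhancement strength falls from
    alpha(i) to (1 - r_face) * alpha(i); outside the face it is unchanged.
    """
    delta_face_c = -r_face * m_face * delta_l
    return delta_l + delta_face_c  # = (1 - r_face * M_face) * ΔL^(l)
```

With r_face = 1 the face enhancement vanishes entirely, and with r_face = 0 nothing changes, matching the limiting cases in the text.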
B.3 face saturation adjustment
In general, increasing the color saturation of an image may improve the viewing experience. However, for faces in an image, increasing color saturation in the same way as for other image content may be undesirable: oversaturated skin tones can make a face look unnatural or unhealthy. Referring to fig. 10, the disclosed facial saturation adjustment (104) addresses this problem.
As described in U.S. provisional application No. 63/086,699, which is incorporated herein by reference in its entirety, in general, the smaller the index of the shaping function, the lower the saturation of the shaped image. Furthermore, the darker the input pixel, the more sensitive the shaped pixel is to the index.
In view of the above, and based on L_Y obtained in the previous section, the adjusted index maps L_U and L_V can be further defined as:
L_U = L_V = L_Y + ΔL_face,s
where
In formula (23), d_face is the facial desaturation offset, and θ_sat is a threshold that controls the degree of desaturation. Thus, when d_face > 0 and θ_sat > 0, ΔL_face,s reduces the face saturation. The greater d_face, the greater the degree of desaturation. When d_face = 0, there is no desaturation. Empirically, the parameters d_face and θ_sat can be set to, for example, d_face = 1024 and θ_sat = 0.5.
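Since the body of formula (23) is not reproduced in this text, the sketch below uses one assumed form that is consistent with the stated properties (no desaturation when d_face = 0, stronger desaturation for larger d_face, and θ_sat capping the effect); the exact published formula may differ:

```python
import numpy as np

def face_saturation_adjust(l_y, m_face, d_face=1024, theta_sat=0.5):
    """Shift the chroma index maps down inside the face region.

    Assumed form of ΔL_face,s: the offset d_face is gated by the face
    probability M_face and capped by the threshold theta_sat, so the
    maximum desaturation offset is d_face * theta_sat.
    """
    delta_face_s = -d_face * np.minimum(m_face, theta_sat)
    l_u = l_y + delta_face_s
    l_v = l_y + delta_face_s  # L_U = L_V = L_Y + ΔL_face,s
    return l_u, l_v
```

Outside the face (M_face = 0) the chroma index maps equal L_Y, so only facial pixels are desaturated.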
Several embodiments of the present disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Thus, the present invention may be embodied in any of the forms described herein, including but not limited to the following Enumerated Example Embodiments (EEEs), which describe the structure, features, and functions of portions of the present invention:
EEE 1. A method for face region detection in an input image comprising one or more faces, the method comprising: providing a face bounding box and a confidence level for each of the one or more faces; generating a histogram of all pixels based on the input image; generating a histogram of the one or more faces based on the input image and the face bounding box; a face probability is generated based on the histograms of all pixels and the histograms of the one or more faces, and a face probability map is generated based on the face probability.
EEE2. The method of EEE1 wherein generating the histogram of the one or more faces comprises generating a basic face shape map based on a combination of a face bounding box and a basic face shape, and generating the histogram of the one or more faces based on the input image and the basic face shape map.
EEE3. The method of EEE1 or 2, wherein generating the face probability comprises: filtering the histograms of all pixels to generate the filtered histograms of all pixels; and filtering the histogram of the one or more faces to generate a filtered histogram of the one or more faces.
EEE 4. The method of EEE3, wherein generating the face probability further comprises scaling and thresholding a combination of the filtered histograms of all pixels and the filtered histogram of the one or more faces to generate an initial probability of the face.
EEE 5. The method of EEE4, wherein the initial probability of a face comprises an initial probability of a face in a YUV channel.
EEE 6. The method of EEE4 or 5, wherein generating a face probability further comprises subtracting the generated histogram of one or more faces from the generated histogram of all pixels to generate a non-face histogram.
EEE 7. The method of EEE6, wherein generating a face probability further comprises generating an updated probability of a non-face based on the face initial probability and the non-face histogram; and generating an updated probability for the face based on the face initial probability and the histogram of the one or more faces.
EEE 8. The method of EEE7, wherein generating the face probability further comprises combining the updated probabilities from the non-faces and the updated probabilities from the faces to generate updated probabilities, and filtering the updated probabilities to generate the face probability.
EEE 9. The method of EEE8 wherein filtering is performed using a gaussian filter.
The method of any one of EEEs 1-9, further comprising locally smoothing the face probabilities after generating the face probabilities and before generating the face probability map, generating smoothed face probabilities, and applying a soft morphology operation to the smoothed face probabilities to generate the face probability map.
EEE 11. The method of EEE8, further comprising, after generating the face probabilities and before generating the face probability map, locally smoothing the face probabilities to generate smoothed face probabilities, and applying a soft morphological operation to the smoothed face probabilities to generate the face probability map.
EEE 12. The method according to any one of EEEs 10 and 11, further comprising: local shaping is applied by: applying facial saturation adjustment and facial contrast adjustment to the facial probability map to generate an adjusted facial probability map; and generating a shaped image based on the adjusted facial probability map and the one or more selected shaping functions.
EEE 13. The method of any one of EEEs 1-12, further comprising pruning the histogram of the one or more faces to reduce a storage space required to store the histogram of the one or more faces.
EEE 14. The method of any one of EEEs 3-9, wherein filtering the histograms of all pixels is performed using a Gaussian filter, and filtering the histogram of the one or more faces is performed using a Gaussian filter.
The method of any of EEEs 4-9, wherein the combination of the histogram of all pixels after filtering and the histogram of the face or faces after filtering comprises a ratio of the histogram of the face or faces after filtering to the histogram of all pixels after filtering.
EEE 16. The method of EEE8, wherein combining the updated probabilities from the non-faces and the updated probabilities from the faces includes generating a weighted sum of the updated probabilities from the non-faces and the updated probabilities from the faces.
EEE 17. The method of EEE12, wherein applying facial contrast adjustment is performed by adjusting the contrast of the one or more faces based on a facial contrast reduction ratio.
EEE 18. The method of EEE12, wherein applying facial saturation adjustment is performed by adjusting the saturation of the one or more faces based on a facial desaturation offset and a facial desaturation threshold.
EEE19: a video decoder comprising hardware, software, or both configured to perform the method according to any of EEEs 1-18.
EEE20: a non-transitory computer readable medium containing program instructions for causing a computer to perform the method according to any one of EEEs 1-18.
The present disclosure describes certain implementations of some of the innovative aspects described herein, and examples of contexts in which these innovative aspects may be implemented. However, the teachings herein may be applied in a variety of different ways. Furthermore, the described embodiments may be implemented in various hardware, software, firmware, etc. For example, aspects of the present application may be at least partially embodied in an apparatus, a system comprising more than one device, a method, a computer program product, and the like. Thus, aspects of the present application may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, micro-code, etc.), and/or an embodiment combining both software and hardware aspects. Such embodiments may be referred to herein as "circuits," "modules," "devices," "apparatuses," or "engines." Aspects of the present application may take the form of a computer program product embodied in one or more non-transitory media having computer-readable program code embodied thereon. Such non-transitory media may include, for example, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. Thus, the teachings of the present disclosure are not intended to be limited to the embodiments shown in the drawings and/or described herein, but rather have broad applicability.
The above examples are provided to those of ordinary skill in the art as a complete disclosure and description of how to make and use the embodiments of the present disclosure and are not intended to limit the scope of what the inventors regard as their disclosure.
Modifications of the above-described modes for carrying out the methods and systems disclosed herein which are apparent to those of skill in the art are intended to be within the scope of the appended claims. All patents and publications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this disclosure pertains. All references cited in this disclosure are incorporated by reference as if each had been individually incorporated by reference in its entirety.
It is to be understood that the present disclosure is not limited to particular methods or systems, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise. The term "plurality" includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
Reference to the literature
[1] He, Kaiming, Jian Sun, and Xiaoou Tang. "Guided image filtering." IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 6 (2012): 1397-1409.

Claims (20)

1. A method of locally shaping an input image comprising one or more faces, the method comprising:
generating a histogram of all pixels in the input image;
generating a basic face shape map based on a combination of the face bounding box of the one or more faces and a basic face shape model, the basic face shape map comprising a pixel map of basic face shapes for the one or more faces;
generating a histogram of the one or more faces based on the input image and the basic face shape map;
generating a face probability for each bin of the histogram of all pixels based on the histogram of the one or more faces, the face probability comprising a probability that a pixel is in a face,
generating a facial probability map based on the facial probabilities, the facial probability map comprising a pixel map of the probability that each individual pixel of the input image is part of a face, and
a shaped image is generated from the input image based on the facial probability map and the one or more selected shaping functions.
2. The method of claim 1, wherein the basic facial shape model comprises an inscribed ellipse of a bounding box.
3. The method of claim 1 or 2, wherein generating the face probability comprises:
filtering the histograms of all pixels to generate filtered histograms of all pixels; and
and filtering the histograms of the one or more faces to generate the filtered histograms of the one or more faces.
4. The method of claim 3, wherein generating the face probability further comprises:
the combination of the filtered histograms of all pixels and the filtered histogram of the face or faces is scaled and thresholded to generate an initial probability of the face.
5. The method of claim 4, wherein the initial probability of a face comprises an initial probability of a face in a YUV channel.
6. The method of claim 4 or 5, wherein generating the face probability further comprises:
the generated histogram of the one or more faces is subtracted from the generated histogram of all pixels to generate a non-face histogram.
7. The method of claim 6, wherein generating a face probability further comprises:
generating an updated probability of the non-face based on the face initial probability and the non-face histogram; and
An updated probability of the face is generated based on the initial probability of the face and the histogram of the one or more faces.
8. The method of claim 7, wherein generating the face probability further comprises:
combining the updated probabilities from the non-faces and the updated probabilities from the faces to generate updated probabilities, and
filtering the updated probabilities to generate the face probability.
9. The method of claim 8, wherein filtering is performed using a gaussian filter.
10. The method of any one of claims 1 to 9, further comprising:
after generating the face probability and before generating the face probability map, locally smoothing the face probability to generate a smoothed face probability, and
A soft morphological operation is applied to the smoothed facial probabilities to generate a facial probability map.
11. The method of claim 8, further comprising:
after generating the face probability and before generating the face probability map, locally smoothing the face probability to generate a smoothed face probability, and
a soft morphological operation is applied to the smoothed facial probabilities to generate a facial probability map.
12. The method of any one of claims 10 and 11, further comprising applying local shaping by:
applying facial saturation adjustment and facial contrast adjustment to the facial probability map to generate an adjusted facial probability map; and
A shaped image is generated based on the adjusted facial probability map and the one or more selected shaping functions.
13. The method of any of claims 1-12, further comprising:
the histogram of the one or more faces is pruned to reduce the storage space required to store the histogram of the one or more faces.
14. The method of any one of claims 3 to 9, wherein
filtering the histograms of all pixels is performed using a Gaussian filter; and
filtering the histogram of the one or more faces is performed using a Gaussian filter.
15. The method of any of claims 4 to 9, wherein the combination of the filtered histogram of all pixels and the filtered histogram of the one or more faces comprises a ratio of the filtered histogram of the one or more faces to the filtered histogram of all pixels.
16. The method of claim 8, wherein combining the updated probabilities from the non-faces and the updated probabilities from the faces comprises:
A weighted sum of the updated probabilities from the non-faces and the updated probabilities from the faces is generated.
17. The method of claim 12, wherein applying facial contrast adjustment is performed by adjusting the contrast of the one or more faces based on a facial contrast reduction ratio.
18. The method of claim 12, wherein applying facial saturation adjustment is performed by adjusting the saturation of the one or more faces based on a facial desaturation offset and a facial desaturation threshold.
19. A video decoder comprising hardware, software, or both configured to perform the method of any of claims 1-18.
20. A non-transitory computer readable medium containing program instructions for causing a computer to perform the method of any one of claims 1-18.
CN202280052829.9A 2021-07-29 2022-07-25 Face region detection and local reshaping enhancement Pending CN117716400A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163226938P 2021-07-29 2021-07-29
US63/226,938 2021-07-29
EP21188517.3 2021-07-29
PCT/US2022/038249 WO2023009469A1 (en) 2021-07-29 2022-07-25 Face region detection and local reshaping enhancement

Publications (1)

Publication Number Publication Date
CN117716400A true CN117716400A (en) 2024-03-15

Family

ID=90146607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280052829.9A Pending CN117716400A (en) 2021-07-29 2022-07-25 Face region detection and local reshaping enhancement

Country Status (1)

Country Link
CN (1) CN117716400A (en)

Similar Documents

Publication Publication Date Title
EP2176830B1 (en) Face and skin sensitive image enhancement
US7092573B2 (en) Method and system for selectively applying enhancement to an image
US9779491B2 (en) Algorithm and device for image processing
US10424054B2 (en) Low-illumination image processing method and device
US10152781B2 (en) Method for image processing using local statistics convolution
JP4323791B2 (en) Digital color image processing method
US20120093402A1 (en) Image processing
JP4456819B2 (en) Digital image sharpening device
CN107784637B (en) Infrared image enhancement method
US6701026B1 (en) Method and apparatus for cancelling lighting variations in object recognition
WO2009030596A1 (en) Method for non-photorealistic rendering
CN107292834B (en) Infrared image detail enhancement method
Kaur et al. Performance evaluation of fuzzy and histogram based color image enhancement
Fu et al. Salient object detection via color contrast and color distribution
CN111161177B (en) Image self-adaptive noise reduction method and device
Tohl et al. Contrast enhancement by multi-level histogram shape segmentation with adaptive detail enhancement for noise suppression
CN110111280A (en) A kind of enhancement algorithm for low-illumination image of multi-scale gradient domain guiding filtering
Feng et al. Low-light image enhancement by refining illumination map with self-guided filtering
CN108174238A (en) Enhance the method, apparatus and Video Codec of video frame contrast
Das et al. Histogram equalization techniques for contrast enhancement: a review
CN117716400A (en) Face region detection and local reshaping enhancement
Lai et al. Novel mean-shift based histogram equalization using textured regions
Jasper et al. Natural image enhancement using a biogeography based optimization enhanced with blended migration operator
CN113658197B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN114519675A (en) Image processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination