WO2018082308A1

WO2018082308A1 - Image processing method and terminal

Info

Publication number: WO2018082308A1
Application number: PCT/CN2017/087702
Authority: WO
Inventors: 张兆丰; 牟永强
Original assignee: 深圳云天励飞技术有限公司
Priority date: 2016-11-07
Filing date: 2017-06-09
Publication date: 2018-05-11
Also published as: CN106650615B; CN106650615A

Abstract

Provided are an image processing method and a terminal. The method comprises: acquiring an image to be processed; calculating the number of layers of a feature pyramid of the image to be processed so as to obtain n layers, n being an integer greater than or equal to 1; constructing the feature pyramid based on the n layers; performing feature extraction on K pre-set detection windows on the feature pyramid so as to obtain K groups of first target features, wherein each group of pre-set detection windows corresponds to one group of first target features, K being an integer greater than or equal to 1; determining K groups of second target features according to the K groups of first target features; and making a decision on the K groups of second target features by using M specified decision trees so as to obtain the size and position of a target face frame, M being an integer greater than or equal to 1. The position of a face can be quickly detected.

Description

Image processing method and terminal

Technical field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and a terminal.

Background technique

With the rapid development of information technology, face recognition technology has been widely used in the field of video surveillance. In the field of face recognition applications, face detection is the first link, and its accuracy has a great impact on the performance of face recognition. Face detection needs to be robust, because in practical applications, face images are affected by many factors, such as lighting, occlusion, and posture changes. Face detection is the most frequently invoked in the face recognition process and needs to be able to be executed efficiently. Face detection technology mainly adopts features based on manual design, such as Haar feature, LBP (local binary mode histogram) feature, HOG (gradient direction histogram) feature, etc. The calculation time of these features is acceptable, in practical applications. In the prior art, the face detection calculation algorithm is more complicated, and thus the face detection efficiency is low.

Summary of the invention

Embodiments of the present invention provide an image processing method and a terminal, so as to quickly detect a face position.

A first aspect of the embodiments of the present invention provides an image processing method, including:

Get the image to be processed;

Calculating a number of layers of the feature pyramid of the image to be processed, to obtain an n layer, wherein n is an integer greater than or equal to 1;

Constructing the feature pyramid based on the n layers;

Performing feature extraction on the K preset detection windows to obtain the K group first target features, wherein each set of the preset detection windows corresponds to a set of first target features, the K An integer greater than or equal to 1;

Determining the K group second target feature according to the K group first target feature;

Determining the K target second target feature by using M designated decision trees, and obtaining the size and position of the target human face frame, wherein the M is an integer greater than or equal to 1.

With reference to the first aspect, in a first possible implementation manner of the first aspect, the calculating a number of layers of a feature pyramid of the to-be-processed image to obtain an n layer includes:

Calculating the number of layers of the feature pyramid according to the size of the image to be processed and the size of the preset face detection model, as shown in the following formula:

Where n is the number of layers of the feature pyramid, k _up is a multiple of the sampled image to be processed, w _img , h _img respectively representing the width and height of the image to be processed, w _m , h _m respectively The width and height of the preset face detection model, n _octave refers to the number of layers of the image between each of the two dimensions in the feature pyramid.

With reference to the first aspect, or the first possible implementation manner of the first aspect, in the second possible implementation manner of the first aspect, the constructing the feature pyramid based on the N layer includes:

Determining that the N layer comprises P real feature layers and Q approximate feature layers, wherein P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

Performing feature extraction on the P real feature layers to obtain a third target feature;

Determining, according to the P real feature layers, a fourth target feature of the Q approximate feature layers;

The third target feature and the fourth target feature constitute the feature pyramid.

With reference to the first aspect or the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the determining the K target second target according to the K group first target feature Features, including:

Extracting color features from the K group first target features to obtain the K group color features;

Calculating a pixel comparison feature for the i-th color feature, training the first preset face model based on the calculated pixel comparison feature, and extracting the first target pixel comparison feature from the trained first preset face model, a fifth target feature, wherein the ith group color feature is any one of the K group color features;

Training a second preset face model by using the fifth target feature and the first target feature, and extracting a second pixel comparison feature from the trained second preset face model to obtain a sixth target feature;

Combining the first target feature and the sixth target feature into the second target feature.

With reference to the first aspect or the first possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the determining, by using the M specified decision trees, the second target feature of the K group , get the size and position of the target face frame, including:

Determining, by the M specified decision trees, the K target second target features on the feature pyramid to obtain an X personal face frame, wherein the X is an integer greater than or equal to 1;

According to the X personal face frame, the size and position of the target face frame are merged.

A second aspect of the embodiments of the present invention provides a terminal, including:

An obtaining unit, configured to acquire an image to be processed;

a calculating unit, configured to calculate a number of layers of the feature pyramid of the image to be processed, to obtain an n layer, wherein n is an integer greater than or equal to 1;

a constructing unit configured to construct the feature pyramid based on the n layers;

An extracting unit, configured to perform feature extraction on the K preset detection windows on the feature pyramid to obtain the K group first target feature, wherein each set of the preset detection window corresponds to a group of first targets a characteristic, the K being an integer greater than or equal to 1;

a determining unit, configured to determine the K group second target feature according to the K group first target feature;

And a decision unit, configured to determine the size and location of the target face frame by using the M specified decision trees, where the M is an integer greater than or equal to 1.

With reference to the second aspect, in a first possible implementation manner of the second aspect, the calculating unit is specifically configured to:

In conjunction with the second aspect, or the first possible implementation of the second aspect, in the second possible implementation of the second aspect, the constructing unit includes:

a first determining module, configured to determine that the N layer includes P real feature layers and Q approximate feature layers, where P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

a first extraction module, configured to perform feature extraction on the P real feature layers to obtain a third target feature;

a second determining module, configured to determine, according to the P real feature layers, the Q approximate feature layers Fourth target feature;

And a constructing module, configured to form the third target feature and the fourth target feature into the feature pyramid.

With reference to the second aspect or the first possible implementation manner of the second aspect, in the third possible implementation manner of the second aspect, the determining unit includes:

a second extraction module, configured to separately extract color features from the K group first target features to obtain the K group color features;

a first training module, configured to calculate a pixel comparison feature for the i-th color feature, train the first preset face model based on the calculated pixel comparison feature, and extract the first preset face model from the training a target pixel comparison feature to obtain a fifth target feature, wherein the ith group color feature is any one of the K group color features;

a second training module, configured to train a second preset face model by using the fifth target feature and the first target feature, and extract a second pixel comparison feature from the trained second preset face model , obtaining the sixth target feature;

And a combination module, configured to combine the first target feature and the sixth target feature into the second target feature.

With reference to the second aspect, or the first possible implementation manner of the second aspect, in the fourth possible implementation manner of the second aspect, the determining unit includes:

a decision module, configured to determine, by using the M specified decision trees, the second target feature of the K group on the feature pyramid, to obtain an X personal face frame, where the X is an integer greater than or equal to 1;

And a merging module, configured to merge the size and position of the target face frame according to the X personal face frame.

Embodiments of the present invention have the following beneficial effects:

According to the embodiment of the present invention, the image to be processed is acquired, the number of layers of the feature pyramid of the image to be processed is calculated, and n layers are obtained, where n is an integer greater than or equal to 1. Based on the n layer, the feature pyramid is constructed, and on the feature pyramid, Feature extraction is performed on K preset detection windows to obtain a first target feature of the K group, wherein each set of preset detection windows corresponds to a set of first target features, and K is an integer greater than or equal to 1, according to the K group first The target feature determines the second target feature of the K group, and uses the M specified decision trees to make a decision on the second target feature of the K group, and obtains the size and position of the target face frame, where M is an integer greater than or equal to 1. Thereby, the face position can be detected quickly.

DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the present invention, Those skilled in the art can also obtain other drawings based on these drawings without paying any creative work.

1 is a schematic flowchart of an embodiment of an image processing method according to an embodiment of the present invention;

2a is a schematic structural diagram of a first embodiment of a terminal according to an embodiment of the present invention;

2b is a schematic structural diagram of a structural unit of the terminal depicted in FIG. 2a according to an embodiment of the present invention;

2c is a schematic structural diagram of a determining unit of the terminal depicted in FIG. 2a according to an embodiment of the present invention;

2d is a schematic structural diagram of a determining unit of the terminal depicted in FIG. 2a according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a second embodiment of a terminal according to an embodiment of the present invention.

detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

The terms "first", "second", "third", and "fourth" and the like in the specification and claims of the present invention are used to distinguish different objects, and are not intended to describe a specific order. . Furthermore, the terms "comprises" and "comprising" and "comprising" are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or alternatively Other steps or units inherent to these processes, methods, products or equipment.

References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiments can be included in at least one embodiment of the invention. The appearances of the phrases in various places in the specification are not necessarily referring to the same embodiments, and are not exclusive or alternative embodiments that are mutually exclusive. Those skilled in the art will understand and implicitly understand that the embodiments described herein can be combined with other embodiments.

The terminal described in the embodiment of the present invention may include a smart phone (such as an Android mobile phone, an iOS mobile phone, a Windows Phone mobile phone, etc.), a tablet computer, a palmtop computer, a notebook computer, and a mobile internet device. (MID, Mobile Internet Devices) or wearable devices, etc., the above terminals are merely examples, not exhaustive, including but not limited to the above terminals.

FIG. 1 is a schematic flowchart of an embodiment of an image processing method according to an embodiment of the present invention. The image processing method described in this embodiment includes the following steps:

101. Acquire an image to be processed.

The image to be processed is an image including a human face. Of course, the image to be processed includes at least one face.

Optionally, the terminal can acquire the original image. If the original image is a grayscale image, the image needs to be converted into an RGB image, that is, the grayscale information of the original image is copied to the R channel, the G channel, and the B channel. Of course, if the original image is a color image, if the original image is not an RGB image, it can be converted into an RGB image, and if the original image is an RGB image, it is directly taken as an image to be processed.

102. Calculate a number of layers of the feature pyramid of the image to be processed to obtain an n layer, where n is an integer greater than or equal to 1.

Optionally, calculating the number of layers of the feature pyramid of the image to be processed to obtain the n layer may be implemented as follows:

Where n is the number of layers of the feature pyramid, k _up is the multiple of the sampled image to be processed, w _img and h _img respectively represent the width and height of the image to be processed, and w _m and h _m respectively preset the width of the face detection model. And height, n _octave refers to the number of layers of the image between every two dimensions in the feature pyramid. Wherein, after the image to be processed is determined, the size thereof may be a known amount, and the size of the preset face model is also a known amount. The above k _up can be specified by the user, or the system defaults. The above n _octave can be specified by the user, or the system defaults.

Alternatively, when feature extraction is performed on the image to be processed, the obtained feature may form a feature pyramid. For example, a Laplacian pyramid transform is performed on an image to be processed to obtain a feature pyramid. However, the number of layers of the feature pyramid in the embodiment of the present invention is not specified by the user, but is calculated according to the size of the image to be processed and the size of the preset face detection model, and thus, the determined features of the image to be processed of different sizes are determined. The number of layers of the pyramid is different, so that the number of layers of the feature pyramid determined by the embodiment of the present invention is more Appropriate to the size of the image.

Certainly, at least one preset face detection model may be used in the embodiment of the present invention. When the number of preset face detection models is multiple, all preset face detection models may have the same size.

103. Construct the feature pyramid based on the N layer.

Optionally, the constructing the feature pyramid based on the N layer may include the following steps:

31) determining that the N layer includes P real feature layers and Q approximate feature layers, where P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

32) performing feature extraction on the P real feature layers to obtain a third target feature;

33) determining, according to the P real feature layers, a fourth target feature of the Q approximate feature layers;

34) constituting the third target feature and the fourth target feature to form the feature pyramid.

It should be noted that, in the embodiment of the present invention, unlike the conventional face detection method, the conventional method generally calculates the feature pyramid of the image first, and then calculates the corresponding feature based on each layer image of the feature pyramid. In the present invention, only a small number of image layer features are calculated, which is called a real feature layer. The features of other layer images are based on real feature interpolation and are called approximate feature layers. The real feature layer in the pyramid is specified by the user or by default, and the other layers are approximate feature layers, which are obtained by interpolation of the real feature layer closest to the distance.

The feature extraction may be performed on the real feature layer in step 32, for example, extracting color features, gradient magnitude features, and direction histogram features. The color features can be RGB, LUV, HSV, GRAY, gradient magnitude features, and direction histogram features corresponding to a special form of HOG features, ie, the number of cells in the block is one. For example, the color feature, the gradient magnitude feature, and the direction histogram feature may be referred to the prior art, and details are not described herein again.

Wherein, in step 33, the feature of the approximate feature layer can be calculated based on the real feature layer. The approximate feature layer can be obtained by interpolation of the real feature layer. When interpolating, the feature value needs to be multiplied by a coefficient. The calculation method can refer to the following formula:

Where s refers to the ratio of the approximate feature layer to the real feature layer, and λ _Ω is constant for one feature, and the value of λ _Ω can be estimated in the following manner. Estimating, be replaced by a _k μs k _s,

among them

It refers to scaling the image I ⁱ by the scale s, f _μΩ (I) means finding the feature Ω for the image I, and averaging these features, and N refers to the number of pictures participating in the estimation. In the present invention, s is

Take 50,000 and find λ _Ω by least squares method.

104. Perform feature extraction on the K preset detection windows on the feature pyramid to obtain the K group first target feature, where each set of the preset detection window corresponds to a set of first target features. K is an integer greater than or equal to 1.

The preset detection window can be set by the system default or by the user. The preset detection window can include a window size and a window position. Feature extraction is performed on each preset detection window in the K preset detection windows, and a set of first target features are respectively obtained, so that the K target first target feature is obtained, and the K is an integer greater than or equal to 1.

Optionally, on the feature pyramid, the position of the preset detection window and the size of the window are fixed. In the feature extraction process, one step can be moved in the x and y directions each time.

105. Determine, according to the K group first target feature, the K group second target feature.

Optionally, determining, according to the K group first target feature, the second target feature of the K group, including:

51) extracting color features from the first target features of the K group to obtain the K group color features;

52) calculating a pixel comparison feature for the i-th color feature, training the first preset face model based on the calculated pixel comparison feature, and extracting the first target pixel from the trained first preset face model Feature, obtaining a fifth target feature, wherein the ith set of color features is any one of the K sets of color features;

53) training a second preset face model by using the fifth target feature and the first target feature, and extracting a second pixel comparison feature from the trained second preset face model to obtain a sixth Target feature

54) Combining the first target feature and the sixth target feature into the second target feature.

The method for extracting pixel comparison features in the above steps 52 and 53 may refer to the following formula:

Where I represents the image I, l _i , l _j are the pixel points at different positions in the image I, and I(l _i ), I(l _j ) respectively refer to the pixel values at the positions of l _i and l _j in the image I, and compare The pixel value of I(l _i ) and I(l _j ) can be obtained as a comparison feature of two pixels.

Optionally, in order to improve the robustness and globality of the comparison feature, the image to be processed can also be divided into The area bins that do not overlap each other, the size of the area is b × b, and the comparison feature in bins is defined as follows.

Wherein, l _i ∈bin _i , l _j ∈bin _j , f _cb refer to pixel comparison features of two different regions in the image to be processed. Using the above-mentioned color features, gradient magnitude features, and direction histogram features, the image to be processed is calculated pixel by pixel. Therefore, when the size of the model is fixed, it is not determined whether the feature is different because of the training process. Calculation. The comparison features are different and depend on the model training process. In order to better fuse color, gradient magnitude, direction histogram features and pixel comparison features.

First, the first preset face model is trained using only the pixel comparison feature, and the size of the first preset face model is n×n pixels. Then, when training, there are (n/b) ² × ((n/b) ² -1)/2 comparison features. Training is performed using the adaboost method, which has a depth of 5 and a number of 500.

Secondly, after the training, the pixel comparison features selected from the first preset face model will be greatly reduced, and the number of the pixel comparison features (ie, the fifth target feature) is controlled within 10000.

Then, the second preset face model is trained in combination using the fifth target feature and the first target feature (ie, color feature, gradient magnitude, and direction histogram feature). Still using the adaboost method for training, the depth of the decision tree is 5, the number is 500, and the second pixel comparison feature is extracted from the trained second preset face model to obtain the sixth target feature;

Finally, the first target feature and the sixth target feature are combined into a second target feature.

Therefore, the present invention combines the use of the fused multi-channel feature and the pixel comparison feature, overcomes the problem that the position of the face frame is inaccurate when only the fused multi-channel feature is used, and further improves the detection rate of the face in the case of backlighting.

106. Determine, by using the M specified decision trees, the second target feature of the K group, and obtain a size and a position of the target face frame, where the M is an integer greater than or equal to 1.

The embodiment of the present invention may adopt M designated decision trees, where M is an integer greater than or equal to 1, and the specified decision tree sends a second target feature in the preset detection window to make a decision on the second target feature. Get the score and accumulate the score. If the score is below a certain threshold, the window will be directly eliminated. If the score is higher than the threshold, continue to classify on the next decision tree, obtain the score and accumulate the score until all the decision trees are traversed, and convert the position coordinate, width and height information of the window to the image to be processed and output the face. Box, including the position and size of the face frame. For example, after detecting one window, you can go to 1.5 to perform the next window detection until you have traversed all the layers of the feature pyramid, so you can get the last All the faces of the faces are merged, so that the target face frame is obtained, and then the position and size of the target face frame are determined. In this way, face recognition can be further performed on the basis of recognizing the face.

Optionally, the determining, by using the M specified decision trees, the second target feature of the K group, and determining the size and location of the target face frame, including:

61), on the feature pyramid, using the M specified decision trees to make a decision on the K target second target feature, to obtain an X personal face frame, wherein the X is an integer greater than or equal to 1;

62), according to the X personal face frame merged into the size and position of the target face frame.

Among them, step 61

In step 62, the terminal may merge the face frames with overlapping positions by using a Non-Maximum Suppression (NMS) algorithm to output a final face frame.

It can be seen that, by using the embodiment of the present invention, the image to be processed is obtained, the number of layers of the feature pyramid of the image to be processed is calculated, and n layers are obtained, where n is an integer greater than or equal to 1, and the feature pyramid is constructed based on the n layer. On the feature pyramid, feature extraction is performed on K preset detection windows to obtain a first target feature of the K group, wherein each set of preset detection windows corresponds to a set of first target features, and K is an integer greater than or equal to 1, according to The first target feature of the K group determines the second target feature of the K group, and the M target decision tree is used to determine the second target feature of the K group, and the size and position of the target face frame are obtained, where M is an integer greater than or equal to 1. . Thereby, the face position can be detected quickly.

Consistent to the above, the following is an apparatus for implementing the above image processing method, as follows:

FIG. 2 is a schematic structural diagram of a first embodiment of a terminal according to an embodiment of the present invention. The terminal described in this embodiment includes: an obtaining unit 201, a calculating unit 202, a constructing unit 203, an extracting unit 204, a determining unit 205, and a determining unit 206, as follows:

An obtaining unit 201, configured to acquire an image to be processed;

The calculating unit 202 is configured to calculate a number of layers of the feature pyramid of the image to be processed, to obtain an n layer, where n is an integer greater than or equal to 1;

The constructing unit 203 is configured to construct the feature pyramid based on the n layer;

The extracting unit 204 is configured to perform feature extraction on the K preset detection windows on the feature pyramid to obtain the K group first target feature, where each group of the preset detection windows corresponds to a group of first a target feature, the K being an integer greater than or equal to 1;

a determining unit 205, configured to determine, according to the K group first target feature, the K group second target Levy

The determining unit 206 is configured to determine the size and location of the target face frame by using the M specified decision trees to obtain the size and position of the target face frame, where the M is an integer greater than or equal to 1.

Optionally, the calculating unit 202 is specifically configured to:

Optionally, the configuration unit of the terminal as described in FIG. 2b and FIG. 2b may include: a first determining module 2031, a first extracting module 2032, a second determining module 2033, and a constructing module 2034, as follows:

The first determining module 2031 is configured to determine that the N layer includes P real feature layers and Q approximate feature layers, where P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

a first extraction module 2032, configured to perform feature extraction on the P real feature layers to obtain a third target feature;

a second determining module 2033, configured to determine, according to the P real feature layers, a fourth target feature of the Q approximate feature layers;

The constructing module 2034 is configured to form the third target feature and the fourth target feature to form the feature pyramid.

Optionally, the determining unit 205 of the terminal, as described in FIG. 2c and FIG. 2c, may include: a second extracting module 2051, a first training module 2052, a second training module 2053, and a combining module 2054, as follows:

a second extraction module 2051, configured to separately extract color features from the K group first target features to obtain the K group color features;

The first training module 2052 is configured to calculate a pixel comparison feature for the i-th color feature, train the first preset face model based on the calculated pixel comparison feature, and extract the first preset face model from the training The first target pixel compares the feature to obtain a fifth target feature, wherein the ith set of color features is any one of the K sets of color features;

a second training module 2053, configured to train a second preset face model by using the fifth target feature and the first target feature, and extracting a second pixel comparison from the trained second preset face model Feature, obtaining a sixth target feature;

The combining module 2054 is configured to combine the first target feature and the sixth target feature into the second target feature.

Optionally, the decision unit 206 of the terminal as described in FIG. 2d and FIG. 2a may include: a decision module 2061 and a merge module 2062, as follows:

The decision module 2061 is configured to determine, by using the M specified decision trees, the second target feature of the K group on the feature pyramid to obtain an X personal face frame, where the X is an integer greater than or equal to 1;

The merging module 2062 is configured to merge the size and position of the target face frame according to the X personal face frame.

It can be seen that, by using the terminal described in the embodiment of the present invention, the image to be processed is acquired, and the number of layers of the feature pyramid of the image to be processed is calculated to obtain an n layer, where n is an integer greater than or equal to 1, and the n layer is constructed. Feature pyramid, on the feature pyramid, feature extraction of K preset detection windows to obtain K group first target features, wherein each set of preset detection windows corresponds to a set of first target features, K is greater than or equal to 1 The integer is determined according to the first target feature of the K group, and the second target feature of the K group is determined by the M specified decision trees, and the size and position of the target face frame are obtained, wherein M is greater than or An integer equal to 1. Thereby, the face position can be detected quickly.

With reference to FIG. 3, it is a schematic structural diagram of a second embodiment of a terminal according to an embodiment of the present invention. The terminal described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000, the input device 1000, the output device 2000, the processor 3000, and the memory 4000 is connected via bus 5000.

The input device 1000 may be a touch panel, a physical button, or a mouse.

The output device 2000 described above may specifically be a display screen.

The above memory 4000 may be a high speed RAM memory or a non-volatile memory such as a magnetic disk memory. The above memory 4000 is used to store a set of program codes, and the input device 1000, the output device 2000, and the processor 3000 are used to call the memory 4000. In the program code stored in, do the following:

The processor 3000 is configured to:

Get the image to be processed;

Constructing the feature pyramid based on the n layers;

Optionally, the processor 3000 calculates the number of layers of the feature pyramid of the image to be processed, and obtains n layers, including:

Optionally, the foregoing processor 3000 constructs the feature pyramid based on the N layer, including:

Optionally, the processor 3000 determines, according to the K group first target feature, the K group second target feature, including:

Optionally, the processor 3000 determines, by using the M specified decision trees, the second target feature of the K group, and obtains the size and location of the target face frame, including:

The embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium can store a program, and the program includes some or all of the steps of any one of the image processing methods described in the foregoing method embodiments.

Although the present invention has been described herein in connection with the embodiments of the present invention, it will be understood by those skilled in the <RTIgt; Other variations of the disclosed embodiments are achieved. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several of the functions recited in the claims. Certain measures are recited in mutually different dependent claims, but this does not mean that the measures are not combined to produce a good effect.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, apparatus (device), or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code. The computer program is stored/distributed in a suitable medium, provided with other hardware or as part of the hardware, or in other distributed forms, such as over the Internet or other wired or wireless telecommunication systems.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of the methods, apparatus, and computer program products of the embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine for the execution of instructions for execution by a processor of a computer or other programmable data processing device. Means for implementing the functions specified in one or more of the flow or in a block or blocks of the flow chart.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device. The apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device. The instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

While the invention has been described with respect to the specific embodiments and embodiments thereof, various modifications and combinations may be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be construed as the It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and modifications of the invention

Claims

An image processing method, comprising:

Get the image to be processed;

Calculating a number of layers of the feature pyramid of the image to be processed, to obtain an n layer, wherein n is an integer greater than or equal to 1;

Constructing the feature pyramid based on the n layers;

Performing feature extraction on the K preset detection windows to obtain the K group first target features, wherein each set of the preset detection windows corresponds to a set of first target features, the K An integer greater than or equal to 1;

Determining the K group second target feature according to the K group first target feature;

Determining the K target second target feature by using M designated decision trees, and obtaining the size and position of the target human face frame, wherein the M is an integer greater than or equal to 1.
The method according to claim 1, wherein the calculating the number of layers of the feature pyramid of the image to be processed to obtain the n layer comprises:

Calculating the number of layers of the feature pyramid according to the size of the image to be processed and the size of the preset face detection model, as shown in the following formula:

Where n is the number of layers of the feature pyramid, k up is a multiple of the sampled image to be processed, w img , h img respectively representing the width and height of the image to be processed, w m , h m respectively The width and height of the preset face detection model, n octave refers to the number of layers of the image between each of the two dimensions in the feature pyramid.
The method according to any one of claims 1 or 2, wherein the constructing the feature pyramid based on the N layer comprises:

Determining that the N layer comprises P real feature layers and Q approximate feature layers, wherein P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

Performing feature extraction on the P real feature layers to obtain a third target feature;

Determining, according to the P real feature layers, a fourth target feature of the Q approximate feature layers;

The third target feature and the fourth target feature constitute the feature pyramid.
The method according to any one of claims 1 to 2, wherein the determining the K target second target feature according to the K group first target feature comprises:

Extracting color features from the K group first target features to obtain the K group color features;

Calculating a pixel comparison feature for the i-th color feature, training the first preset face model based on the calculated pixel comparison feature, and extracting the first target pixel comparison feature from the trained first preset face model, a fifth target feature, wherein the ith group color feature is any one of the K group color features;

Training a second preset face model by using the fifth target feature and the first target feature, and extracting a second pixel comparison feature from the trained second preset face model to obtain a sixth target feature;

Combining the first target feature and the sixth target feature into the second target feature.
The method according to any one of claims 1 to 2, wherein the determining, by using the M designated decision trees, the second target feature of the K group to obtain the size and position of the target face frame, including:

Determining, by the M specified decision trees, the K target second target features on the feature pyramid to obtain an X personal face frame, wherein the X is an integer greater than or equal to 1;

According to the X personal face frame, the size and position of the target face frame are merged.
A terminal, comprising:

An obtaining unit, configured to acquire an image to be processed;

a calculating unit, configured to calculate a number of layers of the feature pyramid of the image to be processed, to obtain an n layer, wherein n is an integer greater than or equal to 1;

a constructing unit configured to construct the feature pyramid based on the n layers;

An extracting unit, configured to perform feature extraction on the K preset detection windows on the feature pyramid to obtain the K group first target feature, wherein each set of the preset detection window corresponds to a group of first targets a characteristic, the K being an integer greater than or equal to 1;

a determining unit, configured to determine the K group second target feature according to the K group first target feature;

And a decision unit, configured to determine the size and location of the target face frame by using the M specified decision trees, where the M is an integer greater than or equal to 1.
The terminal according to claim 6, wherein the calculating unit is specifically configured to:

Calculating the number of layers of the feature pyramid according to the size of the image to be processed and the size of the preset face detection model, as shown in the following formula:

Where n is the number of layers of the feature pyramid, k up is a multiple of the sampled image to be processed, w img , h img respectively representing the width and height of the image to be processed, w m , h m respectively The width and height of the preset face detection model, n octave refers to the number of layers of the image between each of the two dimensions in the feature pyramid.
The terminal according to any one of claims 6 or 7, wherein the construction unit comprises:

a first determining module, configured to determine that the N layer includes P real feature layers and Q approximate feature layers, where P is an integer greater than or equal to 1, and the Q is an integer greater than or equal to 0;

a first extraction module, configured to perform feature extraction on the P real feature layers to obtain a third target feature;

a second determining module, configured to determine, according to the P real feature layers, a fourth target feature of the Q approximate feature layers;

And a constructing module, configured to form the third target feature and the fourth target feature into the feature pyramid.
The terminal according to any one of claims 6 or 7, wherein the determining unit comprises:

a second extraction module, configured to separately extract color features from the K group first target features to obtain the K group color features;

a first training module, configured to calculate a pixel comparison feature for the i-th color feature, train the first preset face model based on the calculated pixel comparison feature, and extract the first preset face model from the training a target pixel comparison feature to obtain a fifth target feature, wherein the ith group of color features is Any one of the K sets of color features;

a second training module, configured to train a second preset face model by using the fifth target feature and the first target feature, and extract a second pixel comparison feature from the trained second preset face model , obtaining the sixth target feature;

And a combination module, configured to combine the first target feature and the sixth target feature into the second target feature.
The terminal according to any one of claims 6 or 7, wherein the decision unit comprises:

a decision module, configured to determine, by using the M specified decision trees, the second target feature of the K group on the feature pyramid, to obtain an X personal face frame, where the X is an integer greater than or equal to 1;

And a merging module, configured to merge the size and position of the target face frame according to the X personal face frame.