CN100561501C - Image detection method and device - Google Patents
Image detection method and device
- Publication number
- CN100561501C · CNB2007101797868A · CN200710179786A
- Authority
- CN
- China
- Prior art keywords
- candidate
- classifier
- image
- candidate frame
- integral image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses an image detection method and device, in order to provide an image detection technique with a higher processing speed. The image detection method proposed by the invention comprises: calculating the integral image and the square integral image of an input image; when the number of rows of the integral image calculated so far is greater than or equal to the height of the object detector model, verifying, according to the integral image and the square integral image, the candidate frame positions within the range of the integral image calculated so far, using the object detector; and determining the position of the object on the input image according to the candidate frame positions that pass verification. The invention is used for image detection and increases the speed at which objects in an image are detected.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image detection method and an image detection device.
Background
In the technical fields of computer vision and image processing, obtaining face information from images or videos has important applications in fields such as human-computer interaction, security, and entertainment. Techniques for automatically acquiring the number, sizes, and positions of faces in an image, that is, face detection techniques, have therefore received considerable attention. In recent years, with the development of computer vision and pattern recognition technology, face detection technology has also developed rapidly and gradually matured.
Viola et al. proposed a face detection technique based on microstructure features (Haar-like features) and a hierarchical adaptive boosting (Adaboost) classifier, which is comparable in performance to methods based on support vector machines (SVM) and neural networks but much faster, essentially reaching real-time operation. After the method was proposed, it received much attention from researchers; many improved techniques have since been proposed and applied in numerous products in industry.
The face detection method proposed by Viola is fast mainly for two reasons. First, the microstructure feature values of the input image can be calculated quickly with a method based on the integral image (Integral Image). Second, a hierarchical Adaboost algorithm is adopted: layers with a small amount of computation first reject most of the interference that is easy to eliminate, and layers with a large amount of computation then process the small number of remaining candidates. The microstructure features adopted in the method are shown in Fig. 1; each microstructure feature value is defined as the difference between the sum of pixel luminances (i.e. gray values) in the gray rectangular regions and the sum of pixel luminances in the white rectangular regions.
For fast calculation of microstructure feature values, Viola proposed the integral image shown in Fig. 2, where the value of the integral image at a point (x, y) is defined as the sum of all pixel gray values in the gray rectangular area to its upper left, i.e.:

II(x, y) = Σ_{x'≤x, y'≤y} I(x', y')

where II(x, y) is the value of the integral image at point (x, y) and I(x', y') is the pixel gray value of the input image at point (x', y'). Viola scans the image once, starting from the upper-left corner, to obtain the integral image iteratively as follows:
s(x,y)=s(x,y-1)+I(x,y)
II(x,y)=II(x-1,y)+s(x,y)
where s(x, y) is the cumulative sum of the gray values of the pixels at positions (x, 0) through (x, y), with the boundary conditions s(x, -1) = 0 and II(-1, y) = 0.
The sum of the pixel gray values of any rectangular area can be obtained quickly using the integral image. Let sum(r) denote the sum of the pixel gray values of a rectangular region r. As shown in Fig. 3, by the definition of the integral image, the formula:
sum(D) = II(4) - II(2) - II(3) + II(1)
gives the sum of the pixel gray values in any rectangular region D (A, B, C, and D each denote a rectangular region, and points 1, 2, 3, and 4 are the lower-right vertices of regions A, B, C, and D respectively).
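To make the recurrences and the four-corner lookup above concrete, the following sketch (in Python, with NumPy used only for array storage; all names are illustrative, not taken from the patent) computes the integral image and the square integral image in a single top-to-bottom, left-to-right pass and evaluates the gray-value sum of an arbitrary rectangle:

```python
import numpy as np

def integral_images(img):
    """One pass over a grayscale image, maintaining the cumulative sums
    s(x, y) = s(x, y-1) + I(x, y) and II(x, y) = II(x-1, y) + s(x, y),
    plus the same recurrences on squared gray values."""
    h, w = img.shape
    II = np.zeros((h, w), dtype=np.int64)
    SqII = np.zeros((h, w), dtype=np.int64)
    s = np.zeros(w, dtype=np.int64)   # s(x, y) for the current row y
    sq = np.zeros(w, dtype=np.int64)
    for y in range(h):
        for x in range(w):
            v = int(img[y, x])
            s[x] += v                  # s(x, -1) = 0 by initialization
            sq[x] += v * v
            II[y, x] = (II[y, x - 1] if x > 0 else 0) + s[x]      # II(-1, y) = 0
            SqII[y, x] = (SqII[y, x - 1] if x > 0 else 0) + sq[x]
    return II, SqII

def rect_sum(tab, top, left, height, width):
    """sum(D) = II(4) - II(2) - II(3) + II(1): four corner lookups."""
    bottom, right = top + height - 1, left + width - 1
    total = int(tab[bottom, right])
    if top > 0:
        total -= int(tab[top - 1, right])
    if left > 0:
        total -= int(tab[bottom, left - 1])
    if top > 0 and left > 0:
        total += int(tab[top - 1, left - 1])
    return total
```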
To eliminate interference from factors such as lighting, Viola further normalizes the microstructure feature values using the image brightness variance (which may also be called the normalization parameter), defined as:

σ² = (1/N) Σ_{i,j} I(i,j)² - m²

where

m = (1/N) Σ_{i,j} I(i,j)

is the image luminance mean, I(i, j) is the luminance value at point (i, j) of the image, and N is the number of pixels in the image; the first term is obtained from the square integral image and the mean from the integral image. The normalized microstructure feature value is defined as g_j = f_j/σ, where f_j is the microstructure feature value defined above, i.e. the difference between the sum of pixel luminances in the gray rectangular area and the sum of pixel luminances in the white rectangular area.
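Following the formulas above, the normalization parameter of a candidate window can be read off the two integral images with a handful of corner lookups; this minimal sketch reuses the illustrative `rect_sum` helper from the previous block:

```python
import math

def window_stdev(II, SqII, top, left, height, width):
    """sigma of one window: sigma^2 = (1/N) * sum(I^2) - m^2."""
    n = height * width
    s1 = rect_sum(II, top, left, height, width)    # sum of I(i, j)
    s2 = rect_sum(SqII, top, left, height, width)  # sum of I(i, j)^2
    m = s1 / n
    var = s2 / n - m * m
    return math.sqrt(max(var, 0.0))

def normalized_feature(f_j, stdev):
    # g_j = f_j / sigma; guard against flat windows where sigma is 0
    return f_j / stdev if stdev > 0 else 0.0
```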
Viola constructs, for each microstructure feature, a tree classifier of the simplest structure as a weak classifier, as follows:

h_j(x) = 1 if p_j·g_j(x) > p_j·θ_j, and h_j(x) = 0 otherwise

where x is the fixed-scale input image, g_j(x) is the j-th microstructure feature value of that image, θ_j is the decision threshold corresponding to the j-th microstructure feature, and p_j takes the value 1 or -1: when p_j is 1 the decision sign is "greater than", and when p_j is -1 the decision sign is "less than". h_j(x) is the decision output of the j-th weak classifier. Thus each weak classifier needs only one threshold comparison to complete its decision.
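Expressed as code, such a weak classifier really is a single comparison; a minimal sketch using the notation above (the function name is an assumption):

```python
def weak_decision(g_j, theta_j, p_j):
    """h_j(x): p_j = 1 means 'greater than', p_j = -1 flips the sign."""
    return 1 if p_j * g_j > p_j * theta_j else 0
```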
The structure of the hierarchical Adaboost classifier proposed by Viola is shown in Fig. 4: the face detector consists of several strong classifiers, each strong classifier consists of several weak classifiers, and each weak classifier corresponds to one microstructure feature. A candidate frame is first judged by the first-layer classifier; if it passes, it goes on to be judged by the second-layer classifier, and otherwise it is rejected immediately. Subsequent layers are processed in the same way, and once one candidate frame has been processed the next candidate frame is handled. A candidate frame that passes all the classifiers is finally considered a face region.
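A hedged sketch of this layered decision for a single candidate frame follows; the stage layout, the `(feature_fn, theta, p, weight)` tuples, and the per-stage threshold are illustrative assumptions about how such a cascade can be represented, not the patent's data structures:

```python
def passes_cascade(window, stages):
    """stages: list of (weak_classifiers, stage_threshold); each weak
    classifier is a (feature_fn, theta, p, weight) tuple."""
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_fn, theta, p, weight in weak_classifiers:
            g = feature_fn(window)          # normalized feature value g_j
            if p * g > p * theta:           # weak decision h_j
                score += weight
        if score < stage_threshold:
            return False                    # rejected early by this layer
    return True                             # passed every layer: object frame
```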
To detect faces of different sizes and positions, Viola processes the image in a manner based on feature scaling. First the width and height of the face detector model are set to MW and MH respectively (Viola uses MW = MH = 24), and face and non-face samples cropped and scaled to this size are used to train a hierarchical AdaBoost face detection model. Assuming the scaling ratio is SR, feature scaling produces a series of detectors of different scales whose widths and heights are ROUND(MW·SR^s) and ROUND(MH·SR^s) respectively, where s is an integer greater than 0 and ROUND() denotes rounding the value in parentheses. To detect faces of different sizes, the integral image of the input image is computed once, and then each of the resulting face detectors of different scales performs a traversal search, so that faces of different sizes and positions are detected; all candidate rectangles that pass the hierarchical detector are added to a face detection queue for recording.
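The resulting scale schedule can be sketched as follows; MW = MH = 24 comes from the text above, while SR = 1.25 is an assumed example value for the scaling ratio:

```python
def detector_sizes(img_w, img_h, mw=24, mh=24, sr=1.25):
    """Yield (ROUND(MW * SR^s), ROUND(MH * SR^s)) for s = 0, 1, 2, ...
    (s = 0 is the base model) until the detector outgrows the image."""
    s = 0
    while True:
        w, h = round(mw * sr ** s), round(mh * sr ** s)
        if w > img_w or h > img_h:
            break
        yield w, h
        s += 1
```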
Considering that one face may correspond to multiple detection results because of changes in scale and position, a typical face detection algorithm uses a post-processing step to fuse the detection results, so that only one detection result is output per face position. At the same time, some false detections can be merged away by this fusion, reducing the false detection rate. Fig. 5 is a schematic flow chart of Viola's method for detecting face regions in an image. Fig. 6 shows the specific steps of verifying a candidate frame in step S503; a candidate frame verified by the strong classifiers of all layers in step S607 is considered a face frame, and step S504 adds such candidate frames to the candidate queue.
Although the face detection method proposed by Viola has many advantages, the module that calculates the integral image and the square integral image and the module that verifies candidate frames share the integral-image and square-integral-image memories, and in time order the candidate-frame verification step can only be executed after the integral image and the square integral image of all the points in the image have been calculated, so the processing speed of the method is relatively slow.
Disclosure of Invention
The embodiments of the invention provide an image detection method and device, in order to provide an image detection technique with a higher processing speed.
The image detection method provided by the embodiment of the invention comprises the following steps:
monitoring the number of rows of the integral image calculated so far while the integral image and the square integral image of the input image are calculated in parallel; when the number of rows is greater than or equal to the height of the object detector model,
processing the integral image and the square integral image simultaneously to obtain a normalization parameter;
for each weak classifier of each layer of classifier of the object detector, calculating the difference between the luminance sums of two rectangular areas and obtaining, by means of the normalization parameter, the microstructure feature value corresponding to that weak classifier;
comparing the microstructure feature value corresponding to each weak classifier of each layer of classifier with a preset threshold to judge whether that microstructure feature value is effective;
weighting and summing the effective microstructure feature values of each layer of classifier, and comparing the result with a preset threshold to judge whether the candidate frame passes the verification of that layer of classifier; for each layer of classifier, as soon as the layer finishes judging one candidate frame, it goes on to judge the next candidate frame;
determining the candidate frames verified by all the classifiers as the verified candidate frame positions;
and determining the position of the object on the input image according to the verified candidate frame positions.
The image detection device provided by the embodiment of the invention comprises:
an integral image unit, configured to calculate the integral image and the square integral image of an input image in parallel;
a verification unit, configured to monitor the number of rows of the integral image currently calculated by the integral image unit and, when the number of rows is greater than or equal to the height of the object detector model, to process the integral image and the square integral image simultaneously to obtain a normalization parameter; to calculate, for each weak classifier of each layer of classifier of the object detector, the difference between the luminance sums of two rectangular areas and obtain, by means of the normalization parameter, the microstructure feature value corresponding to that weak classifier; to compare the microstructure feature value corresponding to each weak classifier of each layer of classifier with a preset threshold to judge whether that microstructure feature value is effective; to weight and sum the effective microstructure feature values of each layer of classifier and compare the result with a preset threshold to judge whether the candidate frame passes the verification of that layer of classifier, each layer of classifier going on to judge the next candidate frame as soon as it finishes judging one; and to determine the candidate frames verified by all the classifiers as the verified candidate frame positions;
and a determining unit, configured to determine the position of the object on the input image according to the verified candidate frame positions.
According to the embodiments of the invention, when the number of rows of the integral image of the input image calculated so far is greater than or equal to the height of an object detector model, the object detector verifies, according to the integral image and the square integral image, the candidate frame positions within the range of the integral image calculated so far, and the position of the object on the input image is determined from the candidate frame positions that pass verification. This technical scheme avoids waiting until the integral image and square integral image of every point have been calculated before the candidate-frame verification step is executed, so the speed of detecting objects in the image is increased.
Drawings
FIG. 1 is a schematic diagram of microstructure features employed by the prior art face detection technique proposed by Viola et al;
FIG. 2 is a diagram of an integral image in the prior art;
FIG. 3 is a schematic diagram of a prior-art method for calculating the gray-value sum of any rectangular area using an integral image, where points 1, 2, 3, and 4 are the lower-right vertices of regions A, B, C, and D respectively;
FIG. 4 is a schematic diagram of a hierarchical face detector in the prior art;
FIG. 5 is a schematic flow chart of a prior art face detection method proposed by Viola et al;
FIG. 6 is a flow chart illustrating the validation of all possible rectangular boxes to determine candidate boxes as taught by Viola et al in the prior art;
FIG. 7 is a flowchart illustrating an image detection method according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating parallel processing of calculating an integral image and a square integral image and verifying whether a candidate frame passes through each layer of classifier according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating a parallel determination of whether a candidate box can pass through a current-layer classifier according to an embodiment of the present invention;
fig. 10 is a schematic flow chart illustrating a process of determining whether a microstructure feature is valid according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide an image detection method and device that improve on the prior art in respects such as how the integral image and square integral image are calculated and how candidate frames are verified, so as to increase the processing speed of image detection.
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Referring to fig. 7, an image detection method provided in an embodiment of the present invention includes:
s701, calculating a partial integral image and a square integral image of the input image, wherein the integral image and/or the square integral image are obtained by calculating the brightness sum of all pixels from the 1 st row pixel to the current pixel of the input image according to the sequence from top to bottom and from left to right, and the calculation of the integral image and the square integral image is synchronously performed.
S702, judging whether the height of a certain object detector model in the object detectors with all scales is less than or equal to the number of lines of the integrated image obtained by calculation, if so, executing the step S703; otherwise, step S701 is executed to continue calculating the integral image and the square integral image of the next line of images.
In the embodiment of the present invention, the object detector may be a human face detector, or may be a detector for detecting a position of another object in the image, for example, a detector for detecting a human body, an automobile, or the like.
S703, verifying, according to the integral image and the square integral image, the candidate frame positions within the range of the integral image calculated so far, using the object detector.
S704, adding the verified candidate frame position information to a candidate queue, and continuing detection with the object detector of the next scale.
S705, merging the position information of all overlapping candidate frames in the candidate queue, and determining the position of the object on the input image.
In step S701, the calculation of the integral image and the calculation of the square integral image are mutually independent and have no temporal precedence relationship, so they can be processed in parallel: for each row of the input image, the luminance value of each pixel of that row is obtained by reading or calculation, and the integral image and square integral image corresponding to that row are then obtained by parallel iterative computation.
The parallel processing mentioned in the embodiments of the invention generally means that, when designing chips and similar applications, an independent arithmetic unit is provided for each part and all parts operate simultaneously. Exactly how many arithmetic units to provide and how to make them operate simultaneously can be worked out by those skilled in the art without creative labor.
In the prior art, generally only one set of arithmetic units is provided, and the square integral image is calculated after the integral image has been calculated, so the processing time is longer than with the scheme provided by the embodiments of the invention.
To increase the processing speed, as soon as part of the integral image and square integral image has been calculated, the embodiment of the invention starts verifying whether the candidate frames that lie entirely within the range of the calculated integral image are object frames. That is, the candidate-frame verification process runs in parallel with the calculation of the integral image and square integral image, instead of waiting for the integral image and square integral image of the whole input image to be calculated before the candidate-frame verification process begins.
For example, let THeight_n denote the height of the object detector of the n-th scale. After the integral image and square integral image of the first k rows of the input image have been calculated, judge whether the height of any object detector among the object detectors of all scales satisfies the formula:
THeight_n ≤ k
If the height of some object detector satisfies the formula, then while the integral image and square integral image of each row further down the input image are being calculated, verify whether all candidate rectangular frames whose bottom edge has ordinate k, whose abscissa runs from 0 to W - TWidth_n, and whose width and height are TWidth_n and THeight_n, are object frames (such as face frames or human-body frames). Likewise, once the integral image and square integral image of row k + Δ_n have been calculated, while the integral image and square integral image of each row further down the input image continue to be calculated, verify whether all candidate rectangular frames whose bottom edge has ordinate k + Δ_n and whose abscissa runs from 0 to W - TWidth_n are object frames.
In this way, the verification of the candidate frames whose bottom edges lie in different rows and the calculation of the integral image and square integral image are processed in parallel.
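The row-driven scheduling just described can be sketched as a generator: each candidate frame becomes verifiable as soon as the integral-image row carrying its bottom edge is complete. The frame layout and the use of one common step for both axes are illustrative assumptions:

```python
def frames_ready_by_row(img_w, img_h, scales, step=2):
    """Yield (k, frame) pairs: frame = (left, top, width, height) may be
    verified once integral-image row k has been calculated."""
    for k in range(img_h):                   # rows complete top to bottom
        for tw, th in scales:                # (TWidth_n, THeight_n) pairs
            top = k + 1 - th                 # bottom edge sits on row k
            if top < 0 or top % step:        # detector too tall, or row
                continue                     # skipped by the vertical step
            for left in range(0, img_w - tw + 1, step):
                yield k, (left, top, tw, th)
```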
Of course, other methods may also be used; for example, each time the integral image and square integral image of one row have been calculated, judge whether THeight_n ≤ k is satisfied for any of the object detectors, and if so, carry out the candidate-frame verification process.
In step S703, when verifying whether a candidate frame position is an object frame, the candidate frame position must be judged layer by layer using the hierarchical classifier. Preferably, step S703 specifically comprises:
processing the integral image and the square integral image simultaneously to obtain a normalization parameter;
for each weak classifier of each layer of classifier of the object detector, calculating the difference between the luminance sums of two rectangular areas and obtaining, by means of the normalization parameter, the microstructure feature value corresponding to that weak classifier;
comparing the microstructure feature value corresponding to each weak classifier of each layer of classifier with a preset threshold to judge whether that microstructure feature value is effective;
weighting and summing the effective microstructure feature values of each layer of classifier, and comparing the result with a preset threshold to judge whether the candidate frame passes the verification of that layer of classifier;
and determining the candidate frames verified by all the classifiers as the verified candidate frame positions.
The layers of the classifier process a given candidate frame in sequence, but the embodiment of the invention uses a pipeline structure to process the candidate frames so as to increase the speed of verifying them. Specifically:
each layer of the hierarchical classifier is provided with its own independent set of arithmetic units, so that different candidate frames are processed in a pipeline.
For example, the 1st candidate frame first occupies the layer-0 arithmetic unit; when the layer-0 arithmetic unit has finished processing the 1st candidate frame, the 2nd candidate frame starts to occupy it, and when it has finished processing the 2nd candidate frame, the 3rd candidate frame starts to occupy it. Similarly, the 1st candidate frame passed by the layer-0 arithmetic unit occupies the layer-1 arithmetic unit, and after that processing completes, the next candidate frame passed by the layer-0 arithmetic unit occupies the layer-1 arithmetic unit. In total, Sn × CascNum sets of arithmetic units are therefore required, where Sn denotes the total number of object detectors of all scales and CascNum the total number of classifier layers. In practical applications, however, if this would require too many hardware resources, only some of these units may be provided. For example, since the classifiers of the earlier layers have more candidate frames to process and the classifiers of the later layers have fewer, more arithmetic units may be allocated to the earlier layers and fewer to the later layers.
Preferably, a corresponding candidate-frame data structure queue (FIFO) is also provided for each layer of classifier, to record the coordinate information of the candidate frames, for example the left coordinate and top coordinate of the candidate frame, the serial number of the scale it belongs to, and the normalization parameter (stdev). The judging module of each layer of classifier reads the candidate-frame coordinate information from its FIFO, obtains the classifier parameters of the corresponding scale according to the scale serial number, and judges the candidate frame.
The layer-0 classifier is processed slightly differently from the subsequent layers: the normalization parameter must be obtained in the layer-0 classifier, recorded in the layer-0 FIFO, and passed on in turn to the FIFOs of the subsequent layers for use by the subsequent classifiers.
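A software analogue of this per-layer pipeline is sketched below. It is an assumption-laden model of the hardware behaviour rather than the circuit itself: each layer owns one FIFO, and on each tick every layer judges one frame and pushes survivors into the next layer's FIFO:

```python
from collections import deque

def pipeline_tick(fifos, layer_judges):
    """fifos[i] feeds layer i; layer_judges[i](frame) -> bool.
    Returns the frames that passed the final layer on this tick."""
    accepted = []
    for i in range(len(fifos) - 1, -1, -1):   # drain back to front so a
        if not fifos[i]:                      # frame advances one layer
            continue                          # per tick, as in a pipeline
        frame = fifos[i].popleft()
        if layer_judges[i](frame):
            if i + 1 < len(fifos):
                fifos[i + 1].append(frame)
            else:
                accepted.append(frame)        # verified by all layers
    return accepted

# Usage: fifos = [deque() for _ in judges]; fill fifos[0] with candidate
# frames, then call pipeline_tick repeatedly until all FIFOs are empty.
```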
Preferably, for convenience of processing, the step size of the object detectors of all scales is fixed; for example, the step size of the object detectors of all scales is 2 pixels in both the horizontal and the vertical direction.
The following describes, taking a step size of 2 as an example, the parallel processing of calculating the integral image and square integral image and verifying whether a candidate frame passes each layer of classifier; see Fig. 8. Suppose the integral image and square integral image of row 2k + 1 have just been calculated. Judge whether the height of the object detector of some scale is less than or equal to 2k + 1; if so, add all candidate frames of that scale whose bottom edge lies on the current row to the FIFO of the layer-0 classifier. Specifically: for the left-edge abscissa i of every possible candidate frame, starting from 0 with step size Δ_n up to maxx, where maxx = W - TWidth_n, add the current candidate frame R(i, 2k + 1 - THeight_n, TWidth_n, THeight_n) to the FIFO of the layer-0 classifier. Here i denotes the abscissa of the left edge of the candidate frame, 2k + 1 - THeight_n the ordinate of its top edge, TWidth_n its width, and THeight_n its height.
In the specific candidate-frame verification step, suppose that, in the object detector of the current scale, the current-layer classifier contains weakNum_stageOrder microstructure features in total, where weakNum_stageOrder is the number of weak classifiers in that layer. These microstructure features are mutually independent and share only the integral-image memory and the normalization parameter. Therefore, to further increase the verification speed, parallel processing may be adopted: the different microstructure feature values are calculated in parallel and then summed once the calculations complete, as shown in Fig. 9.
Further, when calculating one microstructure feature, the luminance sums of its two rectangular areas may be calculated in parallel, as shown in Fig. 10. Preferably, a hardware unit may be provided for calculating the luminance sum of a rectangular area.
Furthermore, the calculation of the normalization parameter can also be parallelized: the relevant operations on the integral image and on the square integral image are performed simultaneously to obtain the normalization parameter.
Preferably, the step S704 of adding the candidate box position information to the candidate queue specifically includes:
judging, according to the size and position of the candidate frame to be added and the sizes and positions of the candidate frames already in the candidate queue, whether the candidate frame to be added is similar to an already added candidate frame; if so, merging the similar candidate frames and taking the number of merged candidate frames as the confidence of the merged candidate frame; otherwise, adding the candidate frame to be added to the candidate queue.
Preferably, the step S705 of merging the position information of all overlapped candidate frames in the candidate queue, and the step of determining the position of the object on the input image specifically includes:
when one candidate frame in the candidate queue is contained in another candidate frame, deleting the candidate frame with the lower confidence, and when the confidences are equal, deleting the candidate frame with the smaller area;
and determining the positions of the remaining candidate frames in the candidate queue after the merging and deleting process as the positions of the objects on the input image.
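The two post-processing rules above can be sketched as follows; the similarity tolerance is an assumed example, since the text does not fix a numerical criterion for "similar":

```python
from itertools import combinations

def contains(outer, inner):
    """True if frame outer fully contains frame inner; frames are
    (left, top, width, height) tuples."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[0] + outer[2] >= inner[0] + inner[2]
            and outer[1] + outer[3] >= inner[1] + inner[3])

def similar(a, b, tol=0.2):
    # Assumed criterion: positions and sizes agree within a tolerance.
    return (abs(a[0] - b[0]) <= tol * b[2] and abs(a[1] - b[1]) <= tol * b[3]
            and abs(a[2] - b[2]) <= tol * b[2] and abs(a[3] - b[3]) <= tol * b[3])

def merge_and_prune(frames):
    merged = []                               # entries are [frame, confidence]
    for f in frames:
        for entry in merged:
            if similar(entry[0], f):
                entry[1] += 1                 # confidence = frames merged in
                break
        else:
            merged.append([f, 1])
    dropped = set()
    for i, j in combinations(range(len(merged)), 2):
        (f, cf), (g, cg) = merged[i], merged[j]
        if contains(f, g) or contains(g, f):  # one frame inside the other
            if cf != cg:
                dropped.add(i if cf < cg else j)   # lower confidence goes
            else:
                dropped.add(i if f[2] * f[3] < g[2] * g[3] else j)  # smaller area
    return [(f, c) for k, (f, c) in enumerate(merged) if k not in dropped]
```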
The image detection device provided by the embodiment of the invention comprises:
and the integral image unit is used for calculating an integral image and a square integral image of the input image.
And the verification unit is used for verifying the candidate frame position within the range of the integral image obtained by calculation by adopting the object detector according to the integral image and the square integral image when the line number of the integral image obtained by calculation is greater than or equal to the height of the object detector model.
And the determining unit is used for determining the position of the object on the input image according to the verified candidate frame position.
In the field of detecting objects in images, face detection is one sub-field of object detection; other applications such as vehicle detection and pedestrian detection are similar to face detection, all of them being two-class classification techniques in the field of pattern recognition. Therefore, the scheme provided by the embodiments of the invention is applicable not only to detecting face regions in an image but also, according to actual needs, to detecting the regions occupied by other kinds of objects, for example the region where a car is located in the image, or the region where each person or animal is located.
In summary, in order to increase the image detection speed, the invention applies parallel processing in several respects, such as the integral image, the square integral image, and candidate-frame verification, thereby achieving the goal of faster image detection.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. An image detection method, characterized in that the method comprises:
monitoring the number of rows of the integral image calculated so far while the integral image and the square integral image of the input image are calculated in parallel; when the number of rows is greater than or equal to the height of the object detector model,
processing the integral image and the square integral image simultaneously to obtain a normalization parameter;
for each weak classifier of each layer of classifier of the object detector, calculating the difference between the luminance sums of two rectangular areas and obtaining, by means of the normalization parameter, the microstructure feature value corresponding to that weak classifier;
comparing the microstructure feature value corresponding to each weak classifier of each layer of classifier with a preset threshold to judge whether that microstructure feature value is effective;
weighting and summing the effective microstructure feature values of each layer of classifier, and comparing the result with a preset threshold to judge whether the candidate frame passes the verification of that layer of classifier; for each layer of classifier, as soon as the layer finishes judging one candidate frame, it goes on to judge the next candidate frame;
determining the candidate frames verified by all the classifiers as the verified candidate frame positions;
and determining the position of the object on the input image according to the verified candidate frame positions.
2. The method of claim 1, wherein the integral image and the square integral image are computed simultaneously.
3. The method of claim 1, wherein the microstructure feature values for each weak classifier in each layer of classifiers are calculated simultaneously.
4. The method according to claim 1 or 2, wherein the luminance sums of the two rectangular areas are calculated simultaneously.
5. The method of claim 1, wherein determining the location of the object on the input image based on the validated candidate box locations comprises:
adding the verified candidate frame position into a preset candidate queue;
determining a position of an object on the input image from the candidate queue.
6. The method of claim 5, wherein the step of adding the candidate box to a candidate queue comprises:
judging, according to the size and position of the candidate frame to be added and the sizes and positions of the candidate frames already in the candidate queue, whether the candidate frame to be added is similar to an already added candidate frame; if so, merging the similar candidate frames and taking the number of merged candidate frames as the confidence of the merged candidate frame; otherwise, adding the candidate frame to be added to the candidate queue.
7. The method of claim 6, wherein the step of determining the object position of the input image from the candidate queue comprises:
when one candidate frame in the candidate queue is contained in another candidate frame, deleting the candidate frame with the lower confidence, and when the confidences are equal, deleting the candidate frame with the smaller area;
and determining the positions of the remaining candidate frames in the candidate queue after the merging and deleting process as the positions of the objects on the input image.
8. An image detection apparatus, comprising:
an integral image unit for calculating an integral image and a square integral image of an input image in parallel;
a verification unit, configured to monitor the number of rows of the integral image currently calculated by the integral image unit and, when the number of rows is greater than or equal to the height of the object detector model, to process the integral image and the square integral image simultaneously to obtain a normalization parameter; to calculate, for each weak classifier of each layer of classifier of the object detector, the difference between the luminance sums of two rectangular areas and obtain, by means of the normalization parameter, the microstructure feature value corresponding to that weak classifier; to compare the microstructure feature value corresponding to each weak classifier of each layer of classifier with a preset threshold to judge whether that microstructure feature value is effective; to weight and sum the effective microstructure feature values of each layer of classifier and compare the result with a preset threshold to judge whether the candidate frame passes the verification of that layer of classifier, each layer of classifier going on to judge the next candidate frame as soon as it finishes judging one; and to determine the candidate frames verified by all the classifiers as the verified candidate frame positions;
and the determining unit is used for determining the position of the object on the input image according to the verified candidate frame position.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101797868A CN100561501C (en) | 2007-12-18 | 2007-12-18 | A kind of image detecting method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101797868A CN100561501C (en) | 2007-12-18 | 2007-12-18 | A kind of image detecting method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101183428A CN101183428A (en) | 2008-05-21 |
CN100561501C true CN100561501C (en) | 2009-11-18 |
Family
ID=39448697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2007101797868A Expired - Fee Related CN100561501C (en) | 2007-12-18 | 2007-12-18 | A kind of image detecting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100561501C (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5388835B2 (en) | 2009-12-24 | 2014-01-15 | キヤノン株式会社 | Information processing apparatus and information processing method |
CN102147866B (en) * | 2011-04-20 | 2012-11-28 | 上海交通大学 | Target identification method based on training Adaboost and support vector machine |
CN103390151B (en) * | 2012-05-08 | 2016-09-07 | 展讯通信(上海)有限公司 | Method for detecting human face and device |
CN104091171B (en) * | 2014-07-04 | 2017-06-20 | 华南理工大学 | Vehicle-mounted far infrared pedestrian detecting system and method based on local feature |
CN106326817B (en) * | 2015-07-03 | 2021-08-03 | 佳能株式会社 | Method and apparatus for detecting object from image |
CN106874845B (en) * | 2016-12-30 | 2021-03-26 | 东软集团股份有限公司 | Image recognition method and device |
CN107273923B (en) * | 2017-06-02 | 2020-09-29 | 浙江理工大学 | Construction method of textile fabric friction sound wave discriminator |
CN111583160B (en) * | 2020-06-03 | 2023-04-18 | 浙江大华技术股份有限公司 | Method and device for evaluating noise of video picture |
CN111915640B (en) * | 2020-08-11 | 2023-06-13 | 浙江大华技术股份有限公司 | Method and device for determining candidate frame scale, storage medium and electronic device |
CN112560939B (en) * | 2020-12-11 | 2023-05-23 | 上海哔哩哔哩科技有限公司 | Model verification method and device and computer equipment |
-
2007
- 2007-12-18 CN CNB2007101797868A patent/CN100561501C/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
Paul Viola, Michael J. Jones. Robust Real-Time Face Detection. International Journal of Computer Vision, Vol. 57, No. 2, 2004. |
武妍, 项恩宁. Real-valued Adaboost face detection algorithm with dynamically pre-partitioned weights (动态权值预划分实值Adaboost人脸检测算法). Computer Engineering, Vol. 33, No. 3, 2007. |
唐旭晟, 欧宗瑛, 苏铁明, 华顺刚. Fast face localization algorithm based on AdaBoost and genetic algorithm (基于AdaBoost和遗传算法的快速人脸定位算法). Journal of South China University of Technology (Natural Science Edition), Vol. 35, No. 1, 2007. |
Also Published As
Publication number | Publication date |
---|---|
CN101183428A (en) | 2008-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100561501C (en) | A kind of image detecting method and device | |
CN101178770B (en) | Image detection method and apparatus | |
JP6547069B2 (en) | Convolutional Neural Network with Subcategory Recognition Function for Object Detection | |
CN108830188B (en) | Vehicle detection method based on deep learning | |
CN100561505C (en) | A kind of image detecting method and device | |
CN101561867B (en) | Human body detection method based on Gauss shape feature | |
CN101211411B (en) | Human body detection process and device | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
US20120093420A1 (en) | Method and device for classifying image | |
CN107748873A (en) | A kind of multimodal method for tracking target for merging background information | |
CN108960074B (en) | Small-size pedestrian target detection method based on deep learning | |
CN105046278B (en) | The optimization method of Adaboost detection algorithm based on Haar feature | |
Fernando et al. | Automatic road traffic signs detection and recognition using ‘You Only Look Once’version 4 (YOLOv4) | |
CN116824335A (en) | YOLOv5 improved algorithm-based fire disaster early warning method and system | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN115496971A (en) | Infrared target detection method and device, electronic equipment and storage medium | |
CN101520850B (en) | Construction method of object detection classifier, object detection method and corresponding system | |
CN109726621B (en) | Pedestrian detection method, device and equipment | |
US7643674B2 (en) | Classification methods, classifier determination methods, classifiers, classifier determination devices, and articles of manufacture | |
CN114782979A (en) | Training method and device for pedestrian re-recognition model, storage medium and terminal | |
CN111582057B (en) | Face verification method based on local receptive field | |
CN118447366A (en) | Contraband identification model construction method and contraband identification method | |
CN114724175B (en) | Pedestrian image detection network, pedestrian image detection method, pedestrian image training method, electronic device and medium | |
CN111950586B (en) | Target detection method for introducing bidirectional attention | |
CN115240163A (en) | Traffic sign detection method and system based on one-stage detection network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20091118 Termination date: 20111218 |