US7542591B2 - Target object detecting method, apparatus, and program - Google Patents
- Publication number
- US7542591B2 (application US 11/067,223)
- Authority
- US
- United States
- Prior art keywords
- target object
- face
- detecting
- characteristic
- predetermined
- Prior art date
- Legal status
- Active, expires
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/162—Detection; Localisation; Normalisation using pixel segmentation or colour matching
Definitions
- the present invention relates to a target object detecting method and apparatus for detecting a targeted object from a digital image.
- the present invention also relates to a program for causing a computer to execute the method.
- a matching-based method is widely used.
- a target object is detected by matching the model of the object to be detected (template) with the target object in a digital image (template matching).
- template matching has many drawbacks. For example, because the object models are fixed, it may not tolerate variations (size, direction, deformation) of a target object in a digital image, as described, for example, in “Image Analysis Handbook”, Takagi and Shimoda, pp 171-205, 1991, University of Tokyo Press.
- the neural network method described above is one of the studying methods for image processing known as the architectural method, and many studies, such as the visual model, learning model, and associative memory model, have been conducted.
- the method creates an appropriate neural network model in view of known physiological facts and findings, examines the behavior and performance of the model created, and compares them with actual behaviors and performance of the human brain to understand the image processing principle of the human brain.
- As an epistemic model of the neural network that is robust against disagreement in the size and location of the target object, the so-called neocognitron is known, as described, for example, in “Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position-Neocognitron”, Kunihiko Fukushima, The Institute of Electronics, Information and Communication Engineers Article A, J62-A (10), pp 658-665, October, 1979.
- the neocognitron is based on the principle that pattern matching is performed on small sections of the target object while the displacement is absorbed gradually, in a stepwise manner, through a hierarchical structure.
- the procedure of tolerating the displacement gradually in a stepwise manner not only removes the displacement of an input pattern but also plays an important role in performing pattern recognition which is robust against deformations. That is, the adverse effects of the relative displacement of the local characteristics are gradually absorbed in the course of integrating the characteristics, and eventually an output not influenced by considerable deformations of the input pattern may be obtained.
- Kohonen's self-organization mapping is known through “Self-Organization and Associative Memory”, T. Kohonen, Springer-Verlag, 1984.
- Kohonen's self-organization mapping is a model in which a topological mapping is learned through self-organization.
- the topological mapping means, for example, the process of allocating a signal received by a human being from outside, i.e., a certain pattern, to neurons of the cortex reflecting the order based on a certain rule.
- a method for solving the problem described above is proposed as described, for example, in Japanese Unexamined Patent Publication No. 11 (1999)-341272.
- the method provides an ID photograph by the following steps. First, display a facial photograph image to be used for the ID photograph on a display, such as a monitor. Then, indicate the top of the head and tip of the jaw of the facial photograph image on the screen to instruct a computer to create the ID photograph.
- the computer in turn, enlarges/reduces the image to obtain a scaling rate and the position of the face based on the two positions indicated by the operator and the output specification of the ID photograph. Then, the computer performs trimming for the enlarged/reduced image so that the face in the enlarged/reduced image is placed at a predetermined location.
- the user may ask DPE shops, which may be more frequently encountered than the automatic ID photograph creation systems, to create the ID photograph.
- the user may bring to a DPE shop a photographic film or a recording medium from his/her stock on which a favorite photo image is recorded, in order to create the ID photograph from the favorite photo image.
- the operator must perform the troublesome chore of indicating the top of the head and tip of the jaw of the facial photograph image displayed on a display screen. This is especially burdensome for the operator who handles ID photographs of many customers. Further, if the area of the face region of the facial photograph image displayed on a display screen is small, or the resolution of the facial photograph image is coarse, the operator may not indicate the top of the head and tip of the jaw quickly and accurately, so that an appropriate ID photograph may not be provided promptly.
- an automatic trimming method is proposed in U.S. Patent Application Publication No. 20020085771.
- the top of the head and eyes in a facial photograph image are located and the trimming area is set by determining the position of the jaw based on the positions of the top of the head and eyes.
- the most important process in the automatic trimming is the detection of the regions for setting the trimming area. These regions, i.e., the target objects for detection may be, for example, the positions of the top of the head and eyes, entire face portion, both pupils, or the combination thereof as described in U.S. Patent Application Publication No. 20020085771.
- the target object included in the digital image is not always detected.
- a face having a standard characteristic may readily be detected.
- a face having a certain specific characteristic (eyeglassed face, heavily whiskered face, uniquely hairstyled face, etc.) may not be detected, because the face detection algorithms are designed based on the standard faces.
- the present invention has been developed in view of the circumstances described above, and it is an object of the present invention to provide a target object detecting method and apparatus capable of detecting a predetermined target object having a specific characteristic, as well as the predetermined target object having a standard characteristic, in detecting a predetermined target object from a digital image, without requiring a huge amount of calculations. It is a further object of the present invention to provide a program for causing a computer to execute the target object detecting method described above.
- a target object detecting method is a target object detecting method for detecting from a digital image a predetermined target object included therein, the method comprising the steps of:
- the present invention may be applied not only to digital images obtained by digital cameras and the like, but also to any image which may be represented in digital form, such as that obtained by reading an image printed on a printing medium, including printing paper and photo paper, using a reading device, such as a scanner.
- a plurality of characteristic target object detecting processes each corresponding to each of the predetermined specific characteristics that differ from each other, are performed sequentially until the predetermined target object has been detected.
- the target object detecting method of the present invention may be applied to a process for detecting a face from a facial photograph image.
- a target object detecting apparatus of the present invention is a target object detecting apparatus for detecting from a digital image a predetermined target object included therein, the apparatus comprising:
- a standard target object detecting means for detecting the predetermined target object having a standard characteristic from the digital image
- a characteristic target object detecting means for detecting the predetermined target object having a predetermined specific characteristic
- control means for controlling the characteristic target object detecting means so that the detection process by the characteristic target object detecting means is performed on the digital image from which the predetermined target object has not been detected by the standard target object detecting means.
- a plurality of characteristic target object detecting means is provided, each corresponding to each of the predetermined characteristics that differ from each other, and the control means is a control means for controlling the plurality of characteristic target object detecting means so that the detection processes by the plurality of characteristic target object detecting means are performed sequentially until the predetermined target object has been detected.
- the target object detecting apparatus of the present invention may be used for detecting a face from a facial photograph image.
- a program of the present invention is a program for causing a computer to execute the target object detecting method described above.
- the standard target object detecting process is performed first on a digital image, noting the fact that predetermined target objects having a standard characteristic are greater in numbers than the predetermined target objects having certain specific characteristics. Then, if the predetermined target object has not been detected, the characteristic target object detecting process is performed.
- the target object which is the predetermined target object having a standard characteristic included in a digital image may be detected first.
- the characteristic target object detecting process is performed, thus the target object having a certain specific characteristic compared with the standard target object may be detected.
- if a plurality of characteristic target object detecting processes is provided, each corresponding to one of a plurality of different characteristics the target object may have, and these processes are performed sequentially until the target object has been detected, more reliable detection of a target object having a specific characteristic may be obtained.
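The control flow summarized above (standard detection first, then the characteristic detectors one after another until one succeeds) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Detector` signature, the bounding-box return type, and the stub detectors are all assumptions introduced here.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed interface: a detector returns a box (x, y, w, h) or None on failure.
Box = Tuple[int, int, int, int]
Detector = Callable[[object], Optional[Box]]


def detect_target(image: object,
                  standard: Detector,
                  characteristic: Sequence[Detector]) -> Optional[Box]:
    """Run the standard detector first, then each fallback in order."""
    result = standard(image)
    if result is not None:
        return result
    # Only images on which the standard detector failed reach this loop.
    for detector in characteristic:
        result = detector(image)
        if result is not None:
            return result
    return None


def make_stub(label: str) -> Detector:
    """Toy stand-in: 'detects' only an image equal to its own label."""
    return lambda image: (0, 0, 30, 30) if image == label else None


standard_face = make_stub("standard")
eyeglassed_face = make_stub("eyeglassed")
whiskered_face = make_stub("whiskered")
fallbacks = [eyeglassed_face, whiskered_face]
```

Because most inputs are expected to contain a standard target, the specialised detectors run only on the minority of images where the first pass fails, which is what keeps the total amount of calculation low.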
- FIG. 1 is a block diagram of a face detecting apparatus according to an embodiment of the present invention, illustrating the structure thereof.
- FIG. 2 is a block diagram of a standard face detecting section 10 of the face detecting apparatus shown in FIG. 1 , illustrating the structure thereof.
- FIG. 3A is a drawing, illustrating an edge detecting filter.
- FIG. 3B is a drawing, illustrating an edge detecting filter.
- FIG. 4 is a drawing for describing how to calculate a gradient vector.
- FIG. 5A is a drawing, illustrating a human face.
- FIG. 5B is a drawing, illustrating gradient vectors in the vicinity of the eyes and mouth of the human face shown in FIG. 5A .
- FIG. 6A is a drawing, illustrating a histogram of magnitudes of gradient vectors before normalization.
- FIG. 6B is a drawing, illustrating a histogram of magnitudes of gradient vectors after normalization.
- FIG. 6C is a drawing, illustrating a histogram of quinarized magnitudes of gradient vectors.
- FIG. 6D is a drawing, illustrating a histogram of quinarized magnitudes of gradient vectors after normalization.
- FIG. 7 is a drawing, illustrating examples of sample images which are known to be of faces stored in a first storage section 4 of the standard face detecting section 10 to be used for learning a first reference data E 1 .
- FIG. 8A is a drawing for describing rotation of a human face.
- FIG. 8B is a drawing for describing rotation of a human face.
- FIG. 8C is a drawing for describing rotation of a human face.
- FIG. 9 is a flow chart, illustrating a learning process of reference data.
- FIG. 10 is a drawing for describing an extraction method of a discriminator.
- FIG. 11 is a drawing for describing gradual transformation of a photo image when a face is detected by a first discriminating section of the standard face detecting section 10 shown in FIG. 2 .
- FIG. 12 is a block diagram of an eyeglassed face detecting section 20 of the face detecting apparatus shown in FIG. 1 , illustrating the structure thereof.
- FIG. 13 is a block diagram of a whiskered face detecting section 30 of the face detecting apparatus shown in FIG. 1 , illustrating the structure thereof.
- FIG. 14 is a flowchart, illustrating a process flow of the face detecting apparatus shown in FIG. 1 .
- the structure of the face detecting apparatus shown in FIG. 1 may be realized by executing a face detecting program, which is read into an auxiliary storage unit, on a computer (e.g., a personal computer, and the like).
- the face detecting program is recorded on an information recording medium, such as a CD-ROM and the like, or distributed through a network including the Internet and the like, and installed on the computer.
- the face detecting apparatus of the present embodiment is a face detecting apparatus for detecting a face region from a photograph image to obtain the image of that region (facial image).
- the photograph images to be processed are not limited to those obtained by digital cameras and the like, but any digital image obtained by reading a photograph image printed on a printing medium using a reading device may be included.
- the face detecting apparatus of the present embodiment comprises: a characteristic amount calculating section 1 for calculating characteristic amounts C 0 from a facial photograph image (hereinafter referred to simply as “photo image S 0 ”); a standard face detecting section 10 for detecting a standard face from the photo image S 0 ; an eyeglassed face detecting section 20 for detecting an eyeglassed face from the photo image S 0 when the detection by the standard face detecting section 10 has failed; a whiskered face detecting section 30 for detecting a whiskered face from the photo image S 0 when the detection by the eyeglassed face detecting section 20 has failed; and a control section 50 for controlling the respective detecting sections described above.
- FIG. 2 is a block diagram of the standard face detecting section 10 , illustrating the structure thereof.
- the standard face detecting section 10 comprises a first storage section 4 having a first reference data E 1 stored therein; and a first discriminating section 5 for detecting a face from the photo image S 0 using characteristic amounts C 0 calculated by the characteristic amount calculating section 1 and the first reference data E 1 .
- the characteristic amount calculating section 1 will be described first.
- the characteristic amount calculating section 1 calculates characteristic amounts used for discriminating a face from the photo image S 0 . More specifically, it calculates a gradient vector (i.e., direction to which density of each of the pixels on the photo image S 0 changes and the magnitude thereof) as the characteristic amount C 0 .
- the method for calculating the gradient vectors will be described.
- the characteristic amount calculating section 1 detects edges in the horizontal direction within the photo image S 0 by administering a filtering process with a horizontal edge detecting filter shown in FIG. 3A .
- the characteristic amount calculating section 1 also detects edges in the vertical direction within the photo image S 0 by administering a filtering process with a vertical edge detecting filter shown in FIG. 3B .
- a gradient vector K for each pixel of the photo image S 0 is calculated from the size H of horizontal edges and the size V of the vertical edges, as illustrated in FIG. 4 .
- the characteristic amount calculating section 1 calculates the characteristic amounts C 0 at each deformation stage of the photo image S 0 and face image as will be described later.
- the gradient vectors K calculated in the manner described above are directed toward the centers of the eyes and mouth, which are dark, and are directed away from the nose, which is bright, as illustrated in FIG. 5B .
- the magnitudes of the gradient vectors K are greater for the eyes than for the mouth, because the change in density is greater for the eyes than for the mouth.
- the direction and magnitude of the gradient vector K are defined as the characteristic amount C 0 .
- the direction of the gradient vector K may form an angle between 0° and 359° with a predetermined direction (e.g. x direction in FIG. 4 ).
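A gradient vector of this kind can be sketched as below. The coefficients of the filters in FIG. 3A and FIG. 3B are not given in this text, so simple central differences are assumed in their place; the function name and the row-major list-of-lists image layout are also assumptions. The y axis points downward here, as is usual for image coordinates.

```python
import math


def gradient_vector(image, x, y):
    """Return (magnitude, direction in whole degrees 0..359) at pixel (x, y).

    Assumption: central differences [-1, 0, 1] stand in for the edge
    detecting filters of FIG. 3A / 3B. (x, y) must not lie on the border.
    """
    h = image[y][x + 1] - image[y][x - 1]   # response of the horizontal filter
    v = image[y + 1][x] - image[y - 1][x]   # response of the vertical filter
    magnitude = math.hypot(h, v)
    # Angle measured from the x direction, wrapped into 0..359.
    direction = int(round(math.degrees(math.atan2(v, h)))) % 360
    return magnitude, direction


left_to_right = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
top_to_bottom = [[0, 0, 0], [10, 10, 10], [20, 20, 20]]
right_to_left = [[20, 10, 0], [20, 10, 0], [20, 10, 0]]
```

For example, `gradient_vector(left_to_right, 1, 1)` gives a magnitude of 20 and a direction of 0 degrees, since the density changes only along the x direction.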
- the magnitudes of the gradient vectors K are normalized.
- the normalization is performed in the following manner. First, a histogram is generated that represents the magnitudes of the gradient vectors K of all of the pixels within the photo image S 0 . Then, the magnitudes of the gradient vectors K are corrected by flattening the histogram so that the magnitudes are evenly distributed across the range of values assumable by each pixel of the photo image S 0 (0 through 255 in the case that the image data is 8 bit data). For example, in the case that the magnitudes of the gradient vectors K are small and concentrated at the low value side of the histogram as illustrated in FIG. 6A , the histogram is redistributed so that the magnitudes are distributed across the entire range from 0 through 255, as illustrated in FIG. 6B .
- the distribution range of the gradient vectors K in a histogram is divided, for example, into five, in order to reduce the amount of calculations, as illustrated in FIG. 6C .
- the gradient vectors K are normalized by redistributing the histogram such that the frequency distribution, which has been divided into five, is distributed across the entire range of values from 0 through 255, as illustrated in FIG. 6D .
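The flattening step can be illustrated with ordinary histogram equalization, which is one common way to spread values across the full 0 through 255 range; whether the embodiment uses exactly this mapping is not stated, so treat it as an assumption. The equal-width five-way split in `quinarize` is likewise an assumed reading of FIG. 6C.

```python
def equalize(magnitudes, levels=256):
    """Histogram-equalize integer magnitudes (each in range(levels))."""
    if not magnitudes:
        return []
    hist = [0] * levels
    for m in magnitudes:
        hist[m] += 1
    # Cumulative distribution of the magnitudes.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(magnitudes)
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(magnitudes)      # a single distinct value: nothing to spread
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[m] - cdf_min) * scale) for m in magnitudes]


def quinarize(magnitude, levels=256):
    """Assumed equal-width split of 0..levels-1 into five bins, 0 through 4."""
    return min(magnitude * 5 // levels, 4)


flattened = equalize([0, 1, 2, 3])   # values crowded at the low end
```

In this toy input the four low values are stretched to span the whole range, which mirrors the change from FIG. 6A to FIG. 6B.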
- the first reference data E 1 stored in the first storage section 4 defines discrimination conditions for the combinations of the characteristic amount C 0 for each pixel of each of a plurality of types of pixel groups, which are constituted by a plurality of pixels selected from sample images which will be described later.
- the combinations of the characteristic amount of each of the pixels constituting each pixel group, and the discrimination conditions within the first reference data E 1 are set in advance through the learning of the sample image group comprising a plurality of sample images known to be of faces, and a plurality of sample images known not to be of faces.
- the plurality of sample images, which are known to be of faces (hereinafter referred to as “sample image group”) for learning the first reference data E 1 comprise a plurality of arbitrarily selected human facial photograph images (images having a human face).
- the sample image group might include faces having specific characteristics (e.g., eyeglassed face, uniquely hairstyled face, and the like), but it is not biased to a specific characteristic.
- the following sample images are to be used as the sample images known to be of faces.
- the sample images are of a 30×30 pixel size, the distance between the centers of the eyes of each face image is one of 9, 10, or 11 pixels, and the faces are rotated stepwise in three degree increments within a range of ±15 degrees from the vertical (that is, the rotational angles are −15 degrees, −12 degrees, −9 degrees, −6 degrees, −3 degrees, 0 degrees, 3 degrees, 6 degrees, 9 degrees, 12 degrees, and 15 degrees) as illustrated in FIG. 7 . Accordingly, 33 sample images (3×11) are prepared for each face. Note that FIG.
- the centers of the rotation are the intersections of the diagonals of the sample images.
- the centers of the eyes are in the same positions, which are defined as (x 1 , y 1 ) and (x 2 , y 2 ) in the coordinates with the coordinate origin at the upper left corner of the sample images.
- the positions of the eyes in the vertical direction (i.e., y 1 , y 2 ), in the state that the face is vertically oriented, are the same for all of the sample images.
- Arbitrary images of a 30×30 pixel size are to be used as the sample images known not to be of faces.
- faces which are possibly included in the photo images S 0 , are not only those which have rotational angles of 0 degrees, as that illustrated in FIG. 8A .
- in some cases, faces in the target images are rotated, as illustrated in FIG. 8B and FIG. 8C .
- if learning were performed using only unrotated sample images, rotated faces such as those illustrated in FIG. 8B and FIG. 8C would not be discriminated as faces.
- the present embodiment imparts an allowable range to the first reference data E 1 .
- This is accomplished by employing sample images, which are known to be of faces, in which the distances between the centers of the eyes are 9, 10, and 11 pixels, and which are rotated in a stepwise manner in three degree increments within a range of ±15 degrees.
- the photo image S 0 may be enlarged/reduced in a stepwise manner with magnification rates in 11/9 units when discrimination is conducted by a first discriminating section 5 which will be described later.
- This enables reduction of the time required for calculations, compared to a case in which the photo image S 0 is enlarged/reduced with magnification rates in 1.1 units.
- rotated faces such as those illustrated in FIG. 8B and FIG. 8C , are also enabled to be discriminated.
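The saving from the coarser 11/9 step can be made concrete with a small calculation. The sketch below counts how many reductions are needed to bring an image side down to the 30 pixel sample size; the function name and the 300 pixel example are assumptions chosen only for illustration.

```python
import math


def scale_steps(max_size, window=30, factor=11 / 9):
    """Number of reductions by `factor` needed to shrink max_size to window."""
    if max_size <= window:
        return 0
    return math.ceil(math.log(max_size / window) / math.log(factor))


coarse = scale_steps(300)               # steps of 11/9
fine = scale_steps(300, factor=1.1)     # steps of 1.1
```

Under these assumptions a 300 pixel side needs 12 reductions at a rate of 11/9 but 25 at a rate of 1.1, so the tolerance learned from the 9, 10, and 11 pixel eye distances roughly halves the number of scales that must be searched.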
- the sample images which are the subject of learning, comprise a plurality of sample images known to be of faces, and a plurality of sample images known not to be of faces.
- the distances between the centers of the eyes of each face within the images are one of 9, 10, or 11 pixels, and the faces are rotated stepwise in three degree increments within a range of ±15 degrees from the vertical.
- Each sample image is weighted, that is, is assigned a level of importance.
- the initial values of the weighting of all the sample images are set equally to 1 (step S 1 ).
- each discriminator has a function of providing a reference to discriminate images of faces from those not of faces by employing combinations of the characteristic amount C 0 for each pixel that constitutes a single pixel group.
- histograms of combinations of the characteristic amount C 0 for each pixel that constitutes a single pixel group are used as the discriminators.
- the pixels that constitute the pixel group for generating the discriminator are: a pixel P 1 at the center of the right eye; a pixel P 2 within the right cheek; a pixel P 3 within the forehead; and a pixel P 4 within the left cheek of the sample images which are known to be of faces.
- Combinations of the characteristic amounts C 0 of the pixels P 1 through P 4 are obtained for all of the sample images known to be of faces, and histograms thereof are generated.
- the characteristic amounts C 0 represent the directions and magnitudes of the gradient vectors K.
- the directions of the gradient vectors K are quaternarized, that is, set so that: values of 0 through 44 and 315 through 359 are converted to a value of 0 (right direction); values of 45 through 134 are converted to a value of 1 (upper direction); values of 135 through 224 are converted to a value of 2 (left direction); and values of 225 through 314 are converted to a value of 3 (lower direction).
- the magnitudes of the gradient vectors K are ternarized so that their values assume one of three values, 0 through 2. Then, the values of the combinations are calculated employing the following formulas.
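The quantization described above can be sketched as follows. The direction ranges are taken from the text. The magnitude thresholds and the single-number encoding in `combination_code` are hypothetical: the formulas referred to in the text are not reproduced here, so this encoding should not be read as the one used in the embodiment.

```python
def quantize_direction(angle_deg):
    """Quaternarize 0..359 degrees: 0 right, 1 upper, 2 left, 3 lower."""
    a = angle_deg % 360
    if a < 45 or a >= 315:
        return 0
    if a < 135:
        return 1
    if a < 225:
        return 2
    return 3


def quantize_magnitude(magnitude, low=85, high=170):
    """Ternarize a 0..255 magnitude; the two thresholds are assumptions."""
    if magnitude < low:
        return 0
    if magnitude < high:
        return 1
    return 2


def combination_code(angle_deg, magnitude):
    """Hypothetical encoding of one pixel's (direction, magnitude) as 0..11."""
    return quantize_direction(angle_deg) * 3 + quantize_magnitude(magnitude)


def group_code(pixels):
    """Combine the codes of a pixel group, e.g. P1 through P4, into a tuple."""
    return tuple(combination_code(a, m) for a, m in pixels)
```

Reducing each pixel to a handful of discrete values keeps the number of possible combinations per pixel group small enough to tabulate in a histogram.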
- histograms are generated for the plurality of sample images known not to be of faces.
- pixels (denoted by the same reference numerals P 1 through P 4 ) at positions corresponding to the pixels P 1 through P 4 of the sample images known to be of faces, are employed in the calculation of the characteristic amounts C 0 .
- Logarithms of the ratios of the frequencies in the two histograms are represented by the rightmost histogram illustrated in FIG. 10 , which is employed as the discriminator. According to the discriminator, images that have distributions of the characteristic amounts C 0 corresponding to positive discrimination points therein are highly likely to be of faces.
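A histogram-format discriminator of this kind can be sketched as a table of log frequency ratios. The additive smoothing term is an assumption added here so that a value seen in only one of the two histograms does not produce a division by zero; the function name and the toy values are also illustrative.

```python
import math
from collections import Counter


def build_discriminator(face_values, nonface_values, smoothing=1.0):
    """Map each combination value to log(freq among faces / freq among non-faces)."""
    face_hist = Counter(face_values)
    nonface_hist = Counter(nonface_values)
    keys = set(face_hist) | set(nonface_hist)
    n_face = len(face_values) + smoothing * len(keys)
    n_nonface = len(nonface_values) + smoothing * len(keys)
    table = {}
    for key in keys:
        p_face = (face_hist[key] + smoothing) / n_face
        p_nonface = (nonface_hist[key] + smoothing) / n_nonface
        table[key] = math.log(p_face / p_nonface)
    return table


# Toy combination values for one pixel group (P1..P4) over 10 + 10 samples.
faces = [(1, 2, 3, 4)] * 8 + [(0, 0, 0, 0)] * 2
nonfaces = [(0, 0, 0, 0)] * 8 + [(1, 2, 3, 4)] * 2
points = build_discriminator(faces, nonfaces)
```

Here the combination that is frequent among faces receives positive discrimination points and the one that is frequent among non-faces receives negative points, matching the reading of the rightmost histogram in FIG. 10.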
- a plurality of discriminators in histogram format regarding combinations of the characteristic amount C 0 for each pixel of the plurality of types of pixel groups is generated in step S 2 , which may be used during discrimination.
- a discriminator which is the most effective in discriminating whether an image is of a face, is selected from the plurality of discriminators generated in step S 2 .
- the selection of the most effective discriminator is performed while taking the weighting of each sample image into consideration.
- the percentages of correct discriminations provided by each of the discriminators are compared, and the discriminator having the highest weighted percentage of correct discriminations is selected (step S 3 ).
- the discriminator that correctly discriminates whether sample images are of faces with the highest frequency is selected as the most effective discriminator.
- the weightings of each of the sample images are renewed at step S 5 , to be described later.
- at step S 3 of the second round, there are sample images weighted with 1, those weighted with a value less than 1, and those weighted with a value greater than 1. Accordingly, during evaluation of the percentage of correct discriminations, a sample image, which has a weighting greater than 1, is counted more than a sample image, which has a weighting of 1. For these reasons, from the step S 3 of the second and subsequent rounds, more importance is placed on correctly discriminating heavily weighted sample images than lightly weighted sample images.
- at step S 4 , confirmation is made regarding whether the percentage of correct discriminations of a combination of the discriminators which have been selected exceeds a predetermined threshold value. That is, the percentage of discrimination results regarding whether sample images are of faces, which are obtained by the combination of the selected discriminators, that match the actual sample images is compared against the predetermined threshold value.
- the sample images, which are employed in the evaluation of the percentage of correct discriminations may be those that are weighted with different values, or those that are equally weighted.
- if the percentage of correct discriminations exceeds the predetermined threshold value, whether an image is of a face can be discriminated by the selected discriminators with sufficiently high accuracy; therefore, the learning process is terminated.
- otherwise, the process proceeds to step S 6 , to select an additional discriminator, to be employed in combination with the discriminators which have been selected thus far.
- the discriminator which has been selected at the immediately preceding step S 3 , is excluded from selection in step S 6 , so that it is not selected again.
- at step S 5 , the weighting of sample images which were not correctly discriminated by the discriminator selected at the immediately preceding step S 3 is increased, and the weighting of sample images which were correctly discriminated is decreased.
- the reason for increasing and decreasing the weighting in this manner is to place more importance on images which were not correctly discriminated by the discriminators that have been selected thus far. In this manner, selection of a discriminator which is capable of correctly discriminating whether these sample images are of a face is encouraged, thereby improving the effect of the combination of discriminators.
- the process then returns to step S 3 , and another effective discriminator is selected, using the weighted percentages of correct discriminations as a reference.
- steps S 3 through S 6 are repeated to select discriminators corresponding to combinations of the characteristic amount C 0 for each pixel that constitutes specific pixel groups, which are suited for discriminating whether faces are included in images. If the percentages of correct discriminations, which are evaluated at step S 4 , exceed the threshold value, the type of discriminator and discrimination conditions, which are to be employed in discrimination regarding whether images include faces, are determined (step S 7 ), and the learning of the first reference data E 1 is terminated.
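The learning loop of steps S 1 through S 7 resembles a boosting procedure and can be sketched as below. The text does not state how much the weights are raised or lowered, so the multiplicative `factor` is an assumption, as are the function name, the 0.95 target, and the toy data. A candidate discriminator is modelled as a function returning discrimination points, with positive points meaning "face".

```python
def learn_discriminators(samples, labels, candidates, target=0.95, factor=2.0):
    """Select discriminators until their combination is accurate enough."""
    def correct(score, label):
        return (score > 0) == (label > 0)

    weights = [1.0] * len(samples)          # step S1: equal initial weighting
    remaining = list(candidates)            # output of step S2
    selected = []
    while remaining:
        def weighted_accuracy(d):
            hit = sum(w for s, y, w in zip(samples, labels, weights)
                      if correct(d(s), y))
            return hit / sum(weights)

        best = max(remaining, key=weighted_accuracy)    # step S3
        selected.append(best)
        remaining.remove(best)              # excluded from later selection

        hits = sum(correct(sum(d(s) for d in selected), y)
                   for s, y in zip(samples, labels))
        if hits / len(samples) > target:    # step S4: combination good enough
            break
        # Step S5: emphasise the samples the latest discriminator got wrong.
        weights = [w / factor if correct(best(s), y) else w * factor
                   for s, y, w in zip(samples, labels, weights)]
    return selected                         # step S7


def first(s):
    return s[0]


def second(s):
    return s[1]


def never(s):
    return -1.0


samples = [(2, -1), (-1, 2), (-2, -1), (-2, 1)]
labels = [1, 1, -1, -1]                     # +1 face, -1 not a face
chosen = learn_discriminators(samples, labels, [never, first, second])
```

In this toy run `first` is chosen in the first round, the one sample it misclassifies is then weighted more heavily, and that reweighting is what makes `second` win the next round over `never`.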
- the discriminators are not limited to those in the histogram format.
- the discriminators may be of any format, as long as they provide references to discriminate between images of faces and other images by employing combinations of the characteristic amount C 0 for each pixel that constitutes specific pixel groups.
- Examples of alternative discriminators are: binary data, threshold values, functions, and the like.
- a histogram that represents the distribution of difference values between the two histograms illustrated in the center of FIG. 10 may be employed, in the case that the discriminators are of the histogram format.
- the learning technique is not limited to that which has been described above.
- Other machine learning techniques such as a neural network technique, may be employed.
- the first discriminating section 5 refers to the discrimination conditions of the first reference data E 1 , which has been learned regarding every combination of the characteristic amount C 0 of each pixel that constitutes a plurality of types of pixel groups. Thereby, the discrimination points of the combinations of the characteristic amount C 0 of each pixel that constitutes each of the pixel groups are obtained. Whether a face is included in the photo image S 0 is discriminated by totaling the discrimination points. At this time, the directions and magnitudes of the gradient vectors K, which are the characteristic amounts C 0 , are quaternarized and ternarized respectively. In the present embodiment, discrimination is performed based on the total sum of all of the discrimination points.
- if the total sum of the discrimination points is positive and is greater than or equal to a predetermined threshold value, it is judged that a face is included in the photo image S 0 .
- if the total sum of the discrimination points is less than the predetermined threshold value, it is judged that a face is not included in the photo image S 0 .
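The judgment above amounts to summing the discrimination points over all pixel groups and comparing the total with a threshold. In this sketch the group names, the point tables, and the threshold are toy values assumed for illustration; a combination value missing from a table is taken to contribute zero points.

```python
def is_face(group_values, discriminators, threshold=0.0):
    """Judge a window from the summed discrimination points of its pixel groups.

    group_values:   {group name: combination value observed in the window}
    discriminators: {group name: {combination value: discrimination points}}
    """
    total = sum(table.get(group_values[name], 0.0)
                for name, table in discriminators.items())
    return total > 0 and total >= threshold


toy_discriminators = {
    "eye_and_cheeks": {5: 1.2, 0: -0.8},
    "forehead": {7: 0.5, 0: -1.0},
}
```

With a threshold of 1.0, a window scoring 1.2 + 0.5 is judged to contain a face, while one scoring 1.2 - 1.0 is not, even though its total is still positive.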
- the sizes of the photo images S 0 vary, unlike the sample images, which are 30×30 pixels.
- the first discrimination section 5 enlarges/reduces the photo image S 0 in a stepwise manner ( FIG. 11 illustrates a reduction process), so that the size thereof becomes 30 pixels either in the vertical or horizontal direction.
- the photo image S 0 is rotated in a stepwise manner over 360 degrees.
- a mask M, which is 30×30 pixels large, is set on the photo image S 0 at every stepwise increment of the enlargement/reduction. The mask M is moved one pixel at a time on the photo image S 0 , and whether a face is included in the photo image S 0 is discriminated by discriminating whether the image within the mask is that of a face.
- the magnification rate during enlargement/reduction of the photo image S 0 may be set to be 11/9.
- As the sample images learned during the generation of the first reference data E 1 , sample images in which faces are rotated within the range of ±15 degrees are used. Therefore, the photo image S 0 may be rotated over 360 degrees in 30 degree increments.
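- The stepwise reduction and rotation schedule can be sketched as follows. The 11/9 magnification rate, the 30 pixel mask size, and the 30 degree increment come from the text; the function names and the stopping rule (stop once the shorter side would fall below the mask size) are assumptions made for illustration.

```python
from typing import List, Tuple


def reduction_steps(width: int, height: int,
                    mask: int = 30, rate: float = 11 / 9) -> List[Tuple[int, int]]:
    """List the (width, height) of the image at each stepwise reduction.

    The image shrinks by `rate` per step. Assumed stopping rule: stop once
    the shorter side would drop below the mask size.
    """
    sizes = []
    scale = 1.0
    while min(width, height) * scale >= mask:
        sizes.append((round(width * scale), round(height * scale)))
        scale /= rate
    return sizes


def rotation_angles(increment: int = 30) -> List[int]:
    """List the rotation angles that cover 360 degrees at the given increment."""
    return list(range(0, 360, increment))
```

- Because the learned faces tolerate rotation within ±15 degrees, twelve angles spaced 30 degrees apart leave no orientation uncovered.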
- the characteristic amount calculating section 1 calculates the characteristic amounts C 0 from the photo image S 0 at each step of the stepwise enlargement/reduction and rotational deformation. Note that the characteristic amounts C 0 calculated by the characteristic amount calculating section 1 are related to the steps where they have been obtained and stored in a temporary storage means (not shown).
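- One way to picture the temporary storage is a store keyed by deformation step, so that each characteristic amount is calculated once and then read by every detecting section. The class name `CharacteristicAmountStore`, the key layout, and the `calculate` callback are hypothetical; the text says only that the amounts are related to their steps and stored.

```python
from typing import Callable, Dict, Tuple

StepKey = Tuple[int, int]  # (scale step index, rotation angle in degrees)


class CharacteristicAmountStore:
    """Hypothetical temporary storage for characteristic amounts C0."""

    def __init__(self, calculate: Callable[[int, int], object]) -> None:
        self._calculate = calculate
        self._stored: Dict[StepKey, object] = {}

    def get(self, scale_step: int, angle: int) -> object:
        """Return C0 for the given step, calculating it only on first request."""
        key = (scale_step, angle)
        if key not in self._stored:
            self._stored[key] = self._calculate(scale_step, angle)
        return self._stored[key]
```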
- Discrimination regarding whether a face is included in the photo image S 0 is performed at every step in the stepwise enlargement/reduction and rotational deformation thereof.
- If a face is discriminated at any of the steps, the photo image S 0 is discriminated to include the face.
- a region of 30×30 pixels large, corresponding to the position of the mask M at the time of discrimination, is extracted from the photo image S 0 at the step in the stepwise size and rotational deformation at which the face was discriminated.
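- The mask scan at a single deformation step can be sketched as follows. The image is represented as a list of pixel rows, and `is_face` is a hypothetical stand-in for the discrimination performed with the reference data; neither detail is specified by the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]


def scan_with_mask(image: Image,
                   is_face: Callable[[List[Sequence[int]]], bool],
                   mask: int = 30) -> Optional[Tuple[int, int]]:
    """Move a mask-sized window one pixel at a time over the image.

    Returns the (top, left) position of the first window judged to be a face,
    or None if no window is judged to be a face (or the image is too small).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(height - mask + 1):
        for left in range(width - mask + 1):
            # Cut out the region currently covered by the mask.
            window = [row[left:left + mask] for row in image[top:top + mask]]
            if is_face(window):
                return top, left
    return None
```

- The returned position identifies the 30×30 region to extract from the image at that step.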
- FIG. 12 is a block diagram of the eyeglassed face detecting section 20 of the face detecting apparatus according to the embodiment shown in FIG. 1 , illustrating the structure thereof.
- the eyeglassed face detecting section 20 has a second storage section 14 and a second discriminating section 15 .
- the second storage section 14 has a second reference data E 2 stored therein.
- the second discriminating section 15 detects a face from the photo image S 0 using the second reference data E 2 and characteristic amounts C 0 calculated by the characteristic amount calculating section 1 and stored in a temporary storage means (not shown).
- the second reference data E 2 stored in the second storage section 14 defines discrimination conditions for the combinations of the characteristic amount C 0 for each pixel of each of a plurality of types of pixel groups, which are constituted by a plurality of pixels selected from sample images which will be described later.
- the combinations of the characteristic amount of each of the pixels constituting each pixel group, and the discrimination conditions within the second reference data E 2 are set in advance through the learning of the sample image group comprising a plurality of sample images known to be of faces, and a plurality of sample images known not to be of faces.
- the plurality of sample images known to be of faces for learning the second reference data E 2 comprise eyeglassed human facial photograph images only, unlike those for learning the first reference data E 1 used in the standard face detecting section 10 .
- the operation of the eyeglassed face detecting section 20 is similar to that of the standard face detecting section 10 , except that the second reference data E 2 differs from the first reference data E 1 . Therefore, it will not be elaborated upon further here.
- FIG. 13 is a block diagram of the whiskered face detecting section 30 of the face detecting apparatus according to the embodiment shown in FIG. 1 , illustrating the structure thereof.
- the whiskered face detecting section 30 has a third storage section 24 and a third discriminating section 25 .
- the third storage section 24 has a third reference data E 3 stored therein.
- the third discriminating section 25 detects a face from the photo image S 0 using the third reference data E 3 and characteristic amounts C 0 calculated by the characteristic amount calculating section 1 and stored in a temporary storage means (not shown).
- the third reference data E 3 stored in the third storage section 24 defines discrimination conditions for the combinations of the characteristic amount C 0 for each pixel of each of a plurality of types of pixel groups, which are constituted by a plurality of pixels selected from sample images which will be described later.
- the combinations of the characteristic amount of each of the pixels constituting each pixel group, and the discrimination conditions within the third reference data E 3 are set in advance through the learning of the sample image group comprising a plurality of sample images known to be of faces, and a plurality of sample images known not to be of faces.
- the plurality of sample images known to be of faces for learning the third reference data E 3 comprise whiskered human facial photograph images only, unlike those for learning the first reference data E 1 used in the standard face detecting section 10 , or those for learning the second reference data E 2 used in the eyeglassed face detecting section 20 .
- the operation of the whiskered face detecting section 30 is similar to that of the standard face detecting section 10 , except that the third reference data E 3 differs from the first reference data E 1 . Therefore, it will not be elaborated upon further here.
- FIG. 14 is a flowchart, illustrating a process flow of the face detecting apparatus shown in FIG. 1 .
- the characteristic amount calculating section 1 first calculates the directions and magnitudes of the gradient vectors K from the photo image S 0 as the characteristic amounts C 0 at each step of the stepwise enlargement/reduction and rotational deformation, and stores them in a temporary storage means (not shown) (step S 10 ).
- the first discriminating section 5 of the standard face detecting section 10 reads out the first reference data E 1 , which have been obtained by learning the sample images comprising a multitude of arbitrary human facial photograph images and sample images known not to be of faces, from the first storage section 4 (step S 12 ). It then discriminates whether a face is included in the photo image S 0 based on the first reference data E 1 and the characteristic amounts C 0 obtained by the characteristic amount calculating section 1 (step S 14 ).
- If the first discriminating section 5 of the standard face detecting section 10 discriminates that a face is included in the photo image S 0 (step S 16 is positive), it extracts the face from the photo image S 0 (step S 30 ), and the process of the face detecting apparatus of the present embodiment for the photo image S 0 is terminated.
- If a face is not discriminated (step S 16 is negative), the control section 50 instructs the eyeglassed face detecting section 20 to initiate the detection of a face from the photo image S 0 .
- the second discriminating section 15 of the eyeglassed face detecting section 20 reads out the second reference data E 2 , which have been obtained by learning sample images consisting only of eyeglassed human facial photograph images and sample images known not to be of faces, from the second storage section 14 (step S 20 ). It then discriminates whether a face is included in the photo image S 0 based on the second reference data E 2 and the characteristic amounts C 0 obtained by the characteristic amount calculating section 1 (step S 22 ).
- If the second discriminating section 15 of the eyeglassed face detecting section 20 discriminates that a face is included in the photo image S 0 (step S 24 is positive), it extracts the face from the photo image S 0 (step S 30 ), and the process of the face detecting apparatus of the present embodiment for the photo image S 0 is terminated.
- If a face is not discriminated (step S 24 is negative), the control section 50 further instructs the whiskered face detecting section 30 to initiate the detection of a face from the photo image S 0 .
- the third discriminating section 25 of the whiskered face detecting section 30 reads out the third reference data E 3 , which have been obtained by learning sample images consisting only of whiskered human facial photograph images and sample images known not to be of faces, from the third storage section 24 (step S 26 ). It then discriminates whether a face is included in the photo image S 0 based on the third reference data E 3 and the characteristic amounts C 0 obtained by the characteristic amount calculating section 1 (step S 28 ).
- If the third discriminating section 25 of the whiskered face detecting section 30 discriminates that a face is included in the photo image S 0 (step S 28 is positive), it extracts the face from the photo image S 0 (step S 30 ), and the process of the face detecting apparatus of the present embodiment for the photo image S 0 is terminated. If the third discriminating section 25 of the whiskered face detecting section 30 discriminates that a face is not included in the photo image S 0 (step S 28 is negative), the control section 50 terminates the process for the photo image S 0 .
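- The overall control flow of FIG. 14 can be sketched as a sequence of detectors that are tried in order, stopping at the first one that discriminates a face. The function name `detect_in_order` and the representation of each detecting section as a (name, callable) pair are assumptions made for illustration; each callable stands in for discrimination against one set of reference data.

```python
from typing import Callable, Optional, Sequence, Tuple

Detector = Tuple[str, Callable[[object], bool]]


def detect_in_order(characteristic_amounts: object,
                    detectors: Sequence[Detector]) -> Optional[str]:
    """Run the detectors in order on the shared characteristic amounts.

    Returns the name of the first detector that discriminates a face, or
    None if every detector answers negatively. Later detectors are not run
    once an earlier one succeeds.
    """
    for name, discriminate in detectors:
        if discriminate(characteristic_amounts):
            return name  # the face would be extracted here (step S30)
    return None
```

- With the order standard, eyeglassed, whiskered, the characteristic face detectors run only for images in which the standard detector finds no face, which is the source of the reduced calculation described in the text.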
- the standard face detecting section 10 , eyeglassed face detecting section 20 , and whiskered face detecting section 30 of the face detecting apparatus according to the present embodiment are provided for detecting standard faces, eyeglassed faces, and whiskered faces respectively.
- high detection accuracy may be obtained for the respective target objects.
- separation of the detection process for the respective types of faces into a plurality of processes allows a reduced amount of calculation and rapid processing.
- the eyeglassed face detection process and whiskered face detection process are provided as the detection processes for detecting characteristic faces.
- the detection processes of the present invention are not limited to these, and the number of detection processes and types for detecting characteristic faces may be increased or decreased.
- the predetermined target object to be detected from a digital image is a face.
- the target object detecting method, apparatus, and program may be applied for detecting any target object.
- the methods used in the standard target object detecting process (standard face detecting process in the present embodiment) and characteristic target object detecting processes (eyeglassed face detecting process and whiskered face detecting process in the present embodiment) are not limited to those used in the present invention; various known target object detecting methods may be employed instead.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004051322A JP2005242640A (en) | 2004-02-26 | 2004-02-26 | Object detection method and device, and program |
JP051322/2004 | 2004-02-26 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050190963A1 (en) | 2005-09-01 |
US7542591B2 (en) | 2009-06-02 |
Family
ID=34879616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/067,223 Active 2026-07-26 US7542591B2 (en) | 2004-02-26 | 2005-02-28 | Target object detecting method, apparatus, and program |
Country Status (2)
Country | Link |
---|---|
US (1) | US7542591B2 (en) |
JP (1) | JP2005242640A (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7920725B2 (en) * | 2003-09-09 | 2011-04-05 | Fujifilm Corporation | Apparatus, method, and program for discriminating subjects |
US20060222264A1 (en) * | 2005-03-31 | 2006-10-05 | Siemens Ag | Method for vertically orienting a face shown in a picture |
JP4605006B2 (en) | 2005-12-26 | 2011-01-05 | セイコーエプソン株式会社 | Print data generation apparatus, print data generation method, and program |
US7657086B2 (en) * | 2006-01-31 | 2010-02-02 | Fujifilm Corporation | Method and apparatus for automatic eyeglasses detection using a nose ridge mask |
US7653221B2 (en) * | 2006-01-31 | 2010-01-26 | Fujifilm Corporation | Method and apparatus for automatic eyeglasses detection and removal |
US7684594B2 (en) * | 2006-02-08 | 2010-03-23 | Fujifilm Corporation | Method and apparatus for estimating object part location in digital image data using feature value analysis |
US7957555B2 (en) * | 2006-02-08 | 2011-06-07 | Fujifilm Corporation | Method and apparatus for localizing an object part in digital image data by updating an initial position estimate based on a displacement of the object part |
US20080107341A1 (en) * | 2006-11-02 | 2008-05-08 | Juwei Lu | Method And Apparatus For Detecting Faces In Digital Images |
JP5649601B2 (en) * | 2012-03-14 | 2015-01-07 | 株式会社東芝 | Verification device, method and program |
CN105184253B (en) * | 2015-09-01 | 2020-04-24 | 北京旷视科技有限公司 | Face recognition method and face recognition system |
US20170323149A1 (en) * | 2016-05-05 | 2017-11-09 | International Business Machines Corporation | Rotation invariant object detection |
US10706327B2 (en) * | 2016-08-03 | 2020-07-07 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and storage medium |
CN107844742B (en) * | 2017-09-26 | 2019-01-04 | 平安科技(深圳)有限公司 | Facial image glasses minimizing technology, device and storage medium |
EP3698269A4 (en) | 2017-11-22 | 2020-12-09 | Zhejiang Dahua Technology Co., Ltd. | An image processing method and system |
CN107944385B (en) * | 2017-11-22 | 2019-09-17 | 浙江大华技术股份有限公司 | It is a kind of for determining the method and device of glasses frame region |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5715325A (en) * | 1995-08-30 | 1998-02-03 | Siemens Corporate Research, Inc. | Apparatus and method for detecting a face in a video image |
JPH11341272A (en) | 1998-05-22 | 1999-12-10 | Noritsu Koki Co Ltd | Device and method for picture processing |
US6184926B1 (en) * | 1996-11-26 | 2001-02-06 | Ncr Corporation | System and method for detecting a human face in uncontrolled environments |
US20020085771A1 (en) | 2000-11-14 | 2002-07-04 | Yukari Sakuramoto | Image processing apparatus, image processing method and recording medium |
US20040240708A1 (en) * | 2003-05-30 | 2004-12-02 | Microsoft Corporation | Head pose assessment methods and systems |
US6885760B2 (en) * | 2000-02-01 | 2005-04-26 | Matsushita Electric Industrial, Co., Ltd. | Method for detecting a human face and an apparatus of the same |
Non-Patent Citations (3)
Title |
---|
Phantom Faces for Face Analysis, Laurenz Wiskott, 1997, IEEE Publications, pp. 308-311. * |
T. Kohonen, "Self-Organization and Associative Memory", Springer-Verlag, 1984, pp. 125-161. |
Taiwei Lu et al., "Self-organizing optical neural network for unsupervised learning", Optical Engineering, Sep. 1990, vol. 29, No. 9, pp. 1107-1113. |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070292019A1 (en) * | 2005-06-16 | 2007-12-20 | Fuji Photo Film Co., Ltd. | Learning method for detectors, face detection method, face detection apparatus, and face detection program |
US7689034B2 (en) * | 2005-06-16 | 2010-03-30 | Fujifilm Corporation | Learning method for detectors, face detection method, face detection apparatus, and face detection program |
US20130128071A1 (en) * | 2007-12-21 | 2013-05-23 | CSR Technology, Inc. | Detecting objects in an image being acquired by a digital camera or other electronic image acquisition device |
CN102034079B (en) * | 2009-09-24 | 2012-11-28 | 汉王科技股份有限公司 | Method and system for identifying faces shaded by eyeglasses |
US8856541B1 (en) * | 2013-01-10 | 2014-10-07 | Google Inc. | Liveness detection |
US20140198997A1 (en) * | 2013-01-16 | 2014-07-17 | Sony Corporation | Information processing apparatus, method, and program |
CN107862270A (en) * | 2017-10-31 | 2018-03-30 | 深圳云天励飞技术有限公司 | Face classification device training method, method for detecting human face and device, electronic equipment |
CN107862270B (en) * | 2017-10-31 | 2020-07-21 | 深圳云天励飞技术有限公司 | Face classifier training method, face detection method and device and electronic equipment |
US11275819B2 (en) | 2018-12-05 | 2022-03-15 | Bank Of America Corporation | Generative adversarial network training and feature extraction for biometric authentication |
Also Published As
Publication number | Publication date |
---|---|
JP2005242640A (en) | 2005-09-08 |
US20050190963A1 (en) | 2005-09-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: FUJI PHOTO FILM CO., LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LI, YUANZHONG;REEL/FRAME:016337/0379 Effective date: 20050113 |
|
AS | Assignment |
Owner name: FUJIFILM CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FUJIFILM HOLDINGS CORPORATION (FORMERLY FUJI PHOTO FILM CO., LTD.);REEL/FRAME:018904/0001 Effective date: 20070130 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |