WO2012132418A1 - Characteristic estimation device - Google Patents

Characteristic estimation device

Info

Publication number
WO2012132418A1
WO2012132418A1 (PCT/JP2012/002128)
Authority
WO
WIPO (PCT)
Prior art keywords
estimation
unit
attribute
attribute estimation
learning
Prior art date
Application number
PCT/JP2012/002128
Other languages
French (fr)
Japanese (ja)
Inventor
純 西村
宏明 由雄
Original Assignee
Panasonic Corporation (パナソニック株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corporation (パナソニック株式会社)
Publication of WO2012132418A1 publication Critical patent/WO2012132418A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/50: Maintenance of biometric data or enrolment thereof

Definitions

  • The present invention relates to an attribute estimation apparatus that estimates age, sex, and the like from a face image.
  • As a system for identifying attributes of a subject included in an image, there is, for example, the one described in Patent Document 1.
  • The system described in Patent Document 1 identifies a person's attributes (age, gender, etc.) based on a face image: a computer constituting an offline training system generates an attribute identification dictionary, and a computer forming an online operation system uses that dictionary to determine the attributes of an unknown person from his or her face image.
  • The computer constituting the offline training system generates the attribute identification dictionary, which identifies a person's attributes from a face image, using learning sample data in which a plurality of sample images, each containing the face of a person whose attributes are known, are associated with those persons' attributes.
  • the present invention has been made in view of such circumstances, and an object thereof is to provide an attribute estimation device that can be relearned with a small number of on-site samples.
  • The attribute estimation apparatus of the present invention comprises: an image input unit that inputs a face image; an estimation model holding unit that holds an estimation model for performing attribute estimation; an attribute estimation unit that estimates attributes of the face image input by the image input unit, using the estimation model held in the estimation model holding unit; an estimation result accumulation unit that accumulates the face image in association with the attribute estimation result produced by the attribute estimation unit; a sample extraction unit that extracts sample data for each group of attribute estimation results produced by the attribute estimation unit; and a relearning unit that updates the estimation model using data obtained by adding correct answers to the sample data extracted by the sample extraction unit.
  • With this configuration, since sample data is obtained for each group of attribute estimation results, on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
  • A feature amount distribution calculation unit computes the feature amount distribution of the face images for each group of attribute estimation results produced by the attribute estimation unit, and the sample extraction unit extracts sample data based on the feature amount distribution determined for each group by the feature amount distribution calculation unit.
  • on-site sample data can be extracted evenly.
  • the feature amount distribution calculation unit clusters face image feature amounts for each group of the attribute estimation results, and the sample extraction unit extracts sample data based on a position in the cluster.
  • on-site sample data can be extracted evenly.
  • the sample extraction unit extracts data having a certain distance from the center of the cluster as sample data.
  • on-site sample data can be extracted evenly.
  • the sample data to which the correct answer is given at the time of re-learning in the re-learning unit is weighted.
  • the on-site sample data can be balanced with the initial learning sample data, and even a small amount of on-site sample data can be effectively relearned.
  • the sample data to which the correct answer is given at the time of re-learning by the re-learning unit is weighted according to the position in the cluster.
  • the on-site sample data can be balanced with the initial learning sample data, and even a small amount of on-site sample data can be effectively relearned.
  • A relearning start unit is provided that starts relearning by the relearning unit when a set condition is satisfied.
  • the estimated model at the time of shipment can be accurately applied to the site.
  • the attribute is age.
  • The re-learning sample extraction device of the present invention comprises a sample extraction unit that, for data in which face images are associated with their attribute estimation results, extracts sample data for relearning an estimation model for each group of attribute estimation results.
  • The attribute estimation method of the present invention comprises: an image input step of inputting a face image; an estimation model holding step of holding an estimation model for performing attribute estimation; an attribute estimation step of estimating attributes of the face image input in the image input step, using the estimation model held in the estimation model holding step; an estimation result accumulation step of accumulating the face image in association with the attribute estimation result produced by the attribute estimation step; a sample extraction step of extracting sample data for each group of attribute estimation results produced by the attribute estimation step; and a relearning step of updating the estimation model using data obtained by adding correct answers to the sample data extracted in the sample extraction step.
  • According to the above method, sample data is obtained for each group of attribute estimation results, so on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
  • According to the present invention, sample data is obtained for each group of attribute estimation results, so on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
  • FIG. 1 is a block diagram showing the schematic configuration of an attribute estimation apparatus according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the methods and effects of the fixed-cluster-number and estimated-cluster-number clustering approaches.
  • FIG. 3 is a diagram schematically showing the basic operation of the attribute estimation apparatus of FIG. 1.
  • FIG. 4 is a diagram showing an example of the correct-answer assignment screen in the professional mode of the attribute estimation apparatus of FIG. 1.
  • FIG. 5 is a flowchart for explaining the operation of the attribute estimation apparatus of FIG. 1.
  • FIG. 6 is a flowchart for explaining the operation of the learning sample extraction unit of the attribute estimation apparatus of FIG. 1.
  • FIG. 1 is a block diagram showing a schematic configuration of an attribute estimation apparatus according to an embodiment of the present invention.
  • The attribute estimation apparatus 1 includes an image input unit 12 (comprising a video input unit 10 and a face detection unit 11), a feature amount extraction unit 13, an estimation model holding unit 14, a face attribute estimation unit 15, an on-site sample estimation result DB (estimation result accumulation unit) 16, a relearning start unit 17, a feature amount distribution calculation unit 18, a learning sample extraction unit (sample extraction unit) 19, a relearning data DB 20, and a relearning unit 21.
  • An on-site sample DB 22 is built from the data produced by the image input unit 12 (the video input unit 10 and face detection unit 11) and the feature amount extraction unit 13.
  • the video input unit 10 inputs video from the camera 2.
  • the face detection unit 11 extracts a face image from the video input by the video input unit 10.
  • the feature amount extraction unit 13 extracts the feature amount of the face image extracted by the face detection unit 11.
  • the feature quantity extraction unit 13 detects and normalizes facial parts such as eyes and nose from the facial image, and extracts facial features from the normalized image using Gabor features, LBP features, Haar features, and the like.
  • the feature quantity extracted by the feature quantity extraction unit 13 is multidimensional data.
  • the estimation model holding unit 14 holds an estimation model for performing attribute estimation.
  • the attribute is age.
  • The estimation model can be expressed by the following two functions:
  • a mapping X = G(Y) that converts the face feature amount Y into a feature amount suitable for estimating face attributes
  • a function F(X) that estimates age, sex, and the like from the face attribute feature amount X
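  • The two-function model above can be sketched as follows. This is a hypothetical illustration only: the patent does not specify the concrete forms of G and F, so a linear projection and a linear regressor are assumed here.

```python
# Hypothetical sketch of the two-stage estimation model: G maps a raw face
# feature vector Y to an attribute-oriented feature vector X, and F
# estimates an attribute value (here treated as age) from X. The linear
# forms below are assumptions for illustration, not the patent's model.

def G(Y, W):
    """Map face feature Y to attribute feature X via projection matrix W."""
    return [sum(w * y for w, y in zip(row, Y)) for row in W]

def F(X, coeffs, bias):
    """Estimate an attribute value (e.g. age) from attribute feature X."""
    return bias + sum(c * x for c, x in zip(coeffs, X))

def estimate_attribute(Y, W, coeffs, bias):
    """Full pipeline: attribute_estimate = F(G(Y))."""
    return F(G(Y, W), coeffs, bias)
```

Relearning, described later, corresponds to replacing G and F with newly learned G′ and F′ while keeping this structure.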
  • the face attribute estimation unit 15 estimates the attribute of the face image input by the image input unit 12 using the estimation model held in the estimation model holding unit 14 and displays the result on the display terminal 3.
  • the on-site sample estimation result DB 16 accumulates the face image input by the image input unit 12 and the attribute estimation result by the face attribute estimation unit 15 in association with each other. That is, the face image collected on site and the face attribute estimation result estimated by the model are stored as a set.
  • the relearning start unit 17 starts relearning when the set condition is satisfied.
  • The feature amount distribution calculation unit 18 obtains the feature amount distribution of the face images for each group of attribute estimation results accumulated in the on-site sample estimation result DB 16. To do so, it clusters the face image feature amounts for each group; in this embodiment, clustering is performed for each estimated age. As shown in FIG. 2, the clustering methods include a fixed-cluster-number type and an estimated-cluster-number type.
  • The learning sample extraction unit 19 extracts sample data based on the feature amount distribution of the face images for each group obtained by the feature amount distribution calculation unit 18. Specifically, sample data is extracted based on position within the cluster; for example, items lying at a certain distance from the cluster center are extracted as sample data. Since entering correct answers for all on-site samples would be too costly, the learning sample extraction unit 19 extracts only a small number of samples.
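  • The per-group extraction step can be sketched as follows. For brevity, a single cluster per group (its mean) stands in for the full clustering step, and the function and variable names are illustrative; the selection shown corresponds to picking the sample nearest the cluster center.

```python
from collections import defaultdict
import math

# Sketch of per-group sample extraction: samples are grouped by their
# estimated attribute, each group's feature vectors are reduced to a
# cluster centre (here simply the group mean, standing in for a full
# clustering step), and the sample nearest the centre is extracted.

def extract_samples(estimates):
    """estimates: list of (estimated_group, feature_vector) pairs.
    Returns one representative feature vector per group."""
    groups = defaultdict(list)
    for group, feat in estimates:
        groups[group].append(feat)
    picked = {}
    for group, feats in groups.items():
        dim = len(feats[0])
        centre = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        # Take the single sample closest to the centre of the cluster.
        picked[group] = min(feats, key=lambda f: math.dist(f, centre))
    return picked
```

Because one representative is taken per group, every estimated age group contributes a sample, which is what makes the extraction "even".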
  • the re-learning data DB 20 accumulates data in which the correct answer is added to the sample data extracted by the learning sample extraction unit 19.
  • The correct answer is given by the user. That is, the user uses the display terminal 3 to input the correct face attributes for the small number of on-site samples extracted by the learning sample extraction unit 19.
  • the relearning data DB 20 gives the weight corresponding to the position in the cluster to the sample data to which the correct answer is given.
  • the relearning unit 21 updates the estimation model using data accumulated in the relearning data DB 20 (data obtained by adding a correct answer to the sample data extracted by the learning sample extraction unit 19).
  • FIG. 3 is a diagram schematically illustrating the basic operation of the attribute estimation apparatus 1 according to the present embodiment.
  • the face attribute estimation unit 15 performs attribute estimation of the face image obtained from the on-site sample DB 22 using the estimation model held in the estimation model holding unit 14.
  • the on-site sample is a group of face images detected from an image acquired by a camera (not shown) installed on the site.
  • In FIG. 3, the horizontal axis represents the estimated age group (teens, 20s, 30s, 40s, 50s, 60s), and the vertical axis represents the samples.
  • the learning sample extraction unit 19 collects samples for each age of the estimation results, performs clustering for each age, and extracts sample data closest to the cluster center 24 for each age.
  • the cluster center 24 is an average position of data belonging to each cluster.
  • the sample data extracted by the learning sample extraction unit 19 is on the order of several tens. In the example shown in FIG. 3, six pieces of sample data are extracted, but the number to be extracted may be determined in advance or may be determined according to the distribution situation.
  • the learning sample extraction unit 19 has two modes, a general mode and a professional mode.
  • In the general mode, only the single sample closest to the cluster center 24 is extracted.
  • sample data in the vicinity of the cluster center can be extracted.
  • In the professional mode, a user knowledgeable about face recognition is presented with a larger set of samples rather than a small one, and selects those that appear effective for relearning.
  • Here, "a larger set" means extracting not just the single sample nearest the cluster center but a plurality of samples (several tens).
  • FIG. 4 shows an example of the correct-answer assignment screen in the professional mode.
  • each sample data extracted by the learning sample extraction unit 19 is given a correct answer by the correct answer giving unit 3a of the display terminal 3.
  • The correct answer giving unit 3a is operated by the user. For example, if the sample data 25a of the teens group 50 shown in FIG. 3 actually shows a person in their 30s, the user designates "30s" (60a) among the teens-to-60s choices displayed for the corresponding sample data 25a in the correct-answer-assignment on-site sample data 60 on the display terminal 3. By this designation, the correct answer is given to the sample data 25a.
  • Likewise, "50s" (60b) is designated among the teens-to-60s choices displayed for the corresponding sample data 25b in the correct-answer-assignment on-site sample data 60.
  • Thereby, the correct answer is given to the sample data 25b.
  • In this way, correct answers are assigned to all the sample data extracted by the learning sample extraction unit 19.
  • The sample data 25 given correct answers is stored in the relearning data DB 20.
  • The relearning data DB 20 weights the accumulated correct-answer-assigned sample data 25, and then combines the weighted sample data 25 with the initial learning sample data accumulated in the initial learning sample DB 30.
  • The sample data 25 given correct answers is weighted because its quantity is overwhelmingly smaller than that of the initial learning sample data stored in the initial learning sample DB 30, so simply adding it to the initial learning sample data as-is would not be effective.
  • For example, the default weight is 0.5, and if the difference between the estimated attribute and the correct attribute entered by the user is large, a value close to 1 is set (note that too large a weight reduces generalization ability).
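  • The weighting rule above (a base weight of 0.5, raised toward 1 when the user's correct answer differs strongly from the estimate) can be sketched as follows. The exact scaling is not specified, so a capped linear ramp is assumed here purely for illustration.

```python
# Hedged sketch of the sample-weighting rule: corrected on-site samples get
# a base weight of 0.5, raised toward 1.0 as the gap between the model's
# estimate and the user-supplied correct age grows. The 0.05 slope and the
# 0.95 cap are assumptions; the cap reflects the warning that too large a
# weight hurts generalization.

def sample_weight(estimated_age, correct_age, base=0.5, max_weight=0.95):
    """Weight for a corrected on-site sample during re-learning."""
    diff = abs(estimated_age - correct_age)
    # Ramp from `base` toward `max_weight` as the estimation error grows.
    return min(max_weight, base + 0.05 * diff)
```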
  • Since the sample data 25 given correct answers is accumulated in the relearning data DB 20, the relearning unit 21 relearns using, as its learning samples, the combination of that data and the initial learning sample data accumulated in the initial learning sample DB 30.
  • The relearning unit 21 performs the following learning: it learns a new mapping from the face feature amount Y to the face attribute feature amount X, and it learns a new function to estimate age, sex, and the like from the face attribute feature amount X.
  • The estimation model holding unit 14 updates the estimation model with the relearned model 40 generated by the relearning unit 21.
  • The relearned model includes a new function F′(X) that estimates age, sex, and the like from the face attribute feature amount X.
  • FIG. 5 is a flowchart for explaining the operation of the attribute estimation apparatus 1 according to the present embodiment.
  • an initial model is first generated (step S1). This initial model is generated in the laboratory.
  • Next, model evaluation, that is, attribute estimation, is performed (step S2).
  • After attribute estimation, it is determined whether the number of relearning rounds has reached N (a predetermined integer, for example 3 or 4) or whether the difference from the previous model is below a predetermined threshold (step S3). If either condition holds ("Yes"), relearning is considered finished and the process ends.
  • The following can serve as the timing for ending relearning: after a predetermined number of rounds; when the difference from the previous model is sufficiently small; when the estimation results of the previous model and of the relearned model barely differ; or when model accuracy, evaluated against the correct answers entered by the user, has saturated (or has improved beyond a certain level).
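  • The termination conditions above can be sketched as a simple check. The threshold values and parameter names are assumptions; the patent describes the conditions only qualitatively.

```python
# Illustrative stop check for the re-learning loop: stop after a fixed
# number of rounds (e.g. N = 3 or 4), when the model barely changed since
# the previous round, or when the estimation results barely changed.

def should_stop(iteration, max_iterations, model_diff, diff_threshold,
                changed_ratio, change_threshold=0.01):
    """Return True when re-learning should end."""
    if iteration >= max_iterations:       # predetermined number of rounds
        return True
    if model_diff < diff_threshold:       # difference from previous model is small
        return True
    if changed_ratio < change_threshold:  # estimation results barely changed
        return True
    return False
```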
  • FIG. 6 is a flowchart for explaining the operation of the learning sample extraction unit 19.
  • First, the learning sample extraction unit 19 divides the samples into attribute groups (step S40).
  • the sample is mapped to the feature space for each attribute group (step S41), and clustering is performed for each attribute group (step S42).
  • a sample near the center of each cluster is extracted (step S43).
  • After the correct answers are given by user operation (step S5), the relearning unit 21 generates relearning data (step S6), and relearning is performed based on the generated relearning data (step S7).
  • steps S2 to S7 are repeatedly performed at the timing of starting the relearning.
  • The following can serve as the timing for starting relearning (i.e., when to relearn):
  • (1) When the distribution of the samples used to create the shipping-time model and the distribution of the on-site samples differ by more than a certain amount at each estimated age.
  • (2) When the per-age averages at the time the shipping model was created deviate by more than a certain amount from the per-age averages of the on-site samples' estimates.
  • (3) When (1) and (2) occur simultaneously (the most common case).
  • (4) When a temporary model, generated by trusting the estimated ages of the on-site samples, diverges strongly from the shipping model in its estimation results; for example, by comparing the difference between values estimated with the temporary model and with the shipping model, or by comparing the per-age ratios (what percentage of samples are in their teens, what percentage in their 20s, and so on).
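  • The distribution-comparison trigger above can be sketched as follows. The L1 distance between the two per-age-group ratio histograms and the 0.2 threshold are assumed concrete choices; the text only states that relearning starts when the distributions differ by more than a certain level.

```python
# Sketch of a re-learning trigger: compare the per-age-group ratios of the
# shipping-time samples with those of the on-site samples, and start
# re-learning when the ratio histograms diverge by more than a threshold.

def should_relearn(ship_counts, site_counts, threshold=0.2):
    """counts: dict mapping age group (e.g. '20s') to a sample count."""
    groups = set(ship_counts) | set(site_counts)
    ship_total = sum(ship_counts.values()) or 1
    site_total = sum(site_counts.values()) or 1
    divergence = sum(
        abs(ship_counts.get(g, 0) / ship_total
            - site_counts.get(g, 0) / site_total)
        for g in groups)
    return divergence > threshold
```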
  • As described above, the estimation model holding unit 14 holds an estimation model for performing attribute estimation, the face attribute estimation unit 15 estimates attributes using that model, and the learning sample extraction unit 19 extracts sample data for each group of attribute estimation results produced by the face attribute estimation unit 15, so on-site sample data can be extracted evenly and the model can be relearned with a small amount of on-site sample data.
  • In other words, a device that relearns an estimation model, using relearning sample data obtained by computing the feature amount distribution and assigning correct answers to samples extracted based on that distribution, is applied to the output (corresponding to the on-site sample estimation result DB) of an apparatus that estimates face attributes from camera images. This can be used, for example, in a service in which a shop collects on-site samples and sends them to a center, where the samples are extracted, given correct answers, and used for relearning.
  • Moreover, the feature amount distribution calculation unit 18 clusters the face image feature amounts for each group of attribute estimation results, and the learning sample extraction unit 19 extracts sample data based on position within the cluster, so on-site sample data can be extracted evenly.
  • Furthermore, the sample data given correct answers is weighted during relearning by the relearning unit 21, so the on-site sample data can be balanced against the initial learning sample data and relearning is effective even with a small amount of on-site sample data.
  • In addition, the relearning start unit 17 starts relearning when a set condition is satisfied, so the shipping-time estimation model can be adapted to the site with high accuracy.
  • Each process shown in FIGS. 5 and 6 of the present embodiment can be described as a program and stored and distributed on a storage medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • The present invention has the effect of enabling relearning with a small number of on-site samples, and can be applied to apparatuses that estimate a person's age and sex.

Abstract

An estimation model for carrying out a characteristic estimation is retained in an estimation model retention unit (14). A characteristic estimation of an inputted facial image is carried out with a characteristic estimation unit (15), using the estimation model which is retained in the estimation model retention unit (14). Sample data is extracted with a learning sample extraction unit (19) for each group of characteristic estimation results by the characteristic estimation unit (15). Sample data is thus obtained for each group of characteristic estimation results, thereby allowing uniform extraction of onsite sample data, and re-learning with little onsite sample data.

Description

Attribute estimation device
 The present invention relates to an attribute estimation apparatus that estimates age, sex, and the like from a face image.
 As a system for identifying attributes of a subject included in an image, there is, for example, the one described in Patent Document 1. The system described in Patent Document 1 identifies a person's attributes (age, gender, etc.) based on a face image: a computer constituting an offline training system generates an attribute identification dictionary, and a computer forming an online operation system uses that dictionary to determine the attributes of an unknown person from his or her face image. The computer constituting the offline training system generates the attribute identification dictionary, which identifies a person's attributes from a face image, using learning sample data in which a plurality of sample images, each containing the face of a person whose attributes are known, are associated with those persons' attributes.
Japanese Unexamined Patent Publication No. 2006-323507
 However, the conventional attribute estimation system has the problem that learning samples prepared in the laboratory alone do not yield sufficient accuracy in the field; that is, estimation does not work well on images captured on site. Accuracy can be improved by adding on-site samples, but this raises a new problem: the effort of entering the correct data required for those samples is enormous and costly.
 The present invention has been made in view of these circumstances, and its object is to provide an attribute estimation device that can be relearned with a small number of on-site samples.
 The attribute estimation apparatus of the present invention comprises: an image input unit that inputs a face image; an estimation model holding unit that holds an estimation model for performing attribute estimation; an attribute estimation unit that estimates attributes of the face image input by the image input unit, using the estimation model held in the estimation model holding unit; an estimation result accumulation unit that accumulates the face image in association with the attribute estimation result produced by the attribute estimation unit; a sample extraction unit that extracts sample data for each group of attribute estimation results produced by the attribute estimation unit; and a relearning unit that updates the estimation model using data obtained by adding correct answers to the sample data extracted by the sample extraction unit.
 According to this configuration, since sample data is obtained for each group of attribute estimation results, on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
 In the above configuration, a feature amount distribution calculation unit is provided that computes the feature amount distribution of the face images for each group of attribute estimation results produced by the attribute estimation unit, and the sample extraction unit extracts sample data based on the feature amount distribution determined for each group by the feature amount distribution calculation unit.
 According to this configuration, on-site sample data can be extracted evenly.
 In the above configuration, the feature amount distribution calculation unit clusters the face image feature amounts for each group of attribute estimation results, and the sample extraction unit extracts sample data based on position within the cluster.
 According to this configuration, on-site sample data can be extracted evenly.
 In the above configuration, the sample extraction unit extracts, as sample data, items lying at a certain distance from the cluster center.
 According to this configuration, on-site sample data can be extracted evenly.
 In the above configuration, the sample data to which correct answers are given is weighted during relearning by the relearning unit.
 According to this configuration, the on-site sample data can be balanced against the initial learning sample data, and relearning is effective even with a small amount of on-site sample data.
 In the above configuration, the sample data to which correct answers are given is weighted, during relearning by the relearning unit, according to its position within the cluster.
 According to this configuration, the on-site sample data can be balanced against the initial learning sample data, and relearning is effective even with a small amount of on-site sample data.
 In the above configuration, a relearning start unit is provided that starts relearning by the relearning unit when a set condition is satisfied.
 According to this configuration, the shipping-time estimation model can be adapted to the site with high accuracy.
 In the above configuration, the attribute is age.
 The re-learning sample extraction device of the present invention comprises a sample extraction unit that, for data in which face images are associated with their attribute estimation results, extracts sample data for relearning an estimation model for each group of attribute estimation results.
 According to this configuration, since sample data is obtained for each group of attribute estimation results, on-site sample data can be extracted evenly.
 The attribute estimation method of the present invention comprises: an image input step of inputting a face image; an estimation model holding step of holding an estimation model for performing attribute estimation; an attribute estimation step of estimating attributes of the face image input in the image input step, using the estimation model held in the estimation model holding step; an estimation result accumulation step of accumulating the face image in association with the attribute estimation result produced by the attribute estimation step; a sample extraction step of extracting sample data for each group of attribute estimation results produced by the attribute estimation step; and a relearning step of updating the estimation model using data obtained by adding correct answers to the sample data extracted in the sample extraction step.
 According to this method, since sample data is obtained for each group of attribute estimation results, on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
 According to the present invention, since sample data is obtained for each group of attribute estimation results, on-site sample data can be extracted evenly and relearning is possible with a small amount of on-site sample data.
 FIG. 1 is a block diagram showing the schematic configuration of an attribute estimation device according to an embodiment of the present invention. FIG. 2 is a diagram showing two example clustering approaches, a fixed-cluster-count type and a cluster-count-estimating type, together with their effects. FIG. 3 is a diagram schematically showing the basic operation of the attribute estimation device of FIG. 1. FIG. 4 is a diagram showing an example correct-answer assignment screen in the professional mode of the attribute estimation device of FIG. 1. FIG. 5 is a flowchart for explaining the operation of the attribute estimation device of FIG. 1. FIG. 6 is a flowchart for explaining the operation of the learning sample extraction unit of the attribute estimation device of FIG. 1.
 Hereinafter, preferred embodiments of the present invention are described in detail with reference to the drawings.
 FIG. 1 is a block diagram showing the schematic configuration of an attribute estimation device according to an embodiment of the present invention. The attribute estimation device 1 of this embodiment comprises an image input unit 12 made up of a video input unit 10 and a face detection unit 11, a feature amount extraction unit 13, an estimation model holding unit 14, a face attribute estimation unit 15, a field sample estimation result DB (estimation result accumulation unit) 16, a relearning start unit 17, a feature amount distribution calculation unit 18, a learning sample extraction unit (sample extraction unit) 19, a relearning data DB 20, and a relearning unit 21. A field sample DB 22 is built from the data produced by the image input unit 12 and the feature amount extraction unit 13.
 The video input unit 10 receives video from the camera 2. The face detection unit 11 extracts face images from the video received by the video input unit 10. The feature amount extraction unit 13 extracts feature amounts from the face images extracted by the face detection unit 11: it detects facial parts such as the eyes and nose, normalizes the face image, and extracts facial features from the normalized image using Gabor features, LBP features, Haar features, and the like. The extracted feature amount is multidimensional data. The estimation model holding unit 14 holds an estimation model for performing attribute estimation. In this embodiment, the attribute is age.
 Here, the estimation model is expressed by the following two functions:
 - a mapping G(Y) = X that converts a face feature amount Y into a feature suitable for estimating face attributes; and
 - a function F(X) that estimates age, gender, and the like from the face attribute feature amount X.
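As an illustration only, the two functions above can be sketched in code. This is a minimal stand-in, not the patent's implementation: the linear projection used for G, the linear regressor used for F, and all names are assumptions.

```python
import numpy as np

# Hypothetical two-stage estimation model: G maps a raw face feature
# vector Y to an attribute feature X; F maps X to an age estimate.
class EstimationModel:
    def __init__(self, projection, weights, bias):
        self.projection = projection  # matrix standing in for G(Y) = X
        self.weights = weights        # vector standing in for F(X)
        self.bias = bias

    def G(self, Y):
        # Project the raw feature into the attribute-feature space.
        return self.projection @ Y

    def F(self, X):
        # Estimate age from the attribute feature (linear stand-in).
        return float(self.weights @ X + self.bias)

    def estimate(self, Y):
        return self.F(self.G(Y))

model = EstimationModel(projection=np.eye(4) * 0.5,
                        weights=np.ones(4), bias=10.0)
Y = np.array([10.0, 10.0, 10.0, 10.0])
print(model.estimate(Y))  # 0.5 * 10 * 4 + 10 = 30.0
```

Relearning (described later) amounts to replacing `projection` and the regressor parameters with newly learned ones, G'(Y) and F'(X).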
 The face attribute estimation unit 15 estimates the attributes of the face image input by the image input unit 12 using the estimation model held in the estimation model holding unit 14, and displays the result on the display terminal 3. The field sample estimation result DB 16 accumulates each face image input by the image input unit 12 in association with the attribute estimation result produced by the face attribute estimation unit 15; that is, each face image collected in the field is stored as a pair with the face attribute the model estimated for it. The relearning start unit 17 starts relearning when a set condition is satisfied.
 The feature amount distribution calculation unit 18 obtains the feature amount distribution of the face images for each group of attribute estimation results accumulated in the field sample estimation result DB 16. To do so, it clusters the face image feature amounts within each attribute estimation result group; in this embodiment, clustering is performed for each estimated age bracket. As shown in FIG. 2, clustering methods include a fixed-cluster-count type and a cluster-count-estimating type. Returning to FIG. 1, at relearning time the learning sample extraction unit 19 extracts sample data based on the per-group feature amount distributions obtained by the feature amount distribution calculation unit 18. Sample data is selected by its position within a cluster; for example, samples at a fixed distance from the cluster center are extracted. Since entering correct answers for all field samples would be far too costly, the learning sample extraction unit 19 extracts only a small number of samples.
 The relearning data DB 20 accumulates the sample data extracted by the learning sample extraction unit 19 together with the correct answers assigned to it. The correct answers are assigned by a user: using the display terminal 3, the user enters the correct face attribute for each of the small number of field samples extracted by the learning sample extraction unit 19. The display terminal 3 applies the user's correct answers to the sample data, and the relearning data DB 20 assigns each correct-answer-labeled sample a weight according to its position within its cluster. The relearning unit 21 updates the estimation model using the data accumulated in the relearning data DB 20 (the sample data extracted by the learning sample extraction unit 19, with correct answers assigned).
 Next, the operation of the attribute estimation device 1 of this embodiment is described.
 FIG. 3 schematically shows the basic operation of the attribute estimation device 1. The face attribute estimation unit 15 estimates the attributes of the face images obtained from the field sample DB 22, using the estimation model held in the estimation model holding unit 14. The field samples are face images detected in images captured by a camera (not shown) installed in the field. The attribute estimation results 23 for the face images (held in the field sample estimation result DB 16) are displayed on the display terminal 3 as a histogram, with age bracket (10s, 20s, 30s, 40s, 50s, 60s) on the horizontal axis and sample count on the vertical axis. The learning sample extraction unit 19 groups the samples by estimated age bracket, clusters each bracket, and for each bracket extracts the sample data closest to the cluster center 24. The cluster center 24 is the mean position of the data belonging to the cluster. The learning sample extraction unit 19 extracts on the order of tens of samples. In the example shown in FIG. 3, six samples are extracted; the number extracted may be fixed in advance or determined according to the distribution.
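The per-bracket extraction described above can be sketched as follows. This is a simplified illustration that treats each age bracket as a single cluster whose center is the mean of the bracket's feature vectors; the function name and the toy two-dimensional features are assumptions.

```python
import numpy as np

def extract_nearest_to_center(samples_by_bracket):
    """For each estimated age bracket, return the index of the sample
    closest to the cluster center (the mean of the bracket's features)."""
    selected = {}
    for bracket, features in samples_by_bracket.items():
        X = np.asarray(features, dtype=float)
        center = X.mean(axis=0)                     # cluster center 24
        dists = np.linalg.norm(X - center, axis=1)  # Euclidean distance
        selected[bracket] = int(dists.argmin())
    return selected

samples = {
    "10s": [[0.0, 0.0], [1.0, 1.0], [0.4, 0.6]],
    "20s": [[5.0, 5.0], [7.0, 7.0]],
}
print(extract_nearest_to_center(samples))  # {'10s': 2, '20s': 0}
```

In the "10s" bracket the third sample lies nearest the mean, so it is the one presented to the user for correct-answer assignment.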
 The learning sample extraction unit 19 has two modes: a general mode and a professional mode. Extracting only the single sample closest to the cluster center 24, as described above, is the general mode; alternatively, several samples in the neighborhood of the cluster center may be extracted. The professional mode, in contrast, is for users with knowledge of face recognition: instead of a small number of samples it presents a large number, several dozen from the neighborhood of each cluster center rather than just one, and lets the user pick out those that look effective for relearning. FIG. 4 shows an example correct-answer assignment screen in the professional mode.
 Returning to FIG. 3, each sample extracted by the learning sample extraction unit 19 is assigned a correct answer by the correct answer assignment unit 3a of the display terminal 3, which is operated by the user. For example, if sample 25a in the teens group 50 shown in FIG. 3 is actually a person in their 30s, the user selects "30s" (60a) from the 10s to 60s choices displayed for that sample in the correct-answer-assignment field sample data 60 on the display terminal 3; this selection assigns the correct answer to sample 25a. Likewise, if sample 25b is a person in their 50s, the user selects "50s" (60b) from the choices displayed for sample 25b, assigning it its correct answer. In this way, correct answers are assigned to all the samples extracted by the learning sample extraction unit 19.
 The correct-answer-labeled sample data 25 is accumulated in the relearning data DB 20. The relearning data DB 20 weights the accumulated correct-answer-labeled samples 25 and then combines them with the initial learning sample data accumulated in the initial learning sample DB 30. The weighting is necessary because the correct-answer-labeled samples 25 are vastly outnumbered by the initial learning samples (which are on the order of thousands); simply adding them to the initial learning samples unweighted would have no effect.
 Here, the weighting applied to each attribute group is outlined.
 Weighting per attribute group: for each attribute group, let c denote the attribute group, let i (i = 1 to N(c)) index the samples used when learning the previous model, and let j (j = 1 to M(c)) index the added field samples. The weights w(c)i and w(c)j for the samples of attribute group c are set so as to satisfy the following expression (1).
 [Expression (1) is rendered only as an image in the original publication (JPOXMLDOC01-appb-M000001); it constrains the weights w(c)i and w(c)j via the parameter α.]
 - α is basically 0.5.
 - If the deviation from the correct attributes entered by the user is large, α is set to a value closer to 1 (note that setting it too large hurts generalization).
 When the samples are not divided into attribute groups: let i (i = 1 to N) index the samples used when learning the previous model, let j (j = 1 to M) index the added field samples, and let Wi and Wj be the weights for each sample; they are set so as to satisfy the following expression (2).
 [Expression (2) is rendered only as an image in the original publication (JPOXMLDOC01-appb-M000002); it constrains the weights Wi and Wj analogously to expression (1).]
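Because expressions (1) and (2) survive only as images, the following sketch rests on an assumed reading: the added field samples of a group share a total weight of α while the previous learning samples share 1 − α, so with α = 0.5 a handful of field samples collectively counterbalances thousands of initial samples. Treat the formula itself, not just the names, as a guess.

```python
def weight_samples(n_previous, m_field, alpha=0.5):
    """Assumed weighting scheme: the previous-model samples share a
    total weight of (1 - alpha) equally, and the added field samples
    share alpha equally, so the per-group weights sum to 1."""
    w_prev = (1.0 - alpha) / n_previous
    w_field = alpha / m_field
    return [w_prev] * n_previous + [w_field] * m_field

weights = weight_samples(n_previous=1000, m_field=5, alpha=0.5)
total = sum(weights)
print(round(total, 10), weights[0], weights[-1])  # 1.0 0.0005 0.1
```

Under this reading, each of the 5 field samples carries 200 times the weight of an initial sample, which matches the stated motivation: unweighted, the few field samples would have no effect.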
 Returning to FIG. 3, once the correct-answer-labeled sample data 25 has accumulated in the relearning data DB 20, the relearning unit 21 combines it with the initial learning sample data accumulated in the initial learning sample DB 30 and performs relearning on the combined learning samples.
 The relearning unit 21 learns the following:
 - a new mapping from the face feature amount Y to the face attribute feature amount X; and
 - a new function that estimates age, gender, and the like from the face attribute feature amount X.
 The estimation model holding unit 14 then replaces the estimation model with the relearned model 40 generated by the relearning unit 21. Here, the relearned model consists of:
 - a new mapping G'(Y) = X from the face feature amount Y to the face attribute feature amount X; and
 - a new function F'(X) that estimates age, gender, and the like from the face attribute feature amount X.
 FIG. 5 is a flowchart for explaining the operation of the attribute estimation device 1 of this embodiment. First, an initial model is generated (step S1); this initial model is generated in the laboratory. After the initial model is generated, the model is evaluated, that is, attribute estimation is performed (step S2). After attribute estimation, it is determined whether the relearning count has reached N (a predetermined integer, e.g., 3 or 4) or the difference from the previous model has fallen below a predetermined threshold (step S3). If either holds ("Yes"), relearning ends and the process terminates. Relearning may be ended at any of the following points:
 - after a predetermined number of iterations;
 - when the difference from the previous model is sufficiently small;
 - when the estimation results of the previous model and the relearned model barely differ; or
 - when the model's accuracy, evaluated against the correct answers entered by the user, has saturated (or has improved by at least a certain amount).
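The stopping test of step S3 can be sketched as follows. The combination of an iteration cap and a model-difference test comes from the flowchart; the specific difference measure (mean absolute change of the estimates on a fixed sample set) is an assumption, since the text does not fix one.

```python
def should_stop(iteration, max_iterations, prev_estimates, new_estimates,
                threshold):
    """Stop relearning when the iteration cap N is reached, or when the
    previous and relearned models barely differ on the same samples."""
    if iteration >= max_iterations:
        return True
    diffs = [abs(a - b) for a, b in zip(prev_estimates, new_estimates)]
    return sum(diffs) / len(diffs) < threshold

# Model outputs barely changed: mean |diff| = 0.2 < 1.0, so stop.
print(should_stop(1, 3, [24.0, 31.0, 55.0], [24.1, 31.3, 55.2], 1.0))  # True
# Iteration cap reached: stop regardless of the difference.
print(should_stop(3, 3, [24.0], [40.0], 1.0))  # True
```

Either branch returning `True` corresponds to the "Yes" exit of step S3.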
 If, in step S3, the relearning count has not reached N and the difference from the previous model is at or above the threshold ("No"), the learning sample extraction unit 19 extracts sample data for correct-answer assignment (step S4). FIG. 6 is a flowchart for explaining the operation of the learning sample extraction unit 19: it divides the samples by attribute group (step S40), maps each group's samples into the feature space (step S41), clusters each group (step S42), and then extracts the samples nearest to each cluster center (step S43).
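Steps S40 to S43 can be sketched as follows, here with a fixed cluster count per attribute group (cf. FIG. 2). The feature-space mapping of step S41 is taken as already done (the inputs are feature vectors), the tiny k-means routine is a stand-in for whichever clustering method is actually used, and all names are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    # Minimal Lloyd's algorithm: returns centers and per-point labels.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return centers, labels

def extract_per_group(groups, k=2):
    # S40: samples are already divided by attribute group (dict input).
    picked = {}
    for name, feats in groups.items():
        X = np.asarray(feats, dtype=float)   # S41: feature-space vectors
        centers, labels = kmeans(X, k)       # S42: cluster the group
        picked[name] = []
        for c in range(k):                   # S43: nearest to each center
            idx = np.where(labels == c)[0]
            d = np.linalg.norm(X[idx] - centers[c], axis=1)
            picked[name].append(int(idx[d.argmin()]))
    return picked

groups = {"20s": [[0, 0], [1, 0], [0.4, 0], [5, 5], [6, 5], [5.4, 5]]}
print(extract_per_group(groups))
```

With this toy data the two natural clusters are the points near the origin and the points near (5, 5); one representative sample is picked from each, which is then shown to the user for correct-answer assignment.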
 Returning to the flowchart of FIG. 5, after the learning sample extraction unit 19 extracts the samples for correct-answer assignment, the user assigns the correct answers (step S5). The relearning unit 21 then generates relearning data (step S6) and performs relearning based on it (step S7). Steps S2 to S7 are repeated each time relearning is triggered.
 Relearning may be triggered (i.e., a new learning round started) at any of the following points:
 (1) when the per-age-bracket distribution of the samples used to build the factory-shipped model differs from the per-estimated-age-bracket distribution of the field samples by at least a certain amount;
 (2) when the per-bracket averages at the time the factory-shipped model was built deviate from the per-estimated-bracket averages of the field samples by at least a certain amount;
 (3) when (1) and (2) occur simultaneously (the most common case); or
 (4) when a provisional model built by trusting the estimated age brackets of the field samples diverges substantially from the factory-shipped model, judged by:
 - comparing the estimates themselves, i.e., the difference between the attribute estimates produced using the field samples and those produced using the factory-shipped model; or
 - comparing the per-bracket ratios, i.e., what percentage of samples fall in the 10s, the 20s, and so on.
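Trigger condition (1) can be sketched as follows. Comparing the per-bracket distributions comes from the text; the choice of total variation distance and the threshold value are assumptions.

```python
def bracket_ratios(counts):
    # Convert per-bracket counts into per-bracket ratios.
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def should_relearn(train_counts, field_counts, dist_threshold=0.2):
    """Condition (1): trigger relearning when the age-bracket distribution
    of the training samples and that of the field estimates differ by more
    than the threshold (measured here by total variation distance)."""
    p = bracket_ratios(train_counts)
    q = bracket_ratios(field_counts)
    brackets = set(p) | set(q)
    tv = 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in brackets)
    return tv > dist_threshold

train = {"10s": 100, "20s": 100, "30s": 100, "40s": 100}
field = {"10s": 10, "20s": 20, "30s": 30, "40s": 140}   # skewed site
print(should_relearn(train, field))  # True: distributions diverge
```

A site whose visitors skew far older than the factory training set trips the check, prompting the sample extraction and relearning loop of FIG. 5.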
 As described above, according to the attribute estimation device 1 of this embodiment, the estimation model holding unit 14 holds an estimation model for performing attribute estimation, the face attribute estimation unit 15 estimates the attributes of input face images using that model, and the learning sample extraction unit 19 extracts sample data for each group of attribute estimation results produced by the face attribute estimation unit 15; field sample data can therefore be extracted evenly, and the model can be relearned from a small amount of field sample data.
 Other configurations can achieve the same effect as the attribute estimation device of this embodiment. For example, a device that obtains the feature amount distribution, extracts samples based on that distribution, and relearns the estimation model using the extracted samples labeled with correct answers could be applied to the output (corresponding to the field sample estimation result DB) of a separate device that estimates face attributes from camera images. This could support a service in which a store only collects field samples, and when those samples are sent to a center, samples are extracted there, labeled with correct answers, and used for relearning.
 Further, according to the attribute estimation device 1 of this embodiment, the feature amount distribution calculation unit 18 clusters the face image feature amounts for each attribute estimation result group and the learning sample extraction unit 19 extracts sample data based on position within each cluster, so field sample data can be extracted evenly.
 Further, according to the attribute estimation device 1 of this embodiment, the sample data labeled with correct answers is weighted during relearning by the relearning unit 21, so the field sample data can be balanced against the initial learning sample data and relearning is effective even with a small amount of field sample data.
 Further, according to the attribute estimation device 1 of this embodiment, the relearning start unit 17 starts relearning when a set condition is satisfied, so the factory-shipped estimation model can be adapted accurately to the field.
 Each of the processes shown in FIGS. 5 and 6 of this embodiment can also be written as a program and stored and distributed on a storage medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
 Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.
 This application is based on Japanese Patent Application No. 2011-073443, filed March 29, 2011, the contents of which are incorporated herein by reference.
 The present invention has the effect of enabling relearning from a small number of field samples, and is applicable to devices that estimate a person's age, gender, and the like.
 DESCRIPTION OF SYMBOLS
 1 attribute estimation device
 2 camera
 3 display terminal
 3a correct answer assignment unit
 10 video input unit
 11 face detection unit
 12 image input unit
 13 feature amount extraction unit
 14 estimation model holding unit
 15 face attribute estimation unit
 16 field sample estimation result DB
 17 relearning start unit
 18 feature amount distribution calculation unit
 19 learning sample extraction unit
 20 relearning data DB
 21 relearning unit
 22 field sample DB
 23 attribute estimation results for face images
 24 cluster center
 25, 25a, 25b sample data
 30 initial learning sample DB
 40 relearned model
 50 teens group
 60, 60a, 60b field sample data for correct-answer assignment

Claims (10)

  1.  An attribute estimation device comprising:
      an image input unit that inputs a face image;
      an estimation model holding unit that holds an estimation model for performing attribute estimation;
      an attribute estimation unit that estimates attributes of the face image input by the image input unit, using the estimation model held in the estimation model holding unit;
      an estimation result accumulation unit that accumulates the face image in association with the attribute estimation result of the attribute estimation unit;
      a sample extraction unit that extracts sample data for each group of attribute estimation results produced by the attribute estimation unit; and
      a relearning unit that updates the estimation model using data obtained by assigning correct answers to the sample data extracted by the sample extraction unit.
  2.  The attribute estimation device according to claim 1, further comprising a feature amount distribution calculation unit that obtains a feature amount distribution of face images for each group of attribute estimation results produced by the attribute estimation unit,
      wherein the sample extraction unit extracts sample data based on the per-group face image feature amount distributions obtained by the feature amount distribution calculation unit.
  3.  The attribute estimation device according to claim 2, wherein the feature amount distribution calculation unit clusters the face image feature amounts for each group of attribute estimation results,
      and the sample extraction unit extracts sample data based on position within a cluster.
  4.  The attribute estimation device according to claim 3, wherein the sample extraction unit extracts, as sample data, samples at a fixed distance from the center of a cluster.
  5.  The attribute estimation device according to any one of claims 1 to 4, wherein sample data to which correct answers have been assigned is weighted when the relearning unit performs relearning.
  6.  The attribute estimation device according to claim 5, wherein sample data to which correct answers have been assigned is weighted according to its position within a cluster when the relearning unit performs relearning.
  7.  The attribute estimation device according to any one of claims 1 to 6, further comprising a relearning start unit that starts relearning by the relearning unit when a set condition is satisfied.
  8.  The attribute estimation device according to any one of claims 1 to 7, wherein the attribute is age.
  9.  A relearning sample extraction device comprising a sample extraction unit that, for data in which face images are associated with their attribute estimation results, extracts sample data for relearning an estimation model for each group of attribute estimation results.
  10.  An attribute estimation method comprising:
      an image input step of inputting a face image;
      an estimation model holding step of holding an estimation model for performing attribute estimation;
      an attribute estimation step of estimating attributes of the face image input in the image input step, using the estimation model held in the estimation model holding step;
      an estimation result accumulation step of accumulating the face image in association with the attribute estimation result of the attribute estimation step;
      a sample extraction step of extracting sample data for each group of attribute estimation results produced by the attribute estimation step; and
      a relearning step of updating the estimation model using data obtained by assigning correct answers to the sample data extracted in the sample extraction step.
PCT/JP2012/002128 2011-03-29 2012-03-27 Characteristic estimation device WO2012132418A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011073443A JP2012208710A (en) 2011-03-29 2011-03-29 Characteristic estimation device
JP2011-073443 2011-03-29

Publications (1)

Publication Number Publication Date
WO2012132418A1 true WO2012132418A1 (en) 2012-10-04

Family

ID=46930192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/002128 WO2012132418A1 (en) 2011-03-29 2012-03-27 Characteristic estimation device

Country Status (2)

Country Link
JP (1) JP2012208710A (en)
WO (1) WO2012132418A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014155639A1 (en) * 2013-03-29 2014-10-02 株式会社日立製作所 Video monitoring system and image retrieval system
WO2014178105A1 (en) * 2013-04-30 2014-11-06 Necソリューションイノベータ株式会社 Attribute estimation device
WO2015001856A1 (en) * 2013-07-01 2015-01-08 Necソリューションイノベータ株式会社 Attribute estimation system
JP2017117493A (en) * 2017-03-15 2017-06-29 東芝テック株式会社 Merchandise sales data processing apparatus and program

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6004084B2 (en) 2013-03-29 2016-10-05 富士通株式会社 Model updating method, apparatus, and program
WO2016006090A1 (en) * 2014-07-10 2016-01-14 株式会社東芝 Electronic apparatus, method, and program
WO2016103651A1 (en) * 2014-12-22 2016-06-30 日本電気株式会社 Information processing system, information processing method and recording medium
US10210464B2 (en) * 2015-03-11 2019-02-19 Qualcomm Incorporated Online training for object recognition system
JP6567488B2 (en) * 2016-12-22 2019-08-28 日本電信電話株式会社 Learning data generation device, development data generation device, model learning device, method thereof, and program
EP3451219A1 (en) 2017-08-31 2019-03-06 KBC Groep NV Improved anomaly detection
JP7463052B2 (en) * 2018-09-19 2024-04-08 キヤノン株式会社 Information processing device, information processing system, information processing method, and program
JP7306933B2 (en) 2018-09-21 2023-07-11 古河電気工業株式会社 Image determination device, image inspection device, and image determination method
JP7262232B2 (en) * 2019-01-29 2023-04-21 東京エレクトロン株式会社 Image recognition system and image recognition method
JP7351413B2 (en) 2020-05-08 2023-09-27 富士通株式会社 Identification method, generation method, identification program and identification device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006323507A (en) * 2005-05-17 2006-11-30 Yamaha Motor Co Ltd Attribute identifying system and attribute identifying method
JP2009093334A (en) * 2007-10-05 2009-04-30 Seiko Epson Corp Identification method and program

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014155639A1 (en) * 2013-03-29 2014-10-02 株式会社日立製作所 Video monitoring system and image retrieval system
JP5982557B2 (en) * 2013-03-29 2016-08-31 株式会社日立製作所 Video surveillance system and image search system
WO2014178105A1 (en) * 2013-04-30 2014-11-06 Necソリューションイノベータ株式会社 Attribute estimation device
JP5965057B2 (en) * 2013-04-30 2016-08-03 Necソリューションイノベータ株式会社 Attribute estimation device
WO2015001856A1 (en) * 2013-07-01 2015-01-08 Necソリューションイノベータ株式会社 Attribute estimation system
JPWO2015001856A1 (en) * 2013-07-01 2017-02-23 Necソリューションイノベータ株式会社 Attribute estimation system
US10296845B2 (en) 2013-07-01 2019-05-21 Nec Solution Innovators, Ltd. Attribute estimation system
JP2017117493A (en) * 2017-03-15 2017-06-29 東芝テック株式会社 Merchandise sales data processing apparatus and program

Also Published As

Publication number Publication date
JP2012208710A (en) 2012-10-25

Similar Documents

Publication Publication Date Title
WO2012132418A1 (en) Characteristic estimation device
CN108229322B (en) Video-based face recognition method and device, electronic equipment and storage medium
US9626551B2 (en) Collation apparatus and method for the same, and image searching apparatus and method for the same
CN108229314B (en) Target person searching method and device and electronic equipment
JP5506722B2 (en) Method for training a multi-class classifier
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN108280477B (en) Method and apparatus for clustering images
US20160132815A1 (en) Skill estimation method in machine-human hybrid crowdsourcing
CN105069424B (en) Quick face recognition system and method
JP5214760B2 (en) Learning apparatus, method and program
US20110103695A1 (en) Image processing apparatus and image processing method
CN108985190B (en) Target identification method and device, electronic equipment and storage medium
CN110503000B (en) Teaching head-up rate measuring method based on face recognition technology
CN112232241A (en) Pedestrian re-identification method and device, electronic equipment and readable storage medium
US20190138852A1 (en) Information processing apparatus, information processing method, and storage medium for generating teacher information
JP5214679B2 (en) Learning apparatus, method and program
CN111814846B (en) Training method and recognition method of attribute recognition model and related equipment
CN111814821A (en) Deep learning model establishing method, sample processing method and device
CN114842343A (en) ViT-based aerial image identification method
JP5746550B2 (en) Image processing apparatus and image processing method
CN115984930A (en) Micro expression recognition method and device and micro expression recognition model training method
TW202125323A (en) Processing method of learning face recognition by artificial intelligence module
CN113378852A (en) Key point detection method and device, electronic equipment and storage medium
CN114463798A (en) Training method, device and equipment of face recognition model and storage medium
JP7293658B2 (en) Information processing device, information processing method and program

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12765214

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 12765214

Country of ref document: EP

Kind code of ref document: A1