WO2016158768A1

WO2016158768A1 - Clustering device and machine learning device

Info

Publication number: WO2016158768A1
Application number: PCT/JP2016/059662
Authority: WO
Inventors: 健太西行; 藤吉　弘亘
Original assignee: 株式会社メガチップス
Priority date: 2015-03-30
Filing date: 2016-03-25
Publication date: 2016-10-06

Abstract

In a clustering device 10, a feature extraction unit 11 extracts features from each of a plurality of transition candidate data items 141 used in machine learning that has transition learning introduced, and generates a plurality of transition candidate feature data items 142. A classification unit 12 classifies the transition candidate feature data items 142 into a plurality of groups including a first group, on the basis of a feature amount of each of the plurality of transition candidate feature data 142 items. A pre-domain determination unit 14 determines the first group to be a pre-domain if the number of transition candidate feature data items 142 that have been classified into the first group is equal to or less than a prescribed classification continuation reference value, and makes the determination to further classify the transition candidate feature data items 142 that have been classified into the first group if the number of transition candidate feature data items 142 is larger than the classification continuation reference value.

Description

Clustering apparatus and machine learning apparatus

The present invention relates to a clustering device and a machine learning device used in machine learning using transfer learning.

Machine learning is used for the process of detecting a person from image data and the process of analyzing measurement data by a sensor.

For example, when a person is detected from an image taken by a surveillance camera, identification feature data generated by learning the person's characteristics is used. Specifically, the machine learning device learns the characteristics of a person using a plurality of images (a plurality of learning samples) taken of the person, and generates identification feature data that reflects the learning result. The person detection device detects a person from an image photographed by the surveillance camera using the identification feature data generated by the machine learning device.

When the monitoring camera installation environment is different from the environment in which the learning sample is collected, the appearance of the person photographed by the monitoring camera is different from the appearance of the person in the learning sample. That is, the characteristics of the person photographed by the monitoring camera are different from the characteristics of the person included in the learning sample. Therefore, when the feature data for identification generated from the learning sample is used to detect the person from the image generated by the monitoring camera, the detection accuracy of the person is lowered. In order to improve the detection accuracy of a person, a huge number of learning samples must be prepared according to the installation environment of the camera, which increases the cost.

Therefore, a machine learning method that introduces transfer learning has been proposed. Transfer learning is a technique in which a sample obtained from an environment different from the learning sample collection environment is learned in advance, and the characteristics of the detection target obtained by the prior learning are applied (transferred) to the learning result of the learning sample. Since transfer learning can suppress the number of learning samples, the cost for generating identification feature data can be reduced.

For example, Non-Patent Document 1 discloses a random forest that introduces transfer learning as a machine learning algorithm that introduces transfer learning. Patent Document 1 discloses an attribute classifier that applies transfer learning to a neural network. When the attribute of the first class can be used as the attribute of the second class, the attribute classifier according to Patent Document 1 transfers the attribute of the first class to the second class.

In transfer learning, a set of samples learned in advance is called a prior domain. The target to which the learning result of the prior domain is transferred is called a target domain. When a person is detected from an image captured by a monitoring camera, the target domain is a set of learning samples generated in accordance with the installation environment of the monitoring camera. The prior domain is a set of learning samples generated in an environment different from the installation environment of the monitoring camera.

JP 2012-84117 A

It is known that a phenomenon called negative transfer occurs when transfer learning is used. Negative transfer is a phenomenon in which the accuracy of learning decreases when a prior domain learned in advance for transfer learning includes data that is significantly different from data included in the target domain. For this reason, it is desirable to identify a prior domain effective for transfer learning and use only the identified prior domain for machine learning before executing machine learning with transfer learning introduced.

Patent Document 1 does not disclose a method for generating a prior domain and a method for determining whether or not data used for transfer learning is included in the prior domain.

Non-Patent Document 1 discloses a method for determining whether a prior domain is effective for transfer learning. Specifically, the method according to Non-Patent Document 1 includes a classifier (prior classifier) learned using only a prior domain, and a classifier (transfer classification) that performs transfer learning using the prior domain and the target domain. Sample data). If the discrimination result by the prior discriminator for the sample data is the same as the discrimination result by the transfer discriminator, this prior domain is determined to be effective for transfer learning.

As a result, in the method disclosed in Non-Patent Document 1, a prior domain determined to be ineffective for transfer learning is not used for machine learning that introduces transfer learning. If the number of prior domains to be introduced for transfer learning is one and it is determined that this prior domain is not effective for transfer learning, machine learning incorporating transfer learning cannot be executed.

Therefore, when determining whether or not a prior domain is effective for transfer learning, it is desirable to prepare a plurality of prior domains in advance. However, a method in which a human confirms collected samples one by one and classifies a plurality of prior domains is not practical. In addition, a technique for efficiently creating a plurality of advance domains from collected data has not been developed.

Also, in Non-Patent Document 1, as described above, if the discrimination result for the sample data by the prior discriminator is not the same as the discrimination result by the transfer discriminator, the prior domain is not determined to be effective for transfer learning. It is difficult to create such a prior domain in advance.

Non-Patent Document 2 discloses a method for determining whether a prior domain is effective for transfer learning. Specifically, the method according to Non-Patent Document 2 requires the reliability of the prior domain based on three criteria. The first criterion is that sample data is sent to a discriminator (pre discriminator) trained using only a prior domain and a discriminator (transfer discriminator) that performs transfer learning using a prior domain and a target domain. input. If the discrimination result by the prior discriminator for the sample data is the same as the discrimination result by the transfer discriminator, this prior domain is determined to be effective for transfer learning. The second criterion is the number of data included in the target domain. When the number of data included in the target domain is smaller than a preset reference value, it is determined that the effectiveness is low even if transfer learning is executed. The third criterion is the accuracy output from the transfer discriminator. If the accuracy output from the transfer discriminator is greater than a preset reference value of accuracy, it is determined that the transfer discriminator has high reliability and is effective for transfer learning.

However, the method according to Non-Patent Document 2 is a method that is premised on entrusting the judgment to an expert when the reliability is low, and the accuracy of judging the effectiveness of the prior domain is not high. That is, there is a high possibility that the method according to Non-Patent Document 2 erroneously determines that a prior domain that is not effective for transfer learning is effective. For this reason, a technique for accurately determining the effectiveness of a prior domain is desired.

The present invention is a clustering apparatus. The clustering apparatus includes a clustering feature extraction unit, a classification unit, and a prior domain determination unit. The clustering feature extraction unit generates a plurality of transfer candidate feature data by extracting features from each of a plurality of transfer candidate data used for machine learning using transfer learning. The classifying unit classifies each transfer candidate feature data into a plurality of groups including the first group and the second group based on the features of each of the plurality of transfer candidate feature data generated by the clustering feature extraction unit. The prior domain determination unit determines the first group as a prior domain used for machine learning when the number of transfer candidate feature data classified into the first group by the classification unit is equal to or less than a predetermined classification continuation reference value. If the number of candidate feature data is larger than the classification continuation reference value, it is determined to further classify the transfer candidate feature data classified into the first group.

This makes it possible to efficiently create a prior domain used for machine learning that introduces transfer learning.

The present invention is a machine learning device that learns a detection target by executing machine learning using transfer learning. The machine learning device includes a clustering device and a prior domain evaluation device. The clustering device classifies a plurality of transfer candidate data used for machine learning and generates a prior domain used for machine learning. The prior domain evaluation apparatus evaluates whether the prior domain generated by the clustering apparatus is effective for machine learning. The clustering apparatus includes a clustering feature extraction unit, a classification unit, and a prior domain determination unit. The clustering feature extraction unit extracts features from each of the plurality of transfer candidate data to generate a plurality of transfer candidate feature data. The classifying unit classifies each transfer candidate feature data into a plurality of groups including the first group and the second group based on the features of each of the plurality of transfer candidate feature data generated by the clustering feature extraction unit. The prior domain determination unit determines the first group as a prior domain used for machine learning when the number of transfer candidate feature data classified into the first group by the classification unit is equal to or less than a predetermined classification continuation reference value. The prior domain determination unit determines to further classify the transfer candidate feature data classified into the first group when the number of transfer candidate feature data is larger than the classification continuation reference value. The prior domain evaluation device includes a trial transfer learning unit and a determination unit. When the first group is determined to be a prior domain by the prior domain determination unit, the trial transfer learning unit is configured for learning that includes transfer candidate feature data included in the first group and each of the features to be detected under a predetermined condition. Machine learning is performed using the target domain including data to generate an evaluation classifier for evaluating the prior domain. The determination unit determines whether the first group is effective for machine learning based on the trial transfer identification unit generated by the trial transfer learning unit.

This makes it possible to efficiently create a prior domain used for machine learning that introduces transfer learning, and accurately evaluate the effectiveness of the prior domain in transfer learning.

The present invention is a machine learning device. The machine learning device includes an acquisition unit, a trial transfer learning unit, and a determination unit. The acquisition unit includes a target domain including a plurality of learning data each having a detection target characteristic under a predetermined condition, and a pre-domain including learning candidate data having a detection target characteristic under a condition different from the predetermined condition; To get. The trial transfer learning unit performs machine learning in which transfer learning is introduced using the target domain and the prior domain acquired by the acquisition unit, and generates a decision tree used for detection of the detection target. The determination unit determines whether or not the prior domain acquired by the acquisition unit is effective for transfer learning using all the leaf nodes constituting the decision tree generated by the trial transfer learning unit.

By using all the leaf nodes that make up the decision tree, it is possible to accurately evaluate the effectiveness of the prior domain transfer learning.

Therefore, an object of the present invention is to provide a technique for efficiently creating a plurality of pre-domains from a plurality of data collected for creating the pre-domain.

Also, an object of the present invention is to provide a technique that can accurately determine whether a prior domain is effective for transfer learning.

The objects, features, aspects and advantages of the present invention will become apparent from the following detailed description and the accompanying drawings.

It is a functional block diagram which shows the structure of the machine learning apparatus which concerns on the 1st Embodiment of this invention. It is a functional block diagram which shows the structure of the clustering apparatus shown in FIG. It is a functional block diagram which shows the structure of the prior domain evaluation apparatus shown in FIG. It is a functional block diagram which shows the structure of the selection learning apparatus shown in FIG. It is a flowchart which shows operation | movement of the machine learning apparatus shown in FIG. FIG. 3 is a diagram illustrating an example of distribution of transfer candidate feature data generated from transfer candidate data illustrated in FIG. 1 and learning feature data generated from learning data. It is a figure which shows the range of the prior domain produced | generated by classifying the transfer candidate feature data shown in FIG. It is a flowchart of the prior domain production | generation process shown in FIG. It is a figure which shows the initial structure of the classification tree produced in the prior domain production | generation process shown in FIG. It is a figure which shows an example of a structure when a node is added to the classification tree shown in FIG. It is a figure which shows an example of the structure of a classification tree when the prior domain production | generation process shown in FIG. 5 is complete | finished. It is a flowchart of the prior domain evaluation process shown in FIG. It is a figure which shows the modification of the classification tree shown in FIG. It is a functional block diagram which shows the structure of the machine learning apparatus which concerns on the 2nd Embodiment of this invention. It is a figure which shows an example of the image contained in each of the target domain shown in FIG. 14, and a prior domain. It is a flowchart which shows operation | movement of the machine learning apparatus shown in FIG. It is a figure which shows an example of the change of the competitive value calculated by the competitive value calculation part shown in FIG. It is a figure which shows an example of the change of the reliability calculated by the reliability calculation part shown in FIG. It is a schematic diagram which shows an example of the decision tree which comprises the trial transfer identification part shown in FIG. It is a figure which shows an example of the histogram produced based on the identification result of the target domain by the trial transfer identification part shown in FIG. It is a figure which shows an example of the histogram produced based on the identification result of the prior domain by the trial transfer identification part shown in FIG. It is a figure which shows an example of the change of the distribution difference calculated by the distribution difference calculation part shown in FIG. It is a figure which shows an example of the change of the complexity calculated by the complexity calculation part shown in FIG. It is a functional block diagram which shows the other structure of the machine learning apparatus shown in FIG.1 and FIG.14.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[First Embodiment]
{1. Configuration of Machine Learning Device 100}
{1.1. overall structure}
FIG. 1 is a functional block diagram showing the configuration of the machine learning device 100 according to the first embodiment of the present invention. The machine learning device 100 illustrated in FIG. 1 executes machine learning using transfer learning using a plurality of transfer candidate data 141 stored in the storage device 140 and the target domain 150A stored in the storage device 150. . The machine learning device 100 generates transfer identification data 35 for identifying a detection target as a result of the above machine learning.

In the present embodiment, the detection target is a person. The transfer identification data 35 generated by the machine learning device 100 is used by a person detection device (not shown) to detect a person from an image taken by a camera. The machine learning apparatus 100 uses a random forest in which transfer learning is introduced as a learning algorithm for generating the transfer identification data 35. Therefore, the transfer identification data 35 is a data group composed of a plurality of decision trees.

Storage device 150 stores target domain 150A. The target domain 150A is a group of a plurality of images having characteristics of a detection target (person) under a predetermined condition. The target domain 150A includes learning

data

151, 151,. The learning data 151 is, for example, an image obtained by photographing a person with a depression angle of 0 °. The target domain 150 A is used when the selection learning device 30 executes the machine learning in which the transfer learning is introduced to generate the transfer identification data 35.

The storage device 140 stores transfer

candidate data

141, 141,. The plurality of transfer candidate data 141 are images obtained by photographing a person, and are collected by searching on the Internet for images obtained by photographing a person. Are classified based on the characteristics of the

transfer candidate data

141, 141,... To generate the

prior domains

145, 145,. .. Of the

prior domains

145, 145,... Are used for generating the transfer identification data 35.

The machine learning device 100 includes a clustering device 10, a prior domain evaluation device 20, and a selection learning device 30.

The clustering apparatus 10 classifies the transfer candidate data 141 based on the characteristics of the transfer candidate data 141 and generates the prior domain 145.

The prior domain evaluation device 20 evaluates whether each of the prior domains 145 generated by the clustering device 10 is effective for transfer learning. The prior domain evaluation device 20 outputs evaluation result data 253A indicating the evaluation result of each prior domain 145 to the selection learning device 30.

The selection learning device 30 selects the prior domain 145 determined to be effective for transfer learning by the prior domain evaluation device 20 from the prior domains 145 generated by the clustering device 10 based on the evaluation result data 253A. The selection learning device 30 executes machine learning in which transfer learning is introduced using the selected prior domain 145 and the target domain 150A stored in the storage device 150. As a result, transfer identification data 35 is generated.

{1.2. Configuration of Clustering Device 10}
FIG. 2 is a functional block diagram showing a configuration of the clustering apparatus 10 shown in FIG. As shown in FIG. 2, the clustering device 10 includes a feature extraction unit 11, a classification unit 12, a variance calculation unit 13, and a prior domain determination unit 14.

The clustering device 10 inputs a plurality of transfer candidate data 141 from the storage device 140. The feature extraction unit 11 extracts HOG (Histograms of Oriented Gradients) feature amounts from each of a plurality of transfer candidate data 141 input to the clustering apparatus 10, and a plurality of transfer candidate features corresponding to each of the transfer candidate data 141. Data 142 is generated. Hereinafter, unless otherwise specified, the HOG feature value is simply referred to as “feature value”.

The classification unit 12 inputs a plurality of transfer candidate feature data 142 from the feature extraction unit 11. The classification unit 12 classifies the transfer candidate feature data 142 into a plurality of groups based on the feature amounts included in each of the plurality of input transfer candidate feature data 142. An algorithm called Density Forest is used to classify the transfer candidate feature data 142. The classification unit 12 classifies the plurality of transfer candidate feature data 142 while creating one classification tree. Each node constituting the classification tree corresponds to each group.

The distribution calculation unit 13 calculates the covariance of each node. The covariance of each node is calculated from the feature amount of the transfer candidate feature data 142 belonging to each node. The covariance of each node is used when classifying the transfer candidate feature data 142 belonging to each node. The covariance is used to determine whether or not to determine a node constituting the classification tree as a prior domain.

The prior domain determination unit 14 determines whether or not the nodes constituting the classification tree satisfy the conditions for the prior domain. When the number of transfer candidate feature data 142 belonging to the determination target node is equal to or less than a preset classification continuation reference value, the prior domain determination unit 14 determines the determination target node as the prior domain.

When the number of transfer candidate feature data 142 belonging to the determination target node is larger than the classification continuation reference value, the prior domain determination unit 14 compares the covariance of the determination target node with a preset distribution reference value. When the covariance of the determination target node is equal to or less than the distribution reference value, the prior domain determination unit 14 determines the determination target node as the prior domain. On the other hand, when the variance of the determination target node is larger than the distribution reference value, the prior domain determination unit 14 determines to further classify the transfer candidate feature data 142 belonging to the determination target node.

{1.3. Configuration of Prior Domain Evaluation Device 20}
FIG. 3 is a functional block diagram showing a configuration of the prior domain evaluation apparatus 20 shown in FIG. As illustrated in FIG. 3, the prior domain evaluation device 20 includes a temporary storage unit 21, a feature extraction unit 22, a trial transfer learning unit 23, a comparative learning unit 24, and a determination unit 25.

The prior domain evaluation device 20 inputs the target domain 150A stored in the storage device 140, and inputs the prior domain 145 generated by the clustering device 10.

The temporary storage unit 21 temporarily stores the prior domain 145 input from the clustering device 10.

The feature extraction unit 22 extracts feature amounts from each of the learning

data

151, 151,... Included in the target domain 150A input to the prior domain evaluation apparatus 20, and a plurality of features corresponding to each learning data 151 are extracted. Learning feature data 152 is generated. The learning feature data 152 generated by the feature extraction unit 22 constitutes the target domain 150B.

The trial transfer learning unit 23 acquires the target domain 150B from the feature extraction unit 22. The trial transfer learning unit 23 acquires, from the temporary storage unit 21, one of the previous domains 145 (the prior domain of interest) as an evaluation target. The trial transfer learning unit 23 performs trial transfer learning using the acquired target domain 150A and the prior domain of interest. Trial transfer learning is machine learning for evaluating the effectiveness of transfer learning of a prior domain of interest. A random forest with transfer learning is used as an algorithm for trial transfer learning. As a result of the trial transfer learning, a trial transfer identification unit 231 corresponding to the target prior domain is generated. The entity of the trial transfer identification unit 231 is a data group including a plurality of decision trees. The trial transfer identification unit 231 is generated for each prior domain 145.

The comparative learning unit 24 performs comparative machine learning (comparative learning) using only the prior domain of interest. A random forest into which transfer learning is not introduced is used as an algorithm for comparative learning. As a result of the comparison learning, a comparison identifying unit 241 corresponding to the target prior domain is generated. The entity of the comparison and identification unit 241 is a data group constituting a plurality of decision trees. The comparison identification unit 241 is generated for each prior domain 145.

The determination unit 25 determines whether or not the prior domain of interest is effective for transfer learning using the identification results obtained by the trial transfer identification unit 231 and the comparison identification unit 241. The determination unit 25 includes a competitive value calculation unit 251, a reliability calculation unit 252, and a transfer evaluation unit 253.

The competitive value calculation unit 251 compares the identification result of the sample data by the comparison identification unit 241 with the identification result of the sample data by the trial transfer identification unit 231. The sample data includes at least one of learning feature data 152 included in the target domain 150B and transfer candidate feature data 142 included in the target prior domain. The competition value calculation unit 251 calculates the competition value 251A based on the comparison result. The competitive value 251A indicates the degree to which the identification result by the comparison identifying unit 241 and the identification result by the trial transfer identifying unit 231 do not match.

The reliability calculation unit 252 calculates the reliability 252A using the identification result of the sample data by the trial transfer identification unit 231. The reliability 252A indicates the reliability of the identification result obtained by the trial transfer identification unit 231.

The transfer evaluation unit 253 evaluates whether the prior domain of interest is effective for transfer learning based on the competitive value 251A and the reliability 252A. The transfer evaluation unit 253 outputs evaluation result data 253A indicating each evaluation of the prior domain 145 to the selection learning device 30.

{1.4. Configuration of Selective Learning Device 30}
FIG. 4 is a functional block diagram showing the configuration of the selective learning device 30 shown in FIG. As illustrated in FIG. 4, the selection learning device 30 includes a prior domain selection unit 31, a feature extraction unit 32, and a transfer learning unit 33.

The prior domain selection unit 31 inputs the prior domain 145 from the clustering apparatus 10 and inputs the evaluation result data 253A from the prior domain evaluation apparatus 20. Based on the input evaluation result data 253A, the prior domain selection unit 31 selects a prior domain 145 that has been evaluated as effective for transfer learning from the prior domains 145 generated by the clustering apparatus 10.

The feature extraction unit 32 acquires the target domain 150A stored in the storage device 150. Similar to the feature extraction unit 22, the feature extraction unit 32 extracts a feature amount from each of the learning

data

151, 151,... Included in the acquired target domain 150 A and generates the target domain 150 B.

The transfer learning unit 33 uses the target domain 150B and the pre-domain 145 selected by the pre-domain selection unit 31 to execute machine learning that introduces transfer learning. The learning algorithm used by the transfer learning unit 33 is the same as the learning algorithm used by the trial transfer learning unit 23. The transfer learning unit 33 generates transfer identification data 35 as a result of machine learning in which transfer learning is introduced.

{2. Outline of operation}
FIG. 5 is a flowchart showing an outline of the operation of the machine learning device 100. As shown in FIG. 5, in the machine learning device 100, the clustering device 10 executes a pre-domain generation process for generating the pre-domain 145 from the

transfer candidate data

141, 141,... Stored in the storage device 140 ( Step S11).

The number of prior domains 145 generated by the clustering device 10 is not particularly limited. Each of the prior domains 145 has transfer candidate feature data 142 generated by extracting feature amounts from the transfer candidate data 141.

The pre-domain evaluation device 20 executes a pre-domain evaluation process for determining whether each of the pre-domains 145 generated by the clustering device 10 is effective for transfer learning (Step S12). The prior domain evaluation device 20 generates evaluation result data 253A as a result of step S12. The evaluation result data 253A is data specifying the prior domain 145 determined to be effective for transfer learning among the prior domains 145 generated by the clustering apparatus 10.

In the selection learning device 30, the prior domain selection unit 31 selects the prior domain 145 determined to be effective for transfer learning from the prior domains 145 generated by the clustering device 10 based on the evaluation result data 253A ( Step S13).

The feature extraction unit 32 (see FIG. 4) acquires the target domain 150A from the storage device 150. The feature extraction unit 32 extracts feature amounts from each of the learning data 151 included in the acquired target domain 150A, and generates a plurality of learning feature data 152 (step S14). The process executed by the feature extraction unit 32 is the same as the process executed by the feature extraction unit 22 shown in FIG. That is, the feature extraction unit 32 generates a target domain 150 B configured by a plurality of learning feature data 152.

The transfer learning unit 33 performs machine learning using transfer learning using the pre-domain 145 selected by the pre-domain selection unit 31 and the target domain 150B generated by the feature extraction unit 32 (step S15). . The transfer learning unit 33 uses the same learning algorithm (random forest that introduced transfer learning) as the learning algorithm used by the trial transfer learning unit 23. Thereby, transfer identification data 35 which is a data group indicating a plurality of decision trees is generated.

Hereinafter, the reason why the advance domain generation process (step S11) and the advance domain evaluation process (step S12) are executed will be described.

FIG. 6 is a diagram illustrating an example of the distribution of the target domain 150B and the transfer candidate feature data 142. FIG. 6 shows an example in which the number of dimensions of the feature quantity of the transfer candidate feature data 142 and the learning feature data 152 is 2, and the distribution of the transfer candidate feature data 142 and the learning feature data constituting the target domain 150B. 152 distribution.

The target domain 150B includes learning feature data 152 generated by extracting feature amounts from the learning data 151. As described above, the plurality of learning data 151 are images including a person photographed at a depression angle of 0 °, and thus have similar characteristics. Therefore, in the two-dimensional space shown in FIG. 6, the variation in the learning feature data 152 is small, and the target domain 150B is limited to a relatively narrow region.

On the other hand, the distribution of the transfer candidate feature data 142 has a larger variation than the feature data 152 for learning. Since the transfer candidate data 141 is collected by searching for a detection target (person) on the Internet, there are various shooting conditions for the person in the transfer candidate data 141. Transfer candidate feature data 142 is generated by extracting feature values from transfer candidate data 141. Therefore, the transfer candidate feature data 142 spreads over the entire two-dimensional space shown in FIG. 6, and its position is random.

Here, a case where a person is detected from an image will be described as an example of machine learning using transfer learning. In machine learning using transfer learning, a target domain and a prior domain are prepared in advance. The target domain is a group of images having the characteristics of the detection target under a predetermined condition. In the present embodiment, the detection target is a person, and the predetermined condition is that the detection target (person) is included in an image captured at a depression angle of 0 °.

The prior domain is a group of images having the characteristics of the detection target under conditions different from the predetermined conditions described above. The prior domain is generated by classifying collected images according to a predetermined rule. For example, when the shooting conditions of each collected image are known, the collected images can be classified according to the shooting conditions. Thereby, the prior domain becomes a set of images having features that are common to each other or similar to each other.

When the machine learning device executes machine learning in which transfer learning is introduced, learning of a prior domain is performed first, and then learning of a target domain is performed. Then, the machine learning device identifies an image having a feature that is common or similar to the feature of the person photographed at the depression angle of 0 °, and transfers the feature of the identified image to the learning result of the image included in the target domain 150B. Let Thereby, the number of images constituting the target domain can be reduced, and the identification accuracy of the person can be improved.

However, if the image features in a certain prior domain are significantly different from the image features in the target domain, a negative transition occurs. This is because the feature of the image in the prior domain is reflected in the learning result of the image in the target domain by transfer learning. As a result, the accuracy of the transfer identification data generated as a result of machine learning in which transfer learning is introduced decreases.

As shown in FIG. 6, when all the transfer candidate feature data 142 spread over the entire two-dimensional space are set as one prior domain, the transfer candidate feature data 142 separated from the region of the target domain 150B is used for transfer learning. become. In this case, it is very likely that a negative transition occurs. In order to prevent the occurrence of negative metastasis, the pre-domain 145 is generated by combining the metastasis candidate feature data 142 having features that are common or similar to each other, and the pre-domain 145 thus generated introduces transfer learning. What is necessary is just to judge whether it is effective for machine learning. The pre-domain generation process (step S11) is executed to generate a pre-domain 145 that is a set of transfer candidate feature data 142 having features that are common to each other or similar to each other.

FIG. 7 is a diagram showing an example in which the transfer candidate feature data 142 shown in FIG. 6 is classified. The clustering apparatus 10 generates the prior domains 145A to 145G by classifying the transfer candidate feature data 142 shown in FIG.

Among the prior domains 145A to 145G, the

prior domains

145A and 145F do not overlap with the target domain 150B. Therefore, the

prior domains

145A and 145F are not effective for machine learning in which transfer learning is introduced. In addition, the prior domain 145D overlaps with the target domain 150B, but the overlapping range is smaller than other prior domains. Therefore, the pre-domain 145D may cause a negative transfer and is not effective for transfer learning.

As described above, the prior domain generation process (step S11) may generate a negative domain that may cause negative transfer (not effective for transfer learning). In order to improve the accuracy of a transfer discriminator generated as a result of machine learning in which transfer learning is introduced, it is desirable to exclude in advance a prior domain that is not effective for transfer learning. Therefore, the prior domain evaluation process (step S12) is performed in order to identify a prior domain effective for transfer learning among the prior domains 145A to 145G generated by the prior domain generation process (step S11).

{3. Prior domain generation process (step S11)}
FIG. 8 is a flowchart of the advance domain generation process (step S11). Referring to FIG. 8, the operation of the clustering device 10 that generates the prior domain 145 from the

transfer candidate data

141, 141,... Stored in the storage device 140 will be described in detail.

{3.1. Extraction of HOG features}
The clustering device 10 acquires all the transfer candidate data 141 stored in the storage device 140. In the clustering apparatus 10, the feature extraction unit 11 (see FIG. 2) extracts HOG feature amounts from each of all acquired transfer candidate data 141 (step S101). Thereby, a plurality of transfer candidate feature data 142 corresponding to each of all transfer candidate data 141 is generated.

The feature extraction unit 11 sets conditions for extracting the HOG feature amount from the transfer candidate data 141 as follows, for example. The color channel of the transfer candidate data 141 is set to gray scale. The size of the transfer candidate data 141 is set to 60 pixels vertically and 30 pixels horizontally.

The cell, block, and the number of gradient directions are set as parameters when extracting the HOG feature value. A cell is a unit area for calculating a gradient direction of luminance. The block is a unit area for creating a histogram in the gradient direction of luminance. The number of gradient directions is the number of divisions in the range of 0 ° to 180 °.

For example, the size of one cell is set to 5 pixels vertically and 5 pixels horizontally. The size of one block is set to 3 pixels vertically and 3 pixels horizontally. The number of gradient directions is set to 9. When the number of gradient directions is 9, the gradient direction of each cell is divided into 9 directions every 20 ° and set to any one of the 9 directions. In this case, the number of dimensions of the transfer candidate feature data 142 is 3240.

{3.2. Determination of whether or not classification is possible at root node 35R}
FIG. 9 is a diagram illustrating an initial structure of the classification tree 35 generated by the classification unit 12. The classification unit 12 uses a density forest as an algorithm for classifying the transfer candidate feature data 142. When a density forest is used, a plurality of classification trees are normally generated, but the classification unit 12 generates only one classification tree.

The classification tree 35 is formed in the process in which the transfer candidate feature data 142 is classified by the classification unit 12. Among the nodes constituting the classification tree 35, a node satisfying a predetermined condition is determined as a prior domain.

The classification unit 12 creates a root node 35R of the classification tree 35 (step S102). The

nodes

35A and 35B shown in FIG. 9 are not generated when step S102 is executed. The classification unit 12 inputs all the transfer candidate feature data 142 generated by the feature extraction unit 11 to the root node 35R (step S103). The number of transfer candidate feature data 142 input to the root node 35R is 30000.

Next, the prior domain determination unit 14 determines whether or not all nodes have been selected as classification target nodes in the classification tree 35 (step S104). Since the root node 35R is not selected as a classification target (No in step S104), the prior domain determination unit 14 selects the root node 35R as a classification target (step S105).

The prior domain determination unit 14 executes step S106 to determine whether or not the root node 35R satisfies the condition as the prior domain. Specifically, the prior domain determination unit 14 acquires the number of transfer candidate feature data 142 belonging to the root node 35R. The prior domain determination unit 14 determines whether or not the number of acquired transfer candidate feature data 142 is larger than a preset classification continuation reference value (step S106). The classification continuation reference value is set to 9270, for example.

The number (30000) of transfer candidate feature data 142 belonging to the root node 35R is larger than the classification continuation reference value (9270) (Yes in step S106). In this case, since the number of transfer candidate feature data 142 belonging to the root node 35R is too large, the root node 35R cannot be used as the prior domain 145.

As described above, when one prior domain includes all the transfer candidate feature data 142, the accuracy of the transfer identification data 35 generated by machine learning using transfer learning decreases. Since the root node 35R includes the transfer candidate feature data 142 that is larger than the classification continuation reference value, the root node 35R includes a large number of transfer candidate feature data 142 that are far away from the region of the target domain 150B, as in the above-described one prior domain. In this case, the prior domain determination unit 14 determines that one of the conditions for classifying the transfer candidate feature data 142 belonging to the root node 35R is satisfied.

The classification continuation reference value is larger than the number of dimensions of the feature amount extracted by the feature extraction unit 11. For example, in this embodiment, the classification continuation reference value is set to 9720, which is three times the number of dimensions (3240) of the transfer candidate feature data 142.

Next, the clustering apparatus 10 executes steps S107 and S108, and determines whether or not the condition for classifying the transfer candidate feature data 142 belonging to the root node 35R is satisfied based on the covariance of the root node 35R. To do.

The prior domain determination unit 14 instructs the classification unit 12 to calculate the covariance 13A (see FIG. 2) of the node to be classified (root node 35R). The classification unit 12 outputs the transfer candidate feature data 142 belonging to the node to be classified (root node 35R) to the variance calculation unit 13 in accordance with an instruction from the prior domain determination unit 14. The variance calculation unit 13 uses the transfer candidate feature data 142 output from the classification unit 12 to calculate the feature value covariance 13A of the transfer candidate feature data 142 belonging to the node to be classified. The variance calculation unit 13 outputs the calculated covariance 13A to the prior domain determination unit 14.

The prior domain determination unit 14 determines whether or not the covariance 13A (covariance of the root node 35R) calculated by the distribution calculation unit 13 is larger than a preset distribution reference value (step S108). It is assumed that the covariance 13A is larger than the dispersion reference value (Yes in step S108).

As described above, the root node 35R includes all the transfer candidate feature data 142, and the variation of all the transfer candidate feature data 142 is very large. In this case, since the covariance 13A is very large, the prior domain determination unit 14 determines that the transfer candidate feature data 142 belonging to the root node 35R can be further classified. The prior domain determination unit 14 instructs the classification unit 12 to classify the transfer candidate feature data 142 belonging to the root node 35R.

{3.3. Classification of transfer candidate feature data 142}
The classification unit 12 generates

nodes

35A and 35B as child nodes of the root node 35R in order to classify the transfer candidate feature data 142 belonging to the root node 35R in accordance with an instruction from the prior domain determination unit 14 (step S109). ).

The classification unit 12 classifies the transfer candidate feature data 142 belonging to the root node 35R as one of the

nodes

35A and 35B generated in step S109 (step S110). Specifically, the classification destination node of the transfer candidate feature data 142 is determined based on the objective function I shown in the following formula (1).

In Expression (1), S is a parent node (root node 35R). S ^L is the left node (node 35A) of the two child nodes, and S ^R is the right node (node 35B) of the two child nodes. Λ (S) is the covariance of the parent node, Λ (S ^L ) is the covariance of the left child node, and Λ (S ^R ) is the covariance of the right child node.

The classification unit 12 provisionally classifies the transfer candidate feature data 142 belonging to the root node 35R in order to calculate the objective function I shown in Expression (1). Specifically, the classification unit 12 sets a provisional branch condition for the transfer candidate feature data 142 as follows.

The number of dimensions of the transfer candidate feature data 142 is 3240. That is, the transfer candidate feature data 142 has 3240 feature amounts. The classification unit 12 randomly selects a k-th (0 ≦ k ≦ 3239) feature amount from among 3240 feature amounts, and randomly sets a threshold value for the k-th feature amount. Thereby, a provisional branch condition is set.

The classification unit 12 provisionally classifies the transfer candidate feature data 142 belonging to the root node 35R into the

node

35A or 35B based on the set branch condition. The variance calculation unit 13 calculates the covariance of the transfer candidate feature data 142 classified into the node 35A and the covariance of the transfer candidate feature data 142 provisionally classified into the node 35B. The covariance of the root node 35R has already been calculated in step S105. The classification unit 12 calculates the objective function I of the root node 35R using these three covariances.

The classification unit 12 sets a plurality of branch conditions in the root node 35R. The classification unit 12 provisionally classifies the transfer candidate feature data 142 based on each branch condition in order to calculate the objective function I corresponding to each branch condition. Based on the tentatively classified transfer candidate feature data 142, the objective function I in each branch condition is calculated. The classification unit 12 specifies the maximum objective function I among the plurality of calculated objective functions I. The classification unit 12 determines to classify the transfer candidate feature data 142 belonging to the root node 35R under the branch condition corresponding to the maximum objective function I. Thereby, the transfer candidate feature data 142 belonging to the root node 35R is classified into one of the

nodes

35A and 35B.

FIG. 10 is a diagram showing the classification tree 35 after the transfer candidate feature data 142 belonging to the root node 35R is classified. At the time when the classification of the transfer candidate feature data 142 into the

nodes

35A and 35B is completed, the child nodes (

nodes

35C and 35D) of the node 35B are not generated.

As a result of classifying 30,000 transfer candidate feature data 142 belonging to the root node 35R, 7000 transfer candidate feature data 142 are classified into the node 35A. 23,000 pieces of transfer candidate feature data 142 are classified into the node 35B. Thereby, step S110 for classifying the transfer candidate feature data 142 belonging to the root node 35R into two child nodes ends.

{3.4. Determination at node 35A}
After the classification of the transfer candidate feature data 142 belonging to the root node 35R is completed, the prior domain determination unit 14 determines whether all the nodes have been selected as the classification target (step S104). Since there are

unselected nodes

35A and 35B (No in Step S104), the prior domain determination unit 14 selects the next node to be determined in the previous order (Step S105). Specifically, the classification unit 12 selects the node 35A.

As shown in FIG. 10, the number of transfer candidate feature data 142 belonging to the node 35A is 7000. Since the number of transfer candidate feature data 142 belonging to the node 35A is equal to or less than the classification continuation reference value (9270) (No in step S106), the prior domain determination unit 14 determines the node 35A as the prior domain 145 (step S111). ). That is, the prior domain determination unit 14 determines not to further classify the transfer candidate feature data 142 belonging to the node 35A, and sets the node 35A as a leaf node.

{3.5. Determination in node 35B}
Next, the prior domain determination unit 14 selects the node 35B as a determination target (step S105). The number of transfer candidate feature data 142 belonging to the node 35B is 23000, which is larger than the classification continuation reference value (9270) (Yes in step S106). Further, it is assumed that the covariance of the node 35B is larger than the dispersion reference value (Yes in step S108). In this case, the prior domain determination unit 14 determines to further classify the transfer candidate feature data 142 belonging to the node 35B.

The classification unit 12 generates child nodes (

nodes

35C and 35D) of the node 35B in response to the determination by the prior domain determination unit 14 for the node 35B (step S109). The classification unit 12 classifies the transfer candidate feature data 142 belonging to the node 35B into one of the

nodes

35C and 35D, similarly to the classification of the transfer candidate feature data 142 in the root node 35R (step S110).

FIG. 11 is a diagram showing the classification tree 35 after the advance domain generation process (step S11) is completed. As shown in FIG. 11, as a result of classifying the transfer candidate feature data 142 belonging to the node 35B into the

nodes

35C and 35D, 15000 transfer candidate feature data 142 are classified into the node 35C, and 8000 transfer candidate feature data. 142 is classified as node 35D.

The number of transfer candidate feature data 142 belonging to the node 35C is larger than the classification continuation reference value (9270) (Yes in step S106). Further, it is assumed that the covariance of the node 35C is larger than the dispersion reference value (Yes in step S108). In this case, the prior domain determination unit 14 determines to further classify the transfer candidate feature data 142 belonging to the node 35C. The classification of the transfer candidate feature data 142 belonging to the node 35C will be described later.

On the other hand, since the number of transfer candidate feature data 142 belonging to the node 35D is equal to or less than the classification continuation reference value (No in step S106), the prior domain determining unit 14 determines the node 35D as the prior domain.

{3.6. End of Classification of Transfer Candidate Feature Data 142}
The classification unit 12 generates

nodes

35E and 35F as child nodes of the node 35C (step S109), and classifies the transfer candidate feature data 142 belonging to the node 35C into

nodes

35E and 35F (step S110).

The number of transfer candidate feature data 142 belonging to the node 35E is 500, which is below the classification continuation reference value (No in step S106). For this reason, the prior domain determination unit 14 determines the node 35E as the prior domain (step S111).

The number of transfer candidate feature data 142 belonging to the node 35F is 14500, which is larger than the classification continuation reference value (Yes in step S106). On the other hand, it is assumed that the covariance of the node 35F is smaller than the dispersion reference value (No in step S108). In this case, the prior domain determination unit 14 determines that the variation in the feature amount distribution of the transfer candidate feature data 142 belonging to the node 35F is very small. For example, the transfer candidate feature data 142 belonging to the node 35F may be generated from the same image. In this case, the prior domain determination unit 14 determines that the transfer candidate feature data 142 included in the node 35F cannot be further classified, and determines the node 35F as a prior domain (step S111). Thereby, since all the nodes constituting the classification tree 35 are selected as the determination targets (Yes in Step S104), the clustering apparatus 10 proceeds to Step S112.

{3.7. Pre-domain exclusion}
The prior domain determination unit 14 checks the number of transfer candidate feature data 142 included in each node determined as the prior domain. If there is a node having the number of transfer candidate feature data 142 equal to or less than a preset discard reference value, the prior domain determination unit 14 excludes this node from the prior domain (step S112). For example, the discard reference value is set to the number of dimensions (3240) of the transfer candidate feature data 142. Specifically, the node 35E determined as the prior domain is excluded from the prior domain because the number of transfer candidate feature data 142 is 500.

As described above, when the number of data used for learning is less than the number of dimensions, the accuracy of the generated transfer identification data 35 may be reduced.

The classification continuation reference value is larger than the number of dimensions of the feature amount extracted by the feature extraction unit 11. In machine learning, when the number of data used for learning is less than the number of dimensions of data used for learning, the learning result of the characteristics of the data used for learning is overestimated, and the accuracy of the transfer identification data 35 decreases. . For this reason, in this embodiment, the discard reference value is set to 3240, which is the number of dimensions of the transfer candidate feature data 142. Thereby, it is possible to prevent the number of transfer candidate feature data 142 belonging to the prior domain 145 from becoming smaller than the number of dimensions of the transfer candidate feature data 142.

In addition, when the number of transfer candidate feature data 142 included in a certain prior domain is smaller than the discard reference value, the transfer candidate feature data 142 included in the previous domain may not have the detection target feature. high.

For example, when collecting images of a person on the Internet, an image in which an object other than a person is taken may be erroneously acquired as transfer candidate data 141. Transfer candidate feature data 142 generated from transfer candidate data 141 collected in error has different features from transfer candidate feature data 142 having human characteristics, and is not effective for transfer learning. Further, since the search condition is an image obtained by photographing a person, it is assumed that a ratio of an image obtained by photographing an object other than a person in the set of transfer candidate data 141 is very small.

Therefore, when the number of transfer candidate feature data 142 belonging to a certain node is smaller than the discard reference value, it is considered that this node is constituted by transfer candidate feature data 142 generated from transfer candidate data 141 collected in error. It is done. The prior domain determination unit 14 excludes a node having the number of transfer candidate feature data 142 equal to or less than the discard reference value from the prior domain.

As a result, the

nodes

35A, 35D, and 35F are determined as the pre-domain 145 in the classification tree 35 shown in FIG. The clustering apparatus 10 outputs the determined three prior domains 145 to the prior domain evaluation apparatus 20 and the selection learning apparatus 30.

As described above, the clustering apparatus 10 extracts features from each of the transfer candidate data 141 to generate a plurality of transfer candidate feature data 142, and in the process of creating the classification tree 35, the plurality of transfer candidate feature data 142. Are classified into nodes of the classification tree 35. When the number of transfer candidate feature data 142 belonging to the node is equal to or smaller than the classification continuation reference value, or the covariance of the transfer candidate feature data 142 belonging to the node is equal to or less than the dispersion reference value, the clustering apparatus 10 To decide. As a result, it is possible to generate a prior domain including transfer candidate feature data 142 having features that are similar or common to each other.

{4. Prior domain evaluation process (step S12)}
FIG. 12 is a flowchart of the prior domain evaluation process (step S12) shown in FIG. When the prior domain evaluation device 20 starts the process shown in step S12, the trial transfer identification unit 231 is not generated in the trial transfer learning unit 23, and the comparison identification unit 241 is generated in the comparison learning unit 24. Not.

{4.1. Generation of target domain 150B}
The prior domain evaluation device 20 acquires the prior domain 145 generated by the clustering device 10. Specifically, the prior domain evaluation apparatus 20 acquires three prior domains 145 (

nodes

35A, 35D, and 35F illustrated in FIG. 11) generated in the process of creating the classification tree 35 illustrated in FIG. The prior domain evaluation device 20 stores the acquired prior domain 145 in the temporary storage unit 21 (step S201).

Hereinafter, the

nodes

35A, 35D, and 35F are referred to as “pre-domain 35A”, “pre-domain 35D”, and “pre-domain 35F”, respectively.

The feature extraction unit 22 (see FIG. 3) acquires the target domain 150A stored in the storage device 150. The feature extraction unit 22 generates a plurality of pieces of learning feature data 152 corresponding to each of the learning data 151 by extracting feature amounts from each of the learning data 151 included in the acquired target domain 150A (step S1). S202). As a result, a target domain 150B composed of a plurality of learning feature data 152 is generated. The feature extraction unit 22 outputs the generated target domain 150B to the trial transfer learning unit 23.

The feature extraction unit 22 extracts feature amounts under the same conditions as when the feature extraction unit 11 (see FIG. 2) generates the transfer candidate feature data 142 from the transfer candidate data 141. Therefore, the number of dimensions of the learning feature data 152 is 3240, which is the same as the number of dimensions of the transfer candidate feature data 142. The reason for this will be described later.

The pre-domain evaluation device 20 selects one pre-domain to be evaluated from the pre-domain 145 stored in the temporary storage unit 21 as to whether it is effective for transfer learning (Step S203). Specifically, the advance domain 35A is first selected from the

advance domains

35A, 35D, and 35F stored in the temporary storage unit 21.

{4.2. Comparative learning and trial transfer learning}
The comparative learning unit 24 inputs the prior domain 35A selected in step S203. The comparative learning unit 24 learns the input prior domain 35A (step S204). The learning algorithm of the comparative learning unit 24 is a random forest in which transfer learning is not introduced. The comparison learning unit 24 generates a comparison identification unit 241 that reflects the learning result of the prior domain 35A by executing step S204. The comparison identification unit 241 is a data group indicating the structure of a plurality of decision trees.

The trial transfer learning unit 23 acquires the target domain 150B from the feature extraction unit 22, and acquires the prior domain 35A from the temporary storage unit 21. The trial transfer learning unit 23 performs machine learning using transfer learning using the input target domain 150B and the prior domain 35A (step S205). The learning algorithm of the trial transfer learning unit 23 is a random forest in which transfer learning is introduced. The trial transfer learning unit 23 generates a trial transfer identification unit 231 reflecting the learning results of the target domain 150A and the prior domain 35A by executing step S205. The trial transfer identification unit 231 is a data group indicating the configuration of a plurality of decision trees. Since the learning algorithm and domain used in the trial transfer learning unit 23 are different from those of the comparison learning unit 24, the structure of the trial transfer identification unit 231 is different from the structure of the comparison identification unit 241.

{4.3. Prior domain evaluation (step S206)}
The determination unit 25 uses the trial transfer identification unit 231 generated by the trial transfer learning unit 23 and the comparison identification unit 241 generated by the comparison learning unit 24 so that the prior domain 35A to be evaluated is effective for transfer learning. Whether or not (step S206).

The determination unit 25 calculates two types of parameters, a competitive value 251A and a reliability 252A, in order to determine the effectiveness of transfer learning. When calculating the reliability 252A, the determination unit 25 uses the identification result by the trial transfer identification unit 231 of the data included in the sample group. Here, the sample group is a set of the learning feature data 152 included in the target domain 150B and the transfer candidate feature data 142 included in the prior domain 35A to be evaluated. Hereinafter, data included in the sample group is referred to as “sample data”. When calculating the competitive value 251A, the determination unit 25 uses the identification result by the comparison identification unit 241 in addition to the identification result by the trial transfer identification unit 231.

{4.3.1. Calculation of competitive value 251A}
The competition value calculation unit 251 calculates the competition value 251A based on the comparison result between the label of each image generated by the trial transfer identification unit 231 and the label of each image generated by the comparison identification unit 241.

The trial transfer identification unit 231 inputs any one of the sample data included in the sample group. The trial transfer identification unit 231 performs a person identification process on the sample data, and generates a label 23A indicating the identification result. The value of the label 23A is 0 or 1, for example. When the label 23A is 0, the label 23A indicates that the sample data does not include a human feature. When the label 23A is 1, the label 23A indicates that the sample data includes a human feature. The trial transfer identification unit 231 outputs the generated label 23A to the conflict value calculation unit 251.

Note that the trial transfer identification unit 231 calculates not only the label 23A but also the accuracy 23B indicating the probability of the label 23A as the identification result of the sample data. The accuracy 23B is used for calculation of the reliability 252A described later.

The comparison identification unit 241 inputs the same data as the sample data input to the trial transfer identification unit 231. The comparison and identification unit 241 performs a person identification process on the sample data, and generates a label 24A indicating the identification result. The value of the label 24A is 0 or 1 like the label 23A. When the label 24A is 0, the label 24A indicates that the sample data does not include a human feature. When the label 24A is 1, the label 24A indicates that the sample data includes a human feature. The comparison identification unit 241 outputs the generated label 24A to the conflict value calculation unit 251.

The competitive value calculation unit 251 calculates the competitive value 251A using the

labels

23A and 24A generated from the sample data. The competition value 251A is calculated by the following equation (2).

In the formula (2), E _c1 indicates the competition value 251A. X indicates a sample group. x represents an element (sample data) constituting the sample group. M (x) indicates a label 24A generated from the element x. T (x) indicates a label 23A generated from the element x. [M (x) ≠ T (x)] indicates the number of sample data in which the label 24A and the label 23A do not match. | X | is the number of elements constituting the sample group X.

The competition value 251A calculated by the equation (2) indicates the probability that the label 23A and the label 24A generated from the same sample data do not match. The competitive value 251A is a numerical value of 0 or more and 1 or less. The closer the competition value 251A is to 0, the higher the competition value 251A is, the higher the effectiveness of the prior domain 35A in transfer learning. On the other hand, the closer the competitive value 251A approaches 1. It shows that the effectiveness of the prior domain 35A in transfer learning is low.

When there are many differences between the learning feature data 52 included in the target domain 150B and the transfer candidate feature data 142 included in the evaluation target previous domain 35A, the previous domain 35A is not effective for transfer learning. In this case, the conflict value 251A approaches 1. The reason will be described below.

As described above, the comparative learning unit 24 learns only the prior domain 35A. For this reason, only the learning result of the prior domain 35 A is reflected in the comparison and identification unit 241.

On the other hand, the trial transfer identification unit 231 executes machine learning in which transfer learning is introduced using the target domain 150A and the prior domain 35A. However, when there are many differences between the learning feature data 152 included in the target domain 150B and the transfer candidate feature data 142 included in the evaluation target prior domain 35A, the transfer candidate feature data 142 included in the previous domain 35A is learned. The result is not reflected in the learning result of the learning feature data 152. That is, it can be considered that the trial transfer identification unit 231 and the comparison identification unit 241 are generated by learning different domains. In this case, the case where the identification results of the trial transfer identification unit 231 and the comparison identification unit 241 do not match increases, and the competition value 251A increases. Therefore, it is possible to determine whether or not the prior domain 35A is effective for transfer learning based on the competitive value 251A.

{4.3.2. Reliability calculation}
The reliability calculation unit 252 calculates the reliability 252A based on the label 23A and the accuracy 23B of each image generated by the trial transfer identification unit 231. In the calculation of the reliability 252A, the identification result of the sample data by the comparison / identification unit 241 is not used.

As described above, the trial transfer identification unit 231 generates the label 23A indicating the identification result of the person with respect to the sample data and the accuracy 23B indicating the likelihood of the label 23A. The accuracy 23B is a value of 0 or more and 1 or less, and the closer the accuracy 23B is to 1, the smaller the possibility that the label 23A is erroneous.

The reliability calculation unit 252 inputs the label 23A and the accuracy 23B of each sample data from the trial transfer identification unit 231. The reliability calculation unit 252 calculates the reliability 252A by calculating the following equation (3) using the label 23A and the accuracy 23B of each input sample data.

In the above formula (3), E _c2 indicates the reliability 252A. x represents an element (sample data) constituting the sample group X, similarly to the above formula (2). | X | is the number of elements of the sample group X. P _T (x) indicates the accuracy 23B of the element x. P _T (x) is the average of the probabilities of the classes set in the leaf nodes that the sample data has reached in each decision tree when the sample data is input to each decision tree constituting the trial transfer identification unit 231. . T (x) indicates the label 23A of the element x. y is a label (y = 1) indicating the presence of a person. That is, the reliability 252A is a value obtained by dividing the total value of the accuracy 23B calculated when the label 23A matches the label y by the number of elements of the sample group X. The reliability 252A is a value of 0 or more and 1 or less, and the closer to 1, the higher the effectiveness of the prior domain 35A in transfer learning.

When the transfer candidate feature data 142 of the prior domain 35A has a feature amount similar to the feature amount of the learning feature data 152, the trial transfer learning unit 23 learns the transfer candidate feature data 142 by trial transfer learning. The result is transferred to the learning result of the learning feature data 152. The trial transfer identification unit 231 reflects the learning results of the learning feature data 152 and the transfer candidate feature data 142 of the prior domain 35A. When the trial transfer identification unit 231 performs identification processing on each data of the sample group used for the trial transfer learning, the label 23A is 1 and the accuracy 23B is considered to approach 1. Therefore, when the learning feature data 152 is similar to the transfer candidate feature data 142 of the previous domain 35A (when the previous domain 35A is effective in transfer learning), the reliability 252A approaches 1.

{4.3.3. Evaluation of prior domain by transfer evaluation unit 253}
The transfer evaluation unit 253 inputs the competitive value 251A and the reliability 252A. The transfer evaluation unit 253 evaluates the effectiveness of the prior domain 35A in transfer learning based on the input competitive value 251A and reliability 252A.

The transfer evaluation unit 253 calculates a comprehensive evaluation value using the following equation (4).

In Equation (4), E is a comprehensive evaluation value obtained from the competitive value 251A and the reliability 252A. As the effectiveness of pre-domain 35A in transfer learning decreases, the competitive value 251A increases. On the other hand, the reliability 252A decreases conversely. In order to match the tendency of the reliability 252A with the tendency of the competitive value 251A, a value obtained by subtracting the reliability 252A from 1 is used to calculate the comprehensive evaluation value.

The comprehensive evaluation value calculated by the above equation (4) is a value between 0 and 1 and approaches 0 as the effectiveness of transfer learning increases. If the calculated overall evaluation value is smaller than a preset threshold value, the transfer evaluation unit 253 determines that the prior domain 35A is effective in transfer learning.

{4.4. Specify next advance domain}
After the evaluation of effectiveness in transfer learning of the prior domain 35A (step S206) is completed, the trial transfer identification unit 231 and the comparison identification unit 241 used for evaluating the effectiveness of the prior domain 35A are deleted (step S207). . This is because the trial transfer identifying unit 231 and the comparison identifying unit 241 corresponding to the prior domain 35A are not used in the evaluation of the effectiveness of other prior domains in transfer learning.

The prior domain evaluation device 20 determines whether all the prior domains stored in the temporary storage unit 21 have been selected (step S208). When all the pre-domains have not been selected (No in step S208), the pre-domain evaluation apparatus 20 returns to step S203 in order to evaluate the effectiveness in transfer learning of the non-selected pre-domains. Thereby, the effectiveness of the

prior domains

35D and 35F in transfer learning is evaluated.

{4.5. Generation of Evaluation Result Data 253A}
When all the prior domains have been selected (Yes in step S208), the transfer evaluation unit 253 creates evaluation result data 253A indicating the evaluation results of the

prior domains

35A, 35D, and 35F. The number of prior domains determined to be effective for transfer learning is not particularly limited. The transfer evaluation unit 253 outputs the created evaluation result data 253A to the selection learning device 30.

Refer again to FIG. In the selection learning device 30, the prior domain selection unit 31 selects the prior domains 35 A, 35 D, and 35 F determined to be effective for transfer learning from the prior domains 145 generated by the clustering device 10 based on the evaluation result data 253 A. Is selected (step S13). The feature extraction unit 32 (see FIG. 4) acquires the target domain 150A from the storage device 150, and extracts a feature amount from each of the learning data 151 included in the acquired target domain 150A (step S14). Thereby, the target domain 150B including the learning feature data 152 is generated. The feature extraction unit 32 extracts feature amounts under the same conditions as when the feature extraction unit 22 (see FIG. 2) extracts feature amounts from the learning data 151.

The transfer learning unit 33 executes machine learning using transfer learning using the selected

prior domains

35A, 35D, and 35F and the target domain 150B generated by the feature extraction unit 32 (step S5). Thereby, transfer identification data 35 which is a data group indicating a plurality of decision trees is generated.

As described above, the machine learning device 100 extracts the features from the

transfer candidate data

141, 141,... Stored in the storage device 140, and generates the transfer candidate feature data 142, 142,. The machine learning device 100 classifies the transfer candidate feature data 142, 142,... Into a plurality of groups based on the extracted feature values. The machine learning device 100 determines whether to determine the classified group as a prior domain based on the number or covariance of the transfer candidate feature data 142 in the classified group. Thereby, the prior domain used for transfer learning can be efficiently generated from transfer candidate data 141.

{Modifications}
In the first embodiment, the case where the clustering device 10 generates the binary tree as the classification tree 35 using the density forest when classifying the transfer candidate feature data 142 has been described as an example. Not limited. The clustering apparatus 10 may classify the transfer candidate feature data 142 using another classification algorithm such as a k-means method. In this case, the number of child nodes created in step S109 (see FIG. 8) may be three or more.

The clustering apparatus 10 may classify the transfer candidate feature data 142 using two or more classification algorithms. For example, the clustering apparatus 10 determines the classification algorithm based on whether or not the number of transfer candidate feature data 142 belonging to the classification target node is larger than a reference value (algorithm change reference value) for determining the change of the classification algorithm. To decide.

FIG. 13 is a diagram showing an example of the classification tree 35 generated using the k-means method and the density forest. For example, assume that the algorithm change reference value is set to 25000.

The number of transfer candidate feature data 142 belonging to the root node 35R is 30000, which is larger than the algorithm change reference value. In this case, the clustering apparatus 10 generates

nodes

36A, 36B, and 36C as child nodes of the root node 35R. Then, the clustering device 10 classifies the transfer candidate feature data 142 belonging to the root node 35R into the

nodes

36A, 36B, and 36C using the k-means method.

The numbers of transfer candidate feature data 142 belonging to the

nodes

36A and 36C are 5000 and 8000, which are equal to or less than the classification continuation reference value (9270). The clustering apparatus 10 determines the

nodes

36A and 36C as the prior domains. On the other hand, the number of transfer candidate feature data 142 belonging to the node 36B is 17000, which is larger than the classification continuation reference value. In this case, the clustering apparatus 10 further classifies the transfer candidate feature data 142 belonging to the node 36B.

Since the number (17000) of the transfer candidate feature data 142 belonging to the node 36B is equal to or less than the algorithm change reference value (25000), the clustering apparatus 10 uses the density forest to classify the transfer candidate feature data 142 belonging to the node 36B. To decide. The clustering device 10 generates

nodes

36D and 36E as child nodes of the node 36B, and classifies the transfer candidate feature data 142 belonging to the node 36B.

Thus, the classification of the transfer candidate feature data 142 can be performed at high speed by switching the classification algorithm according to the number of transfer candidate feature data 142 belonging to the node to be classified.

In the first embodiment, the example in which the selection learning device 30 (see FIG. 4) includes the feature extraction unit 32 has been described. However, the present invention is not limited to this. The selection learning device 30 may generate the transfer identification data 35 by using the target domain 150B generated by the feature extraction unit 22 included in the prior domain evaluation device 20 (see FIG. 3). Further, the prior domain evaluation apparatus 20 may generate the transfer candidate feature data 142 by extracting the feature amount from the transfer candidate data 141 corresponding to each of the prior domains 145. Alternatively, the selection learning device 30 may generate the transfer candidate feature data 142 by extracting the feature amount from the transfer candidate data 141 corresponding to the prior domain determined to be effective for transfer learning.

In any case, the transfer candidate feature data 142 used in each of the clustering device 10, the prior domain evaluation device 20, and the selection learning device 30 is generated by extracting feature amounts from the transfer candidate data 141 under the same conditions. It is desirable. Similarly, the learning feature data 152 is preferably generated by extracting feature amounts from the learning data 151 under the same conditions. The reason will be described below.

For example, when the feature quantity extraction conditions are different between the clustering device 10 and the prior domain evaluation device 20, the transfer candidate feature data 142 generated by the clustering device 10 is the distribution in the transfer candidate feature data 142 in the prior domain evaluation device 20. Have a different distribution. The positional relationship between the target domain and the prior domain differs between the transfer candidate feature data 142 generated by the clustering apparatus 10 and the transfer candidate feature data 142 of the prior domain evaluation apparatus 20. As a result, in the prior domain evaluation device 20, the accuracy of determining whether the prior domain generated by the clustering device 10 is valid for transfer learning is reduced.

Similarly, when the feature amount extraction conditions are different between the prior domain evaluation device 20 and the selection learning device 30, the distribution of the transfer candidate feature data 142 in the prior domain 145 determined to be valid by the prior domain evaluation device 20 changes. . As a result, the learning accuracy of machine learning using transfer learning in the selection learning device 30 may be reduced, and the person identification accuracy using the transfer identification data 35 may be reduced.

On the other hand, by aligning the feature quantity extraction conditions in the clustering device 10, the prior domain evaluation device 20, and the selective learning device 30, the accuracy when evaluating the effectiveness of the prior domain, transfer identification data 35 is generated. It is possible to prevent the accuracy of learning from being reduced.

In the first embodiment, the case where the trial transfer learning unit 23, the comparison learning unit 24, and the transfer learning unit 33 use a random forest as a learning algorithm has been described as an example, but the present invention is not limited to this. For example, the trial transfer learning unit 23, the comparison learning unit 24, and the transfer learning unit 33 may use various algorithms such as ID3 (Iterative Dichotomiser 3), boosting, and neural network. Regardless of which learning algorithm is used, the trial transfer learning unit 23 and the transfer learning unit 33 execute machine learning that introduces transfer learning, and the comparative learning unit 24 executes machine learning that does not introduce transfer learning. do it.

In the first embodiment, the transfer evaluation unit 253 has described the example in which the comprehensive evaluation value is calculated by multiplying the competitive value 251A and the reliability 252A. However, the present invention is not limited to this. For example, the transfer evaluation unit 253 may calculate the total of the competitive value 251A and the reliability 252A as a comprehensive evaluation value. That is, the transfer evaluation unit 253 may calculate a comprehensive evaluation value using the competitive value 251A and the reliability 252A.

In the first embodiment, the case where the machine learning device 100 extracts the HOG feature amount from each of the transfer candidate data 141 and the learning data 151 has been described as an example, but the present invention is not limited to this. For example, the machine learning device 100 may extract a Haar-like feature value when learning a human face. The machine learning device 100 may appropriately change the feature amount extracted from the transfer candidate data 141 and the learning data 151 according to the learning target.

In the first embodiment, the example in which the machine learning device 100 generates the transfer identification data 35 for detecting a person has been described. However, the present invention is not limited to this. The learning target may be measurement data measured by a sensor. The type of sensor is not particularly limited, and various measurement data such as an acceleration sensor and an optical sensor can be used. For example, machine learning may be performed in order to use measurement data of these sensors in order to automatically drive a car.

[Second Embodiment]
{1. Configuration of Machine Learning Device 500}
FIG. 14 is a functional block diagram showing the configuration of the machine learning device 500 according to the second embodiment of the present invention. A machine learning device 500 illustrated in FIG. 14 performs machine learning in which transfer learning is introduced, and generates transfer identification data 80. The machine learning device 500 uses the target domain 61 and a prior domain determined to be effective for transfer learning among the prior domains 62 to 64 when executing machine learning with transfer learning introduced. The transfer identification data 80 is used by a person detection device (not shown) to detect a person from a captured image generated by a camera.

In the present embodiment, an example will be described in which the machine learning device 500 generates transfer identification data 80 for detecting a person from an image photographed at a depression angle of 0 °.

The machine learning device 500 executes machine learning (trial learning) for evaluating whether or not each of the prior domains 62 to 64 is effective for transfer learning before the transfer identification data 80 is generated. Trial learning is machine learning in which transfer learning is introduced, and is different in part from machine learning for generating transfer identification data 80. In trial learning, a prior domain used for machine learning in which transfer learning is introduced is selected one by one from the prior domains 62 to 64.

The machine learning device 500 evaluates the effectiveness of transfer learning for each of the prior domains 62 to 64 based on the result of trial learning. The machine learning device 500 generates the transfer identification data 80 by executing machine learning using transfer learning using the target domain 61 and the prior domain determined to be effective for transfer learning.

The target domain 61 is a group of a plurality of images having the characteristics of a detection target (person) under a predetermined condition. The prior domains 62 to 64 are a group of a plurality of images having the characteristics of the detection target under a condition different from the predetermined condition. The prior domains 62 to 64 are generated by classifying a plurality of images according to a predetermined rule. Details of the target domain 61 and the prior domains 62 to 64 will be described later.

As illustrated in FIG. 14, the machine learning device 500 includes an acquisition unit 51, a trial transfer learning unit 52, a comparison learning unit 53, a determination unit 54, and a selective transfer learning unit 55.

In addition, you may use each component of the machine learning apparatus 500 for the machine learning apparatus 100 which concerns on 1st Embodiment. In this case, the trial transfer learning unit 52 corresponds to the trial transfer learning unit 23 (see FIG. 3) in the first embodiment. The comparative learning unit 53 corresponds to the comparative learning unit 24 (see FIG. 3) in the first embodiment. The determination unit 54 corresponds to the determination unit 25 (see FIG. 3) in the first embodiment. The selective transfer learning unit 55 corresponds to the selective learning device 30 (see FIG. 1).

The acquisition unit 51 acquires the target domain 61 and the prior domains 62 to 64 stored in the storage device 60. The acquisition unit 51 does not acquire the prior domains 62 to 64 at once, but selects one of the prior domains 62 to 64 as one machine domain subject to machine learning in the trial transfer learning unit 52 and the comparative learning unit 53. get.

The trial transfer learning unit 52 inputs the target domain 61 acquired by the acquisition unit 51 and one prior domain (attention prior domain) acquired by the acquisition unit 51. The trial transfer learning unit 52 performs machine learning (trial learning) for evaluating the effectiveness of transfer learning using the input target domain 61 and the prior domain of interest, and as a result, the trial transfer identification unit 521. Is generated. The trial transfer identification unit 521 is generated for each prior domain. The trial transfer learning unit 52 uses a random forest in which transfer learning is introduced as a learning algorithm. Specifically, the algorithm used by the trial transfer learning unit 52 is called transfer forest, and weights data included in the prior domain using covariates during transfer learning. Therefore, the entity of the trial transfer identification unit 521 is a data group including a plurality of decision trees.

The comparison learning unit 53 performs machine learning (comparison learning) for comparison using only the target prior domain, and as a result, generates a comparison identification unit 531. The comparison identification unit 531 is generated for each prior domain. The comparative learning unit 53 uses a random forest that does not introduce transfer learning as a learning algorithm. Accordingly, the entity of the comparison and identification unit 531 is a data group including a plurality of decision trees different from the plurality of decision trees that constitute the trial transfer identification unit 521.

The determination unit 54 uses the trial transfer identification unit 521 and the comparison identification unit 531 to determine whether the prior domain of interest is effective for transfer learning. The determination unit 54 includes a competitive value calculation unit 541, a reliability calculation unit 542, a distribution dissimilarity calculation unit 543, a complexity calculation unit 544, and a transfer evaluation unit 545.

The competitive value calculation unit 541 compares the identification result of the sample data by the comparison identification unit 531 with the identification result of the sample data by the trial transfer identification unit 521. The sample data is an image included in the target domain 61 and an image included in the target prior domain. The competition value calculation unit 541 calculates the competition value 541A based on the comparison result. The competitive value 541A indicates the degree to which the identification result by the comparison identifying unit 531 and the identification result by the trial transfer identifying unit 521 do not match.

The reliability calculation unit 542 calculates the reliability 542A using the identification result of the sample data generated by the trial transfer identification unit 521. The reliability 542A indicates the reliability of the identification result obtained by the trial transfer identification unit 521.

The distribution dissimilarity calculation unit 543 calculates the distribution dissimilarity based on the classification result of the image included in the target domain 61 by the trial transfer identification unit 521 and the classification result of the image included in the target prior domain by the trial transfer identification unit 521. 543A is calculated. The classification of images is performed by a decision tree constituting the trial transfer identification unit 521. The distribution dissimilarity 543A indicates how much the classification result of the image included in the target prior domain differs from the classification result of the image included in the target domain 61.

The complexity calculator 544 calculates the complexity 544A based on the structure of the decision tree constituting the trial transfer identification unit 521. The complexity 544A indicates the complexity of the decision tree constituting the trial transfer identification unit 521.

The transfer evaluation unit 545 evaluates whether the prior domain of interest is effective for transfer learning based on the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A. The transfer evaluation unit 545 notifies the selective transfer learning unit 55 of the evaluation result of the attention prior domain.

The selective transfer learning unit 55 specifies a prior domain to be used for transfer learning based on each evaluation result of the prior domains 62 to 64 notified from the transfer evaluation unit 545. The selective transfer learning unit 55 acquires the target domain 61 and the prior domain used for transfer learning via the acquisition unit 51. The selected transfer learning unit 55 performs machine learning using transfer learning using the acquired target domain 61 and the prior domain, and generates transfer identification data 80. The selective transfer learning unit 55 uses a learning algorithm (random forest into which transfer learning is introduced) used by the trial transfer learning unit 52.

{2. Target domain and advance domain}
Hereinafter, the target domain 61 and the prior domains 62 to 64 will be described. The reason for determining whether or not the prior domains 62 to 64 are effective for transfer learning before the machine learning device 500 generates the transfer identification data 80 will be described.

FIG. 15 is a diagram showing an example of images belonging to the target domain 61 or the prior domains 62 to 64 stored in the storage device 60 shown in FIG.

As described above, it is assumed that the person detection device (not shown) using the transfer identification data 80 detects a person from an image taken at a depression angle of 0 °. In this case, as shown in FIG. 15, the target domain 61 includes images 61A to 61C obtained by photographing a person with a depression angle of 0 °. Actually, the target domain 61 includes not only the images 61A to 61C but also a plurality of other images obtained by photographing a person at a depression angle of 0 °.

That is, the target domain 61 includes a plurality of learning data having the characteristics of the detection target under a predetermined condition. In the present embodiment, the detection target is a person. The predetermined condition is that the detection target (person) is included in an image captured at a depression angle of 0 °. The target domain 61 is used to generate the transfer identification data 80 regardless of the determination result for each of the prior domains 62 to 64.

The pre-domains 62 to 64 each include a plurality of images obtained by photographing a person at a depression angle greater than 0 °. As shown in FIG. 15, the pre-domain 62 includes images 62A to 62C obtained by photographing a person at a depression angle of 20 °. The prior domain 63 includes images 63A to 63C obtained by photographing a person at a depression angle of 30 °. The prior domain 64 includes images 64A to 64C obtained by photographing a person at a depression angle of 50 °. Actually, each of the prior domains 62 to 64 includes not only the image shown in FIG. 15 but also other images taken at the respective depression angles, but the display of other images is omitted in FIG.

The pre-domains 62 to 64 are generated by classifying a plurality of images obtained by photographing a person at a depression angle greater than 0 ° according to the depression angle at the time of shooting. That is, the prior domains 62 to 64 are a set of data having the characteristics of the detection target under conditions different from the predetermined conditions.

Evaluation of the effectiveness of transfer learning for the prior domains 62 to 64 is performed for the following reasons. The images included in the prior domains 62 to 64 may have the same characteristics as the characteristics of the images 61A to 61C included in the target domain 61. Transfer learning specifies an image having the same characteristics as the image included in the target domain 61 among the images included in the prior domain, and applies the characteristics of the specified image to learning of the image included in the target domain 61. .

However, if a certain pre-domain is a set of images having features that are significantly different from the features of the images included in the target domain 61, a negative transition occurs. This is because the characteristics of the image included in the prior domain are reflected in the transfer identification data 80 by transfer learning. The machine learning device 500 evaluates whether or not the prior domains 62 to 64 are effective for the transfer learning in order to exclude the prior domains that are likely to cause the negative transfer from the generation of the transfer identification data 80.

{3. Operation of Machine Learning Device 500}
FIG. 16 is a flowchart showing the operation of the machine learning device 500. When the machine learning device 500 starts the process shown in FIG. 16, the trial transfer identification unit 521 is not generated in the trial transfer learning unit 52, and the comparison identification unit 531 is not generated in the comparison learning unit 53. .

{3.1. Get domain}
First, in the machine learning device 500, the acquisition unit 51 acquires the target domain 61 from the storage device 60 (step S21). The acquisition unit 51 acquires a prior domain in which the effectiveness of transfer learning has not been evaluated among the prior domains 62 to 64 stored in the storage device 60 (step S22). Specifically, the acquisition unit 51 first acquires the prior domain 62 among the prior domains 62 to 64.

{3.2. Comparative learning and trial learning}
The comparative learning unit 53 inputs the prior domain 62 acquired by the acquisition unit 51. The comparative learning unit 53 learns the input prior domain 62 (step S23). The learning algorithm of the comparative learning unit 53 is a random forest in which transfer learning is not introduced. The comparison learning unit 53 generates a comparison identification unit 531 reflecting the learning result of the prior domain 62 by executing step S23. The comparison and identification unit 531 includes a plurality of decision trees.

The trial transfer learning unit 52 inputs the target domain 61 and the prior domain 62 acquired by the acquisition unit 51. The trial transfer learning unit 52 performs machine learning using transfer learning by using the input target domain 61 and the prior domain 62 (step S24). The learning algorithm of the trial transfer learning unit 52 is a random forest in which transfer learning is introduced. The trial transfer learning unit 52 generates a trial transfer identification unit 521 reflecting the learning results of the target domain 61 and the prior domain 62 by executing step S24. The trial transfer identification unit 521 includes a plurality of decision trees. Since the learning algorithm and domain used in the trial transfer learning unit 52 are different from those of the comparative learning unit 53, the configuration of the trial transfer identification unit 521 is different from the configuration of the comparison identification unit 531.

Note that, in steps S23 and S24, the example in which the images 61A to 61C included in the target domain 61 and the images 62A to 62C included in the prior domain 62 are learned as they are has been described. However, in practice, a feature extraction image obtained by extracting a predetermined feature amount from these images is used for learning. The extracted feature amount is, for example, a HOG (Histograms of Oriented Gradients) feature amount in which the direction of an edge in a unit region in the image is histogrammed, or a Haar-like feature amount indicating a light / dark difference in a plurality of regions in the image Etc. can be used.

{3.3. Evaluation of transfer learning (step S25)}
The determination unit 54 uses the trial transfer identification unit 521 generated by the trial transfer learning unit 52 and the comparison identification unit 531 generated by the comparison learning unit 53 to determine whether the prior domain 62 is effective for transfer learning. Is determined (step S25).

判断 Determining unit 54 calculates four types of parameters of competitive value 541A, reliability 542A, distribution dissimilarity 543A, and complexity 544A in order to determine the effectiveness of transfer learning.

When the determination unit 54 calculates the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A, the determination unit 54 uses the identification result by the trial transfer identification unit 521 of each image included in the sample group. Here, the sample group is an image included in a set in which the target domain 61 and the prior domain 62 that is an evaluation target of transfer learning effectiveness are combined. When calculating the competitive value 541A, the determination unit 54 uses the identification result by the comparison and identification unit 531 of each image included in the sample group, in addition to the identification result by the trial transfer identification unit 521.

The details of each parameter and the calculation method are described below.

{3.3.1. Calculation of competitive value 541A}
The competition value calculation unit 541 calculates the competition value 541A based on the comparison result between the label of each image generated by the trial transfer identification unit 521 and the label of each image generated by the comparison identification unit 531.

The trial transfer identification unit 521 inputs one of the images included in the sample group (sample image). The trial transfer identification unit 521 performs a person identification process on the sample image, and generates a label 52A indicating the identification result of the sample image. The value of the label 52A is, for example, 0 or 1. When the label 52A is 0, the label 52A indicates that the sample image does not include a person. When the label 52A is 1, the label 52A indicates that the sample image includes a person. The trial transfer identification unit 521 outputs the generated label 52A to the conflict value calculation unit 541.

The trial transfer identification unit 521 calculates not only the label 52A but also the accuracy 52B indicating the probability of the label 52A as the sample image identification result. The accuracy 52B is used for calculation of the reliability 542A described later.

The comparison identification unit 531 inputs the same image as the sample image input to the trial transfer identification unit 521. The comparison and identification unit 531 performs a person identification process on the sample image, and generates a label 53A indicating the identification result of the sample image. The value of the label 53A is 0 or 1 like the label 52A. When the label 53A is 0, the label 53A indicates that the sample image does not include a person. When the label 53A is 1, the label 53A indicates that the sample image includes a person. The comparison and identification unit 531 outputs the generated label 53A to the conflict value calculation unit 541.

The competitive value calculation unit 541 calculates the competitive value 541A using the

labels

52A and 53A generated from the sample images. The competition value 541A is calculated by the equation (2) used in the calculation of the competition value 251A in the first embodiment.

When Expression (2) is used for calculation of the competitive value 541, in Expression (2), E _c1 indicates the competitive value 541A. X indicates a sample group. x indicates an element (sample image) constituting the sample group. M (x) indicates a label 53A generated from the element x. T (x) indicates a label 52A generated from the element x. [M (x) ≠ T (x)] indicates the number of sample images in which the label 53A and the label 52A do not match. | X | is the number of elements constituting the sample group X.

The competitive value 541A calculated by the equation (2) indicates the probability that the label 52A and the label 53A generated from the same sample image match. The competition value 541A is a numerical value of 0 or more and 1 or less. The closer the competition value 541A is to 0, the higher the competition value 541A is, the higher the effectiveness of the prior domain 62 in transfer learning. On the other hand, the closer the competition value 541A approaches 1. It shows that the effectiveness of the prior domain 62 in transfer learning is low.

As the depression angle increases, the number of differences between the image features included in the prior domain and the image features included in the target domain increases. Therefore, the pre-domain contention value 541A is assumed to increase as the depression angle increases.

FIG. 17 is a graph showing an example of a change in the competitive value 541A. The graph shown in FIG. 17 is created as follows.

A plurality of pre-domains were created by setting a depression angle every 5 degrees from a depression angle of 5 ° to a depression angle of 80 ° and classifying the images based on the set depression angles. Similar to the above, the target domain 61 is a set of images obtained by photographing a person at a depression angle of 0 °. A trial transition identification unit 521 and a comparison identification unit 531 corresponding to each depression angle were generated, and a competitive value 541A corresponding to each depression angle was calculated by the above procedure.

As shown in FIG. 17, the competitive value 541A tends to increase as the depression angle increases. Therefore, it can be seen that the competitive value 541A can be used as a parameter for determining the effectiveness of the prior domain in transfer learning. However, the competitive value 541A increases while vibrating up and down. This indicates that the error of the competition value 541A is relatively large.

Therefore, when the effectiveness of the advance domain for transfer learning is determined using only the competitive value 541A, the advance domain that causes negative transfer may be erroneously determined to be effective. For this reason, when determining the validity of the prior domain using the competitive value 541A, it is desirable to use other parameters (such as reliability 542A) together.

{3.3.2. Reliability calculation}
The reliability calculation unit 542 calculates the reliability 542A based on the label 52A and the accuracy 52B of each image generated by the trial transfer identification unit 521. In the calculation of the reliability 542A, the identification result of the sample image by the comparison and identification unit 531 is not used.

As described above, the trial transfer identification unit 521 generates the label 52A indicating the person identification result for the sample image and the accuracy 52B indicating the probability of the label 52A. The accuracy 52B is a value not less than 0 and not more than 1. The closer the accuracy 52B is to 1, the smaller the possibility that the label 52A is erroneous.

The reliability calculation unit 542 inputs the label 52A and the accuracy 52B of each sample image from the trial transfer identification unit 32. The reliability calculation unit 542 calculates the reliability 542A using the input label 52A and accuracy 52B of each sample image. The reliability 542A is calculated by the equation (3) used for calculating the reliability 252A in the first embodiment.

When Equation (3) is used for calculation of the reliability 542A, in Equation (3), E _c2 indicates the reliability 542A. x represents an element (sample image) constituting the sample group X, similarly to the above formula (2). | X | is the number of elements of the sample group X. P _T (x) indicates the accuracy 52B of the element x. T (x) indicates the label 52A of the element x. y is a label (y = 1) indicating the presence of a person. That is, the reliability 542A is a value obtained by dividing the total value of the accuracy 52B calculated when the label 52A matches the label y by the number of elements of the sample group X. The reliability 542A is a value of 0 or more and 1 or less, and the closer to 1, the higher the effectiveness of the prior domain 62 in transfer learning.

FIG. 18 is a graph showing an example of a change in the reliability 542A. Similarly to FIG. 17, by generating a trial transfer identification unit 32 from each of a plurality of prior domains whose depression angles are set every 5 °, and calculating the reliability 542A corresponding to each prior domain, FIG. The graph shown was created.

As shown in FIG. 18, the reliability 542A decreases as the depression angle increases as an overall trend. That is, the reliability 542A approaches 1 as the effectiveness of the prior domain increases. The reason will be described below. When the data included in the prior domain 62 has a feature amount similar to the feature amount of the data included in the target domain 61, the trial transfer learning unit 52 obtains the learning result of the prior domain 62 by trial transfer learning. Transfer to the learning result of the target domain 61. The trial transfer identification unit 32 reflects the learning results of both the target domain 61 and the prior domain 62. When the trial transfer identification unit 32 performs identification processing on each image included in the sample group, the label 52A is 1 and the accuracy 52B is considered to approach 1. Therefore, when the data included in the prior domain 62 and the data included in the target domain 61 are similar (when the prior domain 62 is effective in transfer learning), the reliability 542A approaches 1.

As shown in FIG. 18, the reliability 542A increases while vibrating up and down. This indicates that the error of the reliability 542A is relatively large, like the competitive value 541A. For this reason, when the validity of the prior domain with respect to transfer learning is determined using only the reliability 542A, the prior domain that causes negative transfer may be erroneously determined to be effective. For this reason, when determining the validity of the prior domain using the reliability 542A, it is desirable to use other parameters (distribution dissimilarity 543A and the like) together.

{3.3.3. Distribution difference}
The distribution dissimilarity calculation unit 543 calculates the distribution dissimilarity 543A using only the sample image identification result by the trial transfer identification unit 32. The distribution dissimilarity calculation unit 543 calculates the distribution dissimilarity based on the difference between the image distribution of the target domain 61 and the image distribution of the prior domain 62 that has reached the leaf node of each decision tree constituting the trial transfer identification unit 521. 543A is calculated.

The trial transfer identification unit 521 includes a plurality of decision trees because a random forest in which transfer learning is introduced is used as a learning algorithm. However, in order to simplify the description of the calculation of the distribution dissimilarity 543A, a case where the number of decision trees constituting the trial transfer identification unit 521 is one will be described first.

FIG. 19 is a schematic diagram illustrating an example of a decision tree 75 that constitutes the trial transfer identification unit 521. FIG. 20 is a diagram illustrating an example of the histogram 81 created based on the image identification result of the target domain 61. FIG. 21 is a diagram illustrating an example of a histogram 82 created based on the image identification result of the prior domain 62. The

histograms

81 and 82 are created based on the identification result by the trial transfer identification unit 521.

The histogram 81 is created as follows. The trial transfer identification unit 521 inputs each image included in the target domain 61 to the root node 75R of the decision tree 75. The input image reaches one of the leaf nodes 75A to 75G via the branch node.

For example, the trial transfer identifying unit 521 compares the feature amount of the image 61A (see FIG. 15) with a threshold value used in the root node 75R, and determines the transition destination of the image 61A as branch nodes 76A and 76B based on the comparison result. Decide on either. When the image 61A transitions to the branch node 76A, the trial transfer identification unit 521 compares the feature amount of the image 61A (see FIG. 15) with the threshold value used in the branch node 76A, and sets the transition destination node to the leaf node 75A. Alternatively, the branch node 76C is determined. When the image 61A transitions to the leaf node 75A, the destination of the image 61A is determined to be the leaf node 75A. The feature amount of the image 61A used at the branch node 76A may be the same as or different from the feature amount of the image 61A used at the root node 75R. If so, the threshold used at branch node 76A is different from the threshold used at root node 75R.

The trial transfer identification unit 521 outputs the destination data 52C for specifying the leaf node to which each image included in the target domain 61 has arrived, to the distribution difference calculation unit 543. The distribution difference calculation unit 543 refers to the destination data 52C and counts the number of images that have reached each of the leaf nodes 75A to 75G. As a result, a histogram 81 indicating the distribution of the image of the target domain 61 that has reached the leaf node is created.

The trial transfer identification unit 521 generates destination data 52D that identifies the leaf node to which each of the images included in the prior domain 62 has arrived. The distribution difference calculation unit 543 creates a histogram 82 indicating the distribution of the image of the previous domain 62 that has reached the leaf node, based on the destination data 52D.

The distribution dissimilarity 543A is calculated using the following equation (5). Specifically, the distribution dissimilarity 543A is obtained by normalizing the

histograms

81 and 82 and then calculating their Bhattacharyya distance. The Bhattacharyya distance indicates the similarity between two probability distributions.

In Equation (5), E _c3 indicates the distribution dissimilarity 543A. i is the number of each leaf node shown in FIG. p (i) is the probability distribution of the image of the target domain 61 that has reached the leaf node. q (i) is the probability distribution of the image of the previous domain 62 that has reached the leaf node. The probability distribution p (i) is created from the histogram 81, and the probability distribution q (i) is created from the histogram 82. X is the number of elements (images) constituting the sample group.

The distribution dissimilarity 543A is a numerical value of 0 or more and 1 or less, and approaches 1 as the similarity between the image distribution in the histogram 81 and the image distribution in the histogram 82 is lower. In other words, the closer the distribution dissimilarity 543A is to 1, the less the prior domain 62 is effective for transfer learning.

FIG. 22 is a graph showing an example of a change in the distribution dissimilarity 543A. Similarly to FIG. 17, a trial transfer identification unit 521 corresponding to each of a plurality of prior domains whose depression angles are set every 5 ° is created, and the distribution dissimilarity 543A corresponding to each prior domain is calculated.

22, the distribution dissimilarity 543A increases as the depression angle increases. This is due to the following reason. As the depression angle increases, the difference between the image features included in the target domain 61 and the image features included in the pre-domain 62 increases. In this case, the frequency at which the route in which the image included in the prior domain 62 transitions in the decision tree 75 greatly deviates from the route in which the image included in the target domain 61 transitions in the decision tree 75 increases. The difference between the distribution of the image included in the target domain 61 and the distribution of the image included in the prior domain 62 increases, and the distribution dissimilarity 543A increases as the depression angle increases.

For example, in the histogram 81 shown in FIG. 20, the peak appears at the node 75 D with the node number 3. On the other hand, in the histogram 82 shown in FIG. 21, a peak appears in the node 75G of the node number 6. That is, the

histograms

81 and 82 are greatly different from each other in the shape of the histogram. In this case, since the distribution dissimilarity 543A is a value close to 1, it is considered that the effectiveness of the prior domain 62 in transfer learning is low.

Also, as shown in FIG. 22, the distribution dissimilarity 543A does not vibrate up and down compared to the competitive value 541A and the reliability 542A. This indicates that the error of the distribution dissimilarity 543A is small and the effectiveness of the prior domain in transfer learning can be determined with high accuracy.

Next, calculation of the distribution dissimilarity 543A when the trial transfer identification unit 521 is configured by a plurality of decision trees will be described. Distribution dissimilarity calculation unit 543 calculates distribution dissimilarity 543A for each decision tree using equation (5). Then, the distribution difference calculation unit 543 calculates the average of the distribution difference 543A of each decision tree as the distribution difference 543A of the prior domain 62.

{3.3.4. Tree complexity}
The complexity calculation unit 544 calculates the complexity 544A based on the structure of the decision tree constituting the trial transfer identification unit 521. The complexity 544A is calculated based on the depth of the leaf node of the decision tree that constitutes the trial transfer identification unit 521.

As for the calculation method of the complexity 544A, a case where there is one decision tree constituting the trial transfer identification unit 521 will be described first as in the case of the distribution dissimilarity 543A. The complexity calculation unit 544 acquires leaf node data 52E in which the depth of each leaf node constituting the decision tree is recorded from the trial transfer identification unit 521. The complexity calculator 544 calculates the complexity 544A using the following equation (6).

In the above formula (5), E _c4 indicates the complexity 544A. d _k indicates the depth of the k-th leaf node in the decision tree. n is the number of leaf nodes in the decision tree. d _max indicates the maximum depth of the leaf node in the decision tree, and is used to normalize the numerator (the sum of the depths of the leaf nodes) in Equation (6). The depth of the leaf node is defined by the number of edges (branches) that pass from the leaf node to the root node 75R. For example, in the decision tree shown in FIG. 17, the depth of the leaf node 75A is 2.

In general, the structure of a decision tree becomes more complex as the number of leaf nodes or the depth of leaf nodes increases. As the difference between the features of each image in the target domain 61 and the features of each image in the pre-domain 62 increases, the decision tree has a complex structure. The reason will be described below.

When the difference between the feature of each image in the target domain 61 and the feature of each image in the pre-domain 62 is large, the trial transfer learning unit 52 creates a decision tree according to the feature of each image in the target domain 61. A branch condition and a branch condition corresponding to the characteristics of each image in the prior domain 62 are created separately. As a result, a subtree corresponding to each image of the target domain and a subtree for identifying the feature of each image of the prior domain 62 are created separately. As a result, the number of leaf nodes constituting the decision tree increases and the structure of the decision tree becomes complicated. Therefore, the effectiveness of the prior domain 62 in transfer learning can be determined by using the complexity 544A calculated by Expression (6).

FIG. 23 is a graph showing an example of a change in complexity 544A. Similarly to FIG. 17, a plurality of pre-domains whose depression angles are set at intervals of 5 ° are created, and the complexity 544A corresponding to each pre-domain is calculated, thereby creating the graph shown in FIG.

As shown in FIG. 23, the complexity 544A increases as the depression angle increases. This is because, as described above, the structure of the decision tree becomes more complex as the difference between the image features included in the prior domain and the image features included in the target domain increases. As with the distribution difference 543A, the complexity 544A does not vibrate up and down. Therefore, by using the complexity 544A, it is possible to accurately determine the effectiveness of the prior domain 62 in transfer learning.

A calculation method of the complexity 544A when a plurality of decision trees configure the trial transfer identification unit 521 will be described. The complexity 544A for each decision tree is calculated according to equation (6). By averaging the complexity 544A calculated for each decision tree, the complexity 544A in the case where a plurality of decision trees constitute the trial transfer identification unit 521 is obtained.

{3.3.5. Prior domain evaluation by transfer evaluation unit 545}
The transfer evaluation unit 545 receives the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A. The transfer evaluation unit 545 evaluates the effectiveness of the prior domain 62 in transfer learning based on the input competitive value 541A, reliability 542A, distribution dissimilarity 543A, and complexity 544A.

The transfer evaluation unit 545 calculates a comprehensive evaluation value using the following equation (7).

In Expression (7), E is a comprehensive evaluation value obtained from the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A. As the effectiveness of predomain transfer learning decreases, the competitive value 541A, the distribution dissimilarity 543A, and the complexity 544A increase. On the other hand, the reliability 542A decreases conversely. In order to match the tendency of the reliability 542A with the tendency of the other three evaluation values, a value obtained by subtracting the reliability 542A from 1 is used for calculation of the comprehensive evaluation value.

The comprehensive evaluation value calculated by the above equation (7) is a value of 0 or more, and approaches 0 as the effectiveness of transfer learning increases. The transfer evaluation unit 545 determines that the pre-domain 62 is effective in transfer learning when the calculated comprehensive evaluation value is smaller than a preset threshold value. The transfer evaluation unit 545 outputs to the selective transfer learning unit 55 evaluation result data 545A indicating the evaluation result of the prior domain 62 that has been the target of determining the effectiveness of transfer learning.

{3.4. Specify next advance domain}
After the evaluation of the validity of the prior domain 62 (step S25) is completed, the trial transfer identifying unit 521 and the comparison identifying unit 531 used for evaluating the validity of the prior domain 62 are deleted (step S26). This is because the trial transfer identification unit 521 and the comparison identification unit 531 corresponding to the prior domain 62 are not used in the evaluation of the effectiveness of other prior domains in transfer learning.

The acquisition unit 51 determines whether or not the evaluation of all the prior domains stored in the storage device 60 has been completed (step S27). When the evaluation of all the prior domains has not been completed (No in step S27), the machine learning device 500 returns to step S22 in order to acquire a prior domain in which the effectiveness of transfer learning has not been evaluated.

This evaluates the effectiveness of the

prior domains

63 and 64 in transfer learning. The transfer evaluation unit 545 outputs evaluation result data 545A indicating the evaluation results of each of the

prior domains

63 and 64 to the selective transfer learning unit 55.

{3.5. Generation of transfer identification data 80}
When the evaluation of all the prior domains is completed (Yes in step S27), the selective transfer learning unit 55 is determined to be effective for transfer learning based on the evaluation result data 545A of each of the prior domains 62 to 64. Identify the advance domain. The number of prior domains determined to be effective for transfer learning is not particularly limited.

The selective transfer learning unit 55 acquires the target domain 61 and the identified prior domain from the storage device 60 via the acquisition unit 51. The selective transfer learning unit 55 uses the acquired target domain 61 and the prior domain to perform machine learning based on a random forest in which transfer learning is introduced (step S28). As a result, transfer identification data 80 is generated. The generated transfer identification data 80 is used by a person detection device (not shown).

As described above, the machine learning device 500 evaluates the effectiveness of each of the advance domains 62 to 64 in transfer learning, and performs transfer learning using the target domain 61 and the advance domain determined to be effective for transfer learning. Perform the introduced machine learning. When the prior domain is configured by an image having a characteristic that is significantly different from the characteristics of the image included in the target domain, the prior domain is prevented from being used to generate the transfer identification data 80. As a result, it is possible to prevent a negative transition from occurring, and to improve the detection accuracy of the detection target.

{Modifications}
In the second embodiment, the case where the trial transfer learning unit 52 and the selective transfer learning unit 55 use a random forest as a learning algorithm has been described as an example, but the present invention is not limited to this. The learning algorithm is not particularly limited as long as it is an algorithm that generates a decision tree. For example, ID3 (Iterative Dichotomiser 3) or boosting can be used as a learning algorithm. Regardless of which learning algorithm is used, the trial transfer learning unit 52 may perform machine learning that introduces transfer learning, and the comparative learning unit 53 may execute machine learning that does not introduce transfer learning.

In the second embodiment, the example has been described in which the pre-domains 62 to 64 include an image obtained by photographing a person at a depression angle greater than 0 °, but the present invention is not limited to this. Machine learning device 500 may use a prior domain including an image of a person taken at an elevation angle greater than 0 °. Alternatively, a prior domain including an image having a brightness different from that of the image included in the target domain 61 may be used. In addition, although the case where the target domain 61 is an image obtained by photographing a person has been described as an example, it goes without saying that data included in the target domain 61 is set according to the detection target.

In the second embodiment, an example has been described in which the transfer evaluation unit 545 evaluates the effectiveness of the prior domain in transfer learning using the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A. However, it is not limited to this. The transfer evaluation unit 545 may evaluate the effectiveness of the prior domain using at least one of the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A.

The distribution dissimilarity 543A and the complexity 544A have smaller errors than the competitive value 541A and the reliability 542A. Therefore, it is desirable that the transition evaluation unit 545 uses at least one of the distribution dissimilarity 543A and the complexity 544A. When the transfer evaluation unit 545 does not use the competitive value 541A and the reliability 542A for evaluation of the prior domain, the machine learning device 500 may not include the comparison learning unit 53.

In the second embodiment, when the trial transfer identification unit 521 is configured by a plurality of decision trees, the distribution difference calculation unit 543 adds the distribution differences calculated from the respective decision trees. Although the example which calculates degree 543A was demonstrated, it is not restricted to this. The distribution dissimilarity calculation unit 543 may calculate the distribution dissimilarity 543A using at least one decision tree among the decision trees constituting the trial transfer identification unit 521. Similarly, the complexity calculation unit 544 may calculate the complexity 544A using at least one decision tree among the decision trees constituting the trial transfer identification unit 521. That is, if the judgment unit 54 evaluates the effectiveness of the prior domain in transfer learning using all the leaf nodes constituting at least one decision tree among the plurality of decision trees constituting the trial transfer identification unit 521. Good.

In the second embodiment, the transition evaluation unit 545 calculates the overall evaluation value by multiplying the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A. Not limited to. For example, the transfer evaluation unit 545 may calculate the total of the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A as a comprehensive evaluation value. Further, the comprehensive evaluation value may be calculated after increasing the weights of the highly accurate distribution dissimilarity 543A and the complexity 544A. That is, the transfer evaluation unit 545 may calculate a comprehensive evaluation value using the competitive value 541A, the reliability 542A, the distribution dissimilarity 543A, and the complexity 544A.

In the second embodiment, the machine learning device 500 generates the transfer identification data 80 for detecting a person. However, the present invention is not limited to this. The learning target may be measurement data measured by a sensor. The type of sensor is not particularly limited, and various measurement data such as an acceleration sensor and an optical sensor can be used. For example, machine learning may be performed in order to use measurement data of these sensors in order to automatically drive a car.

Part or all of the machine learning device of the above embodiment may be realized as an integrated circuit (for example, an LSI, a system LSI, etc.).

Further, part or all of the processing of each functional block (each functional unit) of the machine learning device in the above embodiment may be realized by a program. In the machine learning device of each of the above embodiments, part or all of the processing of each functional block is performed by a central processing unit (CPU) in the computer. In addition, a program for performing each processing is stored in a storage device such as a hard disk or a ROM, and is read out and executed in the ROM or the RAM. For example, by configuring the machine learning device as shown in FIG. 24, a part or all of the processing of each functional block (each functional unit) in each of the above embodiments may be executed. good.

In addition, each process of the above embodiment may be realized by hardware, or may be realized by software (including a case where it is realized together with an OS (operating system), middleware, or a predetermined library). . Further, it may be realized by mixed processing of software and hardware.

Further, the execution order of the processing methods in the above embodiment is not necessarily limited to the description of the above embodiment, and the execution order can be changed without departing from the gist of the invention.

A computer program that causes a computer to execute the above-described method and a computer-readable recording medium that records the program are included in the scope of the present invention. Here, examples of the computer-readable recording medium include a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, large-capacity DVD, next-generation DVD, and semiconductor memory. .

The computer program is not limited to the one recorded on the recording medium, but may be transmitted via a telecommunication line, a wireless or wired communication line, a network represented by the Internet, or the like.

Further, the word “part” may be a concept including “circulatory”. The circuit may be realized in whole or in part by hardware, software, or a mixture of hardware and software.

Claims

A feature extraction unit for clustering that generates a plurality of transfer candidate feature data by extracting features from each of a plurality of transfer candidate data used in machine learning using transfer learning;
A classifying unit that classifies each transfer candidate feature data into a plurality of groups including a first group and a second group based on the features of each of the plurality of transfer candidate feature data generated by the clustering feature extraction unit;
When the number of transfer candidate feature data classified into the first group by the classification unit is equal to or less than a predetermined classification continuation reference value, the first group is determined as a prior domain used for the machine learning, and the transfer candidate If the number of feature data is larger than the classification continuation reference value, a pre-domain determination unit that determines to further classify the transfer candidate feature data classified into the first group;
A clustering apparatus comprising:
The clustering device according to claim 1,
The prior domain determination unit is a clustering apparatus that excludes the first group from the prior domain when the number of transfer candidate feature data classified into the first group is smaller than a predetermined discard reference value.
The clustering device according to claim 1, further comprising:
A variance calculation unit for calculating a variance of transfer candidate feature data classified into the first group based on a feature amount of each of the transfer candidate feature data classified into the first group;
With
The prior domain determination unit compares the variance calculated by the variance calculation unit with a predetermined variance reference value when the number of transfer candidate feature data classified into the first group is larger than the classification continuation reference value When the variance calculated by the variance calculation unit is equal to or smaller than the variance reference value, the clustering apparatus determines the first group as a prior domain.
The clustering device according to claim 1,
When the number of transfer candidate feature data classified into the first group is larger than a predetermined change reference value, the classifying unit converts the transfer candidate feature data classified into the first group into a first number of lower groups. Further categorized into
When the number of transfer candidate feature data classified into the first group is less than or equal to the change reference value, the classification unit determines the transfer candidate feature data classified into the first group to be smaller than the first number. A clustering device for classifying into a second number of subgroups.
The clustering device according to claim 1,
A clustering device in which the classification continuation reference value is determined based on the number of dimensions of transfer candidate feature data extracted by the clustering feature extraction unit.
A machine learning device that learns a detection target by executing machine learning using transfer learning,
A clustering device that classifies a plurality of transfer candidate data used for the machine learning and generates a prior domain used for the machine learning;
A prior domain evaluation device that evaluates whether the prior domain generated by the clustering device is effective for the machine learning;
With
The clustering apparatus includes:
A clustering feature extractor for extracting features from each of the plurality of transfer candidate data to generate a plurality of transfer candidate feature data;
A classifying unit that classifies each transfer candidate feature data into a plurality of groups including a first group and a second group based on the features of each of the plurality of transfer candidate feature data generated by the clustering feature extraction unit;
When the number of transfer candidate feature data classified into the first group by the classification unit is equal to or less than a predetermined classification continuation reference value, the first group is determined as a prior domain used for the machine learning, and the transfer candidate If the number of feature data is larger than the classification continuation reference value, a pre-domain determination unit that determines to further classify the transfer candidate feature data classified into the first group;
With
The prior domain evaluation device is:
When the first group is determined to be the previous domain by the prior domain determining unit, the transfer candidate feature data included in the first group, and each of which has a feature to be detected under a predetermined condition A trial transfer learning unit that performs the machine learning using a target domain including data and generates an evaluation classifier for evaluating the prior domain;
A determination unit that determines whether the first group is effective for the machine learning based on the trial transfer identification unit generated by the trial transfer learning unit;
A machine learning device comprising:
The machine learning device according to claim 6,
The prior domain evaluation device further includes:
A feature extraction unit for learning that extracts features of each of the learning data included in the target domain and generates learning feature data;
With
The trial transfer learning unit performs the machine learning using the learning feature data,
The machine learning device in which the condition for the feature extraction unit for learning to extract features from the learning data is the same as the condition for the feature extraction unit for clustering to extract features from each of the plurality of transfer candidate data.
The machine learning device according to claim 7, further comprising:
A selective learning device that generates the transfer identification unit by executing the machine learning using the target domain and all the prior domains determined to be effective for the machine learning by the prior domain evaluation device;
A machine learning device comprising:
Generating a plurality of transfer candidate feature data by extracting features from each of a plurality of transfer candidate data used for machine learning using transfer learning;
Classifying each transfer candidate feature data into a plurality of groups including a first group and a second group based on the features of each of the generated plurality of transfer candidate feature data; and
When the number of transfer candidate feature data classified into the first group is less than or equal to a predetermined classification continuation reference value, determining the first group as a pre-domain used for the machine learning;
If the number of transfer candidate feature data is greater than the classification continuation reference value, determining to further classify transfer candidate feature data classified into the first group;
A clustering method comprising:
A program for causing a computer to execute a clustering method for classifying each of a plurality of transfer candidate data used in machine learning using transfer learning,
Extracting a feature from each of a plurality of transfer candidate data used in the machine learning to generate a plurality of transfer candidate feature data; and
Classifying each transfer candidate feature data into a plurality of groups including a first group and a second group based on the features of each of the generated plurality of transfer candidate feature data; and
When the number of transfer candidate feature data classified into the first group is less than or equal to a predetermined classification continuation reference value, determining the first group as a pre-domain used for the machine learning;
If the number of transfer candidate feature data is greater than the classification continuation reference value, determining to further classify transfer candidate feature data classified into the first group;
A program for causing a computer to execute a clustering method.
A target domain including a plurality of learning data each having a detection target characteristic under a predetermined condition and a pre-domain including learning candidate data having a detection target characteristic under a condition different from the predetermined condition are acquired. An acquisition unit;
A trial transfer learning unit that performs machine learning that introduces transfer learning using the target domain and the prior domain acquired by the acquisition unit, and generates a decision tree used for detection of the detection target;
A determination unit that determines whether or not the prior domain acquired by the acquisition unit is effective for transfer learning, using all leaf nodes constituting the decision tree generated by the trial transfer learning unit;
A machine learning device comprising:
The machine learning device according to claim 11,
The determination unit
The complexity of the decision tree is calculated by accumulating the depth of each leaf node constituting the decision tree generated by the trial transfer learning unit, and the prior domain is used for transfer learning based on the calculated complexity. A complexity calculator that determines whether or not
A machine learning device comprising:
The machine learning device according to claim 12,
The trial transfer learning unit generates a first decision tree and a second decision tree different from the first decision tree,
The complexity calculation unit calculates the complexity of the first decision tree and the complexity of the second decision tree, and based on the calculated complexity of the first decision tree and the complexity of the second decision tree A machine learning device that determines whether or not the prior domain is valid.
The machine learning device according to claim 11, further comprising:
Each learning data included in the target domain is classified using the decision tree generated by the trial transfer learning unit, and each of the learning domains included in the prior domain using the decision tree generated by the trial transfer learning unit. Trial transfer identification unit for classifying learning candidate data,
With
The determination unit determines whether the prior domain is valid based on a classification result of the plurality of learning data by the trial transfer identification unit and a classification result of the plurality of learning candidate data. Machine learning device.
The machine learning device according to claim 14,
The determination unit
Whether the prior domain is valid based on the distribution dissimilarity between the probability distribution of the leaf nodes of the decision tree reached by the learning data and the probability distribution of the leaf nodes of the decision tree reached by each learning candidate data A distribution dissimilarity calculator for determining whether or not,
A machine learning device comprising:
The machine learning device according to claim 15,
The trial transfer learning unit generates a first decision tree and a second decision tree different from the first decision tree,
The distribution dissimilarity calculation unit calculates a first distribution dissimilarity using the first decision tree, calculates a second distribution dissimilarity using the second decision tree,
The determination unit is a machine learning device that determines whether or not the prior domain is valid based on the first distribution difference and the second distribution difference calculated by the distribution difference calculation unit.
The machine learning device according to claim 12,
The trial transfer learning unit includes:
A trial transfer identifying unit that classifies each learning data included in the target domain using the generated decision tree, and classifies each learning candidate data included in the prior domain using the generated decision tree,
Including
The determination unit
The classification result of the plurality of learning data by the trial transfer identification unit is compared with the classification result of the plurality of learning candidate data, and based on the comparison result and the complexity of the decision tree, the prior domain A metastasis evaluation unit that determines whether or not
A machine learning device comprising:
A target domain that includes a plurality of learning data each having a feature to be detected, and a pre-domain that has a plurality of learning candidate data that satisfy a predetermined rule and each may be used for learning the detection target And a step of obtaining
Performing transfer learning using the target domain and the prior domain to generate a decision tree used to detect the detection target;
Using the generated decision tree to determine whether the prior domain is valid for transfer learning;
A machine learning method comprising:
A program that causes a computer to perform transfer learning,
A target domain that includes a plurality of learning data each having a feature to be detected, and a pre-domain that has a plurality of learning candidate data that satisfy a predetermined rule and each may be used for learning the detection target And a step of obtaining
Performing transfer learning using the target domain and the prior domain to generate a decision tree used to detect the detection target;
Using the generated decision tree to determine whether the prior domain is valid for transfer learning;
A program that executes