WO2021250774A1 - Learning device, prediction device, learning method, and program - Google Patents

Learning device, prediction device, learning method, and program

Info

Publication number
WO2021250774A1
WO2021250774A1 (PCT/JP2020/022672)
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
classifier
unknown class
attribution
Prior art date
Application number
PCT/JP2020/022672
Other languages
French (fr)
Japanese (ja)
Inventor
悠 三鼓
豪 入江
大貴 伊神
Original Assignee
日本電信電話株式会社
Priority date
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2020/022672 (WO2021250774A1)
Priority to JP2022530395A (JP7440798B2)
Publication of WO2021250774A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • The present invention relates to a learning device, a prediction device, a learning method, and a program.
  • Supervised learning is a framework in which a large number of pairs of data and correct class labels for the data are prepared, and the relationship is learned from the pairs of data and class labels.
  • To realize supervised learning, a large number of pairs of data and class labels must be prepared, which is in general costly. A method is therefore sometimes adopted in which a model trained on a region where supervised data already exists (hereinafter referred to as a "domain") is reused in a target domain. For example, in handwritten character recognition, a classifier may first be trained on digital font data, for which supervised data is relatively easy to obtain, and then retrained on handwritten character data for which little (or no) supervised data is available.
  • However, the data generating distribution may differ between the original domain on which learning was performed (hereinafter the "original domain"; digital font data in the above example) and the target domain (hereinafter the "target domain"; handwritten character data in the above example). FIG. 6 is a diagram illustrating an outline of such a problem.
  • In FIG. 6, the region surrounded by the solid line is the original domain 10, the region surrounded by the broken line is the target domain 20, and the straight line is the identification boundary 30. When the generating distributions differ, the identification boundary 30 learned on the original domain 10 may not be reliable for the target domain 20; a learning problem in which there is such a difference between domains is called a domain adaptation problem.
  • In the technique of Patent Document 1, a transformation rule from the original domain to the target domain is learned so as to minimize the value of MMD, a distance between the sample generating distribution of the original domain and that of the target domain. The original-domain data is then transformed using the learned rule, and the model is trained by supervised learning on the transformed original-domain data.
  • In Non-Patent Document 1, a feature extractor that projects the original-domain data and the target-domain data into a feature space in which the domains are difficult to distinguish is learned simultaneously with the relationship, in that feature space, between the original-domain data and the class labels assigned to it.
  • Making it difficult to distinguish between the data of the original domain and the data of the target domain in the feature space means that the generation distributions of both are brought closer to each other in the feature space.
  • Such processing may mean, for example, changing from the state of FIG. 6 to the state of FIG. 7. This improves the prediction accuracy of the target domain data for the model obtained by supervised learning with the original domain data.
  • In Non-Patent Document 2, the common feature space of Non-Patent Document 1 is learned using a feature extractor and two classifiers connected to it. The model learned by the method of Non-Patent Document 2 is known to achieve higher prediction accuracy on target-domain data than the model learned by the method of Non-Patent Document 1.
  • Normally, it is assumed that the original domain and the target domain each consist of a single domain. In practice, however, either of them may be formed by multiple domains; for example, handwritten character data written by several different individuals, or with different writing instruments, yields distinct generating distributions, so multiple domains can be regarded as inherent in the target domain. When a domain is formed by multiple domains in this way, a method such as that of Non-Patent Document 1 cannot achieve the expected prediction accuracy.
  • Non-Patent Document 4 relates to a technique for the problem in which multiple domains are inherent in the original domain: features are learned that make it difficult to distinguish each domain inherent in the original domain from the target domain. Conversely, Non-Patent Document 5 relates to a technique for the problem in which multiple domains are inherent in the target domain: features are learned that make the multiple domains inherent in the target domain difficult to distinguish from one another.
  • To solve the domain adaptation problem in which the generating distributions differ between the original domain and the target domain, and the various incidental problems that accompany it, techniques such as those of Non-Patent Document 3, Non-Patent Document 4, and Non-Patent Document 5 have been proposed. However, although each technique performs well on the incidental problem it considers, it is not effective for the other incidental problems.
  • For example, even if the technique of Non-Patent Document 4, which addresses the problem of multiple domains inherent in the original domain, is applied to the problem of multiple domains inherent in the target domain, sufficient performance cannot be obtained.
  • In view of the above, an object of the present invention is to provide a technique that achieves good performance for a wider range of problems related to domains.
  • One aspect of the present invention is a learning device including: a feature extractor that outputs a feature amount of input data; a plurality of classifiers that obtain, based on the feature amount, attribution probabilities of the data to known classes and an unknown class; an unknown class classifier that determines, based on the attribution probabilities obtained by the classifiers, whether or not the data belongs to the unknown class; a discrimination mismatch evaluation unit that outputs a value of a discrimination mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers for the data; and a learning unit that, using data that does not belong to the unknown class and to which no teacher label is given, iteratively learns the parameters of the feature extractor and the plurality of classifiers so that the value of the discrimination mismatch degree becomes smaller for the feature extractor and larger for the plurality of classifiers.
  • Another aspect of the present invention is a prediction device including: a feature extractor that outputs a feature amount of input data based on the parameters obtained by the above learning device; and a classifier that obtains, based on the parameters obtained by the above learning device and the feature amount, attribution probabilities of the data to known classes and an unknown class.
  • Another aspect of the present invention is a learning method having: a feature extraction step of outputting a feature amount of input data using a feature extractor; an identification step of obtaining, using a plurality of classifiers and based on the feature amount, attribution probabilities of the data to known classes and an unknown class; an unknown class identification step of determining, based on the obtained attribution probabilities, whether or not the data belongs to the unknown class; a discrimination mismatch evaluation step of outputting a value of a discrimination mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers for the data; and a learning step of, using data that does not belong to the unknown class and to which no teacher label is given, iteratively learning the parameters of the feature extractor and the plurality of classifiers so that the value of the discrimination mismatch degree becomes smaller for the feature extractor and larger for the plurality of classifiers.
  • One aspect of the present invention is a program for operating a computer as the above-mentioned learning device.
  • According to the present invention, good performance can be achieved for a wider range of problems related to domains.
  • FIGS. 1 and 2 are diagrams showing an outline of the present embodiment.
  • In FIGS. 1 and 2, the region surrounded by the solid line is the original domain 10, the region surrounded by the broken line is the target domain 20, and the straight line is the identification boundary 30. Line segment 40 indicates the boundary identified between the known classes and the unknown class. Arrow 50 indicates that domain adaptation is configured.
  • The present embodiment deals with the first problem, that an unknown class exists, by detecting and identifying, among the data to which no teacher label is given, the data that belongs to the unknown class. The other incidental problems are dealt with by configuring a domain adaptation in which the labeled data is regarded as the original domain and the unlabeled data belonging to the known classes is regarded as the target domain.
  • FIG. 3 is a functional block diagram showing an example of the learning device 100 according to the present embodiment.
  • the learning device 100 is configured by using an information processing device such as a personal computer or a server device.
  • the learning device 100 includes a control unit 90, an unknown class information storage unit 130, and a learning result storage unit 140.
  • the control unit 90 is configured by using a processor such as a CPU (Central Processing Unit) and a memory.
  • the control unit 90 includes a feature extractor 101, a first classifier 102, a second classifier 103, a discrimination loss evaluation unit 104, an unknown class classifier 105, a discrimination mismatch evaluation unit 106, and a learning unit by executing a program by the processor. It functions as a unit 107. All or part of each function of the control unit 90 may be realized by using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The above program may be recorded on a computer-readable recording medium.
  • Computer-readable recording media include, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (e.g., SSD: Solid State Drive), and storage devices such as hard disks and semiconductor storage devices built into computer systems.
  • the above program may be transmitted over a telecommunication line.
  • the learning device 100 operates by acquiring data from the supervised data storage unit 110 and the unsupervised data storage unit 120.
  • the supervised data storage unit 110 is configured by using a device or medium capable of storing data such as a storage device such as a magnetic hard disk device or a semiconductor storage device, a recording medium such as a CD-ROM, or the like.
  • the supervised data storage unit 110 stores a supervised data set.
  • a supervised data set is a set of data with the desired class label.
  • the unsupervised data storage unit 120 is configured by using a device or medium capable of storing data such as a storage device such as a magnetic hard disk device or a semiconductor storage device, a recording medium such as a CD-ROM, or the like.
  • the unsupervised data storage unit 120 stores an unsupervised data set.
  • An unsupervised data set is a set of data that does not have the desired class label.
  • the feature extractor 101 receives a supervised data set and an unsupervised data set as inputs, and extracts a feature vector from each data.
  • the feature extractor 101 outputs the extracted feature vector to the first classifier 102 and the second classifier 103.
  • the feature extractor 101 operates based on a function having parameters capable of extracting such a feature vector.
  • the feature vector is, for example, a numerical vector representing the features of the data.
  • In other words, the feature vector is a vector with n-dimensional elements representing the features of the data, where n is an arbitrary integer (for example, n = 512).
  • the feature vector will be described as having a vector form for convenience, but the form is irrelevant to the main point of the present invention and can take any form.
  • Each time the feature extractor 101 outputs a feature vector, it reads the parameters stored in the learning result storage unit 140 and then outputs the feature vector.
  • the first classifier 102 receives the feature vector output by the feature extractor 101 as an input.
  • The first classifier 102 outputs an estimate of the attribution probability (hereinafter referred to as the "estimated attribution probability") of the original data of the input feature vector to each known class and to the unknown class.
  • The estimated attribution probability indicates how likely the data is to belong to each known class and to the unknown class.
  • The first classifier 102 operates based on a function with parameters capable of outputting such estimated attribution probabilities. Each time it outputs an estimated attribution probability, the first classifier 102 reads the parameters stored in the learning result storage unit 140.
  • the second classifier 103 receives the feature vector output by the feature extractor 101 as an input.
  • The second classifier 103 outputs an estimate (estimated attribution probability) of the attribution probability of the original data of the input feature vector to each known class and to the unknown class.
  • the second classifier 103 operates based on a function having a parameter capable of outputting such an estimated attribution probability.
  • the second classifier 103 reads the parameter stored in the learning result storage unit 140 and outputs the estimated attribution probability.
  • the same feature vector is input to the first classifier 102 and the second classifier 103.
  • These components may be implemented using, for example, a CNN (Convolutional Neural Network).
  • However, a CNN is only an example, and the implementation need not be limited to it.
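  • The patent text does not fix a concrete network architecture, so the following is only a minimal sketch of how a feature extractor with two attached classifiers over K known classes plus one unknown class could be set up, assuming PyTorch and an image-like input; all layer sizes, channel counts, and names are illustrative assumptions, not the patent's own definitions.

        import torch
        import torch.nn as nn

        K = 10          # number of known classes (illustrative)
        FEAT_DIM = 512  # feature vector dimension n (the text mentions n = 512 as an example)

        class FeatureExtractor(nn.Module):
            """Corresponds to feature extractor 101: data x -> feature vector f (parameters theta)."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(64, FEAT_DIM), nn.ReLU(),
                )

            def forward(self, x):
                return self.net(x)

        class Classifier(nn.Module):
            """Corresponds to classifiers 102/103: feature f -> attribution probabilities over K+1 classes."""
            def __init__(self):
                super().__init__()
                self.fc = nn.Linear(FEAT_DIM, K + 1)  # K known classes + 1 unknown class

            def forward(self, f):
                return torch.softmax(self.fc(f), dim=1)

        feature_extractor = FeatureExtractor()   # 101
        classifier1 = Classifier()               # 102
        classifier2 = Classifier()               # 103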
  • The discrimination loss evaluation unit 104 receives as inputs the data to be processed, information indicating whether or not that data belongs to the unknown class, the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for that data, and the desired attribution probability for that data (hereinafter referred to as the "teacher attribution probability").
  • the discrimination loss evaluation unit 104 obtains the value of the discrimination loss function (hereinafter referred to as “discrimination loss evaluation value”), which is the first loss function representing these differences.
  • the teacher attribution probability is the attribution probability according to the class label that is the correct answer during learning.
  • the unknown class classifier 105 receives the data to be processed and the estimated attribution probability output by the first classifier 102 and the second classifier 103 with respect to the data to be processed as inputs.
  • the unknown class classifier 105 identifies whether or not the data to be processed is an unknown class.
  • the unknown class classifier 105 records information indicating the discrimination result (hereinafter referred to as “unknown class information”) in the unknown class information storage unit 130.
  • the information recorded in the unknown class information storage unit 130 is used by the identification loss evaluation unit 104 and the identification mismatch evaluation unit 106.
  • the discrimination mismatch evaluation unit 106 receives the data to be processed and the estimated attribution probability output by the first classifier 102 and the second classifier 103 to the data to be processed as inputs.
  • the discrimination mismatch evaluation unit 106 acquires a value indicating the degree of mismatch of the estimated attribution probabilities of the first classifier 102 and the second classifier 103 (hereinafter referred to as “discrimination mismatch evaluation value”).
  • The learning unit 107 receives as inputs the discrimination loss evaluation value obtained by the discrimination loss evaluation unit 104 and the discrimination mismatch evaluation value obtained by the discrimination mismatch evaluation unit 106.
  • the learning unit 107 performs iterative learning of the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 using the input values.
  • the learning unit 107 records the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 obtained by iterative learning in the learning result storage unit 140.
  • the iterative learning for the feature extractor 101 is performed so that both the discrimination loss evaluation value and the discrimination mismatch evaluation value become small.
  • the iterative learning for the first classifier 102 and the second classifier 103 is performed so that the discrimination loss evaluation value becomes small and the discrimination mismatch evaluation value becomes large.
  • FIG. 4 is a functional block diagram showing an example of the prediction device 200 according to the present embodiment.
  • the prediction device 200 is configured by using an information processing device such as a personal computer or a server device.
  • the prediction device 200 includes a control unit 91 and a storage unit 230.
  • the control unit 91 is configured by using a processor such as a CPU and a memory.
  • the control unit 91 functions as a feature extractor 201 and a classifier 202 when the processor executes a program.
  • all or a part of each function of the control unit 91 may be realized by using hardware such as ASIC, PLD and FPGA.
  • the above program may be recorded on a computer-readable recording medium.
  • Computer-readable recording media include, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (for example, SSDs), and storage devices such as hard disks and semiconductor storage devices built into computer systems.
  • the above program may be transmitted over a telecommunication line.
  • the storage unit 230 is configured by using a storage device such as a magnetic hard disk device or a semiconductor storage device.
  • the storage unit 230 stores the parameters as the learning result obtained by the iterative learning performed by the learning unit 107 of the learning device 100.
  • the feature extractor 201 When the feature extractor 201 receives the data to be processed (data to be predicted) 240, it reads out the parameters from the storage unit 230 and operates based on the parameters. The feature extractor 201 outputs a feature vector for the data 240 to be processed. The classifier 202 reads a parameter from the storage unit 230 and operates based on the parameter. The classifier 202 obtains an estimated attribution probability for the data 240 to be processed based on the feature vector obtained by the feature extractor 201. The output of the classifier 202 may be the estimated attribution probability itself for each class of the data 240 to be processed, or may be information indicating the prediction result of which class it belongs to.
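  • As a rough illustration of this prediction flow, the following sketch reuses modules like those in the earlier sketch; the storage format of the learned parameters (the dictionary keys and file name below) is not specified in the text and is an assumption of this sketch.

        import torch

        def predict(x, feature_extractor, classifier, state_dict_path="learned_params.pt"):
            """Sketch of prediction device 200: feature extractor 201 + classifier 202.

            x: a batch of data to be predicted (data 240).
            Returns the estimated attribution probabilities over the K known classes and the
            unknown class, and the predicted class index (index K marks "unknown" in this
            illustrative 0-based indexing).
            """
            state = torch.load(state_dict_path)                       # parameters from storage unit 230 (assumed format)
            feature_extractor.load_state_dict(state["feature_extractor"])
            classifier.load_state_dict(state["classifier"])

            feature_extractor.eval()
            classifier.eval()
            with torch.no_grad():
                f = feature_extractor(x)          # feature vector for data 240
                probs = classifier(f)             # estimated attribution probabilities
                pred = probs.argmax(dim=1)        # prediction result (class index)
            return probs, pred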
  • FIG. 5 is a flowchart showing an operation example of the learning device 100. Next, an operation example of the learning device 100 will be described.
  • the learning device 100 receives the supervised data set 110 and the unsupervised data set 120, and executes the learning processing routine shown in FIG.
  • The control unit 90 of the learning device 100 reads one or more pieces of data from the supervised data set 110 and the unsupervised data set 120 (step S101).
  • the control unit 90 makes a branch determination as to whether or not the number of learning iterations is equal to or less than a predetermined number of scheduled times (step S102). If the number of iterations is less than or equal to the planned number, the process of step S103 is executed. On the other hand, if the number of iterations is larger than the planned number, the process of step S104 is executed.
  • This branching process changes the method of identifying unknown classes.
  • The first classifier 102 and the second classifier 103 are trained to discriminate (K + 1) classes, namely the K known classes plus the unknown class.
  • For the K known classes, pairs of data and teacher attribution probabilities are available, but for the unknown class it is not known in advance which data belongs to it. Therefore, while the number of iterations is at most the planned number, the unknown class is identified for the unsupervised data and the result is recorded as an identification history.
  • After that, the first classifier 102 and the second classifier 103 are trained to discriminate the (K + 1) classes while the identification results for the unknown class continue to be recorded as the identification history.
  • The teacher attribution probability of the unknown class can be estimated by identifying the unknown class, but the estimate also contains errors. By learning the (K + 1)-class discrimination while recording the identification history, unknown classes can therefore be identified with few errors.
  • In step S103, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set 110 and the unsupervised data set 120 to obtain the discrimination loss evaluation value, the discrimination mismatch evaluation value, and the determination of whether or not each piece of data belongs to the unknown class.
  • In step S104, the unknown class identification history is read for the unsupervised data set 120.
  • In step S105, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set 110, the unsupervised data set 120, and the unknown class identification history to obtain the discrimination loss evaluation value, the discrimination mismatch evaluation value, and the determination of whether or not each piece of data belongs to the unknown class.
  • step S103 When the process of step S103 or step S105 is completed, the learning unit 107 sets the parameter values of the feature extractor 101, the first classifier 102, and the second classifier 103 based on the discrimination loss evaluation value and the discrimination mismatch evaluation value. (Values recorded in the learning result storage unit 140) are updated (step S106).
  • the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 are stored in the learning result storage unit 140.
  • The unknown class classifier 105 records in the unknown class information storage unit 130 the discrimination result, obtained in step S103 or S105, as to whether or not each piece of data is unknown-class data (step S107).
  • control unit 90 determines whether the end condition is satisfied (step S108). If the end condition is satisfied (step S108-YES), the control unit 90 ends the process. If the end condition is not satisfied (step S108-NO), the control unit 90 returns to step S101 and repeats the process.
  • the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 are learned.
  • learning is performed using the discrimination loss evaluation value and the discrimination mismatch evaluation value so that the discrimination loss evaluation value and the discrimination mismatch evaluation value become smaller.
  • The discrimination loss function outputs a smaller value as the similarity between the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103 and the given teacher attribution probability of the data becomes higher.
  • The discrimination mismatch evaluation value indicates the difference between the classifiers in the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103. Further, for the first classifier 102 and the second classifier 103, learning is performed so that the discrimination loss evaluation value becomes smaller and the discrimination mismatch evaluation value becomes larger.
  • Next, each process of the discrimination loss evaluation unit 104, the unknown class classifier 105, and the discrimination mismatch evaluation unit 106 when the number of iterations is at most the planned number (step S103) will be described.
  • The discrimination loss function takes as input the feature vector output by the feature extractor 101 and outputs a smaller value the higher the similarity between the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103 and the teacher attribution probability given for the data.
  • the discrimination loss function corresponds to Equations 2 and 3 described later. Further, the value corresponds to the identification loss evaluation value.
  • the feature extractor 101 is realized by using a function F that takes data x as an input, outputs a feature vector f, and has a parameter ⁇ .
  • the first classifier 102 can be expressed as a function having a parameter ⁇ 1 that outputs an estimated attribution probability y1 with the feature vector f as an input.
  • the second classifier 103 can be expressed as a function having a parameter ⁇ 2 that outputs an estimated attribution probability y2 with the feature vector f as an input.
  • the function that realizes the first classifier 102 and the second classifier 103 can be expressed as a probability function as the following equation 1 by using the function F that realizes the feature extractor 101. Note that i is used as a subscript to distinguish between the two classifiers.
  • This expression represents the probability that yi appears given θ, φi, and x.
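  • The image of Equation 1 itself is not reproduced in this text. A form consistent with the description above, in which classifier i with parameters φi operates on the feature F(x; θ), would be (this reconstruction is an assumption, not the patent's literal formula):

        p(y_i \mid \theta, \phi_i, x), \qquad y_i = C_i\bigl(F(x; \theta); \phi_i\bigr), \quad i \in \{1, 2\}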
  • A desirable feature extractor 101, first classifier 102, and second classifier 103 are such that, when data s is given from the supervised data set, the teacher attribution probability t for each class appears; that is, they yield attribution probabilities from which the correct class can be identified. Assuming that the probability of appearance of the teacher attribution probability t corresponding to the data s is p(s, t), learning should determine the parameters θ and φi so that the following Equation 2 becomes small.
  • Eb [a] is the expected value for the probability b of a.
  • the expected value is approximately replaced in the form of sum as shown in the following equation 3.
  • Equation 3 is the discrimination loss function in one example of the present embodiment, and the value evaluated for any S and T is the discrimination loss evaluation value.
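  • The images of Equations 2 and 3 are not reproduced here. A cross-entropy style reconstruction consistent with the surrounding description (the exact form is an assumption of this sketch) would be, for each classifier i, the expected loss

        L_s = - \mathbb{E}_{(s,t) \sim p(s,t)} \Bigl[ \sum_{k} t_k \log p(y_i = k \mid \theta, \phi_i, s) \Bigr] \quad \text{(Equation 2)}

    and its empirical approximation over the supervised set (S, T)

        L_s \approx - \frac{1}{|S|} \sum_{(s,t) \in (S,T)} \sum_{k} t_k \log p(y_i = k \mid \theta, \phi_i, s) \quad \text{(Equation 3)}.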
  • By reducing Equation 3 with respect to θ, φ1, and φ2, it is possible to obtain a desirable feature extractor 101, first classifier 102, and second classifier 103 that can output t for s. There are various methods for obtaining such θ, φ1, and φ2. Simply put, if the probability functions representing the function F that realizes the feature extractor and the first classifier 102 and the second classifier 103 are differentiable with respect to the respective parameters θ, φ1, and φ2, it is known that Equation 3 can be locally minimized by gradient-based methods.
  • Therefore, for the feature extractor 101, a function that outputs the feature vector f for input data x and is differentiable with respect to θ may be selected, and for the first classifier 102 and the second classifier 103, functions that take the feature vector f as input, output the estimated attribution probabilities y1 and y2, and are differentiable with respect to φ1 and φ2, respectively, may be selected.
  • the discrimination mismatch evaluation values of certain estimated attribution probabilities p1 and p2 are expressed by the following equation 4 when p1k and p2k represent the attribution probabilities of the estimated attribution probabilities p1 and p2 for the class k, respectively.
  • K represents the number of known classes to be identified
  • K + 1 represents an unknown class that does not correspond to any of the known classes.
  • The discrimination mismatch evaluation unit 106 evaluates the degree of mismatch of the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103 for the data u of the unsupervised data set 120. That is, for the appearance probability p(u) of the data u of the unsupervised data set, the discrimination mismatch evaluation unit 106 outputs the discrimination mismatch evaluation value L_adv of the estimated attribution probabilities of the first classifier 102 and the second classifier 103, using the mismatch of Equation 4, as shown in the following Equation 5.
  • Eb [a] is the expected value for the probability b of a.
  • the expected value is approximately replaced in the form of sum as shown in Equation 6 below.
  • Equation 6 is the discriminant mismatch degree in one example of the present embodiment, and the value evaluated for any U is the discriminant mismatch evaluation value.
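  • The images of Equations 4 to 6 are not reproduced here. An L1-type reconstruction consistent with the description (the normalization constant and the inclusion of the unknown class K+1 in the sum are assumptions of this sketch) would be

        d(p^1, p^2) = \frac{1}{K+1} \sum_{k=1}^{K+1} \bigl| p^1_k - p^2_k \bigr| \quad \text{(Equation 4)},

        L_{adv} = \mathbb{E}_{u \sim p(u)} \bigl[ d(y_1(u), y_2(u)) \bigr] \quad \text{(Equation 5)}, \qquad L_{adv} \approx \frac{1}{|U|} \sum_{u \in U} d(y_1(u), y_2(u)) \quad \text{(Equation 6)}.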
  • the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103 for the data x can be expressed by using the above equation 1.
  • The information entropy H(y|x), which indicates the ambiguity of the attribution probability output for the average y of the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103, is expressed by the following Equation 7.
  • Whether or not untrained data u of the unsupervised data set is unknown-class data is determined by whether or not the value of this information entropy is larger than the predetermined threshold ρ. That is, the identification y u,e of whether or not the unsupervised data u at iteration e is unknown-class data is expressed by the following Equation 8.
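  • The images of Equations 7 and 8 are not reproduced here. A reconstruction consistent with the description (the averaging of the two outputs and the base of the logarithm are assumptions of this sketch) would be

        \bar{y} = \tfrac{1}{2}(y_1 + y_2), \qquad H(\bar{y} \mid x) = - \sum_{k=1}^{K+1} \bar{y}_k \log \bar{y}_k \quad \text{(Equation 7)},

        y_{u,e} = \begin{cases} 1 & \text{if } H(\bar{y} \mid u) > \rho \\ 0 & \text{otherwise} \end{cases} \quad \text{(Equation 8)}.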
  • step S105 The process of dividing the unsupervised data set into the known class data set U I and the unknown class data set U O in step S105 will be described.
  • The identification result as to whether or not the data u of the unsupervised data set is unknown-class data at a given iteration is stored in the unknown class information storage unit 130 as y u,e in step S107 described later.
  • In step S104, the identification results of the past T iterations are read from the unknown class information storage unit 130. Data u of the unsupervised data set that has been identified as unknown-class data in at least T/2 of those iterations belongs to the unknown class data set U O, and the other data belongs to the known class data set U I. That is, when the unsupervised data set at iteration e is U, U is divided into the known class data set U I and the unknown class data set U O according to the following Equations 9 and 10.
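  • As an illustration of the split of Equations 9 and 10 (whose set notation is not reproduced in this text), a simple majority vote over the stored identification history could look like the following sketch; the dictionary-based history format and the function name are assumptions of this sketch.

        def split_unsupervised_data(U, history, T):
            """Split the unsupervised data set U into known-class data U_I and unknown-class data U_O,
            based on the last T stored identification results (1 = judged to be unknown class)."""
            U_I, U_O = [], []
            for u_id, u in enumerate(U):
                recent = history.get(u_id, [])[-T:]          # identification results of the past T iterations
                if sum(recent) >= T / 2:                     # judged unknown class in at least T/2 of them
                    U_O.append(u)
                else:
                    U_I.append(u)
            return U_I, U_O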
  • Next, the evaluation process of step S105 will be described.
  • In step S105, substantially the same processing as in step S103, which is the processing when the number of iterations is at most the planned number, is performed.
  • The discrimination loss evaluation unit 104 obtains the discrimination loss evaluation value by taking the sum over the union of the set (S, T) of supervised data and teacher attribution probabilities and the unknown class data set U O. That is, the evaluation value of the discrimination loss evaluation unit 104 is expressed in the form of the following Equation 11.
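  • The image of Equation 11 is not reproduced here. A form consistent with the description, in which the data of the unknown class data set U_O is treated as labeled with the unknown class K+1 (an assumption of this reconstruction), would be

        L_s \approx - \frac{1}{|S| + |U_O|} \Bigl[ \sum_{(s,t) \in (S,T)} \sum_{k} t_k \log p(y_i = k \mid \theta, \phi_i, s) + \sum_{u \in U_O} \log p(y_i = K+1 \mid \theta, \phi_i, u) \Bigr].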
  • the estimated attribution probabilities y 1 and y 2 output by the first classifier 102 and the second classifier 103 for the data x can be expressed by using the above equation 1.
  • the average estimated attribution probability y can be obtained from the estimated attribution probabilities y 1 and y 2 output by the first classifier 102 and the second classifier 103.
  • If, for the average estimated attribution probability y, the attribution probability to the unknown class (class K + 1) is the highest among the attribution probabilities for the discrimination classes, the data is judged to be unknown-class data; otherwise, it is judged not to be unknown-class data. That is, the identification y u,e of whether or not the unsupervised data u at iteration e is unknown-class data is expressed by the following Equation 13.
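  • The image of Equation 13 is not reproduced here. A form consistent with the description (a reconstruction, not the literal formula) would be

        y_{u,e} = \begin{cases} 1 & \text{if } \arg\max_{k} \bar{y}_k = K + 1 \\ 0 & \text{otherwise.} \end{cases}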
  • [Learning process] The learning process of the learning unit 107 according to step S106 will be described.
  • For the feature extractor 101, learning processing is performed so that both the discrimination loss evaluation value L_s and the discrimination mismatch evaluation value L_adv become smaller.
  • For the first classifier 102 and the second classifier 103, learning processing is performed so that the discrimination loss evaluation value L_s becomes smaller and the discrimination mismatch evaluation value L_adv becomes larger.
  • the problems shown in Equation 14, Equation 15, and Equation 16 are sequentially optimized.
  • Since the functions of the feature extractor 101, the first classifier 102, and the second classifier 103 were chosen so that the discrimination loss evaluation value L_s and the discrimination mismatch evaluation value L_adv are differentiable with respect to the parameters φ1, φ2, and θ, learning can be performed by gradient descent on the error.
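  • The images of Equations 14 to 16 are not reproduced in this text, so the exact decomposition is not visible here. The sketch below (reusing feature_extractor, classifier1, and classifier2 from the earlier sketch) only mirrors the update directions described above in three sequential gradient steps, similar in spirit to maximum classifier discrepancy training; the loss forms, learning rates, and the variable x_u_known (unsupervised data judged not to belong to the unknown class) are assumptions of this sketch.

        import torch

        opt_f = torch.optim.SGD(feature_extractor.parameters(), lr=0.01)
        opt_c = torch.optim.SGD(list(classifier1.parameters()) + list(classifier2.parameters()), lr=0.01)

        def discrimination_loss(probs, targets):
            # cross-entropy between estimated attribution probabilities and teacher attribution probabilities
            return -(targets * torch.log(probs + 1e-8)).sum(dim=1).mean()

        def mismatch(p1, p2):
            # L1-type discrepancy between the two classifiers' outputs (cf. the reconstructed Equation 4)
            return (p1 - p2).abs().mean()

        def training_step(x_s, t_s, x_u_known):
            """One parameter update of step S106; x_s, t_s are supervised data and teacher attribution
            probabilities, x_u_known is unsupervised data judged not to be unknown-class data."""
            # (1) make the discrimination loss small w.r.t. theta, phi1, phi2 (cf. Equation 14)
            f_s = feature_extractor(x_s)
            loss = discrimination_loss(classifier1(f_s), t_s) + discrimination_loss(classifier2(f_s), t_s)
            opt_f.zero_grad()
            opt_c.zero_grad()
            loss.backward()
            opt_f.step()
            opt_c.step()

            # (2) classifiers: keep the loss small while making the mismatch large (cf. Equation 15)
            f_s = feature_extractor(x_s).detach()
            f_u = feature_extractor(x_u_known).detach()
            loss_c = (discrimination_loss(classifier1(f_s), t_s)
                      + discrimination_loss(classifier2(f_s), t_s)
                      - mismatch(classifier1(f_u), classifier2(f_u)))
            opt_c.zero_grad()
            loss_c.backward()
            opt_c.step()

            # (3) feature extractor: make the mismatch small (cf. Equation 16)
            f_u = feature_extractor(x_u_known)
            loss_f = mismatch(classifier1(f_u), classifier2(f_u))
            opt_f.zero_grad()
            loss_f.backward()
            opt_f.step()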
  • If the distributions were simply brought closer in this way, the unknown-class data among the unsupervised data would also be brought closer to the supervised data and would be identified as one of the known classes to which it does not properly belong. Therefore, the unknown-class data is detected, and the detected data is not used for the evaluation of L_adv in step S105. This prevents the distribution of the supervised data and the distribution of the unknown-class data from being brought inappropriately close to each other, and makes it possible to learn to detect that unknown-class data belongs to the unknown class.
  • The process of saving the identification result as to whether the unsupervised data is unknown-class data (step S107) will be described.
  • The identification result y u,e of whether the data u of the unsupervised data set is unknown-class data at iteration e is obtained by the process of step S103 when e is at most the planned number, and by the process of step S105 when e is larger than the planned number.
  • the identification results y u and e are stored in the unknown class information storage unit 130 for each of the data u of the unsupervised data set.
  • the learning process from steps S101 to S108 may be repeated until the end condition is satisfied.
  • Any condition may be used as the end condition. For example, conditions such as "until the value of the objective function no longer changes by more than a certain amount" or "until the accuracy on evaluation data prepared separately from the training data no longer changes by more than a certain amount" may be used.
  • One or both of the supervised data storage unit 110 and the unsupervised data storage unit 120 may be provided in the learning device 100. Either or both of the unknown class information storage unit 130 and the learning result storage unit 140 may be provided outside the learning device 100. When it is provided externally, data may be acquired by performing communication such as TCP / IP.
  • the learning device 100 may be mounted using one information processing device, or may be distributed and mounted in a plurality of information processing devices.
  • the present invention is applicable to a learning device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a learning device provided with: a feature extraction device for outputting a feature amount of input data; a plurality of identification devices for acquiring, on the basis of the feature amount, attribution probabilities to a known class and an unknown class for the data; an unknown class identification device for determining, on the basis of the attribution probabilities acquired by the identification devices, whether the data is in the unknown class or not; an identification inconsistency evaluation unit for outputting a value of an identification inconsistency degree indicating a difference of the attribution probabilities acquired by the plurality of identification devices for the data; and a learning unit for performing repetitive learning of parameters of the feature extraction device and the plurality of identification devices in such a manner that, by using data which is not in the unknown class and to which a training label is not given, the value of the identification inconsistency degree is reduced for the feature extraction device, and the value of the identification inconsistency degree is increased for the plurality of identification devices.

Description

Learning device, prediction device, learning method, and program
The present invention relates to a learning device, a prediction device, a learning method, and a program.
For predictive model learning using machine learning, a framework generally called supervised learning is used. Supervised learning is a framework in which a large number of pairs of data and correct class labels for the data are prepared, and the relationship between data and labels is learned from those pairs.
To realize supervised learning, a large number of pairs of data and class labels must be prepared, which is in general costly. A method is therefore sometimes adopted in which a model trained on a region where supervised data already exists (hereinafter referred to as a "domain") is reused in a target domain. For example, in handwritten character recognition, a classifier may first be trained on digital font data, for which supervised data is relatively easy to obtain, and then retrained on handwritten character data for which little (or no) supervised data is available.
However, the data generating distribution may differ between the original domain on which learning was performed (hereinafter the "original domain"; digital font data in the above example) and the target domain (hereinafter the "target domain"; handwritten character data in the above example). FIG. 6 is a diagram illustrating an outline of this problem. In FIG. 6, the region surrounded by the solid line is the original domain 10, the region surrounded by the broken line is the target domain 20, and the straight line is the identification boundary 30. For example, even for the same character "あ", the shape can differ greatly between a digital font and handwriting. When the generating distributions differ, the identification boundary 30 learned on the original domain 10, as in FIG. 6, may not be reliable for the target domain 20. In such a case, the trained model cannot achieve the identification accuracy expected in the target domain 20. A learning problem in which there is such a difference between domains is called a domain adaptation problem.
Conventionally, the following known techniques exist for solving such domain adaptation problems. In the technique disclosed in Patent Document 1, a transformation rule from the original domain to the target domain is learned so as to minimize the value of MMD, a distance between the sample generating distribution of the original domain and that of the target domain. The original-domain data is then transformed using the learned rule, and the model is trained by supervised learning on the transformed original-domain data.
In Non-Patent Document 1, a feature extractor that projects the original-domain data and the target-domain data into a feature space in which the domains are difficult to distinguish is learned simultaneously with the relationship, in that feature space, between the original-domain data and the class labels assigned to it. Making the original-domain data and the target-domain data difficult to distinguish in the feature space means bringing their generating distributions closer together in that space. Such processing may correspond, for example, to changing from the state of FIG. 6 to the state of FIG. 7. This improves the prediction accuracy on target-domain data of a model obtained by supervised learning on original-domain data.
In Non-Patent Document 2, the common feature space of Non-Patent Document 1 is learned using a feature extractor and two classifiers connected to it. The model learned by the method of Non-Patent Document 2 is known to achieve higher prediction accuracy on target-domain data than the model learned by the method of Non-Patent Document 1.
Various incidental problems can arise depending on the difference between the original domain and the target domain. One such problem occurs when data of classes other than those given in the original domain exists in the target domain. Taking the handwritten character recognition example, this problem arises when the digital font data contains only "あ", "い", and "う" while the handwritten character data also contains "え" and "お". The classes labeled in the original domain are called known classes ("あ", "い", and "う" in the example), and the other classes are called unknown classes ("え" and "お" in the example). A classifier trained by ordinary supervised learning predicts that input data belongs to one of the known classes even when the data actually belongs to an unknown class. Such behavior can lower the accuracy of character recognition.
There is also the following further problem. Normally, it is assumed that the original domain and the target domain each consist of a single domain. However, either of them may be formed by multiple domains. For example, if handwritten character data was written by several different individuals, or with different writing instruments, the original domain or the target domain may be formed by multiple domains. In this case, each has a different generating distribution, so multiple domains can be regarded as inherent in the target domain. When a domain is formed by multiple domains, a method such as that of Non-Patent Document 1 cannot achieve the expected prediction accuracy.
Non-Patent Document 4 relates to a technique for the problem in which multiple domains are inherent in the original domain. In the technique disclosed in Non-Patent Document 4, features are learned that make it difficult to distinguish each domain inherent in the original domain from the target domain. Conversely, Non-Patent Document 5 relates to a technique for the problem in which multiple domains are inherent in the target domain. In the technique disclosed in Non-Patent Document 5, features are learned that make the multiple domains inherent in the target domain difficult to distinguish from one another.
Japanese Unexamined Patent Publication No. 2019-101789
To solve the domain adaptation problem in which the generating distributions differ between the original domain and the target domain, and the various incidental problems that accompany it, techniques such as those of Non-Patent Document 3, Non-Patent Document 4, and Non-Patent Document 5 have been proposed. However, although each technique performs well on the incidental problem it considers, it is not effective for the other incidental problems.
For example, even if the technique of Non-Patent Document 4, which addresses the problem of multiple domains inherent in the original domain, is applied to the problem of multiple domains inherent in the target domain, sufficient performance cannot be obtained.
In general, it is rare that it can be known in advance what kind of problem exists in the processing target. It is therefore difficult to decide which technique should be applied. Moreover, when several of the above problems coexist, applying these techniques becomes difficult.
In view of the above circumstances, an object of the present invention is to provide a technique that achieves good performance for a wider range of problems related to domains.
One aspect of the present invention is a learning device including: a feature extractor that outputs a feature amount of input data; a plurality of classifiers that obtain, based on the feature amount, attribution probabilities of the data to known classes and an unknown class; an unknown class classifier that determines, based on the attribution probabilities obtained by the classifiers, whether or not the data belongs to the unknown class; a discrimination mismatch evaluation unit that outputs a value of a discrimination mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers for the data; and a learning unit that, using data that does not belong to the unknown class and to which no teacher label is given, iteratively learns the parameters of the feature extractor and the plurality of classifiers so that the value of the discrimination mismatch degree becomes smaller for the feature extractor and larger for the plurality of classifiers.
Another aspect of the present invention is a prediction device including: a feature extractor that outputs a feature amount of input data based on the parameters obtained by the above learning device; and a classifier that obtains, based on the parameters obtained by the above learning device and the feature amount, attribution probabilities of the data to known classes and an unknown class.
Another aspect of the present invention is a learning method having: a feature extraction step of outputting a feature amount of input data using a feature extractor; an identification step of obtaining, using a plurality of classifiers and based on the feature amount, attribution probabilities of the data to known classes and an unknown class; an unknown class identification step of determining, based on the obtained attribution probabilities, whether or not the data belongs to the unknown class; a discrimination mismatch evaluation step of outputting a value of a discrimination mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers for the data; and a learning step of, using data that does not belong to the unknown class and to which no teacher label is given, iteratively learning the parameters of the feature extractor and the plurality of classifiers so that the value of the discrimination mismatch degree becomes smaller for the feature extractor and larger for the plurality of classifiers.
One aspect of the present invention is a program for causing a computer to operate as the above learning device.
According to the present invention, good performance can be achieved for a wider range of problems related to domains.
FIG. 1 is a diagram showing an outline of the present embodiment. FIG. 2 is a diagram showing an outline of the present embodiment. FIG. 3 is a functional block diagram showing an example of the learning device 100 according to the present embodiment. FIG. 4 is a functional block diagram showing an example of the prediction device 200 according to the present embodiment. FIG. 5 is a flowchart showing an operation example of the learning device 100. FIG. 6 is a diagram showing an example of the prior art. FIG. 7 is a diagram showing an example of the prior art.
<Summary>
First, the outline of the present embodiment will be described. The present embodiment operates appropriately even when the problem that an unknown class exists (hereinafter the "first problem") is present. Furthermore, the present embodiment may be configured to operate appropriately even when the problem that the data of each domain is only partially labeled (hereinafter the "second problem") or the problem that the domain attribution of the data is unknown (hereinafter the "third problem") is present. It may also be configured to operate appropriately even when several of these three incidental problems are present at the same time.
More specifically, the embodiment is as follows. FIGS. 1 and 2 are diagrams showing an outline of the present embodiment. In FIGS. 1 and 2, the region surrounded by the solid line is the original domain 10, the region surrounded by the broken line is the target domain 20, and the straight line is the identification boundary 30. Line segment 40 indicates the boundary identified between the known classes and the unknown class. Arrow 50 indicates that domain adaptation is configured.
The present embodiment deals with the first problem, that an unknown class exists, by detecting and identifying, among the data to which no teacher label is given, the data that belongs to the unknown class. It deals with the second and third problems by configuring a domain adaptation in which the labeled data is regarded as the original domain and the unlabeled data belonging to the known classes is regarded as the target domain.
<Sample configuration of learning device>
Next, the configuration of the learning device according to the present embodiment will be described. FIG. 3 is a functional block diagram showing an example of the learning device 100 according to the present embodiment. The learning device 100 is configured using an information processing device such as a personal computer or a server device. The learning device 100 includes a control unit 90, an unknown class information storage unit 130, and a learning result storage unit 140. The control unit 90 is configured using a processor such as a CPU (Central Processing Unit) and a memory. The control unit 90 functions as a feature extractor 101, a first classifier 102, a second classifier 103, a discrimination loss evaluation unit 104, an unknown class classifier 105, a discrimination mismatch evaluation unit 106, and a learning unit 107 when the processor executes a program. All or part of the functions of the control unit 90 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The above program may be recorded on a computer-readable recording medium. Computer-readable recording media include, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (e.g., SSD: Solid State Drive), and storage devices such as hard disks and semiconductor storage devices built into computer systems. The above program may be transmitted over a telecommunication line.
 学習装置100は、教師ありデータ記憶部110及び教師なしデータ記憶部120からデータを取得して動作する。教師ありデータ記憶部110は、磁気ハードディスク装置や半導体記憶装置等の記憶装置、CD-ROM等の記録媒体等のようにデータを記憶できる機器又は媒体を用いて構成される。教師ありデータ記憶部110は、教師ありデータ集合を記憶する。教師ありデータ集合は、所望のクラスラベルが付与されたデータの集合である。教師なしデータ記憶部120は、磁気ハードディスク装置や半導体記憶装置等の記憶装置、CD-ROM等の記録媒体等のようにデータを記憶できる機器又は媒体を用いて構成される。教師なしデータ記憶部120は、教師なしデータ集合を記憶する。教師なしデータ集合は、所望のクラスラベルが付与されていないデータの集合である。 The learning device 100 operates by acquiring data from the supervised data storage unit 110 and the unsupervised data storage unit 120. The supervised data storage unit 110 is configured by using a device or medium capable of storing data such as a storage device such as a magnetic hard disk device or a semiconductor storage device, a recording medium such as a CD-ROM, or the like. The supervised data storage unit 110 stores a supervised data set. A supervised data set is a set of data with the desired class label. The unsupervised data storage unit 120 is configured by using a device or medium capable of storing data such as a storage device such as a magnetic hard disk device or a semiconductor storage device, a recording medium such as a CD-ROM, or the like. The unsupervised data storage unit 120 stores an unsupervised data set. An unsupervised data set is a set of data that does not have the desired class label.
 The feature extractor 101 receives the supervised data set and the unsupervised data set as inputs and extracts a feature vector from each piece of data. The feature extractor 101 outputs the extracted feature vectors to the first classifier 102 and the second classifier 103. The feature extractor 101 operates based on a function that has parameters and is capable of extracting such feature vectors. A feature vector is, for example, a numerical vector representing the features of the data. In other words, a feature vector represents the required features of the data as a vector with n-dimensional elements, where n is an arbitrary integer value, for example n = 512. Although the feature vector is described as having the form of a vector for convenience, the form is irrelevant to the main point of the present invention and may be arbitrary. Every time the feature extractor 101 outputs a feature vector, it reads the parameters stored in the learning result storage unit 140 and then outputs the feature vector.
 The first classifier 102 receives the feature vector output by the feature extractor 101 as an input. The first classifier 102 outputs estimated values of the attribution probabilities of the original data of the input feature vector to each known class and to the unknown class (hereinafter referred to as "estimated attribution probabilities"). An estimated attribution probability is a probability representing the likelihood that the data belongs to each known class or to the unknown class. The first classifier 102 operates based on a function that has parameters and is capable of outputting such estimated attribution probabilities. Every time the first classifier 102 outputs estimated attribution probabilities, it reads the parameters stored in the learning result storage unit 140 and then outputs the estimated attribution probabilities.
 The second classifier 103 receives the feature vector output by the feature extractor 101 as an input. The second classifier 103 outputs estimated values (estimated attribution probabilities) of the attribution probabilities of the original data of the input feature vector to each known class and to the unknown class. The second classifier 103 operates based on a function that has parameters and is capable of outputting such estimated attribution probabilities. Every time the second classifier 103 outputs estimated attribution probabilities, it reads the parameters stored in the learning result storage unit 140 and then outputs the estimated attribution probabilities. The same feature vector is input to the first classifier 102 and the second classifier 103.
 Any functions can be used for the feature extractor 101, the first classifier 102, and the second classifier 103 as long as they are differentiable with respect to their parameters. In the present embodiment, a CNN (Convolutional Neural Network) is used. However, a CNN is only an example, and there is no need to be limited to this.
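 As an illustration only, the feature extractor 101 and the two classifiers 102 and 103 could be realized as in the following minimal sketch. The sketch uses PyTorch; the layer sizes, the specific convolutional architecture, and the names FeatureExtractor and Classifier are assumptions made for illustration, since the present embodiment only requires differentiable parametric functions.

  import torch
  import torch.nn as nn

  class FeatureExtractor(nn.Module):
      # F: maps input data x to an n-dimensional feature vector f (parameters phi).
      def __init__(self, in_channels=1, feat_dim=512):
          super().__init__()
          self.conv = nn.Sequential(
              nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
              nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
              nn.AdaptiveAvgPool2d(1),
          )
          self.fc = nn.Linear(64, feat_dim)

      def forward(self, x):
          return self.fc(self.conv(x).flatten(1))

  class Classifier(nn.Module):
      # C_i: maps a feature vector f to attribution probabilities over K known
      # classes plus one unknown class, i.e. K + 1 outputs (parameters theta_i).
      def __init__(self, feat_dim=512, num_known=10):
          super().__init__()
          self.fc = nn.Linear(feat_dim, num_known + 1)

      def forward(self, f):
          return torch.softmax(self.fc(f), dim=1)

  # The same feature vector is fed to both classifiers, which have independent parameters.
  F = FeatureExtractor()
  C1, C2 = Classifier(), Classifier()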
 The identification loss evaluation unit 104 receives as inputs the data to be processed, information indicating whether or not the data to be processed belongs to the unknown class, the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed, and the desired attribution probability for the data to be processed (hereinafter referred to as the "teacher attribution probability"). The identification loss evaluation unit 104 obtains the value of an identification loss function (hereinafter referred to as the "identification loss evaluation value"), which is a first loss function representing the difference between these. The teacher attribution probability is the attribution probability corresponding to the class label that is the correct answer at the time of learning.
 The unknown class classifier 105 receives as inputs the data to be processed and the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed. The unknown class classifier 105 determines whether or not the data to be processed belongs to the unknown class. The unknown class classifier 105 records information indicating the determination result (hereinafter referred to as "unknown class information") in the unknown class information storage unit 130. The information recorded in the unknown class information storage unit 130 is used by the identification loss evaluation unit 104 and the identification mismatch evaluation unit 106.
 The identification mismatch evaluation unit 106 receives as inputs the data to be processed and the estimated attribution probabilities output by the first classifier 102 and the second classifier 103 for the data to be processed. The identification mismatch evaluation unit 106 acquires a value indicating the degree of mismatch between the estimated attribution probabilities of the first classifier 102 and the second classifier 103 (hereinafter referred to as the "identification mismatch evaluation value").
 The learning unit 107 receives as inputs the identification loss evaluation value obtained by the identification loss evaluation unit 104 and the identification mismatch evaluation value obtained by the identification mismatch evaluation unit 106. Using the input values, the learning unit 107 performs iterative learning of the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103. The learning unit 107 records the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 obtained by the iterative learning in the learning result storage unit 140. The iterative learning for the feature extractor 101 is performed so that both the identification loss evaluation value and the identification mismatch evaluation value become small. The iterative learning for the first classifier 102 and the second classifier 103 is performed so that the identification loss evaluation value becomes small and the identification mismatch evaluation value becomes large.
<Configuration example of the prediction device>
 Next, the configuration of the prediction device according to the present embodiment will be described. FIG. 4 is a functional block diagram showing an example of the prediction device 200 according to the present embodiment. The prediction device 200 is configured using an information processing device such as a personal computer or a server device. The prediction device 200 includes a control unit 91 and a storage unit 230. The control unit 91 is configured using a processor such as a CPU and a memory. The control unit 91 functions as a feature extractor 201 and a classifier 202 when the processor executes a program. All or part of the functions of the control unit 91 may be realized using hardware such as an ASIC, a PLD, or an FPGA. The above program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (for example, an SSD), or a storage device such as a hard disk or a semiconductor storage device built into a computer system. The above program may be transmitted via a telecommunication line.
 The storage unit 230 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 230 stores the parameters obtained as the learning result of the iterative learning performed by the learning unit 107 of the learning device 100.
 Upon receiving data to be processed (data to be predicted) 240, the feature extractor 201 reads the parameters from the storage unit 230 and operates based on the parameters. The feature extractor 201 outputs a feature vector for the data 240 to be processed. The classifier 202 reads the parameters from the storage unit 230 and operates based on the parameters. The classifier 202 obtains estimated attribution probabilities for the data 240 to be processed based on the feature vector obtained by the feature extractor 201. The output of the classifier 202 may be the estimated attribution probabilities themselves for each class of the data 240 to be processed, or may be information indicating a prediction result of which class the data belongs to.
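 The prediction flow could be sketched, for example, as follows. This is an illustrative sketch only: the parameter file name "learned_params.pt" and the layout of the saved state are assumptions, and the FeatureExtractor and Classifier modules are those sketched above.

  import torch

  def predict(x, feature_extractor, classifier, param_path="learned_params.pt"):
      # Load the parameters obtained by the learning device 100 and return both the
      # estimated attribution probabilities and the predicted class index
      # (with 0-indexed classes, index K corresponds to the unknown class).
      state = torch.load(param_path)
      feature_extractor.load_state_dict(state["phi"])
      classifier.load_state_dict(state["theta"])
      feature_extractor.eval()
      classifier.eval()
      with torch.no_grad():
          f = feature_extractor(x)      # feature vector of the data to be predicted
          probs = classifier(f)         # attribution probabilities over K + 1 classes
      return probs, probs.argmax(dim=1)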
<Operation example of the learning device>
 FIG. 5 is a flowchart showing an operation example of the learning device 100. Next, an operation example of the learning device 100 will be described. The learning device 100 receives the supervised data set from the supervised data storage unit 110 and the unsupervised data set from the unsupervised data storage unit 120, and executes the learning processing routine shown in FIG. 5.
 First, the control unit 90 of the learning device 100 reads one or more supervised data sets and unsupervised data sets (step S101). Next, the control unit 90 makes a branch determination as to whether or not the number of learning iterations is less than or equal to a predetermined planned number (step S102). If the number of iterations is less than or equal to the planned number, the processing of step S103 is executed. On the other hand, if the number of iterations is greater than the planned number, the processing of step S104 is executed.
 Here, the significance of the branch processing in step S102 will be described. This branch processing changes the method of identifying the unknown class. The first classifier 102 and the second classifier 103 are trained so as to be able to discriminate (K + 1) classes, that is, the K known classes plus the unknown class. However, while for the known classes pairs of data and their teacher attribution probabilities are available as supervised data, for the unknown class it is not known which data belong to it. Therefore, when the number of iterations is less than or equal to the planned number, identification of the unknown class is performed on the unsupervised data, and the result is recorded as an identification history. On the other hand, when the number of iterations is greater than the planned number, the first classifier 102 and the second classifier 103 are trained so as to be able to discriminate the (K + 1) classes, and the identification result of the unknown class is recorded as the identification history.
 When the number of iterations is less than or equal to the planned number, the teacher attribution probability of the unknown class can be estimated by identifying the unknown class, but errors are also included. Therefore, by learning the (K + 1)-class discrimination while recording the identification history, identification of the unknown class with fewer errors becomes possible.
 In step S103, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set and the unsupervised data set, and the identification loss evaluation value, the identification mismatch evaluation value, and the determination of whether or not each piece of data belongs to the unknown class are obtained.
 In step S104, the unknown class identification history is read for the unsupervised data set. Then, in step S105, the feature extractor 101, the first classifier 102, the second classifier 103, and the unknown class classifier 105 are applied to the supervised data set, the unsupervised data set, and the unknown class identification history, and the identification loss evaluation value, the identification mismatch evaluation value, and the determination of whether or not each piece of data belongs to the unknown class are obtained.
 When the processing of step S103 or step S105 is finished, the learning unit 107 updates the values of the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 (the values recorded in the learning result storage unit 140) based on the identification loss evaluation value and the identification mismatch evaluation value (step S106).
 The parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 are stored in the learning result storage unit 140. Next, the unknown class classifier 105 records the identification result obtained in step S103 or S105 as to whether each piece of data is unknown class data in the unknown class information storage unit 130 (step S107).
 Then, the control unit 90 determines whether the end condition is satisfied (step S108). If the end condition is satisfied (step S108: YES), the control unit 90 ends the processing. If the end condition is not satisfied (step S108: NO), the control unit 90 returns to step S101 and repeats the processing.
 Through the iterative learning described above, the parameters of the feature extractor 101, the first classifier 102, and the second classifier 103 are learned. For the feature extractor 101, learning is performed using the identification loss evaluation value and the identification mismatch evaluation value so that both values become small. The identification loss function outputs a smaller value as the similarity between the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103 and the given teacher attribution probability of the data becomes higher. The identification mismatch evaluation value indicates the difference between the classifiers in the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103. For the first classifier 102 and the second classifier 103, learning is performed so that the identification loss evaluation value becomes small and the identification mismatch evaluation value becomes large.
[Details of each processing]
 Next, the details of the processing of each processing unit of the learning device 100 will be described.
[When the number of iterations is less than or equal to the planned number]
 The processing of the identification loss evaluation unit 104, the unknown class classifier 105, and the identification mismatch evaluation unit 106 when the number of iterations is determined to be less than or equal to the planned number in step S102 will be described.
 The identification loss function takes the feature vector output by the feature extractor 101 as an input and outputs a smaller value as the similarity between the estimated attribution probabilities of the data output by the first classifier 102 and the second classifier 103 and the given teacher attribution probability of the data becomes higher. The identification loss function corresponds to Equations 2 and 3 described later, and its value corresponds to the identification loss evaluation value.
[Processing of the identification loss evaluation unit]
 The feature extractor 101 is realized by a function F that takes data x as an input, outputs a feature vector f, and has a parameter φ. The first classifier 102 can be expressed as a function that takes the feature vector f as an input, outputs an estimated attribution probability y1, and has a parameter θ1. The second classifier 103 can be expressed as a function that takes the feature vector f as an input, outputs an estimated attribution probability y2, and has a parameter θ2. The functions realizing the first classifier 102 and the second classifier 103 can be expressed as probability functions as in Equation 1 below, using the function F realizing the feature extractor 101. Here, i is used as a subscript to distinguish the two classifiers.
[Equation 1]
 Equation 1 is the probability that yi appears given φ, θi, and x. A desirable feature extractor 101, first classifier 102, and second classifier 103 are such that, when data s from the supervised data set is given, the teacher attribution probability t for each class appears. That is, they are a feature extractor 101, a first classifier 102, and a second classifier 103 from which attribution probabilities that allow the correct class to be identified are obtained. Letting p(s, t) be the probability of appearance of data s and the corresponding teacher attribution probability t, it suffices for learning to determine the parameters φ and θi so that Equation 2 below becomes small.
[Equation 2]
 Eb[a] denotes the expected value of a with respect to the probability b. In the present embodiment, the supervised data are obtained from the supervised data set, so the expected value is approximately replaced by a sum as in Equation 3 below.
[Equation 3]
 Here, S and T are a set of one or more pieces of data and the set of corresponding teacher attribution probabilities, respectively. Equation 3 is the identification loss function in an example of the present embodiment, and the value obtained by evaluating it for arbitrary S and T is the identification loss evaluation value.
 By making Equation 3 small with respect to φ, θ1, and θ2, it is possible to obtain a desirable feature extractor 101, first classifier 102, and second classifier 103 that can output t for s. There are various methods for obtaining such φ, θ1, and θ2. Simply put, it is known that local minimization is possible when the function F realizing the feature extractor and the probability functions representing the first classifier 102 and the second classifier 103 are differentiable with respect to their respective parameters φ, θ1, and θ2. Therefore, in an example of the present embodiment, functions may be chosen that satisfy the following conditions: the feature extractor 101 is a function that outputs the feature vector f of input data x and is differentiable with respect to φ, and the first classifier 102 and the second classifier 103 are functions that take the feature vector f as an input, output the estimated attribution probabilities y1 and y2, and are differentiable with respect to θ1 and θ2, respectively.
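 Although the exact form of Equations 2 and 3 is given in the drawings, the supervised loss could be evaluated, for example, as in the following sketch. The use of the cross-entropy between the teacher attribution probability and each classifier's estimated attribution probability is an assumption made for illustration; any loss that becomes smaller as the two probabilities become more similar fits the description above.

  import torch

  def identification_loss(probs1, probs2, teacher_probs):
      # probs1, probs2: estimated attribution probabilities of classifiers 102 and 103,
      #                 shape (batch, K + 1); teacher_probs: same shape, rows sum to 1.
      # Cross-entropy of each classifier against the teacher attribution probability,
      # summed over the two classifiers and averaged over the supervised data
      # (assumed form of Equation 3).
      eps = 1e-12
      ce1 = -(teacher_probs * torch.log(probs1 + eps)).sum(dim=1)
      ce2 = -(teacher_probs * torch.log(probs2 + eps)).sum(dim=1)
      return (ce1 + ce2).mean()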
[Processing of the identification mismatch evaluation unit]
 The identification mismatch evaluation value of a pair of estimated attribution probabilities p1 and p2 is expressed as in Equation 4 below, where p1k and p2k denote the attribution probabilities of p1 and p2 for class k, respectively. Here, K is the number of known classes to be discriminated, and K + 1 denotes the unknown class that corresponds to none of the known classes.
[Equation 4]
 The identification mismatch evaluation unit 106 evaluates the degree of mismatch between the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103 for data u of the unsupervised data set. That is, using the mismatch value of Equation 4, the identification mismatch evaluation unit 106 outputs the identification mismatch evaluation value Ladv of the estimated attribution probabilities of the first classifier 102 and the second classifier 103 with respect to the appearance probability p(u) of data u of the unsupervised data set, as shown in Equation 5 below.
[Equation 5]
 Eb[a] denotes the expected value of a with respect to the probability b. In the present embodiment, the unsupervised data are obtained from the unsupervised data set, so the expected value is approximately replaced by a sum as in Equation 6 below.
[Equation 6]
 Here, U is a set of one or more pieces of data. Equation 6 is the identification mismatch degree in an example of the present embodiment, and the value obtained by evaluating it for an arbitrary U is the identification mismatch evaluation value.
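 For example, the identification mismatch between the two classifiers could be evaluated as in the following sketch. Using the mean absolute difference of the attribution probabilities over the class dimension is an assumption made for illustration; the description above only requires a value expressing how much the two estimated attribution probabilities differ.

  def discrepancy(probs1, probs2):
      # probs1, probs2: estimated attribution probabilities of classifiers 102 and 103
      # for the same unsupervised data, shape (batch, K + 1), as torch tensors.
      # Mean absolute difference over classes (assumed form of Equation 4),
      # averaged over the data (Equation 6).
      return (probs1 - probs2).abs().mean()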
[Processing of the unknown class classifier]
 The estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103 for data x can be expressed using Equation 1 above. The information entropy H(y|x), which indicates the ambiguity of the attribution probability y output as the average of the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103, is expressed as in Equation 7 below.
[Equation 7]
 Whether or not unsupervised data u of the unsupervised data set is unknown class data is determined by whether or not the value of the information entropy shown in Equation 7 is larger than a predetermined threshold σ. That is, the identification yu,e of whether or not the unsupervised data u is unknown class data at the e-th iteration is expressed as in Equation 8 below.
[Equation 8]
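 The entropy-based determination could be sketched as follows; the threshold variable sigma and the boolean encoding of the result are illustrative assumptions.

  import torch

  def is_unknown_by_entropy(probs1, probs2, sigma):
      # Average the two estimated attribution probabilities, compute the information
      # entropy H(y|x) of the average (Equation 7), and flag the data as unknown-class
      # data when the entropy exceeds the threshold sigma (Equation 8).
      eps = 1e-12
      y = 0.5 * (probs1 + probs2)
      entropy = -(y * torch.log(y + eps)).sum(dim=1)
      return entropy > sigma          # boolean flag y_{u,e} per data point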
[When the number of iterations is larger than a certain number]
 The processing in step S105 of dividing the unsupervised data set into a known class data set UI and an unknown class data set UO will be described. For data u of the unsupervised data set, the identification result of whether or not it was unknown class data at iteration t is stored in the unknown class information storage unit 130 as yu,t in step S107 described later. In step S104, the identification results of the past T iterations are read from the unknown class information storage unit 130; data u of the unsupervised data set that has been identified as unknown class data T/2 times or more in the past belongs to the unknown class data set UO, and the other data belong to the known class data set UI. That is, when the unsupervised data set at iteration e is denoted by U, U is divided into the known class data set UI and the unknown class data set UO according to Equations 9 and 10 below.
[Equation 9]
[Equation 10]
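 The split based on the identification history could be sketched as follows. The data structure used for the history, a list of per-iteration boolean flags for each data index, is an illustrative assumption.

  def split_by_history(history, T):
      # history: dict mapping each unsupervised data index to the list of past
      # identification results y_{u,t} (True = identified as unknown-class data).
      # Data identified as unknown-class data at least T/2 times in the last T
      # iterations go to U_O; the remaining data go to U_I (Equations 9 and 10).
      U_I, U_O = [], []
      for u, flags in history.items():
          recent = flags[-T:]
          if sum(recent) >= T / 2:
              U_O.append(u)
          else:
              U_I.append(u)
      return U_I, U_O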
 Next, the evaluation processing in step S105 will be described. The processing of the identification loss evaluation unit 104 and the identification mismatch evaluation unit 106 is substantially the same as the processing in step S103, which is performed when the number of iterations is equal to or less than a certain number.
[Processing of the identification loss evaluation unit]
 The identification loss evaluation unit 104 obtains the identification loss evaluation value by taking the sum over the union of the set (S, T) of supervised data and their teacher attribution probabilities and the unknown class data set UO. That is, the evaluation value of the identification loss evaluation unit 104 is expressed in the form of Equation 11 below.
[Equation 11]
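 One way to form the union described above is sketched below. Treating the data in UO as supervised data whose teacher attribution probability places all of its mass on the unknown class K + 1 is an assumption made for illustration, consistent with training the classifiers to discriminate (K + 1) classes.

  import torch

  def augment_with_unknown(S, T, U_O, num_known):
      # S: list of supervised data tensors, T: list of their teacher attribution
      # probabilities (length num_known + 1); U_O: list of data identified as
      # unknown-class data. The returned pair is used to evaluate Equation 11.
      t_unknown = torch.zeros(num_known + 1)
      t_unknown[num_known] = 1.0          # all probability mass on the unknown class
      S_aug = list(S) + list(U_O)
      T_aug = list(T) + [t_unknown.clone() for _ in U_O]
      return S_aug, T_aug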
[Processing of the identification mismatch evaluation unit]
 As for the processing of the identification mismatch evaluation unit 106, the identification mismatch evaluation value is obtained by performing, on the data of the known class data set UI, the same processing as the evaluation processing of Equation 6 performed by the identification mismatch evaluation unit 106 in step S103. That is, the identification mismatch evaluation value output by the identification mismatch evaluation unit 106 in step S105 is obtained by Equation 12 below.
[Equation 12]
[Processing of the unknown class classifier]
 The estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103 for data x can be expressed using Equation 1 above. The average estimated attribution probability y can be obtained from the estimated attribution probabilities y1 and y2 output by the first classifier 102 and the second classifier 103. Unsupervised data u of the unsupervised data set is determined to be unknown class data if, in the average estimated attribution probability y, the attribution probability for the unknown class K + 1 is the highest among the attribution probabilities for the discrimination classes; otherwise it is determined not to be unknown class data. That is, the identification yu,e of whether or not the unsupervised data u is unknown class data at the e-th iteration is expressed as in Equation 13 below.
[Equation 13]
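 This later-phase determination could be sketched as follows, assuming 0-indexed class indices so that index num_known corresponds to the unknown class K + 1 and torch tensors as inputs.

  def is_unknown_by_argmax(probs1, probs2, num_known):
      # Average the two estimated attribution probabilities and flag the data as
      # unknown-class data when the unknown class has the highest probability (Equation 13).
      y = 0.5 * (probs1 + probs2)
      return y.argmax(dim=1) == num_known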
[Learning processing]
 The learning processing of the learning unit 107 in step S106 will be described. For the feature extractor 101, the learning processing is performed so that the identification loss evaluation value Ls and the identification mismatch evaluation value Ladv both become small. For the first classifier 102 and the second classifier 103, the learning processing is performed so that the identification loss evaluation value Ls becomes small and the identification mismatch evaluation value becomes large. Specifically, the problems shown in Equations 14, 15, and 16 are optimized in sequence.
[Equation 14]
[Equation 15]
[Equation 16]
 Here, since the functions of the feature extractor 101, the first classifier 102, and the second classifier 103 have been chosen so that the identification loss evaluation value Ls and the identification mismatch evaluation value Ladv are differentiable with respect to the parameters θ1, θ2, and φ, learning can be performed by gradient-based error minimization.
 The effects expected from the above learning will be explained. First, minimizing Ls with respect to the parameters θ1, θ2, and φ has the effect of improving recognition accuracy based on the supervised data, as in ordinary discriminative learning.
 As for Ladv, learning is performed so that its value becomes large with respect to the parameters θ1 and θ2 and is minimized with respect to the parameter φ. The details of the effect of this learning are as described in Non-Patent Document 4. The distribution of the supervised data and the distribution of the unsupervised data in the feature space output by the feature extractor 101 are brought closer to each other. As the distributions in the feature space come closer, unsupervised data can be recognized with high accuracy by the classifiers trained on the supervised data.
 However, if the distribution of the supervised data and the distribution of the unsupervised data are simply brought closer in the feature space by learning similar to Non-Patent Document 4, the unknown class data among the unsupervised data are also brought closer. In that case, the unsupervised unknown class data approach the supervised data and end up being identified as one of the known classes, which is inherently inappropriate. In the present embodiment, unknown class data are detected, and in step S105 the data detected as unknown class data are not used for the evaluation of Ladv. This prevents the supervised data distribution and the unknown class data distribution from being inappropriately brought closer as described above, and makes it possible to learn so that unknown class data are detected as unknown class data.
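 The alternating optimization of Equations 14 to 16 could be sketched, for example, as follows. This is a minimal illustrative sketch in PyTorch: the choice of optimizers, the single adversarial sub-step, and the helper functions identification_loss, discrepancy, and augment_with_unknown refer to the sketches given earlier and are assumptions rather than a prescribed implementation. Note that u_known contains only unsupervised data regarded as belonging to the known classes, so data detected as unknown-class data are excluded from the Ladv terms as described above.

  import torch

  def training_step(F, C1, C2, opt_f, opt_c, s, t, u_known):
      # s, t: supervised data and teacher attribution probabilities.
      # Step A (Equation 14): minimize the identification loss L_s w.r.t. phi, theta_1, theta_2.
      opt_f.zero_grad(); opt_c.zero_grad()
      loss_s = identification_loss(C1(F(s)), C2(F(s)), t)
      loss_s.backward()
      opt_f.step(); opt_c.step()

      # Step B (Equation 15): update theta_1, theta_2 so that L_s stays small while
      # L_adv becomes large; the feature extractor is kept fixed via detach().
      opt_c.zero_grad()
      loss_b = identification_loss(C1(F(s).detach()), C2(F(s).detach()), t) \
               - discrepancy(C1(F(u_known).detach()), C2(F(u_known).detach()))
      loss_b.backward()
      opt_c.step()

      # Step C (Equation 16): update phi so that L_adv becomes small; only the
      # feature extractor's optimizer is stepped, so the classifiers stay fixed.
      opt_f.zero_grad()
      loss_c = discrepancy(C1(F(u_known)), C2(F(u_known)))
      loss_c.backward()
      opt_f.step()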
[Parameter storage processing]
 After the parameter learning, the parameters θ1, θ2, and φ are stored in the learning result storage unit 140 in the processing of step S107.
 The processing, in step S108, of storing the identification results of whether the unsupervised data are unknown class data will be described. For the identification history of whether data u of the unsupervised data set is unknown class data at iteration e, yu,e is obtained by the processing of step S103 when the number of iterations e is equal to or less than a certain number, and by the processing of step S105 when the number of iterations e is larger than a certain number. In step S108, the identification result yu,e is stored in the unknown class information storage unit 130 for each piece of data u of the unsupervised data set.
 The learning processing of steps S101 to S108 described above is repeated until the end condition is satisfied.
 Arbitrary information may be used for the end condition. For example, it may be "until a predetermined number of iterations has been performed", "until the value of the objective function no longer changes by more than a certain amount", or "until the accuracy on evaluation data prepared separately from the training data no longer changes by more than a certain amount".
 (Modification examples)
 One or both of the supervised data storage unit 110 and the unsupervised data storage unit 120 may be provided in the learning device 100. One or both of the unknown class information storage unit 130 and the learning result storage unit 140 may be provided outside the learning device 100. When provided outside, the data may be acquired by performing communication such as TCP/IP.
 The learning device 100 may be implemented using a single information processing device, or may be implemented in a distributed manner across a plurality of information processing devices.
 Although an embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and designs and the like within a range not departing from the gist of the present invention are also included.
 The present invention is applicable to a learning device.
100 ... learning device, 101 ... feature extractor, 102 ... first classifier, 103 ... second classifier, 104 ... identification loss evaluation unit, 105 ... unknown class classifier, 106 ... identification mismatch evaluation unit, 107 ... learning unit, 200 ... prediction device

Claims (7)

  1.  A learning device comprising:
     a feature extractor that outputs a feature quantity of input data;
     a plurality of classifiers that acquire, based on the feature quantity, attribution probabilities of the data to known classes and an unknown class;
     an unknown class classifier that determines whether or not the data belongs to the unknown class based on the attribution probabilities acquired by the classifiers;
     an identification mismatch evaluation unit that outputs, for the data, a value of an identification mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers; and
     a learning unit that, using data that does not belong to the unknown class and to which no teacher label is given, performs iterative learning of parameters of the feature extractor and the plurality of classifiers so that the value of the identification mismatch degree becomes smaller for the feature extractor and becomes larger for the plurality of classifiers.
  2.  The learning device according to claim 1, further comprising an identification loss evaluation unit that outputs, for the data, a value of an identification loss function that indicates a smaller value as the similarity between the attribution probability and a given teacher attribution probability of the data is higher,
     wherein the learning unit further performs iterative learning of the parameters so that the value of the identification loss function becomes smaller for the feature extractor and the plurality of classifiers, using data to which a teacher label is given and data that belongs to the unknown class and to which no teacher label is given.
  3.  The learning device according to claim 1 or 2, wherein the unknown class classifier makes the determination based on the attribution probabilities when the number of iterations of the iterative learning in the learning unit is equal to or less than a predetermined number.
  4.  The learning device according to any one of claims 1 to 3, wherein the unknown class classifier makes the determination based on past determination results when the number of iterations of the iterative learning in the learning unit is greater than a predetermined number.
  5.  A prediction device comprising:
     a feature extractor that outputs a feature quantity of input data based on parameters obtained by the learning device according to any one of claims 1 to 4; and
     a classifier that acquires attribution probabilities of the data to known classes and an unknown class based on the parameters obtained by the learning device according to any one of claims 1 to 4 and on the feature quantity.
  6.  A learning method comprising:
     a feature extraction step of outputting a feature quantity of input data using a feature extractor;
     an identification step of acquiring, using a plurality of classifiers, attribution probabilities of the data to known classes and an unknown class based on the feature quantity;
     an unknown class identification step of determining whether or not the data belongs to the unknown class based on the acquired attribution probabilities;
     an identification mismatch evaluation step of outputting, for the data, a value of an identification mismatch degree indicating the difference between the attribution probabilities obtained by the plurality of classifiers; and
     a learning step of performing, using data that does not belong to the unknown class and to which no teacher label is given, iterative learning of parameters of the feature extractor and the plurality of classifiers so that the value of the identification mismatch degree becomes smaller for the feature extractor and becomes larger for the plurality of classifiers.
  7.  A program for causing a computer to operate as the learning device according to any one of claims 1 to 4.
PCT/JP2020/022672 2020-06-09 2020-06-09 Learning device, prediction device, learning method, and program WO2021250774A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/022672 WO2021250774A1 (en) 2020-06-09 2020-06-09 Learning device, prediction device, learning method, and program
JP2022530395A JP7440798B2 (en) 2020-06-09 2020-06-09 Learning device, prediction device, learning method and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/022672 WO2021250774A1 (en) 2020-06-09 2020-06-09 Learning device, prediction device, learning method, and program

Publications (1)

Publication Number Publication Date
WO2021250774A1 true WO2021250774A1 (en) 2021-12-16

Family

ID=78845421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/022672 WO2021250774A1 (en) 2020-06-09 2020-06-09 Learning device, prediction device, learning method, and program

Country Status (2)

Country Link
JP (1) JP7440798B2 (en)
WO (1) WO2021250774A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117001423A (en) * 2023-09-28 2023-11-07 智能制造龙城实验室 Tool state online monitoring method based on evolutionary learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034661A1 (en) * 2019-08-27 2020-01-30 Lg Electronics Inc. Artificial intelligence apparatus for generating training data, artificial intelligence server, and method for the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633725B (en) 2018-06-25 2023-08-04 富士通株式会社 Method and device for training classification model and classification method and device
JP7472471B2 (en) 2019-11-14 2024-04-23 オムロン株式会社 Estimation system, estimation device, and estimation method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034661A1 (en) * 2019-08-27 2020-01-30 Lg Electronics Inc. Artificial intelligence apparatus for generating training data, artificial intelligence server, and method for the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUNIAKI SAITO; KOHEI WATANABE; YOSHITAKA USHIKU; TATSUYA HARADA: "Maximum Classifier Discrepancy for Unsupervised Domain Adaptation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 December 2017 (2017-12-07), 201 Olin Library Cornell University Ithaca, NY 14853 , XP080845555 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117001423A (en) * 2023-09-28 2023-11-07 智能制造龙城实验室 Tool state online monitoring method based on evolutionary learning
CN117001423B (en) * 2023-09-28 2023-12-05 智能制造龙城实验室 Tool state online monitoring method based on evolutionary learning

Also Published As

Publication number Publication date
JP7440798B2 (en) 2024-02-29
JPWO2021250774A1 (en) 2021-12-16

Similar Documents

Publication Publication Date Title
Kundu et al. Towards inheritable models for open-set domain adaptation
Huang et al. Mos: Towards scaling out-of-distribution detection for large semantic space
Fei et al. Binary tree of SVM: a new fast multiclass training and classification algorithm
JP5176773B2 (en) Character recognition method and character recognition apparatus
EP1589473A2 (en) Using tables to learn trees
US9104976B2 (en) Method for classifying biometric data
Frénay et al. Estimating mutual information for feature selection in the presence of label noise
JP2023042582A (en) Method for sample analysis, electronic device, storage medium, and program product
CN111340057B (en) Classification model training method and device
Vignotto et al. Extreme Value Theory for Open Set Classification--GPD and GEV Classifiers
WO2021250774A1 (en) Learning device, prediction device, learning method, and program
CN111191033B (en) Open set classification method based on classification utility
CN116451111A (en) Robust cross-domain self-adaptive classification method based on denoising contrast learning
JP5017941B2 (en) Model creation device and identification device
WO2022074840A1 (en) Domain feature extractor learning device, domain prediction device, learning method, learning device, class identification device, and program
US20150332173A1 (en) Learning method, information conversion device, and recording medium
JP6062273B2 (en) Pattern recognition apparatus, pattern recognition method, and pattern recognition program
Klose et al. Semi-supervised learning in knowledge discovery
Zhang et al. Divide and retain: a dual-phase modeling for long-tailed visual recognition
JP4121060B2 (en) Class identification device and class identification method
JP6282711B2 (en) Pattern recognition apparatus, pattern recognition method, and pattern recognition program
Yu et al. Generative adversarial networks for open set historical chinese character recognition
US20240143981A1 (en) Computer-readable recording medium storing machine learning program, and information processing apparatus
JP2023170853A (en) Learning device, character recognition system, learning method, and program
Osuna et al. Segmentation of blood cell images using evolutionary methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20939859

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022530395

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20939859

Country of ref document: EP

Kind code of ref document: A1